[Day 61] Stanford CS224N (NLP with DL): Machine translation, seq2seq + a side CDCGAN mini project
Hello :)
Today is Day 61!
A quick summary of today:
- Covered Lecture 7: machine translation, seq2seq, attention from Stanford CS224N
I will first cover the GAN story, and then share my notes from the lecture.
So... while watching and taking notes today, I started thinking: what if I could use my notes as training data for a model, and afterwards, whenever I want, give it raw string text and have it output that text in the format of my notes (in my handwriting)?
Well, I started looking around, and the first model architecture that came to mind was the GAN (specifically the conditional GAN). I remembered there was a GAN variant where, alongside the pictures, we can feed in the labels, and then generate on demand for a chosen label. In retrospect there are of course other options, but I decided to go with a GAN.
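To make the "conditional" part concrete, here is a minimal sketch of the idea as I understand it (the names and layer sizes here are mine, not from any particular repo): embed the class label and concatenate it with the noise vector, so at inference time you can ask the generator for a specific class.

```python
import torch
import torch.nn as nn

# Minimal sketch of a conditional generator (my own naming/sizes):
# embed the class label, concatenate it with the noise vector, and the
# generator learns to produce images of that class.
class ConditionalGenerator(nn.Module):
    def __init__(self, noise_dim=100, num_classes=10, embed_dim=10, img_dim=28 * 28):
        super().__init__()
        self.label_embed = nn.Embedding(num_classes, embed_dim)
        self.net = nn.Sequential(
            nn.Linear(noise_dim + embed_dim, 256),
            nn.ReLU(),
            nn.Linear(256, img_dim),
            nn.Tanh(),  # outputs in [-1, 1], matching normalized images
        )

    def forward(self, noise, labels):
        # the conditioning step: the label embedding rides along with the noise
        x = torch.cat([noise, self.label_embed(labels)], dim=1)
        return self.net(x)

# on-demand generation: "draw me a 7"
gen = ConditionalGenerator()
fake = gen(torch.randn(1, 100), torch.tensor([7]))  # shape (1, 784)
```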
For maybe 2 hours I busted my head trying to make a simple model with the EMNIST dataset (English characters), and I kept getting weird input size issues. After a bit I read online that the number of classes for EMNIST from PyTorch is a bit weird.
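For reference, this is roughly how I'd poke at it with torchvision: `EMNIST` requires a `split` argument, and each split has a different number of classes, which I assume is the "weird" part that bit me.

```python
from torchvision import datasets

# Sketch: print the class count per EMNIST split (this downloads the
# dataset archive). The counts are something to verify yourself, not a
# guarantee from me.
for split in ["byclass", "bymerge", "balanced", "letters", "digits", "mnist"]:
    ds = datasets.EMNIST(root="data", split=split, download=True)
    print(split, "->", len(ds.classes), "classes")

# Gotcha I believe I hit (hedging here): in the "letters" split the
# integer labels start at 1 rather than 0, so naive num_classes
# assumptions can break layer sizes.
```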
What is more, I saw this image with handwritten text, generated with a conditional deep conv GAN (repo link). My mind was hooked on the idea -> I want to do it too. But first I wanted to do it with just numbers, which is a bit simpler (and spared me the struggle of loading the EMNIST dataset).

Thanks to Aladdin Persson's YouTube channel, I got a refresher on how the GAN architecture works, put together a simple model, and started training in my Google Colab. After a few hours of adjusting params to optimize training and lower the loss, and also running inference to produce new number images, I had a working model. I was so happy. It felt soo long: first the EMNIST problems, and then just getting a conditional DCGAN to work.

Aaaand... my free Colab GPU time ran out in the middle of a run, and all was lost because I had not saved any weights :/ I did not even take a screenshot of the generated output numbers I got - and they were amazing, human-like.
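The lesson, in code form: a minimal sketch of the checkpointing I should have had from the start. `train_one_epoch` and the checkpoint path are placeholders for whatever training step and storage you use; on Colab, pointing the path at a mounted Google Drive folder means the weights survive the runtime dying.

```python
import torch

# Sketch of per-epoch checkpointing; `train_one_epoch` is a placeholder
# for whatever GAN training step you already have.
def train_with_checkpoints(gen, disc, opt_gen, opt_disc, loader,
                           num_epochs, train_one_epoch,
                           ckpt_path="cdcgan_ckpt.pt"):
    # On Colab, point ckpt_path at mounted Drive
    # (e.g. "/content/drive/MyDrive/cdcgan_ckpt.pt") so it outlives the runtime.
    for epoch in range(num_epochs):
        train_one_epoch(gen, disc, opt_gen, opt_disc, loader)
        torch.save({
            "epoch": epoch,
            "gen": gen.state_dict(),
            "disc": disc.state_dict(),
            "opt_gen": opt_gen.state_dict(),
            "opt_disc": opt_disc.state_dict(),
        }, ckpt_path)

def resume(gen, disc, opt_gen, opt_disc, ckpt_path="cdcgan_ckpt.pt"):
    # Restore models and optimizers after a dead runtime.
    ckpt = torch.load(ckpt_path)
    gen.load_state_dict(ckpt["gen"])
    disc.load_state_dict(ckpt["disc"])
    opt_gen.load_state_dict(ckpt["opt_gen"])
    opt_disc.load_state_dict(ckpt["opt_disc"])
    return ckpt["epoch"] + 1  # next epoch to run
```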