
[Day 46] Meeting Transformers again and their implementation

Hello :) Today is Day 46! Understanding Transformers with Professor Choi from KAIST

The first time I learned about transformers was on Day 32. It was a simple intro, and I did not understand exactly what was happening; I felt like I was just made aware of their existence in the NLP world. This image is from Andrew Ng's Deep Learning course.

In a transformer, the data goes through an encoder-decoder network. In the encoder, attention is calculated for each token with respect to all the other tokens. This attention mechanism allows the model to weigh the importance of each token in the context of the entire sequence. That information is then put through a feed-forward network that extracts deeper features.

In the decoder, we start to predict words. For example, we start with a <start of sentence> token and pass it in at the bottom. Then we take the K (key) and V (value) from the encoder, and with the Q (query) from the decoder input we try to predict the next item in the sentence (what…
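To make the encoder's attention step concrete, here is a minimal sketch of scaled dot-product attention in PyTorch. The function name, tensor shapes, and the toy sizes below are my own illustration under standard assumptions, not code from the lecture.

import torch
import torch.nn.functional as F

def scaled_dot_product_attention(Q, K, V):
    # Q, K, V: (batch, seq_len, d_k); in encoder self-attention all three come from the same tokens
    d_k = Q.size(-1)
    # similarity of every query with every key, scaled to keep the softmax well-behaved
    scores = Q @ K.transpose(-2, -1) / d_k ** 0.5
    # softmax turns the scores into weights over the whole sequence
    weights = F.softmax(scores, dim=-1)
    # each output token is a weighted sum of the value vectors
    return weights @ V

# toy usage: one sentence of 5 tokens with 8-dimensional embeddings
x = torch.randn(1, 5, 8)
out = scaled_dot_product_attention(x, x, x)   # encoder self-attention
print(out.shape)   # torch.Size([1, 5, 8])

In the decoder's cross-attention, the same function would be called with Q coming from the decoder tokens and K, V coming from the encoder output, which matches the flow described above.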

[Day 45] Trying to understand VAEs with Professor Choi from KAIST

Hello :) Today is Day 45! A quick summary of today: learning the theory behind VAEs with Professor Choi from KAIST, and implementing a VAE from scratch with Aladdin Persson (now probably my 2nd favourite DL YouTuber after Andrej Karpathy).

1) Theory behind Variational Autoencoders (VAEs)

We begin with autoencoders: given an input, an encoder compresses it into a lower-dimensional representation, and a decoder reconstructs the original input from this compressed representation. What VAEs want to do with that space in the middle is sample from it and generate new samples. But that comes with its challenges.

The idea behind VAEs is that we want to estimate the posterior distribution: we have X, and we want to know the distribution of Z. In reality, estimating the posterior distribution is extremely hard, so instead we approximate P(Z|X) with a Q(Z) that follows a Gaussian distribution. But P(Z|X) does not always follow a Gaussian distribution, so in those cases doing a Q(Z) l…
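To connect this theory with the from-scratch implementation, here is a minimal VAE sketch in PyTorch. The layer sizes, variable names, and the 784-dimensional input (an MNIST-style flattened image) are generic illustrative choices under the usual Gaussian assumption, not the exact code from Aladdin Persson's video.

import torch
import torch.nn as nn
import torch.nn.functional as F

class VAE(nn.Module):
    def __init__(self, input_dim=784, hidden_dim=200, z_dim=20):
        super().__init__()
        # encoder: compress x into the parameters (mu, log-variance) of Q(Z|X)
        self.enc = nn.Linear(input_dim, hidden_dim)
        self.mu = nn.Linear(hidden_dim, z_dim)
        self.logvar = nn.Linear(hidden_dim, z_dim)
        # decoder: reconstruct x from a sampled z
        self.dec1 = nn.Linear(z_dim, hidden_dim)
        self.dec2 = nn.Linear(hidden_dim, input_dim)

    def forward(self, x):
        h = F.relu(self.enc(x))
        mu, logvar = self.mu(h), self.logvar(h)
        # reparameterization trick: z = mu + sigma * eps, so gradients can flow through the sampling step
        eps = torch.randn_like(mu)
        z = mu + torch.exp(0.5 * logvar) * eps
        x_hat = torch.sigmoid(self.dec2(F.relu(self.dec1(z))))
        return x_hat, mu, logvar

def vae_loss(x, x_hat, mu, logvar):
    # reconstruction term plus a KL term that pulls Q(Z|X) toward a standard Gaussian
    recon = F.binary_cross_entropy(x_hat, x, reduction="sum")
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return recon + kl

# toy usage on a random batch of 16 "images"
x = torch.rand(16, 784)
model = VAE()
x_hat, mu, logvar = model(x)
loss = vae_loss(x, x_hat, mu, logvar)

The KL term is what makes the latent space something we can actually sample from: it keeps Q(Z|X) close to a standard Gaussian, so new samples can be generated by decoding z drawn from N(0, I).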