[Day 67] Build a LLM from scratch chapter 3 - self-attention from scratch
Hello :) Today is Day 67! A quick summary of today: covered chapter 3 of Build a LLM from Scratch by Sebastian Raschka. Below are just notes with a general overview; the bigger part, with all the code implementations developing each self-attention piece from scratch, is on my GitHub repo. I uploaded yesterday's chapter's content as well.

In this chapter, the attention mechanism is explored in depth. Attention can be split into 4 parts that build on each other.

3.1 The problem with modeling long sequences

Imagine we want to develop a machine translation model. In different languages the word order can be different (even when the sentences convey the same meaning), so when translating we want the model to be able to look at words that appear earlier or later in the sentence, rather than translating word by word, which can produce incorrect results. Before Transformers there were RNNs, but their problem was that before the information is passed to the decoder, it all has to be compressed into a single hidden state, which becomes a bottleneck for long sequences.
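To give a flavor of where the chapter goes from here, below is a minimal sketch of the first of those 4 parts: simplified self-attention without trainable weights. This is not the book's code; it assumes PyTorch, and the toy 3-dimensional token embeddings and variable names are placeholders for illustration. The full, step-by-step implementations are in my repo.

```python
import torch

# Toy embeddings for a 6-token input, 3 dimensions each (placeholder values).
inputs = torch.tensor(
    [[0.4, 0.1, 0.8],
     [0.5, 0.9, 0.6],
     [0.6, 0.8, 0.6],
     [0.2, 0.6, 0.3],
     [0.8, 0.3, 0.1],
     [0.1, 0.8, 0.5]]
)

# Attention scores: dot product of every token embedding with every other one.
attn_scores = inputs @ inputs.T              # shape (6, 6)

# Attention weights: softmax over each row so the weights sum to 1.
attn_weights = torch.softmax(attn_scores, dim=-1)

# Context vectors: each token's output is a weighted sum of all embeddings.
context_vecs = attn_weights @ inputs         # shape (6, 3)
print(context_vecs)
```

The key idea: unlike an RNN's single hidden state, every token gets its own context vector that can draw on every other position directly. The later parts of the chapter add trainable query/key/value projections, causal masking, and multiple heads on top of this.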