
[Day 67] Build a LLM from scratch chapter 3 - self-attention from scratch

 Hello :) Today is Day 67! A quick summary of today: I covered chapter 3 of Build a LLM from Scratch by Sebastian Raschka. Below are notes giving a general overview; the bigger part, with all the code implementations developing each self-attention piece from scratch, is on my github repo. I uploaded yesterday's chapter's content as well.

In this chapter, the attention mechanism is explored in depth. Attention can be split into 4 parts that build on each other.

3.1 The problem with modeling long sequences

Imagine we want to develop a machine translation model. Word order often differs between languages, even when they use the same sentence components (subject, verb, object). When translating, we want the model to see words that appear earlier and later in the sentence, rather than translating word by word, because that can produce incorrect output. Before Transformers there were RNNs, but the problem with them was that before the info is passed to the decoder, all of it has to be compressed into a single hidden state, which becomes a bottleneck for long sequences.
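To make the self-attention idea concrete, here is a minimal sketch of the simplified self-attention computation (the version without trainable weights that the chapter builds on), assuming PyTorch; the random inputs tensor is a hypothetical placeholder for token embeddings, not data from the book:

```python
import torch

torch.manual_seed(123)
# Hypothetical toy input: 6 tokens, each a 3-dimensional embedding
inputs = torch.rand(6, 3)

# Step 1: attention scores -- dot product of each embedding with every other
attn_scores = inputs @ inputs.T               # shape (6, 6)

# Step 2: normalize each row with softmax so each token's weights sum to 1
attn_weights = torch.softmax(attn_scores, dim=-1)

# Step 3: context vectors -- each row is a weighted sum of all embeddings
context_vecs = attn_weights @ inputs          # shape (6, 3)
print(context_vecs.shape)  # torch.Size([6, 3])
```

Each context vector mixes information from every token in the sequence, weighted by how strongly that token attends to the others; the trainable-weight, causal, and multi-head variants build on exactly this pattern.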

[Day 66] Starting Build a LLM from scratch by Sebastian Raschka

 Hello :) Today is Day 66! A quick summary of today: I covered the chapters below from the title book:

Chapter 1: Understanding LLMs
Chapter 2: Working with text data

I am in a Discord server called DS/ML book club, and one of the organisers is Sophia Yang from MistralAI. I saw she posted about the server reading this book, Build a Large Language Model (From Scratch) by Sebastian Raschka (it is still being written, but half of the chapters are already available for purchase). Given I just completed a very comprehensive NLP course by Stanford, this felt like a good next step to test my knowledge and understanding of language models and how they work.

So, below is an intro to the book, a quick summary of chapter 1, and then a bit more depth on chapter 2, which is perhaps the more interesting part (available upon purchase).

The book goes over the life of an LLM. The content follows the diagram below, though not all chapters are written yet:

Understanding LLMs
Working with text data
Coding attention mechanisms
...
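Since chapter 2 is about working with text data, here is a minimal sketch of the tokenization step that such preprocessing starts from. The choice of the tiktoken library with its GPT-2 byte-pair encoding is my assumption for illustration, not a claim about the post's own code:

```python
import tiktoken  # pip install tiktoken

# GPT-2's byte-pair-encoding tokenizer
tokenizer = tiktoken.get_encoding("gpt2")

text = "Hello, do you like tea?"

# Text -> token IDs (integers indexing into the vocabulary)
token_ids = tokenizer.encode(text)
print(token_ids)                    # e.g. [15496, 11, 466, 345, 588, 8887, 30]

# Token IDs -> text (the encoding round-trips losslessly)
print(tokenizer.decode(token_ids))  # "Hello, do you like tea?"
```

These integer IDs are what the embedding layer of an LLM consumes, which is why tokenization sits at the very start of the pipeline the book walks through.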