[Day 68] Build a LLM from Scratch chapter 4 - implementing the GPT-2 architecture

Hello :) Today is Day 68! A quick summary of today: I covered chapter 4 of Build a LLM from Scratch by Sebastian Raschka.

Below is an overview of the content without much code; the full code for every step is on this GitHub repo. This chapter is the 3rd and final step of stage 1 on the way to an LLM.

4.1 Coding an LLM architecture

The book builds the smallest version of GPT-2, the one with 124M parameters, starting from a short config (a code sketch of the config is included at the end of this post). The final architecture is a combination of a few building blocks, presented in the sections below. After creating and initializing the model and the GPT-2 tokenizer, we run it on a batch of 2 sentences, 'Every effort moves you' and 'Every day holds a', and the output is one vocabulary-sized vector of logits per input token.

4.2 Normalizing activations with layer normalization

Taking an example without layer norm, the mean and variance of the activations differ arbitrarily from row to row. Applying layer norm gives the data zero mean and unit variance. We can then put the code in a proper class to be reused later in the GPT model (both steps are sketched at the end of this post).

4.3 Implementing a feed forward network with GELU activation
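The chapter's feed forward block uses GELU instead of ReLU. Below is a minimal sketch, assuming the tanh approximation of GELU used for GPT-2 and a 4x expansion of the embedding dimension, both as I remember them from the book; the exact code is in the repo.

```python
import torch
import torch.nn as nn

class GELU(nn.Module):
    """Tanh approximation of GELU, as used for GPT-2."""
    def forward(self, x):
        return 0.5 * x * (1 + torch.tanh(
            torch.sqrt(torch.tensor(2.0 / torch.pi)) *
            (x + 0.044715 * torch.pow(x, 3))
        ))

class FeedForward(nn.Module):
    """Expand to 4x the embedding dim, apply GELU, project back."""
    def __init__(self, cfg):
        super().__init__()
        self.layers = nn.Sequential(
            nn.Linear(cfg["emb_dim"], 4 * cfg["emb_dim"]),
            GELU(),
            nn.Linear(4 * cfg["emb_dim"], cfg["emb_dim"]),
        )

    def forward(self, x):
        return self.layers(x)
```

With the 4.1 config (sketched next), FeedForward(GPT_CONFIG_124M) maps a (batch, num_tokens, 768) tensor to the same shape, so it can sit inside each transformer block.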
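For reference, here are sketches for the earlier steps as well. For 4.1, the config and the tokenized input batch, assuming the 124M GPT-2 settings as given in the book (the GPT_CONFIG_124M name follows the book's convention):

```python
import tiktoken
import torch

# Config of the smallest, 124M-parameter GPT-2 (values as in the book)
GPT_CONFIG_124M = {
    "vocab_size": 50257,     # size of the GPT-2 BPE vocabulary
    "context_length": 1024,  # maximum number of input tokens
    "emb_dim": 768,          # embedding dimension
    "n_heads": 12,           # attention heads per transformer block
    "n_layers": 12,          # number of transformer blocks
    "drop_rate": 0.1,        # dropout rate
    "qkv_bias": False,       # no bias in the query/key/value projections
}

# The 2-sentence batch from the chapter; each sentence is 4 BPE tokens
tokenizer = tiktoken.get_encoding("gpt2")
batch = torch.stack([
    torch.tensor(tokenizer.encode("Every effort moves you")),
    torch.tensor(tokenizer.encode("Every day holds a")),
])
print(batch.shape)  # torch.Size([2, 4])
```

Running this batch through the chapter's GPTModel yields a logits tensor of shape (2, 4, 50257): one score per vocabulary entry for each of the 4 token positions in both sentences.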
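For 4.2, the before/after demo: a toy layer's outputs have arbitrary mean and variance until we normalize them by hand. The toy dimensions and seed here are as I recall them from the book, so treat this as a sketch:

```python
import torch
import torch.nn as nn

torch.manual_seed(123)

# A toy batch of 2 examples with 5 features, pushed through a small layer
batch_example = torch.randn(2, 5)
layer = nn.Sequential(nn.Linear(5, 6), nn.ReLU())
out = layer(batch_example)

# Without layer norm, the per-row mean and variance are arbitrary
mean = out.mean(dim=-1, keepdim=True)
var = out.var(dim=-1, keepdim=True)
print(mean, var)

# Normalize each row: subtract its mean, divide by its standard deviation
out_norm = (out - mean) / torch.sqrt(var)
print(out_norm.mean(dim=-1, keepdim=True))  # ~0
print(out_norm.var(dim=-1, keepdim=True))   # ~1
```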
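And the same logic packaged into a proper class for the GPT model, with a learnable scale and shift, mirroring the book's LayerNorm module:

```python
import torch
import torch.nn as nn

class LayerNorm(nn.Module):
    def __init__(self, emb_dim):
        super().__init__()
        self.eps = 1e-5  # keeps the division numerically stable
        self.scale = nn.Parameter(torch.ones(emb_dim))   # learnable gain
        self.shift = nn.Parameter(torch.zeros(emb_dim))  # learnable bias

    def forward(self, x):
        mean = x.mean(dim=-1, keepdim=True)
        var = x.var(dim=-1, keepdim=True, unbiased=False)
        norm_x = (x - mean) / torch.sqrt(var + self.eps)
        return self.scale * norm_x + self.shift
```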