# Chapter 3: Coding Attention Mechanisms

## Main Chapter Code

- [01_main-chapter-code](01_main-chapter-code) contains the main chapter code.

## Bonus Materials

- [02_bonus_efficient-multihead-attention](02_bonus_efficient-multihead-attention) implements and compares different implementation variants of multi-head attention (a minimal sketch of one such variant follows below)
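For a quick taste of what such a variant can look like before opening the bonus notebook, here is a minimal sketch of one common implementation style: a single fused QKV projection split into heads by reshaping, instead of a Python loop over separate single-head modules. This is a self-contained illustration with placeholder names, not the bonus folder's exact code.

```python
import torch
import torch.nn as nn

class MultiHeadAttention(nn.Module):
    """Weight-split multi-head attention: one large QKV projection that is
    reshaped into heads, rather than a loop over separate single-head modules."""
    def __init__(self, d_in, d_out, num_heads, dropout=0.0, qkv_bias=False):
        super().__init__()
        assert d_out % num_heads == 0, "d_out must be divisible by num_heads"
        self.num_heads = num_heads
        self.head_dim = d_out // num_heads
        self.qkv = nn.Linear(d_in, 3 * d_out, bias=qkv_bias)
        self.proj = nn.Linear(d_out, d_out)
        self.dropout = nn.Dropout(dropout)

    def forward(self, x):
        b, num_tokens, _ = x.shape
        # Project once, then split into queries, keys, and values
        qkv = self.qkv(x).view(b, num_tokens, 3, self.num_heads, self.head_dim)
        q, k, v = qkv.permute(2, 0, 3, 1, 4)  # each: (b, heads, tokens, head_dim)

        scores = q @ k.transpose(-2, -1) / self.head_dim**0.5
        # Causal mask: each token attends only to itself and earlier positions
        mask = torch.triu(torch.ones(num_tokens, num_tokens, dtype=torch.bool,
                                     device=x.device), diagonal=1)
        weights = torch.softmax(scores.masked_fill(mask, float("-inf")), dim=-1)
        weights = self.dropout(weights)

        context = (weights @ v).transpose(1, 2).reshape(b, num_tokens, -1)
        return self.proj(context)

torch.manual_seed(123)
mha = MultiHeadAttention(d_in=768, d_out=768, num_heads=12)
print(mha(torch.randn(2, 8, 768)).shape)  # torch.Size([2, 8, 768])
```

Another variant worth comparing is PyTorch's built-in `torch.nn.functional.scaled_dot_product_attention` (PyTorch 2.0+), which can dispatch to fused FlashAttention-style kernels.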
# Chapter 4: Implementing a GPT Model from Scratch to Generate Text

## Main Chapter Code

- [01_main-chapter-code](01_main-chapter-code) contains the main chapter code.
## Bonus Materials

- [02_performance-analysis](02_performance-analysis) contains optional code analyzing the performance of the GPT model(s) implemented in the main chapter (a minimal timing sketch follows after this list)
- [ch05/07_gpt_to_llama](../ch05/07_gpt_to_llama) contains a step-by-step guide for converting a GPT architecture implementation to Llama 3.2 and loading pretrained weights from Meta AI (it might be interesting to look at alternative architectures after completing chapter 4, but you can also save that for after reading chapter 5; an RMSNorm sketch also follows below)
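As a taste of what such an analysis involves, here is a minimal, hypothetical sketch of two basic measurements, parameter count and average forward-pass latency. The `nn.Sequential` stand-in is a placeholder; the bonus folder analyzes the actual GPT model from the main chapter code.

```python
import time
import torch
import torch.nn as nn

def count_parameters(model: nn.Module) -> int:
    """Total number of trainable parameters."""
    return sum(p.numel() for p in model.parameters() if p.requires_grad)

@torch.no_grad()
def avg_forward_time(model: nn.Module, batch: torch.Tensor, n_iters: int = 10) -> float:
    """Average forward-pass time in seconds (CPU timing; add
    torch.cuda.synchronize() before and after the loop when timing on a GPU)."""
    model.eval()
    model(batch)  # warm-up run
    start = time.perf_counter()
    for _ in range(n_iters):
        model(batch)
    return (time.perf_counter() - start) / n_iters

# Stand-in model; substitute the GPT model from the main chapter code
model = nn.Sequential(nn.Linear(768, 3072), nn.GELU(), nn.Linear(3072, 768))
batch = torch.randn(8, 256, 768)
print(f"{count_parameters(model):,} parameters")
print(f"{avg_forward_time(model, batch) * 1000:.1f} ms per forward pass")
```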
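One of the architecture changes involved in such a conversion is replacing GPT-2's LayerNorm with the RMSNorm used by Llama models. Below is a minimal RMSNorm sketch for flavor; the step-by-step guide in the bonus folder covers this and the other changes (such as rotary position embeddings and SwiGLU) in full.

```python
import torch
import torch.nn as nn

class RMSNorm(nn.Module):
    """Root-mean-square normalization: rescales by the RMS of the activations
    and, unlike LayerNorm, has no mean-centering and no bias term."""
    def __init__(self, emb_dim: int, eps: float = 1e-5):
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(emb_dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        rms = torch.sqrt(x.pow(2).mean(dim=-1, keepdim=True) + self.eps)
        return self.weight * x / rms

x = torch.randn(2, 4, 768)
print(RMSNorm(768)(x).shape)  # torch.Size([2, 4, 768])
```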
# Chapter 5: Pretraining on Unlabeled Data

## Main Chapter Code

- [01_main-chapter-code](01_main-chapter-code) contains the main chapter code

## Bonus Materials

- [02_alternative_weight_loading](02_alternative_weight_loading) contains code to load the GPT model weights from alternative sources in case the model weights become unavailable from OpenAI (a minimal sketch follows below)
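As one illustration of what an alternative source can look like, the sketch below pulls the GPT-2 (124M) weights from the Hugging Face Hub via the `transformers` package and inspects the resulting state dict. This is an assumption about one possible source, not necessarily the exact approach the bonus notebook takes.

```python
# Requires: pip install transformers
from transformers import GPT2Model

# Download GPT-2 (124M) weights from the Hugging Face Hub instead of OpenAI
hf_model = GPT2Model.from_pretrained("gpt2")
state_dict = hf_model.state_dict()

# Inspect a few tensors; these would then be copied into the from-scratch
# GPT implementation, transposing where the two implementations lay out
# linear-layer weights differently
for name, tensor in list(state_dict.items())[:5]:
    print(name, tuple(tensor.shape))
```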
# Chapter 6: Finetuning for Classification

## Main Chapter Code

- [01_main-chapter-code](01_main-chapter-code) contains the main chapter code

## Bonus Materials

- [02_bonus_additional-experiments](02_bonus_additional-experiments) includes additional experiments (e.g., training on the last vs. the first token, extending the input length, etc.; a minimal illustration of the token-choice experiment follows below)
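To make the "last vs. first token" experiment concrete: with a causal attention mask, only the last token position has attended to the entire input, so its output is the natural choice for feeding a classification head, whereas the first token has seen only itself. A minimal, hypothetical illustration:

```python
import torch
import torch.nn.functional as F

# Toy stand-in: pretend these logits came from a classification head applied
# to every token position of a finetuned GPT model
batch_size, num_tokens, num_classes = 4, 12, 2
logits = torch.randn(batch_size, num_tokens, num_classes)

# "Last token" strategy: the final position has attended to the whole input
last_token_logits = logits[:, -1, :]   # shape: (batch_size, num_classes)

# "First token" strategy (one of the bonus experiments): the first position
# attends only to itself under a causal mask
first_token_logits = logits[:, 0, :]

labels = torch.tensor([0, 1, 1, 0])
print(F.cross_entropy(last_token_logits, labels).item())
```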