
# Build a Large Language Model (From Scratch)

This repository contains the code for developing, pretraining, and finetuning a GPT-like LLM and is the official code repository for the book Build a Large Language Model (From Scratch).




In Build a Large Language Model (From Scratch), you'll learn and understand how large language models (LLMs) work from the inside out by coding them from the ground up, step by step. In this book, I'll guide you through creating your own LLM, explaining each stage with clear text, diagrams, and examples.

The method described in this book for training and developing your own small-but-functional model for educational purposes mirrors the approach used in creating large-scale foundational models such as those behind ChatGPT. In addition, this book includes code for loading the weights of larger pretrained models for finetuning.
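For example, chapter 5 shows how to download the openly released GPT-2 weights and load them into the from-scratch model. The snippet below is a minimal sketch of that workflow; the module name `gpt_download` and the `download_and_load_gpt2` signature follow the chapter 5 helper script and are assumptions that may differ in other versions of this repository.

```python
# Sketch of loading the openly released GPT-2 weights for finetuning.
# Assumes the chapter 5 helper script gpt_download.py is on the path;
# the helper name and signature follow the ch05 code and may vary.
from gpt_download import download_and_load_gpt2

# Download the 124M-parameter GPT-2 checkpoint and return its
# hyperparameter settings plus the raw weight tensors.
settings, params = download_and_load_gpt2(model_size="124M", models_dir="gpt2")

print("Settings:", settings)                   # e.g., number of layers and heads
print("Parameter keys:", list(params.keys()))  # tensors to copy into the GPT model
```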



To download a copy of this repository, click on the Download ZIP button or execute the following command in your terminal:

```bash
git clone --depth 1 https://github.com/rasbt/LLMs-from-scratch.git
```


(If you downloaded the code bundle from the Manning website, please consider visiting the official code repository on GitHub at https://github.com/rasbt/LLMs-from-scratch for the latest updates.)



## Table of Contents

Please note that this README.md file is a Markdown (.md) file. If you have downloaded this code bundle from the Manning website and are viewing it on your local computer, I recommend using a Markdown editor or previewer for proper viewing. If you haven't installed a Markdown editor yet, MarkText is a good free option.

You can alternatively view this and other files on GitHub at https://github.com/rasbt/LLMs-from-scratch in your browser, which renders Markdown automatically.



Tip: If you're seeking guidance on installing Python and Python packages and setting up your code environment, I suggest reading the README.md file located in the setup directory.
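Once everything is installed, you can run a quick sanity check from Python to confirm that the main dependencies are importable. The snippet below is only a small stand-in for the more thorough checks in the setup directory, and the package list is an assumption based on the book's core requirements.

```python
# Quick environment sanity check (a stand-in for the setup directory's
# own scripts; the package list is an assumption based on the book's
# core requirements, not an authoritative dependency list).
from importlib.metadata import version, PackageNotFoundError

for pkg in ("torch", "tiktoken", "numpy", "matplotlib"):
    try:
        print(f"{pkg:<12} {version(pkg)}")
    except PackageNotFoundError:
        print(f"{pkg:<12} not installed")
```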





| Chapter Title | Main Code (for Quick Access) | All Code + Supplementary |
|---------------|------------------------------|--------------------------|
| Setup recommendations | - | - |
| Ch 1: Understanding Large Language Models | No code | - |
| Ch 2: Working with Text Data | - ch02.ipynb<br>- dataloader.ipynb (summary)<br>- exercise-solutions.ipynb | ./ch02 |
| Ch 3: Coding Attention Mechanisms | - ch03.ipynb<br>- multihead-attention.ipynb (summary)<br>- exercise-solutions.ipynb | ./ch03 |
| Ch 4: Implementing a GPT Model from Scratch | - ch04.ipynb<br>- gpt.py (summary)<br>- exercise-solutions.ipynb | ./ch04 |
| Ch 5: Pretraining on Unlabeled Data | - ch05.ipynb<br>- gpt_train.py (summary)<br>- gpt_generate.py (summary)<br>- exercise-solutions.ipynb | ./ch05 |
| Ch 6: Finetuning for Text Classification | - ch06.ipynb<br>- gpt_class_finetune.py<br>- exercise-solutions.ipynb | ./ch06 |
| Ch 7: Finetuning to Follow Instructions | - ch07.ipynb<br>- gpt_instruction_finetuning.py (summary)<br>- ollama_evaluate.py (summary)<br>- exercise-solutions.ipynb | ./ch07 |
| Appendix A: Introduction to PyTorch | - code-part1.ipynb<br>- code-part2.ipynb<br>- DDP-script.py<br>- exercise-solutions.ipynb | ./appendix-A |
| Appendix B: References and Further Reading | No code | - |
| Appendix C: Exercise Solutions | No code | - |
| Appendix D: Adding Bells and Whistles to the Training Loop | - appendix-D.ipynb | ./appendix-D |
| Appendix E: Parameter-efficient Finetuning with LoRA | - appendix-E.ipynb | ./appendix-E |


 

The mental model below summarizes the contents covered in this book.


 

## Hardware Requirements

The code in the main chapters of this book is designed to run on conventional laptops within a reasonable timeframe and does not require specialized hardware. This approach ensures that a wide audience can engage with the material. Additionally, the code automatically utilizes GPUs if they are available. (Please see the setup doc for additional recommendations.)
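Concretely, the automatic GPU usage comes down to the standard PyTorch device-selection idiom sketched below; this is a minimal illustration of the pattern rather than a verbatim excerpt from the chapters.

```python
import torch

# Standard PyTorch idiom: use a CUDA GPU when available, else fall back to the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Any nn.Module can then be moved to the selected device;
# a toy layer stands in for the book's GPT model here.
model = torch.nn.Linear(4, 2).to(device)
print(f"Using device: {device}")
```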

 

## Exercises

Each chapter of the book includes several exercises. The solutions are summarized in Appendix C, and the corresponding code notebooks are available in the main chapter folders of this repository (for example, ./ch02/01_main-chapter-code/exercise-solutions.ipynb).

In addition to the code exercises, you can download a free 170-page PDF titled Test Yourself On Build a Large Language Model (From Scratch) from the Manning website. It contains approximately 30 quiz questions and solutions per chapter to help you test your understanding.

 

## Bonus Material

Several folders in this repository contain optional materials as a bonus for interested readers.


 

## Questions, Feedback, and Contributing to This Repository

I welcome all sorts of feedback, best shared via the Manning Forum or GitHub Discussions. Likewise, if you have any questions or just want to bounce ideas off others, please don't hesitate to post them in the forum as well.

Please note that since this repository contains the code corresponding to a print book, I currently cannot accept contributions that would extend the contents of the main chapter code, as it would introduce deviations from the physical book. Keeping it consistent helps ensure a smooth experience for everyone.

 

## Citation

If you find this book or code useful for your research, please consider citing it.

Chicago-style citation:

Raschka, Sebastian. Build A Large Language Model (From Scratch). Manning, 2024. ISBN: 978-1633437166.

BibTeX entry:

```bibtex
@book{build-llms-from-scratch-book,
  author       = {Sebastian Raschka},
  title        = {Build A Large Language Model (From Scratch)},
  publisher    = {Manning},
  year         = {2024},
  isbn         = {978-1633437166},
  url          = {https://www.manning.com/books/build-a-large-language-model-from-scratch},
  github       = {https://github.com/rasbt/LLMs-from-scratch}
}
```