Chapter 2: Working with Text Data


Main Chapter Code

  • 01_main-chapter-code contains the main chapter code.


Bonus Materials

  • 02_bonus_bytepair-encoder contains optional (bonus) code to benchmark different byte pair encoder implementations.

  • 03_bonus_embedding-vs-matmul contains optional (bonus) code to explain that embedding layers and fully connected layers applied to one-hot encoded vectors are equivalent.

  • 04_bonus_dataloader-intuition contains optional (bonus) code to explain the data loader more intuitively with simple numbers rather than text.

  • 05_bpe-from-scratch contains (bonus) code that implements and trains a GPT-2 BPE tokenizer from scratch.
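The embedding-vs-matmul equivalence mentioned above can be illustrated in a few lines. The sketch below is a minimal NumPy illustration, not the notebook's actual code; the weight matrix, dimensions, and token IDs are made-up placeholders:

```python
import numpy as np

rng = np.random.default_rng(123)
vocab_size, embed_dim = 5, 3

# An embedding layer is just a weight matrix of shape (vocab_size, embed_dim)
W = rng.standard_normal((vocab_size, embed_dim))

token_ids = np.array([2, 0, 4])

# Embedding lookup: select the weight row for each token ID
out_lookup = W[token_ids]

# Fully connected view: one-hot encode the IDs, then matrix-multiply
onehot = np.eye(vocab_size)[token_ids]  # shape (3, vocab_size)
out_matmul = onehot @ W                 # shape (3, embed_dim)

# Both paths produce identical results
assert np.allclose(out_lookup, out_matmul)
```

In practice the lookup is preferred because it avoids materializing the one-hot matrix and the full matrix multiplication.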

In the video below, I provide a code-along session that covers some of the chapter contents as supplementary material.



Link to the video