Here you go:
1. Transformers
- Visual intro to Transformers (3b1b) [YouTube]
- nanoGPT & tokenization (Karpathy) [YouTube]
- Decoding strategies in LLMs (Maxime Labonne) [GitHub]
2. Pre-training
- Distributed training techniques (Duan et al.) [Paper]
- nanotron: Minimal training framework (Hugging Face) [GitHub]
- Parallel training overview (Cheny...