- Introduction
- Project setup
- Model configuration
- Feed-forward network
- Causal masking
- Multi-head attention
- Layer normalization
- Transformer block
- Stacking transformer blocks
- Language model head
- Encode and decode tokens
- Text generation
- Load weights and run model