1. Introduction
2. Project setup
3. Model configuration
4. Feed-forward network
5. Causal masking
6. Multi-head attention
7. Layer normalization
8. Transformer block
9. Stacking transformer blocks
10. Language model head
11. Encode and decode tokens
12. Text generation
13. Load weights and run model