1. Introduction
2. Setup
3. Model configuration
4. Causal masking
5. Layer normalization
6. Feed-forward network
7. Token embeddings
8. Position embeddings
9. Multi-head attention
10. Residual connections and layer normalization
11. Transformer block
12. Stacking transformer blocks
13. Language model head
14. Text generation