1. Introduction
2. Project setup
3. Run the model
4. Model configuration
5. Feed-forward network
6. Causal masking
7. Multi-head attention
8. Layer normalization
9. Transformer block
10. Stack transformer blocks
11. Language model head
12. Weight adaptation
13. KV cache configuration
14. Pipeline model
15. Architecture registration