- Introduction
- Project setup
- Run the model
- Model configuration
- Feed-forward network
- Causal masking
- Multi-head attention
- Layer normalization
- Transformer block
- Stack transformer blocks
- Language model head
- Weight adaptation
- KV cache configuration
- Pipeline model
- Architecture registration