- Introduction
- Setup
- Model configuration
- Causal masking
- Layer normalization
- Feed-forward network
- Token embeddings
- Position embeddings
- Multi-head attention
- Residual connections and layer normalization
- Transformer block
- Stacking transformer blocks
- Language model head
- Text generation