Architecture
Layer Normalization
Quick Answer
A normalization technique that stabilizes training by normalizing activations across features.
Layer normalization standardizes the activations of each example to zero mean and unit variance across the feature dimension, then applies a learned per-feature scale (gamma) and shift (beta). Unlike batch normalization, which normalizes each feature across the examples in a batch, layer norm operates independently on each sample, so it behaves the same at training and inference time and works with any batch size. It is standard in transformers, applied either before (pre-norm) or after (post-norm) the attention and feed-forward sublayers; pre-norm placement generally trains more stably in deep stacks. Without normalization, activation scales in deep models can drift layer by layer and destabilize training.
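The computation above can be sketched in a few lines of NumPy. This is a minimal illustration, not a production implementation; the function name `layer_norm` and the toy inputs are chosen for this example:

```python
import numpy as np

def layer_norm(x, gamma, beta, eps=1e-5):
    """Normalize each example across its feature dimension (last axis),
    then apply the learned scale (gamma) and shift (beta)."""
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    x_hat = (x - mean) / np.sqrt(var + eps)  # zero mean, unit variance per example
    return gamma * x_hat + beta

# Two examples with very different scales
x = np.array([[1.0, 2.0, 3.0, 4.0],
              [10.0, 20.0, 30.0, 40.0]])
gamma = np.ones(4)   # identity scale
beta = np.zeros(4)   # zero shift
y = layer_norm(x, gamma, beta)
# Each row of y now has ~zero mean and ~unit variance,
# regardless of the input row's original scale.
```

Note that the statistics are computed per row (per example), never across the batch axis; that per-sample independence is what distinguishes layer norm from batch norm.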
Last verified: 2026-04-08