Architecture

Layer Normalization

Quick Answer

A normalization technique that stabilizes training by normalizing activations across features.

Layer normalization standardizes the activations of each example across the feature dimension: it subtracts the per-example mean, divides by the per-example standard deviation, and typically applies a learned scale and shift. Unlike batch normalization, which normalizes each feature across the examples in a batch, layer norm operates independently on each sample, so it behaves the same at training and inference time and works with any batch size. It is standard in transformers, applied either before (pre-norm) or after (post-norm) the attention and feed-forward sublayers; pre-norm architectures are generally more stable to train at depth. Normalization is essential for deep models: without it, training tends to become unstable.
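The computation described above can be sketched in a few lines of NumPy. This is a minimal illustration, not a production implementation: `gamma` and `beta` stand in for the learned scale and shift parameters, and `eps` is the usual small constant added for numerical stability.

```python
import numpy as np

def layer_norm(x, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize each example (row) across its feature dimension."""
    mean = x.mean(axis=-1, keepdims=True)      # per-example mean
    var = x.var(axis=-1, keepdims=True)        # per-example variance
    x_hat = (x - mean) / np.sqrt(var + eps)    # standardize features
    return gamma * x_hat + beta                # learned scale and shift

# Two examples with very different scales:
x = np.array([[1.0, 2.0, 3.0],
              [10.0, 20.0, 30.0]])
y = layer_norm(x)
# Each row now has roughly zero mean and unit variance,
# regardless of the row's original scale.
```

Note that the statistics are computed per row (per example), never across the batch axis, which is exactly what distinguishes layer norm from batch norm.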

Last verified: 2026-04-08
