Architecture

Feed-Forward Network

Quick Answer

A layer of dense transformations between attention layers in transformers.

Feed-forward networks (FFNs) in transformers are applied independently at each token position. Each FFN typically consists of two linear layers with a nonlinear activation (GELU or ReLU) between them: the first layer expands the representation to a higher inner dimension (commonly 4x the model dimension), and the second projects it back down. FFNs hold the majority of a transformer's parameters and are crucial for model capacity. Unlike attention, which mixes information across token positions, the FFN transforms each token on its own. Design choices such as the expansion factor and the activation function affect both model quality and efficiency.
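The computation above can be sketched in a few lines of numpy. This is a minimal illustration, not any particular model's implementation: the dimensions (512 model width, 2048 inner width for the 4x expansion), the tanh approximation of GELU, and the random weights are all assumptions chosen for the example.

```python
import numpy as np

def gelu(x):
    # tanh approximation of the GELU activation
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))

# Assumed example sizes: model dimension 512, inner dimension 4 * 512 = 2048.
d_model, d_ff = 512, 2048
rng = np.random.default_rng(0)
W1, b1 = rng.normal(0, 0.02, (d_model, d_ff)), np.zeros(d_ff)   # expand
W2, b2 = rng.normal(0, 0.02, (d_ff, d_model)), np.zeros(d_model)  # project back

def ffn(x):
    # x: (seq_len, d_model). The same weights are applied to every row,
    # i.e. each token position is transformed independently.
    return gelu(x @ W1 + b1) @ W2 + b2

tokens = rng.normal(size=(10, d_model))  # 10 token embeddings
out = ffn(tokens)
print(out.shape)            # output keeps the model dimension: (10, 512)
print(W1.size + W2.size)    # weight count: 2 * d_model * d_ff = 2,097,152
```

Note how the parameter count scales as 2 * d_model * d_ff (plus biases); this is why the FFN, not attention, dominates the parameter budget at the 4x expansion factor.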

Last verified: 2026-04-08
