Cross-Entropy Loss

Quick Answer

A loss function that measures the difference between a model's predicted probability distribution and the true distribution.

Cross-entropy loss is the standard training objective for LLMs. At each token position, it measures how much probability the model's predicted distribution assigns to the true next token: the loss is the negative log-probability of the correct token, so lower loss means better predictions. Because cross-entropy is differentiable, gradients can flow through it via backpropagation, and minimizing it trains the model to maximize the probability of correct continuations. Token-level losses are summed or averaged across positions to produce the batch loss. Understanding cross-entropy is fundamental to understanding how LLMs are trained.
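The per-position computation described above can be sketched in plain Python. This is a minimal illustration, not a production implementation: the toy logits, vocabulary size, and function names are all invented for the example, and real frameworks compute this far more efficiently from raw logits.

```python
import math

def softmax(logits):
    # Numerically stable softmax: subtract the max before exponentiating.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(logits_per_position, target_ids):
    # Token-level cross-entropy: -log p(correct token) at each position,
    # averaged across positions to give the sequence loss.
    losses = []
    for logits, target in zip(logits_per_position, target_ids):
        probs = softmax(logits)
        losses.append(-math.log(probs[target]))
    return sum(losses) / len(losses)

# Toy example: a 3-token vocabulary and two positions. In both positions
# the model puts most probability on the correct token, so loss is low.
logits = [[2.0, 0.5, -1.0], [0.1, 3.0, 0.2]]
targets = [0, 1]  # correct token id at each position
loss = cross_entropy(logits, targets)
```

Note that a perfect prediction (probability 1 on the correct token) gives zero loss, while assigning low probability to the correct token makes the loss grow without bound.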

Last verified: 2026-04-08