Evaluation

Out-of-Distribution

Quick Answer

Data or scenarios that differ from the training distribution, used to test how well a model generalizes.

Out-of-distribution (OOD) data comes from a distribution different from the one a model was trained on. Because models tend to overfit to their training distribution, performance often degrades on OOD inputs, so OOD evaluation is a direct test of robustness and reveals where a model's capabilities end. Distribution shift can be natural (e.g., new domains, time periods, or user populations) or adversarial (inputs crafted to exploit the model). Building robust models requires accounting for OOD behavior, and OOD robustness has become an increasingly important evaluation axis.
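The degradation described above can be demonstrated on a toy problem. The sketch below (not from the source; the setup, classifier, and shift amount are illustrative assumptions) trains a nearest-centroid classifier on two synthetic 2-D Gaussian classes, then evaluates it on a held-out in-distribution set and on a shifted (OOD) set:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample(n, shift=0.0):
    """Draw n points per class; `shift` translates both clusters (distribution shift)."""
    x0 = rng.normal(loc=[-1.0 + shift, 0.0], scale=0.5, size=(n, 2))
    x1 = rng.normal(loc=[+1.0 + shift, 0.0], scale=0.5, size=(n, 2))
    return np.vstack([x0, x1]), np.array([0] * n + [1] * n)

# "Train": estimate one centroid per class from in-distribution data.
X_train, y_train = sample(500)
centroids = np.stack([X_train[y_train == c].mean(axis=0) for c in (0, 1)])

def predict(X):
    # Assign each point to its nearest class centroid.
    dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=-1)
    return dists.argmin(axis=1)

def accuracy(X, y):
    return float((predict(X) == y).mean())

X_iid, y_iid = sample(1000)             # same distribution as training
X_ood, y_ood = sample(1000, shift=1.5)  # shifted distribution (OOD)

print(f"in-distribution accuracy: {accuracy(X_iid, y_iid):.2f}")
print(f"OOD accuracy:             {accuracy(X_ood, y_ood):.2f}")
```

The in-distribution accuracy is high, while the same model on the shifted data falls to near chance: the decision boundary learned from the training distribution no longer separates the moved clusters. Real OOD evaluation follows the same pattern, just with held-out benchmarks drawn from a different domain rather than a synthetic shift.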

Last verified: 2026-04-08
