Evaluation

MMLU

Quick Answer

Massive Multitask Language Understanding: a broad benchmark covering 57 academic subjects.

MMLU is a comprehensive knowledge benchmark spanning 57 subjects, including science, history, law, and medicine. Its multiple-choice questions range in difficulty from high-school exams to professional licensing tests, so strong performance requires both broad knowledge and some reasoning. Because scores correlate with general capability, MMLU is widely used to compare models; state-of-the-art models reach roughly 90% accuracy. It has known limitations, chiefly that it rewards knowledge recall more than reasoning, but it remains a standard evaluation metric.
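To make the scoring concrete, here is a minimal sketch of how an MMLU-style harness computes accuracy: each question has four choices labeled A–D, the model picks a letter, and the score is the fraction of correct picks. The questions and the `pick_answer` stub below are hypothetical placeholders, not the real MMLU dataset or a real model call.

```python
# MMLU-style scoring sketch: four-choice questions, accuracy = fraction correct.
# The question set and pick_answer are illustrative stand-ins (assumptions),
# not the actual benchmark data or a real LLM.

questions = [
    {"question": "What is the powerhouse of the cell?",
     "choices": {"A": "Nucleus", "B": "Mitochondrion", "C": "Ribosome", "D": "Golgi body"},
     "answer": "B"},
    {"question": "Which planet is closest to the Sun?",
     "choices": {"A": "Venus", "B": "Earth", "C": "Mercury", "D": "Mars"},
     "answer": "C"},
]

def pick_answer(q):
    # Stand-in for a model call; a real harness would prompt an LLM
    # with the question and choices, then parse the chosen letter.
    return "B" if "cell" in q["question"] else "C"

def accuracy(questions, answer_fn):
    # Score: number of questions where the picked letter matches the key.
    correct = sum(1 for q in questions if answer_fn(q) == q["answer"])
    return correct / len(questions)

print(f"Accuracy: {accuracy(questions, pick_answer):.1%}")
```

Real evaluation harnesses differ mainly in how they elicit the letter (log-probabilities over the four options versus parsing generated text), which can shift reported scores by a few points.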

Last verified: 2026-04-08
