Evaluation
ARC
Quick Answer
AI2 Reasoning Challenge: multiple-choice science questions requiring knowledge and reasoning.
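ARC consists of 7,787 multiple-choice science questions drawn from grade 3 to grade 9 exams. It is split into an Easy set (5,197 questions) and a Challenge set (2,590 questions). The Challenge set contains only questions that both a retrieval-based baseline and a word co-occurrence baseline answered incorrectly, so it tests understanding and applied reasoning rather than lookup or memorization. The subject matter is simpler than MMLU's, yet the Challenge set can still be hard for models that rely on surface cues. Modern models reach roughly 70% or higher on the full set, though scores depend on the model and on the evaluation setup (for example, the number of few-shot examples). ARC is most useful for evaluating grade-school-level scientific reasoning.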
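Because results are usually reported separately for the Easy and Challenge subsets, a scorer typically groups accuracy by subset. The following is a minimal sketch of that idea, not an official ARC evaluation script. The ArcItem schema, the accuracy_by_subset helper, and the toy questions are all hypothetical and written for illustration; they are not taken from the dataset or from any library.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class ArcItem:
    """One ARC-style question (hypothetical schema, for illustration only)."""
    question: str
    choices: dict[str, str]  # choice label -> answer text
    answer_key: str          # label of the correct choice
    subset: str              # "easy" or "challenge"


def accuracy_by_subset(items, predictions):
    """Return {subset: fraction correct}, given one predicted label per item."""
    if len(items) != len(predictions):
        raise ValueError("items and predictions must have the same length")
    correct = defaultdict(int)
    total = defaultdict(int)
    for item, pred in zip(items, predictions):
        total[item.subset] += 1
        if pred == item.answer_key:
            correct[item.subset] += 1
    return {name: correct[name] / total[name] for name in total}


# Toy items invented for this example; they do not come from ARC.
items = [
    ArcItem("Which object gives off its own light?",
            {"A": "the Sun", "B": "the Moon", "C": "a mirror"}, "A", "easy"),
    ArcItem("What is ice?",
            {"A": "a gas", "B": "a solid", "C": "a liquid"}, "B", "easy"),
    ArcItem("A metal spoon in hot soup gets warm. Which process moves the heat?",
            {"A": "conduction", "B": "radiation", "C": "evaporation"}, "A",
            "challenge"),
    ArcItem("Why does a ball thrown upward slow down as it rises?",
            {"A": "friction with the hand", "B": "loss of mass",
             "C": "gravity pulls it downward"}, "C", "challenge"),
]

# Hypothetical model outputs: one predicted label per item, in order.
predictions = ["A", "B", "A", "B"]

scores = accuracy_by_subset(items, predictions)
print(scores)  # {'easy': 1.0, 'challenge': 0.5}
```

Reporting the two subsets separately keeps a strong Easy score from masking weak Challenge performance, since the Easy set is roughly twice the size of the Challenge set.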
Last verified: 2026-04-08