Training

Reinforcement Learning from Human Feedback (RLHF)

Quick Answer

A training method that fine-tunes models on human preference data, going beyond what supervised learning alone can achieve.

RLHF trains models on human preference feedback rather than supervised labels. The process has three stages: (1) collect human preference judgments between pairs of model outputs, (2) train a reward model to predict those preferences, and (3) use reinforcement learning to optimize the model to maximize the predicted reward, typically with a KL penalty that keeps it close to the supervised baseline. RLHF aligns models with human values and improves output quality beyond supervised fine-tuning, but it is computationally expensive and data-intensive. Preference data quality is crucial: biased preference data produces biased models. RLHF is used in most state-of-the-art models, though it is not the only alignment approach; recent alternatives include Direct Preference Optimization (DPO), which optimizes the model directly on preference pairs without a separate reward model. A minimal sketch of the three stages follows below.
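To make the three stages concrete, here is a minimal sketch in PyTorch on toy tensors. It is illustrative, not a production recipe: `reward_model`, `policy`, and `reference` are placeholder modules, the Bradley-Terry pairwise loss stands in for reward-model training, and a simple reward-minus-KL-penalty objective stands in for PPO. Real RLHF pipelines use transformer policies, token-level KL terms, and large human preference datasets.

```python
# Minimal sketch of the three RLHF stages on toy tensors (assumes PyTorch).
# All module names and sizes here are illustrative placeholders.
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
EMB = 16  # toy "response embedding" size standing in for tokenized text

# Stage 1: human preference data — pairs of (chosen, rejected) response features.
chosen = torch.randn(64, EMB)
rejected = torch.randn(64, EMB)

# Stage 2: train a reward model with the Bradley-Terry pairwise loss,
# i.e. maximize log sigmoid(r(chosen) - r(rejected)).
reward_model = nn.Sequential(nn.Linear(EMB, 32), nn.ReLU(), nn.Linear(32, 1))
opt_rm = torch.optim.Adam(reward_model.parameters(), lr=1e-3)
for _ in range(200):
    margin = reward_model(chosen) - reward_model(rejected)
    loss = -F.logsigmoid(margin).mean()
    opt_rm.zero_grad()
    loss.backward()
    opt_rm.step()

# Freeze the reward model before using it as the optimization target.
for p in reward_model.parameters():
    p.requires_grad_(False)

# Stage 3: optimize a policy to maximize predicted reward, with a penalty
# that keeps it near the supervised (reference) model so it does not drift.
policy = nn.Linear(EMB, EMB)     # stand-in for the model being fine-tuned
reference = nn.Linear(EMB, EMB)  # frozen copy of the supervised baseline
reference.load_state_dict(policy.state_dict())
for p in reference.parameters():
    p.requires_grad_(False)

opt_pi = torch.optim.Adam(policy.parameters(), lr=1e-3)
beta = 0.1  # penalty strength (plays the role of the KL coefficient)
prompts = torch.randn(64, EMB)
for _ in range(200):
    response = policy(prompts)
    reward = reward_model(response).mean()
    drift = F.mse_loss(response, reference(prompts))  # crude stand-in for a KL term
    loss = -(reward - beta * drift)
    opt_pi.zero_grad()
    loss.backward()
    opt_pi.step()
```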

Last verified: 2026-04-08
