Inference
Throughput
Quick Answer
The number of tokens generated per unit time, measuring inference speed at scale.
Throughput measures how many tokens a system produces per second, making it the key metric for batch processing and high-volume serving. Higher throughput lets the same hardware handle more concurrent requests, and it depends primarily on hardware utilization and batch size. Common techniques for improving throughput include batching, quantization, and efficient attention implementations. Latency and throughput often trade off against each other: optimizing one can hurt the other, so different applications prioritize them differently (interactive chat favors low latency, offline batch jobs favor high throughput).
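A minimal sketch of how throughput is typically measured: total tokens produced divided by wall-clock time, with prompts processed in batches. The `generate` function and token counts here are toy stand-ins, not any specific library's API.

```python
import time

def measure_throughput(generate, prompts, batch_size):
    """Return tokens/sec for a batched generation function (hypothetical API)."""
    start = time.perf_counter()
    total_tokens = 0
    for i in range(0, len(prompts), batch_size):
        batch = prompts[i:i + batch_size]
        outputs = generate(batch)  # one batched call; returns token lists
        total_tokens += sum(len(tokens) for tokens in outputs)
    elapsed = time.perf_counter() - start
    return total_tokens / elapsed

# Toy generator that emits 8 tokens per prompt, standing in for a real model.
def fake_generate(batch):
    return [["tok"] * 8 for _ in batch]

tps = measure_throughput(fake_generate, ["prompt"] * 32, batch_size=8)
print(f"{tps:.0f} tokens/sec")
```

Increasing `batch_size` usually raises throughput on real accelerators by improving hardware utilization, at the cost of higher per-request latency.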
Last verified: 2026-04-08