LLM Speed Comparison 2026
Compare time to first token (TTFT) and throughput across 25 large language models to find the fastest API for your use case.
Data verified Apr 3, 2026
Metrics shown: time to first token (lower is better) and throughput (higher is better).
All Models — Ranked by Speed
| Model | Provider | TTFT (ms) | Tokens/sec | Input $/M | Arena ELO |
|---|---|---|---|---|---|
| Gemini 2.0 Flash Lite | Google | 100 | 180 | $0.075 | 1200 |
| Phi-4 | Microsoft | 100 | 160 | $0.070 | 1150 |
| Gemini 2.0 Flash | Google | 120 | 160 | $0.100 | 1260 |
| GPT-4.1 Nano | OpenAI | 130 | 150 | $0.100 | 1180 |
| Claude Haiku 4 | Anthropic | 150 | 130 | $0.800 | 1220 |
| Mistral Small | Mistral | 160 | 120 | $0.100 | 1185 |
| GPT-4o Mini | OpenAI | 180 | 120 | $0.150 | 1220 |
| Grok 3 Mini | xAI | 180 | 110 | $0.300 | 1220 |
| GPT-4.1 Mini | OpenAI | 190 | 115 | $0.400 | 1240 |
| Llama 4 Scout | Meta | 200 | 110 | $0.100 | 1250 |
| DeepSeek V3 | DeepSeek | 220 | 85 | $0.270 | 1280 |
| GPT-4o | OpenAI | 230 | 95 | $2.50 | 1260 |
| Qwen 2.5 Max | Alibaba | 240 | 80 | $0.160 | 1260 |
| Llama 4 Maverick | Meta | 250 | 90 | $0.200 | 1290 |
| Command R | Cohere | 250 | 85 | $0.150 | 1140 |
| GPT-4.1 | OpenAI | 260 | 88 | $2.00 | 1290 |
| Mistral Large | Mistral | 280 | 75 | $2.00 | 1245 |
| Grok 3 | xAI | 300 | 80 | $3.00 | 1300 |
| Claude Sonnet 4 | Anthropic | 320 | 78 | $3.00 | 1280 |
| Command R+ | Cohere | 350 | 65 | $2.50 | 1200 |
| Gemini 2.5 Pro | Google | 400 | 70 | $1.25 | 1340 |
| Claude Opus 4 | Anthropic | 500 | 50 | $15.00 | 1330 |
| o4-mini | OpenAI | 1,200 | 60 | $1.10 | 1350 |
| o3-mini | OpenAI | 1,500 | 55 | $1.10 | 1310 |
| DeepSeek R1 | DeepSeek | 1,800 | 45 | $0.550 | 1310 |
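The ranking and per-provider filtering above can be reproduced programmatically. A minimal sketch using a small subset of the table data (the `MODELS` list and `fastest_by_ttft` helper are illustrative, not part of any published API):

```python
# A few rows from the table above: (model, provider, ttft_ms, tokens_per_sec)
MODELS = [
    ("Gemini 2.0 Flash Lite", "Google", 100, 180),
    ("GPT-4.1 Nano", "OpenAI", 130, 150),
    ("Claude Haiku 4", "Anthropic", 150, 130),
    ("Claude Opus 4", "Anthropic", 500, 50),
    ("DeepSeek R1", "DeepSeek", 1800, 45),
]

def fastest_by_ttft(models, provider=None):
    """Return rows sorted by TTFT (ascending), optionally filtered by provider."""
    rows = [m for m in models if provider is None or m[1] == provider]
    return sorted(rows, key=lambda m: m[2])

print(fastest_by_ttft(MODELS)[0][0])               # Gemini 2.0 Flash Lite
print(fastest_by_ttft(MODELS, "Anthropic")[0][0])  # Claude Haiku 4
```

The same pattern extends to ranking by throughput (sort on index 3, descending) or any other column.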
Frequently Asked Questions
- Which LLM is fastest?
- Speed depends on which metric matters to you. For time to first token (TTFT), small models like Gemini 2.0 Flash Lite and Phi-4 are fastest at around 100 ms. For throughput (tokens per second), Gemini 2.0 Flash Lite leads at 180 tokens/sec, with Gemini 2.0 Flash and Phi-4 close behind at 160. Reasoning models like o4-mini, o3-mini, and DeepSeek R1 trade speed for quality, with TTFT above one second.
- What is TTFT?
- TTFT stands for Time to First Token. It measures how many milliseconds pass between sending your API request and receiving the first token of the response. Lower TTFT means the user sees output faster, which is critical for real-time chat applications, autocomplete, and streaming UIs. TTFT is separate from throughput, which measures how fast tokens are generated after the first one.
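The two metrics can be measured from any streaming response by timestamping the first chunk separately from the rest. A minimal sketch: the `fake_stream` generator below is a stand-in simulation, not a real API client, but `measure` works the same way on any iterator of streamed tokens.

```python
import time

def fake_stream(first_delay=0.1, per_token=0.01, n_tokens=20):
    """Simulated streaming response (hypothetical stand-in for a real LLM API)."""
    time.sleep(first_delay)          # model "thinking" before the first token
    for _ in range(n_tokens):
        yield "tok"
        time.sleep(per_token)        # steady-state generation delay

def measure(stream):
    """Return (TTFT in seconds, throughput in tokens/sec) for a token stream."""
    start = time.perf_counter()
    ttft, count = None, 0
    for _ in stream:
        if ttft is None:
            ttft = time.perf_counter() - start   # time to first token
        count += 1
    total = time.perf_counter() - start
    # Throughput covers tokens after the first, over the post-TTFT interval.
    tps = (count - 1) / (total - ttft) if count > 1 else 0.0
    return ttft, tps

ttft, tps = measure(fake_stream())
print(f"TTFT: {ttft * 1000:.0f} ms, throughput: {tps:.0f} tokens/sec")
```

Measuring both separately matters because a model can have a low TTFT but slow generation, or vice versa, and which one dominates perceived latency depends on how long the responses are.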
- Does speed affect quality?
- Generally, faster models sacrifice some benchmark performance. Smaller, distilled models (like GPT-4.1 Nano or Gemini Flash Lite) are much faster but score lower on complex reasoning tasks. However, this isn't always the case — GPT-4o offers competitive quality with good speed. Choose based on your task: simple classification and extraction can use fast models, while complex analysis benefits from slower, more capable ones.