LLM Speed Comparison 2026

Compare time to first token (TTFT) and throughput across 24 large language models. Filter by provider to find the fastest API for your use case.

Data verified Apr 3, 2026

Charts: Time to First Token (lower is better) · Throughput (higher is better)

All Models — Ranked by Speed

| Model | Provider | TTFT (ms) | Tokens/sec | Input $/M | Arena ELO |
|---|---|---|---|---|---|
| Gemini 2.0 Flash Lite | Google | 100 | 180 | $0.075 | 1200 |
| Phi-4 | Microsoft | 100 | 160 | $0.070 | 1150 |
| Gemini 2.0 Flash | Google | 120 | 160 | $0.100 | 1260 |
| GPT-4.1 Nano | OpenAI | 130 | 150 | $0.100 | 1180 |
| Claude Haiku 4 | Anthropic | 150 | 130 | $0.800 | 1220 |
| Mistral Small | Mistral | 160 | 120 | $0.100 | 1185 |
| GPT-4o Mini | OpenAI | 180 | 120 | $0.150 | 1220 |
| Grok 3 Mini | xAI | 180 | 110 | $0.300 | 1220 |
| GPT-4.1 Mini | OpenAI | 190 | 115 | $0.400 | 1240 |
| Llama 4 Scout | Meta | 200 | 110 | $0.100 | 1250 |
| DeepSeek V3 | DeepSeek | 220 | 85 | $0.270 | 1280 |
| GPT-4o | OpenAI | 230 | 95 | $2.50 | 1260 |
| Qwen 2.5 Max | Alibaba | 240 | 80 | $0.160 | 1260 |
| Llama 4 Maverick | Meta | 250 | 90 | $0.200 | 1290 |
| Command R | Cohere | 250 | 85 | $0.150 | 1140 |
| GPT-4.1 | OpenAI | 260 | 88 | $2.00 | 1290 |
| Mistral Large | Mistral | 280 | 75 | $2.00 | 1245 |
| Grok 3 | xAI | 300 | 80 | $3.00 | 1300 |
| Claude Sonnet 4 | Anthropic | 320 | 78 | $3.00 | 1280 |
| Command R+ | Cohere | 350 | 65 | $2.50 | 1200 |
| Gemini 2.5 Pro | Google | 400 | 70 | $1.25 | 1340 |
| Claude Opus 4 | Anthropic | 500 | 50 | $15.00 | 1330 |
| o4-mini | OpenAI | 1,200 | 60 | $1.10 | 1350 |
| o3-mini | OpenAI | 1,500 | 55 | $1.10 | 1310 |
| DeepSeek R1 | DeepSeek | 1,800 | 45 | $0.550 | 1310 |
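TTFT and throughput combine into a simple estimate of end-to-end response time: total time ≈ TTFT + output tokens ÷ tokens per second. A minimal sketch using figures from the table above (the function name and model subset are illustrative):

```python
# Estimate end-to-end streaming time from the table's two metrics:
# total_ms = TTFT + (output_tokens / tokens_per_sec) * 1000.
# Figures below are a subset of the table above.
MODELS = {
    "Gemini 2.0 Flash Lite": {"ttft_ms": 100, "tok_per_sec": 180},
    "GPT-4o": {"ttft_ms": 230, "tok_per_sec": 95},
    "DeepSeek R1": {"ttft_ms": 1800, "tok_per_sec": 45},
}

def total_time_ms(ttft_ms: float, tok_per_sec: float, output_tokens: int) -> float:
    """Milliseconds until the full response has finished streaming."""
    return ttft_ms + output_tokens / tok_per_sec * 1000

for name, m in MODELS.items():
    t = total_time_ms(m["ttft_ms"], m["tok_per_sec"], output_tokens=500)
    print(f"{name}: {t / 1000:.1f} s for 500 tokens")
```

Note how the ranking shifts with output length: for very short responses TTFT dominates, while for long responses throughput matters far more.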

Frequently Asked Questions

Which LLM is fastest?
Speed depends on which metric matters to you. For time to first token (TTFT), small models like Gemini 2.0 Flash Lite and Phi-4 are fastest at around 100 ms. For throughput (tokens per second), Gemini 2.0 Flash Lite leads at 180 tokens/sec, with Gemini 2.0 Flash and Phi-4 close behind at 160. Reasoning models like o4-mini, o3-mini, and DeepSeek R1 trade speed for quality, with TTFT above one second.
What is TTFT?
TTFT stands for Time to First Token. It measures how many milliseconds pass between sending your API request and receiving the first token of the response. Lower TTFT means the user sees output faster, which is critical for real-time chat applications, autocomplete, and streaming UIs. TTFT is separate from throughput, which measures how fast tokens are generated after the first one.
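Measuring TTFT yourself is straightforward: start a timer when the request is sent and stop it when the first streamed token arrives. A minimal sketch, where `fake_stream` is a stand-in for a real provider's streaming call (substitute your client library's iterator):

```python
import time

def fake_stream(tokens, delay_s=0.01):
    """Hypothetical stand-in for a streaming API response."""
    for tok in tokens:
        time.sleep(delay_s)  # simulated network + generation latency
        yield tok

def measure_ttft_ms(stream):
    """Time from 'request sent' to the first token, in milliseconds."""
    start = time.perf_counter()
    first = next(stream)  # blocks until the first token arrives
    ttft = (time.perf_counter() - start) * 1000
    return first, ttft

tok, ttft_ms = measure_ttft_ms(fake_stream(["Hello", ",", " world"]))
print(f"first token {tok!r} after {ttft_ms:.0f} ms")
```

In practice, run the measurement several times and report a median, since TTFT varies with provider load and network conditions.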
Does speed affect quality?
Generally, faster models sacrifice some benchmark performance. Smaller, distilled models (like GPT-4.1 Nano or Gemini Flash Lite) are much faster but score lower on complex reasoning tasks. However, this isn't always the case — GPT-4o offers competitive quality with good speed. Choose based on your task: simple classification and extraction can use fast models, while complex analysis benefits from slower, more capable ones.
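One way to make this trade-off concrete is to find the models no other model beats on both speed and quality. A sketch using the table's Tokens/sec and Arena ELO columns (subset shown; the helper function is illustrative):

```python
# Pareto frontier over (throughput, quality): keep a model only if no
# other model is at least as good on both axes and strictly better on one.
models = [
    ("Gemini 2.0 Flash Lite", 180, 1200),
    ("Gemini 2.0 Flash", 160, 1260),
    ("GPT-4o", 95, 1260),
    ("Gemini 2.5 Pro", 70, 1340),
    ("o4-mini", 60, 1350),
    ("Command R", 85, 1140),
]

def pareto_frontier(rows):
    """Return the models not dominated on both tokens/sec and ELO."""
    return [
        (name, tps, elo) for name, tps, elo in rows
        if not any(
            t2 >= tps and e2 >= elo and (t2 > tps or e2 > elo)
            for _, t2, e2 in rows
        )
    ]

for name, tps, elo in pareto_frontier(models):
    print(f"{name}: {tps} tok/s, ELO {elo}")
```

Models off the frontier (here GPT-4o and Command R) are dominated: something else in the list is both faster and higher-rated.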