Fastest LLM API — Speed Comparison

LLM API speed is measured by two key metrics: Time to First Token (TTFT) and throughput (tokens per second). Here's how leading models compare on both:
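Both metrics are easy to measure yourself against any streaming endpoint. The sketch below is a minimal, hedged example: `fake_stream` is a simulated token generator standing in for a real SDK's streaming iterator, and the timing logic is what you would wrap around an actual API response.

```python
import time

def measure_stream(stream):
    """Measure TTFT and throughput over an iterable of tokens.

    `stream` is any iterator that yields tokens; with a real LLM SDK
    you would pass its streaming response object here instead.
    """
    start = time.perf_counter()
    ttft = None
    count = 0
    for _ in stream:
        now = time.perf_counter()
        if ttft is None:
            ttft = now - start  # time until the first token arrives
        count += 1
    elapsed = time.perf_counter() - start
    throughput = count / elapsed if elapsed > 0 else 0.0
    return ttft, throughput

# Simulated stream (assumption: tokens arrive ~10 ms apart; a real
# API's pacing will differ and includes network latency).
def fake_stream(n_tokens=20, delay=0.01):
    for _ in range(n_tokens):
        time.sleep(delay)
        yield "tok"

ttft, tps = measure_stream(fake_stream())
print(f"TTFT: {ttft * 1000:.0f} ms, throughput: {tps:.0f} tok/s")
```

Note that published TTFT figures are typically medians under light load; your observed numbers will vary with prompt length, region, and time of day, so measuring from your own deployment environment is the only reliable comparison.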


Fastest by TTFT (time to first token):
  • Gemini 2.0 Flash Lite (Google): 100ms TTFT. $0.075/M input.
  • Phi-4 (Microsoft): 100ms TTFT. $0.070/M input.
  • Gemini 2.0 Flash (Google): 120ms TTFT. $0.100/M input.
  • GPT-4.1 Nano (OpenAI): 130ms TTFT. $0.100/M input.
  • Claude Haiku 4 (Anthropic): 150ms TTFT. $0.800/M input.

Fastest by throughput (tokens/second):
  • Gemini 2.0 Flash Lite (Google): 180 tok/s. $0.075/M input.
  • Gemini 2.0 Flash (Google): 160 tok/s. $0.100/M input.
  • Phi-4 (Microsoft): 160 tok/s. $0.070/M input.
  • GPT-4.1 Nano (OpenAI): 150 tok/s. $0.100/M input.
  • Claude Haiku 4 (Anthropic): 130 tok/s. $0.800/M input.

When speed matters most: real-time chat interfaces, autocomplete, streaming code generation, and any application where a user is actively waiting for a response. For background processing and batch jobs, throughput matters more than TTFT.


Tip: Smaller models are generally faster. If your task doesn't require top-tier reasoning, a model like GPT-4.1 Mini or Gemini Flash will give you much better latency.