Fastest LLM API — Speed Comparison

LLM API speed is measured by two key metrics: Time to First Token (TTFT) and throughput (tokens per second). Here's how leading models compare on both:
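Both metrics are easy to measure yourself against any streaming endpoint. The sketch below is a minimal, hedged example: `fake_stream` is a simulated token generator standing in for a real SDK's streaming iterator, and the timing logic is what you would wrap around an actual API response.

```python
import time

def measure_stream(stream):
    """Measure TTFT and throughput over an iterable of tokens.

    `stream` is any iterator that yields tokens; with a real LLM SDK
    you would pass its streaming response object here instead.
    """
    start = time.perf_counter()
    ttft = None
    count = 0
    for _ in stream:
        now = time.perf_counter()
        if ttft is None:
            ttft = now - start  # time until the first token arrives
        count += 1
    elapsed = time.perf_counter() - start
    throughput = count / elapsed if elapsed > 0 else 0.0
    return ttft, throughput

# Simulated stream (assumption: tokens arrive ~10 ms apart; a real
# API's pacing will differ and includes network latency).
def fake_stream(n_tokens=20, delay=0.01):
    for _ in range(n_tokens):
        time.sleep(delay)
        yield "tok"

ttft, tps = measure_stream(fake_stream())
print(f"TTFT: {ttft * 1000:.0f} ms, throughput: {tps:.0f} tok/s")
```

Note that published TTFT figures are typically medians under light load; your observed numbers will vary with prompt length, region, and time of day, so measuring from your own deployment environment is the only reliable comparison.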


Fastest by TTFT (time to first token):
  • Gemini 2.0 Flash Lite (Google): 100ms TTFT. $0.075/M input.
  • Phi-4 (Microsoft): 100ms TTFT. $0.070/M input.
  • Gemini 2.0 Flash (Google): 120ms TTFT. $0.100/M input.
  • GPT-4.1 Nano (OpenAI): 130ms TTFT. $0.100/M input.
  • Claude Haiku 4 (Anthropic): 150ms TTFT. $0.800/M input.

Fastest by throughput (tokens/second):
  • Gemini 2.0 Flash Lite (Google): 180 tok/s. $0.075/M input.
  • Gemini 2.0 Flash (Google): 160 tok/s. $0.100/M input.
  • Phi-4 (Microsoft): 160 tok/s. $0.070/M input.
  • GPT-4.1 Nano (OpenAI): 150 tok/s. $0.100/M input.
  • Claude Haiku 4 (Anthropic): 130 tok/s. $0.800/M input.

When speed matters most: real-time chat interfaces, autocomplete, streaming code generation, and any application where a user is actively waiting for a response. For background processing and batch jobs, throughput matters more than TTFT.


Tip: Smaller models are generally faster. If your task doesn't require top-tier reasoning, a model like GPT-4.1 Mini or Gemini Flash will give you much better latency.