Which LLM Has the Largest Context Window?

Context window size determines how much text a model can process in a single request. Here are major models ranked by context window, largest first:


  • Llama 4 Scout (Meta): 10M context window (10,485,760 tokens), 32,768 max output. $0.100/M input.
  • Gemini 2.5 Pro (Google): 1M context window (1,048,576 tokens), 65,536 max output. $1.25/M input.
  • Llama 4 Maverick (Meta): 1M context window (1,048,576 tokens), 32,768 max output. $0.200/M input.
  • Gemini 2.0 Flash (Google): 1M context window (1,048,576 tokens), 8,192 max output. $0.100/M input.
  • Gemini 2.0 Flash Lite (Google): 1M context window (1,048,576 tokens), 8,192 max output. $0.075/M input.
  • GPT-4.1 (OpenAI): 1M context window (1,047,576 tokens), 32,768 max output. $2.00/M input.
  • GPT-4.1 Mini (OpenAI): 1M context window (1,047,576 tokens), 32,768 max output. $0.400/M input.
  • GPT-4.1 Nano (OpenAI): 1M context window (1,047,576 tokens), 32,768 max output. $0.100/M input.
  • o4-mini (OpenAI): 200K context window, 100,000 max output. $1.10/M input.
  • Claude Opus 4 (Anthropic): 200K context window, 32,000 max output. $15.00/M input.
  • o3-mini (OpenAI): 200K context window, 100,000 max output. $1.10/M input.
  • Claude Sonnet 4 (Anthropic): 200K context window, 64,000 max output. $3.00/M input.
  • Claude Haiku 4 (Anthropic): 200K context window, 8,192 max output. $0.800/M input.
  • DeepSeek R1 (DeepSeek): 128K context window, 8,192 max output. $0.550/M input.
  • Grok 3 (xAI): 128K context window, 16,384 max output. $3.00/M input.
  • DeepSeek V3 (DeepSeek): 128K context window, 8,192 max output. $0.270/M input.
  • GPT-4o (OpenAI): 128K context window, 16,384 max output. $2.50/M input.
  • Qwen 2.5 Max (Alibaba): 128K context window, 8,192 max output. $0.160/M input.
  • Mistral Large (Mistral): 128K context window, 8,192 max output. $2.00/M input.
  • GPT-4o Mini (OpenAI): 128K context window, 16,384 max output. $0.150/M input.
  • Grok 3 Mini (xAI): 128K context window, 16,384 max output. $0.300/M input.
  • Command R+ (Cohere): 128K context window, 4,096 max output. $2.50/M input.
  • Mistral Small (Mistral): 128K context window, 8,192 max output. $0.100/M input.
  • Command R (Cohere): 128K context window, 4,096 max output. $0.150/M input.
  • Phi-4 (Microsoft): 16K context window (16,384 tokens), 4,096 max output. $0.070/M input.
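In practice, the question "will my prompt fit?" comes down to comparing an estimated token count against the window sizes above. Here is a minimal sketch; the model names, the `CONTEXT_WINDOWS` table (a subset of the list above), and the characters-per-token heuristic are illustrative assumptions, and real tokenizers (e.g. tiktoken for OpenAI models) give exact counts.

```python
# Rough check of whether a prompt fits a model's context window.
# The ~4-characters-per-token heuristic is an approximation for
# English text, not a real tokenizer.

CONTEXT_WINDOWS = {
    "llama-4-scout": 10_485_760,
    "gemini-2.5-pro": 1_048_576,
    "gpt-4.1": 1_047_576,
    "claude-opus-4": 200_000,
    "gpt-4o": 128_000,
    "phi-4": 16_384,
}

def estimate_tokens(text: str) -> int:
    """Crude heuristic: roughly 4 characters per token."""
    return max(1, len(text) // 4)

def fits_in_context(model: str, prompt: str, max_output: int = 0) -> bool:
    """True if the prompt, plus tokens reserved for the response,
    fits inside the model's context window."""
    window = CONTEXT_WINDOWS[model]
    return estimate_tokens(prompt) + max_output <= window
```

Note that the context window is shared between input and output, which is why `max_output` is subtracted from the budget available for the prompt.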

Why context window size matters:

  • Document analysis: Larger windows let you process entire documents, contracts, or codebases in a single request.
  • Conversation memory: Longer context means the model can remember more of the conversation history.
  • Few-shot examples: More context lets you include more examples for better in-context learning.
  • RAG applications: Larger context windows allow retrieving and injecting more relevant documents.
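The conversation-memory point above is typically implemented by trimming the oldest messages once the history exceeds a token budget. A minimal sketch (the function name and the per-message token estimate are assumptions for illustration):

```python
def trim_history(messages, budget_tokens,
                 estimate=lambda m: len(m["content"]) // 4 + 1):
    """Keep the most recent messages whose combined estimated token
    count fits within budget_tokens, dropping the oldest first."""
    kept, total = [], 0
    for msg in reversed(messages):  # walk newest -> oldest
        cost = estimate(msg)
        if total + cost > budget_tokens:
            break
        kept.append(msg)
        total += cost
    return list(reversed(kept))  # restore chronological order
```

Production systems often refine this by always preserving the system prompt or by summarizing dropped messages instead of discarding them outright.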

Note: Using the full context window increases latency and cost. Only include as much context as needed for your task.
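The cost side of that note is easy to quantify from the per-million-token prices listed above. A small sketch (the function name is an assumption):

```python
def input_cost_usd(tokens: int, price_per_million: float) -> float:
    """Input cost in USD, given a price quoted per million tokens."""
    return tokens * price_per_million / 1_000_000

# Filling Claude Opus 4's entire 200K window at $15.00/M input:
# input_cost_usd(200_000, 15.00) -> $3.00 per request
```

The same 200K tokens sent to a budget model at $0.100/M would cost about two cents, which is why trimming context pays off quickly at scale.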
