Which LLM Has the Largest Context Window?

Context window size determines how much text a model can process in a single request. Here are major models ranked by context window, largest first:


  • Llama 4 Scout (Meta): 10M context window (10,485,760 tokens), 32,768 max output. $0.100/M input.
  • Gemini 2.5 Pro (Google): 1M context window (1,048,576 tokens), 65,536 max output. $1.25/M input.
  • Llama 4 Maverick (Meta): 1M context window (1,048,576 tokens), 32,768 max output. $0.200/M input.
  • Gemini 2.0 Flash (Google): 1M context window (1,048,576 tokens), 8,192 max output. $0.100/M input.
  • Gemini 2.0 Flash Lite (Google): 1M context window (1,048,576 tokens), 8,192 max output. $0.075/M input.
  • GPT-4.1 (OpenAI): 1M context window (1,047,576 tokens), 32,768 max output. $2.00/M input.
  • GPT-4.1 Mini (OpenAI): 1M context window (1,047,576 tokens), 32,768 max output. $0.400/M input.
  • GPT-4.1 Nano (OpenAI): 1M context window (1,047,576 tokens), 32,768 max output. $0.100/M input.
  • o4-mini (OpenAI): 200K context window, 100,000 max output. $1.10/M input.
  • Claude Opus 4 (Anthropic): 200K context window, 32,000 max output. $15.00/M input.
  • o3-mini (OpenAI): 200K context window, 100,000 max output. $1.10/M input.
  • Claude Sonnet 4 (Anthropic): 200K context window, 64,000 max output. $3.00/M input.
  • Claude Haiku 4 (Anthropic): 200K context window, 8,192 max output. $0.800/M input.
  • DeepSeek R1 (DeepSeek): 128K context window, 8,192 max output. $0.550/M input.
  • Grok 3 (xAI): 128K context window, 16,384 max output. $3.00/M input.
  • DeepSeek V3 (DeepSeek): 128K context window, 8,192 max output. $0.270/M input.
  • GPT-4o (OpenAI): 128K context window, 16,384 max output. $2.50/M input.
  • Qwen 2.5 Max (Alibaba): 128K context window, 8,192 max output. $0.160/M input.
  • Mistral Large (Mistral): 128K context window, 8,192 max output. $2.00/M input.
  • GPT-4o Mini (OpenAI): 128K context window, 16,384 max output. $0.150/M input.
  • Grok 3 Mini (xAI): 128K context window, 16,384 max output. $0.300/M input.
  • Command R+ (Cohere): 128K context window, 4,096 max output. $2.50/M input.
  • Mistral Small (Mistral): 128K context window, 8,192 max output. $0.100/M input.
  • Command R (Cohere): 128K context window, 4,096 max output. $0.150/M input.
  • Phi-4 (Microsoft): 16K context window (16,384 tokens), 4,096 max output. $0.070/M input.
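In practice, the question "will my prompt fit?" comes down to comparing an estimated token count against the window sizes above. Here is a minimal sketch; the model names, the `CONTEXT_WINDOWS` table (a subset of the list above), and the characters-per-token heuristic are illustrative assumptions, and real tokenizers (e.g. tiktoken for OpenAI models) give exact counts.

```python
# Rough check of whether a prompt fits a model's context window.
# The ~4-characters-per-token heuristic is an approximation for
# English text, not a real tokenizer.

CONTEXT_WINDOWS = {
    "llama-4-scout": 10_485_760,
    "gemini-2.5-pro": 1_048_576,
    "gpt-4.1": 1_047_576,
    "claude-opus-4": 200_000,
    "gpt-4o": 128_000,
    "phi-4": 16_384,
}

def estimate_tokens(text: str) -> int:
    """Crude heuristic: roughly 4 characters per token."""
    return max(1, len(text) // 4)

def fits_in_context(model: str, prompt: str, max_output: int = 0) -> bool:
    """True if the prompt, plus tokens reserved for the response,
    fits inside the model's context window."""
    window = CONTEXT_WINDOWS[model]
    return estimate_tokens(prompt) + max_output <= window
```

Note that the context window is shared between input and output, which is why `max_output` is subtracted from the budget available for the prompt.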

Why context window size matters:

  • Document analysis: Larger windows let you process entire documents, contracts, or codebases in a single request.
  • Conversation memory: Longer context means the model can remember more of the conversation history.
  • Few-shot examples: More context lets you include more examples for better in-context learning.
  • RAG applications: Larger context windows allow retrieving and injecting more relevant documents.
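The conversation-memory point above is typically implemented by trimming the oldest messages once the history exceeds a token budget. A minimal sketch (the function name and the per-message token estimate are assumptions for illustration):

```python
def trim_history(messages, budget_tokens,
                 estimate=lambda m: len(m["content"]) // 4 + 1):
    """Keep the most recent messages whose combined estimated token
    count fits within budget_tokens, dropping the oldest first."""
    kept, total = [], 0
    for msg in reversed(messages):  # walk newest -> oldest
        cost = estimate(msg)
        if total + cost > budget_tokens:
            break
        kept.append(msg)
        total += cost
    return list(reversed(kept))  # restore chronological order
```

Production systems often refine this by always preserving the system prompt or by summarizing dropped messages instead of discarding them outright.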

Note: Using the full context window increases latency and cost. Only include as much context as needed for your task.
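The cost side of that note is easy to quantify from the per-million-token prices listed above. A small sketch (the function name is an assumption):

```python
def input_cost_usd(tokens: int, price_per_million: float) -> float:
    """Input cost in USD, given a price quoted per million tokens."""
    return tokens * price_per_million / 1_000_000

# Filling Claude Opus 4's entire 200K window at $15.00/M input:
# input_cost_usd(200_000, 15.00) -> $3.00 per request
```

The same 200K tokens sent to a budget model at $0.100/M would cost about two cents, which is why trimming context pays off quickly at scale.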
