Context window size determines how much text a model can process in a single request. Below, the models compared here are ranked by context window size:
| Model | Provider | Context window (tokens) | Max output (tokens) | Input price |
|---|---|---|---|---|
| Llama 4 Scout | Meta | 10,485,760 | 32,768 | $0.100/M |
| Gemini 2.5 Pro | Google | 1,048,576 | 65,536 | $1.25/M |
| Llama 4 Maverick | Meta | 1,048,576 | 32,768 | $0.200/M |
| Gemini 2.0 Flash | Google | 1,048,576 | 8,192 | $0.100/M |
| Gemini 2.0 Flash Lite | Google | 1,048,576 | 8,192 | $0.075/M |
| GPT-4.1 | OpenAI | 1,047,576 | 32,768 | $2.00/M |
| GPT-4.1 Mini | OpenAI | 1,047,576 | 32,768 | $0.400/M |
| GPT-4.1 Nano | OpenAI | 1,047,576 | 32,768 | $0.100/M |
| o4-mini | OpenAI | 200,000 | 100,000 | $1.10/M |
| Claude Opus 4 | Anthropic | 200,000 | 32,000 | $15.00/M |
| o3-mini | OpenAI | 200,000 | 100,000 | $1.10/M |
| Claude Sonnet 4 | Anthropic | 200,000 | 64,000 | $3.00/M |
| Claude Haiku 4 | Anthropic | 200,000 | 8,192 | $0.800/M |
| DeepSeek R1 | DeepSeek | 128,000 | 8,192 | $0.550/M |
| Grok 3 | xAI | 128,000 | 16,384 | $3.00/M |
| DeepSeek V3 | DeepSeek | 128,000 | 8,192 | $0.270/M |
| GPT-4o | OpenAI | 128,000 | 16,384 | $2.50/M |
| Qwen 2.5 Max | Alibaba | 128,000 | 8,192 | $0.160/M |
| Mistral Large | Mistral | 128,000 | 8,192 | $2.00/M |
| GPT-4o Mini | OpenAI | 128,000 | 16,384 | $0.150/M |
| Grok 3 Mini | xAI | 128,000 | 16,384 | $0.300/M |
| Command R+ | Cohere | 128,000 | 4,096 | $2.50/M |
| Mistral Small | Mistral | 128,000 | 8,192 | $0.100/M |
| Command R | Cohere | 128,000 | 4,096 | $0.150/M |
| Phi-4 | Microsoft | 16,384 | 4,096 | $0.070/M |

Why context window size matters:
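As a minimal sketch of how these limits apply in practice, the snippet below checks whether a prompt plus the requested output is likely to fit a model's context window. The model names and window sizes mirror a few entries from the table above; the ~4-characters-per-token heuristic is a rough assumption, since real tokenizers vary by model.

```python
# Context windows (in tokens) for a few of the models listed above.
CONTEXT_WINDOWS = {
    "llama-4-scout": 10_485_760,
    "gpt-4.1": 1_047_576,
    "claude-sonnet-4": 200_000,
    "gpt-4o": 128_000,
}

def estimate_tokens(text: str) -> int:
    """Approximate token count (~4 characters per token for English text)."""
    return max(1, len(text) // 4)

def fits(model: str, prompt: str, max_output_tokens: int) -> bool:
    """True if the prompt plus the requested output likely fit the window."""
    window = CONTEXT_WINDOWS[model]
    return estimate_tokens(prompt) + max_output_tokens <= window

print(fits("gpt-4o", "Summarize this contract.", 1024))  # → True
```

For production use, count tokens with the model's own tokenizer rather than a character heuristic; the estimate here can be off by a large margin for code or non-English text.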
- **Document analysis:** Larger windows let you process entire documents, contracts, or codebases in a single request.
- **Conversation memory:** Longer context means the model can remember more of the conversation history.
- **Few-shot examples:** More context lets you include more examples for better in-context learning.
- **RAG applications:** Larger context windows allow retrieving and injecting more relevant documents.

Note: Using the full context window increases latency and cost. Only include as much context as needed for your task.
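The cost side of that trade-off is easy to quantify. This sketch estimates the input cost of a single request from the per-million-token prices listed above; the model names and prices mirror the table, and output-token pricing (typically higher) is omitted for simplicity.

```python
# Input prices in dollars per million tokens, from the comparison above.
INPUT_PRICE_PER_M = {
    "gemini-2.5-pro": 1.25,
    "gpt-4.1": 2.00,
    "claude-sonnet-4": 3.00,
    "gpt-4o-mini": 0.15,
}

def input_cost(model: str, input_tokens: int) -> float:
    """Dollar cost of the input tokens for one request."""
    return INPUT_PRICE_PER_M[model] * input_tokens / 1_000_000

# Filling Claude Sonnet 4's entire 200K window: 200_000 * $3.00 / 1M.
print(round(input_cost("claude-sonnet-4", 200_000), 2))  # → 0.6
```

Run repeatedly in a chat loop that resends the full history, a maxed-out window compounds quickly, which is why trimming context to what the task needs pays off.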