AI Token Counter — Free Online Tool
Estimate token counts for GPT-4o, Claude, Gemini, and Llama. See the cost for this text across the top 5 LLM APIs — per request and per month at your scale.
Token count by model
What is a token?
Tokens are the units in which LLMs process text. Roughly, 1 token ≈ 4 characters or 0.75 words of English. Non-English languages, code, and symbols use more tokens per character, and all LLM APIs charge per token.
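The heuristic above can be sketched in a few lines. This is an approximation only, not a real tokenizer:

```python
def estimate_tokens(text: str) -> int:
    """Rough token estimate for English text using two common heuristics."""
    by_chars = len(text) / 4            # ~4 characters per token
    by_words = len(text.split()) / 0.75  # ~0.75 words per token
    # Average the two heuristics for a slightly steadier estimate.
    return round((by_chars + by_words) / 2)

print(estimate_tokens("Estimate token counts for GPT-4o, Claude, and Gemini."))
```

Expect real tokenizers to diverge from this estimate on code, symbols, and non-English text.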
Input vs output tokens
Input tokens (your prompt) and output tokens (the model's response) are priced separately. Output tokens typically cost 3–5× more than input tokens. This tool shows input cost only; add your expected output token count for the full cost.
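As an illustration of how the two rates combine, here is a minimal cost formula. The prices are hypothetical placeholders, not any provider's actual rates (check the provider's pricing page):

```python
def request_cost(input_tokens: int, output_tokens: int,
                 in_price_per_m: float, out_price_per_m: float) -> float:
    """USD cost of one request, given per-million-token prices."""
    return (input_tokens / 1e6) * in_price_per_m + (output_tokens / 1e6) * out_price_per_m

# Hypothetical rates: $2.50/M input, $10.00/M output (a 4x output premium).
cost = request_cost(input_tokens=1_200, output_tokens=500,
                    in_price_per_m=2.50, out_price_per_m=10.00)
print(f"${cost:.4f}")  # $0.0080
```

Note that even with fewer output tokens than input tokens, the output side can dominate the bill because of the per-token premium.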
Reduce token costs
Use prompt caching to cut repeated context costs by up to 90%. Switch to a cheaper model for high-volume tasks. See the full cost optimization guide.
Frequently Asked Questions
How accurate are these token estimates?
These are approximations based on the standard ratio of ~1.3 tokens per word for English text. For exact counts, use the official tokenizer for each model (tiktoken for OpenAI, Anthropic's token counting API for Claude). Estimates are typically within 5–15% of the actual count for normal prose.
Why do different models have different token counts?
Each model uses a slightly different tokenizer vocabulary. GPT models use cl100k_base; Claude uses a distinct but similar tokenizer. For most English text the difference is under 5%. For code, symbols, and non-English text, differences can be more pronounced.
Does this include output tokens in the cost?
No — this tool calculates input token cost only (the cost of sending your text to the model). Output tokens are typically 3–5× more expensive per token and depend on response length. Use the monthly request volume slider to estimate input costs at scale.
What is prompt caching and how does it reduce costs?
Prompt caching reuses the computed representation of a long static prompt prefix across many requests. With Anthropic, cached tokens cost 90% less; with OpenAI, 50% less. If your prompt has a large static section that repeats, caching can be your single largest cost reduction.
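The savings math can be sketched as follows. The token counts, request volume, and $3/M rate are hypothetical, and the sketch ignores cache-write surcharges that some providers apply on the first request:

```python
def monthly_input_cost(static_tokens: int, dynamic_tokens: int,
                       requests: int, price_per_m: float,
                       cache_discount: float = 0.0) -> float:
    """Monthly input cost when a static prefix is cached across requests.

    cache_discount: fraction saved on cached tokens, e.g. 0.9 for a
    90%-cheaper cache read or 0.5 for a 50% discount.
    """
    cached_price = price_per_m * (1 - cache_discount)
    per_request = ((static_tokens / 1e6) * cached_price
                   + (dynamic_tokens / 1e6) * price_per_m)
    return per_request * requests

# Hypothetical workload: 8,000-token static system prompt, 500 dynamic
# tokens per request, 100,000 requests/month at $3/M input tokens.
no_cache = monthly_input_cost(8_000, 500, 100_000, 3.0)
with_cache = monthly_input_cost(8_000, 500, 100_000, 3.0, cache_discount=0.9)
print(no_cache, with_cache)  # 2550.0 390.0
```

In this sketch the monthly input bill drops from $2,550 to $390, because the static prefix dominates the prompt; the larger the static share, the closer the savings get to the full cache discount.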
Which LLM API is cheapest for high-volume workloads?
For the absolute lowest input cost, Gemini 2.0 Flash Lite and GPT-4.1 Nano are the cheapest managed options. For open-source models via API, Llama 4 Scout on Together AI or Fireworks is competitive. See the full cheapest LLM API comparison for live pricing.