
How to Count Tokens for GPT-4o, Claude, and Gemini

The most confusing part of working with LLM APIs isn't the API itself -- it's figuring out how many tokens your prompt uses and what it'll cost. Every model tokenizes text differently, and pricing is per-token. Let's demystify this.

What is a token?

A token is a chunk of text that the model processes as a single unit. It's not a word, not a character -- it's somewhere in between.

Rules of thumb:

  • 1 token is roughly 4 characters in English
  • 1 token is roughly 0.75 words
  • 100 tokens is about 75 words
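These ratios are easy to turn into a quick back-of-the-envelope estimator. This is a rough sketch (the function name is ours, and real counts vary by model and content), not a replacement for a real tokenizer:

```python
def estimate_tokens(text: str) -> int:
    """Rough token estimate using the ~4-characters-per-token rule."""
    return max(1, round(len(text) / 4))

# 44 characters -> roughly 11 tokens
print(estimate_tokens("The quick brown fox jumps over the lazy dog."))  # 11
```

Good enough for ballpark budgeting in English prose; expect it to undercount for code, JSON, and non-English text.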

But these are averages. Some examples:

Text                                     Approximate tokens
"Hello"                                  1
"Hello, world!"                          4
"Supercalifragilisticexpialidocious"     9
"こんにちは" (Japanese)                  3-5
{"key": "value"}                         7

Code, non-English text, and special characters tend to use more tokens per "word" than plain English prose.

Why token counts matter

You pay for tokens on both sides: input tokens (your prompt) and output tokens (the model's response). Output tokens are typically 3-5x more expensive than input tokens. So a chatbot that generates long responses will cost much more than one that generates short answers, even with identical prompts.

Counting tokens for OpenAI models

OpenAI uses the tiktoken library. It's fast, accurate, and works offline.

pip install tiktoken

import tiktoken

# GPT-4o uses the o200k_base encoding
enc = tiktoken.encoding_for_model("gpt-4o")

text = "How many tokens is this sentence?"
tokens = enc.encode(text)

print(f"Token count: {len(tokens)}")  # 7
print(f"Tokens: {tokens}")           # [2347, 1784, 5765, ...]

Different models use different encodings:

Model                   Encoding
GPT-4o, GPT-4o-mini     o200k_base
GPT-4, GPT-3.5 Turbo    cl100k_base

For chat messages, remember that the message format adds overhead. Each message has role tokens, formatting tokens, etc. The OpenAI cookbook has the exact formula, but expect roughly 4 extra tokens per message.
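To see how that overhead adds up, here is a rough estimator that combines the 4-characters-per-token heuristic with ~4 extra tokens per message (a sketch only; use tiktoken and the cookbook formula when you need exact counts):

```python
def estimate_chat_tokens(messages, overhead_per_message=4, reply_priming=3):
    """Rough token estimate for a chat request.

    Combines the ~4-chars-per-token heuristic with per-message
    formatting overhead; the reply itself is primed with a few tokens.
    """
    total = reply_priming
    for message in messages:
        total += overhead_per_message
        total += max(1, round(len(message["content"]) / 4))
    return total

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "How many tokens is this?"},
]
print(estimate_chat_tokens(messages))  # 24
```

The point is less the exact number than the shape of the formula: a multi-turn conversation pays the per-message overhead on every message, every time the full history is resent.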

Counting tokens for Anthropic (Claude)

Anthropic provides a token counting API endpoint:

pip install anthropic

import anthropic

# Reads ANTHROPIC_API_KEY from the environment
client = anthropic.Anthropic()

response = client.messages.count_tokens(
    model="claude-sonnet-4-20250514",
    messages=[
        {"role": "user", "content": "How many tokens is this?"}
    ]
)

print(f"Input tokens: {response.input_tokens}")

For offline estimation, Claude uses a byte-pair encoding similar to OpenAI's. Counting with tiktoken's cl100k_base encoding gives a reasonable approximation (typically within about 10%), though it won't be exact.

Counting tokens for Google Gemini

The Gemini SDK includes a built-in token counter:

pip install google-generativeai

import google.generativeai as genai

# Reads GOOGLE_API_KEY from the environment. Note: unlike tiktoken,
# count_tokens makes a network call.
model = genai.GenerativeModel("gemini-2.0-flash")
response = model.count_tokens("How many tokens is this?")
print(f"Total tokens: {response.total_tokens}")

Pricing table (as of early 2026)

Here are the per-token costs for the most popular models. Prices change, so check the provider's pricing page for current rates.

Model               Input (per 1M tokens)    Output (per 1M tokens)
GPT-4o              $2.50                    $10.00
GPT-4o-mini         $0.15                    $0.60
Claude Sonnet 4     $3.00                    $15.00
Claude Haiku 3.5    $0.80                    $4.00
Gemini 2.0 Flash    $0.10                    $0.40

Example cost calculation:

You send a 2,000-token prompt to GPT-4o and get a 500-token response:

  • Input cost: 2,000 / 1,000,000 * $2.50 = $0.005
  • Output cost: 500 / 1,000,000 * $10.00 = $0.005
  • Total: $0.01 per request

At 10,000 requests/day, that's $100/day or $3,000/month. This is why token counting matters.
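The arithmetic above generalizes to a small helper. The prices are hardcoded from the table and will drift, so treat this as a sketch:

```python
# USD per 1M tokens (input, output), from the pricing table above
PRICES = {
    "gpt-4o": (2.50, 10.00),
    "gpt-4o-mini": (0.15, 0.60),
    "claude-sonnet-4": (3.00, 15.00),
    "claude-haiku-3.5": (0.80, 4.00),
    "gemini-2.0-flash": (0.10, 0.40),
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost in USD for a single request."""
    input_price, output_price = PRICES[model]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000

cost = request_cost("gpt-4o", input_tokens=2_000, output_tokens=500)
print(f"${cost:.4f} per request")  # $0.0100 per request
```

Swapping the model key makes the cheap-model comparison concrete: the same request on gemini-2.0-flash costs $0.0004, 25x less.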

Reducing token usage

A few practical strategies:

1. Trim your system prompt. That system prompt gets sent with every request. Shaving 500 tokens off it saves money on every single call.

2. Use shorter model outputs. Add "Be concise" or set max_tokens to limit response length.

3. Pick the right model. GPT-4o-mini and Gemini Flash are 10-25x cheaper than their full-size counterparts. For many tasks they perform just as well.

4. Cache when possible. If multiple users ask similar questions, cache the responses. Anthropic offers prompt caching that reduces costs for repeated prefixes.

5. Estimate before you commit. Before building a pipeline that makes thousands of API calls, estimate the cost on a sample. Count tokens for 10 representative inputs, multiply by your expected volume.
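Strategy 5 can be sketched as a short script: measure token counts for a handful of representative requests, average them, and multiply out. The sample numbers here are made up for illustration:

```python
def project_monthly_cost(sample_input_counts, sample_output_counts,
                         requests_per_day, input_price, output_price,
                         days_per_month=30):
    """Project monthly USD cost from per-request token samples.

    Prices are USD per 1M tokens, as in the pricing table above.
    """
    avg_in = sum(sample_input_counts) / len(sample_input_counts)
    avg_out = sum(sample_output_counts) / len(sample_output_counts)
    per_request = (avg_in * input_price + avg_out * output_price) / 1_000_000
    return per_request * requests_per_day * days_per_month

# Hypothetical sample: 10 representative requests measured with tiktoken
monthly = project_monthly_cost(
    sample_input_counts=[1800, 2100, 2000, 1900, 2200,
                         2050, 1950, 2000, 2100, 1900],
    sample_output_counts=[500] * 10,
    requests_per_day=10_000,
    input_price=2.50,    # GPT-4o input
    output_price=10.00,  # GPT-4o output
)
print(f"${monthly:,.0f}/month")  # $3,000/month
```

These sample counts average out to the 2,000-in / 500-out request from the worked example, so the projection lands on the same $3,000/month figure.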

Quick estimates without code

If you just need a rough count without setting up a Python environment, textshifter.com/tools/token-counter lets you paste text and see token counts for multiple models side by side, along with cost estimates. It runs in the browser so your text stays local.

The JavaScript approach

For Node.js applications, you can use the pure-JS js-tiktoken, gpt-tokenizer, or the WASM-based @dqbd/tiktoken shown here:

import { encoding_for_model } from '@dqbd/tiktoken';

const enc = encoding_for_model('gpt-4o');
const tokens = enc.encode('How many tokens is this?');
console.log(`Token count: ${tokens.length}`);
enc.free(); // important: free the WASM memory

Key takeaways

  • Tokens are not words. Always count, don't guess.
  • Output tokens cost 3-5x more than input tokens.
  • Use the official libraries (tiktoken, Anthropic's API, Gemini SDK) for accurate counts.
  • Estimate costs before scaling up. A 10x increase in prompt size means a 10x increase in cost.
  • For most applications, smaller models (GPT-4o-mini, Gemini Flash, Claude Haiku) offer the best cost-to-quality ratio.
