
Batch API vs Realtime LLM Calls: Cost Comparison and When to Switch


Quick answer: Both OpenAI and Anthropic discount batch API processing by 50% in exchange for a 24-hour completion window (most batches finish well before that). If any portion of your workload can tolerate asynchronous processing, moving it to the batch API is an immediate 50% cost reduction on that portion. In most production systems, 40-60% of the workload is batch-eligible but runs synchronously out of habit.


What the batch API is

Both OpenAI's Batch API and Anthropic's Message Batches API let you submit a collection of requests that are processed asynchronously within 24 hours, at half the price of synchronous API calls. OpenAI takes the requests as an uploaded JSONL file; Anthropic accepts the request list directly in the API call.

The discount has a straightforward infrastructure rationale: batch requests fill GPU capacity that would otherwise sit idle during off-peak hours. Because the provider can defer processing to whenever capacity frees up, the economics are favorable enough to pass a 50% saving on to users.
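For Anthropic, submission looks like the sketch below. This is a minimal illustration, not a drop-in implementation: the model identifier is an assumption (check Anthropic's docs for the current Claude Sonnet 4 string), and the helper names are ours. It uses the `anthropic` Python SDK's `messages.batches` endpoint.

```python
import json


def build_anthropic_batch(documents, model="claude-sonnet-4-20250514", max_tokens=500):
    """Build the request list for Anthropic's Message Batches API.

    Each entry pairs a custom_id (so you can match results back to
    inputs) with the same params a normal messages.create call takes.
    """
    return [
        {
            "custom_id": f"req-{i}",
            "params": {
                "model": model,
                "max_tokens": max_tokens,
                "messages": [{"role": "user", "content": doc}],
            },
        }
        for i, doc in enumerate(documents)
    ]


def submit_anthropic_batch(requests):
    # Deferred import so build_anthropic_batch works without the SDK installed.
    import anthropic

    client = anthropic.Anthropic()
    batch = client.messages.batches.create(requests=requests)
    return batch.id
```

Unlike OpenAI's flow, there's no file upload step: the request list goes straight into the create call.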


Pricing: batch vs standard

| Model | Standard Input | Batch Input | Standard Output | Batch Output |
|---|---|---|---|---|
| GPT-4o | $2.50/M | $1.25/M | $10.00/M | $5.00/M |
| GPT-4.1 | $2.00/M | $1.00/M | $8.00/M | $4.00/M |
| GPT-4.1 Mini | $0.40/M | $0.20/M | $1.60/M | $0.80/M |
| Claude Sonnet 4 | $3.00/M | $1.50/M | $15.00/M | $7.50/M |
| Claude Haiku 4 | $0.80/M | $0.40/M | $4.00/M | $2.00/M |

Every model, every provider: 50% off.


Which workloads are batch-eligible?

Batch processing works when:

  1. Requests are independent (no dependency on other request responses)
  2. Results can be consumed 1-24 hours after submission
  3. Output is stored for later use, not streamed to a waiting user

Clearly batch-eligible:

  • Nightly document summarization
  • Bulk data enrichment (e.g., categorizing 100K support tickets)
  • Content moderation pipelines
  • Translation of product catalogs
  • Generating embeddings at scale
  • Scheduled report generation
  • Offline evaluation runs (evals)
  • Training data generation
  • SEO content generation pipelines

Clearly NOT batch-eligible:

  • Chat interfaces (user waiting for response)
  • Realtime code completion (IDE plugins)
  • Live customer support agents
  • Any feature where a user is actively waiting

Gray area (needs evaluation):

  • Email drafting suggestions (usually has minutes of tolerance)
  • Background insight generation shown on next page load
  • Pre-generating suggestions for a user's next session
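The three criteria above reduce to a single predicate. The function and its flags are ours, purely for illustration; the point is that eligibility is a checklist, not a judgment call:

```python
def is_batch_eligible(independent_requests: bool,
                      tolerates_hours_of_delay: bool,
                      user_actively_waiting: bool) -> bool:
    """Apply the three batch-eligibility criteria from the checklist above."""
    return (independent_requests
            and tolerates_hours_of_delay
            and not user_actively_waiting)


# Nightly summarization: independent, deferred, nobody waiting -> eligible.
# Chat interface: a user is waiting on the response -> not eligible.
```

The gray-area workloads are exactly the ones where `tolerates_hours_of_delay` is unclear, which is why they need measurement rather than intuition.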


Cost savings example

A content company running 10M tokens/day of document summarization:

Standard API (Claude Sonnet 4):

  • Input: 8M × $3.00/M = $24/day
  • Output: 2M × $15.00/M = $30/day
  • Total: $54/day = $1,620/month

Batch API (Claude Sonnet 4):

  • Input: 8M × $1.50/M = $12/day
  • Output: 2M × $7.50/M = $15/day
  • Total: $27/day = $810/month

Savings: $810/month = $9,720/year with zero quality change.
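The arithmetic above generalizes to any model; here is a throwaway sketch with the Claude Sonnet 4 rates from the pricing table hard-coded (swap in your own model's rates):

```python
def monthly_cost(input_mtok_per_day, output_mtok_per_day,
                 input_rate, output_rate, days=30):
    """Daily token volumes (in millions of tokens) x per-million rates x days."""
    daily = input_mtok_per_day * input_rate + output_mtok_per_day * output_rate
    return daily * days


# Claude Sonnet 4, 8M input + 2M output tokens/day.
standard = monthly_cost(8, 2, 3.00, 15.00)  # standard-rate monthly cost
batch = monthly_cost(8, 2, 1.50, 7.50)      # batch-rate monthly cost
savings_per_year = (standard - batch) * 12
```

Because every rate is exactly halved, the savings fraction is always 50% regardless of your input/output token mix.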


Implementing the OpenAI Batch API

import openai, json

client = openai.OpenAI()

# Create JSONL batch file
requests = [
    {"custom_id": f"req-{i}", "method": "POST", "url": "/v1/chat/completions",
     "body": {"model": "gpt-4.1", "messages": [{"role": "user", "content": doc}],
              "max_tokens": 500}}
    for i, doc in enumerate(documents)
]

with open("batch.jsonl", "w") as f:
    for req in requests:
        f.write(json.dumps(req) + "\n")

# Upload file and create batch
batch_file = client.files.create(file=open("batch.jsonl", "rb"), purpose="batch")
batch = client.batches.create(input_file_id=batch_file.id, endpoint="/v1/chat/completions",
                              completion_window="24h")

print(f"Batch ID: {batch.id}")
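Submission is only half the job: you still need to poll for completion and parse the output file. The sketch below works against the same OpenAI SDK; the polling interval, terminal-state set, and error handling are our choices, not prescribed by the API.

```python
import json
import time


def wait_and_collect(client, batch_id, poll_seconds=60):
    """Poll a batch until it reaches a terminal state, then parse its results.

    Returns a dict mapping custom_id -> response body. Per-request
    failures land in the batch's error_file_id; check it separately.
    """
    terminal = {"completed", "failed", "expired", "cancelled"}
    while True:
        batch = client.batches.retrieve(batch_id)
        if batch.status in terminal:
            break
        time.sleep(poll_seconds)

    if batch.status != "completed" or not batch.output_file_id:
        raise RuntimeError(f"Batch ended with status {batch.status}")

    # Output is JSONL: one result object per line, keyed by custom_id.
    raw = client.files.content(batch.output_file_id).text
    results = {}
    for line in raw.splitlines():
        item = json.loads(line)
        results[item["custom_id"]] = item["response"]["body"]
    return results
```

Results are not returned in submission order, which is why the `custom_id` you set when building the batch file matters: it is the only way to join outputs back to inputs.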


When batch API doesn't make sense

  1. Your workload is already realtime and latency-sensitive. Don't break a working system for 50% savings if the latency would hurt users.
  2. Batch size is very small (<100 requests). The overhead of creating batch files isn't worth it for small jobs.
  3. You need immediate error feedback. Batch processing errors surface hours later; realtime calls fail immediately so you can retry faster.

The simplest decision rule: if a user isn't waiting for the response, use batch. The 50% discount is free money.

See how to reduce LLM API costs for the full cost optimization playbook, and compare live batch pricing across providers with the LLMversus cost calculator.
