
API Rate Limits

Quick Answer

Rate limits are restrictions on how many requests or tokens an API will process per unit of time.

Rate limits cap request frequency or token throughput, and are typically expressed as requests per minute (RPM) or tokens per minute (TPM). Providers use them to prevent abuse and manage capacity. A client that exceeds a limit receives an HTTP 429 (Too Many Requests) error and should retry later. Limits vary by subscription tier, and paid tiers generally get higher limits. Production systems need to know the limits they run under and handle 429 errors gracefully, usually by retrying with exponential backoff.
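
As an illustration of the retry handling described above, here is a minimal Python sketch of exponential backoff with jitter. It is not tied to any particular provider: RateLimitError, backoff_delay, and call_with_backoff are hypothetical names made up for this example, and the default values (1 second base, 60 second cap, 5 retries) are assumptions rather than recommended settings. A real client would raise or map its own exception when it sees a 429 response.

```python
import random
import time


class RateLimitError(Exception):
    """Hypothetical exception standing in for an HTTP 429 response."""

    def __init__(self, retry_after=None):
        super().__init__("429 Too Many Requests")
        # Seconds to wait, if the server supplied a hint (e.g. a Retry-After header).
        self.retry_after = retry_after


def backoff_delay(attempt, base=1.0, cap=60.0):
    """Upper bound on the wait before retry number `attempt` (0-based).

    The bound doubles on every attempt and is clamped at `cap` seconds.
    """
    return min(cap, base * (2 ** attempt))


def call_with_backoff(fn, max_retries=5, base=1.0, cap=60.0,
                      sleep=time.sleep, rng=random.random):
    """Call fn(), retrying on RateLimitError with exponential backoff.

    `sleep` and `rng` are injectable so the behavior can be tested
    without actually waiting.
    """
    for attempt in range(max_retries + 1):
        try:
            return fn()
        except RateLimitError as err:
            if attempt == max_retries:
                raise  # out of retries; surface the error to the caller
            if err.retry_after is not None:
                delay = err.retry_after  # prefer the server's own hint
            else:
                # Jitter: pick a random wait between 0 and the current bound
                # so that many clients do not retry at the same instant.
                delay = rng() * backoff_delay(attempt, base, cap)
            sleep(delay)
```

With the assumed defaults, the upper bound on the wait grows as 1, 2, 4, 8, and 16 seconds across the five retries. The random jitter spreads retries from many clients over time, and a server-supplied wait hint, when present, takes priority over the computed delay.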

Last verified: 2026-04-08
