Tokens Per Minute (TPM)

Quick Answer

A rate limit that caps how many tokens an API will process per minute.

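TPM is a common unit for API rate limits. For throughput-heavy applications it is often more relevant than request count, because a few large requests can consume as much capacity as many small ones. Requests that would exceed the TPM limit are typically throttled or rejected until capacity frees up, so staying under the limit usually means batching or queueing work. Providers use TPM limits to enforce fair use and to plan capacity, and different API tiers come with different limits. High-volume applications should size their workloads around the limit for their tier.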
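As an illustration of the queueing idea, here is a minimal client-side sketch that tracks tokens used over a sliding 60-second window and reports how long to wait before the next request fits. The class name TpmBudget, its methods, and the sliding-window accounting are assumptions made for this example, not any provider's API. Real providers may count tokens and reset windows differently, and the caller is assumed to supply its own token estimate per request.

```python
import time
from collections import deque


class TpmBudget:
    """Hypothetical sliding-window tracker for tokens used in the last minute."""

    def __init__(self, tpm_limit, window_seconds=60.0, clock=time.monotonic):
        self.tpm_limit = tpm_limit
        self.window_seconds = window_seconds
        self.clock = clock          # injectable so the tracker can be tested
        self.events = deque()       # (timestamp, tokens), oldest first
        self.used = 0               # tokens currently inside the window

    def _expire(self, now):
        # Drop events that have aged out of the window.
        while self.events and now - self.events[0][0] >= self.window_seconds:
            _, tokens = self.events.popleft()
            self.used -= tokens

    def wait_time(self, tokens):
        """Seconds until `tokens` fits in the window (0.0 if it fits now)."""
        if tokens > self.tpm_limit:
            raise ValueError("request is larger than the per-minute limit")
        now = self.clock()
        self._expire(now)
        needed = tokens - (self.tpm_limit - self.used)
        if needed <= 0:
            return 0.0
        # Walk the oldest events until enough tokens would be released.
        for timestamp, count in self.events:
            needed -= count
            if needed <= 0:
                return max(0.0, timestamp + self.window_seconds - now)
        return self.window_seconds  # defensive fallback

    def record(self, tokens):
        """Register tokens that were just sent."""
        now = self.clock()
        self._expire(now)
        self.events.append((now, tokens))
        self.used += tokens
```

A caller would check wait_time(estimated_tokens) before each request, sleep for that long if it is non-zero, and then call record(estimated_tokens). A queue of pending requests drained this way stays under the assumed limit without relying on the server to reject traffic.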

Last verified: 2026-04-08