Fundamentals

Max Tokens

Quick Answer

The maximum number of tokens the model will generate in a completion.

Max tokens sets an upper limit on the length of the model's output. If you set max_tokens=500, the model stops generating after 500 tokens, whether or not the response is complete. This parameter matters for controlling costs (you pay per output token), managing latency, and preventing runaway outputs. However, setting max_tokens too low can truncate answers mid-sentence, so you must balance thoroughness against cost and speed.
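One practical consequence: you should check whether a completion was cut off by the cap. The sketch below is a minimal illustration assuming an OpenAI-style Chat Completions response shape, in which `finish_reason` is `"length"` when generation stopped because max_tokens was reached (and `"stop"` when the model finished naturally). The response dicts here are simulated; no API call is made.

```python
# Illustrative helper: detect whether a completion was truncated by the
# max_tokens cap. Assumes an OpenAI-style response dict, where
# choices[0]["finish_reason"] == "length" means the cap was hit.

def is_truncated(response: dict) -> bool:
    """Return True if generation stopped because max_tokens was reached."""
    return response["choices"][0]["finish_reason"] == "length"

# Simulated responses for demonstration (no API call):
complete = {
    "choices": [{"finish_reason": "stop",
                 "message": {"content": "Here is the full answer."}}]
}
truncated = {
    "choices": [{"finish_reason": "length",
                 "message": {"content": "Here is a partial answ"}}]
}

print(is_truncated(complete))   # False
print(is_truncated(truncated))  # True
```

When `is_truncated` returns True, typical responses are to retry with a higher max_tokens or to send a follow-up request asking the model to continue.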

Last verified: 2026-04-08
