You can control the length of a model’s output using several techniques depending on your goals and the model you're working with.
Set a Maximum Token Limit
Use the `max_completion_tokens` parameter to limit how many tokens the model will generate.
Playground: This is labeled as "Maximum Length".
API:
For reasoning models such as o3 and o4-mini, and for newer GPT models such as gpt-4.1, use `max_completion_tokens`.
For earlier models, `max_tokens` still works and behaves the same as before.
Important: `max_tokens` is deprecated for newer reasoning models. You can review the model reference page for the token behavior of specific models.
Note: There is currently no way to set a minimum number of tokens.
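As a minimal sketch, here is how this cap might be set through the openai Python SDK (assuming a v1-style client; the model name and prompt are illustrative):

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="o4-mini",  # illustrative model choice
    messages=[{"role": "user", "content": "Summarize the plot of Hamlet."}],
    # Cap the generation at 200 tokens. For reasoning models this cap
    # also covers internal reasoning tokens; for earlier models you
    # would pass max_tokens=200 instead.
    max_completion_tokens=200,
)

print(response.choices[0].message.content)
```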
Provide Specific Instructions
Clearly ask for a desired output length in your prompt. For example:
“List exactly five options.”
“Write a summary in 50 words.”
Simple prompt guidance often works very well with GPT and reasoning models.
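A length instruction can live entirely in the prompt text, with no special parameters. A minimal sketch, reusing the client from the example above:

```python
response = client.chat.completions.create(
    model="o4-mini",  # illustrative
    messages=[
        {
            "role": "user",
            # The length constraint is expressed purely in natural language.
            "content": "List exactly five options for a team-building activity.",
        }
    ],
)
print(response.choices[0].message.content)
```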
Use Examples with Consistent Length
Providing one or more examples of outputs that match the length you want will guide the model to continue the pattern.
This technique leverages the model’s strong few-shot generalization abilities.
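One way to apply this is to include prior question-answer pairs of the desired length as conversation history. A sketch, with invented example content and the same assumed client:

```python
# Few-shot prompt: each example answer is one short sentence, so the
# model tends to keep its new answer to one short sentence as well.
messages = [
    {"role": "user", "content": "Describe Paris."},
    {"role": "assistant",
     "content": "Paris is the capital of France, famed for the Eiffel Tower."},
    {"role": "user", "content": "Describe Tokyo."},
    {"role": "assistant",
     "content": "Tokyo is Japan's capital, blending tradition with technology."},
    {"role": "user", "content": "Describe Cairo."},
]

response = client.chat.completions.create(model="o4-mini", messages=messages)
print(response.choices[0].message.content)  # expect one short sentence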
Apply Strategic Stop Sequences
You can use the `stop` parameter to end a generation early when the model outputs certain strings.
Example:
```json
"stop": ["###", "6."]
```
This would cause the model to halt if it tries to start generating a sixth list item.
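In the Python SDK this might look like the sketch below (the prompt and model are illustrative; note that some reasoning models do not accept stop sequences):

```python
response = client.chat.completions.create(
    model="gpt-4.1",  # illustrative; check model support for stop sequences
    messages=[{"role": "user", "content": "List ten ideas, numbered 1. to 10."}],
    # Generation halts as soon as the model emits "###" or begins "6.",
    # so at most five numbered items come back.
    stop=["###", "6."],
)
print(response.choices[0].message.content)
```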
Control the Number of Completions
The `n` parameter controls how many completions the model generates at once.
Each generated completion respects your max token or stop sequence settings.
Note: Be careful, as using `n` can quickly increase your token usage and costs.
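For example (a sketch with the same assumed client; each of the three completions is billed separately, which is why costs grow with `n`):

```python
response = client.chat.completions.create(
    model="gpt-4.1",  # illustrative
    messages=[{"role": "user", "content": "Suggest a name for a coffee shop."}],
    n=3,                       # request three independent completions
    max_completion_tokens=20,  # the cap applies to each completion
)

for i, choice in enumerate(response.choices):
    print(i, choice.message.content)
```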