Controlling the length of OpenAI model responses

Learn how to set output limits for OpenAI models using token settings, clear prompts, examples, and stop sequences.

You can control the length of a model’s output using several techniques depending on your goals and the model you're working with.

Set a Maximum Token Limit

Use the max_completion_tokens parameter to limit how many tokens the model will generate.

  • Playground: This is labeled as "Maximum Length".

  • API:

    • For reasoning models like o3 and o4-mini, and other newer models such as gpt-4.1, use max_completion_tokens.

    • For earlier models, max_tokens still works and behaves the same as before.

Important:

  • max_tokens is deprecated for newer reasoning models.

  • You can review the model reference page for the token behavior of specific models.

  • Note: There is not currently a way to set a minimum number of tokens.
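As a sketch of how this plays out in a request payload (the length_params helper and the model-prefix check are our own illustration, not part of the OpenAI SDK; consult the model reference page for each model's actual token behavior):

```python
def length_params(model: str, limit: int) -> dict:
    """Return the token-limit kwarg for a chat completions request.

    Assumption: the prefix check below is illustrative only.
    """
    # Reasoning models require max_completion_tokens; max_tokens is
    # deprecated for them but still works on earlier models.
    if model.startswith(("o1", "o3", "o4")):
        return {"max_completion_tokens": limit}
    return {"max_tokens": limit}

# Merged into an ordinary request payload:
request = {
    "model": "o4-mini",
    "messages": [{"role": "user", "content": "Name three colors."}],
    **length_params("o4-mini", 100),
}
```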


Provide Specific Instructions

Clearly ask for a desired output length in your prompt. For example:

  • “List exactly five options.”

  • “Write a summary in 50 words.”

Simple prompt guidance often works very well with GPT and reasoning models.
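A minimal sketch of this approach (the helper name is hypothetical): embed the desired length directly in the prompt text.

```python
def with_length_instruction(task: str, n_words: int) -> str:
    # Append an explicit word-count instruction to the task prompt.
    return f"{task}\n\nWrite your answer in exactly {n_words} words."

prompt = with_length_instruction("Summarize the attached report.", 50)
```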


Use Examples with Consistent Length

Providing one or more examples of outputs that match the length you want will guide the model to continue the pattern.

This technique leverages the model’s strong few-shot generalization abilities.
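For instance (the example pairs below are invented for illustration), a few-shot message list where every assistant reply is roughly ten words nudges the model toward replies of that length:

```python
few_shot = [
    {"role": "user", "content": "Summarize: Cats are popular pets worldwide."},
    {"role": "assistant",
     "content": "Cats are affectionate, low-maintenance, and widely kept as pets."},
    {"role": "user", "content": "Summarize: The sun powers life on Earth."},
    {"role": "assistant",
     "content": "Sunlight drives photosynthesis, weather, and nearly all life."},
]
# The new request continues the length pattern set by the examples above.
messages = few_shot + [
    {"role": "user", "content": "Summarize: Rivers shape landscapes over time."}
]
```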


Apply Strategic Stop Sequences

You can use the stop parameter to end a generation early when the model outputs certain strings.

Example:

```json
"stop": ["###", "6."]
```

This would cause the model to halt if it tries to start generating a sixth list item.
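The truncation itself happens server-side; as a rough local sketch of the effect (apply_stop is our own function, not part of the SDK):

```python
def apply_stop(text: str, stop: list[str]) -> str:
    """Cut text at the earliest occurrence of any stop string.

    Mimics the stop parameter: generation ends at the stop string,
    and the stop string itself is not returned.
    """
    cut = len(text)
    for s in stop:
        i = text.find(s)
        if i != -1:
            cut = min(cut, i)
    return text[:cut]
```

For example, `apply_stop("1. red\n2. blue\n6. gray", ["###", "6."])` keeps only the first two list items.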


Control the Number of Completions

The n parameter controls how many completions the model generates at once.
Each generated completion respects your max token or stop sequence settings.

Note: Be careful when using n, as it can quickly increase your token usage and costs.
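As a quick sanity check on cost (a sketch; actual billed tokens depend on where each completion stops), worst-case output usage scales linearly with n:

```python
def worst_case_output_tokens(n: int, max_completion_tokens: int) -> int:
    # Each of the n completions may generate up to the token limit.
    return n * max_completion_tokens

# e.g. n=5 completions capped at 200 tokens can bill up to 1000 output tokens.
```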
