The main way to control the length of a model response is the max_tokens parameter. In the Playground, this setting appears as "Maximum length". The maximum total length available depends on the specific model used for the request.
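As a minimal sketch of this parameter in use (assuming the Chat Completions request format; the model name and prompt below are placeholders), a capped request payload might look like:

```python
# Hypothetical request payload: max_tokens caps how many tokens the
# model may generate in its reply. If the cap is reached, the response
# is cut off mid-thought rather than shortened gracefully.
request = {
    "model": "gpt-4",  # placeholder model name
    "messages": [
        {"role": "user", "content": "Summarize the plot of Hamlet."}
    ],
    "max_tokens": 60,  # response generation stops after 60 tokens
}
```

Note that max_tokens is a hard cutoff, not a target: it limits length but does not make the model aim for a shorter answer.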
Give instructions
Providing explicit instructions about the desired output length, such as asking for a specific number of items in a list, can influence how long the model response is. These instructions can go in either the user message or the system message.
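A minimal sketch of this approach (the wording and message contents here are illustrative, not an official prompt):

```python
# Hypothetical messages: the system message sets a general length
# expectation, and the user message requests an exact item count.
messages = [
    {"role": "system",
     "content": "You are a concise assistant. Keep answers brief."},
    {"role": "user",
     "content": "List exactly 5 blog post ideas about unit testing, "
                "one short phrase each."},
]
```

Length instructions are followed best-effort; pairing them with a max_tokens cap guards against overruns.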
Add examples of a specific length
OpenAI models like GPT-4 are good at recognizing patterns and will take the length of any provided examples into account when generating a response. By providing one or more examples of the desired output length, you give the model the context it needs about the expected length.
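One common way to do this is a few-shot prompt, where prior assistant turns demonstrate the target length. A hypothetical sketch (contents are placeholders):

```python
# Hypothetical few-shot messages: the example assistant reply is one
# sentence long, signaling that the next answer should match.
messages = [
    {"role": "user",
     "content": "Describe Python in one sentence."},
    {"role": "assistant",
     "content": "Python is a readable, general-purpose language."},
    {"role": "user",
     "content": "Describe Rust in one sentence."},
]
```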
Strategic stop sequences
Another way to control the length of outputs is to use stop sequences. In the example below, the stop sequences are "###" and "6.". If the model attempts to generate a sixth list item, it will run into the "6." stop sequence and stop generating text (the stop sequence itself is not included in the response).
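To illustrate the effect, here is a small hypothetical helper (apply_stop is not part of any SDK) that mimics how stop sequences truncate the generated text:

```python
def apply_stop(text: str, stop_sequences: list[str]) -> str:
    """Truncate text at the earliest occurrence of any stop sequence."""
    cut = len(text)
    for seq in stop_sequences:
        idx = text.find(seq)
        if idx != -1:
            cut = min(cut, idx)  # keep only text before the stop sequence
    return text[:cut]

# A generation that would have produced six list items...
sample = "1. A\n2. B\n3. C\n4. D\n5. E\n6. F"
# ...is cut off before the sixth item by the "6." stop sequence.
print(apply_stop(sample, ["###", "6."]))
```

In a real request, the same stop sequences would be passed via the stop parameter and the truncation would happen server-side during generation.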
Note: There is not currently a way to set a minimum number of tokens.