An introduction to rate limits
Rate limits are restrictions that our API imposes on the number of times a user or client can access our services within a specified period of time.
Rate limits can be quantized, meaning they are enforced over shorter periods of time (e.g. 60,000 requests/minute may be enforced as 1,000 requests/second). Sending short bursts of requests or contexts (prompts+max_tokens) that are too long can lead to rate limit errors, even when you are technically below the rate limit per minute.
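To make the quantization concrete, the sketch below converts a per-minute limit into a per-window allowance and checks whether a burst of requests would exceed it. The one-second window and the helper names `per_window_limit` and `burst_exceeds_limit` are assumptions for illustration only; the actual enforcement window may differ.

```python
from collections import Counter


def per_window_limit(per_minute_limit, window_seconds=1):
    """Allowance for one enforcement window, assuming the limit is split evenly."""
    return per_minute_limit * window_seconds / 60


def burst_exceeds_limit(timestamps, per_minute_limit, window_seconds=1):
    """Return True if any fixed window holds more requests than its allowance.

    timestamps: request send times in seconds (hypothetical input).
    """
    allowance = per_window_limit(per_minute_limit, window_seconds)
    # Group requests into fixed windows by integer-dividing their timestamps.
    buckets = Counter(int(t // window_seconds) for t in timestamps)
    return any(count > allowance for count in buckets.values())


# 60,000 requests/minute works out to 1,000 requests/second.
print(per_window_limit(60_000))

# 1,500 requests in the same second exceed the per-second allowance,
# even though 1,500 is far below 60,000 for the whole minute.
print(burst_exceeds_limit([0.5] * 1500, 60_000))
```

Spreading the same 1,500 requests evenly across the minute keeps every window well under its allowance.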
Best practices for preventing rate limit errors
Default org
If you belong to multiple orgs with different billing plans and usage tiers, make sure your default organization is set to the appropriate org, since it determines which organization is used when making requests with your API keys.
Exponential backoff
Include exponential backoff logic in your code. This catches failed requests and retries them after progressively longer waits.
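A minimal sketch of exponential backoff with jitter is shown below. `RateLimitError` and `retry_with_backoff` are hypothetical names, not part of any client library; substitute whatever exception your client raises on a rate limit error. The default delays and retry count are also assumptions you should tune.

```python
import random
import time


class RateLimitError(Exception):
    """Hypothetical stand-in for the rate limit error your client raises."""


def retry_with_backoff(func, max_retries=5, base_delay=1.0, max_delay=60.0,
                       sleep=time.sleep):
    """Call func(); on RateLimitError, wait and retry with growing delays."""
    for attempt in range(max_retries + 1):
        try:
            return func()
        except RateLimitError:
            if attempt == max_retries:
                raise  # out of retries: surface the error to the caller
            # The delay doubles on each attempt (1s, 2s, 4s, ...), capped at max_delay.
            delay = min(max_delay, base_delay * 2 ** attempt)
            # Random jitter keeps many clients from retrying in lockstep.
            sleep(delay * random.uniform(0.5, 1.0))
```

To use it, wrap the request in a zero-argument callable, for example `retry_with_backoff(lambda: send_request(payload))`, where `send_request` is your own function. Only the rate limit error is retried; other exceptions propagate immediately.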
Token limits
Reduce the max_tokens to match the size of your completions. Usage needs are estimated from this value, so reducing it will decrease the chance that you unexpectedly receive a rate limit error. For example, if your prompt creates completions around 400 tokens, the max_tokens value should be around the same size.
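As a rough illustration of why this matters, the sketch below assumes each request is counted as its prompt tokens plus max_tokens against a per-minute token limit. That formula, the helper names, and the 90,000 tokens/minute figure are all assumptions for illustration, not documented behavior.

```python
def estimated_request_tokens(prompt_tokens, max_tokens):
    """Assumed usage estimate for one request: prompt plus reserved completion."""
    return prompt_tokens + max_tokens


def requests_per_minute_supported(tokens_per_minute_limit, prompt_tokens, max_tokens):
    """How many such requests fit under a hypothetical tokens-per-minute limit."""
    return tokens_per_minute_limit // estimated_request_tokens(prompt_tokens, max_tokens)


# Hypothetical limit of 90,000 tokens/minute and a 600-token prompt.
# With max_tokens=4000, each request is estimated at 4,600 tokens.
print(requests_per_minute_supported(90_000, 600, 4000))  # 19

# With max_tokens=400 (matching ~400-token completions), it is 1,000 tokens.
print(requests_per_minute_supported(90_000, 600, 400))   # 90
```

Under these assumptions, matching max_tokens to the real completion size lets several times as many requests fit within the same limit.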
Optimize your prompts. You can do this by making your instructions shorter, removing extra words, and getting rid of extra examples. You might need to work on your prompt and test it after these changes to make sure it still works well. The added benefit of a shorter prompt is reduced cost to you. If you need help, let us know.
Usage tier
If you've implemented these best practices but are still facing rate limit errors, you can increase your rate limits by increasing your usage tier. You can view your current rate limits, your current usage tier, and how to raise your usage tier/limits in the Limits section of your account settings.
Further reading
Review our comprehensive documentation on usage tiers and rate limits here.