
Every organization is bound by rate limits, which determine how many requests can be sent in a given period. This error indicates that your request exceeded one of those limits.

Rate limits can be quantized, meaning they are enforced over shorter windows of time (e.g., a limit of 60,000 requests/minute may be enforced as 1,000 requests/second). Sending short bursts of requests, or requests with long contexts (prompt + max_tokens), can therefore trigger rate limit errors even when you are technically below the per-minute limit.

How can I fix it?

  • Include exponential back-off logic in your code to catch and retry failed requests. Waiting progressively longer between retries gives the rate limiter time to reset.
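    A minimal sketch of that back-off loop in Python. The wrapped callable, the `RuntimeError` used to signal a rate-limit error, and the delay parameters are all illustrative; substitute your client library's call and its specific rate-limit exception.

    ```python
    import random
    import time

    def call_with_backoff(request_fn, max_retries=5, base_delay=1.0):
        """Retry request_fn with exponential back-off plus jitter.

        request_fn is any callable that raises on a rate-limit error;
        which exception it raises depends on your client library.
        """
        for attempt in range(max_retries):
            try:
                return request_fn()
            except RuntimeError:  # substitute your client's rate-limit error
                if attempt == max_retries - 1:
                    raise  # out of retries; surface the error to the caller
                # Wait base_delay * 2^attempt seconds, plus random jitter so
                # that many clients do not all retry in lockstep.
                delay = base_delay * (2 ** attempt) + random.uniform(0, base_delay)
                time.sleep(delay)
    ```

    The jitter matters in practice: without it, every client that failed at the same moment retries at the same moment, recreating the burst that caused the error.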

  • For token limits

    • Reduce max_tokens to match the size of your completions. Usage is estimated from this value, so lowering it reduces the chance that you unexpectedly receive a rate limit error. For example, if your prompt produces completions of around 400 tokens, set max_tokens to roughly that size.

    • Optimize your prompts. Shorten your instructions, remove redundant words, and trim unnecessary examples. You may need to iterate on the prompt and retest it after these changes to confirm it still performs well. A shorter prompt has the added benefit of reduced cost. If you need help, let us know.

  • For request limits

    • Batch your prompts in an array. This reduces the number of requests you need to make: the prompt parameter can hold up to 20 unique prompts per request.
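A sketch of the batching tip: instead of 20 separate requests, send one request whose prompt field is a list. The 20-prompt cap comes from the article above; the payload shape and model name are illustrative, not a specific client's API.

```python
# Twenty prompts batched into a single hypothetical request payload,
# rather than twenty separate requests counted against the rate limit.
prompts = [f"Write a tagline for product {i}." for i in range(20)]

batched_request = {
    "model": "example-model",  # placeholder model name
    "prompt": prompts,         # up to 20 unique prompts per request
    "max_tokens": 60,          # sized to the expected completion length
}
```

Batching trades request count for context size, so keep the earlier token-limit advice in mind: a batch of long prompts can still trip the token side of the rate limit.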
