Rate limit errors ("Too Many Requests", "Rate limit reached") are caused by hitting your organization's rate limit, which is the maximum number of requests and tokens that can be submitted per minute. Once the limit is reached, the organization cannot submit further requests until the rate limit resets. The error message looks like this:
Rate limit reached for default-code-davinci-002 in organization org-exampleorgid123 on tokens per min.
Limit: 10000.000000 / min. Current: 10020.000000 / min.
Contact support@openai.com if you continue to have issues.
We recommend handling these errors using exponential backoff. Exponential backoff means performing a short sleep when a rate limit error is hit, then retrying the unsuccessful request. If the request is still unsuccessful, the sleep length is increased and the process is repeated. This continues until the request is successful or until a maximum number of retries is reached.
Because unsuccessful requests contribute to your per-minute limit, continuously resending a request won't work. Rate limits can also be enforced over shorter windows - for example, 1 request per second for a 60 RPM limit - so a short, high-volume burst of requests can trigger rate limit errors as well. Exponential backoff works well because it spaces requests apart, minimizing the frequency of these errors.
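Backoff is reactive; you can also avoid bursts proactively by spacing requests on the client side. The sketch below is illustrative only (the `RequestThrottle` class and its `rpm` parameter are our own names, not part of any library): it enforces a minimum interval between calls so a 60 RPM limit becomes one request per second rather than 60 requests in the first second.

```python
import time


class RequestThrottle:
    """Spaces calls evenly so a per-minute limit is not hit in a burst.

    `rpm` is the requests-per-minute budget you want to stay under;
    60 RPM translates to at most one call per second.
    """

    def __init__(self, rpm):
        self.min_interval = 60.0 / rpm  # seconds between consecutive calls
        self.last_call = 0.0

    def wait(self):
        # Sleep just long enough to keep the minimum spacing, then
        # record the time of this call for the next one.
        elapsed = time.monotonic() - self.last_call
        if elapsed < self.min_interval:
            time.sleep(self.min_interval - elapsed)
        self.last_call = time.monotonic()


throttle = RequestThrottle(rpm=600)  # 0.1 s between requests
# Call throttle.wait() immediately before each API request.
```

Throttling and exponential backoff complement each other: throttling keeps you under the limit in the steady state, while backoff handles the occasional request that still gets rejected.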
In Python, an exponential backoff solution could look like this:
import backoff
import openai
from openai.error import RateLimitError

@backoff.on_exception(backoff.expo, RateLimitError)
def completions_with_backoff(**kwargs):
    return openai.Completion.create(**kwargs)
(Please note: the backoff library is a third-party tool. We encourage all our customers to do their due diligence when validating any external code for their projects.)
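If you prefer not to add a dependency, exponential backoff is straightforward to implement yourself. The helper below is a minimal sketch, not an official utility: the function name, parameters, and the placeholder exception tuple are our own choices, and you would substitute the rate limit exception raised by your client library (e.g. `openai.error.RateLimitError`).

```python
import random
import time


def retry_with_backoff(fn, exceptions=(Exception,), max_retries=5,
                       base_delay=0.5, max_delay=60.0):
    """Call `fn`, retrying on `exceptions` with exponential backoff.

    The delay doubles on each failed attempt, capped at `max_delay`,
    with random jitter added so concurrent clients do not retry in
    lockstep.
    """
    for attempt in range(max_retries):
        try:
            return fn()
        except exceptions:
            if attempt == max_retries - 1:
                raise  # out of retries: surface the error to the caller
            delay = min(base_delay * 2 ** attempt, max_delay)
            time.sleep(delay + random.uniform(0, delay))
```

For example, `retry_with_backoff(lambda: openai.Completion.create(...), exceptions=(RateLimitError,))` would sleep roughly 0.5 s, 1 s, 2 s, and 4 s between successive failures before giving up.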
If implementing exponential backoff still results in this error, please fill out the Rate Limit Increase Request. We will get back to you as soon as we can.
To learn more about the default rate limits for each engine type, please see here.