Rate limit errors ('Too Many Requests', 'Rate limit reached') occur when your organization hits its rate limit, which is the maximum number of requests and tokens that can be submitted per minute. Once the limit is reached, the organization cannot successfully submit requests until the rate limit resets.

We recommend handling these errors using exponential backoff. Exponential backoff means performing a short sleep when a rate limit error is hit, then retrying the unsuccessful request. If the request is still unsuccessful, the sleep length is increased and the process is repeated. This continues until the request is successful or until a maximum number of retries is reached.

Because unsuccessful requests also count toward your per-minute limit, immediately resending a failed request won’t work. Rate limits can also be enforced over shorter periods (for example, 1 request per second for a 60 RPM limit), so short high-volume bursts can trigger rate limit errors too. Exponential backoff helps by spacing requests apart, which reduces how often these errors occur.
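As a rough illustration of the spacing idea, the sketch below paces calls on the client side so that they are never sent closer together than 60 / RPM seconds. The names Pacer and requests_per_minute are hypothetical, and the sketch assumes the limit is spread evenly across the minute, which may not match how your limit is actually enforced.

```python
import time


class Pacer:
    """Hypothetical helper: keeps calls at least 60 / RPM seconds apart."""

    def __init__(self, requests_per_minute, clock=time.monotonic, sleep=time.sleep):
        # Assumption: the per-minute limit is enforced evenly, e.g. 60 RPM -> 1 per second.
        self.interval = 60.0 / requests_per_minute
        self._clock = clock
        self._sleep = sleep
        self._next_allowed = clock()  # the first call may go out immediately

    def wait(self):
        """Block until the next request is allowed, then reserve the following slot."""
        now = self._clock()
        if now < self._next_allowed:
            self._sleep(self._next_allowed - now)
            now = self._next_allowed
        self._next_allowed = now + self.interval
```

Calling pacer.wait() before each request smooths out bursts. It does not replace backoff, since other clients in the same organization share the limit.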

In Python, an exponential backoff solution could look like this:

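# Note: openai.error and openai.Completion belong to the pre-1.0 openai Python library.
import backoff
import openai
from openai.error import RateLimitError

# Retry with exponentially growing waits whenever a RateLimitError is raised.
# By default this retries indefinitely; pass max_tries=... to cap the attempts.
@backoff.on_exception(backoff.expo, RateLimitError)
def completions_with_backoff(**kwargs):
    response = openai.Completion.create(**kwargs)
    return response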

(Please note: The backoff library is a third-party tool. We encourage all our customers to do their due diligence when it comes to validating any external code for their projects.)
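If you would rather not depend on a third-party library, the retry loop described earlier can be written by hand. The following is a minimal sketch, not an official implementation: retry_with_backoff and all of its parameters are hypothetical names, the default values are arbitrary, and the random jitter is an extra that the description above does not require.

```python
import random
import time


def retry_with_backoff(func, retry_on, max_retries=5, initial_delay=1.0,
                       multiplier=2.0, jitter=True, sleep=time.sleep):
    """Call func(); on a `retry_on` exception, sleep and retry with a growing delay."""
    delay = initial_delay
    for attempt in range(max_retries + 1):
        try:
            return func()
        except retry_on:
            if attempt == max_retries:
                raise  # out of retries: surface the last error to the caller
            # Optional jitter keeps many clients from retrying in lockstep.
            wait = delay * (1 + random.random()) if jitter else delay
            sleep(wait)
            delay *= multiplier  # lengthen the sleep before the next attempt
```

With the library version used above, a call might look like retry_with_backoff(lambda: openai.Completion.create(**kwargs), RateLimitError). Any other exception is not retried and propagates immediately.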


To learn more about the default rate limits for each engine type, please see here.
