Whisper API FAQ

General questions about the speech-to-text API

What is ASR?

ASR stands for Automatic Speech Recognition, the technology used to convert spoken language into written text. OpenAI's ASR models can be used in a wide range of applications, from transcription services to voice assistants and more.

Whisper is an automatic speech recognition (ASR) system trained on 680,000 hours of multilingual and multitask supervised data collected from the web.

The Whisper ASR API provides programmatic access to our Whisper ASR system.

How much does the Whisper ASR API cost to use?

See our Pricing page for details.

Is Whisper still free in the playground?

No. Starting March 1, 2023, with the launch of the Whisper API, Whisper is no longer free in the playground.

What languages are supported?

Afrikaans, Arabic, Armenian, Azerbaijani, Belarusian, Bosnian, Bulgarian, Catalan, Chinese, Croatian, Czech, Danish, Dutch, English, Estonian, Finnish, French, Galician, German, Greek, Hebrew, Hindi, Hungarian, Icelandic, Indonesian, Italian, Japanese, Kannada, Kazakh, Korean, Latvian, Lithuanian, Macedonian, Malay, Marathi, Maori, Nepali, Norwegian, Persian, Polish, Portuguese, Romanian, Russian, Serbian, Slovak, Slovenian, Spanish, Swahili, Swedish, Tagalog, Tamil, Thai, Turkish, Ukrainian, Urdu, Vietnamese, and Welsh.

What is the rate limit?

50 requests per minute.
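The 50 requests-per-minute limit is enforced by the API, but a client that sends many files can pace itself so it stays under the limit. Below is a minimal single-process sketch of a sliding-window limiter; `SlidingWindowLimiter` and its parameters are hypothetical names for illustration, not part of any OpenAI SDK. The clock and sleep functions are injectable so the logic can be tested without actually waiting.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Hypothetical client-side limiter: at most `max_requests` calls per `window` seconds."""

    def __init__(self, max_requests=50, window=60.0, clock=time.monotonic, sleep=time.sleep):
        self.max_requests = max_requests
        self.window = window
        self.clock = clock
        self.sleep = sleep
        self.sent = deque()  # timestamps of requests still inside the window

    def acquire(self):
        """Block until another request may be sent, then record it."""
        while True:
            now = self.clock()
            # Forget requests that have aged out of the window.
            while self.sent and now - self.sent[0] >= self.window:
                self.sent.popleft()
            if len(self.sent) < self.max_requests:
                self.sent.append(now)
                return
            # Wait until the oldest recorded request leaves the window.
            self.sleep(self.window - (now - self.sent[0]))
```

Call `acquire()` before each transcription request. The limiter only sees requests made through it, so it does not account for other processes sharing the same API key.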

How do the rate limits take the length of the audio into account?

Audio of any length is accepted, as long as the input file is smaller than 25MB.

How is the transcription returned? Is it streamed?

No. The transcription is not streamed; the complete result is returned in a single response.
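Because the response is not streamed, a request simply blocks until the full transcript is available. The sketch below assumes the official `openai` Python package (version 1.0 or later), whose `client.audio.transcriptions.create` method takes a `model` name and an open binary `file`; `whisper-1` is the model name used for the Whisper API. The `transcribe` helper and the file name in the usage comment are hypothetical.

```python
def transcribe(client, path, model="whisper-1"):
    """Send one audio file and return the complete transcript text.

    `client` is assumed to be an `openai.OpenAI` instance (openai>=1.0).
    The call blocks until the whole transcription is ready; nothing is streamed.
    """
    with open(path, "rb") as audio_file:
        result = client.audio.transcriptions.create(model=model, file=audio_file)
    return result.text


# Usage (requires the openai package and an OPENAI_API_KEY environment variable):
# from openai import OpenAI
# client = OpenAI()
# print(transcribe(client, "meeting.mp3"))  # hypothetical file name
```

Passing the client in, rather than creating it inside the function, keeps the helper easy to test with a stand-in object.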

What file size is supported?

Up to 25MB.
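Checking the size locally avoids uploading a file that would be rejected. Since "25MB" could be read as 25,000,000 or 26,214,400 bytes, this sketch conservatively uses the smaller value (an assumption, not a documented byte count); `check_upload_size` and `MAX_UPLOAD_BYTES` are hypothetical names.

```python
import os

# Assumption: treat "25MB" as 25,000,000 bytes, the stricter of the two readings.
MAX_UPLOAD_BYTES = 25_000_000


def check_upload_size(path, limit=MAX_UPLOAD_BYTES):
    """Return the file size in bytes, or raise ValueError if it exceeds `limit`."""
    size = os.path.getsize(path)
    if size > limit:
        raise ValueError(f"{path} is {size} bytes; the limit is {limit} bytes")
    return size
```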

What file formats are supported?

  • m4a

  • mp3

  • webm

  • mp4

  • mpga

  • wav

  • mpeg
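A quick pre-upload check against the list above can be sketched as follows. It only inspects the file extension (assuming the extension reflects the actual format), so it will not catch a mislabeled or corrupt file; `has_supported_extension` is a hypothetical helper.

```python
from pathlib import Path

# Extensions taken from the list of supported formats above.
SUPPORTED_EXTENSIONS = {"m4a", "mp3", "webm", "mp4", "mpga", "wav", "mpeg"}


def has_supported_extension(path):
    """Return True if the file name ends in a supported extension (case-insensitive)."""
    return Path(path).suffix.lower().lstrip(".") in SUPPORTED_EXTENSIONS
```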

Can I send links to audio files instead?

We don't support this right now. You'll need to send a file in one of the formats listed above.
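If your audio lives at a URL, one workaround is to download it yourself and upload the local copy. Below is a minimal standard-library sketch; `download_to_temp` is a hypothetical helper, and the downloaded file is still subject to the 25MB limit and the format list above.

```python
import shutil
import tempfile
import urllib.request
from pathlib import Path
from urllib.parse import urlparse


def download_to_temp(url):
    """Download `url` to a local temporary file and return its path.

    The temporary file keeps the URL's extension so the format stays recognizable.
    The caller is responsible for deleting the file afterwards.
    """
    suffix = Path(urlparse(url).path).suffix
    with urllib.request.urlopen(url) as response, tempfile.NamedTemporaryFile(
        suffix=suffix, delete=False
    ) as tmp:
        # Stream the body to disk instead of loading it all into memory.
        shutil.copyfileobj(response, tmp)
    return tmp.name
```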

