Voice chat FAQ

Your guide to voice conversations with ChatGPT, from setting up and using the feature to understanding its capabilities and limitations.

Updated this week

When can I try GPT-4o Real-Time Voice Mode?

GPT-4o real-time voice and vision will be rolling out to a limited Alpha for ChatGPT Plus users in a few weeks. It will be widely available for ChatGPT Plus users over the coming months.

Prior to GPT-4o, you could use Voice Mode to talk to ChatGPT with latencies of 2.8 seconds (GPT-3.5) and 5.4 seconds (GPT-4) on average. To achieve this, Voice Mode is a pipeline of three separate models: one simple model transcribes audio to text, GPT-3.5 or GPT-4 takes in text and outputs text, and a third simple model converts that text back to audio. This process means that the main source of intelligence, GPT-4, loses a lot of information—it can’t directly observe tone, multiple speakers, or background noises, and it can’t output laughter, singing, or express emotion.
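
To make the information loss concrete, here is a minimal sketch of a three-stage pipeline like the one described above. It is an illustration only, not our actual implementation: the AudioClip type, the function names, and the stubbed model behavior are all hypothetical stand-ins.

```python
from dataclasses import dataclass


@dataclass
class AudioClip:
    """Toy stand-in for recorded speech (all fields are hypothetical)."""
    words: str
    tone: str              # e.g. "excited"
    background_noise: str  # e.g. "cafe chatter"


def transcribe(clip: AudioClip) -> str:
    # Stage 1 (speech-to-text): only the words survive this step.
    # Tone and background noise are dropped here.
    return clip.words


def generate_reply(prompt: str) -> str:
    # Stage 2 (text in, text out): placeholder for GPT-3.5 or GPT-4.
    # It only ever sees the transcript, never the audio.
    return f"You said: {prompt}"


def synthesize(text: str) -> AudioClip:
    # Stage 3 (text-to-speech): the reply is voiced from plain text,
    # so this stub always produces a neutral delivery.
    return AudioClip(words=text, tone="neutral", background_noise="none")


def voice_mode_pipeline(clip: AudioClip) -> AudioClip:
    # Chain the three stages; text is the only thing passed between them.
    return synthesize(generate_reply(transcribe(clip)))
```

In this sketch, an input clip with an "excited" tone produces exactly the same reply as a flat one, because the middle stage receives nothing but the transcript.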

With GPT-4o, we trained a single new model end-to-end across text, vision, and audio, meaning that all inputs and outputs are processed by the same neural network. Because GPT-4o is our first model combining all of these modalities, we are still just scratching the surface of exploring what the model can do and its limitations.

What are voice chats?

Try a new way of interacting with ChatGPT: talk, don’t type – and it’ll respond in a natural voice.

Our voice capability is powered by our models, including Whisper, our open-source speech-to-text model, and a new text-to-speech model.

Enable Voice conversations to engage in back-and-forth voice conversations with ChatGPT.

Which plan types can have voice chats?

All ChatGPT users have access to voice chats through our mobile app.

Which apps can have voice chats?

Voice conversations are available on the ChatGPT mobile apps for both iOS and Android.

How many voice options are available?

Choose from five lifelike output voices for ChatGPT, each with its own distinct tone and character, including Breeze, Cove, Juniper, and Ember.

We are working to pause the use of Sky.

Do GPTs use one of the five options of voices in ChatGPT?

No, GPTs have their own voice option, named Shimmer, which is distinctly different from the five output voices available in voice conversations with ChatGPT.

Which models can I use in voice conversations?

GPT-4o and GPT-4 are available for use in voice conversations.

Keep in mind that GPT-4 has message limits for Plus and Team plans. For users on the Enterprise plan, there is no message cap.

Is there a volume limit I can set for voice conversations?

No, ChatGPT does not have a volume limit setting for voice conversations. Volume is set on the device itself.

Can I use ChatGPT vision capabilities and voice conversations in the same conversation?

Yes, you can start a voice conversation in a chat that uses vision capabilities, just as you can in a conversation using GPT-4o or GPT-4.

Why does the banner include thumbs up / down rating after my voice conversation has ended?

All users having voice conversations will see a banner after their voice conversation has ended. This feedback survey collects information on the experience of the voice call, not about the conversation or its contents.

Only users on Plus will see the options to rate with the thumbs up/down included in that banner.

While Enterprise users will see the banner about the voice conversation ending, their banner should not include the thumbs up / down rating options.

Are voice conversations hands-free?

Once you enter a voice conversation, it is hands-free until you exit. Manual controls let you pause, resume, and exit the voice conversation.

Do voice conversations include subtitles?

No, subtitles are not included or displayed during a voice conversation. After you exit a voice conversation, the transcription is added to your current text-based conversation with ChatGPT.

Enable the ability to have voice conversations

Settings → App → New Features → Voice conversations (toggle on)

Disable the ability to have voice conversations

Settings → App → New Features → Voice conversations (toggle off)

Start a voice conversation

To start a voice conversation, tap the headphones icon. Once the connection is established, ChatGPT will be listening for you to speak.

Pause the voice conversation

Tap the pause icon.

Interrupt the voice conversation

While ChatGPT is talking, you can either tap to interrupt or tap the stop icon.

Resume the voice conversation

Tap the resume icon, and start speaking again.

Unmute the voice conversation

Tap to unmute.

Exit the voice conversation

To exit Voice Mode, tap the X icon. This ends the voice conversation and returns you to the text-based conversation with ChatGPT.

How long can I leave a voice conversation paused for?

No limit.

How many voice conversations can I have going at once?

You can have one voice conversation at a time. You will stay in your current conversation until you start a new conversation or switch to another existing conversation.

Why am I receiving the response "Sorry, I cannot help with that"?

This happens because of our safety measures. If you believe your prompt is in line with our Usage Policies, please send us that feedback through the thumbs up/thumbs down options in the chat.

Why does the voice input detect a different language from the one I’m speaking?

At times, our voice input feature may not accurately detect the language you are speaking. You can specify a preferred language in Settings for more accurate detection.

1. Click the "..." button in the top-right corner, then click the "Settings" button.

2. Within the Settings page, scroll down to the Speech section. Click the "Main Language" dropdown to select your language.

Privacy & Controls

How long do you retain audio from my voice chats?

Voice chat works by sending audio clips from ChatGPT to our Whisper API for transcription. We delete audio clips once transcription is complete, unless you’ve chosen to share your audio to improve voice chats for everyone. Learn more about sharing your audio to improve our models.
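
As a rough illustration of the rule above, the retention decision can be sketched as a small function. This is a simplified sketch, not our actual implementation; the function and parameter names are hypothetical.

```python
def should_retain_audio(transcription_complete: bool, shares_audio: bool) -> bool:
    """Return True if an audio clip is kept at this point (hypothetical helper).

    Assumes the rule described above: clips are deleted once transcription
    is complete, unless the user has chosen to share audio.
    """
    if not transcription_complete:
        # The clip is still needed to produce the transcript.
        return True
    # After transcription, the clip is kept only for users who opted in.
    return shares_audio
```

For example, a user who has not opted in to sharing would have each clip deleted as soon as its transcript exists.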

Do you train your models on audio clips from voice chats?

Nope, unless you choose to share your audio for us to improve voice chats for everyone. If you share your audio with us, then we may use audio from your voice chats to train our models.

Transcribed chats may be used to train our models depending on your choices and plan. Learn more about your choices.

Sharing audio to improve voice chats for everyone

We’ve begun to invite a subset of our users to share audio from their voice chats to help us improve our voice models. This section provides more information on what sharing your audio means.

Who can share audio to improve voice chats?

We are currently inviting a subset of ChatGPT users on free and Plus plans to share audio from their voice chats to improve our models. These users can share audio only from personal workspaces. Users cannot share audio from voice chats in ChatGPT Team and Enterprise workspaces.

What happens if I share my audio to improve chats for everyone?

If you choose to share your audio, then we will store audio from your voice chats rather than deleting audio clips once transcription is complete. We will take steps to reduce the amount of personal information in audio from voice chat that is used to train our models. Our team may review the audio that you’ve shared with us.

How can I stop sharing audio?

You can stop sharing through the data controls page in your ChatGPT mobile app settings. Just toggle the “Improve voice for everyone” button to off.

Don’t see the toggle? Then you either haven’t yet been invited to share your audio or are using an outdated ChatGPT mobile app.

What happens if I decide to stop sharing my audio?

If you choose to stop sharing, then audio clips from future voice chats will be deleted once transcription is complete.

For audio that you previously shared with us, we will delete the raw audio that is associated with your account within 30 days. Audio clips that were previously disassociated from your account may continue to be used to improve our voice models. Prior to using audio clips from voice chats to improve our models, we take steps to reduce the amount of personal information in the audio clip.
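
To show how the 30-day window works, here is a small sketch that computes the latest date by which account-associated raw audio would be deleted. The helper name and the example date are hypothetical, and the sketch assumes the window is counted in calendar days from the day you stop sharing.

```python
from datetime import date, timedelta

# The 30-day window from the answer above.
DELETION_WINDOW = timedelta(days=30)


def latest_deletion_date(stopped_sharing_on: date) -> date:
    """Latest date by which account-associated raw audio is deleted.

    Hypothetical helper: assumes the window is counted in calendar days
    starting from the day sharing was turned off.
    """
    return stopped_sharing_on + DELETION_WINDOW


# Example with a made-up date: sharing turned off on 1 June 2024.
deadline = latest_deletion_date(date(2024, 6, 1))
print(deadline.isoformat())
```

Note that this applies only to audio still associated with your account. Clips that were already disassociated are outside this window, as described above.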

Is my choice to share audio to improve voice chats for everyone a device-specific setting?

Your choice to share audio to improve voice chats for everyone is tied to your account. If you choose to share audio from your voice chats, then that choice will also apply to other devices where you are logged in. You can stop sharing audio to improve voice chats through the data controls page in the ChatGPT mobile app.
