Access to the Realtime API began rolling out on October 1 and will be available to all users soon. Stay tuned!
The Realtime API lets developers build low-latency, multimodal conversational experiences. It currently supports text and audio as both input and output, along with function calling.
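To make the capabilities above concrete, here is a minimal sketch of a session configuration event that asks for both text and audio output and registers one function the model may call. It assumes the beta WebSocket event schema as we understand it (a `session.update` event whose `session` object carries `modalities`, `instructions`, `tools`, and `tool_choice`). Check every field name against the official API reference before relying on it. The `get_weather` tool is purely hypothetical, and the sketch only builds the JSON payload; it does not open a connection.

```python
import json


def build_session_update(instructions: str, tools: list) -> str:
    """Serialize a session.update event (field names assumed, not confirmed by this page)."""
    event = {
        "type": "session.update",
        "session": {
            # Request text and audio output at the same time.
            "modalities": ["text", "audio"],
            # Steering instructions, e.g. tone of voice.
            "instructions": instructions,
            # Function definitions the model is allowed to call.
            "tools": tools,
            "tool_choice": "auto",
        },
    }
    return json.dumps(event)


# Hypothetical tool definition, for illustration only.
weather_tool = {
    "type": "function",
    "name": "get_weather",
    "description": "Look up the current weather for a city.",
    "parameters": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
}

payload = build_session_update("Speak in a warm, upbeat tone.", [weather_tool])
print(payload)
```

In a real client, a string like `payload` would be sent over the open WebSocket connection. Keeping event construction in a pure function makes it easy to inspect or test without network access.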
Key Benefits:
Native Speech-to-Speech Communication: Audio is processed directly, with no intermediate text step, which keeps latency low and preserves nuance in the conversational output.
Natural, Steerable Voices: The voices have natural inflection and can laugh, whisper, and follow direction on tone.
Simultaneous Multimodal Output: Text output is useful for moderation, while audio generated faster than realtime ensures smooth, stable playback.
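Why does faster-than-realtime audio help playback? The toy model below illustrates the idea. It is not how the API or any particular client works. Audio arrives in fixed-size chunks over an in-order stream, and a one-time network stall delays every chunk from `stall_at` onward. When generation outpaces playback, a buffer of unplayed audio builds up and absorbs the stall. When generation runs at exactly realtime, the same stall causes an audible gap (an underrun). The chunk size, speedup factor, and stall values are hypothetical.

```python
def count_underruns(num_chunks: int, chunk_ms: float, speedup: float,
                    stall_at: int, stall_ms: float) -> int:
    """Count chunks that arrive after the player needs them (toy model)."""
    gen_ms = chunk_ms / speedup  # time to generate one chunk of audio
    underruns = 0
    play_at = None  # time at which the next chunk must start playing
    for i in range(num_chunks):
        # In-order stream: a stall delays this chunk and all later ones.
        arrival = (i + 1) * gen_ms + (stall_ms if i >= stall_at else 0.0)
        if play_at is None:
            play_at = arrival  # playback begins when the first chunk lands
        elif arrival > play_at:
            underruns += 1
            play_at = arrival  # playback stalls until the late chunk arrives
        play_at += chunk_ms  # chunks play back to back
    return underruns


# 10 chunks of 100 ms each, with a 150 ms stall before chunk 5.
print(count_underruns(10, 100.0, 1.0, 5, 150.0))  # realtime generation: 1
print(count_underruns(10, 100.0, 4.0, 5, 150.0))  # 4x faster than realtime: 0
```

At 4x speed, the player already holds several hundred milliseconds of unplayed audio when the stall hits, so the delay is absorbed without a gap.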