Realtime Voice Intelligence API Launch
Key Questions
When were the new realtime voice models launched?
The new realtime voice models launched on May 7: GPT-Realtime-2, Realtime-Translate, and Realtime-Whisper, all built for low-latency voice applications.
What is GPT-Realtime-2?
GPT-Realtime-2 brings GPT-5-level reasoning, a 128K-token context window, and tool use to complex real-time conversations. It builds on GPT-5.5's multimodal capabilities for voice agents.
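For a sense of what integration might look like, here is a minimal sketch of opening a realtime session. The endpoint URL, the event names, and the "gpt-realtime-2" model string are all assumptions for illustration; none of these wire details are confirmed by the announcement.

```python
# Minimal sketch of a realtime voice session over WebSocket.
# Assumptions (not confirmed by the announcement): the endpoint URL,
# the event names, and the "gpt-realtime-2" model string.
import asyncio
import json
import websockets  # pip install websockets

API_KEY = "sk-..."  # your API key

async def run_session() -> None:
    url = "wss://api.example.com/v1/realtime?model=gpt-realtime-2"  # assumed
    # "additional_headers" on websockets>=14; older versions use "extra_headers"
    async with websockets.connect(
        url, additional_headers={"Authorization": f"Bearer {API_KEY}"}
    ) as ws:
        # Assumed session-configuration event enabling audio and text output.
        await ws.send(json.dumps({
            "type": "session.update",
            "session": {
                "modalities": ["audio", "text"],
                "instructions": "You are a concise voice assistant.",
            },
        }))
        # Drain streamed events until the server signals the turn is done.
        async for raw in ws:
            event = json.loads(raw)
            print(event.get("type"))
            if event.get("type") == "response.done":  # assumed event name
                break

asyncio.run(run_session())
```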
What does Realtime-Translate support?
Realtime-Translate translates live speech from more than 70 input languages into 13 output languages, enabling real-time multilingual voice apps.
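As a rough illustration, language selection would likely happen at session-configuration time. The field names and the "realtime-translate" model string below are assumptions, not a documented schema.

```python
# Hypothetical session configuration for live translation.
# Field names and the "realtime-translate" model string are assumptions.
import json

session_config = {
    "type": "session.update",
    "session": {
        "model": "realtime-translate",
        "source_language": "auto",  # detect among the 70+ supported inputs
        "target_language": "es",    # one of the 13 supported output languages
        "modalities": ["audio"],
    },
}
print(json.dumps(session_config, indent=2))
```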
What is Realtime-Whisper?
Realtime-Whisper provides streaming speech-to-text (STT), transcribing audio as it arrives rather than after an utterance completes. It supports real-time voice tasks across enterprise and consumer apps.
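Below is a minimal sketch of a streaming transcription loop, assuming the same WebSocket transport plus audio-buffer and transcript event names. All of those names, and the base64 audio framing, are assumptions for illustration.

```python
# Hypothetical streaming STT loop: send audio chunks, print partial transcripts.
# The endpoint, event names, and base64 audio framing are assumptions.
import asyncio
import base64
import json
import websockets  # pip install websockets

async def transcribe(pcm_chunks) -> None:
    url = "wss://api.example.com/v1/realtime?model=realtime-whisper"  # assumed
    async with websockets.connect(url) as ws:
        for chunk in pcm_chunks:
            await ws.send(json.dumps({
                "type": "input_audio_buffer.append",  # assumed event name
                "audio": base64.b64encode(chunk).decode(),
            }))
        await ws.send(json.dumps({"type": "input_audio_buffer.commit"}))  # assumed
        async for raw in ws:
            event = json.loads(raw)
            if event.get("type") == "transcript.delta":  # assumed event name
                print(event.get("text", ""), end="", flush=True)
            elif event.get("type") == "transcript.done":  # assumed event name
                break

# Example: feed 100 ms chunks of 16-bit mono PCM from a file or mic capture.
# asyncio.run(transcribe(chunks))
```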
Who are the new voice models for?
They target developers building voice agents, enterprise solutions, and consumer apps, improving on previous audio models with lower latency.
How do these models integrate with existing tech?
They slot into the standard voice pipeline: end-of-utterance detection, audio-to-text conversion, an LLM call, and response generation, as sketched below. All three models are available now in the API.
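To make that flow concrete, here is an illustrative cascaded pipeline. Every function name and the energy-threshold end-of-utterance check are placeholders for this sketch, not part of any published SDK.

```python
# Illustrative cascaded voice pipeline: end-of-utterance detection,
# audio-to-text, an LLM call, then response generation.
# All names and thresholds are placeholders, not a published SDK.
import array
import math
from typing import Callable, Iterable

def is_silence(chunk: bytes, threshold: float = 300.0) -> bool:
    """Toy end-of-utterance signal: low RMS energy on 16-bit mono PCM."""
    samples = array.array("h", chunk)
    if not samples:
        return True
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return rms < threshold

def voice_turn(
    audio_chunks: Iterable[bytes],
    transcribe: Callable[[bytes], str],    # audio-to-text, e.g. Realtime-Whisper
    generate_reply: Callable[[str], str],  # LLM call, e.g. GPT-Realtime-2
    synthesize: Callable[[str], bytes],    # text-to-speech response
) -> bytes:
    buffered = bytearray()
    for chunk in audio_chunks:
        buffered.extend(chunk)
        if is_silence(chunk):              # end-of-utterance detection
            break
    text = transcribe(bytes(buffered))
    reply = generate_reply(text)
    return synthesize(reply)
```

The appeal of realtime-native models is cutting the latency that accumulates across these separate stages.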
What makes these models GPT-5-class?
GPT-Realtime-2 brings GPT-5-level reasoning to voice, and together the trio supports advanced real-time tasks such as multi-step reasoning, translation, and transcription.
Are the new audio models available in the API?
Yes. The API exposes them for speech-to-text and text-to-speech, so developers can build more powerful, customizable voice experiences.
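As one last sketch, the text-to-speech side might look like a plain HTTPS request. The endpoint path, payload fields, voice name, and model string are assumptions for illustration, not a documented API.

```python
# Hypothetical REST sketch for the text-to-speech side.
# Endpoint path, payload fields, voice name, and model string are assumptions.
import requests  # pip install requests

resp = requests.post(
    "https://api.example.com/v1/audio/speech",  # assumed endpoint
    headers={"Authorization": "Bearer sk-..."},
    json={"model": "gpt-realtime-2", "input": "Hello!", "voice": "alloy"},
    timeout=30,
)
resp.raise_for_status()
with open("reply.wav", "wb") as f:
    f.write(resp.content)  # synthesized audio bytes
```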