The speech_model connection parameter lets you specify which model to use for streaming transcription.
You must include the speech_model parameter in every streaming transcription request. There is no default model. If you omit speech_model, the request will fail.
We recommend Universal-3 Pro Streaming as your primary model for streaming transcription. It provides the highest accuracy with sub-300ms latency, native multilingual code switching, and advanced prompting support, which makes it ideal for voice agents and real-time applications.
All streaming models are billed on the total duration that your WebSocket connection stays open, not on the amount of audio you send. Always send a Terminate message when you’re done with a stream. Sessions that aren’t closed auto-close after 3 hours and are billed for the full duration. See Billing and pricing for details.
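To make the billing rule concrete, here is a simplified Python model of billed time. It assumes billing depends only on when the WebSocket connection opens and closes, and it treats a session that never sends Terminate as running until the 3-hour auto-close. The function name `billed_seconds` is hypothetical, and per-second rates are deliberately left out; see Billing and pricing for actual prices.

```python
from typing import Optional

# Sessions that never receive a Terminate message auto-close after 3 hours.
AUTO_CLOSE_SECONDS = 3 * 60 * 60


def billed_seconds(opened_at: float, terminated_at: Optional[float]) -> float:
    """Hypothetical helper: billable connection time in seconds.

    Billing follows how long the WebSocket stays open, so the amount of
    audio sent is not an input. Pass terminated_at=None for a session that
    was never closed with a Terminate message.
    """
    if terminated_at is None:
        # Unclosed session: billed until the 3-hour auto-close.
        return float(AUTO_CLOSE_SECONDS)
    if terminated_at < opened_at:
        raise ValueError("terminated_at must not be earlier than opened_at")
    # A connection cannot stay open past the auto-close limit.
    return float(min(terminated_at - opened_at, AUTO_CLOSE_SECONDS))
```

For example, `billed_seconds(0, 600)` returns `600.0` for a 10-minute connection even if only 2 minutes of audio were streamed over it, while `billed_seconds(0, None)` returns `10800.0`, the full 3 hours.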
For detailed setup and configuration of Universal-3 Pro Streaming, see the Universal-3 Pro Streaming page. For prompting guidance, see the Prompting guide.
For detailed setup and configuration of Whisper streaming, see the Whisper streaming page.
You can select a model by setting the speech_model connection parameter when connecting to the streaming API:
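As a minimal sketch in Python, the helper below builds the WebSocket URL with speech_model as a query parameter and refuses to build one without it, since there is no default model. The endpoint `wss://streaming.assemblyai.com/v3/ws`, the `sample_rate` parameter, and the function name `build_streaming_url` are assumptions for illustration. Take the exact model identifiers from the Universal-3 Pro Streaming and Whisper streaming pages rather than from the placeholder `example-model` used here.

```python
from urllib.parse import urlencode

# Assumed endpoint; confirm it against the streaming API reference.
STREAMING_URL = "wss://streaming.assemblyai.com/v3/ws"


def build_streaming_url(speech_model: str, sample_rate: int = 16000) -> str:
    """Hypothetical helper: return the connection URL for a streaming session."""
    if not speech_model:
        # There is no default model, so fail early instead of at connect time.
        raise ValueError("speech_model is required for every streaming request")
    # sample_rate is an assumed companion parameter describing the audio sent.
    params = {"speech_model": speech_model, "sample_rate": sample_rate}
    return f"{STREAMING_URL}?{urlencode(params)}"
```

Calling `build_streaming_url("example-model")` returns `wss://streaming.assemblyai.com/v3/ws?speech_model=example-model&sample_rate=16000`, which you would then pass to the WebSocket client of your choice. Validating the parameter locally surfaces a missing speech_model before any connection is opened.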