Universal-3 Pro Streaming
Handshake
Headers
Authenticate with your API key, or generate a temporary token and pass it via the token query parameter.
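The two authentication options above can be sketched as a small helper that prepares the handshake. This is an illustration under stated assumptions, not the official client: the endpoint URL is a placeholder, and the `Authorization` header name is assumed because the text above only says the API key is sent as a header. Only the `token` query parameter name comes from this page.

```python
from urllib.parse import urlencode

# Hypothetical endpoint; substitute the URL from the official documentation.
BASE_URL = "wss://streaming.example.com/ws"


def build_handshake(api_key=None, temp_token=None, params=None):
    """Return (url, headers) for opening the streaming WebSocket."""
    query = dict(params or {})
    headers = {}
    if temp_token is not None:
        # A temporary token travels in the `token` query parameter.
        query["token"] = temp_token
    elif api_key is not None:
        # Assumed header name; this page does not spell it out.
        headers["Authorization"] = api_key
    else:
        raise ValueError("provide either api_key or temp_token")
    url = BASE_URL + ("?" + urlencode(query) if query else "")
    return url, headers
```

A temporary token is typically the better fit for browser or mobile clients, since the long-lived API key never leaves your server.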
Query parameters
A list of words and phrases whose recognition accuracy should be improved. See Keyterms Prompting for more details.
Whether to return language_code and language_confidence in turn messages. Universal-3 Pro Streaming code-switches natively between English, Spanish, German, French, Portuguese, and Italian, with no configuration required.
Maximum silence in milliseconds before the turn is forced to end, regardless of punctuation. See Configuring Turn Detection for configuration details.
Silence duration in milliseconds before a speculative end-of-turn check. If terminal punctuation is found, the turn ends. Otherwise, a partial is emitted and the turn continues. See Configuring Turn Detection for configuration details.
Custom transcription instructions for the model. When not provided, a default prompt optimized for native turn detection is used automatically. Prompting is a beta feature. See the Prompting Guide for details.
Whether to enable Streaming Speaker Diarization. When enabled, each Turn event will include a speaker_label field indicating the speaker.
The maximum number of speakers expected in the audio stream (1-10). Setting this can improve speaker label accuracy when you know the number of speakers in advance. Only used when speaker_labels is enabled. See Streaming Diarization for more details.
API token for authentication (if using a temporary token).
The confidence threshold (0.0 to 1.0) for classifying audio frames as silence. Frames with VAD confidence below this value are considered silent. Increase for noisy environments to reduce false speech detection.
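The interplay between the two silence parameters and the VAD threshold described above can be sketched as plain decision functions. This only models the documented behaviour for illustration; it is not the service's implementation. The set of terminal punctuation marks and the argument name `threshold` are assumptions, and the millisecond values in the usage note are arbitrary, not documented defaults.

```python
# Assumed set of marks that count as terminal punctuation.
TERMINAL_PUNCTUATION = (".", "?", "!")


def is_silent(vad_confidence, threshold):
    """A frame is silent when its VAD confidence is below the threshold."""
    return vad_confidence < threshold


def turn_decision(silence_ms, transcript, min_turn_silence, max_turn_silence):
    """Return 'end_turn', 'partial', or 'continue' for the current silence."""
    if silence_ms >= max_turn_silence:
        # Maximum silence reached: the turn ends regardless of punctuation.
        return "end_turn"
    if silence_ms >= min_turn_silence:
        # Speculative end-of-turn check on the text so far.
        if transcript.rstrip().endswith(TERMINAL_PUNCTUATION):
            return "end_turn"
        return "partial"
    return "continue"
```

For example, with `min_turn_silence=400` and `max_turn_silence=1200`, a 500 ms pause after "Hello there." ends the turn, while the same pause after "Hello there" emits a partial and keeps the turn open until either punctuation appears or 1200 ms of silence accumulates.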
Send
Send audio data chunks for transcription. The payload must be of type bytes and contain between 50 ms and 1000 ms of audio. See the Universal-3 Pro Streaming quickstart to get started.
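As a sketch of the chunk-size constraint, the helper below splits a raw audio buffer into pieces of a fixed duration. The 16 kHz sample rate and 16-bit mono PCM encoding are assumptions chosen for illustration; this page does not state the expected audio format.

```python
def chunk_pcm(audio: bytes, sample_rate=16000, chunk_ms=100, bytes_per_sample=2):
    """Yield byte chunks that each hold chunk_ms of mono PCM audio."""
    if not 50 <= chunk_ms <= 1000:
        raise ValueError("chunk_ms must be between 50 and 1000")
    samples_per_chunk = sample_rate * chunk_ms // 1000
    chunk_bytes = samples_per_chunk * bytes_per_sample
    for start in range(0, len(audio), chunk_bytes):
        # The final chunk may be shorter than chunk_ms; pad or buffer it
        # if it would fall under the 50 ms minimum.
        yield audio[start:start + chunk_bytes]
```

Each yielded chunk can then be sent as a binary WebSocket frame. Smaller chunks lower latency at the cost of more frames per second.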
Update streaming configuration parameters during an active session. You can update prompt, keyterms_prompt, min_turn_silence, and max_turn_silence.
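A configuration update might be assembled as follows. The four updatable field names come from the description above; the JSON envelope, including the `type` field and its value, is an assumption made for this sketch and should be checked against the message schema.

```python
import json

# Fields this page lists as updatable during an active session.
UPDATABLE = {"prompt", "keyterms_prompt", "min_turn_silence", "max_turn_silence"}


def build_update_message(**changes):
    """Serialize a mid-session configuration update, rejecting other fields."""
    unknown = set(changes) - UPDATABLE
    if unknown:
        raise ValueError(f"not updatable mid-session: {sorted(unknown)}")
    # Assumed envelope: the "type" key and its value are hypothetical.
    return json.dumps({"type": "UpdateConfiguration", **changes})
```

Validating on the client side keeps a typo such as `speaker_labels` from being silently ignored or rejected after it has already been sent.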
Receive
Receive a formatted turn-based transcription result.
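A received turn message could be handled along these lines. The `speaker_label`, `language_code`, and `language_confidence` fields are named in the parameter descriptions above; the `type` and `transcript` keys are assumptions for illustration, so adjust them to the actual message schema.

```python
import json


def summarize_turn(raw: str):
    """Format a turn message as one display line, or return None otherwise."""
    msg = json.loads(raw)
    if msg.get("type") != "Turn":  # assumed discriminator field
        return None
    parts = []
    if "speaker_label" in msg:
        # Present only when speaker_labels is enabled.
        parts.append(f"[{msg['speaker_label']}]")
    parts.append(msg.get("transcript", ""))  # assumed field name
    if "language_code" in msg:
        # Present only when language detection output is requested.
        parts.append(f"({msg['language_code']}, {msg.get('language_confidence')})")
    return " ".join(parts)
```

Treating the optional fields as absent by default keeps the handler working whether or not diarization and language reporting are switched on.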