Universal-3 Pro Streaming

Stream audio and receive real-time transcription results using the Universal-3 Pro Streaming model. It is our most accurate streaming model, built for voice agents that demand the highest quality, with best-in-class accuracy and advanced prompting capabilities.

Supported languages: English, Spanish, German, French, Portuguese, and Italian.

Note: To use our EU server for Streaming STT, replace `streaming.assemblyai.com` with `streaming.eu.assemblyai.com`.

Handshake

WSS
wss://streaming.assemblyai.com/v3/ws
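The handshake is a plain WebSocket URL with the session options appended as query parameters. The sketch below builds that URL with Python's standard library. The helper name `build_handshake_url` is hypothetical, and serializing booleans as lowercase `true`/`false` is an assumption about the wire format rather than something stated on this page.

```python
from urllib.parse import urlencode

# Hosts taken from this page; the EU host is a drop-in replacement.
US_HOST = "streaming.assemblyai.com"
EU_HOST = "streaming.eu.assemblyai.com"


def build_handshake_url(params: dict, eu: bool = False) -> str:
    """Build the v3 WebSocket URL with URL-encoded query parameters.

    Assumption: booleans are serialized as lowercase "true"/"false".
    Parameters whose value is None are omitted.
    """
    host = EU_HOST if eu else US_HOST
    query = {}
    for key, value in params.items():
        if value is None:
            continue  # leave unset parameters to their server-side defaults
        if isinstance(value, bool):
            query[key] = "true" if value else "false"
        else:
            query[key] = value
    return f"wss://{host}/v3/ws?{urlencode(query)}"


url = build_handshake_url(
    {"sample_rate": 16000, "encoding": "pcm_s16le", "speaker_labels": True},
    eu=True,
)
print(url)
# wss://streaming.eu.assemblyai.com/v3/ws?sample_rate=16000&encoding=pcm_s16le&speaker_labels=true
```

A real request must also include the required `speech_model` parameter and authentication; the example omits them for brevity.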

Headers

Authorization (string, optional)

Your API key, used for authentication. Alternatively, generate a temporary token and pass it in the `token` query parameter.
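A minimal sketch of choosing between the two authentication options, assuming the raw API key is sent as the `Authorization` header value with no prefix. The helper `auth_options` is hypothetical: it returns headers and extra query parameters to merge into the handshake, and it deliberately rejects supplying both credentials or neither.

```python
from typing import Dict, Optional, Tuple


def auth_options(api_key: Optional[str] = None,
                 temp_token: Optional[str] = None) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Return (headers, extra_query_params) for the handshake.

    Either the API key goes in the Authorization header (assumed: no prefix),
    or a temporary token goes in the `token` query parameter.
    """
    if (api_key is None) == (temp_token is None):
        raise ValueError("provide exactly one of api_key or temp_token")
    if api_key is not None:
        return {"Authorization": api_key}, {}
    return {}, {"token": temp_token}
```

Passing a temporary token in the query string is useful when the WebSocket client cannot set custom headers, as is the case for browser WebSocket APIs.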

Query parameters

speech_model (enum, required)
The speech model to use.
Allowed values:
encoding (enum, optional, defaults to `pcm_s16le`)
Encoding of the audio stream.
Allowed values:
inactivity_timeout (integer, optional, range 5 to 3600)
Time in seconds without activity after which the session is terminated. If not set, no inactivity timeout is applied.
keyterms_prompt (list of strings, optional)

A list of words and phrases to improve recognition accuracy for. See Keyterms Prompting for more details.
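Because `keyterms_prompt` is a list but travels in the query string, it has to be serialized into a single value. This sketch assumes the list is sent as one JSON-encoded array string; check the Keyterms Prompting guide for the exact format. The helper name `keyterms_query` is hypothetical.

```python
import json
from urllib.parse import parse_qs, urlencode


def keyterms_query(terms: list[str]) -> str:
    """URL-encode keyterms_prompt as a JSON array string (assumed format)."""
    return urlencode({"keyterms_prompt": json.dumps(terms)})


q = keyterms_query(["Universal-3", "AssemblyAI"])
# Round-trip the query string to show the original list is recoverable.
decoded = json.loads(parse_qs(q)["keyterms_prompt"][0])
print(decoded)  # ['Universal-3', 'AssemblyAI']
```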

language_detection (enum, optional, defaults to false)

Whether to return `language_code` and `language_confidence` in Turn messages. Universal-3 Pro Streaming natively code-switches between English, Spanish, German, French, Portuguese, and Italian by default, with no additional configuration required.

Allowed values: `true`, `false`.
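When `language_detection` is enabled, a client can read the detected language from each Turn message. This sketch assumes Turn messages arrive as JSON text with `"type": "Turn"` and top-level `language_code` and `language_confidence` fields; the helper `turn_language` is hypothetical.

```python
import json
from typing import Optional


def turn_language(raw_message: str) -> Optional[tuple[str, float]]:
    """Return (language_code, language_confidence) from a Turn message.

    Returns None for non-Turn messages, or when the language fields are
    absent (for example, when language_detection is off).
    """
    msg = json.loads(raw_message)
    if msg.get("type") != "Turn":
        return None
    if "language_code" not in msg or "language_confidence" not in msg:
        return None
    return msg["language_code"], float(msg["language_confidence"])
```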
max_turn_silence (integer, optional, defaults to 1000)

Maximum silence in milliseconds before the turn is forced to end, regardless of punctuation. See Configuring Turn Detection for configuration details.

min_turn_silence (integer, optional, defaults to 100)

Silence duration in milliseconds after which a speculative end-of-turn check is made. If the current transcript ends with terminal punctuation, the turn ends. Otherwise, a partial transcript is emitted and the turn continues. See Configuring Turn Detection for configuration details.
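The interaction between `min_turn_silence` and `max_turn_silence` can be modeled as a small decision function. This is a simplified, client-side illustration of the rules described for these two parameters, not the server's implementation; the function name `turn_decision` and the choice of `.`, `?`, and `!` as terminal punctuation are assumptions.

```python
# Assumed set of terminal punctuation marks for this illustration.
TERMINAL_PUNCTUATION = (".", "?", "!")


def turn_decision(silence_ms: int, transcript: str,
                  min_turn_silence: int = 100,
                  max_turn_silence: int = 1000) -> str:
    """Return "end", "partial", or "continue" for the current silence."""
    if silence_ms >= max_turn_silence:
        return "end"  # forced end, regardless of punctuation
    if silence_ms >= min_turn_silence:
        if transcript.rstrip().endswith(TERMINAL_PUNCTUATION):
            return "end"  # speculative check found terminal punctuation
        return "partial"  # emit a partial, keep the turn open
    return "continue"  # not enough silence to run the check yet
```

Lowering `min_turn_silence` makes the speculative check run sooner, while `max_turn_silence` caps how long an unpunctuated turn can stay open.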

prompt (string, optional, beta)

Custom transcription instructions for the model. Prompting is a beta feature. When no prompt is provided, a default prompt optimized for native turn detection is used automatically. See the Prompting Guide for details.

sample_rate (integer, required, defaults to 16000)
Sample rate of the audio stream, in Hz.
speaker_labels (enum, optional, defaults to false)

Whether to enable Streaming Speaker Diarization. When enabled, each Turn event includes a `speaker_label` field identifying the speaker.

Allowed values: `true`, `false`.
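With `speaker_labels` enabled, a client can collect each speaker's turns by keying on `speaker_label`. This sketch assumes each Turn message has already been parsed into a dict with a `transcript` field; the label values `"A"` and `"B"` in the example are illustrative, and `transcript_by_speaker` is a hypothetical helper.

```python
from collections import defaultdict


def transcript_by_speaker(turns: list[dict]) -> dict[str, list[str]]:
    """Group Turn transcripts by speaker_label.

    Turns without a speaker_label are grouped under "unknown".
    """
    grouped: dict[str, list[str]] = defaultdict(list)
    for turn in turns:
        grouped[turn.get("speaker_label", "unknown")].append(turn["transcript"])
    return dict(grouped)


turns = [
    {"speaker_label": "A", "transcript": "Hi, how can I help?"},
    {"speaker_label": "B", "transcript": "I'd like to book a table."},
    {"speaker_label": "A", "transcript": "Sure, for how many?"},
]
print(transcript_by_speaker(turns))
```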
max_speakers (integer, optional, range 1 to 10)

The maximum number of speakers expected in the audio stream. Setting this can improve speaker label accuracy when you know the number of speakers in advance. Only used when `speaker_labels` is enabled. See Streaming Diarization for more details.

token (string, optional)

Temporary token for authentication. Use this instead of the Authorization header when authenticating with a temporary token.

vad_threshold (double, optional, defaults to 0.3)

The confidence threshold (0.0 to 1.0) for classifying audio frames as silence. Frames with a VAD confidence below this value are considered silent. Increase it in noisy environments to reduce false speech detection.
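The thresholding rule for `vad_threshold` is simple enough to show directly. The VAD confidences here are made-up inputs (the server computes its own per-frame confidences internally), and `classify_frames` is a hypothetical helper; the point is that raising the threshold marks more frames as silent.

```python
def classify_frames(vad_confidences: list[float], vad_threshold: float = 0.3) -> list[bool]:
    """Return True for each frame treated as silent (confidence below threshold)."""
    if not 0.0 <= vad_threshold <= 1.0:
        raise ValueError("vad_threshold must be between 0.0 and 1.0")
    return [c < vad_threshold for c in vad_confidences]


frames = [0.1, 0.35, 0.5, 0.25]
print(classify_frames(frames))       # [True, False, False, True]
print(classify_frames(frames, 0.4))  # [True, True, False, True]
```

In the second call the 0.35 frame, which might be background noise, is no longer treated as speech.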

domain (enum, optional)
Enables domain-specific transcription models to improve accuracy for specialized terminology. Set to `"medical-v1"` to enable [Medical Mode](https://www.assemblyai.com/docs/streaming/medical-mode) for improved accuracy of medical terms such as medications, procedures, conditions, and dosages. Supported languages: English (`en`), Spanish (`es`), German (`de`), French (`fr`). If used with an unsupported language, the parameter is ignored and a warning is returned.
Allowed values:
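Several query parameters have documented ranges, so a client can catch mistakes before opening the socket. This hypothetical `validate_params` helper checks only the constraints stated in the parameter list above; the server remains the authority on what it accepts.

```python
def validate_params(params: dict) -> list[str]:
    """Check query parameters against the documented constraints.

    Returns a list of problems; an empty list means none were found.
    """
    problems = []
    if "speech_model" not in params:
        problems.append("speech_model is required")
    timeout = params.get("inactivity_timeout")
    if timeout is not None and not 5 <= timeout <= 3600:
        problems.append("inactivity_timeout must be between 5 and 3600")
    speakers = params.get("max_speakers")
    if speakers is not None:
        if not 1 <= speakers <= 10:
            problems.append("max_speakers must be between 1 and 10")
        if not params.get("speaker_labels", False):
            problems.append("max_speakers has no effect unless speaker_labels is enabled")
    vad = params.get("vad_threshold")
    if vad is not None and not 0.0 <= vad <= 1.0:
        problems.append("vad_threshold must be between 0.0 and 1.0")
    return problems
```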

Send

Audio (string, required, format: binary)

Send audio data chunks for transcription. Each message must be a binary payload (bytes) containing between 50 ms and 1000 ms of audio. See the Universal-3 Pro Streaming quickstart to get started.
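The 50 to 1000 ms limit translates into a byte count that depends on the sample rate and encoding. For `pcm_s16le` each sample is 2 bytes, so 100 ms of mono 16 kHz audio is 16000 × 2 × 0.1 = 3200 bytes. The sketch below assumes mono audio; `chunk_size_bytes` and `iter_chunks` are hypothetical helpers.

```python
def chunk_size_bytes(chunk_ms: int, sample_rate: int = 16000,
                     bytes_per_sample: int = 2, channels: int = 1) -> int:
    """Bytes needed for chunk_ms of audio (pcm_s16le uses 2 bytes per sample)."""
    if not 50 <= chunk_ms <= 1000:
        raise ValueError("each audio message must hold 50 to 1000 ms of audio")
    return sample_rate * bytes_per_sample * channels * chunk_ms // 1000


def iter_chunks(pcm: bytes, chunk_ms: int = 100, sample_rate: int = 16000):
    """Yield fixed-size chunks of mono pcm_s16le audio to send as binary messages."""
    size = chunk_size_bytes(chunk_ms, sample_rate)
    for start in range(0, len(pcm), size):
        # The last chunk may be shorter than chunk_ms.
        yield pcm[start:start + size]


print(chunk_size_bytes(100))  # 3200
```

The final chunk from `iter_chunks` may hold less than 50 ms of audio; how to handle a short trailing chunk is left to the caller.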

OR
UpdateConfiguration (object, required)

Update streaming configuration parameters during an active session. You can update `prompt`, `keyterms_prompt`, `min_turn_silence`, and `max_turn_silence`.
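A sketch of building an UpdateConfiguration message. It assumes control messages are JSON text frames carrying a `type` field set to the message name, with the updated parameters as top-level keys; `update_configuration_message` is a hypothetical helper that rejects parameters not listed as updatable.

```python
import json

# Parameters listed on this page as updatable mid-session.
UPDATABLE = {"prompt", "keyterms_prompt", "min_turn_silence", "max_turn_silence"}


def update_configuration_message(**changes) -> str:
    """Build an UpdateConfiguration text message (assumed JSON shape)."""
    unknown = set(changes) - UPDATABLE
    if unknown:
        raise ValueError(f"not updatable mid-session: {sorted(unknown)}")
    return json.dumps({"type": "UpdateConfiguration", **changes})


print(update_configuration_message(max_turn_silence=1500))
# {"type": "UpdateConfiguration", "max_turn_silence": 1500}
```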

OR
ForceEndpoint (object, required)
Manually force an endpoint in the transcription.
OR
SessionTermination (object, required)
Gracefully terminate the streaming session.
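The two remaining control messages take no parameters on this page. This sketch assumes they are JSON text frames with `type` values `"ForceEndpoint"` and `"Terminate"`; verify the exact strings against the quickstart. The helper names are hypothetical. After sending the termination request, a client would typically keep reading until the server's Termination message arrives.

```python
import json


def force_endpoint_message() -> str:
    """Request that the server force an endpoint now (assumed JSON shape)."""
    return json.dumps({"type": "ForceEndpoint"})


def terminate_message() -> str:
    """Request a graceful end of the session (assumed JSON shape)."""
    return json.dumps({"type": "Terminate"})
```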

Receive

SessionBegins (object, required)
Receive confirmation that the streaming session has successfully started.
OR
SpeechStarted (object, required)
Receive a notification that speech has been detected in the audio stream.
OR
Turn (object, required)

Receive a formatted turn-based transcription result.

OR
Termination (object, required)
Receive confirmation that the session has been terminated by the server.
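A client usually reads server messages in a loop and routes each one by type. This sketch assumes every server message is JSON text with a `type` field, and that the four messages listed under Receive use the type strings `"Begin"`, `"SpeechStarted"`, `"Turn"`, and `"Termination"`. Those strings, the `transcript` field, and the `make_dispatcher` helper are assumptions for illustration.

```python
import json
from typing import Callable, Dict


def make_dispatcher(handlers: Dict[str, Callable[[dict], None]]) -> Callable[[str], str]:
    """Route incoming JSON text messages to a handler keyed by "type".

    The returned function gives back the message type so callers can stop
    reading after "Termination". Unknown types are ignored.
    """
    def dispatch(raw: str) -> str:
        msg = json.loads(raw)
        msg_type = msg.get("type", "")
        handler = handlers.get(msg_type)
        if handler is not None:
            handler(msg)
        return msg_type
    return dispatch


seen = []
dispatch = make_dispatcher({
    "Begin": lambda m: seen.append("session started"),
    "SpeechStarted": lambda m: seen.append("speech detected"),
    "Turn": lambda m: seen.append(m.get("transcript", "")),
    "Termination": lambda m: seen.append("session ended"),
})
for raw in ['{"type": "Begin"}', '{"type": "Turn", "transcript": "hello"}',
            '{"type": "Termination"}']:
    if dispatch(raw) == "Termination":
        break
print(seen)  # ['session started', 'hello', 'session ended']
```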