Universal-Streaming | AssemblyAI

Stream audio and receive real-time transcription results. Fast, cost-effective streaming transcription available in three variants:

Universal-Streaming English — the fastest real-time English transcription
Universal-Streaming Multilingual — multilingual support (English, Spanish, German, French, Portuguese, and Italian) at the same speed and price
Whisper-Streaming — open-source Whisper powered by AssemblyAI’s infrastructure with 99+ languages

To use our EU server for Streaming STT, replace streaming.assemblyai.com with streaming.eu.assemblyai.com.

Handshake

WSS

wss://streaming.assemblyai.com/v3/ws

Headers

Authorization (string, optional)

Use your API key for authentication, or alternatively generate a temporary token and pass it via the token query parameter.
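The two authentication options and the EU host swap can be sketched as a small helper that builds the handshake URL and headers. This is a minimal sketch, not an official client: `build_handshake` and its arguments are hypothetical names, and sending the API key as the raw `Authorization` header value is an assumption. The query parameters it accepts are the ones documented below.

```python
from urllib.parse import urlencode

US_HOST = "streaming.assemblyai.com"
EU_HOST = "streaming.eu.assemblyai.com"


def build_handshake(params, api_key=None, token=None, eu=False):
    """Return (url, headers) for the WebSocket handshake.

    Exactly one of api_key (sent as a header) or token (sent as the
    `token` query parameter) must be given.
    """
    if (api_key is None) == (token is None):
        raise ValueError("provide exactly one of api_key or token")

    query = dict(params)
    headers = {}
    if token is not None:
        query["token"] = token
    else:
        # Assumption: the key is sent as the raw header value.
        headers["Authorization"] = api_key

    host = EU_HOST if eu else US_HOST
    return f"wss://{host}/v3/ws?{urlencode(query)}", headers
```

The returned pair can be handed to whichever WebSocket client you use; `speech_model` and `sample_rate` are marked as required, so include both in `params`.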

Query parameters

speech_model (enum, required)

The speech model used for your Streaming session.

Allowed values:

encoding (enum, optional, defaults to pcm_s16le)

Encoding of the audio stream.

Allowed values:

format_turns (enum, optional, defaults to false)

Whether to return formatted final transcripts.

Allowed values:

inactivity_timeout (integer, optional, range 5-3600)

Time in seconds of inactivity after which the session is terminated. If not set, no inactivity timeout is applied.

keyterms_prompt (list of strings, optional)

A list of words and phrases to improve recognition accuracy for. See Keyterms Prompting for more details.

language_detection (enum, optional, defaults to false)

Whether to detect the language and return language metadata on utterances and final turns. Only available for the multilingual model.

Allowed values:

max_turn_silence (integer, optional, defaults to 1280)

The maximum amount of silence in milliseconds allowed in a turn before end of turn is triggered. See Turn Detection for configuration details.

min_turn_silence (integer, optional, defaults to 400)

The minimum amount of silence in milliseconds required to detect end of turn when confident. See Turn Detection for configuration details.
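One plausible reading of how the two silence settings interact with the end-of-turn confidence threshold (documented further down) is sketched here. The function `end_of_turn` and its inputs are hypothetical; the service's actual decision logic is described under Turn Detection and may differ from this simplification.

```python
def end_of_turn(silence_ms, eot_confidence,
                min_turn_silence=400,
                max_turn_silence=1280,
                end_of_turn_confidence_threshold=0.4):
    """Illustrative end-of-turn decision; not the service's implementation."""
    # Enough silence ends the turn regardless of confidence.
    if silence_ms >= max_turn_silence:
        return True
    # Otherwise a confident prediction still needs a minimum pause.
    confident = eot_confidence >= end_of_turn_confidence_threshold
    return confident and silence_ms >= min_turn_silence
```

Under this reading, lowering `min_turn_silence` makes confident turns end sooner, while raising `max_turn_silence` gives speakers longer pauses before a turn is forced to end.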

sample_rate (integer, required, defaults to 16000)

Sample rate of the audio stream.

speaker_labels (enum, optional, defaults to false)

Whether to enable Streaming Speaker Diarization. When enabled, each Turn event will include a speaker_label field indicating the speaker.

Allowed values:

max_speakers (integer, optional, range 1-10)

The maximum number of speakers expected in the audio stream (1-10). Setting this can improve speaker label accuracy when you know the number of speakers in advance. Only used when speaker_labels is enabled. See Streaming Diarization for more details.

token (string, optional)

API token for authentication (if using a temporary token).

vad_threshold (double, optional, defaults to 0.4)

The confidence threshold (0.0 to 1.0) for classifying audio frames as silence. Frames with VAD confidence below this value are considered silent. Increase for noisy environments to reduce false speech detection.
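The effect of raising the threshold can be illustrated with a few made-up per-frame VAD confidences. The helper `is_silent` and the sample values are hypothetical; only the "below the threshold means silent" rule comes from the description above.

```python
def is_silent(vad_confidence, vad_threshold=0.4):
    """A frame counts as silence when its VAD confidence is below the threshold."""
    return vad_confidence < vad_threshold


# Hypothetical per-frame VAD confidences.
frames = [0.05, 0.35, 0.45, 0.90]

default_result = [is_silent(c) for c in frames]      # threshold 0.4
noisy_result = [is_silent(c, 0.6) for c in frames]   # raised for a noisy room
```

With the higher threshold, the borderline 0.45 frame is also treated as silence, which is the intended effect in noisy environments.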

end_of_turn_confidence_threshold (double, optional, defaults to 0.4)

The confidence threshold (0.0 to 1.0) to use when determining if the end of a turn has been reached. See Turn Detection for configuration details.

Note: This parameter is only supported for the Universal-Streaming model.
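The numeric ranges documented for the parameters above can be checked on the client before opening a connection. This is a minimal sketch: `check_ranges` is a hypothetical helper, and treating both ends of each range as inclusive is an assumption.

```python
# Ranges taken from the parameter descriptions above (assumed inclusive).
RANGES = {
    "inactivity_timeout": (5, 3600),
    "max_speakers": (1, 10),
    "vad_threshold": (0.0, 1.0),
    "end_of_turn_confidence_threshold": (0.0, 1.0),
}


def check_ranges(params):
    """Return a list of human-readable range violations (empty if none)."""
    errors = []
    for name, (low, high) in RANGES.items():
        if name in params and not (low <= params[name] <= high):
            errors.append(
                f"{name} must be between {low} and {high}, got {params[name]}"
            )
    return errors
```

Parameters that are absent from `params` are skipped, so server-side defaults still apply.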

domain (enum, required)

Enable domain-specific transcription models to improve accuracy for specialized terminology. Set to "medical-v1" to enable Medical Mode for improved accuracy of medical terms such as medications, procedures, conditions, and dosages. Supported languages: English (en), Spanish (es), German (de), French (fr). If used with an unsupported language, the parameter is ignored and a warning is returned.

Allowed values:

language (enum, optional, defaults to en, deprecated)

The language of your audio stream.

Allowed values:

Send

sendAudio (string, required, format: "binary")

Send audio data chunks for transcription. The payload must be of type bytes and contain audio data between 50ms and 1000ms in length.
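The 50ms to 1000ms limit translates into a byte size once the encoding and sample rate are fixed. The sketch below assumes the default `pcm_s16le` encoding (2 bytes per sample) and mono audio; `chunk_size_bytes` and `iter_chunks` are hypothetical helper names.

```python
BYTES_PER_SAMPLE = 2  # pcm_s16le is 16-bit audio


def chunk_size_bytes(chunk_ms, sample_rate=16000, channels=1):
    """Bytes needed for chunk_ms of audio; channels=1 (mono) is an assumption."""
    if not 50 <= chunk_ms <= 1000:
        raise ValueError("chunk duration must be between 50 and 1000 ms")
    samples = sample_rate * chunk_ms // 1000
    return samples * BYTES_PER_SAMPLE * channels


def iter_chunks(pcm, chunk_ms=100, sample_rate=16000):
    """Yield successive binary payloads of chunk_ms each from raw PCM bytes."""
    size = chunk_size_bytes(chunk_ms, sample_rate)
    for start in range(0, len(pcm), size):
        yield pcm[start:start + size]
```

At 16000 Hz, a 100ms chunk is 3200 bytes. The final chunk of a buffer can be shorter than 50ms, so a real client may need to pad it or merge it into the previous chunk.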

sendUpdateConfiguration (object, required)

Update streaming configuration parameters during an active session.

sendForceEndpoint (object, required)

Manually force an endpoint in the transcription.

sendSessionTermination (object, required)

Gracefully terminate the streaming session.
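The three control messages above are JSON objects sent as text frames, unlike the binary audio payloads. The sketch below shows one way to serialize them. The `type` strings and the idea that configuration fields sit at the top level of the update message are assumptions not stated in this section; confirm them against the message schemas before relying on them.

```python
import json


def update_configuration(**settings):
    """Serialize a mid-session configuration update (field names assumed)."""
    return json.dumps({"type": "UpdateConfiguration", **settings})


def force_endpoint():
    """Serialize a request to force an endpoint (type string assumed)."""
    return json.dumps({"type": "ForceEndpoint"})


def terminate():
    """Serialize a graceful session termination (type string assumed)."""
    return json.dumps({"type": "Terminate"})
```

For example, `update_configuration(max_turn_silence=2000)` would produce a message asking for a longer silence window, assuming that parameter can be updated mid-session.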

Receive

receiveSessionBegins (object, required)

Receive confirmation that the streaming session has successfully started.

receiveTurn (object, required)

Receive a formatted turn-based transcription result.

receiveTermination (object, required)

Receive confirmation that the session has been terminated by the server.
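A client typically routes each incoming message by its kind. The dispatcher below is a minimal sketch of that pattern for the three messages above. The `type` values (`Begin`, `Turn`, `Termination`) and the `transcript` and `end_of_turn` fields are assumptions not given in this section, and `handle_message` is a hypothetical name.

```python
import json


def handle_message(raw, transcripts):
    """Route one server message; append finished turns to `transcripts`."""
    msg = json.loads(raw)
    kind = msg.get("type")  # assumed discriminator field

    if kind == "Begin":
        return "started"
    if kind == "Turn":
        # Keep only turns the server marks as ended (assumed field names).
        if msg.get("end_of_turn"):
            transcripts.append(msg.get("transcript", ""))
        return "turn"
    if kind == "Termination":
        return "terminated"
    return "ignored"
```

Returning a short status string keeps the receive loop simple: the caller can stop reading once it sees `"terminated"`.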

URL	wss://streaming.assemblyai.com/v3/ws
Method	GET
Status	101 Switching Protocols
