For AI agents: a documentation index is available at the root level at /llms.txt and /llms-full.txt. Append /llms.txt to any URL for a page-level index, or .md for the markdown version of any page.
<llms-only>
> For the complete documentation index, see [llms.txt](https://www.assemblyai.com/docs/llms.txt)
</llms-only>
Stream audio and receive real-time transcription results using the Universal-3 Pro Streaming model. The most accurate streaming model for voice agents that demand the highest quality, with best-in-class accuracy and advanced prompting capabilities.
Supports: English, Spanish, German, French, Portuguese, and Italian.
<Note> To use our EU server for Streaming STT, replace `streaming.assemblyai.com` with
`streaming.eu.assemblyai.com`. </Note>
Handshake
WSS
wss://streaming.assemblyai.com/v3/ws
Headers
AuthorizationstringOptional
Use your API key for authentication, or alternatively generate a temporary token and pass it via the token query parameter.
Query parameters
speech_modelenumRequired
The speech model to use.
Allowed values:
encodingenumOptionalDefaults to pcm_s16le
Encoding of the audio stream.
Allowed values:
inactivity_timeoutintegerOptional5-3600
Optional time in seconds of inactivity before session is terminated. If not set, no inactivity timeout is applied.
keyterms_promptlist of stringsOptional
A list of words and phrases to improve recognition accuracy for. See [Keyterms Prompting](https://www.assemblyai.com/docs/streaming/keyterms-prompting) for more details.
language_detectionenumOptionalDefaults to false
Whether to return language_code and language_confidence in turn messages. Universal-3 Pro Streaming natively code-switches between English, Spanish, German, French, Portuguese, and Italian by default without any necessary configuration.
Allowed values:
max_turn_silenceintegerOptionalDefaults to 1000
Maximum silence in milliseconds before the turn is forced to end, regardless of punctuation. See [Configuring Turn Detection](https://www.assemblyai.com/docs/streaming/universal-3-pro#configuring-turn-detection) for configuration details.
min_turn_silenceintegerOptionalDefaults to 100
Silence duration in milliseconds before a speculative end-of-turn check. If terminal punctuation is found, the turn ends. Otherwise, a partial is emitted and the turn continues. See [Configuring Turn Detection](https://www.assemblyai.com/docs/streaming/universal-3-pro#configuring-turn-detection) for configuration details.
promptstringOptionalBeta
Prompting is a beta feature. Custom transcription instructions for the model. When not provided, a default prompt optimized for native turn detection is used automatically. See the [Prompting Guide](https://www.assemblyai.com/docs/streaming/universal-3-pro/prompting) for details.
sample_rateintegerOptionalDefaults to 16000
Sample rate of the audio stream.
speaker_labelsenumOptionalDefaults to false
Whether to enable [Streaming Speaker Diarization](https://www.assemblyai.com/docs/streaming/label-speakers-and-separate-channels). When enabled, each Turn event will include a `speaker_label` field and each final word in the `words` array will include a `speaker` field for word-level speaker attribution.
Allowed values:
max_speakersintegerOptional1-10
The maximum number of speakers expected in the audio stream (1-10). Setting this can improve speaker label accuracy when you know the number of speakers in advance. Only used when `speaker_labels` is enabled. See [Streaming Diarization](https://www.assemblyai.com/docs/streaming/label-speakers-and-separate-channels) for more details.
The confidence threshold (0.0 to 1.0) for classifying audio frames as silence. Frames with VAD confidence below this value are considered silent. Increase for noisy environments to reduce false speech detection.
continuous_partialsbooleanOptionalDefaults to false
Whether to emit additional partial transcripts during long turns at a steady ~3 second cadence. When disabled (default), only one early partial is emitted near turn start. When enabled, additional partials covering the full turn transcript are emitted approximately every 3 seconds while speech continues. The first partial (at 750ms) is unaffected.
include_partial_turnsbooleanOptionalDefaults to true
Whether to emit partial transcripts during the turn. When enabled (default), partial transcripts are forwarded as speech is still in progress alongside final turns. When disabled, only final turns (with end_of_turn true) are sent. Defaults to false when redact_pii is enabled, to prevent unredacted partial transcripts from reaching the client; set explicitly to true to override.
interruption_delayintegerOptional0-1000Defaults to 500
How soon the first partial is emitted in milliseconds. Useful for tuning voice agent barge-in responsiveness or allowing earlier partials for early LLM inference. Larger values are more confident on interruptions, smaller values result in faster time to first partial.
domainenumOptional
Enable domain-specific transcription models to improve accuracy for specialized terminology. Set to `"medical-v1"` to enable [Medical Mode](https://www.assemblyai.com/docs/streaming/medical-mode) for improved accuracy of medical terms such as medications, procedures, conditions, and dosages. Supported languages: English (`en`), Spanish (`es`), German (`de`), French (`fr`). If used with an unsupported language, the parameter is ignored and a warning is returned.
Allowed values:
filter_profanityenumOptionalDefaults to false
Filter profanity from the transcribed text, can be true or false. See [Profanity Filtering](https://www.assemblyai.com/docs/streaming/filter-profanity-from-transcripts) for more details.
Allowed values:
redact_piienumOptionalDefaults to false
Redact PII from the transcribed text using the Redact PII model, can be true or false. Only applies to final turns. See [PII Redaction](https://www.assemblyai.com/docs/streaming/pii-redaction) for more details.
Allowed values:
redact_pii_policieslist of enumsOptional
The list of PII Redaction policies to enable. Requires `redact_pii` to be `true`. See [PII redaction](https://www.assemblyai.com/docs/streaming/pii-redaction) for more details.
redact_pii_subenumOptional
The replacement logic for detected PII, can be `entity_name` or `hash`. Requires `redact_pii` to be `true`. See [PII redaction](https://www.assemblyai.com/docs/streaming/pii-redaction) for more details.
Allowed values:
llm_gatewaystringRequired
JSON-stringified LLM Gateway configuration that processes each finalized turn. Follows the same interface as the [Chat Completions](/docs/llm-gateway/chat-completions) endpoint and accepts `model`, `messages`, `tools`, `tool_choice`, `post_processing_steps`, and `max_tokens`. See [Apply LLM Gateway to Streaming](https://www.assemblyai.com/docs/llm-gateway/apply-llm-gateway-to-streaming) for the full schema and examples.
Send
sendAudiostringRequiredformat: "binary"
Send audio data chunks for transcription. The payload must be of type bytes and contain audio data between 50ms and 1000ms in length. When streaming from a pre-recorded file, pace the chunks at approximately real-time (for example, sleep for the chunk's duration between sends) — sending chunks in a tight loop can produce inconsistent Turn messages. See the [Universal-3 Pro Streaming quickstart](https://www.assemblyai.com/docs/streaming/universal-3-pro) to get started.
OR
sendUpdateConfigurationobjectRequired
Update streaming configuration parameters during an active session. You can update prompt, keyterms_prompt, min_turn_silence, max_turn_silence, continuous_partials, vad_threshold, and interruption_delay.
OR
sendForceEndpointobjectRequired
Manually force an endpoint in the transcription.
OR
sendSessionTerminationobjectRequired
Gracefully terminate the streaming session.
OR
sendKeepAliveobjectRequired
Send a keep-alive message to reset the `inactivity_timeout` timer. This is not necessary by default — sessions remain open until explicitly terminated or until the 3-hour maximum session duration is reached. This message is only needed if you have set `inactivity_timeout` and want to keep the session open during periods where no audio is being sent.
Receive
receiveSessionBeginsobjectRequired
Receive confirmation that the streaming session has successfully started.
OR
receiveSpeechStartedobjectRequired
Receive a notification that speech has been detected. This event is only emitted when the model produces a transcript. Every SpeechStarted is guaranteed to be followed by one or more Turn messages.
OR
receiveTurnobjectRequired
Receive a formatted turn-based transcription result.
OR
receiveTerminationobjectRequired
Receive confirmation that the session has been terminated by the server.
OR
receiveLLMGatewayResponseobjectRequired
Receive an LLM Gateway response for a finalized turn. Emitted once per turn when llm_gateway is configured on the connection.
Stream audio and receive real-time transcription results using the Universal-3 Pro Streaming model. The most accurate streaming model for voice agents that demand the highest quality, with best-in-class accuracy and advanced prompting capabilities.
Supports: English, Spanish, German, French, Portuguese, and Italian.
To use our EU server for Streaming STT, replace streaming.assemblyai.com with
streaming.eu.assemblyai.com.
A list of words and phrases to improve recognition accuracy for. See Keyterms Prompting for more details.
Maximum silence in milliseconds before the turn is forced to end, regardless of punctuation. See Configuring Turn Detection for configuration details.
Silence duration in milliseconds before a speculative end-of-turn check. If terminal punctuation is found, the turn ends. Otherwise, a partial is emitted and the turn continues. See Configuring Turn Detection for configuration details.
Prompting is a beta feature. Custom transcription instructions for the model. When not provided, a default prompt optimized for native turn detection is used automatically. See the Prompting Guide for details.
Whether to enable Streaming Speaker Diarization. When enabled, each Turn event will include a speaker_label field and each final word in the words array will include a speaker field for word-level speaker attribution.
The maximum number of speakers expected in the audio stream (1-10). Setting this can improve speaker label accuracy when you know the number of speakers in advance. Only used when speaker_labels is enabled. See Streaming Diarization for more details.
Whether to emit partial transcripts during the turn. When enabled (default), partial transcripts are forwarded as speech is still in progress alongside final turns. When disabled, only final turns (with end_of_turn true) are sent. Defaults to false when redact_pii is enabled, to prevent unredacted partial transcripts from reaching the client; set explicitly to true to override.
Enable domain-specific transcription models to improve accuracy for specialized terminology. Set to "medical-v1" to enable Medical Mode for improved accuracy of medical terms such as medications, procedures, conditions, and dosages. Supported languages: English (en), Spanish (es), German (de), French (fr). If used with an unsupported language, the parameter is ignored and a warning is returned.
Filter profanity from the transcribed text, can be true or false. See Profanity Filtering for more details.
Redact PII from the transcribed text using the Redact PII model, can be true or false. Only applies to final turns. See PII Redaction for more details.
The list of PII Redaction policies to enable. Requires redact_pii to be true. See PII redaction for more details.
The replacement logic for detected PII, can be entity_name or hash. Requires redact_pii to be true. See PII redaction for more details.
JSON-stringified LLM Gateway configuration that processes each finalized turn. Follows the same interface as the Chat Completions endpoint and accepts model, messages, tools, tool_choice, post_processing_steps, and max_tokens. See Apply LLM Gateway to Streaming for the full schema and examples.
Send audio data chunks for transcription. The payload must be of type bytes and contain audio data between 50ms and 1000ms in length. When streaming from a pre-recorded file, pace the chunks at approximately real-time (for example, sleep for the chunk’s duration between sends) — sending chunks in a tight loop can produce inconsistent Turn messages. See the Universal-3 Pro Streaming quickstart to get started.
Send a keep-alive message to reset the inactivity_timeout timer. This is not necessary by default — sessions remain open until explicitly terminated or until the 3-hour maximum session duration is reached. This message is only needed if you have set inactivity_timeout and want to keep the session open during periods where no audio is being sent.