Universal-3 Pro Streaming uses context carryover to improve transcription accuracy. The model automatically carries prior finalized transcripts forward as context. You can also pass your voice agent’s spoken reply via the agent_context parameter, either as a connection-time query parameter (to seed the model with your agent’s opening greeting) or mid-stream via UpdateConfiguration after each agent reply. Context helps the model disambiguate similar-sounding words and improves entity recognition and consistency. For example, after your agent asks "What's your email address?", a model without agent_context might transcribe the reply as "user at assemblyai dot com". With agent_context, the model knows an email is coming and produces "user@assemblyai.com".
Transcription context carryover is on by default
No configuration is required to use prior transcription context. To layer in your voice agent’s spoken replies, set agent_context at connection time and/or update it mid-stream via UpdateConfiguration (see Passing your agent’s reply as context).

How it works

During a streaming session, Universal-3 Pro Streaming keeps a short memory of recent finalized turns and uses them as additional context when transcribing the next turn. This means:
  • Context is per-session. Closing the WebSocket clears the context — a new session starts fresh.
  • Only agent_context values and finalized turns (end_of_turn: true) are carried forward, not partials.

Defaults

Behavior | Default
Context carryover | Enabled
Number of prior entries carried | 3
Maximum context size | ~1500 characters
Older entries are dropped first as new ones come in, so the most recent conversation is always preserved.
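
The drop-oldest-first behavior described above can be pictured with a small sliding window. This is only a conceptual sketch, not the service's implementation: the carryover happens server-side, the ContextWindow class is hypothetical, and dropping whole entries when the character budget is exceeded is an assumption.

```python
from collections import deque

MAX_ENTRIES = 3    # number of prior entries carried (default listed above)
MAX_CHARS = 1500   # approximate maximum context size (default listed above)


class ContextWindow:
    """Toy model of carryover: keeps the newest entries within both limits."""

    def __init__(self, max_entries=MAX_ENTRIES, max_chars=MAX_CHARS):
        # deque(maxlen=...) discards the oldest entry automatically on append.
        self.entries = deque(maxlen=max_entries)
        self.max_chars = max_chars

    def add(self, text):
        self.entries.append(text)
        # Assumption: when over the character budget, whole old entries go first.
        while len(self.entries) > 1 and sum(len(e) for e in self.entries) > self.max_chars:
            self.entries.popleft()

    def snapshot(self):
        return list(self.entries)


window = ContextWindow()
for turn in ["one", "two", "three", "four"]:
    window.add(turn)
print(window.snapshot())  # ['two', 'three', 'four']
```

With the default of three entries, adding a fourth pushes the first one out, so the most recent part of the conversation is what remains.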

Passing your agent’s reply as context

Universal-3 Pro Streaming automatically carries prior STT-finalized turns (what the user said) back into the model — no configuration required. You can also pass your voice agent’s spoken reply (what your TTS just said) via the agent_context parameter. There are two ways to set it:
  • At connection time — pass agent_context as a query parameter on the WebSocket URL. Use this to seed the model with your agent’s opening greeting before the user has said anything.
  • Mid-stream — send an UpdateConfiguration message with the agent_context field after each subsequent agent reply.
Both forms let the model know the question the user is about to answer, which is especially important for short replies ("yes", "7pm", "that's all").

Setting an opening greeting at connection time

When you open the WebSocket, pass agent_context alongside your other connection parameters. The first user turn will be transcribed with the greeting already in the model’s context.
from urllib.parse import urlencode

params = {
    "agent_context": "Welcome to the Krusty Krab, home of the Krabby Patty, may I take your order?",
    "sample_rate": 16000,
    "speech_model": "u3-rt-pro",
}
url = f"wss://streaming.assemblyai.com/v3/ws?{urlencode(params)}"

Updating agent context mid-stream

A typical voice agent loop looks like this:
  1. User speaks → Universal-3 Pro Streaming emits a final turn.
  2. Your agent runs an LLM step and generates a reply.
  3. Your TTS speaks the reply to the user.
  4. User responds → next turn.
During step 3, send the agent’s reply text to the streaming session so the model knows which question the user will be answering in the next turn.
import json

# ws is the already-open WebSocket connection for this streaming session.
ws.send(json.dumps({
    "type": "UpdateConfiguration",
    "agent_context": "Sure, what date would you like to book?",
}))
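
The four-step loop above could be wired together roughly as follows. This is a sketch under assumptions: the on_message handler, the generate_reply and speak callbacks, and the FakeSocket stand-in are all hypothetical, and the "transcript" field name on the server message is assumed rather than taken from this page. Only end_of_turn, UpdateConfiguration, and agent_context come from the text.

```python
import json


def on_message(ws, raw, generate_reply, speak):
    """Handle one server message; return the agent reply, or None for partials."""
    msg = json.loads(raw)
    # Step 1: only finalized turns trigger a reply; partials are ignored.
    if not msg.get("end_of_turn"):
        return None
    # Step 2: run the LLM step ("transcript" is an assumed field name).
    reply = generate_reply(msg.get("transcript", ""))
    # Tell the model what the agent is about to say, then speak it (step 3).
    ws.send(json.dumps({"type": "UpdateConfiguration", "agent_context": reply}))
    speak(reply)
    return reply


class FakeSocket:
    """Stand-in for a real WebSocket client; records what would be sent."""

    def __init__(self):
        self.sent = []

    def send(self, data):
        self.sent.append(data)


ws = FakeSocket()
spoken = []
on_message(
    ws,
    json.dumps({"end_of_turn": True, "transcript": "I'd like to book a table"}),
    generate_reply=lambda text: "Sure, what date would you like to book?",
    speak=spoken.append,
)
```

Sending the update just before TTS playback starts means the context is already in place by the time the user answers (step 4).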

Limits

  • Universal-3 Pro only. agent_context is supported only when speech_model is "u3-rt-pro". If you set it at connection time on any other model, the session is rejected; if you send it mid-stream on another model, it’s stripped with a warning.
  • Per-value cap: ~1500 characters. Trim long agent replies down to the substantive question before sending.
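
One way to respect both limits on the client side is to guard on the model name and trim the reply before sending. The helpers below are hypothetical, and keeping the tail of the reply is an assumption (agent replies often end with the question); a real agent might instead extract the question explicitly.

```python
import json

AGENT_CONTEXT_MODEL = "u3-rt-pro"
MAX_AGENT_CONTEXT_CHARS = 1500  # approximate per-value cap from the limits above


def trim_agent_context(reply, limit=MAX_AGENT_CONTEXT_CHARS):
    """Keep the tail of the reply, where the question usually sits (assumption)."""
    reply = reply.strip()
    if len(reply) <= limit:
        return reply
    cut = len(reply) - limit
    tail = reply[cut:]
    # If the cut landed mid-word, drop the leading partial word.
    if not reply[cut - 1].isspace():
        tail = tail.split(" ", 1)[-1]
    return tail.strip()


def update_message(reply, speech_model):
    """Build an UpdateConfiguration payload, or None if the model lacks support."""
    if speech_model != AGENT_CONTEXT_MODEL:
        return None
    return json.dumps({
        "type": "UpdateConfiguration",
        "agent_context": trim_agent_context(reply),
    })


long_reply = "Thanks for calling. " * 100 + "What date would you like to book?"
trimmed = trim_agent_context(long_reply)
```

Returning None for unsupported models lets the caller skip the send entirely instead of relying on the server to strip the field.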

When context carryover helps most

Context carryover has the largest impact on:
  • Voice agents — short user responses to agent questions ("yes", "no", "that's all", dates, times, single names).
  • Spelled-out entities — emails, account IDs, addresses, and similar inputs read aloud after the agent has just asked for them. Setting agent_context to the agent’s prompt (e.g. "What's your email address?") primes the model for what’s coming.
  • Disambiguation — words that sound similar but only one fits the conversation ("fleas" vs "please", "to" vs "two" vs "too").
  • Entity recall — names, products, or terms that were established earlier in the conversation.
It has less impact on long, self-contained turns where the audio already provides enough context on its own.

Interactions with other parameters

  • prompt — Context carryover is layered on top of the default prompt and any custom prompt you provide. You don’t need to manage it yourself.
  • keyterms_prompt — You can use keyterms_prompt alongside context carryover. If you provide a prompt, we recommend dropping keyterms_prompt for that turn and folding domain terms into your prompt instead.
  • Multilingual sessions — Carrying prior turns biases the model toward the languages already seen in the conversation. For sessions that mix three or more languages, this can occasionally push the model toward translating rather than transcribing. If you see drift, set a single transcription language in your prompt (see Specifying the transcription language).

FAQ

You don’t need to enable context carryover. It’s on by default for every Universal-3 Pro Streaming session. Just keep using the same WebSocket connection across the conversation.
Context carryover does not add to your bill. Streaming is billed on WebSocket session duration, not on the size of the prompt or the carried context.
Context does not persist across sessions. It is scoped to a single WebSocket session, so if you reconnect, the new session starts with no prior context.
You can pass your agent’s reply as context. Set agent_context as a connection-time query parameter to seed the agent’s opening greeting, and/or send it via UpdateConfiguration mid-stream after each subsequent agent reply. The model uses it as context for the next user turn. See Passing your agent’s reply as context.
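
Because a reconnect starts with no prior context, one option is to reuse the connection-time agent_context parameter to re-seed the new session with the agent's most recent reply. This is a sketch of that idea rather than a documented recommendation: the build_url helper is hypothetical, and it restores only the last agent reply, not earlier user turns.

```python
from urllib.parse import urlencode

BASE_URL = "wss://streaming.assemblyai.com/v3/ws"


def build_url(last_agent_reply=None, sample_rate=16000, speech_model="u3-rt-pro"):
    """Build the WebSocket URL, seeding agent_context when a reply is known."""
    params = {"sample_rate": sample_rate, "speech_model": speech_model}
    if last_agent_reply:
        # Same parameter used for the opening greeting, reused after a reconnect.
        params["agent_context"] = last_agent_reply
    return f"{BASE_URL}?{urlencode(params)}"


first_url = build_url()
reconnect_url = build_url(last_agent_reply="Sure, what date would you like to book?")
```

The first connection carries no agent_context; the reconnect carries the last thing the agent said, so a short answer such as "Friday" still has its question as context.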