
# Universal 3.5 Pro Realtime on LiveKit

export const ModelBadges = ({models}) => {
  return <div className="flex flex-wrap gap-2 -mt-3 mb-3 not-prose">
      {models.map(model => <span key={model} className="inline-flex items-center rounded-full bg-green-500/15 px-2.5 py-0.5 text-xs font-mono text-green-700 dark:text-green-400 ring-1 ring-inset ring-green-500/30">
          {model}
        </span>)}
    </div>;
};

## Overview

<ModelBadges models={["universal-3-5-pro", "u3-rt-pro"]} />

This guide covers integrating AssemblyAI's **Universal 3.5 Pro Realtime** speech-to-text model into a LiveKit voice agent using the [Agents framework](https://docs.livekit.io/agents/). Everything here applies equally to **Universal-3 Pro Streaming** (`u3-rt-pro`) — both belong to the same U3 Pro family and share every parameter in this guide, so you can swap the `model` string without changing anything else.

<Note>
  **Universal 3.5 Pro Realtime is our flagship next-generation streaming model for voice agents** — multilingual and promptable, with [conversation context](#conversation-context) and [voice focus](#voice-focus).

  Available on **`livekit-agents` 1.6+** — set `model="universal-3-5-pro"`.
</Note>

AssemblyAI provides the speech-to-text and turn detection in your LiveKit pipeline:

```mermaid theme={null}
flowchart LR
  U["User audio"] --> STT["AssemblyAI STT<br/>Universal 3.5 Pro Realtime"]
  STT --> TD["Turn detection"]
  TD --> LLM["LLM"]
  LLM --> TTS["TTS"]
  TTS --> U
```

Once you have an agent running, tune it for what matters most to your use case:

<CardGroup cols={2}>
  <Card title="Turn detection" icon="comments" href="#turn-detection">
    Decide when the user is done speaking — modes, defaults, and entity tuning.
  </Card>

  <Card title="Latency" icon="gauge-high" href="#latency">
    Shorten the gap between the user finishing and the agent replying.
  </Card>

  <Card title="Accuracy" icon="bullseye" href="#accuracy">
    Prompting, key terms, conversation context, and noise handling.
  </Card>

  <Card title="Interruptions" icon="hand" href="#interruption-handling">
    Natural barge-in without false triggers from backchannels.
  </Card>
</CardGroup>

## Quickstart

Get a working, talking agent in a few minutes, then optimize from there.

<Steps>
  <Step title="Install the SDK">
    Install the plugin and supporting packages (silero, codecs, dotenv) from PyPI:

    ```bash theme={null}
    pip install "livekit-agents[assemblyai,silero,codecs]~=1.6" \
        python-dotenv
    ```

    If you plan to use LiveKit turn detection with `MultilingualModel()`, also install the turn detector plugin:

    ```bash theme={null}
    pip install "livekit-plugins-turn-detector~=1.0"
    ```

    <Warning>
      Install the latest `livekit-agents` from PyPI. **Universal 3.5 Pro Realtime**, `agent_context`, and **Voice Focus** require a recent **1.6** release; the `u3-rt-pro` model has been supported since [livekit-agents@1.4.4](https://github.com/livekit/agents/releases/tag/livekit-agents%401.4.4). Older versions won't recognize the `universal-3-5-pro` model and return a validation error.
    </Warning>
  </Step>

  <Step title="Set your API keys">
    Set your API keys in a `.env` file:

    ```env theme={null}
    LIVEKIT_URL=wss://your-project.livekit.cloud
    LIVEKIT_API_KEY=your_livekit_api_key
    LIVEKIT_API_SECRET=your_livekit_api_secret
    ASSEMBLYAI_API_KEY=your_assemblyai_key
    # Add API keys for your chosen LLM and TTS providers
    ```

    <Tip>
      You can obtain an AssemblyAI API key by signing up
      [here](https://www.assemblyai.com/dashboard/signup) and navigating to the [API
      Keys tab](https://www.assemblyai.com/dashboard/home) of the dashboard.
    </Tip>
  </Step>

  <Step title="Build a minimal agent">
    The following example uses `turn_detection="stt"` (recommended). The inline comments show what to change if you use `MultilingualModel()` instead.

    ```python expandable theme={null}
    from dotenv import load_dotenv
    from livekit import agents
    from livekit.agents import AgentSession, Agent, TurnHandlingOptions
    from livekit.plugins import (
        assemblyai,
        silero,
    )
    # For MultilingualModel, uncomment the following:
    # from livekit.plugins.turn_detector.multilingual import MultilingualModel

    load_dotenv()


    class Assistant(Agent):
        def __init__(self) -> None:
            super().__init__(instructions="You are a helpful voice AI assistant.")


    async def entrypoint(ctx: agents.JobContext):
        await ctx.connect()

        session = AgentSession(
            stt=assemblyai.STT(
                model="universal-3-5-pro",
                min_turn_silence=100,
                max_turn_silence=1000,  # When turn_detection="stt", override plugin default of 100.
                # If using MultilingualModel(), the plugin defaults (min: 100, max: 100) work well. Omit min_turn_silence and max_turn_silence above if preferred.
                vad_threshold=0.3,  # Match Silero's activation_threshold
                # continuous_partials is True by default in the LiveKit plugin — steady ~3s partials during long turns.
                # interruption_delay=0,  # Optional: faster first partial (~300ms effective). Default: 500 (~800ms effective).
            ),
            # llm=your_llm_plugin(),  # Add your LLM provider here
            # tts=your_tts_plugin(),  # Add your TTS provider here
            vad=silero.VAD.load(
                activation_threshold=0.3,  # Match AssemblyAI's internal VAD threshold
            ),
            turn_handling=TurnHandlingOptions(
                turn_detection="stt",
                # To use LiveKit's turn detection instead, replace the line above with:
                # turn_detection=MultilingualModel(),
                endpointing={"min_delay": 0},  # Avoid additive delay in STT mode
                # If using MultilingualModel(), set these instead:
                # endpointing={"min_delay": 0.5, "max_delay": 3.0},
            ),
        )

        await session.start(
            room=ctx.room,
            agent=Assistant(),
        )

        await session.generate_reply(
            instructions="Greet the user and offer your assistance."
        )


    if __name__ == "__main__":
        agents.cli.run_app(agents.WorkerOptions(entrypoint_fnc=entrypoint))
    ```

    For a complete voice agent, you will also need to install LLM and TTS plugins for your chosen providers. See the [LiveKit plugins documentation](https://docs.livekit.io/agents/plugins/) for available options.
  </Step>

  <Step title="Run and test">
    Start your agent in development mode:

    ```bash theme={null}
    python your_agent_file.py dev
    ```

    Then test it in the LiveKit Playground:

    1. Go to [agents-playground.livekit.io](https://agents-playground.livekit.io)
    2. Connect to your LiveKit Cloud project (same credentials as your `.env`)
    3. Click **Connect**: a room will be created, your agent will join, and you can start talking
  </Step>
</Steps>

## Parameters reference

### **Universal 3.5 Pro Realtime** parameters

These are the key parameters to tune for LiveKit when using **Universal 3.5 Pro Realtime**:

<ParamField path="model" type="string" default="universal-3-5-pro">
  The streaming model. Defaults to `"universal-3-5-pro"`, our recommended
  flagship model; also accepts `"u3-rt-pro"`. Both belong to the U3 Pro family
  and share every parameter below.
</ParamField>

<ParamField path="mode" type="string">
  Accuracy/latency preset: `"min_latency"`, `"balanced"`, or `"max_accuracy"`. Sets
  the defaults for mode-dependent fields (`min_turn_silence`, `vad_threshold`,
  `interruption_delay`, `continuous_partials`, `previous_context_n_turns`); any value
  you set explicitly still takes precedence. Leave unset for the server's default
  preset. Connect-time only — cannot be changed with `update_options`. U3 Pro family
  only. See [Optimizing accuracy and latency](/streaming/getting-started/optimizing-accuracy-and-latency).
</ParamField>

<ParamField path="keyterms_prompt" type="list of strings">
  List of terms to boost recognition for. Applied automatically alongside any
  contextual prompt.
</ParamField>

<ParamField path="prompt" type="string">
  Contextual prompt — a natural-language description of what the audio is
  about (domain, scenario, or full details). Transcription behavior is built
  in and optimized automatically.
</ParamField>

<ParamField path="agent_context" type="string">
  Your agent's most recent spoken reply, up to \~1500 characters, used as context
  for transcribing the next user turn. Can be set at construction time and
  updated mid-stream with `update_options`. U3 Pro family only. See
  [Conversation context](#conversation-context).
</ParamField>

<ParamField path="previous_context_n_turns" type="integer">
  How many prior conversation entries are carried forward automatically. Range
  `0`–`100`; `0` disables carryover. Leave unset for the server default (3).
  Connect-time only — cannot be changed with `update_options`. U3 Pro family
  only.
</ParamField>

<ParamField path="min_turn_silence" type="integer" default="100">
  Milliseconds of silence before a speculative end-of-turn check. When the check
  fires, the model looks for terminal punctuation to decide whether the turn has
  ended.
</ParamField>

<ParamField path="max_turn_silence" type="integer" default="100">
  Maximum milliseconds of silence before the turn is forced to end, regardless
  of punctuation. **The LiveKit plugin defaults to `100`.** Set to `1000` when
  using `turn_detection="stt"`.
</ParamField>

<ParamField path="vad_threshold" type="float" default="0.3">
  AssemblyAI's internal Silero VAD threshold. **Universal 3.5 Pro Realtime**
  defaults to `0.3`, unlike Universal-Streaming's `0.4`. Align with LiveKit's
  Silero `activation_threshold` for consistent behavior.
</ParamField>

<ParamField path="voice_focus" type="string">
  Server-side noise suppression that isolates the primary speaker. `"near-field"`
  for close-talking mics, `"far-field"` for distant capture. Connect-time only.
  U3 Pro family only. See [Voice focus](#voice-focus).
</ParamField>

<ParamField path="voice_focus_threshold" type="float">
  How aggressively `voice_focus` suppresses background audio. `0.0`–`1.0`; higher
  is more aggressive. Connect-time only. U3 Pro family only.
</ParamField>

<ParamField path="continuous_partials" type="boolean" default="true">
  Whether to emit additional partial transcripts during long turns at a steady
  \~3 second cadence. When enabled (default on both the API and the LiveKit
  plugin), additional partials covering the full turn transcript are emitted
  approximately every 3 seconds while speech continues. When disabled, only one
  early partial is emitted near turn start. The first partial, whose timing is
  set by `interruption_delay`, is unaffected either way. Useful when downstream
  consumers (LLMs, UI, eager inference) need
  frequent updates during long, uninterrupted turns. See
  [Continuous partials](/streaming/getting-started/transcribe-streaming-audio)
  for details.
</ParamField>

<ParamField path="interruption_delay" type="integer" default="500">
  How soon the first partial transcript is emitted during a turn, in
  milliseconds. Range: `0`–`1000`. Lower values produce faster time to first
  token (TTFT) for barge-in and speculative inference; higher values produce
  more confident first partials. The server adds a minimum of 300ms on top of
  the configured value (`interruption_delay: 0` → \~300ms effective,
  `interruption_delay: 500` → \~800ms effective). See
  [Tuning early partial timing](/streaming/getting-started/transcribe-streaming-audio)
  for details.
</ParamField>

<ParamField path="language_detection" type="boolean" default="true">
  **Universal 3.5 Pro Realtime** code-switches natively between supported
  languages. This parameter controls whether `language_code` and
  `language_confidence` are included in turn messages. Defaults to `true` in the
  LiveKit plugin, but `false` when using the API directly.
</ParamField>

### General STT parameters

These parameters apply to all AssemblyAI streaming models and can remain the same between models:

<ParamField path="sample_rate" type="int" default="16000">
  The sample rate of the audio stream.
</ParamField>

<ParamField path="encoding" type="str" default="pcm_s16le">
  The encoding of the audio stream. Allowed values: `pcm_s16le`, `pcm_mulaw`.
</ParamField>

### Legacy parameters

These parameters apply to the `universal-streaming-english` and `universal-streaming-multilingual` AssemblyAI streaming models, but **do not affect Universal 3.5 Pro Realtime**:

<ParamField path="end_of_turn_confidence_threshold" type="float" default="0.4">
  Confidence threshold for end-of-turn detection. **Universal 3.5 Pro Realtime**
  uses punctuation-based turn detection instead.
</ParamField>

<ParamField path="format_turns" type="boolean" default="false">
  Whether to return formatted final transcripts. **Universal 3.5 Pro Realtime**
  always returns formatted transcripts, so this parameter no longer applies.
</ParamField>

## Turn detection

In LiveKit, how your agent detects the end of a user's turn is controlled by the [`turn_detection`](https://docs.livekit.io/reference/agents/turn-handling-options/#turn_detection) parameter inside [`TurnHandlingOptions`](https://docs.livekit.io/reference/agents/turn-handling-options/), which is passed to `AgentSession` via the `turn_handling` argument.

**Universal 3.5 Pro Realtime** uses a **punctuation-based turn detection system**, which checks for terminal punctuation (`.` `?` `!`) after periods of silence rather than using a confidence score.

This means the `min_turn_silence` and `max_turn_silence` parameters you pass to AssemblyAI directly control when transcripts are emitted and when turns end. For more details on how this works, see [Configuring turn detection](/streaming/getting-started/transcribe-streaming-audio).

<Warning>
  When not explicitly provided, the default endpointing parameters for **Universal 3.5 Pro Realtime** differ on **LiveKit** versus using **AssemblyAI's API directly**:

  * LiveKit AssemblyAI plugin defaults:
    * `min_turn_silence=100`
    * `max_turn_silence=100`
  * AssemblyAI API defaults:
    * `min_turn_silence=100`
    * `max_turn_silence=1000`

  However, you can always override these by passing your own preferred values explicitly. Misconfiguring these parameters is the most common cause of poor performance — read the recommended values per mode below.
</Warning>

### Default parameter differences

**Universal 3.5 Pro Realtime's** endpointing is controlled by two AssemblyAI API parameters, `min_turn_silence` and `max_turn_silence`, that you pass to the STT plugin. These are separate from LiveKit's `endpointing.min_delay` and `endpointing.max_delay` (set inside `TurnHandlingOptions`).

| Parameter             | AssemblyAI API default | LiveKit plugin default | Description                                                                                                                                                                  |
| --------------------- | ---------------------- | ---------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `min_turn_silence`    | `100` ms               | `100` ms               | Silence before a speculative end-of-turn check. If terminal punctuation (`.` `?` `!`) is found, the turn ends. If not, a partial is emitted and the turn continues.          |
| `max_turn_silence`    | **`1000`** ms          | **`100`** ms           | Maximum silence before forcing the turn to end, regardless of punctuation.                                                                                                   |
| `continuous_partials` | **`true`**             | **`true`**             | Emit partial transcripts every \~3 seconds during long turns for steady mid-turn updates to downstream consumers. Enabled by default on both the API and the LiveKit plugin. |

The LiveKit plugin defaults are optimized for third-party turn detection models, where you want transcripts handed off as fast as possible. When using `turn_detection="stt"`, you should explicitly set `max_turn_silence=1000` if you'd like to mimic the behavior of streaming directly to the API without LiveKit.
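The per-mode starting values described in this guide can be collected in one place. The helper below is a minimal sketch, not part of the plugin: `recommended_settings` and the mode labels `"stt"` and `"multilingual"` are hypothetical names, and the numbers are simply the starting points this guide suggests.

```python
def recommended_settings(turn_detection: str) -> dict:
    """Return this guide's suggested starting values for a turn detection mode.

    `turn_detection` is "stt" or "multilingual" (hypothetical labels for this sketch).
    """
    if turn_detection == "stt":
        return {
            # Passed to assemblyai.STT(...): mirror the AssemblyAI API defaults.
            "stt": {"min_turn_silence": 100, "max_turn_silence": 1000},
            # Passed to TurnHandlingOptions(endpointing=...): no additive delay.
            "endpointing": {"min_delay": 0},
        }
    if turn_detection == "multilingual":
        return {
            # Plugin defaults: hand transcripts to the turn detector quickly.
            "stt": {"min_turn_silence": 100, "max_turn_silence": 100},
            "endpointing": {"min_delay": 0.5, "max_delay": 3.0},
        }
    raise ValueError(f"unsupported mode: {turn_detection!r}")


settings = recommended_settings("stt")
print(settings["stt"])  # {'min_turn_silence': 100, 'max_turn_silence': 1000}
```

In an agent, the `"stt"` entry would be unpacked into `assemblyai.STT(model="universal-3-5-pro", **settings["stt"])` and the `"endpointing"` entry passed as `endpointing=` inside `TurnHandlingOptions`.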

<Note>
  **Tuning endpointing parameters**

  **These are the default values used when no parameters are explicitly provided.** You will likely need to experiment with different values depending on your use case:

  * **Increase `min_turn_silence`**: when brief pauses cause the speculative EOT check to fire too early, ending turns on terminal punctuation before the user has finished speaking.
  * **Increase `max_turn_silence`**: when the forced turn end is cutting off users mid-thought or splitting entities like phone numbers across turns. A higher value lets the model wait longer for terminal punctuation before the turn is forced to end.

  See the [Entity splitting tradeoff](#entity-splitting-tradeoff) section for examples.
</Note>

### STT-based turn detection (recommended)

With `turn_detection="stt"`, AssemblyAI's built-in punctuation-based turn detection determines when the user has finished speaking. AssemblyAI's `end_of_turn` signals are then used directly by LiveKit to commit the turn.

In this mode, we recommend explicitly setting `min_turn_silence=100` and `max_turn_silence=1000`. These are AssemblyAI's API defaults and provide a good balance of responsiveness and accuracy.

**The LiveKit plugin defaults to `min_turn_silence=100` and `max_turn_silence=100`, which might be too aggressive for STT-based turn detection.**

**Recommended starting parameters** (set on `assemblyai.STT()`, not on `AgentSession`):

| Parameter          | Recommended | Description                                                          |
| ------------------ | ----------- | -------------------------------------------------------------------- |
| `min_turn_silence` | `100` ms    | Silence duration before a speculative end-of-turn (EOT) check fires. |
| `max_turn_silence` | `1000` ms   | Maximum silence before a turn is forced to end.                      |

**How it works:**

1. User speaks → audio streams to AssemblyAI
2. User pauses for `100ms` → AssemblyAI checks for terminal punctuation
3. If terminal punctuation (`.` `?` `!`) → turn ends immediately
4. If no terminal punctuation → partial emitted, turn continues waiting
5. If silence reaches `1000ms` → turn is forced to end regardless of punctuation
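The decision in the steps above can be sketched as a pure function. This is a simplified model for building intuition, not AssemblyAI's implementation: `end_of_turn_decision` and its return labels are hypothetical, and the real service evaluates streaming audio rather than a finished string.

```python
TERMINAL_PUNCTUATION = (".", "?", "!")


def end_of_turn_decision(
    transcript: str,
    silence_ms: int,
    min_turn_silence: int = 100,
    max_turn_silence: int = 1000,
) -> str:
    """Return "end", "partial", or "wait" for the current silence duration."""
    if silence_ms >= max_turn_silence:
        return "end"  # forced end, regardless of punctuation
    if silence_ms >= min_turn_silence:
        if transcript.rstrip().endswith(TERMINAL_PUNCTUATION):
            return "end"  # speculative check found terminal punctuation
        return "partial"  # no terminal punctuation: emit a partial, keep waiting
    return "wait"  # not enough silence yet for the speculative check


print(end_of_turn_decision("What time is it?", 120))                     # end
print(end_of_turn_decision("I wanted to check on my order from", 120))   # partial
print(end_of_turn_decision("I wanted to check on my order from", 1000))  # end
```

With the plugin defaults (`min_turn_silence=100`, `max_turn_silence=100`), the first branch always wins once silence reaches 100 ms, which is why those defaults suit external turn detectors better than STT mode.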

<Warning>
  **Endpointing `min_delay` is additive in STT mode**

  LiveKit's endpointing `min_delay` (default **0.5 seconds**) is applied **on top of** AssemblyAI's own endpointing. In STT mode, this delay starts *after* the STT end-of-speech signal, meaning it adds up to `500ms` of extra latency by default.

  **Set `endpointing={"min_delay": 0}`** inside `TurnHandlingOptions` to avoid this. AssemblyAI's own endpointing parameters (`min_turn_silence` and `max_turn_silence`) already control the timing, so an additional delay on the LiveKit side is unnecessary latency. See [Latency](#latency) for the full picture.
</Warning>

```python theme={null}
from livekit.agents import AgentSession, TurnHandlingOptions
from livekit.plugins import assemblyai, silero

session = AgentSession(
    turn_handling=TurnHandlingOptions(
        turn_detection="stt",
        endpointing={"min_delay": 0},  # Avoid additive delay in STT mode
    ),
    stt=assemblyai.STT(
        model="universal-3-5-pro",
        min_turn_silence=100,   # Silence (ms) before a speculative end-of-turn check
        max_turn_silence=1000,  # Max silence (ms) before forcing the turn to end
        vad_threshold=0.3,
    ),
    vad=silero.VAD.load(
        activation_threshold=0.3,
    ),
)
```

### LiveKit turn detection (with `MultilingualModel()`)

As a third-party turn detection model, LiveKit's [turn detector](https://docs.livekit.io/agents/logic/turns/turn-detector/) runs on top of STT output to make turn decisions. AssemblyAI's role is then just to provide transcripts as quickly as possible, while the turn detection model decides when the user is actually done speaking.

Use `MultilingualModel()` rather than `EnglishModel()`: **Universal 3.5 Pro Realtime** supports 18 languages, and `MultilingualModel()` covers all of them.

The LiveKit plugin defaults of `min_turn_silence=100` and `max_turn_silence=100` work well here, as **`max_turn_silence` is brought down to match `min_turn_silence` so that transcripts are handed off to the turn detection model as fast as possible.**

**`MultilingualModel` parameters** (set inside `TurnHandlingOptions` → `endpointing`, not on the STT plugin):

| Parameter               | Default | Description                                                                                                                |
| ----------------------- | ------- | -------------------------------------------------------------------------------------------------------------------------- |
| `endpointing.min_delay` | `0.5` s | Time to wait before committing a turn when the model predicts a likely boundary.                                           |
| `endpointing.max_delay` | `3.0` s | Maximum time to wait when the model predicts the user will continue speaking. Has no effect without a turn detector model. |

**How it works:**

1. User speaks → audio streams to AssemblyAI
2. User pauses for `100ms` → AssemblyAI emits transcript (final and partial are the same) immediately
3. LiveKit's `MultilingualModel()` evaluates the transcript in conversational context
4. If the model predicts a likely turn boundary → waits `min_delay` (`0.5s`) then commits the turn
5. If the model predicts the user will continue → waits up to `max_delay` (`3.0s`) for more speech
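Steps 4 and 5 can be modeled as a wait window that either expires (the turn is committed) or is cut short by new speech. The sketch below is a deliberately simplified assumption about that timing, not LiveKit's actual logic: `resolve_turn` is a hypothetical helper, and the real turn detector produces a probability rather than a boolean.

```python
from typing import Optional, Tuple


def resolve_turn(
    predicts_boundary: bool,
    speech_resumes_after_s: Optional[float],
    min_delay: float = 0.5,
    max_delay: float = 3.0,
) -> Tuple[str, float]:
    """Return ("commit", wait_s) or ("continue", resume_s) for one pause.

    `speech_resumes_after_s` is how long after the pause began the user speaks
    again, or None if they stay silent.
    """
    # A likely boundary waits min_delay; a likely continuation waits up to max_delay.
    wait_s = min_delay if predicts_boundary else max_delay
    if speech_resumes_after_s is not None and speech_resumes_after_s < wait_s:
        return ("continue", speech_resumes_after_s)  # user kept talking in time
    return ("commit", wait_s)  # window expired: the turn is committed


print(resolve_turn(True, None))   # ('commit', 0.5)
print(resolve_turn(False, 1.2))   # ('continue', 1.2)
print(resolve_turn(False, None))  # ('commit', 3.0)
```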

```python theme={null}
from livekit.agents import AgentSession, TurnHandlingOptions
from livekit.plugins import assemblyai, silero
from livekit.plugins.turn_detector.multilingual import MultilingualModel

session = AgentSession(
    turn_handling=TurnHandlingOptions(
        turn_detection=MultilingualModel(),
        endpointing={
            "min_delay": 0.5,  # Time (s) to wait before committing a turn when the model is confident
            "max_delay": 3.0,  # Max time (s) to wait when the model is not confident
        },
    ),
    stt=assemblyai.STT(
        model="universal-3-5-pro",
        vad_threshold=0.3,
    ),
    vad=silero.VAD.load(
        activation_threshold=0.3,
    ),
)
```

### Other turn detection modes

* **`vad`**:

  * Detect end of turn from speech and silence data alone using [Silero VAD](https://docs.livekit.io/agents/logic/turns/vad/).
  * Turn boundaries are determined purely by voice activity without semantic context.
  * AssemblyAI's turn detection parameters still control when transcripts are emitted, but it is recommended to leave them at the plugin defaults (`min_turn_silence=100`, `max_turn_silence=100`) so transcripts arrive as quickly as possible.

* **`manual`**:
  * Disable automatic turn detection entirely.
  * You control turns explicitly using `session.commit_user_turn()`, `session.clear_user_turn()`, and `session.interrupt()`.
  * See the [manual turn control docs](https://docs.livekit.io/agents/logic/turns/#manual-turn-control) for details.

### VAD configuration

With `turn_detection="stt"`, AssemblyAI also sends `SpeechStarted` events that LiveKit uses for barge-in/interruption handling. `SpeechStarted` is only emitted when the model produces a transcript.

Silero VAD is not strictly required in this mode, but it is still recommended: Silero runs locally, so it can signal speech onset sooner than AssemblyAI's `SpeechStarted` event. LiveKit acts on whichever signal arrives first, so Silero provides faster interruption while AssemblyAI's signal serves as a reliable backup.

With `MultilingualModel()`, Silero VAD is **required**, as it is the only source of `START_OF_SPEECH` events for interruption in this mode. AssemblyAI's `SpeechStarted` event is not used.

#### Threshold alignment

LiveKit's Silero VAD defaults to an `activation_threshold` of **0.5**. AssemblyAI's `vad_threshold` defaults to **0.3**. For best performance, we recommend setting both to **0.3**.

Both should be adjusted together to the same value to ensure accurate transcription and consistent barge-in thresholds.

When the thresholds are mismatched, you get a dead zone: if Silero is at `0.5` and AssemblyAI is at `0.3`, AssemblyAI will be actively transcribing speech that LiveKit hasn't detected yet, delaying interruption. Keeping them aligned eliminates this.

```python theme={null}
from livekit.agents import AgentSession
from livekit.plugins import assemblyai, silero

session = AgentSession(
    stt=assemblyai.STT(
        model="universal-3-5-pro",
        vad_threshold=0.3,         # AssemblyAI's internal VAD onset
    ),
    vad=silero.VAD.load(
        activation_threshold=0.3,  # Match AssemblyAI's threshold
    ),
)
```

If you're in a noisy environment and receiving false speech triggers, raise `stt.vad_threshold` and `vad.activation_threshold` together, keeping them at the same value.
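To make the mismatch concrete, the sketch below treats each VAD as a simple comparison of a frame's speech probability against its threshold. That is an assumption made for illustration (the real detectors also smooth over time), and `detectors_firing` and `dead_zone` are hypothetical helpers.

```python
from typing import Dict, Optional, Tuple


def detectors_firing(
    speech_prob: float,
    stt_vad_threshold: float = 0.3,
    silero_activation_threshold: float = 0.5,
) -> Dict[str, bool]:
    """Report which side treats a frame with this speech probability as speech."""
    return {
        "assemblyai": speech_prob >= stt_vad_threshold,
        "silero": speech_prob >= silero_activation_threshold,
    }


def dead_zone(
    stt_vad_threshold: float, silero_activation_threshold: float
) -> Optional[Tuple[float, float]]:
    """Return the probability band where only one detector fires, or None if aligned."""
    low, high = sorted((stt_vad_threshold, silero_activation_threshold))
    return None if low == high else (low, high)


print(detectors_firing(0.4))  # {'assemblyai': True, 'silero': False}
print(dead_zone(0.3, 0.5))    # (0.3, 0.5)
print(dead_zone(0.3, 0.3))    # None
```

A frame at 0.4 is transcribed by AssemblyAI while Silero still reports silence, which is the delayed-interruption case described above.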

### Entity splitting tradeoff

Lower `min_turn_silence` and `max_turn_silence` values produce faster transcripts but can split entities or utterances across turns. The two parameters affect this differently.

#### `min_turn_silence` too low

* Speculative check fires too early, splitting entities on punctuation.

* **Example:** *User spells out an email address with brief pauses between parts. The speculative check fires at 100ms of silence, and the model adds terminal punctuation to each segment, ending the turn prematurely.*

```text theme={null}
# With (min_turn_silence=100, max_turn_silence=1000)
"It's John."                    → FINAL (100ms pause, check fires, period found → turn ends)
"Smith."                        → FINAL
"At gmail.com."                 → FINAL

# With (min_turn_silence=400, max_turn_silence=1000)
"It's john.smith@gmail.com."    → FINAL (single turn, properly formatted)
```

#### `max_turn_silence` too low

* Forced turn-end cuts off user mid-thought.

* **Example:** *User pauses longer than 1 second to think mid-sentence. The forced end fires at 1000ms, splitting the utterance into two turns regardless of punctuation.*

```text theme={null}
# With (min_turn_silence=100, max_turn_silence=1000)
"I wanted to check on my order from..."  → FINAL (1000ms silence, forced end)
"last Tuesday, order number 4829."     → FINAL (new turn)

# With (min_turn_silence=100, max_turn_silence=2000)
"I wanted to check on my order from last Tuesday, order number 4829."  → FINAL (single turn)
```

**Universal 3.5 Pro Realtime's** formatting is significantly better when it has full context in a single turn. Email addresses, phone numbers, credit card numbers, and physical addresses all benefit from this.
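The two failure modes can be reproduced with a small offline simulation. This is a rough sketch under stated assumptions: `group_into_turns` is a hypothetical helper, the pause lengths are invented for illustration, and each segment's text is fixed in advance, whereas the real model re-formats the whole turn once it has full context.

```python
from typing import List, Tuple

TERMINAL_PUNCTUATION = (".", "?", "!")


def group_into_turns(
    segments: List[Tuple[str, int]],
    min_turn_silence: int,
    max_turn_silence: int,
) -> List[str]:
    """Group (text, pause_after_ms) segments into turns."""
    turns: List[str] = []
    current: List[str] = []
    for text, pause_ms in segments:
        current.append(text)
        forced = pause_ms >= max_turn_silence
        punctuated = pause_ms >= min_turn_silence and text.rstrip().endswith(
            TERMINAL_PUNCTUATION
        )
        if forced or punctuated:
            turns.append(" ".join(current))
            current = []
    if current:  # trailing speech that never reached an end-of-turn condition
        turns.append(" ".join(current))
    return turns


# Illustrative pauses (ms) after each spoken segment.
email = [("It's John.", 200), ("Smith.", 250), ("At gmail.com.", 1200)]
print(len(group_into_turns(email, 100, 1000)))  # 3 turns: split on every short pause
print(len(group_into_turns(email, 400, 1000)))  # 1 turn: short pauses are ignored
```

Replaying your own pause timings through a model like this is a quick way to choose starting values before testing with live audio.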

<Note>
  LLMs downstream can usually piece together split entities, but if your use
  case involves alphanumeric dictation or entity extraction, consider increasing
  `min_turn_silence` and `max_turn_silence` during those portions of the
  conversation. You can [update configuration
  mid-stream](#dynamic-configuration) to raise `max_turn_silence`
  temporarily (e.g., to `2000`–`4000` ms) when expecting entity input, then
  lower it again afterward.
</Note>
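One way to structure the temporary increase is a context manager that always restores the normal value. The sketch below runs against a stand-in object so it works without LiveKit installed. `FakeSTT` and `entity_capture` are hypothetical, and it assumes the real STT instance accepts `max_turn_silence` through `update_options`; check the plugin reference for your installed version.

```python
from contextlib import contextmanager


class FakeSTT:
    """Stand-in for the STT plugin instance so the sketch runs without LiveKit."""

    def __init__(self) -> None:
        self.calls = []

    def update_options(self, **options) -> None:
        self.calls.append(options)


@contextmanager
def entity_capture(stt, raised_ms: int = 3000, normal_ms: int = 1000):
    """Temporarily raise max_turn_silence while the user dictates an entity."""
    stt.update_options(max_turn_silence=raised_ms)
    try:
        yield stt
    finally:
        # Restore the normal value even if the body raises.
        stt.update_options(max_turn_silence=normal_ms)


stt = FakeSTT()
with entity_capture(stt):
    pass  # e.g. ask for a phone number and wait for the user's reply
print(stt.calls)  # [{'max_turn_silence': 3000}, {'max_turn_silence': 1000}]
```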

Even when using third-party turn detection, you may want to increase `min_turn_silence` or `max_turn_silence` if users are likely to speak slowly or dictate entities. While this adds latency, it improves accuracy by giving the model more audio context before emitting a transcript and keeping the full entity complete within the same turn.

## Latency

A voice agent feels responsive when the gap between the user finishing and the agent replying is short.

Start with the **`mode` preset** — the highest-level dial for the accuracy/latency trade-off. It sets sensible defaults for the fine-grained levers below, so you can pick a target and tune from there:

```python theme={null}
stt=assemblyai.STT(
    model="universal-3-5-pro",
    mode="balanced",  # "min_latency" (fastest) · "balanced" · "max_accuracy" (best quality)
)
```

`mode` is set at construction time (it can't be changed mid-session) and influences the defaults of `min_turn_silence`, `vad_threshold`, `interruption_delay`, `continuous_partials`, and `previous_context_n_turns`. Any value you set explicitly still wins. Leave it unset to use the server's default preset. See [Optimizing accuracy and latency](/streaming/getting-started/optimizing-accuracy-and-latency).

From there, fine-tune the individual levers:

* **Endpointing delay (STT mode).** LiveKit's `endpointing.min_delay` is applied *on top of* AssemblyAI's own end-of-turn timing and adds up to `500ms` by default. Set `endpointing={"min_delay": 0}` when using `turn_detection="stt"` — see [STT-based turn detection](#stt-based-turn-detection-recommended).
* **End-of-turn timing.** `min_turn_silence` (speculative check) and `max_turn_silence` (forced end) directly control how soon a turn ends. Lower is faster but risks splitting entities — see [Turn detection](#turn-detection).
* **Time to first partial.** `interruption_delay` controls how soon the first partial is emitted, which drives faster barge-in and speculative inference. The server adds a minimum of `300ms` on top of the configured value.

```python theme={null}
# Tune first-partial timing for faster barge-in
stt.update_options(
    interruption_delay=0,  # ~300ms effective TTFT
)
```

* **Sample rate.** Use 16 kHz (`sample_rate=16000`). Higher rates don't improve accuracy and only add bandwidth.
* **Continuous partials.** `continuous_partials` (on by default) emits a partial every \~3 seconds during long turns. Leave it on for steady mid-turn updates, or disable it if you only need a single early partial.
* **Skip client-side preprocessing.** Don't run your own noise cancellation before audio reaches the model — the artifacts it introduces usually hurt accuracy more than the original noise. Use server-side [Voice Focus](#voice-focus) instead.

### Latency breakdown

| Stage                                    | Typical                                               | Controlled by           |
| ---------------------------------------- | ----------------------------------------------------- | ----------------------- |
| Network round trip                       | \~50 ms                                               | —                       |
| Speech-to-text                           | \~200–300 ms                                          | model                   |
| First partial (TTFT)                     | configured `interruption_delay` + \~300 ms server min | `interruption_delay`    |
| End of turn (terminal punctuation found) | `min_turn_silence` (default 100 ms)                   | `min_turn_silence`      |
| End of turn (no punctuation, forced)     | up to `max_turn_silence`                              | `max_turn_silence`      |
| LiveKit endpointing (STT mode)           | + `min_delay` (set to `0`)                            | `endpointing.min_delay` |
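The configurable rows of this table reduce to simple arithmetic. The helpers below are a back-of-the-envelope sketch with hypothetical names; they cover only the settings you control and leave out network and model processing time, and the 300 ms server minimum is the approximate figure quoted above.

```python
SERVER_MIN_MS = 300  # approximate server-side minimum added to interruption_delay


def first_partial_ms(interruption_delay: int = 500) -> int:
    """Approximate time (ms) from speech start to the first partial."""
    if not 0 <= interruption_delay <= 1000:
        raise ValueError("interruption_delay must be between 0 and 1000 ms")
    return interruption_delay + SERVER_MIN_MS


def end_of_turn_ms(silence_setting_ms: int, livekit_min_delay_s: float = 0.0) -> int:
    """Silence-to-commit time (ms): AssemblyAI silence setting plus LiveKit's min_delay.

    Pass min_turn_silence for the punctuated case or max_turn_silence for the
    forced case.
    """
    return silence_setting_ms + round(livekit_min_delay_s * 1000)


print(first_partial_ms(0))       # 300
print(first_partial_ms(500))     # 800
print(end_of_turn_ms(100, 0.5))  # 600: punctuated turn with LiveKit's default min_delay
print(end_of_turn_ms(100, 0.0))  # 100: same turn with min_delay set to 0
```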

## Accuracy

**Universal 3.5 Pro Realtime** is accurate out of the box. When you need more — domain vocabulary, proper nouns, noisy audio — reach for these levers. For entity-heavy dictation, also tune turn detection (see [Entity splitting tradeoff](#entity-splitting-tradeoff)), and note that the high-level [`mode` preset](#latency) shifts the overall accuracy/latency balance (use `max_accuracy` to favor quality).

### Prompting

<Warning>
  **Beta feature**

  Prompting is considered a beta feature for **Universal 3.5 Pro Realtime**.

  While it can be a powerful tool for improving accuracy in certain use cases, **we recommend starting without a `prompt` to first establish baseline performance.**

  Once you have tested the baseline, you can add context to optimize further for your use case, such as the language mix to expect (e.g., English and Hindi) or the domain (e.g., medical, legal).
</Warning>

**Universal 3.5 Pro Realtime** supports a `prompt` parameter for [contextual prompting](/streaming/prompting-and-keyterms) — a description of what the audio is about. Transcription behavior (verbatim output, punctuation, turn detection) is built in and optimized automatically; the prompt carries context, not instructions.

```python theme={null}
stt=assemblyai.STT(
    model="universal-3-5-pro",
    prompt="Customer support call about an internet service outage.",
)
```

**Tips:**

* **Start with no prompt**: Universal 3.5 Pro Realtime delivers strong accuracy out of the box — only add context if domain-specific vocabulary is being misrecognized.
* **Describe the conversation**: domain, scenario, or full details — start broad, and add only details your application actually knows (see the [three context levels](/streaming/prompting-and-keyterms#contextual-prompting)).
* **Include known names and identifiers**: caller name, account or order IDs, products — detailed context helps the model spell them correctly.
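
One way to follow these tips is to assemble the prompt from whatever your application actually knows and to send nothing when it knows nothing. The helper below is a minimal sketch: `build_prompt` and the `label: value` layout are hypothetical choices for illustration, not a format the model requires.

```python
from __future__ import annotations


def build_prompt(
    description: str = "",
    known_entities: dict[str, str] | None = None,
) -> str | None:
    """Return a context description, or None when there is nothing to add."""
    parts: list[str] = []

    # Broad description first: domain or scenario.
    if description.strip():
        parts.append(description.strip().rstrip(".") + ".")

    # Then only the names and identifiers that are actually known.
    for label, value in (known_entities or {}).items():
        if value and value.strip():
            parts.append(f"{label}: {value.strip()}.")

    return " ".join(parts) or None


prompt = build_prompt(
    "Customer support call about an internet service outage",
    {"Caller name": "Dana Lee", "Order ID": ""},
)
print(prompt)
# Customer support call about an internet service outage. Caller name: Dana Lee.
```

When the helper returns `None`, leave `prompt` out of the `assemblyai.STT(...)` call so you keep the no-prompt baseline.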

### Key terms

Instead of `prompt`, use `keyterms_prompt` to boost recognition of specific names, brands, or domain terms:

```python theme={null}
stt=assemblyai.STT(
    model="universal-3-5-pro",
    keyterms_prompt=["AssemblyAI", "LiveKit", "Universal 3.5 Pro Realtime"],
)
```

### Conversation context

Give the model both sides of the dialog so it transcribes the next user turn more accurately. **Universal 3.5 Pro Realtime** keeps a short, per-session memory of the conversation from two sources:

* **The agent half** — what your agent just said, which you push in via the `agent_context` parameter.
* **The user half** — prior STT-finalized user turns, carried forward automatically (no configuration needed).

With the agent's question in context, the model can anticipate the answer, sharpen entity recognition, and disambiguate similar-sounding words. For example, after your agent asks `"What's your email address?"`, `agent_context` lets the model produce `"user@assemblyai.com"` instead of `"user at assemblyai dot com"`. This has the biggest impact on short replies (`"yes"`, `"7pm"`, single names) and spelled-out entities. See [Conversation context](/streaming/universal-3-pro/context-carryover) for the full reference.

| Parameter                  | Type    | Description                                                                                                                                                                                                     |
| -------------------------- | ------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `agent_context`            | string  | Your agent's most recent spoken reply (what your TTS just said), up to \~1500 characters. Set it at construction time to seed an opening greeting, and update it after each agent reply via `update_options()`. |
| `previous_context_n_turns` | integer | How many prior conversation entries are carried forward automatically. Connect-time only. Range `0`–`100`; `0` disables carryover; leave unset for the server default (3).                                      |

<Note>
  `agent_context` and `previous_context_n_turns` are supported only on the U3 Pro family (`universal-3-5-pro`, `u3-rt-pro`). `previous_context_n_turns` is set at construction time and cannot be changed with `update_options()`.
</Note>

#### Wiring agent context with `conversation_item_added`

Push your agent's reply into the session whenever the agent finishes a turn. LiveKit emits a `conversation_item_added` event for every conversation item; filter for the assistant's messages and forward the (trimmed) text to the STT plugin via `update_options(agent_context=...)`. No reconnect is required.

A complete runnable example is on GitHub: [including-agent-context](https://github.com/dlange-aai/assemblyai-livekit-examples/tree/main/including-agent-context).

```python theme={null}
from livekit.agents import AgentSession, ConversationItemAddedEvent
from livekit.plugins import assemblyai

AGENT_CONTEXT_MAX_CHARS = 1500

session = AgentSession(
    stt=assemblyai.STT(
        model="universal-3-5-pro",
        # Carry the last 3 turns forward automatically (server default). Set 0 to disable.
        previous_context_n_turns=3,
    ),
    # llm=your_llm_plugin(),
    # tts=your_tts_plugin(),
)


@session.on("conversation_item_added")
def _on_conversation_item_added(ev: ConversationItemAddedEvent) -> None:
    # Only forward the agent's own replies as context for the next user turn.
    if ev.item.type != "message" or ev.item.role != "assistant":
        return

    agent_stt = session.stt
    if not isinstance(agent_stt, assemblyai.STT):
        return

    spoken = ev.item.text_content
    if not spoken:
        return

    # Push the latest reply; earlier turns are carried forward automatically.
    agent_stt.update_options(agent_context=spoken[-AGENT_CONTEXT_MAX_CHARS:])
```

<Tip>
  To seed the model with your agent's opening greeting, pass `agent_context` directly on `assemblyai.STT(...)` at construction time, then keep it current with `update_options()` after each reply.
</Tip>
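
If you seed a greeting at construction time, it can help to validate both context settings in one place before they reach the plugin. This is a minimal sketch under stated assumptions: `context_kwargs` is a hypothetical helper, the whitespace collapsing is an added convenience, and the limits are the ones listed in the parameter table above (about 1500 characters, `0` to `100` turns).

```python
from __future__ import annotations

AGENT_CONTEXT_MAX_CHARS = 1500  # approximate limit from the parameter table


def context_kwargs(greeting: str = "", n_turns: int | None = None) -> dict:
    """Build keyword arguments for the two conversation-context parameters."""
    kwargs: dict = {}

    # Collapse whitespace and keep the tail, mirroring the trimming used in
    # the event handler above.
    text = " ".join(greeting.split())
    if text:
        kwargs["agent_context"] = text[-AGENT_CONTEXT_MAX_CHARS:]

    # None means "leave unset" so the server default applies.
    if n_turns is not None:
        if not 0 <= n_turns <= 100:
            raise ValueError("previous_context_n_turns must be between 0 and 100")
        kwargs["previous_context_n_turns"] = n_turns

    return kwargs


print(context_kwargs("Hi!  What's your email address?", n_turns=3))
# {'agent_context': "Hi! What's your email address?", 'previous_context_n_turns': 3}
```

The result can be unpacked into the constructor, for example `assemblyai.STT(model="universal-3-5-pro", **context_kwargs(greeting, 3))`.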

### Voice focus

Voice Focus isolates the primary speaker and suppresses background noise — chatter, keyboard clicks, fan hum, room echo — **server-side, before audio reaches the model**. Use it instead of client-side noise cancellation, which tends to introduce artifacts that hurt accuracy more than the noise itself.

| Parameter               | Type   | Description                                                                                                                                      |
| ----------------------- | ------ | ------------------------------------------------------------------------------------------------------------------------------------------------ |
| `voice_focus`           | string | `"near-field"` for headsets, handsets, and other close-talking mics; `"far-field"` for conference rooms, laptop mics, and other distant capture. |
| `voice_focus_threshold` | float  | Optional. `0.0`–`1.0`; higher values suppress background audio more aggressively.                                                                |

Both are connection-time parameters on the U3 Pro family and cannot be changed with `update_options()`. See [Voice Focus](/streaming/voice-focus) for details.

```python theme={null}
stt=assemblyai.STT(
    model="universal-3-5-pro",
    voice_focus="far-field",      # "near-field" for close-talking mics
    voice_focus_threshold=0.5,    # Optional: 0.0–1.0, higher = more aggressive
)
```
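
Because both values are fixed for the life of the connection, it can be worth choosing and validating them before the session starts. The sketch below maps a capture-device label to a mode using the guidance in the table above. The device labels and the `voice_focus_kwargs` helper are hypothetical; only the two parameter names and their allowed values come from this guide.

```python
from __future__ import annotations

# Hypothetical device labels, grouped per the parameter table.
NEAR_FIELD_MICS = {"headset", "handset"}
FAR_FIELD_MICS = {"conference_room", "laptop"}


def voice_focus_kwargs(mic: str, threshold: float | None = None) -> dict:
    """Pick a voice_focus mode for a capture device and validate the threshold."""
    if mic in NEAR_FIELD_MICS:
        mode = "near-field"
    elif mic in FAR_FIELD_MICS:
        mode = "far-field"
    else:
        raise ValueError(f"unknown microphone type: {mic!r}")

    kwargs: dict = {"voice_focus": mode}

    # The threshold is optional; omit it to keep the server-side default.
    if threshold is not None:
        if not 0.0 <= threshold <= 1.0:
            raise ValueError("voice_focus_threshold must be between 0.0 and 1.0")
        kwargs["voice_focus_threshold"] = threshold

    return kwargs


print(voice_focus_kwargs("laptop", 0.5))
# {'voice_focus': 'far-field', 'voice_focus_threshold': 0.5}
```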

## Interruption handling

In voice agent conversations, users often produce backchannel utterances ("mhm", "yeah", "um", "okay") while the agent is speaking. These short fillers can trigger LiveKit's interruption logic, causing the agent to stop mid-sentence even though the user didn't intend to interrupt.

LiveKit Cloud users should reach for adaptive interruption handling first. Self-hosted deployments can combine the two custom filters described below.

### Recommended: Adaptive interruption handling

LiveKit Agents v1.5.0 introduced [adaptive interruption handling](https://docs.livekit.io/agents/logic/turns/adaptive-interruption-handling/), a server-side model that classifies overlapping speech as a real interrupt or a backchannel. On a false interrupt the agent's TTS resumes from where it left off. No re-generation is needed. The feature is recommended for LiveKit Cloud users and is included on all Cloud plans.

```python theme={null}
from livekit.agents import AgentSession
from livekit.plugins import assemblyai, silero

session = AgentSession(
    stt=assemblyai.STT(model="universal-3-5-pro"),
    # llm=your_llm_plugin(),
    # tts=your_tts_plugin(),
    vad=silero.VAD.load(),
    interruption={"mode": "adaptive"},  # default in v1.5.0+
)
```

See LiveKit's [adaptive interruption handling docs](https://docs.livekit.io/agents/logic/turns/adaptive-interruption-handling/) for full requirements and behavior.

### Self-hosted alternative

If you're self-hosting LiveKit, can't use adaptive interruption handling, or need explicit control over which utterances count as filler, the two complementary filters below provide strong guardrails for interruption and barge-in handling. They have non-overlapping failure modes: the backchannel filter stops known fillers before they enter the pipeline, while the buffer-clearing filter catches any short utterance (including unknown fillers or stutters) that slips past. Running both provides the strongest coverage. A working reference implementation is available on [GitHub](https://github.com/AssemblyAI-Solutions/livekit-interruption-filters).

<AccordionGroup>
  <Accordion title="Upstream backchannel filter" icon="filter">
    The backchannel filter intercepts STT events at the `stt_node` level, before they reach LiveKit's `audio_recognition`. It checks each transcript event for known disfluencies and backchannels ("um", "mhm", "yeah", "okay", etc.) and drops the event entirely. Because the event never enters the pipeline, none of the downstream orchestration (interrupt gates, end-of-turn detection, and preemptive LLM generation) ever reacts to it.

    The filter is implemented as a mixin class that wraps `Agent.stt_node`:

    ```python theme={null}
    from __future__ import annotations

    import logging
    import string
    import time
    from collections.abc import AsyncIterable

    from livekit import rtc
    from livekit.agents import Agent, stt
    from livekit.agents.voice import ModelSettings


    # "yes" / "no" deliberately omitted - in a booking flow a bare "yes"
    # is a real confirmation. Edit for your domain.
    BACKCHANNELS = frozenset({
        "mhm", "mm", "mmhm", "mmhmm",
        "uh", "uhhuh", "huh",
        "um", "umm", "uhm",
        "er", "erm",
        "hmm", "hm",
        "ah", "oh",
        "yeah", "yep", "yup",
        "okay", "ok",
        "right", "alright", "gotcha",
    })

    _TRANSCRIPT_TYPES = {
        stt.SpeechEventType.INTERIM_TRANSCRIPT,
        stt.SpeechEventType.PREFLIGHT_TRANSCRIPT,
        stt.SpeechEventType.FINAL_TRANSCRIPT,
    }

    _PUNCT_STRIP = str.maketrans("", "", string.punctuation)

    log = logging.getLogger("backchannel_stt_filter")


    def _is_all_backchannel(text: str) -> bool:
        """Return True only when every token is a known backchannel."""
        tokens = text.lower().translate(_PUNCT_STRIP).split()
        return bool(tokens) and all(tok in BACKCHANNELS for tok in tokens)


    class BackchannelSTTFilterMixin:
        """Drop backchannel-only transcripts while the agent is speaking."""

        _FILTER_GRACE_S: float = 1.0
        _last_speaking_at: float = 0.0

        async def stt_node(
            self, audio: AsyncIterable[rtc.AudioFrame], model_settings: ModelSettings
        ):
            async for ev in Agent.default.stt_node(self, audio, model_settings):
                if self._should_drop(ev):
                    text = ev.alternatives[0].text if ev.alternatives else ""
                    log.info(
                        "event_filtered transcript=%r ev_type=%s agent_state=%s",
                        text, ev.type, self.session.agent_state,
                    )
                    continue
                yield ev

        def _should_drop(self, ev: stt.SpeechEvent) -> bool:
            now = time.monotonic()
            if self.session.agent_state == "speaking":
                self._last_speaking_at = now
            elif now - self._last_speaking_at > self._FILTER_GRACE_S:
                return False

            if ev.type not in _TRANSCRIPT_TYPES:
                return False

            text = ev.alternatives[0].text if ev.alternatives else ""
            return _is_all_backchannel(text)
    ```

    **How it works:**

    1. While the agent is speaking (plus a 1-second grace window after speech ends), the filter inspects each STT transcript event.
    2. It strips punctuation and checks whether every token in the transcript matches the `BACKCHANNELS` set.
    3. Pure-filler transcripts like "mhm" or "yeah okay" are dropped, so they never reach LiveKit's pipeline.
    4. Utterances with any non-filler token (e.g., "yeah I want the suite") always pass through.

    <Note>
      The `BACKCHANNELS` set is domain-customizable. "yes" and "no" are deliberately excluded because they often represent genuine confirmations in booking or IVR scenarios. Edit the set for your use case.
    </Note>
  </Accordion>

  <Accordion title="Short-utterance buffer clearing" icon="eraser">
    LiveKit accumulates committed `FINAL` transcripts in a private buffer (`_audio_transcript`) across the user's uncommitted turn. The `_interrupt_by_audio_activity` method checks the running word count against `min_words` to decide whether to pause TTS.

    Without intervention, two consecutive short fillers like "yeah" + "um" sum to two words and trip the interrupt gate, even though each utterance on its own is below threshold.

    This filter listens on the `user_input_transcribed` event and wipes the buffer whenever the agent is speaking and the user input falls below the configured `min_words` threshold. Each short utterance is evaluated independently rather than against the accumulated total.

    <Warning>
      This filter requires `interruption.min_words` to be set to `2` or higher. Without it, the word-count gate is disabled and the filter has no effect.
    </Warning>

    ```python theme={null}
    from __future__ import annotations

    import logging

    from livekit.agents import AgentSession


    log = logging.getLogger("short_utterance_buffer_filter")


    def install_short_utterance_filter(session: AgentSession) -> None:
        """Clear transcript buffers when a short utterance arrives during agent speech."""

        @session.on("user_input_transcribed")
        def _on_user_input_transcribed(ev) -> None:
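            # Only act while the agent is speaking.
            if session.agent_state != "speaking":
                return

            # Evaluate this utterance on its own, not against the running total.
            word_count = len(ev.transcript.split())
            min_words = session.options.interruption["min_words"]
            if word_count >= min_words:
                return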

            activity = getattr(session, "_activity", None)
            recognition = getattr(activity, "_audio_recognition", None) if activity else None
            if recognition is None:
                return

            # Wipe all three transcript buffers so short utterances
            # don't accumulate past the interrupt threshold.
            recognition._audio_transcript = ""
            recognition._audio_interim_transcript = ""
            recognition._audio_preflight_transcript = ""

            # Best-effort: abort any in-flight preemptive LLM call
            # triggered by this short utterance.
            cancel = getattr(activity, "_cancel_preemptive_generation", None)
            if callable(cancel):
                try:
                    cancel()
                except Exception:
                    log.debug("_cancel_preemptive_generation failed", exc_info=True)

            log.info(
                "buffer_cleared transcript=%r words=%d is_final=%s",
                ev.transcript, word_count, ev.is_final,
            )
    ```

    <Warning>
      This filter accesses LiveKit private APIs (`_audio_transcript`, `_audio_interim_transcript`, `_audio_preflight_transcript`, `_cancel_preemptive_generation`). Pin your `livekit-agents` version to avoid breakage on minor updates. Tested with `livekit-agents>=1.5`.
    </Warning>
  </Accordion>

  <Accordion title="Wiring both filters into your agent" icon="plug">
    Three changes are needed:

    **1. Mix in the backchannel filter on your agent class.** Place it before `Agent` in the class bases so its `stt_node` runs first:

    ```python theme={null}
    from livekit.agents import Agent

    from filters.backchannel_stt import BackchannelSTTFilterMixin

    class MyAgent(BackchannelSTTFilterMixin, Agent):
        def __init__(self) -> None:
            super().__init__(instructions="You are a helpful voice AI assistant.")
    ```

    **2. Configure the session with `interruption.min_words >= 2`.** This enables the word-count gate that the buffer-clearing filter depends on:

    ```python theme={null}
    session = AgentSession(
        stt=assemblyai.STT(
            model="universal-3-5-pro",
            min_turn_silence=100,
            max_turn_silence=1000,
            vad_threshold=0.3,
        ),
        # llm=your_llm_plugin(),
        # tts=your_tts_plugin(),
        vad=None,  # Recommended: disable VAD so only STT drives interruption
        turn_handling={
            "turn_detection": "stt",
            "endpointing": {"min_delay": 1.0, "max_delay": 4.0},
            "interruption": {
                "enabled": True,
                "resume_false_interruption": True,
                "false_interruption_timeout": 1.5,
                "min_words": 2,  # Required for the buffer-clearing filter
            },
        },
    )
    ```

    **3. Install the buffer-clearing filter on the session**, then start it with your agent:

    ```python theme={null}
    from filters.short_utterance_buffer import install_short_utterance_filter

    install_short_utterance_filter(session)

    await session.start(room=ctx.room, agent=MyAgent())
    ```

    <Tip>
      Setting `vad=None` with `turn_detection="stt"` is the recommended setup for both filters. This ensures only STT-based signals drive interruption, avoiding timing races from a competing VAD interrupt path. If you need VAD for faster barge-in, both filters still work: set `interruption.min_words` to `2` and ensure Silero's `activation_threshold` matches `vad_threshold`.
    </Tip>

    **Expected behavior** with both filters active and `min_words=2`:

    | User utterance (during agent speech) | Backchannel filter | Buffer-clearing filter | Result           |
    | ------------------------------------ | ------------------ | ---------------------- | ---------------- |
    | "mhm"                                | Dropped            | -                      | Agent continues  |
    | "um"                                 | Dropped            | -                      | Agent continues  |
    | "yeah yeah"                          | Dropped            | -                      | Agent continues  |
    | Unknown short word                   | Passes through     | Buffer cleared         | Agent continues  |
    | "yeah I'd like the suite"            | Passes through     | Passes through         | Agent interrupts |
    | "suite please"                       | Passes through     | Passes through         | Agent interrupts |
    | "mhm" (1+ second after agent stops)  | Passes through     | -                      | Agent responds   |
  </Accordion>
</AccordionGroup>
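
To sanity-check the combined behavior without running an agent, the decision logic of the two filters can be modeled as a pure function. This is a simplified sketch, not the reference implementation: it uses an abbreviated backchannel set, ignores the 1-second grace window, and folds LiveKit's `min_words` gate into a single hypothetical `decide` function.

```python
import string

# Abbreviated set for illustration; the full set is in the filter above.
BACKCHANNELS = frozenset({"mhm", "um", "yeah", "okay"})
_PUNCT_STRIP = str.maketrans("", "", string.punctuation)


def decide(utterance: str, agent_speaking: bool = True, min_words: int = 2) -> str:
    """Return the expected outcome for one user utterance."""
    if not agent_speaking:
        # Neither filter acts once the agent has stopped talking.
        return "agent responds"

    tokens = utterance.lower().translate(_PUNCT_STRIP).split()

    # Backchannel filter: drop transcripts made up entirely of known fillers.
    if tokens and all(tok in BACKCHANNELS for tok in tokens):
        return "agent continues"

    # Buffer clearing: a short utterance is judged on its own word count.
    if len(utterance.split()) < min_words:
        return "agent continues"

    return "agent interrupts"


for text in ["mhm", "yeah yeah", "sorry", "yeah I'd like the suite", "suite please"]:
    print(f"{text!r}: {decide(text)}")
```

A table-driven check like this is a convenient place to try edits to `BACKCHANNELS` for your domain before deploying them.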

## Dynamic configuration

You can update `prompt`, `keyterms_prompt`, `agent_context`, `min_turn_silence`, `max_turn_silence`, `continuous_partials`, and `interruption_delay` during an active session using `update_options` — no reconnect required. This lets you adapt to the conversation stage: boost names while collecting caller details, widen silence windows while a user dictates an email, then tighten them again.

```python theme={null}
# Update one or more options mid-stream
stt.update_options(
    max_turn_silence=3000,  # Increase for entity dictation
)

# Later, reset to default
stt.update_options(
    max_turn_silence=1000,
)
```

| Conversation stage                         | Adjustment                                                                    |
| ------------------------------------------ | ----------------------------------------------------------------------------- |
| Caller identification (names, account IDs) | Boost terms with `update_options(keyterms_prompt=[...])`                      |
| Entity dictation (email, phone, address)   | Raise `max_turn_silence` to \~`2000`–`4000` ms, then lower it again afterward |
| After each agent reply                     | Push `agent_context` (see [Conversation context](#conversation-context))      |
| Faster barge-in                            | Lower `interruption_delay` (see [Latency](#latency))                          |

For more information, see [Updating configuration mid-stream](/streaming/updating-configuration-mid-stream).
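
The stage table can be kept as data so the rest of your agent only names the stage it is entering. The sketch below is one possible arrangement: the stage names and the `options_for_stage` helper are hypothetical, and the silence values are taken from the examples in this section.

```python
from __future__ import annotations

DEFAULT_MAX_TURN_SILENCE_MS = 1000
DICTATION_MAX_TURN_SILENCE_MS = 3000  # within the ~2000-4000 ms range above


def options_for_stage(stage: str, keyterms: list[str] | None = None) -> dict:
    """Return keyword arguments to pass to update_options() for a stage."""
    if stage == "caller_identification":
        # Only send key terms when there are some to boost.
        return {"keyterms_prompt": list(keyterms)} if keyterms else {}
    if stage == "entity_dictation":
        return {"max_turn_silence": DICTATION_MAX_TURN_SILENCE_MS}
    if stage == "default":
        return {"max_turn_silence": DEFAULT_MAX_TURN_SILENCE_MS}
    raise ValueError(f"unknown stage: {stage!r}")


print(options_for_stage("entity_dictation"))
# {'max_turn_silence': 3000}
```

With an `assemblyai.STT` instance named `stt`, entering a stage becomes `stt.update_options(**options_for_stage("entity_dictation"))`, and returning to normal conversation uses the `"default"` stage.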

## Troubleshooting

| Issue                                     | Cause                                                     | Solution                                                                                             |
| ----------------------------------------- | --------------------------------------------------------- | ---------------------------------------------------------------------------------------------------- |
| Extra latency with `turn_detection="stt"` | LiveKit's endpointing `min_delay` is additive in STT mode | Set `endpointing={"min_delay": 0}` inside `TurnHandlingOptions` on `AgentSession`                    |
| No interruption handling                  | Missing VAD                                               | Set `vad=silero.VAD.load()` with `activation_threshold` equal to `vad_threshold` (default `0.3`)     |
| Turn over-segmentation                    | `min_turn_silence` too low                                | Increase from `100` to `200`–`500`                                                                   |
| Entities split across turns               | `max_turn_silence` too low                                | Increase `max_turn_silence` (e.g., `1500`–`3500`)                                                    |
| Latency on non-terminal utterances        | `max_turn_silence` too high                               | Lower `max_turn_silence`                                                                             |
| Mis-heard names, brands, or jargon        | No vocabulary hints                                       | Add `keyterms_prompt`, or supply `prompt`/`agent_context` for context                                |
| Poor accuracy in noisy audio              | Background noise or room echo                             | Enable `voice_focus` (`near-field` or `far-field`)                                                   |
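
Two rows in this table describe settings that can be checked mechanically before a session starts. The function below is a small sketch of such a check. `lint_config` and its argument names are hypothetical, and its output is meant as hints, not hard errors, since some setups choose a non-zero delay deliberately.

```python
from __future__ import annotations


def lint_config(
    turn_detection: str,
    endpointing_min_delay: float,
    vad_threshold: float = 0.3,
    silero_activation_threshold: float | None = None,
) -> list[str]:
    """Return human-readable hints for two common misconfigurations."""
    hints: list[str] = []

    # In STT mode LiveKit's min_delay is added on top of the STT end of turn.
    if turn_detection == "stt" and endpointing_min_delay > 0:
        hints.append("endpointing.min_delay adds latency in STT mode; consider 0")

    # Only compare thresholds when a Silero VAD is actually configured.
    if (
        silero_activation_threshold is not None
        and abs(silero_activation_threshold - vad_threshold) > 1e-9
    ):
        hints.append("Silero activation_threshold should equal vad_threshold")

    return hints


print(lint_config("stt", 0.5, silero_activation_threshold=0.5))
print(lint_config("stt", 0.0, silero_activation_threshold=0.3))  # []
```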

## Migrating from another STT provider

To balance accuracy, latency, turn-taking, and interruption handling, map your current setup to AssemblyAI using the questions below. Each answer points to the settings that reproduce — and usually improve on — your current behavior.

### How are you detecting end-of-turn today?

| Today                                                                 | Recommended on AssemblyAI                                                                                                                                                                                |
| --------------------------------------------------------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| Your STT provider's own end-of-turn model (e.g. Deepgram endpointing) | `turn_detection="stt"` with `min_turn_silence=100`, `max_turn_silence=1000`, `endpointing={"min_delay": 0}`. AssemblyAI's punctuation-based end-of-turn replaces it.                                     |
| Silence / VAD only                                                    | Prefer `turn_detection="stt"` for semantic endpointing, or `turn_detection="vad"` to stay silence-only. Align Silero `activation_threshold` and `vad_threshold` at `0.3`.                                |
| LiveKit's turn-detector model                                         | `turn_detection=MultilingualModel()`, keep the plugin defaults (`min_turn_silence=100`, `max_turn_silence=100`), set `endpointing={"min_delay": 0.5, "max_delay": 3.0}`, and keep Silero VAD (required). |

### Which model and settings are you migrating from?

| What you pass today                        | AssemblyAI equivalent                                                                                                    |
| ------------------------------------------ | ------------------------------------------------------------------------------------------------------------------------ |
| Current model (Deepgram, ElevenLabs, etc.) | `model="universal-3-5-pro"` (recommended flagship) or `"u3-rt-pro"`                                                      |
| Overall accuracy/latency tuning            | `mode="min_latency"` / `"balanced"` / `"max_accuracy"` — a one-line starting point before fine-tuning                    |
| Endpointing / silence thresholds           | `min_turn_silence` (speculative end-of-turn) and `max_turn_silence` (forced end) — see [Turn detection](#turn-detection) |
| Custom vocabulary / keywords               | `keyterms_prompt=[...]`; broader domain context → `prompt`                                                               |
| Formatting / punctuation toggles           | On by default — formatted transcripts always (`format_turns` does not apply)                                             |
| Language(s)                                | 18 languages with code-switching; pin a language via `prompt`; `language_detection` adds language codes                  |

### VAD, interruptions, and infrastructure

| Today                           | Recommended                                                                                                                                                      |
| ------------------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| Silero VAD                      | Keep it; set `activation_threshold=0.3` to match `vad_threshold`                                                                                                 |
| Adaptive vs. VAD-based barge-in | LiveKit Cloud → [adaptive interruption handling](#recommended-adaptive-interruption-handling); self-hosted → the [self-hosted filters](#self-hosted-alternative) |
| Telephony / SIP routing         | `sample_rate=8000` and `encoding="pcm_mulaw"` for 8 kHz telephony                                                                                                |
| Client-side noise cancellation  | Drop it; use server-side [Voice Focus](#voice-focus) instead                                                                                                     |
| LiveKit Cloud vs. self-hosting  | Cloud unlocks adaptive interruption; self-hosting uses the filters and version pinning                                                                           |

### Parameter cheat sheet

If you're coming from AssemblyAI's standard streaming model:

| Change                                                     | From                                       | To                                                                                       |
| ---------------------------------------------------------- | ------------------------------------------ | ---------------------------------------------------------------------------------------- |
| Model                                                      | `assemblyai.STT()`                         | `assemblyai.STT(model="universal-3-5-pro")`                                              |
| Turn detection                                             | `turn_detection="stt"` or `EnglishModel()` | `turn_detection="stt"` or `MultilingualModel()`                                          |
| VAD                                                        | Optional                                   | `vad=silero.VAD.load(activation_threshold=0.3)`, matching `vad_threshold`                |
| `min_turn_silence`                                         | `400` (old default)                        | `100` (new default)                                                                      |
| `max_turn_silence`                                         | `1280` (old default)                       | `1000` (API default) or `100` (with 3rd-party turn detector)                             |
| `end_of_turn_confidence_threshold`                         | Configurable                               | **Not applicable**: **Universal 3.5 Pro Realtime** uses punctuation-based turn detection |
| `endpointing.min_delay` (formerly `min_endpointing_delay`) | Default `0.5`                              | Set to `0` inside `TurnHandlingOptions` when using `turn_detection="stt"`                |

Migrating a production deployment? [Talk to our team](https://www.assemblyai.com/contact/sales).