# Prompting and Keyterms

export const ModelBadges = ({models}) => {
  return <div className="flex flex-wrap gap-2 -mt-3 mb-3 not-prose">
      {models.map(model => <span key={model} className="inline-flex items-center rounded-full bg-green-500/15 px-2.5 py-0.5 text-xs font-mono text-green-700 dark:text-green-400 ring-1 ring-inset ring-green-500/30">
          {model}
        </span>)}
    </div>;
};

<ModelBadges models={["universal-3-5-pro", "u3-rt-pro", "universal-streaming-english", "universal-streaming-multilingual"]} />

Use contextual prompting and keyterms to improve streaming transcription accuracy.

Streaming transcription supports two complementary ways to give the model information about your audio:

* **Contextual prompting** (`prompt`) — a natural-language description of what the audio is about: the domain, the scenario, or the full details of the conversation.
* **Keyterms prompting** (`keyterms_prompt`) — an explicit list of terms you want the model to recognize accurately.

Both improve recognition accuracy for your use case. Neither changes the output format.

## How prompting works

Universal-3.5 Pro is trained to use context about the audio — its domain, topic, or scenario — to better recognize the vocabulary that context makes likely. A call described as a cardiology consultation primes the model for medical terminology; a call described as an order-status check primes it for order IDs and product names.

The transcription instruction itself is built in and managed by AssemblyAI. You don't need to tell the model *how* to transcribe — verbatim behavior, punctuation, and formatting are already optimized for streaming and turn detection. The `prompt` parameter carries **context about your audio**, not instructions: formatting or behavioral commands in the prompt (such as punctuation rules) are not supported.

The model is trained to stay grounded in the audio: context that turns out to be irrelevant or only partially applicable does not cause the model to insert words that weren't spoken. This means you can safely send the same context with every session of a longer interaction — for example, the same call description for every segment of a conversation — even though only some of it applies at any given moment.

<Tip>
  **Start with no prompt**

  We strongly recommend testing with no `prompt` first. Universal-3.5 Pro is optimized out of the box, and context helps most when your audio contains domain-specific vocabulary the model is getting wrong. Add context when you see those errors, starting with the broadest level that describes your use case.
</Tip>

## Contextual prompting

Contextual prompts work at three levels of specificity. Use the least specific level that covers your use case, and add detail when your audio contains uncommon names or terms the model can't otherwise know.

| Level        | Length      | What it contains                                            | Example                                                                                                                                                     |
| ------------ | ----------- | ----------------------------------------------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------- |
| **Domain**   | 2–5 words   | The domain only                                             | `Medical consultation call.`                                                                                                                                |
| **Scenario** | 5–15 words  | What the conversation is about                              | `Cardiology consultation about chest pain symptoms.`                                                                                                        |
| **Detailed** | 20–50 words | Full description, including names, products, or identifiers | `Cardiology consultation between Dr. Smith and an elderly patient regarding recurring chest pain, ECG results, and medication adjustment for hypertension.` |

**Domain context** tells the model what field the audio belongs to — medical, legal, technical support, food ordering. This is the safest starting point and is often enough to fix vocabulary errors.

**Scenario context** describes the specific situation. This is the right level for most applications that know what kind of call is taking place — appointment booking, billing inquiry, delivery complaint.

**Detailed context** describes everything your application already knows about the conversation: participant names, account or order identifiers, products, locations. This level is the most powerful when the audio contains proper nouns and identifiers — for example, a voice agent platform that already knows who is calling and why can pass that information so the model spells names and IDs correctly.

Guidelines for writing contextual prompts:

* Write plain, complete sentences that describe the audio — you are describing a recording, not commanding the model.
* Keep it to one short block of text. Don't pack lists of keywords into the contextual prompt — that's what [`keyterms_prompt`](#keyterms-prompting) is for.
* Specificity follows knowledge: only include details you actually know about the call. Wrong details won't corrupt the transcript, but they don't help either.
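
The "use the most specific level you actually know" guideline can be sketched as a small helper. This is a minimal illustration, not part of any AssemblyAI SDK: `build_context_prompt` and its arguments are hypothetical names, and the fallback order (detailed, then scenario, then domain) is an assumption based on the guidelines above.

```python
from typing import Optional


def build_context_prompt(
    domain: str,
    scenario: Optional[str] = None,
    details: Optional[str] = None,
) -> str:
    """Return the most specific context the application actually knows.

    Hypothetical helper: falls back from detailed -> scenario -> domain.
    """
    for candidate in (details, scenario, domain):
        if candidate and candidate.strip():
            text = candidate.strip()
            # Contextual prompts are plain sentences, so end with a period.
            return text if text.endswith(".") else text + "."
    return ""  # Nothing known: send no prompt at all.


# Only the domain is known.
print(build_context_prompt("Medical consultation call"))

# The application also knows the scenario, so that level is used instead.
print(build_context_prompt(
    "Medical consultation call",
    scenario="Cardiology consultation about chest pain symptoms",
))
```

Returning an empty string when nothing is known keeps the "start with no prompt" recommendation as the default path.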

### Accuracy impact

We benchmarked contextual prompting on 20,000 real voice-agent calls at each of the three levels. Accuracy improves monotonically with context specificity, and the gains are largest for the entities each level describes. All figures below are relative changes versus no prompt; negative values are reductions.

| Improvement vs. no prompt         | Domain | Scenario | Detailed |
| --------------------------------- | ------ | -------- | -------- |
| Word error rate (WER)             | −5%    | −10%     | −21%     |
| Hallucinated words                | −9%    | −12%     | −19%     |
| Entity error rate — overall       | −2%    | −7%      | −29%     |
| Entity error rate — names         | −5%    | −16%     | −49%     |
| Entity error rate — places        | −9%    | −21%     | −44%     |
| Entity error rate — medical terms | −2%    | −24%     | −43%     |

Two takeaways for choosing a level:

* **Scenario context is the practical default.** It needs only the kind of information any application already has (the type of call), and it already cuts overall WER by about 10% and entity errors on names and places by 16–21%.
* **Detailed context is the upper bound.** When your application can supply specifics it already knows (caller name, account or order IDs, products), entity accuracy improves dramatically: errors on names nearly halve. This is the most powerful option for voice agents with access to customer context.

Across all levels, turn detection is unaffected. Hallucinated and fabricated words go *down* with more context — describing the audio makes the model less likely to invent content, not more.
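
Because the table reports relative reductions, the absolute effect depends on your own baseline. The sketch below applies the WER row to a hypothetical 10% no-prompt baseline; that baseline is an assumption chosen for illustration and is not a benchmark figure.

```python
def apply_relative_reduction(baseline_rate: float, reduction_pct: float) -> float:
    """Return the error rate after a relative reduction (pass 21 for -21%)."""
    return baseline_rate * (1 - reduction_pct / 100)


# Hypothetical baseline: 10% WER with no prompt (illustrative only).
baseline_wer = 10.0

# Relative WER reductions taken from the table above.
for level, reduction in [("Domain", 5), ("Scenario", 10), ("Detailed", 21)]:
    new_wer = apply_relative_reduction(baseline_wer, reduction)
    print(f"{level}: {new_wer:.1f}% WER")

# Prints:
# Domain: 9.5% WER
# Scenario: 9.0% WER
# Detailed: 7.9% WER
```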

### Specifying the language

<Tip>
  **Prefer the `language_code` parameter**

  Language selection is best supported through the `language_code` connection parameter — see [Language selection](/streaming/getting-started/optimizing-accuracy-and-latency#language-selection). You can also specify the language in natural language as part of your contextual prompt, as described below.
</Tip>

State the language of the audio as part of your context. Knowing the language helps the model disambiguate similar sounds: the same sound could be transcribed as "si" in a Spanish call but as "C" in an English one.

```
Spanish customer support call about a billing inquiry.
```

For multilingual audio, name all expected languages:

```
Multilingual conversation in English, Spanish, and German.
```

## Keyterms prompting

Use the `keyterms_prompt` parameter to boost recognition of specific names, brands, or domain terms. Pass an array of terms you want the model to prioritize:

```
keyterms_prompt=["Keanu Reeves", "AssemblyAI", "Universal-2"]
```

Keyterms prompting is the right tool when you have an explicit vocabulary list — contact names, product catalogs, medical terms — rather than a description of the conversation. Like contextual prompting, it is trained to be robust to terms that don't end up being spoken, so you can pass your full list to every session.

<Tip>
  **Start with no keyterms**

  **We strongly recommend starting with no `keyterms_prompt` and adding terms only as needed**, based on words that matter for your use case and that the model consistently gets wrong.

  Including a large number of terms or common terms that are well represented in the training data could lead to overcorrections and hallucinations.
</Tip>

**Limits:**

* You can include a maximum of **100 keyterms** per session.
* Each individual keyterm string must be **50 characters or fewer**.
* Keyterms longer than 50 characters are ignored; requests with more than 100 keyterms return an error.
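
A client-side pre-check can catch these limits before you open a session. The following is a minimal sketch: `prepare_keyterms` is a hypothetical helper, not an SDK function, and the de-duplication step is an added assumption rather than documented server behavior.

```python
MAX_KEYTERMS = 100       # per-session limit stated above
MAX_KEYTERM_CHARS = 50   # per-term limit stated above


def prepare_keyterms(terms):
    """Drop empty, duplicate, and over-length terms, then enforce the 100-term cap."""
    seen = set()
    cleaned = []
    for term in terms:
        term = term.strip()
        # Over-length terms would be ignored by the service, so drop them here.
        if not term or len(term) > MAX_KEYTERM_CHARS or term in seen:
            continue
        seen.add(term)
        cleaned.append(term)
    # More than 100 terms would return an error, so fail early on the client.
    if len(cleaned) > MAX_KEYTERMS:
        raise ValueError(
            f"{len(cleaned)} keyterms exceeds the limit of {MAX_KEYTERMS}"
        )
    return cleaned


keyterms = prepare_keyterms(["Dr. Smith", "ECG", "ECG", "x" * 51])
print(keyterms)
```

Raising on an oversized list, instead of silently truncating it, makes it explicit which terms need to be cut.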

**Best practices:**

* **Specify unique terminology.** Include proper names, company names, technical terms, or vocabulary specific to your domain that might not be commonly recognized.
* **Use exact spelling and capitalization.** Provide keyterms with the precise spelling and capitalization you expect to see in the output transcript.
* **Avoid common words.** Do not include single, common English words (e.g., "information") as keyterms. The system is generally proficient with such words, and adding them as keyterms can be redundant.

## Quickstart

Set `prompt` and `keyterms_prompt` as connection parameters when you open the WebSocket. The example below uses both together, with `prompt` describing the conversation and `keyterms_prompt` enumerating specific terms.

<Tabs>
  <Tab language="python" title="Python" default>
    ```python theme={null}
    import json

    CONNECTION_PARAMS = {
        "sample_rate": 16000,
        "speech_model": "universal-3-5-pro",
        "prompt": "Cardiology consultation about chest pain symptoms.",
        "keyterms_prompt": json.dumps(["Dr. Smith", "AssemblyAI", "ECG"]),
    }
    ```
  </Tab>

  <Tab language="python-sdk" title="Python SDK">
    ```python theme={null}
    client.connect(
        StreamingParameters(
            sample_rate=16000,
            speech_model="universal-3-5-pro",
            prompt="Cardiology consultation about chest pain symptoms.",
            keyterms_prompt=["Dr. Smith", "AssemblyAI", "ECG"],
        )
    )
    ```
  </Tab>

  <Tab language="javascript" title="Javascript">
    ```javascript theme={null}
    const CONNECTION_PARAMS = {
      sample_rate: 16000,
      speech_model: "universal-3-5-pro",
      prompt: "Cardiology consultation about chest pain symptoms.",
      keyterms_prompt: JSON.stringify(["Dr. Smith", "AssemblyAI", "ECG"]),
    };
    ```
  </Tab>

  <Tab language="javascript-sdk" title="JavaScript SDK">
    ```javascript theme={null}
    const transcriber = client.streaming.transcriber({
      sampleRate: 16_000,
      speechModel: "universal-3-5-pro",
      prompt: "Cardiology consultation about chest pain symptoms.",
      keytermsPrompt: ["Dr. Smith", "AssemblyAI", "ECG"],
    });
    ```
  </Tab>
</Tabs>
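
For the raw WebSocket variants, the parameters travel in the connection URL's query string. The sketch below shows one way to encode them with the standard library. The `BASE_URL` value and the `build_streaming_url` helper are assumptions for illustration; confirm the endpoint against the API reference before using it.

```python
import json
from urllib.parse import parse_qs, urlencode, urlparse

# Assumed endpoint for illustration; check the API reference for the real one.
BASE_URL = "wss://streaming.assemblyai.com/v3/ws"

CONNECTION_PARAMS = {
    "sample_rate": 16000,
    "speech_model": "universal-3-5-pro",
    "prompt": "Cardiology consultation about chest pain symptoms.",
    # The keyterms list is JSON-encoded, as in the quickstart above.
    "keyterms_prompt": json.dumps(["Dr. Smith", "AssemblyAI", "ECG"]),
}


def build_streaming_url(base_url: str, params: dict) -> str:
    """Append URL-encoded connection parameters to the WebSocket endpoint."""
    return f"{base_url}?{urlencode(params)}"


url = build_streaming_url(BASE_URL, CONNECTION_PARAMS)

# Round-trip check: the prompt and keyterms survive URL encoding intact.
query = parse_qs(urlparse(url).query)
print(query["prompt"][0])
print(json.loads(query["keyterms_prompt"][0]))
```

`urlencode` percent-encodes spaces, quotes, and brackets, so free-text prompts and JSON arrays do not need manual escaping.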

## Updating mid-stream

Both `prompt` and `keyterms_prompt` can be updated during an active streaming session without reconnecting. This is useful when your application learns more about the conversation as it progresses, or when a voice agent moves between conversation stages.

Send an `UpdateConfiguration` message with the new values:

<Tabs>
  <Tab language="python" title="Python" default>
    ```python theme={null}
    # Update the contextual prompt as the conversation evolves
    websocket.send('{"type": "UpdateConfiguration", "prompt": "Now collecting payment details."}')

    # Replace or establish a new set of keyterms
    websocket.send('{"type": "UpdateConfiguration", "keyterms_prompt": ["Universal-3"]}')

    # Remove keyterms and reset context biasing
    websocket.send('{"type": "UpdateConfiguration", "keyterms_prompt": []}')
    ```
  </Tab>

  <Tab language="python-sdk" title="Python SDK">
    ```python theme={null}
    # Update the contextual prompt as the conversation evolves
    client.update_configuration(prompt="Now collecting payment details.")

    # Replace or establish a new set of keyterms
    client.update_configuration(keyterms_prompt=["Universal-3"])

    # Remove keyterms and reset context biasing
    client.update_configuration(keyterms_prompt=[])
    ```
  </Tab>

  <Tab language="javascript" title="Javascript">
    ```javascript theme={null}
    // Update the contextual prompt as the conversation evolves
    websocket.send(
      '{"type": "UpdateConfiguration", "prompt": "Now collecting payment details."}'
    );

    // Replace or establish a new set of keyterms
    websocket.send(
      '{"type": "UpdateConfiguration", "keyterms_prompt": ["Universal-3"]}'
    );

    // Remove keyterms and reset context biasing
    websocket.send('{"type": "UpdateConfiguration", "keyterms_prompt": []}');
    ```
  </Tab>

  <Tab language="javascript-sdk" title="JavaScript SDK">
    ```javascript theme={null}
    // Update the contextual prompt as the conversation evolves
    transcriber.updateConfiguration({ prompt: "Now collecting payment details." });

    // Replace or establish a new set of keyterms
    transcriber.updateConfiguration({ keytermsPrompt: ["Universal-3"] });

    // Remove keyterms and reset context biasing
    transcriber.updateConfiguration({ keytermsPrompt: [] });
    ```
  </Tab>
</Tabs>

Providing a new keyterms array completely replaces the existing set; sending an empty array `[]` removes all keyterms and resets context biasing to the default state. New values take effect immediately for subsequent audio processing.

See [Updating configuration mid-stream](/streaming/updating-configuration-mid-stream) for the full list of parameters you can update mid-stream.
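
If you build the raw `UpdateConfiguration` messages by hand, serializing them with `json.dumps` avoids quoting mistakes when prompts contain quotes or non-ASCII text. This is a minimal sketch: `build_update_message` is a hypothetical helper, and it assumes only the `type`, `prompt`, and `keyterms_prompt` fields shown in the examples above.

```python
import json
from typing import List, Optional


def build_update_message(
    prompt: Optional[str] = None,
    keyterms_prompt: Optional[List[str]] = None,
) -> str:
    """Serialize an UpdateConfiguration message, including only the fields provided."""
    message = {"type": "UpdateConfiguration"}
    if prompt is not None:
        message["prompt"] = prompt
    # An empty list is meaningful (it clears all keyterms), so test against None.
    if keyterms_prompt is not None:
        message["keyterms_prompt"] = list(keyterms_prompt)
    return json.dumps(message)


# Update the contextual prompt only.
print(build_update_message(prompt="Now collecting payment details."))

# Clear all keyterms.
print(build_update_message(keyterms_prompt=[]))
```

The resulting string can be passed to `websocket.send(...)` in place of the hand-written JSON literals.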

## What prompting is not

* **Prompting does not control output formatting.** Instructions in the prompt (punctuation rules, verbatim directives, formatting commands) are not supported — transcription behavior is managed internally and already optimized for streaming and turn detection.

## Looking for async prompting?

Prompting behavior differs between streaming and async (pre-recorded) use cases. This guide covers prompting for **streaming (real-time audio)**. If you're working with pre-recorded audio, see the [Prompting Guide (Async)](/pre-recorded-audio/universal-3-pro/prompting) and [Keyterms prompting (Async)](/pre-recorded-audio/universal-3-pro/prompting#keyterms-prompting).
