Keyterms prompting

Streaming Keyterms Prompting

The keyterms prompting feature helps improve recognition accuracy for specific words and phrases that are important to your use case. Keyterms prompting is supported for Universal-3 Pro, Universal-Streaming English, and Universal-Streaming Multilingual.

Keyterms Prompting costs an additional $0.04/hour.

Quickstart

$ pip install websocket-client pyaudio
import json
import threading
import time
from datetime import datetime
from urllib.parse import urlencode

import pyaudio
import websocket

# --- Configuration ---
YOUR_API_KEY = "YOUR-API-KEY"  # Replace with your actual API key

CONNECTION_PARAMS = {
    "sample_rate": 16000,
    "speech_model": "u3-rt-pro",
    "keyterms_prompt": json.dumps(["Keanu Reeves", "AssemblyAI", "Universal-2"])
}
API_ENDPOINT_BASE_URL = "wss://streaming.assemblyai.com/v3/ws"
API_ENDPOINT = f"{API_ENDPOINT_BASE_URL}?{urlencode(CONNECTION_PARAMS)}"

# Audio configuration
FRAMES_PER_BUFFER = 800  # 50 ms of audio (0.05 s * 16000 Hz)
SAMPLE_RATE = CONNECTION_PARAMS["sample_rate"]
CHANNELS = 1
FORMAT = pyaudio.paInt16

# Shared state for the audio stream and websocket connection
audio = None
stream = None
ws_app = None
audio_thread = None
stop_event = threading.Event()  # Signals the audio thread to stop

# --- WebSocket Event Handlers ---

def on_open(ws):
    """Called when the WebSocket connection is established."""
    print("WebSocket connection opened.")
    print(f"Connected to: {API_ENDPOINT}")

    def stream_audio():
        # Pump microphone audio to the server until signalled to stop.
        global stream
        print("Starting audio streaming...")
        while not stop_event.is_set():
            try:
                audio_data = stream.read(FRAMES_PER_BUFFER, exception_on_overflow=False)
                # Audio is sent to the server as a binary frame.
                ws.send(audio_data, websocket.ABNF.OPCODE_BINARY)
            except Exception as e:
                # A failed read usually means the stream was closed; stop looping.
                print(f"Error streaming audio: {e}")
                break
        print("Audio streaming stopped.")

    global audio_thread
    # Daemon thread so the main thread can exit even if streaming is still running.
    audio_thread = threading.Thread(target=stream_audio, daemon=True)
    audio_thread.start()
63
def on_message(ws, message):
    """Print session lifecycle events and transcripts received from the server."""
    try:
        payload = json.loads(message)
        kind = payload.get('type')

        if kind == "Begin":
            session_id = payload.get('id')
            expires_at = payload.get('expires_at')
            print(
                f"\nSession began: ID={session_id}, "
                f"ExpiresAt={datetime.fromtimestamp(expires_at)}"
            )
        elif kind == "Turn":
            text = payload.get('transcript', '')
            if payload.get('turn_is_formatted', False):
                # Blank out the in-progress partial line, then print the final transcript.
                print('\r' + ' ' * 80 + '\r', end='')
                print(text)
            else:
                # Overwrite the current console line with the latest partial transcript.
                print(f"\r{text}", end='')
        elif kind == "Termination":
            audio_duration = payload.get('audio_duration_seconds', 0)
            session_duration = payload.get('session_duration_seconds', 0)
            print(
                f"\nSession Terminated: Audio Duration={audio_duration}s, "
                f"Session Duration={session_duration}s"
            )
    except json.JSONDecodeError as e:
        print(f"Error decoding message: {e}")
    except Exception as e:
        print(f"Error handling message: {e}")
91
def on_error(ws, error):
    """Called when a WebSocket error occurs; asks the audio thread to stop."""
    print(f"\nWebSocket Error: {error}")
    stop_event.set()  # Attempt to signal stop on error
97
98
def on_close(ws, close_status_code, close_msg):
    """Called when the WebSocket connection is closed.

    Releases the microphone stream and PyAudio instance, and joins the
    audio-streaming thread so the process can exit cleanly.
    """
    print(f"\nWebSocket Disconnected: Status={close_status_code}, Msg={close_msg}")

    # Ensure audio resources are released
    global stream, audio
    stop_event.set()  # Signal audio thread just in case it's still running

    if stream:
        # The stream may already have been stopped/closed elsewhere (e.g. by
        # run()'s finally block); PyAudio raises on a closed stream, and a
        # close handler must never raise, so guard the cleanup.
        try:
            if stream.is_active():
                stream.stop_stream()
            stream.close()
        except Exception as e:
            print(f"Error closing audio stream: {e}")
        stream = None
    if audio:
        audio.terminate()
        audio = None
    # Try to join the audio thread to ensure a clean exit.
    if audio_thread and audio_thread.is_alive():
        audio_thread.join(timeout=1.0)
118
119
# --- Main Execution ---
def run():
    """Open the microphone, connect to the streaming API, and relay audio.

    Blocks until the WebSocket thread exits or the user presses Ctrl+C.
    """
    global audio, stream, ws_app

    # Initialize PyAudio
    audio = pyaudio.PyAudio()

    # Open the microphone stream; bail out if no input device is available.
    try:
        stream = audio.open(
            input=True,
            frames_per_buffer=FRAMES_PER_BUFFER,
            channels=CHANNELS,
            format=FORMAT,
            rate=SAMPLE_RATE,
        )
        print("Microphone stream opened successfully.")
        print("Speak into your microphone. Press Ctrl+C to stop.")
    except Exception as e:
        print(f"Error opening microphone stream: {e}")
        if audio:
            audio.terminate()
        return  # Exit if microphone cannot be opened

    # Create the WebSocketApp wired to the handlers above.
    ws_app = websocket.WebSocketApp(
        API_ENDPOINT,
        header={"Authorization": YOUR_API_KEY},
        on_open=on_open,
        on_message=on_message,
        on_error=on_error,
        on_close=on_close,
    )

    # Run the socket on a daemon thread so the main thread can catch Ctrl+C.
    ws_thread = threading.Thread(target=ws_app.run_forever, daemon=True)
    ws_thread.start()

    try:
        # Keep the main thread alive until the socket thread ends or we're interrupted.
        while ws_thread.is_alive():
            time.sleep(0.1)
    except KeyboardInterrupt:
        print("\nCtrl+C received. Stopping...")
        stop_event.set()  # Signal audio thread to stop

        # Politely ask the server to end the session before closing the socket.
        if ws_app and ws_app.sock and ws_app.sock.connected:
            try:
                terminate_message = {"type": "Terminate"}
                print(f"Sending termination message: {json.dumps(terminate_message)}")
                ws_app.send(json.dumps(terminate_message))
                time.sleep(5)  # Let in-flight messages drain before the forceful close.
            except Exception as e:
                print(f"Error sending termination message: {e}")

        # Close the WebSocket connection (will trigger on_close).
        if ws_app:
            ws_app.close()

        ws_thread.join(timeout=2.0)
    except Exception as e:
        print(f"\nAn unexpected error occurred: {e}")
        stop_event.set()
        if ws_app:
            ws_app.close()
        ws_thread.join(timeout=2.0)
    finally:
        # Fallback cleanup; on_close normally releases these already.
        if stream and stream.is_active():
            stream.stop_stream()
        if stream:
            stream.close()
        if audio:
            audio.terminate()
        print("Cleanup complete. Exiting.")
201
202
if __name__ == "__main__":
    # Script entry point.
    run()

Configuration

To utilize keyterms prompting, you need to include your desired keyterms as query parameters in the WebSocket URL.

  • You can include a maximum of 100 keyterms per session.
  • Each individual keyterm string must be 50 characters or less in length.

How it works

Streaming Keyterms Prompting has two components to improve accuracy for your terms.

Word-level boosting

The streaming model itself is biased during inference to be more accurate at identifying words from your keyterms list. This happens in real-time as words are emitted during the streaming process, providing immediate improvements to recognition accuracy. This component is enabled by default.

Turn-level boosting

After each turn is completed, an additional boosting pass analyzes the full transcript using your keyterms list. This post-processing step, similar to formatting, provides a second layer of accuracy improvement by examining the complete context of the turn. To enable this component, set format_turns to True.

For Universal-Streaming English and Universal-Streaming Multilingual, you must set format_turns to True to enable turn-level boosting. For Universal-3 Pro (u3-rt-pro), turn-level boosting is always active since formatting is built into the model — there is no need to set format_turns.

Both stages work together to maximize recognition accuracy for your keyterms throughout the streaming process.

Dynamic keyterms prompting

Dynamic keyterms prompting allows you to update keyterms during an active streaming session using the UpdateConfiguration message. This enables you to adapt the recognition context in real-time based on conversation flow or changing requirements.

Updating keyterms during a session

To update keyterms while streaming, send an UpdateConfiguration message with a new keyterms_prompt array:

# Replace or establish new set of keyterms
websocket.send('{"type": "UpdateConfiguration", "keyterms_prompt": ["Universal-3"]}')

# Remove keyterms and reset context biasing
websocket.send('{"type": "UpdateConfiguration", "keyterms_prompt": []}')

How dynamic keyterms work

When you send an UpdateConfiguration message:

  • Replacing keyterms: Providing a new array of keyterms completely replaces the existing set. The new keyterms take effect immediately for subsequent audio processing.
  • Clearing keyterms: Sending an empty array [] removes all keyterms and resets context biasing to the default state.
  • Both boosting stages: Dynamic keyterms work with both word-level boosting (native context biasing) and turn-level boosting (metaphone-based), just like initial keyterms.

Use cases for dynamic keyterms

Dynamic keyterms are particularly useful for:

  • Context-aware voice agents: Update keyterms based on conversation stage (e.g., switching from menu items to payment terms)
  • Multi-topic conversations: Adapt vocabulary as the conversation topic changes
  • Progressive disclosure: Add relevant keyterms as new information becomes available
  • Cleanup: Remove keyterms that are no longer relevant to reduce processing overhead

Important notes

  • Keyterms prompts longer than 50 characters are ignored.
  • Requests containing more than 100 keyterms will result in an error.

Best practices

To maximize the effectiveness of keyterms prompting:

  • Specify Unique Terminology: Include proper names, company names, technical terms, or vocabulary specific to your domain that might not be commonly recognized.
  • Exact Spelling and Capitalization: Provide keyterms with the precise spelling and capitalization you expect to see in the output transcript. This helps the system accurately identify the terms.
  • Avoid Common Words: Do not include single, common English words (e.g., “information”) as keyterms. The system is generally proficient with such words, and adding them as keyterms can be redundant.