Streaming Migration Guide: Gladia to AssemblyAI

This guide walks through the process of migrating from Gladia to AssemblyAI for transcribing streaming audio.

Get Started

Before we begin, make sure you have an AssemblyAI account and an API key. You can sign up for a free account and get your API key from your AssemblyAI dashboard.

Side-By-Side Code Comparison

Below is a side-by-side comparison of basic snippets that transcribe live audio from a microphone with Gladia and with AssemblyAI:

import asyncio
import base64
import json
import signal

import pyaudio
import requests
from websockets.asyncio.client import connect
from websockets.exceptions import ConnectionClosedOK

# Constants
GLADIA_API_KEY = "<YOUR_GLADIA_API_KEY>"
GLADIA_API_URL = "https://api.gladia.io"

# Audio configuration
CHANNELS = 1
FORMAT = pyaudio.paInt16
FRAMES_PER_BUFFER = 3200
SAMPLE_RATE = 16_000

async def main():
    # Initialize the session
    config = {
        "encoding": "wav/pcm",
        "sample_rate": SAMPLE_RATE,
        "bit_depth": 16,
        "channels": CHANNELS,
        "language_config": {
            "languages": ["en"],
        },
    }

    # Start the live session
    response = requests.post(
        f"{GLADIA_API_URL}/v2/live",
        headers={"X-Gladia-Key": GLADIA_API_KEY},
        json=config,
        timeout=3,
    )

    if not response.ok:
        print(f"{response.status_code}: {response.text or response.reason}")
        exit(response.status_code)

    session_data = response.json()

    # Connect to the websocket
    async with connect(session_data["url"]) as websocket:
        print("\n################ Begin session ################\n")

        # Handle Ctrl+C to stop recording
        loop = asyncio.get_running_loop()
        loop.add_signal_handler(
            signal.SIGINT,
            lambda: loop.create_task(stop_recording(websocket)),
        )

        # Create tasks for sending audio and receiving transcripts
        send_audio_task = asyncio.create_task(send_audio(websocket))
        receive_transcript_task = asyncio.create_task(receive_transcript(websocket))

        await asyncio.wait([send_audio_task, receive_transcript_task])

async def stop_recording(websocket):
    print(">>>>> Ending the recording…")
    await websocket.send(json.dumps({"type": "stop_recording"}))
    await asyncio.sleep(0)

async def send_audio(websocket):
    # Initialize PyAudio
    p = pyaudio.PyAudio()

    # Open audio stream
    stream = p.open(
        format=FORMAT,
        channels=CHANNELS,
        rate=SAMPLE_RATE,
        input=True,
        frames_per_buffer=FRAMES_PER_BUFFER,
    )

    # Send audio chunks
    while True:
        data = stream.read(FRAMES_PER_BUFFER)
        data = base64.b64encode(data).decode("utf-8")
        json_data = json.dumps({"type": "audio_chunk", "data": {"chunk": data}})
        try:
            await websocket.send(json_data)
            await asyncio.sleep(0.1)  # Send audio every 100ms
        except ConnectionClosedOK:
            return

async def receive_transcript(websocket):
    # Process incoming messages
    async for message in websocket:
        content = json.loads(message)

        # Print transcripts
        if content["type"] == "transcript" and content["data"]["is_final"]:
            text = content["data"]["utterance"]["text"].strip()
            print(f"Final: {text}")

        # Print final results
        if content["type"] == "post_final_transcript":
            print("\n################ End of session ################\n")
            print(json.dumps(content, indent=2, ensure_ascii=False))

if __name__ == "__main__":
    asyncio.run(main())

Authentication

import asyncio
import base64
import json
import signal

import pyaudio
import requests
from websockets.asyncio.client import connect
from websockets.exceptions import ConnectionClosedOK

GLADIA_API_KEY = "<YOUR_GLADIA_API_KEY>"
Protect Your API Key

For improved security, store your API key as an environment variable.
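As a minimal sketch, the key can be read from the environment at startup (the variable name ASSEMBLYAI_API_KEY used here is a convention, not something the API requires):

```python
import os

# Read the key from the environment instead of hard-coding it.
# ASSEMBLYAI_API_KEY is a conventional name chosen for this example;
# the fallback placeholder keeps the snippet runnable during setup.
ASSEMBLYAI_API_KEY = os.environ.get("ASSEMBLYAI_API_KEY", "<YOUR_API_KEY>")
```

This keeps the key out of source control: set it once in your shell (for example, export ASSEMBLYAI_API_KEY=... on macOS/Linux) and the code picks it up at runtime.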

Connection Parameters & Microphone Setup

GLADIA_API_URL = "https://api.gladia.io"

CHANNELS = 1
FORMAT = pyaudio.paInt16
FRAMES_PER_BUFFER = 3200
SAMPLE_RATE = 16_000

config = {
    "encoding": "wav/pcm",
    "sample_rate": SAMPLE_RATE,
    "bit_depth": 16,
    "channels": CHANNELS,
    "language_config": {
        "languages": ["en"],
    },
}

Helpful information about our streaming model:

  • Universal-3 Pro Model — Connect to wss://streaming.assemblyai.com/v3/ws with speech_model=u3-rt-pro to use our latest, highest-accuracy streaming model — Universal-3 Pro.

  • Built-in Formatting — Universal-3 Pro always returns formatted transcripts with smart punctuation & casing. No extra parameter is needed.

  • Partial Transcripts — AssemblyAI streams interim results automatically. Universal-3 Pro emits partials during periods of silence, with at most one partial per silence period.
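The notes above translate into a connection URL rather than a JSON config body. A hedged sketch of assembling that URL (the speech_model value comes from the notes above; the sample_rate query-parameter name is an assumption to verify against the API reference):

```python
from urllib.parse import urlencode

# Sketch: AssemblyAI configures the stream via query parameters on the
# WebSocket URL instead of a POSTed config object. speech_model=u3-rt-pro
# is from the notes above; sample_rate is an assumed parameter name.
params = {
    "sample_rate": 16_000,
    "speech_model": "u3-rt-pro",
}
ENDPOINT = f"wss://streaming.assemblyai.com/v3/ws?{urlencode(params)}"
```

Because the configuration rides on the URL, there is no equivalent of Gladia's POST-to-/v2/live step before connecting.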

Open Microphone Stream & Create WebSocket

# Start the live session
response = requests.post(
    f"{GLADIA_API_URL}/v2/live",
    headers={"X-Gladia-Key": GLADIA_API_KEY},
    json=config,
    timeout=3,
)

if not response.ok:
    print(f"{response.status_code}: {response.text or response.reason}")
    exit(response.status_code)

session_data = response.json()

async def send_audio(websocket):
    # Initialize PyAudio
    p = pyaudio.PyAudio()

    # Open audio stream
    stream = p.open(
        format=FORMAT,
        channels=CHANNELS,
        rate=SAMPLE_RATE,
        input=True,
        frames_per_buffer=FRAMES_PER_BUFFER,
    )

    # Send audio chunks
    while True:
        data = stream.read(FRAMES_PER_BUFFER)
        data = base64.b64encode(data).decode("utf-8")
        json_data = json.dumps({"type": "audio_chunk", "data": {"chunk": data}})
        try:
            await websocket.send(json_data)
            await asyncio.sleep(0.1)  # Send audio every 100ms
        except ConnectionClosedOK:
            return
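One framing difference worth noting when porting send_audio: Gladia expects each PCM chunk base64-encoded inside a JSON envelope, while AssemblyAI's v3 streaming endpoint accepts the raw PCM bytes as a binary WebSocket frame. A hedged sketch of the two framings side by side (gladia_frame and assemblyai_frame are hypothetical helper names for illustration):

```python
import base64
import json

def gladia_frame(pcm: bytes) -> str:
    # Gladia: base64-encode the PCM and wrap it in a JSON envelope.
    chunk = base64.b64encode(pcm).decode("utf-8")
    return json.dumps({"type": "audio_chunk", "data": {"chunk": chunk}})

def assemblyai_frame(pcm: bytes) -> bytes:
    # AssemblyAI: the raw PCM bytes go out as-is in a binary frame,
    # so there is no envelope and no base64 size overhead.
    return pcm
```

In practice this means the AssemblyAI send loop can pass stream.read(...) straight to websocket.send(...), dropping the base64 and json.dumps steps entirely.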

Open WebSocket

# Connect to the websocket
async with connect(session_data["url"]) as websocket:
    print("\n################ Begin session ################\n")

    # Create tasks for sending audio and receiving transcripts
    send_audio_task = asyncio.create_task(send_audio(websocket))
    receive_transcript_task = asyncio.create_task(receive_transcript(websocket))

    await asyncio.wait([send_audio_task, receive_transcript_task])

Receive Messages from WebSocket

async def receive_transcript(websocket):
    # Process incoming messages
    async for message in websocket:
        content = json.loads(message)

        # Print transcripts
        if content["type"] == "transcript" and content["data"]["is_final"]:
            text = content["data"]["utterance"]["text"].strip()
            print(f"Final: {text}")

        # Print final results
        if content["type"] == "post_final_transcript":
            print("\n################ End of session ################\n")
            print(json.dumps(content, indent=2, ensure_ascii=False))

Helpful information about AssemblyAI’s message payloads:

  • Clear Message Types – Instead of checking is_final, you’ll receive explicit "Begin", "Turn", and "Termination" events, making your logic simpler and more readable.

  • Session Metadata Up-Front – The first "Begin" message delivers a session_id and expiry timestamp so you can immediately log or surface these for tracing or billing.

  • End-of-Turn Detection – Each "Turn" object includes an end_of_turn boolean. When end_of_turn is true, the transcript is a final, formatted result. When false, it is a partial transcript. Universal-3 Pro always returns formatted transcripts with smart punctuation & casing built in.

Close the WebSocket

async def stop_recording(websocket):
    print(">>>>> Ending the recording…")
    await websocket.send(json.dumps({"type": "stop_recording"}))
    await asyncio.sleep(0)

Helpful information about AssemblyAI’s WebSocket Closure:

  • Connection Diagnostics - If the socket closes unexpectedly, AssemblyAI supplies both a status code and a reason message (close_status_code, close_msg), so you know immediately whether the server timed out, refused authentication, or encountered a different error.

Session Shutdown

async with connect(session_data["url"]) as websocket:
    ...
    # Handle Ctrl+C to stop recording
    loop = asyncio.get_running_loop()
    loop.add_signal_handler(
        signal.SIGINT,
        lambda: loop.create_task(stop_recording(websocket)),
    )

Helpful information to know about AssemblyAI’s shutdown:

  • JSON Payload Difference - When closing the stream with AssemblyAI, your JSON payload will be { "type": "Terminate" } instead of { "type": "stop_recording" }.

  • No Metadata Race Condition - AssemblyAI provides session info at “Begin” and doesn’t append extra data at shutdown, making the exit faster and less error-prone.
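The payload difference above makes the AssemblyAI shutdown handler a near drop-in replacement for stop_recording. A minimal sketch (the websocket object is assumed to expose an async send(), as the websockets library used throughout this guide does):

```python
import json

async def terminate_session(websocket):
    # Request a graceful shutdown. Per the note above, AssemblyAI's
    # payload is {"type": "Terminate"} rather than Gladia's
    # {"type": "stop_recording"}.
    await websocket.send(json.dumps({"type": "Terminate"}))
```

Wire this into the same SIGINT handler shown earlier, replacing the stop_recording task with terminate_session.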

Resources

For additional information about using AssemblyAI’s Streaming Speech-to-Text API, you can also refer to: