Build an Agora voice agent with AssemblyAI's Voice Agent API
Run a server-side bot that joins an Agora RTC channel, listens to the caller's microphone, and replies with synthesized speech — all powered by a single WebSocket to AssemblyAI's Voice Agent API. No separate STT, LLM, or TTS services to wire up.



Why combine Agora and the Voice Agent API
Agora gives you battle-tested WebRTC: low-latency audio routing across 200+ countries, automatic codec negotiation, jitter buffers, NAT traversal, and SDKs for every client platform. The Voice Agent API supplies the AI brain (STT, LLM, and TTS) over a single WebSocket connection.
Architecture
The system has three layers:
- Client apps (browser, mobile, or desktop) joined to an Agora RTC channel
- A server-side Python bot that joins the same channel and bridges audio in both directions
- AssemblyAI's Voice Agent API, reached over a single WebSocket, handling STT, LLM, and TTS
The bot resamples between Agora's 16 kHz and the Voice Agent API's 24 kHz using SciPy's polyphase filter. Both sides use PCM16 mono.
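As a sketch, the resampler might look like the following (assuming NumPy and SciPy are installed; `resample_pcm16` is the helper name the snippets below use, and this body is an illustrative implementation, not the repo's exact code):

```python
import numpy as np
from scipy.signal import resample_poly
from math import gcd

def resample_pcm16(pcm: bytes, src_hz: int, dst_hz: int) -> bytes:
    """Resample raw PCM16 mono bytes between sample rates with a polyphase filter."""
    if src_hz == dst_hz:
        return pcm
    samples = np.frombuffer(pcm, dtype=np.int16)
    g = gcd(dst_hz, src_hz)
    up, down = dst_hz // g, src_hz // g  # 16 kHz -> 24 kHz is up=3, down=2
    out = resample_poly(samples.astype(np.float32), up, down)
    return np.clip(out, -32768, 32767).astype(np.int16).tobytes()
```

`resample_poly` applies an anti-aliasing FIR filter during the rate change, which avoids the artifacts a naive sample-duplication approach would introduce.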
Prerequisites
- Python 3.10+
- An Agora project with an App ID (and App Certificate if enabled)
- An AssemblyAI API key — free tier available
- Linux or macOS (the Agora native server SDK does not officially ship Windows wheels; use WSL2 or a Linux container on Windows)
Quick start
1. Clone and install
```shell
git clone https://github.com/kelsey-aai/voice-agent-agora
cd voice-agent-agora
python -m venv .venv && source .venv/bin/activate
pip install -r requirements.txt
```
2. Configure credentials
```shell
cp .env.example .env
```
Edit .env:
```
ASSEMBLYAI_API_KEY=your_assemblyai_key
AGORA_APP_ID=your_agora_app_id
AGORA_APP_CERTIFICATE=your_agora_app_certificate
AGORA_CHANNEL=voice-agent-demo
AGORA_BOT_UID=9999
```
If your Agora project has App Certificate disabled, leave AGORA_APP_CERTIFICATE blank.
3. Run the bot
```shell
python bot.py --channel voice-agent-demo
```
4. Connect a client
Open Agora's Web demo, enter your App ID, the same channel name, a different UID, and click Join. Speak — the bot transcribes you live, the LLM replies, and the synthesized voice plays back through your browser.
How it works
The bridge is two cooperating asyncio tasks — one pulling caller audio out of Agora and pushing it to AssemblyAI, the other pulling reply audio out of AssemblyAI and pushing it back into Agora.
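Stripped of the Agora and WebSocket specifics, the two-task pattern looks like this (`pump`, `run_bridge`, and the sink callables are illustrative names):

```python
import asyncio

async def pump(src: asyncio.Queue, sink) -> None:
    # Drain one queue into a sink callable until a None sentinel arrives.
    while (chunk := await src.get()) is not None:
        sink(chunk)

async def run_bridge(inbound: asyncio.Queue, outbound: asyncio.Queue,
                     to_assemblyai, to_agora) -> None:
    # One task per direction; gather keeps both alive and surfaces an
    # exception from either side.
    await asyncio.gather(
        pump(inbound, to_assemblyai),
        pump(outbound, to_agora),
    )
```

Using two independent queues means a slow downlink (say, a long TTS reply) never blocks caller audio from reaching AssemblyAI.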
1. Connect to the Voice Agent API
```python
URL = "wss://agents.assemblyai.com/v1/ws"
headers = {"Authorization": f"Bearer {API_KEY}"}

async with websockets.connect(URL, additional_headers=headers) as ws:
    await ws.send(json.dumps({
        "type": "session.update",
        "session": {
            "system_prompt": "You are a friendly voice assistant.",
            "greeting": "Hi — I just joined the call.",
            "input": {"format": {"encoding": "audio/pcm"}},
            "output": {"voice": "ivy", "format": {"encoding": "audio/pcm"}},
        },
    }))
```
session.update is the first message and configures personality, greeting, and voice. The default audio format is audio/pcm — 24 kHz, 16-bit signed LE, mono.
2. Pull caller audio out of Agora
The bot registers an IAudioFrameObserver whose on_playback_audio_frame_before_mixing hook fires every 10 ms with one participant's audio frame. We resample 16 kHz → 24 kHz with SciPy's polyphase filter:
```python
def on_playback_audio_frame_before_mixing(self, channel_id, uid, frame):
    pcm16 = bytes(frame.buffer)  # 16 kHz PCM16
    pcm24 = resample_pcm16(pcm16, 16_000, 24_000)
    loop.call_soon_threadsafe(agent.inbound_audio.put_nowait, pcm24)
    return 0
```
call_soon_threadsafe is required because Agora's observer runs on a native C++ thread, not the asyncio loop.
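The thread hand-off can be demonstrated in isolation (a plain Python thread stands in for Agora's native callback thread here):

```python
import asyncio
import threading

def native_callback(loop, queue, data):
    # Runs on a non-asyncio thread: call_soon_threadsafe schedules
    # put_nowait on the loop's own thread, which is the only safe way
    # to touch an asyncio.Queue from outside the loop.
    loop.call_soon_threadsafe(queue.put_nowait, data)

async def receive_one():
    loop = asyncio.get_running_loop()
    q = asyncio.Queue()
    t = threading.Thread(target=native_callback, args=(loop, q, b"frame"))
    t.start()
    t.join()
    return await q.get()
```

Calling `q.put_nowait` directly from the foreign thread would race the event loop's internal state; `call_soon_threadsafe` also wakes the loop if it is idle.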
3. Stream audio to AssemblyAI
```python
chunk = await mic_queue.get()
await ws.send(json.dumps({
    "type": "input.audio",
    "audio": base64.b64encode(chunk).decode(),
}))
```
4. Publish the reply back into Agora
When reply.audio events arrive, we decode the base64 PCM, resample 24 kHz → 16 kHz, and hand it to AudioPcmDataSender:
```python
elif t == "reply.audio":
    pcm = base64.b64decode(event["data"])
    await self.outbound_audio.put(pcm)
```
A separate sender task drains the queue, resamples, and publishes:
```python
pcm16 = resample_pcm16(pcm24, 24_000, 16_000)
self.pcm_sender.send_audio_pcm_data(
    pcm16, 0, len(pcm16) // 2, 2, 1, 16_000,
)
```
We pace the pushes to wall-clock time so a long reply doesn't blast into Agora's buffer in one go — that keeps barge-in responsive.
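A minimal version of that pacing loop (`paced_sender` and `send_frame` are illustrative names, not the repo's exact API):

```python
import asyncio
import time

async def paced_sender(outbound: asyncio.Queue, send_frame,
                       sample_rate_hz: int = 16_000) -> None:
    # Push each chunk, then sleep for its playback duration, so audio
    # enters Agora at roughly real-time rate instead of all at once.
    deadline = time.monotonic()
    while (pcm := await outbound.get()) is not None:
        send_frame(pcm)
        duration = (len(pcm) // 2) / sample_rate_hz  # PCM16 mono: 2 bytes/sample
        deadline = max(deadline, time.monotonic()) + duration
        await asyncio.sleep(deadline - time.monotonic())
```

Tracking a running deadline (rather than sleeping a fixed interval) absorbs scheduling jitter without drifting, and a small queue plus real-time pacing is exactly what makes the barge-in flush in the next step effective.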
5. Handle barge-in
```python
elif t == "reply.done" and event.get("status") == "interrupted":
    while not outbound_audio.empty():
        outbound_audio.get_nowait()
```
The Voice Agent API also trims the transcript.agent event to what the bot actually got out before it was cut off — useful for accurate logging.
Tuning
Pick a different voice
"output": {"voice": "james"} # conversational US male
"output": {"voice": "sophie"} # clear UK female
"output": {"voice": "diego"} # Latin American Spanish
"output": {"voice": "arjun"} # Hindi/HinglishBrowse the full Voices catalog. Multilingual voices code-switch with English automatically.
Adjust turn detection
"input": {
"turn_detection": {
"vad_threshold": 0.5,
"min_silence": 600,
"max_silence": 1500,
"interrupt_response": True,
}
}Boost domain-specific words
"input": {"keyterms": ["AssemblyAI", "Agora", "Universal-3"]}Add tools
Register functions on session.tools to let the agent look up data, hit APIs, or trigger workflows. Full pattern in the tool calling docs.
Troubleshooting
agora-python-server-sdk install fails on macOS. The package ships pre-built C++ wheels for Linux and macOS. If pip falls back to source build, install Xcode command-line tools (xcode-select --install) or run the bot in a Linux container.
Bot joins but stays silent. Check that your client connected with the same AGORA_CHANNEL name and a different UID than AGORA_BOT_UID. Agora rejects duplicate UIDs.
UNAUTHORIZED close from AssemblyAI. API key missing, expired, or wrong. Pull a fresh one from the AssemblyAI dashboard.
Audio sounds chipmunky or sluggish. Sample-rate mismatch. Confirm set_playback_audio_frame_before_mixing_parameters(channels=1, sample_rate_hz=16000) and that resampling is on between Agora's 16 kHz and the API's 24 kHz.
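A quick arithmetic check helps here (`expected_frame_bytes` is an illustrative helper, not part of either SDK): a 10 ms PCM16 mono frame should be 320 bytes on the Agora side and 480 bytes on the AssemblyAI side.

```python
def expected_frame_bytes(sample_rate_hz: int, frame_ms: int = 10,
                         channels: int = 1, bytes_per_sample: int = 2) -> int:
    # bytes = samples-per-frame * channels * bytes-per-sample
    return sample_rate_hz * frame_ms // 1000 * channels * bytes_per_sample
```

If the frames you observe on either leg don't match these sizes, the resampling step is misconfigured.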
Bot interrupts itself. Acoustic loop somewhere — usually one client has speakers + mic open without echo cancellation. Browser clients should request getUserMedia({ audio: { echoCancellation: true } }).
Token errors from Agora. If your project has App Certificate enabled, AGORA_APP_CERTIFICATE must be set and the bot UID + channel name must match what you signed.
Full troubleshooting guide: Voice Agent API docs.
Known limitations
- agora-python-server-sdk is a beta wrapper around Agora's native C++ SDK. Class layouts have moved between minor versions. We pin 2.2.4 and document the exact API surface the bot uses.
- Agora's recommended path for new voice-agent projects is the Conversational AI Engine — a hosted REST service. Use this tutorial when you want the full AI pipeline on AssemblyAI's Voice Agent API.
- No Windows wheels. Run inside WSL2 or a Linux Docker container.
Frequently asked questions
What is the AssemblyAI Voice Agent API?
A single WebSocket endpoint that handles the entire voice agent pipeline server-side — speech recognition on Universal-3 Pro Streaming, LLM reasoning, and TTS with 30+ voices. It includes neural turn detection, barge-in, and tool calling.
How do I connect the Voice Agent API to Agora?
Run a server-side bot with agora-python-server-sdk. The bot joins the Agora channel, registers an IAudioFrameObserver to capture caller audio (16 kHz PCM), resamples to 24 kHz, and forwards each chunk to the Voice Agent API. Reply audio comes back, gets resampled to 16 kHz, and is published via AudioPcmDataSender.
Can I use Agora's Conversational AI Engine instead?
Yes — it supports AssemblyAI as the STT provider, but uses Agora's LLM and TTS layers. Use this tutorial when you want the full AI pipeline on AssemblyAI's Voice Agent API.
What audio format does it use with Agora?
The Voice Agent API defaults to audio/pcm at 24 kHz. Agora delivers 16 kHz PCM, so the bot resamples 16 kHz ↔ 24 kHz on each side using SciPy's polyphase filter.
How does barge-in work?
The Voice Agent API emits reply.done with status: "interrupted". The bridge flushes its outbound audio queue so the bot stops talking immediately.
Do I need an Agora App Certificate?
Only if your Agora project has it enabled. If so, set AGORA_APP_CERTIFICATE in .env. If disabled, leave it blank.
How much does it cost?
AssemblyAI offers a free tier. For current pricing, see the AssemblyAI pricing page.

