April 2, 2026

Daily.co voice agent with AssemblyAI Universal-3 Pro Streaming

Build a WebRTC voice agent using Daily.co for real-time audio transport and the AssemblyAI Universal-3 Pro Streaming model for speech-to-text — without Pipecat.


This is the bare-metal Daily.co integration. It shows exactly how Daily's audio tracks connect to the AssemblyAI WebSocket, which is useful when you want to embed a voice agent into a custom Daily.co application without pulling in a full pipeline framework.


Architecture

Browser / Phone (Daily.co room participant)
        │ WebRTC audio
  Daily.co room
        │ PCM audio via daily-python SDK
  This bot (daily-python)
        │ raw PCM bytes
  AssemblyAI Universal-3 Pro WebSocket
        │ transcript + neural turn signal
  OpenAI GPT-4o → Cartesia TTS → PCM audio
  Bot sends audio back into Daily room
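
The last two stages of the diagram can be sketched as one coroutine per finished user turn. This is a minimal illustration, not the repository's code: `handle_turn` and the three injected callables are hypothetical names, and the real bot would wrap its OpenAI, Cartesia, and Daily calls behind them.

```python
import asyncio
from typing import Awaitable, Callable

# Hypothetical stage signatures (assumptions, not taken from the repository).
GenerateReply = Callable[[str], Awaitable[str]]   # LLM: transcript -> reply text
Synthesize = Callable[[str], Awaitable[bytes]]    # TTS: reply text -> PCM bytes
SendAudio = Callable[[bytes], Awaitable[None]]    # transport: PCM bytes -> room


async def handle_turn(
    transcript: str,
    generate_reply: GenerateReply,
    synthesize: Synthesize,
    send_audio: SendAudio,
) -> str:
    """Run one finished user turn through LLM -> TTS -> room playback."""
    reply = await generate_reply(transcript)
    pcm = await synthesize(reply)
    await send_audio(pcm)
    return reply


async def demo() -> tuple[str, bytes]:
    """Exercise handle_turn with in-memory stand-ins for each stage."""
    sent = bytearray()

    async def fake_llm(text: str) -> str:
        return f"You said: {text}"

    async def fake_tts(text: str) -> bytes:
        return text.encode("utf-8")  # stand-in for synthesized PCM audio

    async def fake_send(pcm: bytes) -> None:
        sent.extend(pcm)

    reply = await handle_turn("hello", fake_llm, fake_tts, fake_send)
    return reply, bytes(sent)
```

Passing the stages in as callables keeps the turn logic testable without network access and makes it easy to swap the LLM or TTS provider.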


Prerequisites

  • Python 3.11+
  • AssemblyAI API key
  • Daily.co API key
  • OpenAI API key
  • Cartesia API key

Quick start

git clone https://github.com/kelseyefoster/voice-agent-dailyco-universal-3-pro
cd voice-agent-dailyco-universal-3-pro

python -m venv .venv && source .venv/bin/activate
pip install -r requirements.txt

cp .env.example .env
# Edit .env with your API keys

# Create a room and bot token
python create_room.py

# Start the bot
python bot.py --room-url https://yourname.daily.co/room --token <bot-token>

Open the room URL in your browser and start speaking.
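
Since the bot depends on four API keys, it can help to fail fast when one is missing. The check below is a sketch under assumptions: only `ASSEMBLYAI_API_KEY` appears in this article, the other three variable names are guesses, and the authoritative names are in the repository's `.env.example`.

```python
import os
from typing import Mapping, Optional, Sequence

# ASSEMBLYAI_API_KEY appears in this article; the other names are assumptions.
REQUIRED_KEYS = (
    "ASSEMBLYAI_API_KEY",
    "DAILY_API_KEY",
    "OPENAI_API_KEY",
    "CARTESIA_API_KEY",
)


def missing_keys(env: Mapping[str, str],
                 required: Sequence[str] = REQUIRED_KEYS) -> list[str]:
    """Return the required variable names that are unset or blank."""
    return [name for name in required if not env.get(name, "").strip()]


def require_keys(env: Optional[Mapping[str, str]] = None) -> None:
    """Raise a readable error before the bot starts if any key is missing."""
    env = os.environ if env is None else env
    missing = missing_keys(env)
    if missing:
        raise RuntimeError("Missing API keys: " + ", ".join(missing))
```

Calling `require_keys()` at the top of the bot's entry point would turn a confusing mid-call authentication failure into an immediate, named error.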

How audio flows

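Daily.co calls on_audio_data on your event handler whenever a remote participant speaks. The bot forwards those raw PCM bytes directly to the AssemblyAI WebSocket. No conversion is needed when the audio arrives as 16 kHz, 16-bit PCM, matching the sample_rate and pcm_s16le encoding passed in the connection parameters: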

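def on_audio_data(self, participant_id, audio_data, sample_rate, num_channels):
    # Forward raw PCM to AssemblyAI while the WebSocket is open.
    # Assumes self.loop holds the bot's asyncio loop; the callback may run on another thread,
    # where asyncio.create_task() would fail because no loop is running there.
    if self.aai_ws and not self.aai_ws.closed:
        asyncio.run_coroutine_threadsafe(self.aai_ws.send(audio_data), self.loop)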
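
If the audio callback runs on a thread other than the one driving the asyncio loop (an assumption about the SDK's threading that this article does not state), one alternative is a small thread-safe bridge that queues chunks for a single sender task. `AudioBridge` is a hypothetical helper, not part of daily-python or the repository.

```python
import asyncio
import threading


class AudioBridge:
    """Moves PCM chunks from a non-asyncio callback thread onto an asyncio queue."""

    def __init__(self, loop: asyncio.AbstractEventLoop, max_chunks: int = 100):
        self._loop = loop
        self._queue: asyncio.Queue = asyncio.Queue(maxsize=max_chunks)

    def push(self, chunk: bytes) -> None:
        # Safe to call from any thread: the enqueue itself runs on the loop thread.
        self._loop.call_soon_threadsafe(self._enqueue, chunk)

    def _enqueue(self, chunk: bytes) -> None:
        # Runs on the loop thread. When full, drop the oldest chunk so that
        # latency stays bounded instead of growing with the backlog.
        if self._queue.full():
            self._queue.get_nowait()
        self._queue.put_nowait(chunk)

    async def pop(self) -> bytes:
        return await self._queue.get()


async def demo() -> list[bytes]:
    """Push three chunks from a worker thread and read them back in order."""
    bridge = AudioBridge(asyncio.get_running_loop())
    producer = threading.Thread(
        target=lambda: [bridge.push(bytes([i]) * 4) for i in range(3)]
    )
    producer.start()
    received = [await bridge.pop() for _ in range(3)]
    producer.join()
    return received
```

A sender task would then loop on `await bridge.pop()` and write each chunk to the WebSocket, which keeps all socket writes on one task and in arrival order.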

When Universal-3 Pro detects an end-of-turn, the bot generates a response with GPT-4o, synthesizes audio with Cartesia, and injects it back into the room via client.send_audio().
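
On the receiving side, the bot has to decide which WebSocket messages close a turn. The sketch below assumes each message is a JSON object with a `type` of `"Turn"`, a `transcript` string, and an `end_of_turn` boolean. Check those field names against the current AssemblyAI streaming reference before relying on them. `final_transcript` is a hypothetical helper name.

```python
import json
from typing import Optional


def final_transcript(raw_message: str) -> Optional[str]:
    """Return the transcript if this message closes a user turn, else None."""
    message = json.loads(raw_message)
    # Assumed message shape: {"type": "Turn", "transcript": str, "end_of_turn": bool}
    if message.get("type") != "Turn" or not message.get("end_of_turn"):
        return None
    text = message.get("transcript", "").strip()
    # Ignore empty turns (for example, silence) so the LLM is not called for nothing.
    return text or None
```

Only a non-None result would trigger the GPT-4o and Cartesia steps. Partial transcripts are ignored here, though they could drive a live caption or interruption logic.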

AssemblyAI connection parameters

AAI_WS_URL = (
    "wss://streaming.assemblyai.com/v3/ws"
    "?speech_model=u3-rt-pro"
    "&encoding=pcm_s16le"
    f"&sample_rate={SAMPLE_RATE}"
    "&end_of_turn_confidence_threshold=0.4"
    "&min_turn_silence=300"
    f"&token={ASSEMBLYAI_API_KEY}"
)
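
The same URL can be assembled with `urllib.parse.urlencode`, which percent-encodes values so a credential containing characters such as `+` or `/` does not corrupt the query string. The parameter names and defaults are copied from the snippet above. `build_aai_url` is a hypothetical helper, and whether `token` should carry the API key or a short-lived token is left as in the original.

```python
from urllib.parse import urlencode

AAI_WS_BASE = "wss://streaming.assemblyai.com/v3/ws"


def build_aai_url(
    token: str,
    sample_rate: int = 16000,
    end_of_turn_confidence_threshold: float = 0.4,
    min_turn_silence: int = 300,
) -> str:
    """Build the streaming URL with the parameters used in this article."""
    params = {
        "speech_model": "u3-rt-pro",
        "encoding": "pcm_s16le",
        "sample_rate": sample_rate,
        "end_of_turn_confidence_threshold": end_of_turn_confidence_threshold,
        "min_turn_silence": min_turn_silence,
        "token": token,
    }
    # urlencode escapes each value, so special characters survive the round trip.
    return f"{AAI_WS_BASE}?{urlencode(params)}"
```

Keeping the tunables as function arguments also makes it easier to experiment with the end-of-turn threshold and silence window without editing a string literal.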

Why direct Daily.co instead of Pipecat?

Pipecat is excellent for production voice agents with complex pipelines. This tutorial is for when you want the primitives — useful if you're embedding a voice agent into a custom Daily.co app and don't want the full framework overhead.

If you do want Pipecat's pipeline abstractions (VAD, context management, interruption handling), see Tutorial 02: Pipecat + Universal-3 Pro Streaming.
