Insights & Use Cases
March 31, 2026

LiveKit voice agent with AssemblyAI Universal-3 Pro Streaming

Build a production-ready LiveKit voice agent using AssemblyAI Universal-3 Pro Streaming. 307ms P50 latency, neural turn detection, and anti-hallucination — in Python.


Build a production-ready real-time voice agent using LiveKit Agents and the AssemblyAI Universal-3 Pro Streaming model (`u3-rt-pro`). This is the fastest path from zero to a deployed Voice AI agent — and the combination that gives you the best speech-to-text latency available today.

Why Universal-3 Pro Streaming?

307ms P50 latency. That's what separates a voice agent that feels natural from one that feels broken.

| Metric | AssemblyAI Universal-3 Pro | Deepgram Nova-3 |
| --- | --- | --- |
| P50 latency | 307 ms | 516 ms |
| P99 latency | 1,012 ms | 1,907 ms |
| Word Error Rate | 8.14% | 9.87% |
| Neural turn detection | ✅ | ❌ (VAD only) |
| Mid-session prompting | ✅ | ❌ |
| Anti-hallucination | ✅ | ❌ |
| Alphanumeric accuracy | +21% fewer errors | baseline |

*Benchmarks from Hamming.ai across 4M+ production calls.*

The turn detection difference is significant. Instead of silence-based VAD, Universal-3 Pro uses acoustic and linguistic signals together — so it knows the difference between a pause mid-sentence and an actual end-of-turn. Fewer false triggers, snappier response.
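The interplay between silence and confidence can be sketched as a simple decision rule. This is illustrative only — the real model fuses acoustic and linguistic signals internally — but the parameter names and defaults mirror the plugin options covered in the tuning section below:

```python
def should_end_turn(silence_ms: int, eot_confidence: float,
                    threshold: float = 0.4,
                    min_silence: int = 300,
                    max_silence: int = 1200) -> bool:
    """Illustrative end-of-turn rule (not the model's internals):
    - below min_silence, never end the turn
    - past max_silence, always end it
    - in between, defer to the model's end-of-turn confidence
    """
    if silence_ms < min_silence:
        return False
    if silence_ms >= max_silence:
        return True
    return eot_confidence >= threshold
```

A silence-only VAD is the degenerate case where the confidence check is ignored; the neural signal is what lets the agent wait through a mid-sentence pause without freezing on a genuine end-of-turn.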

Build your first LiveKit voice agent

Sign up for a free AssemblyAI account and start building with Universal-3 Pro Streaming today. No credit card required.

Start building

Architecture

User mic audio
     │  WebRTC (LiveKit room)
     ▼
LiveKit Cloud ──► AssemblyAI Universal-3 Pro Streaming (speech-to-text)
                       │ transcript + neural turn signal
                       ▼
                  OpenAI GPT-4o (LLM)
                       │ text response
                       ▼
                  Cartesia Sonic (TTS)
                       │ audio
                       ▼
              Back to LiveKit room
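The pipeline above maps onto a few lines of LiveKit Agents code. A minimal sketch of the entrypoint — plugin and model names assumed to match this repo's `agent.py`, which may differ in detail:

```python
from livekit import agents
from livekit.agents import Agent, AgentSession
from livekit.plugins import assemblyai, cartesia, openai


async def entrypoint(ctx: agents.JobContext):
    await ctx.connect()
    session = AgentSession(
        stt=assemblyai.STT(model="u3-rt-pro"),  # Universal-3 Pro Streaming
        llm=openai.LLM(model="gpt-4o"),
        tts=cartesia.TTS(),
    )
    await session.start(
        room=ctx.room,
        agent=Agent(instructions="You are a helpful voice assistant."),
    )


if __name__ == "__main__":
    agents.cli.run_app(agents.WorkerOptions(entrypoint_fnc=entrypoint))
```

Each stage is a pluggable component, which is what makes the swaps shown later a one-line change.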

Prerequisites

- Python 3.11+
- AssemblyAI API key — free tier available
- LiveKit Cloud account — free tier available
- OpenAI API key
- Cartesia API key

Quick start

1. Clone and install

git clone https://github.com/kelseyefoster/voice-agent-livekit-universal-3-pro
cd voice-agent-livekit-universal-3-pro

python -m venv .venv && source .venv/bin/activate
pip install -r requirements.txt

2. Configure environment

cp .env.example .env
# Edit .env with your API keys
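The variable names below are the ones this stack expects (they match the Fly.io secrets set in the deploy step); a filled-out `.env` looks like:

```shell
# .env — never commit this file
ASSEMBLYAI_API_KEY=your_key
OPENAI_API_KEY=your_key
CARTESIA_API_KEY=your_key
LIVEKIT_URL=wss://your-project.livekit.cloud
LIVEKIT_API_KEY=your_key
LIVEKIT_API_SECRET=your_secret
```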


3. Download plugin models

python agent.py download-files

4. Run locally

# Console mode — speak directly from your terminal
python agent.py console

# Dev mode — connects to LiveKit Cloud, open agents-playground.livekit.io
python agent.py dev

Open agents-playground.livekit.io, enter your LiveKit URL and API key, and start talking.

Tuning Universal-3 Pro Streaming

The three turn detection parameters give you a lot of control over how responsive vs. patient the agent feels:

stt=assemblyai.STT(
    model="u3-rt-pro",

    # How confident the model needs to be before declaring turn end (0.0-1.0)
    # Lower = faster response; higher = fewer false triggers on noisy lines
    end_of_turn_confidence_threshold=0.4,

    # Silence (ms) before the speculative end-of-turn check fires
    min_turn_silence=300,

    # Hard ceiling — force turn end after this much silence regardless
    max_turn_silence=1200,
)

**For noisy environments** (call centers, mobile): raise `end_of_turn_confidence_threshold` to `0.6`

**For fast-paced conversation**: lower `min_turn_silence` to `200`

**For healthcare or deliberate speech**: raise `max_turn_silence` to `2000`

Enabling keyterm prompting

Boost recognition accuracy for domain-specific vocabulary mid-session — no restart required:

# After session.start():
await session.stt.update_options(
    keyterms_prompt=["YourBrandName", "SpecialProduct", "TechnicalTerm"]
)

Up to 1,000 terms, each up to 50 characters. This is especially useful for medical terminology, product names, and financial jargon.
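If your term list comes from user input or a database, it's worth enforcing those limits client-side before calling `update_options`. A small helper for that — hypothetical, not part of the SDK:

```python
MAX_TERMS = 1000      # API limit on number of key terms
MAX_TERM_LEN = 50     # API limit on characters per term


def valid_keyterms(terms: list[str]) -> list[str]:
    """Trim, drop empties/duplicates/over-length terms, and cap the count
    at the documented limits before sending to keyterms_prompt."""
    cleaned: list[str] = []
    for term in terms:
        term = term.strip()
        if term and len(term) <= MAX_TERM_LEN and term not in cleaned:
            cleaned.append(term)
    return cleaned[:MAX_TERMS]
```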

Enabling real-time speaker diarization

stt=assemblyai.STT(
    model="u3-rt-pro",
    speaker_labels=True,
    max_speakers=2,  # e.g., interviewer + candidate, agent + customer
)

Swapping components

The LiveKit Agents plugin system makes it straightforward to swap any component:

# Different LLM
from livekit.plugins import anthropic
llm=anthropic.LLM(model="claude-opus-4-6")

# Different TTS
from livekit.plugins import elevenlabs
tts=elevenlabs.TTS(voice_id="your_voice_id")

# Groq for ultra-low-latency LLM inference
llm=openai.LLM.with_groq(model="llama-3.3-70b-versatile")

Deploy to Fly.io

fly launch --no-deploy
fly secrets set \
  ASSEMBLYAI_API_KEY=your_key \
  OPENAI_API_KEY=your_key \
  CARTESIA_API_KEY=your_key \
  LIVEKIT_URL=wss://... \
  LIVEKIT_API_KEY=your_key \
  LIVEKIT_API_SECRET=your_secret
fly deploy
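`fly launch` can generate a Dockerfile for you; if you write one by hand, a minimal sketch (Python version and paths assumed from the prerequisites above) might look like:

```dockerfile
FROM python:3.11-slim
WORKDIR /app

COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

COPY . .

# Pre-fetch plugin model files at build time so cold starts stay fast
RUN python agent.py download-files

CMD ["python", "agent.py", "start"]
```

Baking `download-files` into the image avoids re-downloading model weights on every machine start.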

Resources

- AssemblyAI Universal Streaming docs
- LiveKit Agents docs
- AssemblyAI LiveKit integration guide

Experiment with real-time turn detection

Try streaming transcription in our Playground and observe how punctuation and silence handling shape turn boundaries in real time.

Open playground