Insights & Use Cases
March 31, 2026

LiveKit voice agent with AssemblyAI Universal-3 Pro Streaming

Build a production-ready LiveKit voice agent using AssemblyAI Universal-3 Pro Streaming. 307ms P50 latency, neural turn detection, and anti-hallucination — in Python.


Build a production-ready real-time voice agent using LiveKit Agents and the AssemblyAI Universal-3 Pro Streaming model (`u3-rt-pro`). This is the fastest path from zero to a deployed Voice AI agent — and the combination that gives you the best speech-to-text latency available today.

Why Universal-3 Pro Streaming?

307ms P50 latency. That's what separates a voice agent that feels natural from one that feels broken.

| Metric | AssemblyAI Universal-3 Pro | Deepgram Nova-3 |
| --- | --- | --- |
| P50 latency | 307 ms | 516 ms |
| P99 latency | 1,012 ms | 1,907 ms |
| Word Error Rate | 8.14% | 9.87% |
| Neural turn detection | ✅ | ❌ (VAD only) |
| Mid-session prompting | ✅ | ❌ |
| Anti-hallucination | ✅ | ❌ |
| Alphanumeric accuracy | +21% fewer errors | baseline |

*Benchmarks from Hamming.ai across 4M+ production calls.*

The turn detection difference is significant. Instead of silence-based VAD, Universal-3 Pro uses acoustic and linguistic signals together — so it knows the difference between a pause mid-sentence and an actual end-of-turn. Fewer false triggers, snappier response.
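The interplay between silence and confidence can be sketched as a simple decision rule. This is illustrative only — the real model fuses acoustic and linguistic signals internally — but the parameter names and defaults mirror the plugin options covered in the tuning section below:

```python
def should_end_turn(silence_ms: int, eot_confidence: float,
                    threshold: float = 0.4,
                    min_silence: int = 300,
                    max_silence: int = 1200) -> bool:
    """Illustrative end-of-turn rule (not the model's internals):
    - below min_silence, never end the turn
    - past max_silence, always end it
    - in between, defer to the model's end-of-turn confidence
    """
    if silence_ms < min_silence:
        return False
    if silence_ms >= max_silence:
        return True
    return eot_confidence >= threshold
```

A silence-only VAD is the degenerate case where the confidence check is ignored; the neural signal is what lets the agent wait through a mid-sentence pause without freezing on a genuine end-of-turn.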

Build your first LiveKit voice agent

Sign up for a free AssemblyAI account and start building with Universal-3 Pro Streaming today. No credit card required.

Start building

Architecture

User mic audio
     │  WebRTC (LiveKit room)
     ▼
LiveKit Cloud ──► AssemblyAI Universal-3 Pro Streaming (speech-to-text)
                       │ transcript + neural turn signal
                       ▼
                  OpenAI GPT-4o (LLM)
                       │ text response
                       ▼
                  Cartesia Sonic (TTS)
                       │ audio
                       ▼
              Back to LiveKit room
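The pipeline above maps onto a few lines of LiveKit Agents code. A minimal sketch of the entrypoint — plugin and model names assumed to match this repo's `agent.py`, which may differ in detail:

```python
from livekit import agents
from livekit.agents import Agent, AgentSession
from livekit.plugins import assemblyai, cartesia, openai


async def entrypoint(ctx: agents.JobContext):
    await ctx.connect()
    session = AgentSession(
        stt=assemblyai.STT(model="u3-rt-pro"),  # Universal-3 Pro Streaming
        llm=openai.LLM(model="gpt-4o"),
        tts=cartesia.TTS(),
    )
    await session.start(
        room=ctx.room,
        agent=Agent(instructions="You are a helpful voice assistant."),
    )


if __name__ == "__main__":
    agents.cli.run_app(agents.WorkerOptions(entrypoint_fnc=entrypoint))
```

Each stage is a pluggable component, which is what makes the swaps shown later a one-line change.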

Prerequisites

- Python 3.11+
- AssemblyAI API key — free tier available
- LiveKit Cloud account — free tier available
- OpenAI API key
- Cartesia API key

Quick start

1. Clone and install

git clone https://github.com/kelseyefoster/voice-agent-livekit-universal-3-pro
cd voice-agent-livekit-universal-3-pro

python -m venv .venv && source .venv/bin/activate
pip install -r requirements.txt

2. Configure environment

cp .env.example .env
# Edit .env with your API keys
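The variable names below are the ones this stack expects (they match the Fly.io secrets set in the deploy step); a filled-out `.env` looks like:

```shell
# .env — never commit this file
ASSEMBLYAI_API_KEY=your_key
OPENAI_API_KEY=your_key
CARTESIA_API_KEY=your_key
LIVEKIT_URL=wss://your-project.livekit.cloud
LIVEKIT_API_KEY=your_key
LIVEKIT_API_SECRET=your_secret
```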


3. Download plugin models

python agent.py download-files

4. Run locally

# Console mode — speak directly from your terminal
python agent.py console

# Dev mode — connects to LiveKit Cloud, open agents-playground.livekit.io
python agent.py dev

Open agents-playground.livekit.io, enter your LiveKit URL and API key, and start talking.

Tuning Universal-3 Pro Streaming

The three turn detection parameters give you a lot of control over how responsive vs. patient the agent feels:

stt=assemblyai.STT(
    model="u3-rt-pro",

    # How confident the model needs to be before declaring turn end (0.0-1.0)
    # Lower = faster response; higher = fewer false triggers on noisy lines
    end_of_turn_confidence_threshold=0.4,

    # Silence (ms) before the speculative end-of-turn check fires
    min_turn_silence=300,

    # Hard ceiling — force turn end after this much silence regardless
    max_turn_silence=1200,
)

**For noisy environments** (call centers, mobile): raise `end_of_turn_confidence_threshold` to `0.6`

**For fast-paced conversation**: lower `min_turn_silence` to `200`

**For healthcare or deliberate speech**: raise `max_turn_silence` to `2000`

Enabling keyterm prompting

Boost recognition accuracy for domain-specific vocabulary mid-session — no restart required:

# After session.start():
await session.stt.update_options(
    keyterms_prompt=["YourBrandName", "SpecialProduct", "TechnicalTerm"]
)

Up to 1,000 terms, each up to 50 characters. This is especially useful for medical terminology, product names, and financial jargon.
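If your term list comes from user input or a database, it's worth enforcing those limits client-side before calling `update_options`. A small helper for that — hypothetical, not part of the SDK:

```python
MAX_TERMS = 1000      # API limit on number of key terms
MAX_TERM_LEN = 50     # API limit on characters per term


def valid_keyterms(terms: list[str]) -> list[str]:
    """Trim, drop empties/duplicates/over-length terms, and cap the count
    at the documented limits before sending to keyterms_prompt."""
    cleaned: list[str] = []
    for term in terms:
        term = term.strip()
        if term and len(term) <= MAX_TERM_LEN and term not in cleaned:
            cleaned.append(term)
    return cleaned[:MAX_TERMS]
```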

Enabling real-time speaker diarization

stt=assemblyai.STT(
    model="u3-rt-pro",
    speaker_labels=True,
    max_speakers=2,  # e.g., interviewer + candidate, agent + customer
)

Swapping components

The LiveKit Agents plugin system makes it straightforward to swap any component:

# Different LLM
from livekit.plugins import anthropic
llm=anthropic.LLM(model="claude-opus-4-6")

# Different TTS
from livekit.plugins import elevenlabs
tts=elevenlabs.TTS(voice_id="your_voice_id")

# Groq for ultra-low-latency LLM inference
llm=openai.LLM.with_groq(model="llama-3.3-70b-versatile")

Deploy to Fly.io

fly launch --no-deploy
fly secrets set \
  ASSEMBLYAI_API_KEY=your_key \
  OPENAI_API_KEY=your_key \
  CARTESIA_API_KEY=your_key \
  LIVEKIT_URL=wss://... \
  LIVEKIT_API_KEY=your_key \
  LIVEKIT_API_SECRET=your_secret
fly deploy
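`fly launch` can generate a Dockerfile for you; if you write one by hand, a minimal sketch (Python version and paths assumed from the prerequisites above) might look like:

```dockerfile
FROM python:3.11-slim
WORKDIR /app

COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

COPY . .

# Pre-fetch plugin model files at build time so cold starts stay fast
RUN python agent.py download-files

CMD ["python", "agent.py", "start"]
```

Baking `download-files` into the image avoids re-downloading model weights on every machine start.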

Resources

- AssemblyAI Universal Streaming docs
- LiveKit Agents docs
- AssemblyAI LiveKit integration guide

Experiment with real-time turn detection

Try streaming transcription in our Playground and observe how punctuation and silence handling shape turn boundaries in real time.

Open playground