April 2, 2026

Pipecat voice agent with AssemblyAI Universal-3 Pro Streaming

Build a real-time voice agent using Pipecat — Daily.co's open-source Voice AI framework — and the AssemblyAI Universal-3 Pro Streaming model as the speech-to-text engine.

Kelsey Foster

Growth


Pipecat's modular pipeline design means you can swap any component without touching the rest. AssemblyAI has a first-party Pipecat plugin with full Universal-3 Pro Streaming support — no manual WebSocket wiring required.
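To make the "swap any component" idea concrete, here is a minimal, framework-free sketch. It does not use Pipecat's actual classes: `Stage`, `run_pipeline`, and the toy stages are hypothetical names invented for illustration. The only point is that stages sharing one interface can be exchanged without editing their neighbours.

```python
from typing import Callable, List

# A stage is any callable that takes one value and returns the next.
# Hypothetical stand-in for a pipeline processor; not a Pipecat type.
Stage = Callable[[str], str]


def run_pipeline(stages: List[Stage], data: str) -> str:
    """Pass data through each stage in order."""
    for stage in stages:
        data = stage(data)
    return data


def toy_stt_a(audio: str) -> str:
    # Pretend transcription: strip the fake "audio:" prefix.
    return audio.removeprefix("audio:")


def toy_stt_b(audio: str) -> str:
    # A second "vendor" with the same interface; it also lower-cases.
    return audio.removeprefix("audio:").lower()


def toy_llm(text: str) -> str:
    return f"You said: {text}"


def toy_tts(text: str) -> str:
    return f"<speech>{text}</speech>"


# Swapping the STT stage changes one list element and nothing else.
pipeline_a = [toy_stt_a, toy_llm, toy_tts]
pipeline_b = [toy_stt_b, toy_llm, toy_tts]

print(run_pipeline(pipeline_a, "audio:Hello"))
print(run_pipeline(pipeline_b, "audio:Hello"))
```

In a real Pipecat bot the stages are frame processors rather than plain functions, but the design choice is the same: the speech-to-text service occupies one slot in an ordered list.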

Why AssemblyAI in Pipecat?

Metric	AssemblyAI Universal-3 Pro	Deepgram Nova-3
P50 latency	307 ms	516 ms
P99 latency	1,012 ms	1,907 ms
Word Error Rate	8.14%	9.87%
Neural turn detection	✅	❌ (VAD only)
Mid-session prompting	✅	❌
Anti-hallucination	✅	❌
Real-time diarization	✅	❌

The 41% lower P50 latency (307 ms versus 516 ms) is noticeable in live conversation, and neural turn detection means fewer awkward double-responses when users pause mid-thought.
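For readers who want to check the percentage, the short calculation below derives the relative reductions directly from the table. The helper name `relative_reduction` is ours, and the inputs are the table values; nothing here is a new measurement.

```python
def relative_reduction(ours: float, theirs: float) -> float:
    """Percentage by which `ours` is lower than `theirs`."""
    return (theirs - ours) / theirs * 100


# Values copied from the comparison table above.
p50 = relative_reduction(307, 516)      # P50 latency, ms
p99 = relative_reduction(1012, 1907)    # P99 latency, ms
wer = relative_reduction(8.14, 9.87)    # Word Error Rate, %

print(f"P50 latency: {p50:.1f}% lower")
print(f"P99 latency: {p99:.1f}% lower")
print(f"WER: {wer:.1f}% lower (relative)")
```

The P50 figure comes out at about 40.5%, which rounds to the 41% quoted above. The gap at P99 is somewhat larger, at roughly 46.9%.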

Prerequisites

Python 3.11+
AssemblyAI API key
Daily.co API key
OpenAI API key
Cartesia API key

Quick start

git clone https://github.com/kelsey-aai/voice-agent-pipecat-universal-3-pro
cd voice-agent-pipecat-universal-3-pro

python -m venv .venv && source .venv/bin/activate
pip install -r requirements.txt

cp .env.example .env
# Edit .env with your API keys

# Create a Daily.co room
python create_room.py

# Start the bot (paste the room URL from above)
python bot.py --url https://your-name.daily.co/your-room

Open the room URL in your browser and start talking.
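If the bot exits immediately, a missing API key is a likely cause. The preflight check below is a minimal sketch, not part of the repo. Only `ASSEMBLYAI_API_KEY` appears in this post; the other three variable names are assumptions, so compare them against `.env.example` before relying on this.

```python
import os
from typing import Iterable, List, Mapping

# ASSEMBLYAI_API_KEY is used in the snippets below. The other three names
# are assumptions; check .env.example for the names the repo really uses.
REQUIRED_KEYS = (
    "ASSEMBLYAI_API_KEY",
    "DAILY_API_KEY",
    "OPENAI_API_KEY",
    "CARTESIA_API_KEY",
)


def missing_keys(
    env: Mapping[str, str], required: Iterable[str] = REQUIRED_KEYS
) -> List[str]:
    """Return the required variable names that are unset or blank."""
    return [name for name in required if not env.get(name, "").strip()]


if __name__ == "__main__":
    missing = missing_keys(os.environ)
    if missing:
        print("Missing keys: " + ", ".join(missing))
    else:
        print("All required keys are set.")
```

The script reads the process environment, so export the values from `.env` (or load the file with your preferred tool) before running it.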

Universal-3 Pro Streaming features

Keyterm prompting

Boost accuracy on domain-specific vocabulary without restarting the session:

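import os

# Import paths assume a recent Pipecat release; adjust if your version differs.
from pipecat.services.assemblyai.models import AssemblyAIConnectionParams
from pipecat.services.assemblyai.stt import AssemblyAISTTService

stt = AssemblyAISTTService(
    connection_params=AssemblyAIConnectionParams(
        api_key=os.environ["ASSEMBLYAI_API_KEY"],
        speech_model="u3-rt-pro",
        # Domain-specific terms the model should favor when transcribing
        keyterms_prompt=["AssemblyAI", "Universal-3", "Pipecat", "YourBrandName"],
    )
)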

Up to 1,000 terms per session. Essential for medical, legal, and financial applications.
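Term lists assembled from product catalogs or glossaries tend to contain blanks and duplicates, and can exceed the limit. The helper below is a sketch of one way to tidy a list before passing it to `keyterms_prompt`. `prepare_keyterms` is a hypothetical name, and treating terms that differ only by case as duplicates is our assumption, not documented service behavior.

```python
from typing import Iterable, List

MAX_KEYTERMS = 1000  # per-session limit stated above


def prepare_keyterms(terms: Iterable[str], limit: int = MAX_KEYTERMS) -> List[str]:
    """Trim whitespace, drop blanks and case-insensitive duplicates, cap at `limit`."""
    seen = set()
    cleaned: List[str] = []
    for term in terms:
        if len(cleaned) >= limit:
            break  # keep the earliest terms once the cap is reached
        term = term.strip()
        key = term.casefold()
        if not term or key in seen:
            continue
        seen.add(key)
        cleaned.append(term)
    return cleaned


print(prepare_keyterms([" AssemblyAI ", "assemblyai", "", "Pipecat"]))
```

Because the function keeps the first occurrences, put your highest-priority terms at the front of the input.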

Real-time speaker diarization

connection_params=AssemblyAIConnectionParams(
    api_key=os.environ["ASSEMBLYAI_API_KEY"],
    speech_model="u3-rt-pro",
    speaker_labels=True,
    max_speakers=2,
)
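With speaker labels enabled, a common next step is to turn a stream of labeled segments into a readable transcript. The sketch below assumes you have already collected finalized segments as `(speaker, text)` tuples. That shape is hypothetical and chosen for illustration; it is not the plugin's actual frame format, so adapt the input to whatever your pipeline emits.

```python
from typing import List, Tuple

# (speaker label, text). Hypothetical shape, not the plugin's output type.
Turn = Tuple[str, str]


def merge_consecutive_turns(turns: List[Turn]) -> List[Turn]:
    """Join back-to-back segments from the same speaker into one turn."""
    merged: List[Turn] = []
    for speaker, text in turns:
        if merged and merged[-1][0] == speaker:
            # Same speaker kept talking: extend the previous turn.
            merged[-1] = (speaker, merged[-1][1] + " " + text)
        else:
            merged.append((speaker, text))
    return merged


segments = [
    ("A", "Hi, I'd like to"),
    ("A", "check my order."),
    ("B", "Sure, one moment."),
    ("A", "Thanks."),
]
for speaker, text in merge_consecutive_turns(segments):
    print(f"{speaker}: {text}")
```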

Multilingual support

connection_params=AssemblyAIConnectionParams(
    api_key=os.environ["ASSEMBLYAI_API_KEY"],
    speech_model="u3-rt-pro",
    language_detection=True,
)

Supported languages: English, Spanish, French, German, Italian, Portuguese.

Tuning turn detection

connection_params=AssemblyAIConnectionParams(
    api_key=os.environ["ASSEMBLYAI_API_KEY"],
    speech_model="u3-rt-pro",
    end_of_turn_confidence_threshold=0.7,
    min_end_of_turn_silence_when_confident=300,
    max_turn_silence=1000,
)
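One way to reason about these three settings is as a simple decision rule. The sketch below is a mental model inferred from the parameter names, not the service's implementation: the real decision runs server-side and may weigh other signals. It assumes the two silence values are in milliseconds; `TurnParams` and `should_end_turn` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class TurnParams:
    # Defaults mirror the values in the snippet above.
    end_of_turn_confidence_threshold: float = 0.7
    min_end_of_turn_silence_when_confident: int = 300  # assumed ms
    max_turn_silence: int = 1000  # assumed ms


def should_end_turn(confidence: float, silence_ms: int, p: TurnParams) -> bool:
    """Simplified model of when a turn would be considered finished."""
    # Hard ceiling: enough silence ends the turn regardless of confidence.
    if silence_ms >= p.max_turn_silence:
        return True
    # Otherwise the model must be confident AND a short pause must have passed.
    confident = confidence >= p.end_of_turn_confidence_threshold
    return confident and silence_ms >= p.min_end_of_turn_silence_when_confident


params = TurnParams()
print(should_end_turn(0.9, 350, params))   # confident, pause long enough
print(should_end_turn(0.9, 100, params))   # confident, pause too short
print(should_end_turn(0.2, 500, params))   # not confident, below the ceiling
print(should_end_turn(0.2, 1000, params))  # ceiling reached
```

Under this model, lowering the threshold or the minimum silence makes the agent respond sooner at the risk of cutting users off, while raising `max_turn_silence` gives hesitant speakers more room before the turn is forced to end.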

Deploy to PipecatCloud

pip install pipecatcloud
pcc auth login
pcc init
pcc secrets set my-agent-secrets --file .env
pcc deploy
