April 2, 2026

Twilio phone agent with AssemblyAI Universal-3 Pro Streaming

Build an AI phone agent that handles real calls using Twilio Voice + Media Streams and the AssemblyAI Universal-3 Pro Streaming model for real-time speech-to-text.

The key detail: Twilio Media Streams delivers 8 kHz μ-law (mulaw) audio, and AssemblyAI Universal-3 Pro accepts encoding=pcm_mulaw at sample_rate=8000 natively, so no resampling or format conversion is needed.
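To make the "no conversion" point concrete, here is a minimal sketch of pulling the raw μ-law bytes out of a Twilio Media Streams message. It assumes the standard Twilio message shape (a JSON object with event set to "media" and a base64 media.payload); the helper names extract_mulaw and duration_ms are hypothetical and not taken from the repo.

```python
import base64
import json
from typing import Optional

SAMPLE_RATE = 8000  # 8 kHz mu-law: one byte per sample


def extract_mulaw(message: str) -> Optional[bytes]:
    """Return the raw mu-law bytes from a Twilio 'media' event, else None."""
    event = json.loads(message)
    if event.get("event") != "media":
        return None  # e.g. 'connected', 'start', 'stop' carry no audio
    return base64.b64decode(event["media"]["payload"])


def duration_ms(audio: bytes) -> float:
    """Length of a mu-law chunk in milliseconds at 8 kHz."""
    return len(audio) * 1000 / SAMPLE_RATE
```

The decoded bytes can be forwarded to the AssemblyAI WebSocket as binary frames without touching the samples, which is the reason the encoding and sample rate must match on both sides.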

Architecture

Incoming call
  Twilio Voice
     │ TwiML → open WebSocket
Your server (/media-stream WebSocket)
     │                        │
     │ mulaw 8kHz audio       │ synthesized mulaw audio
     ▼                        ▲
AssemblyAI Universal-3 Pro    ElevenLabs TTS
     │ transcript + turn signal
  OpenAI GPT-4o
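The return path in the diagram (synthesized mulaw audio back to the caller) can be sketched as follows. This assumes the TTS output is already 8 kHz μ-law and uses Twilio's outbound "media" and "clear" message shapes; the function names are hypothetical, and the repo's server may structure this differently.

```python
import base64
import json


def twilio_media_message(stream_sid: str, mulaw_audio: bytes) -> str:
    """Wrap 8 kHz mu-law audio in a Twilio outbound 'media' message."""
    return json.dumps({
        "event": "media",
        "streamSid": stream_sid,
        "media": {"payload": base64.b64encode(mulaw_audio).decode("ascii")},
    })


def twilio_clear_message(stream_sid: str) -> str:
    """Ask Twilio to drop buffered audio, e.g. when the caller interrupts."""
    return json.dumps({"event": "clear", "streamSid": stream_sid})
```

The streamSid comes from the "start" event Twilio sends when the stream opens, so the server needs to keep it for the lifetime of the call.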

Prerequisites

  • Python 3.11+
  • AssemblyAI API key
  • Twilio account with a phone number
  • OpenAI API key
  • ElevenLabs API key
  • ngrok (for local development)

Quick start

git clone https://github.com/kelseyefoster/voice-agent-twilio-universal-3-pro
cd voice-agent-twilio-universal-3-pro

python -m venv .venv && source .venv/bin/activate
pip install -r requirements.txt

cp .env.example .env
# Edit .env with your API keys

uvicorn server:app --host 0.0.0.0 --port 8000
ngrok http 8000

Configure Twilio

  1. Go to Twilio Console > Phone Numbers
  2. Select your number > Voice & Fax
  3. Set A Call Comes In to Webhook: https://your-ngrok-url.ngrok.io/incoming-call
  4. Call your Twilio number
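When Twilio hits the /incoming-call webhook, the server has to answer with TwiML that opens the media WebSocket. A minimal sketch of that response, assuming the WebSocket lives at /media-stream on the same host as in the architecture above (the function name incoming_call_twiml is hypothetical):

```python
from xml.sax.saxutils import quoteattr


def incoming_call_twiml(host: str) -> str:
    """Build TwiML that connects the call to a bidirectional media stream."""
    stream_url = f"wss://{host}/media-stream"
    return (
        '<?xml version="1.0" encoding="UTF-8"?>'
        "<Response><Connect>"
        f"<Stream url={quoteattr(stream_url)} />"
        "</Connect></Response>"
    )
```

For example, incoming_call_twiml("your-ngrok-url.ngrok.io") yields a response whose Stream element points at wss://your-ngrok-url.ngrok.io/media-stream. The webhook should return it with an XML content type.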

AssemblyAI WebSocket parameters for Twilio

ASSEMBLYAI_WS_URL = (
    "wss://streaming.assemblyai.com/v3/ws"
    "?speech_model=u3-rt-pro"
    "&encoding=pcm_mulaw"      # must match Twilio's audio format
    "&sample_rate=8000"        # must match Twilio's 8kHz stream
    "&end_of_turn_confidence_threshold=0.5"
    "&min_turn_silence=400"
)

Phone calls carry more background noise than browser audio, so this configuration uses a slightly higher confidence threshold and a longer min_turn_silence to reduce false end-of-turn triggers.
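If you expect to tune these values per deployment, one option is to build the URL from a dictionary instead of a hand-concatenated string. The sketch below uses only the parameters shown above; the build_ws_url helper is hypothetical.

```python
from urllib.parse import urlencode

ASSEMBLYAI_WS_BASE = "wss://streaming.assemblyai.com/v3/ws"

DEFAULT_PARAMS = {
    "speech_model": "u3-rt-pro",
    "encoding": "pcm_mulaw",   # must match Twilio's audio format
    "sample_rate": 8000,       # must match Twilio's 8 kHz stream
    "end_of_turn_confidence_threshold": 0.5,
    "min_turn_silence": 400,
}


def build_ws_url(**overrides) -> str:
    """Return the streaming URL, with optional per-call parameter overrides."""
    params = {**DEFAULT_PARAMS, **overrides}
    return f"{ASSEMBLYAI_WS_BASE}?{urlencode(params)}"
```

Calling build_ws_url() reproduces the URL above, and build_ws_url(min_turn_silence=600) changes a single value without editing the string.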

Extending the agent

Add post-call transcription

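import os
import assemblyai as aai

aai.settings.api_key = os.environ["ASSEMBLYAI_API_KEY"]  # assumes the key is set in the environment
transcriber = aai.Transcriber()
transcript = transcriber.transcribe(recording_url)  # recording_url: URL of the call recording
print(transcript.text)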

Add keyterm prompting

ASSEMBLYAI_WS_URL += "&keyterms_prompt=YourBrand&keyterms_prompt=SpecialTerm"
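Key terms that contain spaces or punctuation need URL escaping before they go into the query string. A small sketch, assuming the repeated-parameter form used in the snippet above is what the API expects (check the AssemblyAI streaming docs for the exact encoding); with_keyterms is a hypothetical helper.

```python
from urllib.parse import urlencode


def with_keyterms(ws_url: str, terms: list[str]) -> str:
    """Append one URL-escaped keyterms_prompt parameter per term."""
    if not terms:
        return ws_url
    query = urlencode([("keyterms_prompt", term) for term in terms])
    separator = "&" if "?" in ws_url else "?"
    return f"{ws_url}{separator}{query}"
```

For instance, with_keyterms(ASSEMBLYAI_WS_URL, ["YourBrand", "Special Term"]) escapes the space in the second term rather than producing a malformed URL.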

Deploy to Railway or Render

# Railway
railway login && railway init && railway up

# Render — create a Web Service pointing to this repo
# Start: uvicorn server:app --host 0.0.0.0 --port $PORT

Build your Twilio phone agent today

Sign up for a free AssemblyAI account and start transcribing Twilio calls with Universal-3 Pro Streaming in under 30 minutes.