Replace legacy IVR with AI voice agents powered by the fastest, most accurate speech-to-text. Build end-to-end with our Voice Agent API, or drop Universal-3 Pro Streaming into your existing stack.
Touch-tone menus and brittle keyword bots strand callers in loops that end in a hang-up or an escalation. Modern voice agents — built on accurate streaming STT, a managed LLM, and natural TTS — resolve more calls before a human ever picks up.
P50 median streaming latency for Universal-3 Pro Streaming.
Better alphanumeric accuracy than other providers.
SLA with SOC 2 Type 2 certification.
Audio processed daily in production.
Two ways to build
Ship a working support agent in an afternoon, or drop industry-leading STT into the orchestrator you already run.
Our proprietary voice stack via one WebSocket. Connect, stream audio in, get audio back — we handle the rest.
Free tier available · No credit card required
The STT layer for your cascading voice agent architecture. Works natively with your preferred orchestrator.
No concurrency caps · Autoscaling included
Ingest caller audio
Voice Agent API: single WebSocket. Or Twilio Media Streams → U3 Pro Streaming for BYO stack.
Real-time transcription
Punctuation-based turn detection at ~150ms P50. Keyterm boosting for your product vocabulary.
LLM reasoning
Intent classification, KB lookup, and response generation. Managed (Voice Agent API) or BYO.
Voice response
TTS audio streamed back to caller. Full round-trip under 1 second.
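To make the punctuation-based turn detection in the transcription step more concrete, here is a toy sketch. It is not AssemblyAI's algorithm; it only illustrates the general idea that a partial transcript ending in terminal punctuation is a stronger end-of-turn signal than a pause on its own. The function name `is_end_of_turn`, the `silence_ms` input, and both thresholds are hypothetical values chosen for illustration.

```python
# Toy end-of-turn heuristic (illustrative only, not the production model).
# Assumption: the caller supplies the latest partial transcript plus the
# number of milliseconds of trailing silence observed so far.
TERMINAL_PUNCTUATION = (".", "?", "!")


def is_end_of_turn(
    partial_transcript: str,
    silence_ms: int,
    min_silence_ms: int = 200,    # hypothetical: short pause is enough after punctuation
    max_silence_ms: int = 1200,   # hypothetical: long pause ends the turn regardless
) -> bool:
    text = partial_transcript.strip()
    if not text:
        # Nothing has been said yet, so there is no turn to end.
        return False
    if silence_ms >= max_silence_ms:
        # Fallback: a long pause ends the turn even without punctuation.
        return True
    # A sentence-final mark plus a short pause is treated as end of turn.
    return text.endswith(TERMINAL_PUNCTUATION) and silence_ms >= min_silence_ms
```

With these assumed thresholds, "My order number is 4 5 1 2." followed by a 250 ms pause ends the turn, while "My order number is" with the same pause does not, so the agent avoids interrupting a caller mid-sentence.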
Voice Agent API — recommended
# Voice Agent API: one WebSocket, full pipeline
import asyncio
import json

import websockets

API_KEY = "YOUR_API_KEY"


def handle(event):
    # Placeholder: replace with your own logic for
    # transcript.user, audio.delta, tool.call, ...
    print(event.get("type"))


async def run_agent():
    async with websockets.connect(
        "wss://agents.assemblyai.com/v1/ws",
        # Note: websockets 14+ names this parameter additional_headers
        extra_headers={"Authorization": f"Bearer {API_KEY}"},
    ) as ws:
        # Configure the session: prompt, greeting, and output voice
        await ws.send(json.dumps({
            "type": "session.update",
            "session": {
                "system_prompt": "You are a helpful support agent for Acme Corp.",
                "greeting": "Hi, this is Acme support. How can I help?",
                "output": {"voice": "ivy"},
            },
        }))
        # Stream audio in, get audio + transcript back
        async for msg in ws:
            handle(json.loads(msg))


if __name__ == "__main__":
    asyncio.run(run_agent())
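The `handle` call above is left undefined on the page. One possible shape for it is a small dispatcher keyed on the event type. The event names `transcript.user`, `audio.delta`, and `tool.call` come from the comment in the sample; everything else here is an assumption made for illustration, including the payload field names (`text`, `audio`, `name`), the base64 audio encoding, and the helper name `make_dispatcher`. Check the API reference for the real message schema.

```python
import base64
from typing import Any, Callable, Dict, List

Event = Dict[str, Any]


def make_dispatcher(audio_out: List[bytes], log: List[str]) -> Callable[[Event], None]:
    """Build a handler that routes events by their "type" field.

    Field names below are hypothetical; only the event type names
    appear in the original sample.
    """

    def on_transcript(event: Event) -> None:
        log.append(f"user: {event.get('text', '')}")

    def on_audio(event: Event) -> None:
        # Assumption: audio chunks arrive base64-encoded in an "audio" field.
        audio_out.append(base64.b64decode(event.get("audio", "")))

    def on_tool_call(event: Event) -> None:
        log.append(f"tool: {event.get('name', '')}")

    handlers: Dict[str, Callable[[Event], None]] = {
        "transcript.user": on_transcript,
        "audio.delta": on_audio,
        "tool.call": on_tool_call,
    }

    def dispatch(event: Event) -> None:
        handler = handlers.get(event.get("type", ""))
        if handler is None:
            # Unknown event types are recorded rather than raising.
            log.append(f"ignored: {event.get('type')}")
            return
        handler(event)

    return dispatch
```

A dispatcher built this way could be passed in place of `handle`, for example `handle = make_dispatcher(audio_chunks, log_lines)`, keeping playback and logging out of the receive loop.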
Universal-3 Pro Streaming + LiveKit — BYO stack
# LiveKit + AssemblyAI STT in a cascading pipeline
from livekit.agents import Agent, AgentSession, JobContext, WorkerOptions, cli
from livekit.plugins import assemblyai, cartesia, openai, silero


class SupportAgent(Agent):
    def __init__(self):
        super().__init__(
            instructions="You are a support agent for Acme Corp. Be concise.",
        )


async def entrypoint(ctx: JobContext):
    # Cascading pipeline: STT -> LLM -> TTS, with VAD for speech detection
    session = AgentSession(
        stt=assemblyai.STT(
            model="universal-streaming-english",
            keyterms_prompt=["Acme Pro", "tier-2", "premium plan"],
        ),
        llm=openai.LLM(model="gpt-4o"),
        tts=cartesia.TTS(),
        vad=silero.VAD.load(),
    )
    await session.start(room=ctx.room, agent=SupportAgent())


if __name__ == "__main__":
    # Run as a LiveKit worker process
    cli.run_app(WorkerOptions(entrypoint_fnc=entrypoint))
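For a BYO stack fed by Twilio Media Streams instead of LiveKit, the ingest side mostly comes down to unpacking Twilio's WebSocket messages. Twilio sends JSON events, and `media` events carry base64-encoded 8 kHz mu-law audio in `media.payload`. The sketch below extracts that payload and converts it to 16-bit PCM using the standard G.711 mu-law expansion. The function names are hypothetical, and how the resulting audio is forwarded to the streaming STT session (encoding and sample-rate settings) is left out because it depends on the API's configuration options.

```python
import base64
import json
import struct
from typing import Optional


def extract_mulaw_audio(message: str) -> Optional[bytes]:
    """Return raw mu-law bytes from a Twilio 'media' event, else None."""
    event = json.loads(message)
    if event.get("event") != "media":
        # connected / start / stop / mark events carry no audio.
        return None
    return base64.b64decode(event["media"]["payload"])


def _mulaw_byte_to_sample(byte: int) -> int:
    """Expand one G.711 mu-law byte to a signed 16-bit sample."""
    u = ~byte & 0xFF              # mu-law bytes are stored bit-inverted
    sign = u & 0x80
    exponent = (u >> 4) & 0x07
    mantissa = u & 0x0F
    sample = (((mantissa << 3) + 0x84) << exponent) - 0x84
    return -sample if sign else sample


def mulaw_to_pcm16(mulaw: bytes) -> bytes:
    """Convert mu-law bytes to little-endian 16-bit PCM."""
    samples = [_mulaw_byte_to_sample(b) for b in mulaw]
    return struct.pack(f"<{len(samples)}h", *samples)
```

The conversion is written out by hand rather than using the standard library's `audioop` module, which was removed in Python 3.13.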
Universal-3 Pro Streaming transcribes noisy contact-center audio at 94%+ accuracy, which can be the difference between a deflected ticket and an angry escalation.
Names, card numbers, addresses, and account IDs masked before transcripts hit your CRM, data warehouse, or QA stack.
Topic detection, sentiment, and call outcomes available on the live stream — coach agents in the moment, not the next day.
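As a rough picture of what masking means for downstream systems, the sketch below replaces card-like digit runs in a transcript string before it is stored. This is a plain regex illustration, not how AssemblyAI's PII redaction works: the pattern, the `[CARD_NUMBER]` placeholder, and the function name `mask_card_numbers` are all assumptions, and a regex like this will miss names, addresses, and spoken-out digits that a model-based redactor is meant to catch.

```python
import re

# Assumption: a "card-like" number is 13 to 16 digits, optionally
# separated by single spaces or hyphens.
CARD_PATTERN = re.compile(r"\b(?:\d[ -]?){12,15}\d\b")


def mask_card_numbers(transcript: str, placeholder: str = "[CARD_NUMBER]") -> str:
    """Replace card-like digit sequences with a placeholder token."""
    return CARD_PATTERN.sub(placeholder, transcript)
```

For example, `mask_card_numbers("My card is 4111 1111 1111 1111, thanks.")` returns `"My card is [CARD_NUMBER], thanks."`, while a short order number such as `12345` is left untouched.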
Edgetier's customer CarTrawler reduced chat handling time by 25% through enhanced insights and agent optimization.
Edgetier
The accuracy was strong, but the great documentation and unique models like Auto Chapters and Sentiment Analysis is what really won us over.
Nathan Webb, Product Manager — Aloware