customers
All customer stories
Top Voice AI companies are building with Assembly.
resources
Latest Release
Voice Agent API
Voice agents that get it right, respond instantly, and ship the same day with our new Voice Agent API
resources
Give your human agents live transcription, real-time coaching prompts, and automatic escalation detection — all powered by the most accurate Real-time Speech-to-Text API in the market.
Agent assist panel
LiveCustomer
"I've been waiting two weeks for my refund and nobody can tell me where it is."
Suggested response
Acknowledge frustration. Pull up order #, check refund status in billing system.
Agent
"I completely understand. Let me pull up your account right now..."
Escalation detected
Keywords: "cancel", "supervisor" — compliance script recommended.
Contact center agents handle dozens of calls per shift with no real-time visibility. Supervisors spot-check a fraction; compliance violations and escalation opportunities are caught after the fact, if at all. Traditional QA is post-hoc — by the time you flag a churn signal or a missed cross-sell, the customer is gone. AssemblyAI gives every agent a live coach: real-time transcript, AI-generated suggestions, and automatic keyword detection on 100% of calls.
Median streaming latency on Universal-3 Pro Streaming for real-time agent display.
Better consecutive number recognition for order numbers, account IDs, and SKUs.
Of calls analyzed in real time — no more random spot-checks.
Domain-specific keyterms boosted dynamically per session.
Two ways to build
Ship a working coaching pipeline in an afternoon, or drop best-in-class streaming STT into the agent desktop you already run.
Our proprietary voice stack via one WebSocket. Run an AI tier-1 voice agent that triages and warm-transfers complex calls to your human team with full context — zero infra to manage.
Best for
Free tier available · No credit card required
The live transcript layer for your agent desktop. Works natively with LiveKit, Pipecat, Vapi, and Twilio — speaker labels, dynamic keyterms, and PII redaction baked in.
Best for
No concurrency caps · Autoscaling included
Live transcription with speaker labels
Universal-3 Pro Streaming transcribes agent and customer turns in real time with built-in diarization, so coaching prompts are always anchored to the right speaker.
AI-generated suggestions during the call
Pipe finalized turns into the LLM Gateway (25+ models) for real-time response suggestions, objection handling, and next-best-action prompts.
Escalation and compliance keyword detection
Boost compliance triggers and escalation phrases through keyterm prompting. Update keyterms mid-call as the conversation evolves.
PII-safe agent surface
Stream PII redaction inline so the agent assist UI never displays raw card numbers, SSNs, or other sensitive data.
Agent assist pipeline
Capture live call audio
Transcribe + diarize in real time
Generate coaching suggestions
Flag escalations + compliance
Voice Agent API — tier-1 caller agent with human escalation
# Voice Agent API: tier-1 caller agent that escalates to a human
import asyncio, json, websockets
API_KEY = "YOUR_API_KEY"
async def run_agent():
async with websockets.connect(
"wss://agents.assemblyai.com/v1/ws",
additional_headers={"Authorization": f"Bearer {API_KEY}"},
) as ws:
await ws.send(json.dumps({
"type": "session.update",
"session": {
"system_prompt": (
"You are a tier-1 support agent. Triage the caller's issue. "
"If they ask for a supervisor, say 'cancel', or the issue is "
"complex, call escalate_to_human with a one-sentence summary."
),
"greeting": "Hi, thanks for calling Acme — how can I help today?",
"input": {"keyterms": ["cancel", "supervisor", "refund", "lawyer"]},
"output": {"voice": "ivy"},
"tools": [{
"type": "function",
"name": "escalate_to_human",
"description": "Warm-transfer to a human agent with full context.",
"parameters": {
"type": "object",
"properties": {"summary": {"type": "string"}},
"required": ["summary"],
},
}],
},
}))
async for msg in ws:
handle(json.loads(msg)) # transcript.user, reply.audio, tool.call, ...
Universal-3 Pro Streaming — live transcript for agent desktop
# Universal-3 Pro Streaming: live transcript for an agent assist desktop
import asyncio, json, websockets
from urllib.parse import urlencode
API_KEY = "YOUR_API_KEY"
params = urlencode({
"sample_rate": 16000,
"speech_model": "u3-rt-pro",
"keyterms_prompt": json.dumps([
"cancel", "supervisor", "refund",
"order #", "account number",
]),
"format_turns": "true",
"speaker_labels": "true", # diarize agent vs. caller
"redact_pii": "true", # mask PII before it hits the UI
"redact_pii_policies": json.dumps([
"credit_card_number", "us_social_security_number",
"date_of_birth", "phone_number", "email_address",
]),
"redact_pii_sub": "entity_name", # e.g. [CREDIT_CARD_NUMBER]
})
async def stream_agent_assist(audio_iter):
url = f"wss://streaming.assemblyai.com/v3/ws?{params}"
async with websockets.connect(
url, additional_headers={"Authorization": API_KEY},
) as ws:
async def send_audio():
async for chunk in audio_iter:
await ws.send(chunk)
asyncio.create_task(send_audio())
async for raw in ws:
evt = json.loads(raw)
if evt.get("type") == "Turn" and evt.get("end_of_turn"):
# finalized, PII-redacted turn — pipe to LLM Gateway for coaching
push_to_coach(evt["transcript"], evt.get("speaker_label"))
Universal-3 Pro Streaming delivers ~150ms P50 median latency, so coaching prompts appear before the agent has to think — not after the moment has passed.
Push a new keyterm list with UpdateConfiguration mid-call when the conversation shifts from menu items to payment terms — context biasing adapts in real time.
Redact card numbers, SSNs, dates of birth, and other sensitive data before they ever hit the agent's screen. SOC 2 Type 2 certified; BAA available for regulated workloads.
Calls are and will remain a pertinent part of the customer service journey. Customer service isn't moving entirely to chatbots or chat interactions.
Dr. Shane Lynn, CEO — EdgeTier
Real-time transcription and speaker diarization power Jiminny's conversation intelligence platform — helping sales agents at the exact moment the conversation happens, with a 51% increase in customer satisfaction and a 15% higher win rate.
Jiminny