Solutions

Voice agents for real-time agent assist

Give your human agents live transcription, real-time coaching prompts, and automatic escalation detection — all powered by the most accurate Real-time Speech-to-Text API in the market.

Agent assist panel

Live

Customer

"I've been waiting two weeks for my refund and nobody can tell me where it is."

Suggested response

Acknowledge frustration. Pull up order #, check refund status in billing system.

Agent

"I completely understand. Let me pull up your account right now..."

Escalation detected

Keywords: "cancel", "supervisor" — compliance script recommended.

Delphi
Happy Scribe
Granola
Supernormal
Runway
Ashby
Jiminny
JotPsych
Earmark
EdgeTier
Genio
Grain
Loop
Calabrio
Veed.io
Dovetail
WhatConverts
CallRail
Delphi
Happy Scribe
Granola
Supernormal
Runway
Ashby
Jiminny
JotPsych
Earmark
EdgeTier
Genio
Grain
Loop
Calabrio
Veed.io
Dovetail
WhatConverts
CallRail
Delphi
Happy Scribe
Granola
Supernormal
Runway
Ashby
Jiminny
JotPsych
Earmark
EdgeTier
Genio
Grain
Loop
Calabrio
Veed.io
Dovetail
WhatConverts
CallRail
Delphi
Happy Scribe
Granola
Supernormal
Runway
Ashby
Jiminny
JotPsych
Earmark
EdgeTier
Genio
Grain
Loop
Calabrio
Veed.io
Dovetail
WhatConverts
CallRail
The problem

Agents fly blind on every call until it's too late

Contact center agents handle dozens of calls per shift with no real-time visibility. Supervisors spot-check a fraction; compliance violations and escalation opportunities are caught after the fact, if at all. Traditional QA is post-hoc — by the time you flag a churn signal or a missed cross-sell, the customer is gone. AssemblyAI gives every agent a live coach: real-time transcript, AI-generated suggestions, and automatic keyword detection on 100% of calls.

Built for real-time contact center operations

Latency ~150ms

Median streaming latency on Universal-3 Pro Streaming for real-time agent display.

Entity accuracy 28%

Better consecutive number recognition for order numbers, account IDs, and SKUs.

Coverage 100%

Of calls analyzed in real time — no more random spot-checks.

Keyterms 100

Domain-specific keyterms boosted dynamically per session.

Two ways to build

Pick the API that fits your agent assist stack

Ship a working coaching pipeline in an afternoon, or drop best-in-class streaming STT into the agent desktop you already run.

Recommended

Voice Agent API

Our proprietary voice stack via one WebSocket. Run an AI tier-1 voice agent that triages and warm-transfers complex calls to your human team with full context — zero infra to manage.

Best for

  • Tier-1 triage with warm escalation to human agents
  • Tool calls for CRM lookup, ticket creation, and handoff
  • Built-in keyterm prompting for compliance triggers
  • Claude Code compatible — paste the docs and build anything
$4.50/hr — speech, LLM, and voice all included
Get started for free

Free tier available · No credit card required

Bring Your Own Stack

Universal-3 Pro Streaming STT API

The live transcript layer for your agent desktop. Works natively with LiveKit, Pipecat, Vapi, and Twilio — speaker labels, dynamic keyterms, and PII redaction baked in.

Best for

  • Teams running their own LLM and agent desktop
  • ~150ms P50 median latency for instant suggestions
  • Native speaker labels for agent/customer separation
  • Dynamic keyterms — update mid-call as topics shift
  • Inline PII redaction before data hits the agent screen
$0.45/hr — transcription only, unlimited streams
View integration docs

No concurrency caps · Autoscaling included

One pipeline turns every call into a coaching opportunity

Live transcription with speaker labels

Universal-3 Pro Streaming transcribes agent and customer turns in real time with built-in diarization, so coaching prompts are always anchored to the right speaker.

AI-generated suggestions during the call

Pipe finalized turns into the LLM Gateway (25+ models) for real-time response suggestions, objection handling, and next-best-action prompts.

Escalation and compliance keyword detection

Boost compliance triggers and escalation phrases through keyterm prompting. Update keyterms mid-call as the conversation evolves.

PII-safe agent surface

Stream PII redaction inline so the agent assist UI never displays raw card numbers, SSNs, or other sensitive data.

support_agent

Agent assist pipeline

Capture live call audio

Transcribe + diarize in real time

Generate coaching suggestions

Flag escalations + compliance

Quickstart

Build a real-time agent assist tool in minutes

Voice Agent API — tier-1 caller agent with human escalation

# Voice Agent API: tier-1 caller agent that escalates to a human
import asyncio, json, websockets

API_KEY = "YOUR_API_KEY"

async def run_agent():
    async with websockets.connect(
        "wss://agents.assemblyai.com/v1/ws",
        additional_headers={"Authorization": f"Bearer {API_KEY}"},
    ) as ws:
        await ws.send(json.dumps({
            "type": "session.update",
            "session": {
                "system_prompt": (
                    "You are a tier-1 support agent. Triage the caller's issue. "
                    "If they ask for a supervisor, say 'cancel', or the issue is "
                    "complex, call escalate_to_human with a one-sentence summary."
                ),
                "greeting": "Hi, thanks for calling Acme — how can I help today?",
                "input": {"keyterms": ["cancel", "supervisor", "refund", "lawyer"]},
                "output": {"voice": "ivy"},
                "tools": [{
                    "type": "function",
                    "name": "escalate_to_human",
                    "description": "Warm-transfer to a human agent with full context.",
                    "parameters": {
                        "type": "object",
                        "properties": {"summary": {"type": "string"}},
                        "required": ["summary"],
                    },
                }],
            },
        }))
        async for msg in ws:
            handle(json.loads(msg))  # transcript.user, reply.audio, tool.call, ...

Universal-3 Pro Streaming — live transcript for agent desktop

# Universal-3 Pro Streaming: live transcript for an agent assist desktop
import asyncio, json, websockets
from urllib.parse import urlencode

API_KEY = "YOUR_API_KEY"

params = urlencode({
    "sample_rate": 16000,
    "speech_model": "u3-rt-pro",
    "keyterms_prompt": json.dumps([
        "cancel", "supervisor", "refund",
        "order #", "account number",
    ]),
    "format_turns": "true",
    "speaker_labels": "true",                  # diarize agent vs. caller
    "redact_pii": "true",                      # mask PII before it hits the UI
    "redact_pii_policies": json.dumps([
        "credit_card_number", "us_social_security_number",
        "date_of_birth", "phone_number", "email_address",
    ]),
    "redact_pii_sub": "entity_name",           # e.g. [CREDIT_CARD_NUMBER]
})

async def stream_agent_assist(audio_iter):
    url = f"wss://streaming.assemblyai.com/v3/ws?{params}"
    async with websockets.connect(
        url, additional_headers={"Authorization": API_KEY},
    ) as ws:
        async def send_audio():
            async for chunk in audio_iter:
                await ws.send(chunk)
        asyncio.create_task(send_audio())
        async for raw in ws:
            evt = json.loads(raw)
            if evt.get("type") == "Turn" and evt.get("end_of_turn"):
                # finalized, PII-redacted turn — pipe to LLM Gateway for coaching
                push_to_coach(evt["transcript"], evt.get("speaker_label"))

Sub-300ms streaming for instant suggestions

Universal-3 Pro Streaming delivers ~150ms P50 median latency, so coaching prompts appear before the agent has to think — not after the moment has passed.

Dynamic keyterms for evolving calls

Push a new keyterm list with UpdateConfiguration mid-call when the conversation shifts from menu items to payment terms — context biasing adapts in real time.

Inline PII redaction on every turn

Redact card numbers, SSNs, dates of birth, and other sensitive data before they ever hit the agent's screen. SOC 2 Type 2 certified; BAA available for regulated workloads.

Frequently asked questions