Automate scheduling, insurance verification, and patient intake with AI voice agents powered by the fastest, most accurate medical speech-to-text. Build end-to-end with our Voice Agent API, or drop Universal-3 Pro Streaming with Medical Mode into your existing stack.
Patients abandon long IVR menus and fill out paper intake forms for information that should take seconds to collect. Generic speech-to-text mangles medication names, dosages, and ICD references, and PHI handling is treated as an afterthought rather than a built-in primitive. Modern healthcare voice agents, built on medical-grade streaming STT with in-stream PHI redaction, a managed LLM, and natural TTS, handle patient intake, telehealth triage, appointment scheduling, and prescription refills before a human staff member ever picks up. They replace legacy IVR, free clinical staff from low-acuity calls, and capture structured data into the EHR in the same conversation.
87% fewer medical entity errors with Medical Mode for pharma, anatomy, and ICD references.
Median (P50) streaming latency, for natural turn-taking with patients.
Business Associate Addendum available for covered entities processing PHI. SOC 2 Type 2 certified.
Audio processed daily in production across healthcare and contact-center workloads.
Two ways to build
Ship a working intake or triage agent in an afternoon, or drop medical-grade STT into the orchestrator you already run.
Voice Agent API: our proprietary voice stack over a single WebSocket. Connect, stream patient audio in, and get audio back; we handle the rest.
Free tier available · No credit card required
Universal-3 Pro Streaming: the STT layer for your cascading voice agent architecture. Works natively with LiveKit, Pipecat, and Twilio.
No concurrency caps · Autoscaling included
Ingest patient audio
Voice Agent API: single WebSocket. Or Twilio Media Streams / SIP → U3 Pro Streaming for BYO stack.
Clinical-grade transcription
Universal-3 Pro Streaming with Medical Mode + keyterm prompting for medications, anatomy, and ICD codes.
PHI redaction in-stream
Names, DOBs, MRNs, addresses, and account IDs masked before transcripts leave the pipeline.
LLM reasoning + EHR tool calls
Triage rules, scheduling, refills, intake — managed (Voice Agent API) or BYO with LiveKit / Pipecat.
Voice response
TTS streamed back to patient. Full round-trip under 1 second with natural turn detection.
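To make the PHI redaction step above concrete, here is a toy sketch of what masking identifiers in a transcript can look like. It is not AssemblyAI's redaction method: the regex patterns, the `PHI_PATTERNS` table, and the `mask_phi` helper are all hypothetical and cover only two simple formats. Real PHI detection needs far more than regular expressions.

```python
import re

# Hypothetical patterns for illustration only (not AssemblyAI's implementation).
PHI_PATTERNS = {
    "DOB": re.compile(r"\b\d{1,2}/\d{1,2}/\d{4}\b"),            # e.g. 04/12/1987
    "MRN": re.compile(r"\bMRN[:#\s]*\d{6,10}\b", re.IGNORECASE),  # e.g. MRN 00123456
}

def mask_phi(text: str) -> str:
    """Replace each pattern match with a bracketed label such as [DOB]."""
    for label, pattern in PHI_PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text

print(mask_phi("Born 04/12/1987, MRN 00123456, takes metformin."))
```

Clinical terms such as the medication name pass through untouched, so downstream triage logic could still use them.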
Voice Agent API — recommended
# Voice Agent API: single WebSocket, full healthcare pipeline
import asyncio, json, websockets

API_KEY = "YOUR_API_KEY"

def handle(event):
    # Placeholder handler (not part of the API): branch on event["type"], e.g.
    # session.ready, transcript.user, reply.audio, tool.call, ...
    print(event.get("type"))

async def run_agent():
    async with websockets.connect(
        "wss://agents.assemblyai.com/v1/ws",
        additional_headers={"Authorization": f"Bearer {API_KEY}"},
    ) as ws:
        # Configure the session: system prompt, greeting, and output voice.
        await ws.send(json.dumps({
            "type": "session.update",
            "session": {
                "system_prompt": (
                    "You are a triage agent for Acme Health. "
                    "Collect symptoms, medications, and insurance."
                ),
                "greeting": "Hi, this is Acme Health. What brings you in today?",
                "output": {"voice": "ivy"},
            },
        }))
        # Each incoming message is a JSON event from the agent.
        async for msg in ws:
            handle(json.loads(msg))

asyncio.run(run_agent())
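The `handle` callback above receives each decoded message. A minimal sketch of routing messages by event type follows. The event names come from the snippet's comment, while `dispatch_event`, the `handlers` table, and the `"text"` field on `transcript.user` are assumptions made for illustration, not a documented message schema.

```python
import json

def dispatch_event(raw: str, handlers: dict) -> bool:
    """Decode one JSON message and route it by its "type" field.

    Returns True if a handler ran, False if the type was not recognized.
    """
    event = json.loads(raw)
    handler = handlers.get(event.get("type"))
    if handler is None:
        return False
    handler(event)
    return True

transcripts = []
handlers = {
    "session.ready": lambda e: None,
    # "text" is an assumed field name, used here only for illustration.
    "transcript.user": lambda e: transcripts.append(e.get("text", "")),
}

dispatch_event('{"type": "transcript.user", "text": "I need a refill"}', handlers)
```

Returning a boolean rather than raising on unknown types lets the caller log and skip event types it does not yet handle.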
Universal-3 Pro Streaming + LiveKit — BYO stack
# LiveKit + AssemblyAI Universal-3 Pro Streaming: healthcare voice agent
from livekit.agents import Agent, AgentSession, TurnHandlingOptions
from livekit.plugins import assemblyai, cartesia, openai, silero

class IntakeAgent(Agent):
    def __init__(self):
        super().__init__(
            instructions=(
                "You are a patient intake agent for Acme Health. "
                "Collect symptoms, current medications, allergies, and insurance."
            ),
        )

async def entrypoint(ctx):
    session = AgentSession(
        stt=assemblyai.STT(
            model="u3-rt-pro",
            min_turn_silence=100,
            max_turn_silence=1000,  # patients pause longer; bump higher for entity dictation
            vad_threshold=0.3,
            keyterms_prompt=["amoxicillin", "metformin", "ICD-10", "PCP referral"],
        ),
        llm=openai.LLM(model="gpt-4o"),
        tts=cartesia.TTS(),
        vad=silero.VAD.load(activation_threshold=0.3),  # match AssemblyAI's vad_threshold
        turn_handling=TurnHandlingOptions(
            turn_detection="stt",
            endpointing={"min_delay": 0},  # AssemblyAI handles endpointing; avoid additive delay
        ),
    )
    await session.start(room=ctx.room, agent=IntakeAgent())
Universal-3 Pro Streaming with Medical Mode reduces medical entity errors by 87% — pharmaceutical names, anatomical terms, dosages, and ICD references come through clean on phone and far-field audio.
PHI redaction runs in-stream and is configurable per session. BAA available on request, with SOC 2 Type 2, AES-256 encryption, and optional EU data residency.
Accurately separate patient, provider, and staff turns even when speakers move in and out of frame — critical for telehealth visits, rounds, and ambient documentation.
JotPsych uses AssemblyAI to power ambient clinical documentation for mental health providers — capturing nuanced patient conversations accurately and securely.
We require a leading-edge speech-to-text provider that can meet our specialized needs: fast, accurate, targeted, and multilingual.