
Voice agents for healthcare

Automate scheduling, insurance verification, and patient intake with AI voice agents powered by the fastest, most accurate medical speech-to-text. Build end-to-end with our Voice Agent API, or drop Universal-3 Pro Streaming with Medical Mode into your existing stack.

Delphi
Happy Scribe
Granola
Supernormal
Runway
Ashby
Jiminny
JotPsych
Earmark
EdgeTier
Genio
Grain
Loop
Calabrio
Veed.io
Dovetail
WhatConverts
CallRail
The problem

Phone trees and paper intake are losing patients

Patients abandon long IVR menus and fill out paper intake forms for information a short conversation could capture in seconds. Generic speech-to-text mangles medication names, dosages, and ICD references, and PHI handling is bolted on as an afterthought instead of built in from the start.

Modern healthcare voice agents, built on medical-grade streaming STT with in-stream PHI redaction, a managed LLM, and natural TTS, handle patient intake, telehealth triage, appointment scheduling, and prescription refills before a staff member ever picks up. They replace legacy IVR, free clinical staff from low-acuity calls, and capture structured data into the EHR during the same conversation.

Why accuracy matters in healthcare voice

Medical entities 87%

Fewer medical entity errors with Medical Mode on pharmaceutical names, anatomical terms, and ICD references.

Latency ~150ms

Median (P50) streaming latency, for natural turn-taking with patients.
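P50 is the 50th percentile of observed latencies, i.e. the median: half of measured turns are faster than this value. A minimal sketch of computing it from your own per-turn measurements, assuming you log one latency value in milliseconds per turn; the sample values are made up for illustration and are not AssemblyAI benchmarks.

```python
import statistics


def p50_ms(samples_ms: list[float]) -> float:
    """Return the median (50th percentile) of latency samples, in milliseconds."""
    if not samples_ms:
        raise ValueError("need at least one latency sample")
    return statistics.median(samples_ms)


# Illustrative samples only; with an even count, the median averages the middle two.
print(p50_ms([120, 150, 180, 140, 160]))
print(p50_ms([100, 200]))
```

Reporting the median rather than the mean keeps a few slow outlier turns from skewing the headline number; track a tail percentile alongside it if you care about worst-case turns.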

Compliance BAA

Business Associate Addendum available for covered entities processing PHI. SOC 2 Type 2 certified.

Scale 40TB+

Audio processed daily in production across healthcare and contact-center workloads.

Two ways to build

Pick the API that fits your healthcare stack

Ship a working intake or triage agent in an afternoon, or drop medical-grade STT into the orchestrator you already run.

Recommended

Voice Agent API

Our proprietary voice stack over a single WebSocket. Connect, stream patient audio in, and get audio back; we handle the rest.

Best for

  • Best-in-class healthcare voice agents — the preferred way to build with AssemblyAI
  • Patient intake, telehealth triage, prescription refills, appointment scheduling, post-discharge follow-up
  • Teams shipping fast — working agent in an afternoon, no infra to manage
  • Business Associate Addendum (BAA) available, in-stream PHI redaction
$4.50/hr — speech, LLM, and voice all included

Free tier available · No credit card required

Bring Your Own Stack

Universal-3 Pro Streaming STT API

The STT layer for your cascading voice agent architecture. Works natively with LiveKit, Pipecat, and Twilio.

Best for

  • Teams already using LiveKit, Pipecat, or Vapi as their orchestration layer
  • Cascading architectures (STT → LLM → TTS) with EHR tool calls
  • High-scale deployments where margin and full control matter
  • Workflows with RAG over clinical notes or proprietary LLMs
  • BAA-eligible, SOC 2 Type 2 — bring your own compliance infrastructure
$0.45/hr — transcription only, unlimited concurrent streams

No concurrency caps · Autoscaling included
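To compare the two options, multiply expected audio hours by the listed hourly rate. A minimal sketch using the rates on this page ($4.50/hr all-in for the Voice Agent API, $0.45/hr transcription only for Universal-3 Pro Streaming). The BYO rate excludes the LLM and TTS you bring yourself, so pass those per-hour costs in as `other_per_hr` for a fair comparison; the call volumes here are hypothetical.

```python
from decimal import Decimal

VOICE_AGENT_API_PER_HR = Decimal("4.50")   # speech, LLM, and voice included (per this page)
U3_PRO_STREAMING_PER_HR = Decimal("0.45")  # transcription only (per this page)


def monthly_audio_hours(calls_per_day: int, avg_call_minutes: float, days: int = 30) -> Decimal:
    """Total audio hours for a month of calls."""
    return Decimal(calls_per_day) * Decimal(str(avg_call_minutes)) * days / 60


def monthly_cost(hours: Decimal, rate_per_hr: Decimal,
                 other_per_hr: Decimal = Decimal("0")) -> Decimal:
    """Cost = hours x (listed rate + per-hour costs paid elsewhere, e.g. your own LLM/TTS)."""
    return (hours * (rate_per_hr + other_per_hr)).quantize(Decimal("0.01"))


# Hypothetical clinic: 500 calls/day, 4 minutes each, 30 days -> 1,000 audio hours.
hours = monthly_audio_hours(500, 4)
print(monthly_cost(hours, VOICE_AGENT_API_PER_HR))
print(monthly_cost(hours, U3_PRO_STREAMING_PER_HR))
```

`Decimal` avoids binary floating-point rounding in currency math; the BYO figure is only the STT share of your bill until you add your LLM and TTS rates.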

Your healthcare voice pipeline

Ingest patient audio

Voice Agent API: a single WebSocket. For a BYO stack: Twilio Media Streams or SIP into Universal-3 Pro Streaming.

Clinical-grade transcription

Universal-3 Pro Streaming with Medical Mode + keyterm prompting for medications, anatomy, and ICD codes.

PHI redaction in-stream

Names, DOBs, MRNs, addresses, and account IDs masked before transcripts leave the pipeline.
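AssemblyAI's in-stream redaction runs server-side, and its detection method isn't described here. Purely to illustrate what "masked before transcripts leave the pipeline" means, here is a toy regex masker for two easy-to-pattern-match fields: dates of birth in MM/DD/YYYY form and MRNs written as "MRN" followed by digits (a hypothetical format; real MRN formats vary by institution). Names and addresses need far more than regexes, so treat this as an illustration, not a redaction implementation.

```python
import re

# Hypothetical patterns for illustration only.
PHI_PATTERNS = {
    "DOB": re.compile(r"\b(0[1-9]|1[0-2])/(0[1-9]|[12]\d|3[01])/(19|20)\d{2}\b"),
    "MRN": re.compile(r"\bMRN[:#]?\s*\d{6,10}\b", re.IGNORECASE),
}


def mask_phi(text: str) -> str:
    """Replace each matched span with a [LABEL] placeholder."""
    for label, pattern in PHI_PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text


print(mask_phi("DOB 03/14/1962, MRN 00482913, on metformin"))
# prints "DOB [DOB], [MRN], on metformin"
```

Note that clinical terms such as "metformin" pass through untouched: masking targets identifiers, not the medical content the agent needs.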

LLM reasoning + EHR tool calls

Triage rules, scheduling, refills, intake — managed (Voice Agent API) or BYO with LiveKit / Pipecat.
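In a cascading stack, the LLM decides when to call a tool, and your code executes it and returns the result. A minimal sketch of that dispatch step, assuming the model emits a tool name plus JSON-encoded arguments. The tool names (`book_appointment`, `request_refill`) and their behavior are hypothetical stand-ins for real EHR integrations, and this message shape is not the Voice Agent API's or LiveKit's actual schema.

```python
import json
from typing import Any, Callable


def book_appointment(patient_id: str, slot: str) -> dict[str, Any]:
    # Hypothetical: a real implementation would call your scheduling/EHR system.
    return {"status": "booked", "patient_id": patient_id, "slot": slot}


def request_refill(patient_id: str, medication: str) -> dict[str, Any]:
    # Hypothetical: a real implementation would queue the request for clinician review.
    return {"status": "pending_review", "patient_id": patient_id, "medication": medication}


TOOLS: dict[str, Callable[..., dict[str, Any]]] = {
    "book_appointment": book_appointment,
    "request_refill": request_refill,
}


def dispatch_tool_call(name: str, arguments_json: str) -> str:
    """Run the named tool with JSON-encoded arguments; return a JSON result for the LLM."""
    tool = TOOLS.get(name)
    if tool is None:
        return json.dumps({"error": f"unknown tool: {name}"})
    try:
        result = tool(**json.loads(arguments_json))
    except (TypeError, json.JSONDecodeError) as exc:
        # Bad or missing arguments go back to the model as an error it can recover from.
        return json.dumps({"error": str(exc)})
    return json.dumps(result)
```

Returning errors as JSON instead of raising lets the LLM ask the patient for a missing detail (such as a preferred slot) rather than ending the call.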

Voice response

TTS audio is streamed back to the patient. Full round-trip under 1 second, with natural turn detection.
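The sub-second round-trip is spread across stages, so a practical way to verify it in your own deployment is to time each stage of a turn and compare the total against a 1,000 ms budget. A minimal sketch; the stage names loosely mirror the pipeline above and are hypothetical, and the timings are illustrative placeholders, not measured AssemblyAI figures. Summing assumes stages run one after another; if you stream LLM tokens into TTS, the stages overlap and the sum becomes a conservative upper bound.

```python
from dataclasses import dataclass, field


@dataclass
class TurnTiming:
    """Per-stage durations in milliseconds for one patient turn, measured by your own code."""
    stages_ms: dict[str, float] = field(default_factory=dict)

    def total_ms(self) -> float:
        return sum(self.stages_ms.values())

    def within_budget(self, budget_ms: float = 1000.0) -> bool:
        return self.total_ms() < budget_ms

    def slowest_stage(self) -> str:
        # Stage with the largest duration: the first place to look when over budget.
        return max(self.stages_ms, key=self.stages_ms.__getitem__)


# Illustrative placeholder timings for one turn.
turn = TurnTiming({"stt_final": 150, "llm_first_token": 400, "tts_first_audio": 200})
print(turn.total_ms(), turn.within_budget(), turn.slowest_stage())
```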

Quickstart

Get a working healthcare voice agent in minutes

Voice Agent API — recommended

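# Voice Agent API: single WebSocket, full healthcare pipeline
import asyncio
import json

import websockets  # requires websockets >= 14, where connect() accepts additional_headers

API_KEY = "YOUR_API_KEY"


def handle(event: dict) -> None:
    # Placeholder handler: log each server event's type.
    # Events include session.ready, transcript.user, reply.audio, tool.call, ...
    print(event.get("type"))


async def run_agent():
    async with websockets.connect(
        "wss://agents.assemblyai.com/v1/ws",
        additional_headers={"Authorization": f"Bearer {API_KEY}"},
    ) as ws:
        # Configure the session: agent instructions, opening line, and TTS voice.
        await ws.send(json.dumps({
            "type": "session.update",
            "session": {
                "system_prompt": (
                    "You are a triage agent for Acme Health. "
                    "Collect symptoms, medications, and insurance."
                ),
                "greeting": "Hi, this is Acme Health. What brings you in today?",
                "output": {"voice": "ivy"},
            },
        }))
        # Each server message is a JSON event; dispatch it to the handler.
        async for msg in ws:
            handle(json.loads(msg))


if __name__ == "__main__":
    asyncio.run(run_agent())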

Universal-3 Pro Streaming + LiveKit — BYO stack

# LiveKit + AssemblyAI Universal-3 Pro Streaming: healthcare voice agent
from livekit.agents import (
    Agent,
    AgentSession,
    JobContext,
    TurnHandlingOptions,
    WorkerOptions,
    cli,
)
from livekit.plugins import assemblyai, cartesia, openai, silero

class IntakeAgent(Agent):
    def __init__(self):
        super().__init__(
            instructions=(
                "You are a patient intake agent for Acme Health. "
                "Collect symptoms, current medications, allergies, and insurance."
            ),
        )

async def entrypoint(ctx: JobContext):
    await ctx.connect()  # join the LiveKit room assigned to this job
    session = AgentSession(
        stt=assemblyai.STT(
            model="u3-rt-pro",
            min_turn_silence=100,
            max_turn_silence=1000,   # patients pause longer; bump higher for entity dictation
            vad_threshold=0.3,
            keyterms_prompt=["amoxicillin", "metformin", "ICD-10", "PCP referral"],
        ),
        llm=openai.LLM(model="gpt-4o"),
        tts=cartesia.TTS(),
        vad=silero.VAD.load(activation_threshold=0.3),  # match AssemblyAI's vad_threshold
        turn_handling=TurnHandlingOptions(
            turn_detection="stt",
            endpointing={"min_delay": 0},  # AssemblyAI handles endpointing; avoid additive delay
        ),
    )
    await session.start(room=ctx.room, agent=IntakeAgent())

if __name__ == "__main__":
    # Run as a LiveKit worker that calls entrypoint() for each dispatched job
    cli.run_app(WorkerOptions(entrypoint_fnc=entrypoint))
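The `min_turn_silence` / `max_turn_silence` pair in the LiveKit example bounds how long the STT waits in silence before ending a patient's turn. A simplified model of that trade-off, assuming a turn may end early (after the minimum silence) only when a hypothetical end-of-turn confidence score clears a threshold, and always ends once the maximum silence is reached. This is a conceptual illustration; the threshold value is made up, and AssemblyAI's actual turn-detection logic is not specified on this page.

```python
def should_end_turn(
    silence_ms: int,
    end_of_turn_confidence: float,
    min_turn_silence: int = 100,
    max_turn_silence: int = 1000,
    confidence_threshold: float = 0.7,  # hypothetical value for illustration
) -> bool:
    """Simplified endpointing: end early when confident, always end at max silence."""
    if silence_ms >= max_turn_silence:
        return True
    return silence_ms >= min_turn_silence and end_of_turn_confidence >= confidence_threshold


print(should_end_turn(150, 0.9))   # finished sentence, short pause
print(should_end_turn(500, 0.3))   # mid-dictation pause while reading an ID
print(should_end_turn(1000, 0.1))  # hit the silence ceiling
```

In this model, raising `max_turn_silence` lets a patient pause while reading off a member ID or medication list without being cut off, at the cost of a slower reply whenever the end-of-turn signal is uncertain.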

Medical-grade accuracy

Universal-3 Pro Streaming with Medical Mode reduces medical entity errors by 87% — pharmaceutical names, anatomical terms, dosages, and ICD references come through clean on phone and far-field audio.

PHI redaction + BAA

PHI redaction runs in-stream and is configurable per session. BAA available on request, with SOC 2 Type 2, AES-256 encryption, and optional EU data residency.

Speaker diarization for multi-party visits

Accurately separate patient, provider, and staff turns even when speakers move in and out of frame — critical for telehealth visits, rounds, and ambient documentation.
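Diarization returns generic speaker labels; mapping them to roles such as patient, provider, and staff is typically your application's job. A minimal sketch, assuming utterances arrive as `(speaker_label, text)` pairs with labels like "A" and "B", and that you already know the label-to-role mapping from context (for example, who spoke the greeting). The utterance shape and mapping are hypothetical, not the API's response format. Consecutive utterances from the same role are merged into one turn, and unmapped labels are kept visible rather than guessed.

```python
from dataclasses import dataclass


@dataclass
class Turn:
    role: str
    text: str


def to_role_turns(utterances: list[tuple[str, str]], roles: dict[str, str]) -> list[Turn]:
    """Map speaker labels to roles and merge back-to-back utterances from the same role."""
    turns: list[Turn] = []
    for label, text in utterances:
        role = roles.get(label, f"unknown:{label}")
        if turns and turns[-1].role == role:
            turns[-1].text += " " + text
        else:
            turns.append(Turn(role, text))
    return turns


utts = [
    ("A", "Hi, I'm Dr. Patel."),
    ("B", "Hi."),
    ("B", "My knee has been swelling."),
    ("C", "I'll pull up the chart."),
]
for turn in to_role_turns(utts, {"A": "provider", "B": "patient"}):
    print(f"{turn.role}: {turn.text}")
```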

Frequently asked questions