Build ambient AI scribes that listen to patient-provider conversations and automatically generate structured clinical notes. Powered by Medical Mode with 87% fewer medical entity errors, speaker diarization, and LLM Gateway for SOAP note generation.
SOAP note — auto-generated
Visit: Annual wellness · Dr. Patel · 14 min
Subjective
Patient reports persistent fatigue over 3 weeks. Denies chest pain, SOB. Sleep quality poor…
Objective
BP 128/82, HR 74, Temp 98.6°F. BMI 27.3. No lymphadenopathy…
Assessment & plan
R53.83 Fatigue. Order CBC, CMP, TSH, ferritin. F/U 2 weeks…
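The sample note above can be modeled as a small record before it is rendered or sent anywhere. The following is a minimal sketch; the `SoapNote` class and its field names are hypothetical and not part of any AssemblyAI SDK, and the text is copied from the sample, truncation marks included.

```python
from dataclasses import dataclass


@dataclass
class SoapNote:
    """Hypothetical container for the sections shown in the sample note."""
    visit: str
    subjective: str
    objective: str
    assessment_plan: str

    def render(self) -> str:
        # One "Heading: body" line per section, in the order shown on the page.
        sections = [
            ("Visit", self.visit),
            ("Subjective", self.subjective),
            ("Objective", self.objective),
            ("Assessment & plan", self.assessment_plan),
        ]
        return "\n".join(f"{title}: {body}" for title, body in sections)


note = SoapNote(
    visit="Annual wellness · Dr. Patel · 14 min",
    subjective="Patient reports persistent fatigue over 3 weeks. Denies chest pain, SOB. Sleep quality poor…",
    objective="BP 128/82, HR 74, Temp 98.6°F. BMI 27.3. No lymphadenopathy…",
    assessment_plan="R53.83 Fatigue. Order CBC, CMP, TSH, ferritin. F/U 2 weeks…",
)
print(note.render())
```

Keeping the sections as separate fields makes it straightforward to show each one for provider review or to map it onto an EHR field later.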
Providers spend two hours on documentation for every one hour of patient care. That overhead drives burnout, shrinks appointment availability, and costs health systems thousands per provider annually in lost revenue. Ambient AI scribes — built on clinical-grade speech-to-text, speaker diarization, and LLM-powered note generation — eliminate the typing so providers can focus on the patient in front of them.
87% fewer medical entity errors with Medical Mode.
Far-field capture from 20+ feet as providers move around the room.
~150ms P50 streaming latency on Universal-3 Pro.
25+ models for note generation, including Claude, GPT, and Gemini, through one API.
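P50 latency is the median: half of the measured responses arrive faster and half arrive slower. A minimal sketch of how that figure is computed; the sample values below are invented for illustration and are not AssemblyAI measurements.

```python
import statistics

# Made-up latency samples in milliseconds (illustrative only, not measurements).
latencies_ms = [120, 135, 150, 160, 410]

p50 = statistics.median(latencies_ms)   # middle value of the sorted samples
mean = statistics.mean(latencies_ms)    # pulled upward by the single slow sample

print(p50, mean)
```

A median is less sensitive to occasional slow responses than a mean, which is why tail percentiles such as P95 or P99 are usually reported alongside it.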
Two ways to build
Ship an ambient scribe with our managed pipeline, or drop medical-grade STT into the orchestrator you already run.
Voice Agent API: Our proprietary voice stack with Medical Mode via one WebSocket. Real-time ambient transcription with built-in speaker diarization, LLM reasoning, and TTS for interactive scribes.
Free tier available · No credit card required
Universal-3 Pro Streaming: The medical-grade STT layer for your ambient scribe pipeline. Pair with your own LLM for SOAP generation and your own EHR integration logic.
No concurrency caps · Autoscaling included
Capture clinical audio
With the Voice Agent API, stream audio over a single WebSocket. With a BYO stack, send audio from a smartphone, tablet, or room mic to U3 Pro Streaming. Far-field capture works from 20+ feet.
Transcribe with Medical Mode
87% fewer medical entity errors. Speaker diarization labels provider and patient speech automatically, with ~150ms P50 latency.
Generate structured notes
LLM Gateway organizes the diarized transcript into SOAP, DAP, or specialty-specific templates. 25+ models across Claude, GPT, and Gemini.
Review and sync to EHR
Provider reviews draft note, edits as needed, approves. Push to Epic, Cerner, or any EHR via API integration.
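The final step above implies a gate: a draft should not reach the EHR until the provider has approved it. A minimal sketch of that gate follows, assuming a hypothetical `DraftNote` class and a caller-supplied `push` function that stands in for whatever Epic, Cerner, or other EHR integration you use.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class DraftNote:
    """Hypothetical draft note awaiting provider review."""
    text: str
    approved: bool = False

    def edit(self, new_text: str) -> None:
        self.text = new_text
        self.approved = False  # any edit requires a fresh approval

    def approve(self) -> None:
        self.approved = True


def sync_to_ehr(note: DraftNote, push: Callable[[str], None]) -> None:
    # `push` is a stand-in for your own EHR client call.
    if not note.approved:
        raise PermissionError("note must be approved by the provider before sync")
    push(note.text)


sent = []                      # stand-in for the EHR: collects pushed notes
draft = DraftNote("R53.83 Fatigue. Order CBC, CMP, TSH, ferritin.")
draft.edit(draft.text + " F/U 2 weeks.")
draft.approve()
sync_to_ehr(draft, sent.append)
```

Resetting `approved` on every edit is a design choice in this sketch: it ensures the text that is pushed is exactly the text the provider signed off on.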
Encounter timeline
Provider
"Let's review your metformin dosage — any side effects with the 500mg?"
Patient
"Some nausea in the morning, but it's getting better."
Provider
"Good. We'll keep the current dose and recheck A1C in 3 months."
Voice Agent API — recommended
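# Voice Agent API: ambient scribe with Medical Mode
import asyncio
import json

import websockets

API_KEY = "YOUR_API_KEY"


def handle(event):
    # Placeholder handler: replace with your own routing for
    # transcript.user, reply.audio, tool.call, ...
    print(event.get("type"))


async def run_scribe():
    async with websockets.connect(
        "wss://agents.assemblyai.com/v1/ws",
        # websockets >= 14 names this argument additional_headers;
        # the legacy client calls it extra_headers.
        additional_headers={"Authorization": f"Bearer {API_KEY}"},
    ) as ws:
        # Configure the session: scribe instructions, key terms, output voice
        await ws.send(json.dumps({
            "type": "session.update",
            "session": {
                "system_prompt": (
                    "You are an ambient medical scribe. Listen to the "
                    "encounter and generate a SOAP note when the visit ends."
                ),
                "input": {"keyterms": ["metformin", "lisinopril", "A1C", "Dr. Patel"]},
                "output": {"voice": "ivy"},
            },
        }))
        # Stream encounter audio in, get transcript + note back
        async for msg in ws:
            handle(json.loads(msg))  # transcript.user, reply.audio, tool.call, ...


if __name__ == "__main__":
    asyncio.run(run_scribe())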
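The `handle(...)` call above receives each decoded message. One way to route messages by their `type` is a small dispatch table, sketched below. The event type names come from the comment in the snippet; the `text` field on transcript events is an assumption made for illustration, so check the actual message schema before relying on it.

```python
from typing import Callable, Dict, List, Tuple


def make_dispatcher() -> Tuple[Callable[[dict], bool], List[str]]:
    """Build a handler that routes events by type and collects transcripts."""
    transcript: List[str] = []

    def on_transcript(event: dict) -> None:
        # Assumption: transcript events carry their text in a "text" field.
        transcript.append(event.get("text", ""))

    handlers: Dict[str, Callable[[dict], None]] = {
        "transcript.user": on_transcript,
        # "reply.audio": ..., "tool.call": ...  (add your own handlers here)
    }

    def handle(event: dict) -> bool:
        handler = handlers.get(event.get("type", ""))
        if handler is None:
            return False  # unknown or unhandled event type: ignore it
        handler(event)
        return True

    return handle, transcript


handle, transcript = make_dispatcher()
handle({"type": "transcript.user", "text": "Some nausea in the morning."})
handle({"type": "reply.audio", "data": "..."})
```

Returning a boolean from `handle` makes it easy to log event types that are not handled yet without raising inside the receive loop.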
Universal-3 Pro Streaming + LiveKit — BYO stack
# LiveKit + AssemblyAI Medical Mode in a cascading scribe pipeline
from livekit.agents import Agent, AgentSession, JobContext, WorkerOptions, cli
from livekit.plugins import assemblyai, cartesia, openai, silero


class MedicalScribe(Agent):
    def __init__(self):
        super().__init__(
            instructions=(
                "You are an ambient scribe for Dr. Patel's clinic. "
                "Generate SOAP notes from the encounter transcript."
            ),
        )


async def entrypoint(ctx: JobContext):
    session = AgentSession(
        stt=assemblyai.STT(
            model="u3-rt-pro",
            domain="medical-v1",  # Enable Medical Mode
            keyterms_prompt=["metformin", "lisinopril", "A1C", "Dr. Patel"],
            min_turn_silence=800,  # Clinicians pause to think
            max_turn_silence=2000,  # Don't fragment chart-review pauses
        ),
        llm=openai.LLM(model="gpt-4o"),
        tts=cartesia.TTS(),
        vad=silero.VAD.load(),
    )
    await session.start(room=ctx.room, agent=MedicalScribe())


if __name__ == "__main__":
    # Run as a LiveKit Agents worker
    cli.run_app(WorkerOptions(entrypoint_fnc=entrypoint))
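The two silence settings above trade responsiveness against fragmented turns. The sketch below illustrates one plausible way such thresholds interact. It is an assumption for explanation only, not the plugin's actual endpointing logic: the `turn_ended` function and the `confident` flag are hypothetical, and the values are taken to be milliseconds.

```python
def turn_ended(
    silence_ms: int,
    confident: bool,
    min_turn_silence: int = 800,
    max_turn_silence: int = 2000,
) -> bool:
    """Illustrative end-of-turn rule built from two silence thresholds.

    Assumed behavior: after max_turn_silence the turn always ends; after
    min_turn_silence it ends only if the model is confident the speaker
    has finished.
    """
    if silence_ms >= max_turn_silence:
        return True
    return confident and silence_ms >= min_turn_silence


# A 900 ms pause ends the turn only when the model thinks the speaker is done.
print(turn_ended(900, confident=True), turn_ended(900, confident=False))
```

Under this reading, raising the minimum keeps a clinician's thinking pauses inside one turn, while the maximum bounds how long a long chart-review silence can hold a turn open.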
87% fewer medical entity errors — correctly captures drug names, dosages, anatomical terms, and ICD codes from ambient exam room audio.
Real-time speaker diarization separates provider and patient speech automatically — essential for mapping conversation segments to SOAP note sections.
Access 25+ models through one unified API — Claude, GPT, Gemini, and more — for SOAP note generation. Customizable templates for any specialty: primary care, psych, surgery, radiology.
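Diarized turns and a note template come together in the prompt sent to the LLM. The following is a minimal sketch of that step; `Utterance`, `TEMPLATES`, and `build_prompt` are hypothetical names, the prompt wording is an assumption, and no LLM Gateway call is made here. The sample turns are adapted from the encounter timeline above.

```python
from typing import List, NamedTuple


class Utterance(NamedTuple):
    speaker: str  # diarization label, e.g. "Provider" or "Patient"
    text: str


# Section headings for two common note formats.
TEMPLATES = {
    "SOAP": ["Subjective", "Objective", "Assessment", "Plan"],
    "DAP": ["Data", "Assessment", "Plan"],
}


def build_prompt(template: str, utterances: List[Utterance]) -> str:
    """Combine a template's headings with a speaker-labeled transcript."""
    headings = "\n".join(f"- {section}" for section in TEMPLATES[template])
    transcript = "\n".join(f"{u.speaker}: {u.text}" for u in utterances)
    return (
        f"Write a {template} note from the transcript below.\n"
        f"Use exactly these sections:\n{headings}\n\n"
        f"Transcript:\n{transcript}"
    )


timeline = [
    Utterance("Provider", "Let's review your metformin dosage. Any side effects with the 500mg?"),
    Utterance("Patient", "Some nausea in the morning, but it's getting better."),
    Utterance("Provider", "Good. We'll keep the current dose and recheck A1C in 3 months."),
]
prompt = build_prompt("SOAP", timeline)
print(prompt)
```

Keeping the speaker label on every line lets the model attribute patient-reported symptoms and provider decisions to the right note sections. A specialty-specific template is added by putting another entry in `TEMPLATES`.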
JotPsych uses AssemblyAI to power ambient clinical documentation for mental health providers — capturing nuanced patient conversations accurately and securely.
We require a leading edge speech-to-text provider that can meet our specialized needs: fast, accurate, targeted, and multilingual.