Insights & Use Cases
June 23, 2026

One parameter, 20% fewer missed entities: a before/after tour of Medical Mode

One config line — domain: "medical-v1" — gets you about 20% fewer missed medical entities on the audio you're already sending. No model swap, no re-integration. Here's the before/after.

Kelsey Foster
Growth

You already have a pipeline. You're already sending clinical audio to AssemblyAI. So this post isn't going to teach you what a transcript is or why drug names are hard. You know.

Here's the part that matters: there's a single config parameter that gets you about 20% fewer missed medical entities on the exact same audio you're already sending. No new model. No re-integration. No new API key. One line.

That parameter is domain: "medical-v1", and it turns on Medical Mode.

Let me show you what changes, what doesn't, and what it costs.

The one parameter

Here's your existing call to Universal-3 Pro, with Medical Mode added:

import assemblyai as aai

transcriber = aai.Transcriber()
config = aai.TranscriptionConfig(
    speech_models=["universal-3-pro"],
    domain="medical-v1",   # the one parameter
)
transcript = transcriber.transcribe("clinical-audio.wav", config)


That's it. That's the diff. If you were already on Universal-3 Pro, you add one field to the config object you already build. Everything downstream of the transcript—your diarization handling, your redact_pii step, your storage—stays exactly where it is.

Notice what you didn't do. You didn't swap to a different model family. Medical Mode runs on top of Universal-3 Pro and Universal-3.5 Pro Realtime, so the decoder, the language coverage, the entity accuracy you already rely on—all still there. You're tuning the model you're already running, not replacing it.

What actually changes in the transcript

Universal-3 Pro is already strong on entities—drug names, proper nouns, rare words—because it uses an LLM-based decoder rather than a classic acoustic-only approach. Medical Mode pushes that further specifically for clinical vocabulary. The result: a 3.2% Missed Entity Rate (MER), roughly 20% fewer missed medical entities than Universal-3 Pro alone, and the lowest MER across the providers we benchmarked against—Deepgram, Speechmatics, AWS, and Google. The full numbers live on /benchmarks.

But abstract percentages don't tell you what to expect in your own output. So here's a concrete picture.
Illustrative example. The transcript pair below is constructed to show the kind of errors Medical Mode catches—it is not measured data. The measured numbers come from /benchmarks.

A clinician dictates a short medication summary. In the best case, a general model without Medical Mode produces:

"Patient continues on metformin 500 milligrams twice daily. Started hydrochlorothiazide for the hypertension, and we'll reassess after the echocardiogram."

…but the errors a general model tends to make cluster exactly where it hurts, so the same audio can come back as:

"Patient continues on Metro Min 500 milligrams twice daily. Started hydrocortisone for the hypertension, and we'll reassess after the echo cardiogram."


Read that second version as a downstream system would. "Metro Min" isn't a drug. "Hydrocortisone" is a real drug—just the wrong one, and a dangerous swap for a thiazide diuretic. "Echo cardiogram" splits a procedure into two tokens your coding logic won't recognize.

With Medical Mode on, the same audio resolves to:

"Patient continues on metformin 500 milligrams twice daily. Started hydrochlorothiazide for the hypertension, and we'll reassess after the echocardiogram."

The dosage was never the hard part. The entities were. That's the whole point of measuring missed entities separately from raw word error—a model can nail the filler words and still hand you the wrong drug.
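To make that distinction concrete, here is a minimal sketch that scores the illustrative pair above two ways. It assumes a simplified definition of missed entity rate: the share of reference entities that never appear as whole words in the output. This post doesn't spell out how the benchmark computes MER, so treat the function names, the entity list, and the matching rule as my assumptions rather than the benchmark's method.

```python
import re


def tokens(text: str) -> list[str]:
    """Lowercase the text and split it into word tokens, dropping punctuation."""
    return re.findall(r"[a-z0-9']+", text.lower())


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref, hyp = tokens(reference), tokens(hypothesis)
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,                           # deletion
                curr[j - 1] + 1,                       # insertion
                prev[j - 1] + (ref_word != hyp_word),  # substitution or match
            ))
        prev = curr
    return prev[-1] / len(ref)


def missed_entity_rate(entities: list[str], hypothesis: str) -> float:
    """Share of reference entities missing from the hypothesis (assumed definition)."""
    hyp_words = set(tokens(hypothesis))
    missed = [e for e in entities if e.lower() not in hyp_words]
    return len(missed) / len(entities)


# The constructed example from this post; not measured data.
REFERENCE = (
    "Patient continues on metformin 500 milligrams twice daily. "
    "Started hydrochlorothiazide for the hypertension, and we'll "
    "reassess after the echocardiogram."
)
GENERAL = (
    "Patient continues on Metro Min 500 milligrams twice daily. "
    "Started hydrocortisone for the hypertension, and we'll "
    "reassess after the echo cardiogram."
)
# Hand-picked single-word entities for the illustration.
ENTITIES = ["metformin", "hydrochlorothiazide", "hypertension", "echocardiogram"]

print("WER:", round(word_error_rate(REFERENCE, GENERAL), 3))
print("MER:", missed_entity_rate(ENTITIES, GENERAL))
```

On this constructed pair the word error rate is about 0.26 (5 edits over 19 reference words) while the missed entity rate is 0.75 (3 of 4 entities). Those figures describe the toy example only, not measured performance.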

Want to see it on your own audio? Activate Medical Mode in your dashboard and re-run a file you've already transcribed.

What does not change

This is the part I want to be loud about, because "improve medical accuracy" usually implies a migration project. Here it doesn't.

  • No model swap. You stay on Universal-3 Pro or Universal-3.5 Pro Realtime. Medical Mode is a setting on those models, not a separate endpoint.
  • No re-integration. Same SDK, same Transcriber(), same response shape. Your parsing code doesn't move.
  • No new API key. The key you're using right now works. There's nothing to provision.
  • No language regression. Medical Mode supports English, Spanish, German, and French, for both pre-recorded and streaming audio. If your traffic is multilingual, the clinical tuning follows the language.

If you've ever scoped a "switch transcription providers for better medical accuracy" ticket, you know it usually runs weeks. This is a config change you can ship in an afternoon and roll back just as fast.
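One way to make that rollback a non-event is to put the parameter behind a flag. The sketch below is only an illustration: the MEDICAL_MODE_ENABLED environment variable and both helper names are hypothetical, and the only values taken from this post are speech_models and domain. It builds the keyword arguments for the config object and adds domain only when the flag is on.

```python
import os
from collections.abc import Mapping

# Values accepted as "on" for the hypothetical flag.
TRUTHY = {"1", "true", "yes", "on"}


def medical_mode_enabled(env: Mapping[str, str] = os.environ) -> bool:
    """Read the hypothetical MEDICAL_MODE_ENABLED flag; default is off."""
    return env.get("MEDICAL_MODE_ENABLED", "false").strip().lower() in TRUTHY


def config_kwargs(env: Mapping[str, str] = os.environ) -> dict:
    """Build config keyword arguments, adding `domain` only when the flag is on."""
    kwargs: dict = {"speech_models": ["universal-3-pro"]}
    if medical_mode_enabled(env):
        kwargs["domain"] = "medical-v1"  # the one parameter from this post
    return kwargs


print(config_kwargs({"MEDICAL_MODE_ENABLED": "true"}))
print(config_kwargs({}))
```

You would then pass the result along as aai.TranscriptionConfig(**config_kwargs()), mirroring the snippet earlier in this post. Flipping the variable back removes the parameter without a code change.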

Async and streaming, same flag

The example above is pre-recorded. Streaming works the same way—you set domain: "medical-v1" on Universal-3.5 Pro Realtime and keep your existing socket handling.

This matters for ambient clinical use. If you're transcribing a live encounter (an ambient scribe, a telehealth visit, a nurse triage line), you get the same entity tuning at streaming latency. Universal-3.5 Pro Realtime's end-of-turn detection reads tonality, pacing, and rhythm rather than silence alone, and lands around 300ms. Turning on Medical Mode doesn't change how you consume partial and final transcripts. You're not trading speed for accuracy here; you're getting the clinical vocabulary handling inline.

One nuance worth knowing if you use keyterms prompting: streaming supports up to 100 keyterms for free, mid-stream. Medical Mode and keyterms aren't mutually exclusive—Medical Mode handles the broad clinical vocabulary, and you can still prompt the handful of facility-specific terms (a local formulary name, a clinic's procedure shorthand) that no general medical model would know.
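If you maintain that facility-specific list yourself, a small pre-flight check keeps it inside the limits quoted in this post (100 keyterms on streaming, 1,000 on async). This is a sketch under my own assumptions: prepare_keyterms is a hypothetical helper, the sample terms are made up, and the case-insensitive de-duplication is a design choice of mine, not documented API behavior.

```python
# Keyterm limits as quoted in this post.
KEYTERM_LIMITS = {"streaming": 100, "async": 1000}


def prepare_keyterms(terms: list[str], mode: str) -> list[str]:
    """Trim, de-duplicate (case-insensitively), and enforce the limit for `mode`."""
    limit = KEYTERM_LIMITS[mode]
    seen: set[str] = set()
    cleaned: list[str] = []
    for term in terms:
        term = term.strip()
        key = term.casefold()
        if term and key not in seen:  # skip blanks and repeats, keep first spelling
            seen.add(key)
            cleaned.append(term)
    if len(cleaned) > limit:
        raise ValueError(f"{len(cleaned)} keyterms exceeds the {mode} limit of {limit}")
    return cleaned


# Made-up facility terms, for illustration only.
print(prepare_keyterms([" Formulary-A ", "formulary-a", "", "QuickEcho"], "streaming"))
```

Failing loudly on an oversized list is deliberate here: silently truncating would drop terms you were counting on.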

What it costs

Medical Mode is a $0.15/hr add-on. It stacks on top of your base model price:

  • Universal-3 Pro + Medical Mode = $0.36/hr (the $0.21/hr async base plus the add-on)
  • Universal-3.5 Pro Realtime + Medical Mode = $0.60/hr (the $0.45/hr streaming base plus the add-on)

Full breakdown on the pricing page.
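If you want to plug in your own volume, the arithmetic is simple enough to sketch. The rates below are the ones quoted in this post; the dictionary keys are just labels I picked for the two models, and the helper names are hypothetical. Decimal is used so the cents add up exactly.

```python
from decimal import Decimal

# Base rates per audio hour, as quoted in this post. Keys are illustrative labels.
BASE_RATE_PER_HOUR = {
    "universal-3-pro": Decimal("0.21"),             # async
    "universal-3.5-pro-realtime": Decimal("0.45"),  # streaming
}
MEDICAL_MODE_ADDON = Decimal("0.15")


def hourly_rate(model: str, medical_mode: bool = True) -> Decimal:
    """Base rate for the model, plus the add-on when Medical Mode is on."""
    rate = BASE_RATE_PER_HOUR[model]
    return rate + MEDICAL_MODE_ADDON if medical_mode else rate


def cost(model: str, audio_hours: int, medical_mode: bool = True) -> Decimal:
    """Total cost for a given number of audio hours."""
    return hourly_rate(model, medical_mode) * Decimal(audio_hours)


for model in BASE_RATE_PER_HOUR:
    print(model, hourly_rate(model), cost(model, 1000))
```

For 1,000 hours of audio, that works out to $360 on Universal-3 Pro and $600 on Universal-3.5 Pro Realtime with Medical Mode on, of which $150 is the add-on in either case.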

Here's how I'd think about that fifteen cents. The cost of a missed entity isn't the audio minute—it's the downstream correction. A transcript that turns hydrochlorothiazide into hydrocortisone doesn't just need a re-listen; in a clinical workflow it can trigger a manual review, a clinician callback, or worse. Cutting missed entities by ~20% changes how much human QA your pipeline needs. For most teams running clinical audio at volume, that math closes fast.

A note on PHI handling

Since you're processing clinical audio, the compliance question comes up. AssemblyAI enables covered entities and their business associates subject to HIPAA to use the AssemblyAI services to process protected health information (PHI). AssemblyAI is considered a business associate under HIPAA and offers a standard Business Associate Addendum (BAA). Medical Mode runs on that same BAA-eligible infrastructure, so it falls under the same BAA.

Practically, that means the tools you'd reach for—diarization to separate clinician from patient, and redact_pii to strip PHI from stored transcripts—are available alongside Medical Mode, on the same models. You're not assembling a separate compliance stack.
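As a rough picture of what "alongside" looks like, here is a sketch of a single set of request options that turns on all three. Treat it as an assumption-laden illustration: clinical_request_options is a hypothetical helper, the audio URL is a placeholder, the redaction policies are just an example selection, and placing speech_models and domain in the same options as speaker_labels and redact_pii mirrors the SDK snippet earlier in this post rather than anything I've verified against the API reference.

```python
def clinical_request_options(audio_url: str) -> dict:
    """Combine Medical Mode, diarization, and PII redaction in one options dict."""
    return {
        "audio_url": audio_url,
        "speech_models": ["universal-3-pro"],
        "domain": "medical-v1",   # Medical Mode, per this post
        "speaker_labels": True,   # diarization: separate clinician from patient
        "redact_pii": True,       # strip PHI from the stored transcript
        # Example policy selection; choose the ones your compliance review requires.
        "redact_pii_policies": ["person_name", "date_of_birth", "phone_number"],
    }


# Placeholder URL, for illustration only.
options = clinical_request_options("https://example.com/clinical-audio.wav")
print(options)
```

Note that the example deliberately leaves drug names out of the redaction policies, since those are the entities Medical Mode is there to get right.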

Frequently asked questions

Do I have to migrate off my current model to use Medical Mode?

No. Medical Mode is a domain: "medical-v1" setting on Universal-3 Pro and Universal-3.5 Pro Realtime. If you're already on either, you add one parameter—no model swap, no re-integration, no new key.


Does Medical Mode work for live transcription?

Yes. It works on Universal-3.5 Pro Realtime with the same flag, so ambient scribes and telehealth pipelines get clinical entity tuning at streaming latency.


Which languages does Medical Mode support?

English, Spanish, German, and French, for both pre-recorded and streaming audio.


How much does it add to my bill?

$0.15/hr on top of the base model. That's $0.36/hr for Universal-3 Pro async and $0.60/hr for Universal-3.5 Pro Realtime.


Can I use keyterms prompting and Medical Mode together?

Yes. Medical Mode covers broad clinical vocabulary; keyterms cover your facility-specific terms—up to 100 free mid-stream on streaming, up to 1,000 on async for an additional $0.05/hr.
