Releases & Updates
March 25, 2026

Inroducing Medical Mode

Today we're releasing Medical Mode, built to deliver significantly better accuracy on medical terminology.

Madison Bernstein
Product Marketing
Reviewed by
No items found.
Table of contents

Today we're releasing Medical Mode, built to deliver significantly better accuracy on medical terminology.

Why general-purpose ASR falls short in clinical settings

Speech recognition has become remarkably accurate. On most audio, the best models produce transcripts that are essentially correct. But "essentially correct" has a specific failure mode in medicine: the errors concentrate on the words that matter most – medicine names, dosage and clinical diagnosis.

A general-purpose model will transcribe a 30-minute clinical consult at 95%+ accuracy overall while getting "hydrochlorothiazide" wrong every time. This is a worse problem than it appears. Most healthcare AI products — AI scribes, ambient documentation, clinical summarizers — feed transcripts into an LLM to produce structured output: SOAP notes, discharge summaries, referral letters. If the transcript says the wrong drug name, the LLM faithfully structures the wrong drug name. Errors don't attenuate through the pipeline. They propagate.

Medical Mode breaks the chain by catching the errors before they get the chance to propagate.

How it works

Medical Mode applies a correction pass optimized for medical entity recognition – drug names, procedures, clinical terminology. The base model's characteristics (noise handling, accent robustness, latency) are unchanged. Medical Mode refines what comes out.

Drug name and concentration

Drug name and concentration:
0:00
0:00

The correct spelling is Humalog, a specific brand of rapid-acting insulin where the concentration designation U-100 is critical for safe insulin administration.

Drug nomenclature

Drug nomenclature:
0:00
0:00

Lispro Humalog" is phonetically correct, but it overlooks a small yet important medical writing convention: the generic name should come first, with the brand name following in parentheses, written as Lispro (Humalog).

Medical Mode pushes transcription accuracy even further across medical entities, building on Universal-3 Pro's already strong handling of dates, times, drug names, and dosages.

Medical time formatting

Medical time formatting:
0:00
0:00

The correct format is 0800, with a numeric zero, following the universal healthcare convention of using 24-hour time notation to eliminate AM/PM confusion and prevent medication timing errors.

Medical Mode works across both Universal-3 Pro and Universal-3 Pro Streaming. Activation is a single parameter:

Pre-recorded:

base_url = "https://api.assemblyai.com"
headers = {"authorization": "<YOUR_API_KEY>"}

data = {
   "audio_url": "https://assembly.ai/lispro",
   "language_detection": True,
   "speech_models": ["universal-3-pro", "universal-2"],
   "domain": "medical-v1"
}

Streaming:

YOUR_API_KEY = "YOUR-API-KEY"  # Replace with your actual API key

CONNECTION_PARAMS = {
   "sample_rate": 16000,
   "speech_model": "u3-rt-pro",
   "domain": "medical-v1"
}


Measuring What Matters

Word Error Rate (WER) is the standard metric for speech recognition, but it has a fundamental limitation: it treats all words equally. A missed filler word like “um” carries the same penalty as transcribing “hydrochlorothiazide” as “hydrocortisone.” A model can achieve excellent WER while getting every drug name wrong.

We evaluate Medical Mode with two methods designed for clinical transcription:

1. Side-by-side quality comparison using LLM judge

Side-by-side comparison by human transcriptionists is the gold standard for quality assessment, and we scale this by using a powerful LLM (Anthropic’s Claude Sonnet 4.6 with extended thinking). The LLM is presented with two anonymized transcripts of the same audio — one from Medical Mode, one from a competing medical transcription model — in random order, alongside the ground-truth reference. The LLM judges which transcript is of higher quality as a medical transcript, or declares a tie. We then calculate Medical Mode’s win rate across all samples.

2. Missed Entity Rate (MER) on medical terms

MER measures accuracy specifically on medical entities — drug names, conditions, procedures — not overall word accuracy. A transcript can have excellent WER and still be unusable for clinical workflows if it gets the medications wrong. MER captures exactly that.

Available in 4 languages

Medical Mode is available in English, Spanish, German, and French for both pre-recorded and streaming STT, with enhanced medical recognition accuracy in each. When you know the spoken language, specify it via the language_code parameter to ensure the best possible accuracy. Otherwise, let the model detect the language automatically.

Uplift from Medical Mode

Medical Mode reduces the Missed Entity Rate across the board — pre-recorded and streaming, English and multilingual.

Clinical-grade accuracy, enterprise-grade compliance 

The compliance stack is included: HIPAA BAA for any customer who needs it, SOC 2 Type 2, ISO 27001:2022, and PCI DSS v4.0. Pay for what you use, nothing more.

Medical Mode is $0.15/hr as an add-on. With total costs a fraction of what Amazon Transcribe Medical ($4.50/hr) and AWS HealthScribe ($6.00/hr) charge, and better MER accuracy than Deepgram, you get clinical-grade transcription without tradeoffs.

Built for how healthcare teams work

Ambient clinical documentation: Medical Mode on our streaming model means terminology arrives correct in real time — not patched after the fact. For any ambient scribing workflow where a clinician needs an accurate transcript during or immediately after a consult, this is the difference between a tool they trust and one they don't.

AI-powered clinical notes and summaries: Pre-recorded workflows generating SOAP notes, discharge summaries, and referral letters need clean input. Medical Mode reduces the error rate on the terminology that downstream LLMs use to structure clinical output.

Front-office automation: Scheduling calls, insurance verification, front-desk voice agents — these all encounter drug names, provider names, and clinic-specific terminology. Medical Mode handles it without custom keyterm lists.

Multi-speaker clinical conversations: Combine with speaker diarization for provider/patient separation. Telehealth, therapy documentation, any setting where who said what matters as much as what was said.

Get started

Medical Mode is available now.

Title goes here

Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur.

Button Text
Medical