Introducing Medical Mode

Clinical-grade accuracy on every drug name, dose, and diagnosis

20% fewer missed entities on the terminology that affects patient outcomes — across real-time and async workflows.

Try medication names (ibuprofen, metformin, amoxicillin), dosage instructions, procedure names, and anatomical terms. Take a few steps away from your device to mimic an ambient environment.

Medical Mode in Universal-3 Pro Streaming
Clinical evaluation history:
"prompt": "Produce a transcript for a clinical history evaluation. It's important to capture medication and dosage accurately. Every disfluency is meaningful data. Include: fillers (um, uh, er, erm, ah, hmm, mhm, like, you know, I mean), repetitions (I I I, the the), restarts (I was- I went), stutters (th-that, b-but, no-not), and informal speech (gonna, wanna, gotta)"
Without prompting

"I just want to move you along a bit further. Do you take any prescribed medicines? I know you've got diabetes and high blood pressure. I do. I take Ramipril. Okay. And I take Metformin, and there's another one that begins with G for the diabetes.  Glicoside."

With context-aware prompting

"I just wanna move you along a bit further. Do you take any prescribed medicines? I know you've got diabetes and high blood pressure. I, I do. I take, um, I take Ramipril. Okay, mhm. And I take Metformin, and there's another one that begins with G for the diabetes. So glycosi — glycosi— glycoside."
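For readers wiring a prompt like this into their own requests, here is a minimal sketch of assembling the options as JSON. Only the "prompt" key comes from the example above; the helper name build_options, the audio_url field, and the placeholder URL are assumptions made for illustration, so check the API reference for the exact request schema.

```python
import json

# Shortened version of the clinical prompt shown above.
CLINICAL_PROMPT = (
    "Produce a transcript for a clinical history evaluation. "
    "It's important to capture medication and dosage accurately. "
    "Every disfluency is meaningful data."
)


def build_options(audio_url: str, prompt: str) -> dict:
    """Assemble request options (hypothetical helper, not an SDK call)."""
    if not prompt.strip():
        raise ValueError("prompt must not be empty")
    return {
        "audio_url": audio_url,  # assumed field name, for illustration only
        "prompt": prompt,        # key shown in the example above
    }


options = build_options("https://example.com/visit.mp3", CLINICAL_PROMPT)
print(json.dumps(options, indent=2))
```

Keeping the prompt in a named constant makes it easy to version and to compare transcripts with and without prompting on the same audio.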

Non-speech audio event:
"prompt": "Produce a transcript suitable for conversational analysis. Every disfluency is meaningful data. Include: Tag sounds: [beep]"
Without audio tagging

"Your call has been forwarded to an automatic voice message system. At the tone, please record your message. When you have finished recording, you may hang up or press 1 for more options."

With audio tagging

"Your call has been forwarded to an automatic voice message system. At the tone, please record your message. When you have finished recording, you may hang up or press 1 for more options. [beep]"
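Once tags like [beep] appear in the transcript text, downstream code can pull them out. The sketch below assumes tags are lowercase words in square brackets, as in the example above; the function names are hypothetical, and the pattern may need adjusting for other tag formats.

```python
import re

# Assumed tag shape: lowercase words inside square brackets, e.g. [beep].
TAG_PATTERN = re.compile(r"\[([a-z][a-z _-]*)\]")


def extract_audio_events(transcript: str) -> list:
    """Return (tag, character offset) pairs for every tag found."""
    return [(m.group(1), m.start()) for m in TAG_PATTERN.finditer(transcript)]


def strip_audio_events(transcript: str) -> str:
    """Remove tags and tidy up any doubled whitespace left behind."""
    without_tags = TAG_PATTERN.sub("", transcript)
    return re.sub(r"\s{2,}", " ", without_tags).strip()


text = (
    "When you have finished recording, you may hang up "
    "or press 1 for more options. [beep]"
)
events = extract_audio_events(text)
print(events[0][0])              # beep
print(strip_audio_events(text))  # same sentence without the tag
```

A voicemail detector, for example, could treat the offset of the first [beep] as the point where the greeting ends.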

Speech with disfluencies:
"prompt": "Produce a transcript suitable for conversational analysis. Every disfluency is meaningful data. Include: fillers (um, uh, er, ah, hmm, mhm, like, you know, I mean), repetitions (I I, the the), restarts (I was- I went), stutters (th-that, b-but, no-not), and informal speech (gonna, wanna, gotta)"
Without disfluency prompting

Do you and Quentin still socialize when you come to Los Angeles, or is it like he's so used to having you here? No, no, no, we're friends. What do you do with him?

With disfluency prompting

Do you and Quentin still socialize, uh, when you come to Los Angeles, or is it like he's so used to having you here? No, no, no, we, we, we're friends. What do you do with him?

Proper noun spelling:
"keyterms_prompt": ["Kelly Byrne-Donoghue"]
Without keyterms prompting

"Hi, this is Kelly Byrne Donahue"

With keyterms prompting

"Hi, this is Kelly Byrne-Donahue"

Capturing speaker roles:
"prompt": "Produce a transcript with every disfluency data. Additionally, label speakers with their respective roles. 1. Place [Speaker:role] at the start of each speaker turn. Example format: [Speaker:NURSE] Hello there. How can I help you today? [Speaker:PATIENT] I'm feeling unwell. I have a headache."
With traditional speaker labels

Speaker A: 5Mg. And do you take it regularly?

Speaker B: Oh yeah, yeah.

Speaker A: Good.

Speaker B: Every evening.

Speaker A: And no side effects with it?

With speaker labels prompting

Speaker [Nurse]: 5Mg. And do you take it regularly?

Speaker [Patient]: Oh yeah, yeah.

Speaker [Nurse]: Good.

Speaker [Patient]: Every evening.

Speaker [Nurse]: And no side effects with it?
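The prompt asks for a [Speaker:role] tag at the start of each turn, which makes the raw transcript text easy to split into structured turns. Here is a minimal sketch that assumes the text follows the prompt's example format exactly; split_turns is a hypothetical helper, not part of any SDK.

```python
import re

# Matches the tag format requested in the prompt, e.g. [Speaker:NURSE].
TURN_PATTERN = re.compile(r"\[Speaker:([^\]]+)\]\s*")


def split_turns(transcript: str) -> list:
    """Split tagged text into (ROLE, utterance) pairs."""
    # With one capturing group, re.split returns:
    # [text before first tag, role, text, role, text, ...]
    parts = TURN_PATTERN.split(transcript)
    turns = []
    for role, utterance in zip(parts[1::2], parts[2::2]):
        utterance = utterance.strip()
        if utterance:
            turns.append((role.strip().upper(), utterance))
    return turns


sample = (
    "[Speaker:NURSE] Hello there. How can I help you today? "
    "[Speaker:PATIENT] I'm feeling unwell. I have a headache."
)
for role, utterance in split_turns(sample):
    print(f"{role}: {utterance}")
```

Normalizing roles to upper case keeps NURSE and Nurse from being counted as different speakers if the model varies its casing.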

Spanish and English audio:
"language_detection": True
"prompt": "Preserve natural code-switching between English and Spanish. Retain spoken language as-is (correct 'I was hablando con mi manager')."
Without code-switching

Would definitely think I spoke Spanish if you heard me speak Spanish. But I still make mistakes. Soy wines. Paltro Soy. La fundadora de goop. Thank you. Thank you for doing that.

With code-switching

You would definitely think I spoke Spanish if you heard me speak Spanish, but I still make mistakes. Soy Gwyneth Paltrow, soy la fundadora de Goop. Thank you. Thank you for doing that.

Industry-leading accuracy, now with medical-grade precision

Medical Mode reduces missed medical entities by over 20% compared to Universal-3 Pro alone.

Missed Entity Rate: Universal-3 Pro vs. Universal-3 Pro with Medical Mode

Lower is better  ·  % of entities not correctly transcribed

Universal-3 Pro with Medical Mode vs. Universal-3 Pro

Pre-recorded English: 3.24% vs. 3.95% (18% improvement)

Real-time English: 4.22% vs. 4.93% (14% improvement)

Pre-recorded Non-English: 9.28% vs. 10.98% (15% improvement)
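The improvement percentages in this chart are relative reductions in missed entity rate. The sketch below reproduces them from the published rates; the pairing of each pair of rates with its workload is inferred from the chart, and the function name is illustrative.

```python
def relative_improvement(baseline_pct: float, improved_pct: float) -> float:
    """Relative reduction, in percent, from baseline to improved rate."""
    if baseline_pct <= 0:
        raise ValueError("baseline must be positive")
    return (baseline_pct - improved_pct) / baseline_pct * 100


# (Universal-3 Pro, Universal-3 Pro with Medical Mode), both in percent.
results = {
    "Pre-recorded English": (3.95, 3.24),
    "Real-time English": (4.93, 4.22),
    "Pre-recorded Non-English": (10.98, 9.28),
}

for workload, (baseline, medical) in results.items():
    gain = relative_improvement(baseline, medical)
    print(f"{workload}: {gain:.0f}% fewer missed entities")
# Pre-recorded English: 18% fewer missed entities
# Real-time English: 14% fewer missed entities
# Pre-recorded Non-English: 15% fewer missed entities
```

Note that these are relative figures: going from 3.95% to 3.24% is a drop of 0.71 percentage points, which is about 18% of the baseline.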

More accurate on medical terms than every other provider

The terms that determine patient outcomes — medication names, dosages, and diagnoses — transcribed more accurately than ever.

MER & WER across medical transcription models

Lower is better  ·  % of entities not correctly transcribed

MER (Missed Entity Rate) and WER (Word Error Rate) by model

AssemblyAI Universal-3 Pro w/ Medical Mode: 3.20% MER, 5.30% WER

Speechmatics Enhanced Medical: 3.60% MER, 5.50% WER

Deepgram Nova-3 Medical: 4.70% MER, 6.10% WER

AWS Transcribe Medical: 8.70% MER, 5.90% WER

Google Medical Conversation: 24.40% MER, 12.90% WER

See the performance on your own files

Reach out to our Applied AI team to run latency and accuracy benchmarks on your own data.

Built for the nuances of patient encounters

Every capability engineered for real conversations in ambient, far-field, and multi-speaker healthcare settings.

Far-field accuracy, without the tradeoffs

Drug names, procedures, dosages — transcribed correctly the first time, even in noisy rooms.

  • Capture every medication, procedure, and dosage correctly — 88% fewer medical entity errors than general-purpose models
  • Handle the noise of real care settings — equipment, overlapping voices, and multi-speaker encounters without accuracy tradeoffs
  • Perform across every specialty without retraining — oncology, cardiology, primary care, and everything in between, out of the box

Compliant, affordable, and built to scale

HIPAA-eligible infrastructure, BAA included, and $0.15/hr. No compliance tax, no surprises.

  • Go live for $0.15/hr: transparent add-on pricing with no compliance upcharges
  • Ship with compliance already handled: HIPAA-eligible infrastructure and BAA included, data training opted out by default
  • Scale without contracts or hidden overages: no lock-in, no concurrency limits, and predictable, usage-based pricing
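To get a feel for the add-on price, here is a rough estimate of Medical Mode cost for a given volume of audio. It assumes the $0.15/hr add-on is prorated by the second and excludes the base transcription price, which is not stated here; confirm the actual billing granularity before relying on it.

```python
from decimal import Decimal

# Add-on rate quoted above; base transcription pricing is not included.
MEDICAL_MODE_ADDON_PER_HOUR = Decimal("0.15")


def addon_cost(audio_seconds: int) -> Decimal:
    """Estimated add-on cost in dollars, assuming per-second proration."""
    if audio_seconds < 0:
        raise ValueError("audio_seconds must be non-negative")
    cost = Decimal(audio_seconds) * MEDICAL_MODE_ADDON_PER_HOUR / Decimal(3600)
    return cost.quantize(Decimal("0.01"))


# Example: 1,000 visits of 20 minutes each.
total_seconds = 1000 * 20 * 60
print(addon_cost(total_seconds))  # 50.00
```

Decimal is used instead of float so that currency amounts round predictably.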

The full Voice AI stack, with medical accuracy built in

Speaker diarization, real-time streaming, PHI redaction, all with medical domain accuracy.

  • Separate every voice in the encounter — provider, patient, and staff accurately identified across the full visit
  • Generate EHR-ready output automatically — PHI stripped, SOAP structured, and ready for your downstream systems
  • Stream medical-grade accuracy live — ambient scribes and clinical copilots get terms right as they're spoken

More on Medical Mode

What's next

We’ll be releasing new languages and improvements to Medical Mode over the coming weeks.

Read the blog

Playground

Access our production-ready models for speech recognition, speaker detection, audio summarization, and more—all in our no-code playground.

Try our Playground

Start Building

Explore our comprehensive developer docs with use case templates and best practices to optimize latency and accuracy for your application.

Read the docs

Unlock the value of voice data

Build what’s next on the platform powering thousands of the industry’s leading Voice AI apps.