Speaker diarization

Speaker diarization accurate enough for production

Word-level speaker labels for streaming and async audio, with 66% fewer false speakers and 97% fewer phantom turns than the previous generation.

Word-level attribution

Every word carries its own speaker label, even through overlap and interruptions, so misattributed words don't corrupt your transcripts.
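To illustrate what word-level labels make possible, here is a minimal sketch that folds a labeled word stream back into speaker turns. The `Word` type and its `text` and `speaker` fields are hypothetical stand-ins for illustration, not the API's actual response schema.

```python
from dataclasses import dataclass
from itertools import groupby
from typing import List, Tuple


@dataclass
class Word:
    """Hypothetical word record: one token plus the speaker it is attributed to."""
    text: str
    speaker: str


def words_to_turns(words: List[Word]) -> List[Tuple[str, str]]:
    """Merge consecutive words from the same speaker into (speaker, text) turns."""
    turns = []
    for speaker, group in groupby(words, key=lambda w: w.speaker):
        turns.append((speaker, " ".join(w.text for w in group)))
    return turns


if __name__ == "__main__":
    # Made-up sample: speaker B interrupts speaker A mid-sentence.
    sample = [
        Word("so", "A"), Word("the", "A"), Word("plan", "A"),
        Word("wait", "B"),
        Word("is", "A"), Word("simple", "A"),
    ]
    for speaker, text in words_to_turns(sample):
        print(f"{speaker}: {text}")
```

Because each word keeps its own label, the one-word interruption becomes its own turn instead of being absorbed into the surrounding speaker's sentence.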

Streaming and async

Label speakers in real time for live calls or in batch for recordings, from the same API with the same accuracy.

95+ languages, at scale

Consistent speaker labels across 95+ languages, from a two-person call to a 30-speaker meeting.

Delphi
Happy Scribe
Supernormal

2x

increase in free-to-paid conversion rate

Runway
Ashby
Jiminny
JotPsych
Fireflies

“The new Universal-3.5 Pro speech model from AssemblyAI is best so far in terms of accuracy, latency, and language switching.”

EdgeTier
Genio
Super
Loop
Calabrio

80%

increase in customer satisfaction

Commure
Dovetail
Granola

“Assembly has saved us countless hours managing models, and provided exceptional accuracy.”

Retell
Ashby
CallRail
JotPsych

75%

engineering time savings on infrastructure

Quickstart

See who said what, minutes after you sign up

Create a free account and add speaker labels to any audio with one flag. Test it in the no-code playground, then copy a ready-made request into your app.

No credit card required
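As a rough sketch of what "one flag" could look like in a request, the snippet below builds (but does not send) a transcription request with diarization switched on. The endpoint URL, the `speaker_labels` flag name, and the `authorization` header are assumptions made for illustration; confirm all three against the current API reference before using them.

```python
import json
import urllib.request

# Assumed values for illustration only; check the API reference for the real ones.
API_URL = "https://api.assemblyai.com/v2/transcript"
DIARIZATION_FLAG = "speaker_labels"


def build_transcript_request(api_key: str, audio_url: str) -> urllib.request.Request:
    """Build, without sending, a POST request that asks for speaker labels."""
    payload = {"audio_url": audio_url, DIARIZATION_FLAG: True}
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"authorization": api_key, "content-type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    req = build_transcript_request("YOUR_API_KEY", "https://example.com/call.mp3")
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
    # To actually submit the request: urllib.request.urlopen(req)
```

Keeping request construction separate from sending makes the payload easy to inspect or unit-test before any audio leaves your system.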

Diarization error rate

Diarization error rate (DER) is the standard measure of who-spoke-when accuracy: the combined duration of missed speech, false alarms, and speaker confusion, as a fraction of the total speech in a reference annotation.
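For readers who want the arithmetic, here is a minimal sketch of the conventional DER calculation: the three error durations are summed and divided by the total reference speech time. The function name and the sample durations are made up for illustration and are not taken from the benchmark reported here.

```python
def diarization_error_rate(
    missed: float,
    false_alarm: float,
    confusion: float,
    total_reference: float,
) -> float:
    """Return DER as a fraction, given error durations in seconds.

    missed          -- reference speech the system did not detect
    false_alarm     -- non-speech the system labeled as speech
    confusion       -- speech attributed to the wrong speaker
    total_reference -- total speech time in the reference annotation
    """
    if total_reference <= 0:
        raise ValueError("total_reference must be positive")
    return (missed + false_alarm + confusion) / total_reference


if __name__ == "__main__":
    # Illustrative numbers only: 3 s missed, 2 s false alarm, 5 s confusion
    # over 100 s of reference speech.
    der = diarization_error_rate(3.0, 2.0, 5.0, 100.0)
    print(f"{der:.2%}")  # prints 10.00%
```

Note that false alarms count toward the numerator but not the denominator, so DER can exceed 100% on very noisy output.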

Diarization error rate on multi-speaker telephony.

*Lower is better*
AssemblyAI: 9.20%
ElevenLabs Scribe: 12.60%
Deepgram Nova 3: 13.80%
Speechmatics: 15.10%

Source: AssemblyAI published benchmarks, March 2026.