Word-level speaker labels for streaming and async audio, with 66% fewer false speakers and 97% fewer phantom turns than the previous generation.
Thanks for calling support. Can I grab your order number?
Sure, it's LX-94820B, placed last Thursday.
Got it. I can see it shipped this morning.
Every word carries a speaker label, even through overlap and interruptions, so no misattributed words corrupt your transcripts.
Label speakers in real time for live calls or in batch for recordings, from the same API with the same accuracy.
Consistent speaker labels across 95+ languages, from a two-person call to a 30-speaker meeting.
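Because every word carries its own speaker label, speaker turns can be rebuilt by merging consecutive words that share a label. The sketch below illustrates the idea on hand-written data. The record fields `text` and `speaker`, the labels `A` and `B`, and the helper `group_into_turns` are hypothetical names chosen for illustration and are not taken from the API's response schema.

```python
from itertools import groupby

# Hypothetical word-level records; field names and speaker labels are
# assumptions for illustration, not the actual API response format.
words = [
    {"text": "Can", "speaker": "A"},
    {"text": "I", "speaker": "A"},
    {"text": "grab", "speaker": "A"},
    {"text": "your", "speaker": "A"},
    {"text": "order", "speaker": "A"},
    {"text": "number?", "speaker": "A"},
    {"text": "Sure,", "speaker": "B"},
    {"text": "it's", "speaker": "B"},
    {"text": "LX-94820B.", "speaker": "B"},
]


def group_into_turns(word_records):
    """Merge consecutive words with the same speaker label into turns."""
    turns = []
    # groupby only merges adjacent items, so a speaker who talks again
    # later starts a new turn instead of being folded into an earlier one.
    for speaker, run in groupby(word_records, key=lambda w: w["speaker"]):
        turns.append((speaker, " ".join(w["text"] for w in run)))
    return turns


for speaker, text in group_into_turns(words):
    print(f"Speaker {speaker}: {text}")
```

Grouping only adjacent words preserves the order of the conversation, which matters when speakers interrupt each other.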
2x increase in free-to-paid conversion rate
“The new Universal-3.5 Pro speech model from AssemblyAI is best so far in terms of accuracy, latency, and language switching.”
80% increase in customer satisfaction
“Assembly has saved us countless hours managing models, and provided exceptional accuracy.”
75% engineering time savings on infrastructure
Create a free account and add speaker labels to any audio with one flag. Test it in the no-code playground, then copy a ready-made request into your app.
Start building free. No credit card required.
Diarization error rate (DER) combines missed speech, false alarms, and speaker confusion, each measured against a reference annotation. It is the standard measure of who-spoke-when accuracy.
Diarization error rate on multi-speaker telephony.
*Lower is better.* Source: AssemblyAI published benchmarks, March 2026.
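As a rough illustration of how the three error types add up to a single diarization error rate, here is a minimal sketch. It assumes the conventional definition, in which the summed error time is divided by the total reference speech time; the page itself does not spell out the normalization. The function name and the example durations are hypothetical and are not benchmark data.

```python
def diarization_error_rate(missed, false_alarm, confusion, reference_speech):
    """Return DER as a fraction of total reference speech time.

    All arguments are durations in seconds:
      missed           - reference speech that the system left unlabeled
      false_alarm      - non-speech that the system labeled as speech
      confusion        - speech attributed to the wrong speaker
      reference_speech - total speech time in the reference annotation
    """
    if reference_speech <= 0:
        raise ValueError("reference_speech must be positive")
    return (missed + false_alarm + confusion) / reference_speech


# Made-up durations for a 5-minute call, for illustration only.
der = diarization_error_rate(
    missed=6.0, false_alarm=3.0, confusion=9.0, reference_speech=300.0
)
print(f"DER: {der:.1%}")
```

Because false alarms count toward the numerator but not the denominator, DER can exceed 100% on very noisy output.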