Industry’s most accurate Speech AI models
Examine the performance of our Speech AI models across key metrics including accuracy, word error rate, and more.
Accuracy per language
Language | AssemblyAI Universal-1 | OpenAI Whisper Large-v3 | Microsoft Azure Batch v3.1 | Deepgram Nova 2 | Amazon Transcribe | Google Latest-long |
---|---|---|---|---|---|---|
English | 92.7% | 91.6% | 90.6% | 90.5% | 89.4% | 86.4% |
Spanish | 95.2% | 94.0% | 92.8% | 92.4% | 94.2% | 91.3% |
German | 92.5% | 91.6% | 91.1% | 88.8% | 91.1% | 87.3% |
Average across all datasets
Word Error Rate per language
Language | AssemblyAI Universal-1 | OpenAI Whisper Large-v3 | Microsoft Azure Batch v3.1 | Deepgram Nova 2 | Amazon Transcribe | Google Latest-long |
---|---|---|---|---|---|---|
English | 7.3% | 8.4% | 9.4% | 9.5% | 10.6% | 13.6% |
Spanish | 4.8% | 6.0% | 7.2% | 7.6% | 5.8% | 8.7% |
German | 7.5% | 8.4% | 8.9% | 11.2% | 8.9% | 12.7% |
Average across all datasets
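The accuracy and word error rate tables appear to be two views of the same measurement: each accuracy figure equals 100% minus the corresponding WER. A minimal sketch of that conversion, assuming this relationship holds (the page does not state it explicitly) and using the AssemblyAI Universal-1 values from the table above; `accuracy_from_wer` is a hypothetical helper name:

```python
# WER values (percent) copied from the table above for AssemblyAI Universal-1.
wer_percent = {"English": 7.3, "Spanish": 4.8, "German": 7.5}


def accuracy_from_wer(wer: float) -> float:
    """Return accuracy in percent, assuming accuracy = 100 - WER."""
    return round(100.0 - wer, 1)


accuracy_percent = {lang: accuracy_from_wer(w) for lang, w in wer_percent.items()}
print(accuracy_percent)  # {'English': 92.7, 'Spanish': 95.2, 'German': 92.5}
```

Note that WER can exceed 100% when a transcript contains many insertions, so this conversion is only meaningful for reasonably accurate output.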
Fabrications, omissions, and hallucinations: Universal-1 vs. Whisper Large-v3
Metrics | AssemblyAI Universal-1 | OpenAI Whisper Large-v3 |
---|---|---|
Fabrications | 5.2% | 8.8% |
Omissions | 5.6% | 7.1% |
Hallucinations | 12.9% | 18.4% |
Average across all datasets
See the difference
The examples below show the difference between Universal-1's performance and Whisper Large-v3's hallucinations.
Ground truth | AssemblyAI Universal-1 | OpenAI Whisper Large-v3 |
---|---|---|
her jewelry shimmered | her jewelry shimmering | hadja luis sima addjilu sime subtitles by the amara org community |
the taebaek mountain chain is often considered the backbone of the korean peninsula | the tabet mountain chain is often considered the backbone of the korean venezuela | the ride to price inte i daseline is about 3 feet tall and suites sizes is 하루 |
the englishman said nothing | there's an englishman said nothing | does that mean we should not have interessant n |
not in a month of sundays | marine a month of sundays | this time i am very happy and then thank you to my co workers get them back to jack corn again thank you to all of you who supported me the job you gave me ultimately gave me nothing however i thank all of you for supporting me thank you to everyone at jack corn thank you to michael john song trabalhar significant |
English Word Error Rate per dataset
Dataset | AssemblyAI Universal-2 | AssemblyAI Universal-1 | OpenAI Whisper Large-v3 | Deepgram Nova 2 | Microsoft Azure Batch v3.1 | Amazon Transcribe | Google Latest-long |
---|---|---|---|---|---|---|---|
CommonVoice v5.1 | 7.08% | 7.32% | 8.83% | 12.43% | 7.81% | 8.98% | 17.59% |
Meanwhile | 4.77% | 4.58% | 9.75% | 5.56% | 6.73% | 7.27% | 11.67% |
Noisy | 10.19% | 10.31% | 11.86% | 16.03% | 16.13% | 27.62% | 26.63% |
Podcast | 8.25% | 8.48% | 8.93% | 9.12% | 9.74% | 10.55% | 14.22% |
Telephony (internal) | 11.02% | 10.78% | 12.55% | 13.22% | 16.04% | 16.08% | 23.83% |
LibriSpeech Clean | 1.71% | 1.95% | 2.22% | 3.13% | 2.74% | 2.87% | 6.46% |
LibriSpeech Test-Other | 3.27% | 3.49% | 4.09% | 7.36% | 6.15% | 6.64% | 13.20% |
Broadcast (internal) | 4.06% | 4.87% | 4.55% | 5.53% | 6.01% | 5.92% | 8.61% |
Earnings 2021 | 9.39% | 9.80% | 9.55% | 11.05% | 7.58% | 8.33% | 14.56% |
CORAAL | 16.74% | 17.92% | 19.08% | 16.86% | 17.97% | 19.62% | 33.71% |
TEDLIUM | 7.02% | 7.25% | 7.30% | 8.98% | 9.27% | 9.12% | 11.69% |
Average | 7.59% | 7.89% | 8.97% | 9.93% | 9.65% | 11.18% | 16.56% |
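The Average row in the English table is consistent with an unweighted mean of the eleven per-dataset values. The sketch below checks this for the two AssemblyAI columns; treating the average as unweighted is an assumption inferred from the numbers, not something the page states.

```python
from statistics import mean

# Per-dataset English WER (percent), copied from the table above in row order.
per_dataset_wer = {
    "AssemblyAI Universal-2": [7.08, 4.77, 10.19, 8.25, 11.02, 1.71,
                               3.27, 4.06, 9.39, 16.74, 7.02],
    "AssemblyAI Universal-1": [7.32, 4.58, 10.31, 8.48, 10.78, 1.95,
                               3.49, 4.87, 9.80, 17.92, 7.25],
}

# Unweighted mean across datasets, rounded to the table's two decimals.
averages = {model: round(mean(values), 2) for model, values in per_dataset_wer.items()}
print(averages)  # {'AssemblyAI Universal-2': 7.59, 'AssemblyAI Universal-1': 7.89}
```

An unweighted mean gives every dataset the same influence regardless of how many hours of audio it contains, so small datasets count as much as large ones.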
Spanish Word Error Rate per dataset
Dataset | AssemblyAI Universal-1 | OpenAI Whisper Large-v3 | Microsoft Azure Batch v3.1 | Deepgram Nova 2 | Amazon Transcribe | Google Latest-long |
---|---|---|---|---|---|---|
CommonVoice v9 | 3.4% | 5.0% | 7.5% | 8.0% | 4.7% | 7.1% |
Private | 4.6% | 6.6% | 8.9% | 8.2% | 6.2% | 13.7% |
Multilingual LS | 3.3% | 5.7% | 5.7% | 5.8% | 3.4% | 7.2% |
Voxpopuli | 7.5% | 9.6% | 8.8% | 9.3% | 8.6% | 10.7% |
Fleurs | 5.0% | 2.8% | 4.9% | 7.0% | 6.2% | 5.0% |
Average | 4.8% | 6.0% | 7.2% | 7.6% | 5.8% | 8.7% |
German Word Error Rate per dataset
Dataset | AssemblyAI Universal-1 | OpenAI Whisper Large-v3 | Microsoft Azure Batch v3.1 | Deepgram Nova 2 | Amazon Transcribe | Google Latest-long |
---|---|---|---|---|---|---|
CommonVoice v9 | 4.2% | 6.0% | 7.4% | 9.4% | 5.8% | 10.1% |
Private | 7.4% | 8.7% | 9.5% | 11.1% | 9.7% | 12.1% |
Voxpopuli | 12.2% | 12.6% | 14.7% | 15.1% | 16.8% | 17.4% |
Fleurs | 10.2% | 7.5% | 7.2% | 12.4% | 9.0% | 12.1% |
Multilingual LS | 3.6% | 7.4% | 5.7% | 8.2% | 3.2% | 12.0% |
Average | 7.5% | 8.4% | 8.9% | 11.2% | 8.9% | 12.7% |
Benchmark Report Methodology
Datasets
This benchmark was performed using 3 open-source datasets (LibriSpeech, Rev16, and Meanwhile) and 4 in-house datasets curated by AssemblyAI. For the in-house datasets, we sourced 60+ hours of human-labeled audio data covering popular speech domains such as call centers, podcasts, broadcasts, and webinars. Collectively, these datasets comprise a diverse set of English audio that spans phone calls, broadcasts, accented speech, and heavy jargon.
Methodology
We measured the performance of each provider on 7 datasets. For each vendor, we made API calls to their most accurate model for each file; for Whisper, we generated outputs using a self-hosted instance of Whisper Large-v3. After receiving results for each file from each vendor, we normalized both the model's prediction and the ground-truth transcript using the open-source Whisper Normalizer. From there, we calculated the metrics for each file and averaged them across datasets to measure performance.
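The normalize-then-score step described above can be illustrated with a small word error rate calculation. This is a sketch under stated assumptions, not the benchmark's actual code: `normalize` is a deliberately simplified stand-in for the Whisper Normalizer (lowercasing and punctuation stripping only), and `word_error_rate` is a hypothetical helper that computes word-level edit distance divided by the number of reference words.

```python
import re


def normalize(text: str) -> list[str]:
    """Lowercase, replace punctuation (except apostrophes) with spaces, split into words.

    Simplified stand-in for the Whisper Normalizer used in the benchmark.
    """
    text = re.sub(r"[^\w\s']", " ", text.lower())
    return text.split()


def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref, hyp = normalize(reference), normalize(hypothesis)
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(ref)


# Example pair taken from the "See the difference" table.
print(word_error_rate("the englishman said nothing",
                      "there's an englishman said nothing"))  # 0.5
```

In the example, one substitution ("the" to "there's") and one insertion ("an") against four reference words give a WER of 0.5.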
AssemblyAI combines industry-leading accuracy with a robust feature set
Language Detection
Speaker Labels
Word Timings
Real-time Streaming
Custom Vocabulary
Dual Channel
LLM Text Generation
Profanity Filtering
Speech Threshold
Advanced PII Redaction
Extract maximum value from voice data
AssemblyAI’s models deliver the highest accuracy and an expansive set of capabilities, making it easy to build on top of voice data.
Rapid innovation, constant iteration
AssemblyAI ships new features, model improvements, and product updates weekly to ensure you have access to cutting-edge Speech AI capabilities.
Submit your email to download a PDF of the benchmark results
Or get started in seconds with the AssemblyAI API
Trusted by thousands of developers in businesses across every industry
"With AssemblyAI, we get a trusted partner that supports us in enhancing the value we deliver for our customers."
Ryan Johnson, Chief Product Officer, CallRail
It’s fast and free to start building with AssemblyAI
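import assemblyai as aai

aai.settings.api_key = "YOUR_API_KEY"  # placeholder: replace with your own API key
URL = "https://example.com/audio.mp3"  # placeholder: a publicly accessible audio file
config = aai.TranscriptionConfig()     # default transcription settings

transcriber = aai.Transcriber()
transcript = transcriber.transcribe(URL, config)
print(transcript.text)  # print the transcribed text rather than the Transcript object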