Industry-leading transcription accuracy
Transcription accuracy
Available languages
Hours of training data
Conformer-2: a state-of-the-art speech recognition model
Conformer-2 is our latest AI model for automatic speech recognition. Trained on 1.1M hours of English audio data, it extends Conformer-1 with improvements on proper nouns, alphanumerics, and robustness to noise.
Improvement on Proper Noun Error Rate
Improvement on alphanumerics
Improvement in robustness to noise
See how Conformer-2 works
Is it going to be a first world championship for Verstappen? Is it going to be an 8th world championship for Lewis Hamilton? Where can Verstappen try and get past Hamilton? First overtaking zone is normally down into turn five. Is verstappen far enough back. He's going to make the lunge down the inside.
Hamilton sees it coming. It's a late lunch by Verstappen who takes the lead of the race. Verstappen now slatches the championship trophy from Lewis Hamilton who's trying to fight back.
No DRS for two laps, so Lewis Hamilton will not get the rear wing open. Now he's going to go down the outside if Verstappen keeps it tight and neat. But he hasn't.
He's gone a little bit wide.
Every feature needed to transcribe audio
Async Transcription
The AssemblyAI API can transcribe pre-recorded audio and video files in seconds with human-level accuracy, and it scales to tens of thousands of files in parallel.
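As a rough illustration of the asynchronous flow (submit a file, then poll until the job finishes), here is a minimal Python sketch using only the standard library. The base URL, the `/transcript` endpoint, the `audio_url` field, the `authorization` header, and the `completed` / `error` status values are assumptions not stated on this page; check them against the current API reference before relying on them. The helper names are hypothetical.

```python
import json
import time
import urllib.request

# Assumed base URL; confirm it in the API reference.
API_BASE = "https://api.assemblyai.com/v2"


def build_submit_request(audio_url: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) the POST request that queues a transcription job."""
    body = json.dumps({"audio_url": audio_url}).encode("utf-8")
    return urllib.request.Request(
        f"{API_BASE}/transcript",
        data=body,
        headers={"authorization": api_key, "content-type": "application/json"},
        method="POST",
    )


def fetch_json(request: urllib.request.Request) -> dict:
    """Send a request and decode the JSON response body."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def wait_for_transcript(transcript_id, api_key, fetch=fetch_json,
                        interval=3.0, max_polls=200):
    """Poll a job until it reports 'completed' or 'error' (assumed status values)."""
    for _ in range(max_polls):
        request = urllib.request.Request(
            f"{API_BASE}/transcript/{transcript_id}",
            headers={"authorization": api_key},
        )
        result = fetch(request)
        if result.get("status") == "completed":
            return result
        if result.get("status") == "error":
            raise RuntimeError(result.get("error", "transcription failed"))
        time.sleep(interval)
    raise TimeoutError("transcript was not ready in time")
```

In use, you would send the request with `fetch_json(build_submit_request(url, key))`, read the job identifier from the response, and pass it to `wait_for_transcript`. The `fetch` parameter is injectable so the polling loop can be exercised without network access.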
Custom Vocabulary
Boost accuracy for vocabulary that is unique or custom to your specific use case or product.
Speaker diarization
Automatically detect the number of speakers in your audio file and associate each word in the transcription text with its speaker.
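Because each word carries a speaker label, a transcript can be folded into speaker turns on the client side. The sketch below assumes a word list shaped as dictionaries with `text` and `speaker` keys; that shape and the function name are assumptions for illustration, not a description of the actual response format.

```python
from itertools import groupby


def speaker_turns(words):
    """Merge consecutive words from the same speaker into (speaker, text) turns."""
    turns = []
    for speaker, group in groupby(words, key=lambda word: word["speaker"]):
        turns.append((speaker, " ".join(word["text"] for word in group)))
    return turns


# Illustrative input only; real speaker labels and word shapes may differ.
sample = [
    {"text": "Hello", "speaker": "A"},
    {"text": "there.", "speaker": "A"},
    {"text": "Hi.", "speaker": "B"},
    {"text": "Ready?", "speaker": "A"},
]
for speaker, text in speaker_turns(sample):
    print(f"Speaker {speaker}: {text}")
```

`groupby` only merges adjacent items, which is the behavior wanted here: a speaker who talks again later starts a new turn.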
International language support
Transcribe audio in more than 16 languages, with more being added, including Global English (English and all of its accents).
Automatic punctuation and casing
Automatically add punctuation and proper-noun casing to the transcription text.
Confidence scores
Get a confidence score for each word in the transcript.
Word timings
Word-by-word timestamps across the entire transcript text.
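Per-word confidence scores and timestamps combine naturally, for example to list the words a human reviewer should double-check along with where they occur. This sketch assumes each word is a dictionary with `text`, `start` (milliseconds), and `confidence` (0 to 1) keys; the field names, units, and function names are assumptions for illustration.

```python
def format_ms(ms: int) -> str:
    """Render a millisecond offset as MM:SS.mmm."""
    minutes, remainder = divmod(ms, 60_000)
    seconds, millis = divmod(remainder, 1_000)
    return f"{minutes:02d}:{seconds:02d}.{millis:03d}"


def flag_uncertain(words, threshold=0.6):
    """Return (timestamp, text, confidence) for words scored below the threshold."""
    return [
        (format_ms(word["start"]), word["text"], word["confidence"])
        for word in words
        if word["confidence"] < threshold
    ]
```

The threshold of 0.6 is an arbitrary starting point; a sensible value depends on the audio quality and on how costly a missed error is for your use case.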
Filler words
Optionally include disfluencies in the transcripts of your audio files.
Profanity filtering
Automatically detect and replace profanity in the transcription text.
Automatic language detection
Automatically detect if the dominant language of the spoken audio is supported by our API and route it to the appropriate model for transcription.
Custom spelling
Specify how you would like certain words to be spelled or formatted in the transcription text.
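Most of the features above are opt-in settings sent along with the transcription request. The sketch below assembles such a request body as a plain dictionary. The field names (`word_boost`, `custom_spelling`, `disfluencies`, `filter_profanity`, `language_detection`, `speaker_labels`) are assumptions that do not appear on this page and should be verified against the API reference; the function and its arguments are hypothetical.

```python
def build_transcript_options(audio_url, boost_terms=(), spelling=None,
                             keep_fillers=False, mask_profanity=False,
                             detect_language=False, label_speakers=False):
    """Assemble a request body that enables only the requested features."""
    options = {"audio_url": audio_url}
    if boost_terms:
        # Custom vocabulary: terms to favor during recognition (assumed field name).
        options["word_boost"] = list(boost_terms)
    if spelling:
        # Custom spelling: map spoken variants to one preferred written form.
        options["custom_spelling"] = [
            {"from": list(variants), "to": target}
            for target, variants in spelling.items()
        ]
    if keep_fillers:
        options["disfluencies"] = True
    if mask_profanity:
        options["filter_profanity"] = True
    if detect_language:
        options["language_detection"] = True
    if label_speakers:
        options["speaker_labels"] = True
    return options
```

For example, `build_transcript_options(url, boost_terms=["LeMUR"], spelling={"AssemblyAI": ["assembly ai"]})` produces a body that boosts one product term and normalizes one spelling while leaving every other feature at its default.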
Weekly product and accuracy improvements
LATEST UPDATES
- Pricing decreases
- Significant Summarization model speedups
- Introducing LeMUR, the easiest way to build LLM apps on spoken data