Models
AssemblyAI offers several state-of-the-art speech recognition models, each optimized for different use cases. Choose the model that best fits your needs based on accuracy, latency, cost, and language requirements.
Pre-recorded models
Universal-3 Pro
- Highest accuracy across 6 languages
- Advanced prompting capabilities
- Keyterms prompting up to 1,000 words
- Native code switching
Universal-2
- High accuracy, low latency
- Support across 99 languages
- Keyterms prompting up to 200 words
- Code switching
Streaming models
Universal-3 Pro Streaming
- Highest accuracy for voice agents
- Advanced prompting capabilities
- Keyterms prompting up to 100 words
- 6 languages: en, es, pt, de, fr, it
Universal-Streaming Multilingual
- Multilingual real-time transcription
- Fast performance, competitive pricing
- Keyterms prompting up to 100 words
- 6 languages: en, es, pt, de, fr, it
Universal-Streaming English
- Fastest real-time English transcription
- Optimized for speed and cost
- Keyterms prompting up to 100 words
- Intelligent endpointing
Whisper Streaming
- Open-source Whisper with AssemblyAI infrastructure
- 99+ languages
- Automatic language detection
- Unlimited scale
Choosing the right model
Pre-recorded
Universal-3 Pro
Universal-3 Pro is our most advanced transcription model, delivering state-of-the-art accuracy across 6 languages with powerful prompting capabilities. It supports prompting in plain language for tasks like context-specific transcription, verbatim output, audio tagging, and speaker diarization, giving you fine-grained control to guide transcription results. With keyterms prompting supporting up to 1,000 words, built-in code switching, and multichannel support, Universal-3 Pro is ideal for complex audio scenarios requiring the highest accuracy.
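The keyterms limits quoted on this page can be checked client-side before a request is sent. The sketch below is a hypothetical helper, not part of any AssemblyAI SDK. The dictionary keys are the display names used on this page rather than API identifiers, and the helper assumes the limit counts whitespace-separated words summed across all keyterms, which this page does not spell out.

```python
# Keyterms word limits as stated on this page. Keys are display names,
# not API identifiers (assumption: real parameter values may differ).
KEYTERM_WORD_LIMITS = {
    "Universal-3 Pro": 1000,
    "Universal-2": 200,
    "Universal-3 Pro Streaming": 100,
    "Universal-Streaming Multilingual": 100,
    "Universal-Streaming English": 100,
}


def count_keyterm_words(keyterms):
    """Count whitespace-separated words across all keyterms (assumed metric)."""
    return sum(len(term.split()) for term in keyterms)


def check_keyterms(model, keyterms):
    """Return (ok, words_used, limit) for a hypothetical pre-flight check."""
    limit = KEYTERM_WORD_LIMITS[model]
    used = count_keyterm_words(keyterms)
    return used <= limit, used, limit


ok, used, limit = check_keyterms("Universal-2", ["AssemblyAI", "speaker diarization"])
print(ok, used, limit)
```

If the limit turns out to count terms rather than words, only `count_keyterm_words` would need to change.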
Supported languages
en, es, de, fr, pt, it
Universal-2
Universal-2 offers the broadest language coverage of any of our models, supporting high-accuracy transcription across 99 languages with low latency. It supports customization through keyterms prompting (up to 200 words) and includes features such as multichannel support, automatic language detection, code switching, and speaker diarization. Universal-2 is the go-to choice when you need reliable transcription across diverse languages.
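The choice between the two pre-recorded models can be sketched as a small lookup on language. This is an illustrative helper only: the function name is hypothetical, the returned strings are the display names from this page rather than API parameter values, and it assumes the caller has already confirmed that any language outside the Universal-3 Pro set is in the Universal-2 supported list.

```python
# Languages listed for Universal-3 Pro on this page.
UNIVERSAL_3_PRO_LANGUAGES = {"en", "es", "de", "fr", "pt", "it"}


def choose_prerecorded_model(language_codes):
    """Pick a pre-recorded model display name for the given language codes.

    Returns "Universal-3 Pro" when every language is one of its six,
    otherwise falls back to "Universal-2" (assumed to cover the rest).
    An empty input also falls back to "Universal-2".
    """
    codes = {code.lower() for code in language_codes}
    if codes and codes <= UNIVERSAL_3_PRO_LANGUAGES:
        return "Universal-3 Pro"
    return "Universal-2"


print(choose_prerecorded_model(["en", "es"]))
print(choose_prerecorded_model(["en", "ja"]))
```

The first call stays within the six Universal-3 Pro languages; the second includes Japanese, so it falls back to the broader model.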
Supported languages
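en, en_au, en_uk, en_us, es, fr, de, it, pt, nl, hi, ja, zh, fi, ko, pl, ru, tr, uk, vi, af, sq, am, ar, hy, as, az, ba, eu, be, bn, bs, br, bg, my, ca, hr, cs, da, et, fo, gl, ka, el, gu, ht, ha, haw, he, hu, is, id, jw, kn, kk, km, lo, la, lv, ln, lt, lb, mk, mg, ms, ml, mt, mi, mr, mn, ne, no, nn, oc, pa, ps, fa, ro, sa, sr, sn, sd, si, sk, sl, so, su, sw, sv, de_ch, tl, tg, ta, tt, te, th, bo, tk, ur, uz, cy, yi, yo
Streaming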
Universal-3 Pro Streaming
The most accurate streaming model, built for voice agents that demand the highest quality. It offers best-in-class accuracy with advanced prompting capabilities, including both keyterms prompting and native prompting. It also supports native multilingual code switching, accurate entity transcription, and disfluency detection.
Supported languages
en, es, de, fr, pt, it
Learn more about Universal-3 Pro Streaming
Universal-Streaming Multilingual
Multilingual transcription at the speed and cost of Universal-Streaming. Same fast performance and competitive pricing as our English model, but with expanded language coverage. Features intelligent endpointing and keyterms prompting support for up to 100 words.
Supported languages
en, es, de, fr, pt, it
Learn more about Universal-Streaming Multilingual
Universal-Streaming English
The fastest model for real-time English transcription. Optimized for speed and cost-effectiveness for English-only applications. Features ~300ms word-by-word immutable transcripts, intelligent endpointing, and keyterms prompting support for up to 100 words.
Supported languages
en
Learn more about Universal-Streaming English
Whisper Streaming
Open-source Whisper model enhanced with AssemblyAI’s reliable infrastructure and unlimited scale. Supports 99+ languages at an accessible price point with automatic language detection and non-speech tags.
Supported languages
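en, es, fr, de, it, pt, nl, hi, ja, zh, fi, ko, pl, ru, tr, uk, vi, af, sq, am, ar, hy, as, az, ba, eu, be, bn, bs, br, bg, my, yue, ca, hr, cs, da, et, fo, gl, ka, el, gu, ht, ha, haw, he, hu, is, id, jw, kn, kk, km, lo, la, lv, ln, lt, lb, mk, mg, ms, ml, mt, mi, mr, mn, ne, no, nn, oc, pa, ps, fa, ro, sa, sr, sn, sd, si, sk, sl, so, su, sw, sv, tl, tg, ta, tt, te, th, bo, tk, ur, uz, cy, yi, yo
Learn more about Whisper Streaming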
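The streaming descriptions above suggest a simple decision order: language coverage first, then accuracy versus speed and cost. The sketch below encodes that reading as a hypothetical helper. The function name and the `prioritize_accuracy` flag are assumptions made for illustration, the returned strings are display names rather than API identifiers, and any language outside the six-language set is assumed to be covered by Whisper Streaming.

```python
# Languages listed for the Universal streaming models on this page.
UNIVERSAL_STREAMING_LANGUAGES = {"en", "es", "de", "fr", "pt", "it"}


def choose_streaming_model(language, prioritize_accuracy=False):
    """Pick a streaming model display name for one language code.

    Assumption: languages outside the six-language set are routed to
    Whisper Streaming without checking its own supported list.
    """
    code = language.lower()
    if code not in UNIVERSAL_STREAMING_LANGUAGES:
        return "Whisper Streaming"
    if prioritize_accuracy:
        return "Universal-3 Pro Streaming"
    if code == "en":
        return "Universal-Streaming English"
    return "Universal-Streaming Multilingual"


print(choose_streaming_model("en"))
print(choose_streaming_model("fr"))
print(choose_streaming_model("fr", prioritize_accuracy=True))
print(choose_streaming_model("ja"))
```

A real selection would likely also weigh pricing and features such as prompting or endpointing, which this sketch leaves out.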
Pricing
For detailed pricing information, visit our pricing page.
The listed rates are offered subject to participation in our model improvement program, which helps us continue to provide best-in-class speech-to-text. Rates may differ for accounts that opt out of this program.
For volume discounts, please reach out to sales@assemblyai.com.
Next steps
- Explore Speech Understanding features like summarization, sentiment analysis, and more
- Learn about prompting: Universal-3 Pro prompting guide | Universal-3 Pro Streaming prompting guide