Models
AssemblyAI offers several state-of-the-art speech recognition models, each optimized for different use cases. Choose the model that best fits your needs based on accuracy, latency, cost, and language requirements.
- Slam-1: Highest accuracy for English pre-recorded audio, with fine-tuning support and customization via prompting
- Universal: Best for out-of-the-box transcription of pre-recorded audio, with multilingual support, excellent accuracy, and low latency
- Nano: Most cost-effective transcription for pre-recorded audio, with broad language support
- Streaming: Streaming audio transcription optimized for voice agents and real-time applications
Choosing the right model
Slam-1
- Best for: English content requiring highest accuracy
- Key benefits:
  - Superior accuracy for English content
  - Fine-tuning support
  - Ideal for domain-specific terminology
Universal
- Best for: Production-ready transcription out of the box
- Key benefits:
  - Excellent accuracy-to-latency ratio
  - Multi-language support
  - No configuration needed
  - Ideal for conversational intelligence
Nano
- Best for: Cost-sensitive applications with broad language needs
- Key benefits:
  - Most cost-effective option
  - Widest language support
Streaming
- Best for: Voice agents and real-time voice applications
- Key benefits:
  - Immutable transcripts delivered in about 300 ms
  - Continuous speech recognition
  - Intelligent endpointing
  - Ideal for voice agents and interactive applications
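
The guidance in this section can be condensed into a small decision helper. The sketch below is illustrative only: the `Requirements` class, its field names, and the `choose_model` function are hypothetical and not part of any AssemblyAI SDK, and the order in which the checks run is an assumption about how to break ties between competing needs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Requirements:
    """Hypothetical description of a transcription workload."""
    realtime: bool = False        # live audio, e.g. a voice agent
    english_only: bool = False    # all audio is in English
    max_accuracy: bool = False    # accuracy matters more than cost
    cost_sensitive: bool = False  # cost matters more than peak accuracy


def choose_model(req: Requirements) -> str:
    """Return the model name from this page that best matches the workload.

    The priority order (real-time first, then English accuracy, then cost)
    is an assumption made for this sketch, not an official rule.
    """
    if req.realtime:
        return "Streaming"
    if req.english_only and req.max_accuracy:
        return "Slam-1"
    if req.cost_sensitive:
        return "Nano"
    # Default: production-ready transcription with no configuration.
    return "Universal"


if __name__ == "__main__":
    print(choose_model(Requirements(english_only=True, max_accuracy=True)))
```

A workload that sets no flags falls through to Universal, which matches its description as the out-of-the-box option.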
Pricing
For detailed pricing information, visit our pricing page.
For volume discounts, please reach out to sales@assemblyai.com.
Next steps
- For pre-recorded audio, see how to select your model
- For real-time transcription, check out our streaming documentation
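
For the pre-recorded path, selecting a model usually comes down to one field in the transcription request. The sketch below only builds a JSON body and sends nothing. It assumes the request accepts `audio_url` and `speech_model` fields, and the identifier strings in `ASSUMED_MODEL_IDS` are assumptions that should be confirmed against the API reference. The helper name `build_transcript_request` is hypothetical.

```python
import json

# Assumed identifier strings for the pre-recorded models on this page.
# Confirm each value against the API reference before relying on it.
ASSUMED_MODEL_IDS = {
    "Slam-1": "slam-1",
    "Universal": "universal",
    "Nano": "nano",
}


def build_transcript_request(audio_url: str, model_name: str) -> str:
    """Serialize a JSON body for a pre-recorded transcription request."""
    if model_name not in ASSUMED_MODEL_IDS:
        # Streaming is excluded here because it is not a pre-recorded model.
        raise ValueError(f"not a pre-recorded model on this page: {model_name!r}")
    body = {
        "audio_url": audio_url,
        "speech_model": ASSUMED_MODEL_IDS[model_name],
    }
    return json.dumps(body)


if __name__ == "__main__":
    print(build_transcript_request("https://example.com/audio.mp3", "Nano"))
```

Keeping the name-to-identifier mapping in one place means a renamed or newly added model needs a change in a single dictionary.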