Models

AssemblyAI offers several state-of-the-art speech recognition models, each optimized for different use cases. Choose the model that best fits your needs based on accuracy, latency, cost, and language requirements.

Choosing the right model

Slam-1

  • Best for: English content requiring highest accuracy
  • Key benefits:
    • Superior accuracy for English content
    • Fine-tuning support
    • Ideal for domain-specific terminology

Universal

  • Best for: Production-ready transcription out of the box
  • Key benefits:
    • Excellent accuracy-to-latency ratio
    • Multi-language support
    • No configuration needed
    • Ideal for conversational intelligence

Breakdown of Universal language support

English, Spanish, French, German, Indonesian, Italian, Japanese, Dutch, Polish, Portuguese, Russian, Turkish, Ukrainian, Catalan

Arabic, Azerbaijani, Bulgarian, Bosnian, Mandarin Chinese, Czech, Danish, Greek, Estonian, Finnish, Filipino, Galician, Hindi, Croatian, Hungarian, Korean, Macedonian, Malay, Norwegian Bokmål, Romanian, Slovak, Swedish, Thai, Urdu, Vietnamese, Cantonese

Afrikaans, Belarusian, Welsh, Persian (Farsi), Hebrew, Armenian, Icelandic, Kazakh, Lithuanian, Latvian, Māori, Marathi, Slovenian, Swahili, Tamil

Amharic, Assamese, Bengali, Gujarati, Hausa, Javanese, Georgian, Khmer, Kannada, Luxembourgish, Lingala, Lao, Malayalam, Mongolian, Maltese, Burmese, Nepali, Occitan, Punjabi, Pashto, Sindhi, Shona, Somali, Serbian, Telugu, Tajik, Uzbek, Yoruba

Streaming

  • Best for: Voice agents and real-time voice applications
  • Key benefits:
    • Immutable transcripts in ~300 ms
    • Continuous speech recognition
    • Intelligent endpointing
    • Ideal for voice agents and interactive applications
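
The guidance above can be condensed into a small decision helper. This is only a sketch of the selection logic described on this page: `Requirements` and `choose_model` are hypothetical names, not part of any AssemblyAI SDK, and the returned strings are the model names used in this document rather than API identifiers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Requirements:
    """Hypothetical description of a transcription workload."""
    language: str = "English"
    real_time: bool = False           # live audio, e.g. a voice agent
    needs_top_accuracy: bool = False  # e.g. domain-specific terminology


def choose_model(req: Requirements) -> str:
    """Map a workload to one of the model names listed on this page."""
    # Real-time use cases go to Streaming. This page does not state which
    # languages Streaming covers, so language is not checked here.
    if req.real_time:
        return "Streaming"
    # Slam-1 is described as best for English content needing top accuracy.
    if req.language == "English" and req.needs_top_accuracy:
        return "Slam-1"
    # Universal is the out-of-the-box, multi-language default.
    return "Universal"


if __name__ == "__main__":
    print(choose_model(Requirements(real_time=True)))
    print(choose_model(Requirements(needs_top_accuracy=True)))
    print(choose_model(Requirements(language="Spanish")))
```

The order of the checks is a design choice: latency requirements are treated as the hardest constraint, then accuracy for English, with Universal as the fallback.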

Pricing

For detailed pricing information, visit our pricing page.

Model       Price per hour   Volume discounts
Universal   $0.27/hr         Available
Slam-1      $0.27/hr         Available
Streaming   $0.15/hr         Available

For volume discounts, please reach out to sales@assemblyai.com.
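
For rough budgeting, the list prices above can be turned into a quick estimate. The sketch below assumes usage is prorated linearly by audio duration and ignores volume discounts; both are assumptions, not statements about how billing actually works, and `estimate_cost` is a hypothetical helper.

```python
# List prices in USD per hour of audio, copied from the table above.
PRICE_PER_HOUR_USD = {
    "Universal": 0.27,
    "Slam-1": 0.27,
    "Streaming": 0.15,
}


def estimate_cost(model: str, audio_seconds: float) -> float:
    """Estimate the list-price cost in USD for a given amount of audio.

    Assumes linear proration by duration (an assumption, see the lead-in).
    """
    if model not in PRICE_PER_HOUR_USD:
        raise ValueError(f"unknown model: {model!r}")
    if audio_seconds < 0:
        raise ValueError("audio_seconds must be non-negative")
    hours = audio_seconds / 3600
    return round(PRICE_PER_HOUR_USD[model] * hours, 4)


if __name__ == "__main__":
    # 100 hours of pre-recorded audio on Universal
    print(estimate_cost("Universal", 100 * 3600))
    # 30 minutes of live audio on Streaming
    print(estimate_cost("Streaming", 30 * 60))
```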

Next steps