Models

AssemblyAI offers several state-of-the-art speech recognition models, each optimized for different use cases. Choose the model that best fits your needs based on accuracy, latency, cost, and language requirements.

Choosing the right model

Slam-1

  • Best for: English content requiring highest accuracy
  • Key benefits:
    • Superior accuracy for English content
    • Fine-tuning support
    • Ideal for domain-specific terminology

Universal

  • Best for: Production-ready transcription out of the box
  • Key benefits:
    • Excellent accuracy-to-latency ratio
    • Multi-language support
    • No configuration needed
    • Ideal for conversational intelligence

Nano

  • Best for: Cost-sensitive applications with broad language needs
  • Key benefits:
    • Most cost-effective option
    • Widest language support

Streaming

  • Best for: Voice agents and real-time voice applications
  • Key benefits:
    • Immutable transcripts in ~300 ms
    • Continuous speech recognition
    • Intelligent endpointing
    • Ideal for voice agents and interactive applications
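
The "Best for" lines above can be read as a rough decision order. The following is a minimal sketch of that reading in Python, not an official selection rule: the `Requirements` fields, the `suggest_model` function, and the order in which the checks run are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Requirements:
    """Hypothetical description of a workload; field names are illustrative."""
    real_time: bool = False           # live audio, e.g. a voice agent
    english_only: bool = False        # all audio is in English
    needs_top_accuracy: bool = False  # accuracy matters more than cost
    cost_sensitive: bool = False      # lowest price matters most


def suggest_model(req: Requirements) -> str:
    """Map requirements to a model name using the "Best for" lines above.

    The precedence of the checks is an assumption, not documented behavior.
    """
    if req.real_time:
        return "Streaming"   # voice agents and real-time applications
    if req.english_only and req.needs_top_accuracy:
        return "Slam-1"      # English content requiring highest accuracy
    if req.cost_sensitive:
        return "Nano"        # most cost-effective, widest language support
    return "Universal"       # production-ready default


if __name__ == "__main__":
    print(suggest_model(Requirements(english_only=True, needs_top_accuracy=True)))
```

Real-time needs are checked first in this sketch because Streaming is the only model listed for live audio; the remaining checks fall back to Universal when no stronger preference applies.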

Pricing

For detailed pricing information, visit our pricing page.

Model      Price per hour  Volume discounts
Universal  $0.27/hr        Available
Slam-1     $0.27/hr        Available
Nano       $0.12/hr        Available
Streaming  $0.15/hr        Available

For volume discounts, please reach out to sales@assemblyai.com.
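
To compare models on cost, the listed rates can be turned into a quick estimate. This is a sketch under stated assumptions: it treats the listed prices as per-hour rates, assumes cost scales linearly with audio duration, and ignores volume discounts. The `HOURLY_RATE_USD` table and `estimate_cost` function are illustrative names, not part of any AssemblyAI SDK.

```python
from decimal import Decimal

# Rates copied from the pricing table above, read as USD per hour of audio.
HOURLY_RATE_USD = {
    "Universal": Decimal("0.27"),
    "Slam-1": Decimal("0.27"),
    "Nano": Decimal("0.12"),
    "Streaming": Decimal("0.15"),
}


def estimate_cost(model: str, audio_seconds: int) -> Decimal:
    """Estimate the cost in USD, rounded to cents.

    Assumes linear billing by duration with no volume discount applied.
    """
    hours = Decimal(audio_seconds) / Decimal(3600)
    return (HOURLY_RATE_USD[model] * hours).quantize(Decimal("0.01"))


if __name__ == "__main__":
    # 100 hours of audio on each model
    for name in HOURLY_RATE_USD:
        print(name, estimate_cost(name, 100 * 3600))
```

`Decimal` is used instead of `float` so that currency amounts round predictably. Actual invoices may differ, so check the pricing page for authoritative figures.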

Next steps