See how AssemblyAI performs on your audio

Request a customized benchmark across AssemblyAI models and alternative providers. Our team will be in touch shortly.

  • Customized testing using your actual audio files and real-world scenarios
  • Detailed methodology documentation so you understand exactly how results were generated
  • Clear visualization of performance differences that matter to your business
  • BAA and NDA available upon request

>93% accuracy*
99+ available languages
12.5M hours of multilingual data
<300ms streaming latency

Pricing without the lock-in

Get started for free. $0.15/hr after that – no commitments, no minimums, no credit card to get started.

Pre-recorded Speech-to-Text API

Build Voice AI on the most accurate Speech-to-Text with language detection, formatting, filler words, keyterms prompting, custom spelling, word-level timestamps, and more.
Models
Pay as you go
Custom
Universal-3 Pro

Our most accurate speech-to-text model, leading the market in multilingual accuracy on WER, entities, rare words, alphanumerics, and messy speech in real-world audio. Currently supports English, Spanish, German, French, Italian, and Portuguese with more languages coming soon.

$0.21/hr
Get custom rate limits, enhanced concurrency, and enterprise-grade flexibility tailored to your AI workloads
Universal-2

Our highly accurate speech-to-text model trained on over 12.5 million hours of audio data. Supports 99 languages. Exceptional accuracy at a lower price.

$0.15/hr
Add-on features
Universal-3 Pro
Universal-2
Keyterms Prompting

Provide up to 1000 words or phrases (maximum 6 words per phrase) to improve transcription accuracy.
$0.05/hr
Included
Prompting
Beta
Control transcription behavior with plain language instructions: provide context, tag audio events, and more.
$0.05/hr
Not supported
Speaker Diarization

Detect multiple speakers in audio files and segment the transcript into utterances, showing what each speaker said.
$0.02/hr
$0.02/hr
Medical Mode

New
Optimize transcription for medical terminology and healthcare conversations with significantly improved accuracy.
$0.15/hr
$0.15/hr
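
As a rough illustration of how the per-hour prices above combine, the following sketch adds a base model rate to any selected add-ons. It assumes that add-on prices simply stack on top of the base rate, that billing is linear in audio hours, and that the two price cells in each add-on row follow the column order (Universal-3 Pro, then Universal-2). The dictionary keys and function names are hypothetical and are not part of any AssemblyAI SDK.

```python
from decimal import Decimal

# Per-hour base prices copied from the pre-recorded table above (USD).
BASE_RATES = {
    "universal-3-pro": Decimal("0.21"),
    "universal-2": Decimal("0.15"),
}

# Add-on price per model (USD per hour).
# Decimal("0") stands for "Included"; None stands for "Not supported".
ADD_ONS = {
    "keyterms_prompting": {"universal-3-pro": Decimal("0.05"), "universal-2": Decimal("0")},
    "prompting": {"universal-3-pro": Decimal("0.05"), "universal-2": None},
    "speaker_diarization": {"universal-3-pro": Decimal("0.02"), "universal-2": Decimal("0.02")},
    "medical_mode": {"universal-3-pro": Decimal("0.15"), "universal-2": Decimal("0.15")},
}


def hourly_rate(model, add_ons=()):
    """Sum the base rate and the selected add-on rates for one model."""
    rate = BASE_RATES[model]
    for name in add_ons:
        price = ADD_ONS[name][model]
        if price is None:
            raise ValueError(f"{name} is not supported on {model}")
        rate += price
    return rate


def estimate_cost(model, hours, add_ons=()):
    """Estimated USD cost for a number of audio hours (assumes linear billing)."""
    return hourly_rate(model, add_ons) * Decimal(str(hours))
```

Under these assumptions, 100 hours on Universal-2 with Speaker Diarization would come to $17.00, since $0.15 + $0.02 = $0.17 per hour.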

Ready to A/B test in production?

Get started today with $50 in free credits. No credit card required.

Streaming Speech-to-Text API

Transcribe live audio and video in real time with ultra-low latency and high accuracy. Leverage auto punctuation and casing, next-gen end-of-turn detection, and ITN/formatting.
Models
Pay as you go
Custom
Universal-3 Pro Streaming
New

The most accurate model for voice agents that demand the highest quality. Best-in-class accuracy with advanced prompting capabilities. Supports English, Spanish, German, French, Portuguese, and Italian.
$0.45/hr
Get custom rate limits, enhanced concurrency, and enterprise-grade flexibility tailored to your AI workloads
Universal-Streaming

The fastest model for real-time English transcription. Optimized for speed and cost-effectiveness for English-only applications.
$0.15/hr
Universal-Streaming Multilingual

Multilingual transcription at the speed and cost of Universal-Streaming. Supports English, Spanish, German, French, Portuguese, and Italian.
$0.15/hr
Whisper-Streaming

Open-source Whisper model enhanced with AssemblyAI's reliable infrastructure and unlimited scale. Supports 99+ languages at an accessible price point.
$0.30/hr
Add-on features
Universal-3 Pro Streaming
Universal-Streaming
Keyterms Prompting

Provide up to 100 words or phrases (maximum 6 words per phrase) to improve transcription accuracy.
Included
$0.04/hr
Speaker Diarization

Detect multiple speakers in audio files and segment the transcript into utterances, showing what each speaker said.
$0.12/hr
$0.12/hr
Prompting
Beta
Control transcription behavior with plain language instructions: provide context, tag audio events, and more.
$0.05/hr
Not supported
Medical Mode

New
Optimize transcription for medical terminology and healthcare conversations with significantly improved accuracy.
$0.15/hr
$0.15/hr
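
The Keyterms Prompting limits stated above (up to 100 words or phrases for streaming, up to 1000 for pre-recorded, and a maximum of 6 words per phrase) can be checked on the client before a request is sent. The sketch below is one way to do that. It assumes words are separated by whitespace, which may not match how the service counts them, and `validate_keyterms` is a hypothetical helper, not an AssemblyAI function.

```python
def validate_keyterms(terms, max_terms=100, max_words_per_phrase=6):
    """Return a list of problems; an empty list means the terms fit the limits.

    Defaults mirror the streaming limits quoted above. Pass max_terms=1000
    for the pre-recorded limit.
    """
    problems = []
    if len(terms) > max_terms:
        problems.append(f"{len(terms)} terms exceeds the limit of {max_terms}")
    for term in terms:
        # Assumption: a "word" is any whitespace-separated token.
        word_count = len(term.split())
        if word_count == 0:
            problems.append("empty term")
        elif word_count > max_words_per_phrase:
            problems.append(
                f"{term!r} has {word_count} words (limit {max_words_per_phrase})"
            )
    return problems
```

Returning a list of problems rather than raising on the first one lets a caller report every offending term in a single pass.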

Ready to start building with voice data? 

Get started today with $50 in free credits. No credit card required.

Speech Understanding

AI models that extract meaning from your transcripts. Identify speakers by name, detect sentiment, surface topics, generate summaries, and more.
Models
Pay as you go
Custom
Speaker Identification

Speaker Identification allows you to identify speakers by their actual names or roles, transforming generic labels like “Speaker A” or “Speaker B” into meaningful identifiers that you provide.

Identify speakers by their actual names or roles

$0.02/hr
Get custom rate limits, enhanced concurrency, and enterprise-grade flexibility tailored to your AI workloads
Translation

The Translation feature automatically converts your transcribed audio content from one language to another, enabling you to reach global audiences without manual translation work.

Convert your content from one language to another

$0.06/hr
Custom Formatting

The Custom Formatting feature automatically standardizes and formats specific types of information in your transcripts, ensuring consistency across dates, phone numbers, emails, and other data types.

Standardize and format specific types of information

$0.03/hr
Entity Detection

Identify a wide range of entities that are spoken in your audio files, such as person and company names, email addresses, dates, and locations.

Identify entities that are spoken, such as names or email addresses

$0.08/hr
Sentiment Analysis

With Sentiment Analysis, AssemblyAI can detect the sentiment of each sentence of speech spoken in your audio files.

Detect the sentiment of each sentence spoken

$0.02/hr
Auto Chapters

Automatically generate a summary over time for audio and video files.

Generate a summary over time for audio and video files

$0.08/hr
Key Phrases

Accurately identify significant words and phrases in your transcription, enabling you to extract the most pertinent concepts or highlights from your audio/video file.

Identify significant words and phrases

$0.01/hr
Topic Detection

Label the topics that are spoken in your audio and video files. The predicted topic labels follow the standardized IAB Taxonomy, which makes them suitable for contextual targeting.

Label the topics spoken in standardized IAB taxonomy

$0.15/hr
Summarization

Leverage our AI-powered Summarization models to automatically summarize audio/video data in your products at scale. Customize the summary types to best fit your use case.

Generate a summary of audio files at scale

$0.03/hr

Guardrails

Guardrails ensures only high-quality, safe, and compliant content flows through your applications.
Models
Pay as you go
Custom
Profanity Filtering

Automatically filter out profanity from your transcripts.

Filter out profanity from your transcripts

$0.01/hr
Get custom rate limits, enhanced concurrency, and enterprise-grade flexibility tailored to your AI workloads
PII Audio Redaction

Identify and remove Personally Identifiable Information, such as phone numbers and social security numbers, from the audio file before it is returned to you.

Identify and remove PII from the audio file before it is returned to you

$0.05/hr
PII Text Redaction

Identify and remove Personally Identifiable Information, such as phone numbers and social security numbers, from the transcription text before it is returned to you.

Identify and remove PII from the transcription text before it is returned to you

$0.08/hr
Content Moderation

Detect sensitive content in your audio and video files, such as hate speech, violence, sensitive social issues, alcohol, drugs, and more.

Detect sensitive content in your audio and video files

$0.15/hr

Building for healthcare or finance?

Our compliance-ready plans include HIPAA BAAs, SOC 2 Type II audit reports, and dedicated data processing agreements.

LLM Gateway

Apply powerful language models directly to your audio data through a single API. Ask questions, generate insights, and build custom workflows all without managing LLM infrastructure.
Models
Input
Output
Custom
GPT-5.2

Model with superior performance on complex reasoning tasks, advanced creative work, and sophisticated problem-solving.

$1.75 / 1M
$14.00 / 1M
Get custom rate limits, enhanced concurrency, and enterprise-grade flexibility tailored to your AI workloads
GPT-5.1

Model with superior performance on complex reasoning tasks, advanced creative work, and sophisticated problem-solving.

$1.25 / 1M
$10.00 / 1M
GPT-5

Model with superior performance on complex reasoning tasks, advanced creative work, and sophisticated problem-solving.

$1.25 / 1M
$10.00 / 1M
GPT-5-Mini

Model with superior performance on complex reasoning tasks, advanced creative work, and sophisticated problem-solving.

$0.25 / 1M
$2.00 / 1M
GPT-5 Nano

Model with superior performance on complex reasoning tasks, advanced creative work, and sophisticated problem-solving.

$0.05 / 1M
$0.40 / 1M
GPT 4.1

Model with superior performance on complex reasoning tasks, advanced creative work, and sophisticated problem-solving.

$2.00 / 1M
$8.00 / 1M
gpt-oss-20b

Model with superior performance on complex reasoning tasks, advanced creative work, and sophisticated problem-solving.

$0.07 / 1M
$0.30 / 1M
gpt-oss-120b

Model with superior performance on complex reasoning tasks, advanced creative work, and sophisticated problem-solving.

$0.15 / 1M
$0.60 / 1M
ChatGPT-4o

Model with superior performance on complex reasoning tasks, advanced creative work, and sophisticated problem-solving.

$5.00 / 1M
$15.00 / 1M
Models
Input
Output
Custom
Claude 4.6 Sonnet

Model with superior performance on complex reasoning tasks, advanced creative work, and sophisticated problem-solving.

$3.00 / 1M
$15.00 / 1M
Get custom rate limits, enhanced concurrency, and enterprise-grade flexibility tailored to your AI workloads
Claude 4.5 Sonnet

Model with superior performance on complex reasoning tasks, advanced creative work, and sophisticated problem-solving.

$3.00 / 1M
$15.00 / 1M
Claude 4.5 Haiku

Model with superior performance on complex reasoning tasks, advanced creative work, and sophisticated problem-solving.

$1.00 / 1M
$5.00 / 1M
Claude 4 Sonnet

Model with superior performance on complex reasoning tasks, advanced creative work, and sophisticated problem-solving.

$3.00 / 1M
$15.00 / 1M
Claude 4.6 Opus

Model with superior performance on complex reasoning tasks, advanced creative work, and sophisticated problem-solving.

$5.00 / 1M
$25.00 / 1M
Claude 4.5 Opus

Model with superior performance on complex reasoning tasks, advanced creative work, and sophisticated problem-solving.

$5.00 / 1M
$25.00 / 1M
Claude 4 Opus

Model with superior performance on complex reasoning tasks, advanced creative work, and sophisticated problem-solving.

$15.00 / 1M
$75.00 / 1M
Claude 3.5 Haiku

Model with superior performance on complex reasoning tasks, advanced creative work, and sophisticated problem-solving.

$0.80 / 1M
$4.00 / 1M
Claude 3 Haiku

Model with superior performance on complex reasoning tasks, advanced creative work, and sophisticated problem-solving.

$0.25 / 1M
$1.25 / 1M
Models
Input
Output
Custom
Gemini 3 Pro

Model with superior performance on complex reasoning tasks, advanced creative work, and sophisticated problem-solving.

$2.00 / 1M
$12.00 / 1M
Get custom rate limits, enhanced concurrency, and enterprise-grade flexibility tailored to your AI workloads
Gemini 3 Flash

Model with superior performance on complex reasoning tasks, advanced creative work, and sophisticated problem-solving.

$0.50 / 1M
$3.00 / 1M
Gemini 2.5 Flash

Model with superior performance on complex reasoning tasks, advanced creative work, and sophisticated problem-solving.

$0.30 / 1M
$2.50 / 1M
Gemini 2.5 Flash Lite

Model with superior performance on complex reasoning tasks, advanced creative work, and sophisticated problem-solving.

$0.10 / 1M
$0.40 / 1M
Gemini 2.5 Pro

Model with superior performance on complex reasoning tasks, advanced creative work, and sophisticated problem-solving.

$1.25 / 1M
$10.00 / 1M
Models
Input
Output
Custom
Qwen3 Next 80B A3B

Model with superior performance on complex reasoning tasks, advanced creative work, and sophisticated problem-solving.

$0.15 / 1M
$1.20 / 1M
Get custom rate limits, enhanced concurrency, and enterprise-grade flexibility tailored to your AI workloads
Qwen3 32B

Model with superior performance on complex reasoning tasks, advanced creative work, and sophisticated problem-solving.

$0.15 / 1M
$0.60 / 1M
Kimi K2.5

Model with superior performance on complex reasoning tasks, advanced creative work, and sophisticated problem-solving.

$0.60 / 1M
$3.00 / 1M
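
To make the input and output prices concrete, here is a minimal sketch of a per-request cost estimate. It assumes that "1M" means one million tokens (the tables do not state the unit) and that input and output are billed independently at the listed rates. Only a few rows are copied in, and the function name is hypothetical.

```python
from decimal import Decimal

# (input, output) USD prices per 1M, copied from a few of the rows above.
# Assumption: the unit is tokens.
PRICES_PER_MILLION = {
    "GPT-5.2": (Decimal("1.75"), Decimal("14.00")),
    "Claude 4.5 Haiku": (Decimal("1.00"), Decimal("5.00")),
    "Gemini 2.5 Flash Lite": (Decimal("0.10"), Decimal("0.40")),
}


def request_cost(model, input_tokens, output_tokens):
    """Estimated USD cost of one request, billing input and output separately."""
    input_price, output_price = PRICES_PER_MILLION[model]
    total = input_price * input_tokens + output_price * output_tokens
    return total / Decimal(1_000_000)
```

Under these assumptions, a GPT-5.2 request with 10,000 input tokens and 1,000 output tokens would cost about $0.0315: $0.0175 for input plus $0.0140 for output.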