Pricing built for innovation

Start free, pay-as-you-go after that – no commitments required.

Pre-recorded Speech-to-Text API

Build Voice AI on the most accurate Speech-to-Text with language detection, formatting, filler words, keyterms prompting, custom spelling, word-level timestamps, and more.

Model	Description	Pay as you go
Universal-3 Pro	Our most accurate speech-to-text model, leading the market in multilingual accuracy on WER, entities, rare words, alphanumerics, and messy speech in real-world audio. Currently supports English, Spanish, German, French, Italian, and Portuguese with more languages coming soon.	$0.21 /hr
Universal-2	Our highly accurate speech-to-text model trained on over 12.5 million hours of audio data. Supports 99 languages. Exceptional accuracy at a lower price.	$0.15 /hr

Custom: Get custom rate limits, enhanced concurrency, and enterprise-grade flexibility tailored to your AI workloads. Contact us.

Add-on feature	Description	Universal-3 Pro	Universal-2
Keyterms Prompting	Provide up to 1000 words or phrases (maximum 6 words per phrase) to improve transcription accuracy.	$0.05 /hr	Included
Prompting (Beta)	Control transcription behavior with plain language instructions: provide context, tag audio events, and more.	$0.05 /hr	Not supported
Speaker Diarization	Detect multiple speakers in audio files and segment the transcript into utterances, showing what each speaker said.	$0.02 /hr	$0.02 /hr
Medical Mode (New)	Optimize transcription for medical terminology and healthcare conversations with significantly improved accuracy.	$0.15 /hr	$0.15 /hr
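
As a rough illustration of how the pre-recorded rates above combine, the following Python sketch estimates a bill from a model, a number of audio hours, and a list of add-ons. It assumes add-on rates are simply added to the model's hourly rate; the page lists each price per hour but does not spell out how they stack. The function name, dictionary keys, and model identifiers are hypothetical labels chosen for this example, not API parameters.

```python
from decimal import Decimal

# Hourly rates in USD copied from the tables above.
BASE_RATES = {
    "universal-3-pro": Decimal("0.21"),
    "universal-2": Decimal("0.15"),
}

# Add-on rates per model; None marks an add-on the table lists as "Not supported".
ADDON_RATES = {
    "universal-3-pro": {
        "keyterms_prompting": Decimal("0.05"),
        "prompting": Decimal("0.05"),
        "speaker_diarization": Decimal("0.02"),
        "medical_mode": Decimal("0.15"),
    },
    "universal-2": {
        "keyterms_prompting": Decimal("0"),  # listed as "Included"
        "prompting": None,
        "speaker_diarization": Decimal("0.02"),
        "medical_mode": Decimal("0.15"),
    },
}


def estimate_prerecorded_cost(model, hours, addons=()):
    """Estimate the USD cost of transcribing `hours` of audio.

    Assumption: each add-on's hourly rate is added to the model's base rate.
    """
    rate = BASE_RATES[model]
    for addon in addons:
        addon_rate = ADDON_RATES[model][addon]
        if addon_rate is None:
            raise ValueError(f"{addon} is not supported on {model}")
        rate += addon_rate
    return rate * Decimal(str(hours))


# 100 hours on Universal-3 Pro with diarization and keyterms prompting:
# (0.21 + 0.02 + 0.05) * 100
print(estimate_prerecorded_cost(
    "universal-3-pro", 100, ["speaker_diarization", "keyterms_prompting"]
))  # 28.00
```

Using `Decimal` rather than floats keeps the cent-level arithmetic exact, which matters once the hourly rates are multiplied across large volumes.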

Streaming Speech-to-Text API

Transcribe live audio and video in real time with ultra-low latency and high accuracy. Leverage auto punctuation and casing, next-gen end-of-turn detection, and ITM/formatting.

Model	Description	Pay as you go
Universal-3 Pro Streaming (New)	The most accurate model for voice agents that demand the highest quality. Best-in-class accuracy with advanced prompting capabilities. Supports English, Spanish, German, French, Portuguese, and Italian.	$0.45 /hr
Universal-Streaming	The fastest model for real-time English transcription. Optimized for speed and cost-effectiveness for English-only applications.	$0.15 /hr
Universal-Streaming Multilingual	Multilingual transcription at the speed and cost of Universal-Streaming. Supports English, Spanish, German, French, Portuguese, and Italian.	$0.15 /hr
Whisper-Streaming	Open-source Whisper model enhanced with AssemblyAI's reliable infrastructure and unlimited scale. Supports 99+ languages at an accessible price point.	$0.30 /hr

Custom: Get custom rate limits, enhanced concurrency, and enterprise-grade flexibility tailored to your AI workloads. Contact us.

Add-on feature	Description	Universal-3 Pro Streaming	Universal-Streaming
Keyterms Prompting	Provide up to 100 words or phrases (maximum 6 words per phrase) to improve transcription accuracy.	Included	$0.04 /hr
Speaker Diarization	Detect multiple speakers in audio files and segment the transcript into utterances, showing what each speaker said.	$0.12 /hr	$0.12 /hr
Prompting (Beta)	Control transcription behavior with plain language instructions: provide context, tag audio events, and more.	$0.05 /hr	Not supported
Medical Mode	Optimize transcription for medical terminology and healthcare conversations with significantly improved accuracy.	$0.15 /hr	$0.15 /hr
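
To compare the streaming models for a given workload, the hourly rates above can be converted into a monthly figure. The Python sketch below does that for a fixed session length and session count. It assumes billing is strictly proportional to session duration, with no minimum charge or rounding per session, which this page does not confirm; the function name is hypothetical.

```python
from decimal import Decimal

# Base streaming rates in USD per hour, copied from the table above.
STREAMING_RATES = {
    "Universal-3 Pro Streaming": Decimal("0.45"),
    "Universal-Streaming": Decimal("0.15"),
    "Universal-Streaming Multilingual": Decimal("0.15"),
    "Whisper-Streaming": Decimal("0.30"),
}


def monthly_streaming_costs(session_minutes, sessions_per_month):
    """Return {model: estimated monthly USD cost}, excluding add-ons.

    Assumption: cost scales linearly with total streamed minutes.
    """
    total_minutes = session_minutes * sessions_per_month
    return {
        model: (rate * total_minutes / 60).quantize(Decimal("0.01"))
        for model, rate in STREAMING_RATES.items()
    }


# 1,000 six-minute sessions per month is 100 streamed hours.
for model, cost in monthly_streaming_costs(6, 1000).items():
    print(f"{model}: ${cost}")
```

For the 100-hour example, the estimate ranges from $15.00 on Universal-Streaming to $45.00 on Universal-3 Pro Streaming, before any add-ons.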

Voice Agent API

A proprietary Voice AI stack, built end-to-end for production voice agents. Every layer tuned for how people actually talk—on top of the most accurate STT models in the industry.

Model	Description	Pay as you go
Voice Agent API	The fastest path to a working voice agent, built on our industry-leading streaming speech-to-text.	$4.50/hr ($0.075/min)

Custom: Get custom rate limits, enhanced concurrency, and enterprise-grade flexibility tailored to your AI workloads. Contact us.

Speech Understanding

AI models that extract meaning from your transcripts. Identify speakers by name, detect sentiment, surface topics, generate summaries, and more.

Model	Description	Pay as you go
Speaker Identification	Identify speakers by their actual names or roles	$0.02 /hr
Translation	Convert your content from one language to another	$0.06 /hr
Custom Formatting	Standardize and format specific types of information	$0.03 /hr
Entity Detection	Identify entities that are spoken, such as names or email addresses	$0.08 /hr
Sentiment Analysis	Detect the sentiment of each sentence spoken	$0.02 /hr
Auto Chapters	Generate a summary over time for audio and video files	$0.08 /hr
Key Phrases	Identify significant words and phrases	$0.01 /hr
Topic Detection	Label the topics spoken in standardized IAB taxonomy	$0.15 /hr
Summarization	Generate a summary of audio files at scale	$0.03 /hr

Custom: Get custom rate limits, enhanced concurrency, and enterprise-grade flexibility tailored to your AI workloads. Contact us.

Guardrails

Guardrails ensures only high-quality, safe, and compliant content flows through your applications.

Model	Description	Pay as you go
Profanity Filtering	Filter out profanity from your transcripts	$0.01 /hr
PII Audio Redaction	Identify and remove PII from the audio file before it is returned to you	$0.05 /hr
PII Text Redaction	Identify and remove PII from the transcription text before it is returned to you	$0.08 /hr
Content Moderation	Detect sensitive content in your audio and video files	$0.15 /hr

Custom: Get custom rate limits, enhanced concurrency, and enterprise-grade flexibility tailored to your AI workloads. Contact us.

LLM Gateway

Apply powerful language models directly to your audio data through a single API. Ask questions, generate insights, and build custom workflows all without managing LLM infrastructure.

Models	Input	Output	Custom
GPT-5.5	$5.00 / 1M	$30.00 / 1M	Get custom rate limits, enhanced concurrency, and enterprise-grade flexibility tailored to your AI workloads. Contact us
GPT-5.2	$1.75 / 1M	$14.00 / 1M
GPT-5.1	$1.25 / 1M	$10.00 / 1M
Claude 4.7 Opus	$5.50 / 1M	$27.50 / 1M
Claude 4.6 Sonnet	$3.00 / 1M	$15.00 / 1M

Models	Input	Output	Custom
GPT-5.5	$5.00 / 1M	$30.00 / 1M	Get custom rate limits, enhanced concurrency, and enterprise-grade flexibility tailored to your AI workloads. Contact us
GPT-5.2	$1.75 / 1M	$14.00 / 1M
GPT-5.1	$1.25 / 1M	$10.00 / 1M
GPT-5	$1.25 / 1M	$10.00 / 1M
GPT-5-Mini	$0.25 / 1M	$2.00 / 1M
GPT-5 Nano	$0.05 / 1M	$0.40 / 1M
GPT 4.1	$2.00 / 1M	$8.00 / 1M
gpt-oss-20b	$0.07 / 1M	$0.30 / 1M
gpt-oss-120b	$0.15 / 1M	$0.60 / 1M

Models	Input	Output	Custom
Claude 4.6 Sonnet	$3.00 / 1M	$15.00 / 1M	Get custom rate limits, enhanced concurrency, and enterprise-grade flexibility tailored to your AI workloads. Contact us
Claude 4.5 Sonnet	$3.00 / 1M	$15.00 / 1M
Claude 4.5 Haiku	$1.00 / 1M	$5.00 / 1M
Claude 4 Sonnet	$3.00 / 1M	$15.00 / 1M
Claude 4.7 Opus	$5.50 / 1M	$27.50 / 1M
Claude 4.6 Opus	$5.00 / 1M	$25.00 / 1M
Claude 4.5 Opus	$5.00 / 1M	$25.00 / 1M
Claude 4 Opus	$15.00 / 1M	$75.00 / 1M

Models	Input	Output	Custom
Gemini 3 Flash	$0.50 / 1M	$3.00 / 1M	Get custom rate limits, enhanced concurrency, and enterprise-grade flexibility tailored to your AI workloads. Contact us
Gemini 2.5 Flash	$0.30 / 1M	$2.50 / 1M
Gemini 2.5 Flash Lite	$0.10 / 1M	$0.40 / 1M
Gemini 2.5 Pro	$1.25 / 1M	$10.00 / 1M

Models	Input	Output	Custom
Qwen3 Next 80B A3B	$0.15 / 1M	$1.20 / 1M	Get custom rate limits, enhanced concurrency, and enterprise-grade flexibility tailored to your AI workloads. Contact us
Qwen3 32B	$0.15 / 1M	$0.60 / 1M
Kimi K2.5	$0.60 / 1M	$3.00 / 1M
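
Because LLM Gateway prices are quoted per 1M tokens, with separate input and output rates, the cost of a single request follows directly from its token counts. The Python sketch below shows that arithmetic for a few of the models listed above. The function name and dictionary are hypothetical, and the sketch assumes no charges beyond the listed per-token rates.

```python
from decimal import Decimal

# (input, output) rates in USD per 1M tokens, copied from the tables above.
LLM_RATES = {
    "GPT-5.1": (Decimal("1.25"), Decimal("10.00")),
    "Claude 4.5 Haiku": (Decimal("1.00"), Decimal("5.00")),
    "Gemini 2.5 Flash": (Decimal("0.30"), Decimal("2.50")),
    "Qwen3 32B": (Decimal("0.15"), Decimal("0.60")),
}

TOKENS_PER_UNIT = Decimal(1_000_000)  # prices are quoted per 1M tokens


def llm_request_cost(model, input_tokens, output_tokens):
    """Estimate the USD cost of one request from its token counts."""
    input_rate, output_rate = LLM_RATES[model]
    total = input_rate * input_tokens + output_rate * output_tokens
    return total / TOKENS_PER_UNIT


# A 13,000-token transcript summarized into 500 output tokens:
# (13,000 * 1.00 + 500 * 5.00) / 1,000,000
print(llm_request_cost("Claude 4.5 Haiku", 13_000, 500))  # 0.0155
```

Output tokens cost several times more than input tokens on every listed model, so short answers over long transcripts tend to be dominated by the input side.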

Looking for volume-based pricing?

We’ll build a solution tailored to your needs.

Talk to our team

Security

Security and privacy

AssemblyAI uses enterprise-grade security practices to keep your data safe. We approach security by design and default, and continuously ensure AssemblyAI is secure for you and your team.

Learn more

Playground

We’re not playing around, but you can

Put our Voice AI models to the test in our no-code playground.

Try it out

Frequently asked questions

We have speech-to-text models available for both pre-recorded audio and live transcription settings. For pre-recorded, Universal-3 Pro is our most accurate model, delivering best-in-class transcription across a wide range of audio types and languages. Universal-2 offers excellent accuracy at a lower price point. Our streaming models are optimized for real-time use cases, with Universal-3 Pro Streaming delivering the highest accuracy and Universal-Streaming providing a cost-effective option.
Yes. You can create an account and start transcribing immediately with no credit card required. The free tier includes up to 185 hours of pre-recorded transcription and up to 333 hours of streaming transcription.
Yes. We offer custom pricing for customers with high-volume usage. Contact our sales team to discuss tiered pricing, committed-use discounts, and enterprise agreements tailored to your needs.
On the free plan, you can open up to 5 new streaming connections per minute. On the pay-as-you-go plan, you get unlimited concurrent streams. Custom rate limits are also available for enterprise customers.
We bill monthly based on your actual usage. There are no minimum commitments, upfront fees, or contracts on the pay-as-you-go plan. Invoices are generated at the start of each month for the previous month’s usage.
Multichannel audio is billed per channel. For example, a 1-hour stereo (2-channel) file is billed as 2 hours of transcription. Each channel is transcribed independently, which provides more accurate results for multi-speaker recordings.
Yes. AssemblyAI is available on the AWS Marketplace, allowing you to consolidate billing through your existing AWS account. Contact our sales team for details on setting this up.
AssemblyAI supports transcription in over 30 languages across our models. Universal-3 Pro and Universal-3 Pro Streaming offer the broadest language support. Visit our documentation for the complete list.
A token is a unit of text used by large language models (LLMs) in the LLM Gateway. Tokens roughly correspond to word fragments; on average, one English word equals about 1.3 tokens. LLM Gateway pricing is based on the number of input and output tokens processed by the selected model.
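
Two of the answers above involve simple arithmetic: multichannel audio is billed per channel, and one English word averages about 1.3 tokens. The Python sketch below encodes both rules so they can be checked against a planned workload. The function names are hypothetical, and the token figure is only the rough average quoted above, so real token counts will vary by model and text.

```python
from decimal import Decimal

# Rough average stated in the FAQ: one English word is about 1.3 tokens.
TOKENS_PER_WORD = Decimal("1.3")


def billed_hours(duration_hours, channels=1):
    """Hours billed for a file, given that each channel is billed separately."""
    if channels < 1:
        raise ValueError("channels must be at least 1")
    return Decimal(str(duration_hours)) * channels


def estimate_tokens(word_count):
    """Approximate token count for English text of `word_count` words."""
    return int(word_count * TOKENS_PER_WORD)


# The FAQ's own example: a 1-hour stereo file is billed as 2 hours.
print(billed_hours(1, channels=2))  # 2

# A 10,000-word transcript is roughly 13,000 tokens.
print(estimate_tokens(10_000))  # 13000
```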