Start free, pay-as-you-go after that – no commitments required.
Build Voice AI on the most accurate Speech-to-Text with language detection, formatting, filler words, keyterms prompting, custom spelling, word-level timestamps, and more.
| Models | Description | Pay as you go | Custom |
|---|---|---|---|
| Universal-3 Pro | Our most accurate speech-to-text model, leading the market in multilingual accuracy on WER, entities, rare words, alphanumerics, and messy speech in real-world audio. Currently supports English, Spanish, German, French, Italian, and Portuguese with more languages coming soon. | $0.21 /hr | Get custom rate limits, enhanced concurrency, and enterprise-grade flexibility tailored to your AI workloads. Contact us |
| Universal-2 | Our highly accurate speech-to-text model trained on over 12.5 million hours of audio data. Supports 99 languages. Exceptional accuracy at a lower price. | $0.15 /hr | |

| Add-on features | Description | Universal-3 Pro | Universal-2 |
|---|---|---|---|
| Keyterms Prompting | Provide up to 1000 words or phrases (maximum 6 words per phrase) to improve transcription accuracy. | $0.05 /hr | Included |
| Prompting (Beta) | Control transcription behavior with plain language instructions: provide context, tag audio events, and more. | $0.05 /hr | Not supported |
| Speaker Diarization | Detect multiple speakers in audio files and segment the transcript into utterances, showing what each speaker said. | $0.02 /hr | $0.02 /hr |
| Medical Mode (New) | Optimize transcription for medical terminology and healthcare conversations with significantly improved accuracy. | $0.15 /hr | $0.15 /hr |
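
To make the pre-recorded pricing concrete, here is a minimal Python sketch that adds a model's hourly rate to the rates of any selected add-ons. The rates are copied from the tables above. The dictionary keys and the `estimate_cost` helper are hypothetical names for illustration, not API parameters, and the sketch assumes add-on rates simply stack on the base rate and that billing is linear in audio duration, which the page does not state explicitly.

```python
from decimal import Decimal

# Hourly base rates in USD, copied from the pricing table above.
# The keys are illustrative labels, not official model identifiers.
BASE_RATES = {
    "universal-3-pro": Decimal("0.21"),
    "universal-2": Decimal("0.15"),
}

# Add-on rates in USD per hour, per model.
# Decimal("0") means "Included"; None means "Not supported".
ADDON_RATES = {
    "keyterms_prompting": {"universal-3-pro": Decimal("0.05"), "universal-2": Decimal("0")},
    "prompting": {"universal-3-pro": Decimal("0.05"), "universal-2": None},
    "speaker_diarization": {"universal-3-pro": Decimal("0.02"), "universal-2": Decimal("0.02")},
    "medical_mode": {"universal-3-pro": Decimal("0.15"), "universal-2": Decimal("0.15")},
}


def estimate_cost(model, audio_hours, addons=()):
    """Estimate the USD cost of transcribing `audio_hours` of audio.

    Assumes add-on rates are added to the base rate (an assumption,
    not something the pricing page spells out).
    """
    hourly = BASE_RATES[model]
    for addon in addons:
        rate = ADDON_RATES[addon][model]
        if rate is None:
            raise ValueError(f"{addon} is not supported on {model}")
        hourly += rate
    # Convert via str so float inputs such as 2.5 stay exact.
    return hourly * Decimal(str(audio_hours))


if __name__ == "__main__":
    print(estimate_cost("universal-3-pro", 100, ["speaker_diarization"]))
```

Under these assumptions, 100 hours on Universal-3 Pro with Speaker Diarization comes to $23 (100 × ($0.21 + $0.02)). `Decimal` is used instead of floats so that small per-hour rates do not accumulate rounding error.
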

Transcribe live audio and video in real time with ultra-low latency and high accuracy. Leverage automatic punctuation and casing, next-gen end-of-turn detection, and ITN/formatting.
| Models | Description | Pay as you go | Custom |
|---|---|---|---|
| Universal-3 Pro Streaming (New) | The most accurate model for voice agents that demand the highest quality. Best-in-class accuracy with advanced prompting capabilities. Supports English, Spanish, German, French, Portuguese, and Italian. | $0.45 /hr | Get custom rate limits, enhanced concurrency, and enterprise-grade flexibility tailored to your AI workloads. Contact us |
| Universal-Streaming | The fastest model for real-time English transcription. Optimized for speed and cost-effectiveness for English-only applications. | $0.15 /hr | |
| Universal-Streaming Multilingual | Multilingual transcription at the speed and cost of Universal-Streaming. Supports English, Spanish, German, French, Portuguese, and Italian. | $0.15 /hr | |
| Whisper-Streaming | Open-source Whisper model enhanced with AssemblyAI's reliable infrastructure and unlimited scale. Supports 99+ languages at an accessible price point. | $0.30 /hr | |

| Add-on features | Description | Universal-3 Pro Streaming | Universal-Streaming |
|---|---|---|---|
| Keyterms Prompting | Provide up to 100 words or phrases (maximum 6 words per phrase) to improve transcription accuracy. | Included | $0.04 /hr |
| Speaker Diarization | Detect multiple speakers in audio files and segment the transcript into utterances, showing what each speaker said. | $0.12 /hr | $0.12 /hr |
| Prompting (Beta) | Control transcription behavior with plain language instructions: provide context, tag audio events, and more. | $0.05 /hr | Not supported |
| Medical Mode | Optimize transcription for medical terminology and healthcare conversations with significantly improved accuracy. | $0.15 /hr | $0.15 /hr |
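
Streaming usage is usually reasoned about per session, not per hour, so the sketch below converts the hourly streaming rates above into a per-session cost and compares models at a given monthly volume. The function names and dictionary keys are hypothetical, and the sketch assumes cost is proportional to session duration; the page does not state the billing granularity.

```python
from decimal import Decimal

# USD per hour of streamed audio, copied from the streaming table above.
# Keys are illustrative labels, not official model identifiers.
STREAMING_RATES = {
    "universal-3-pro-streaming": Decimal("0.45"),
    "universal-streaming": Decimal("0.15"),
    "universal-streaming-multilingual": Decimal("0.15"),
    "whisper-streaming": Decimal("0.30"),
}

SECONDS_PER_HOUR = Decimal(3600)


def session_cost(model, seconds, addon_hourly=Decimal("0")):
    """Cost in USD of one streaming session lasting `seconds`.

    `addon_hourly` is the summed hourly rate of any add-ons, taken
    from the add-on table. Assumes pro-rata billing by the second.
    """
    hourly = STREAMING_RATES[model] + addon_hourly
    return hourly * Decimal(seconds) / SECONDS_PER_HOUR


def compare_models(total_hours):
    """Base cost in USD of `total_hours` of streaming on each model."""
    return {
        model: rate * Decimal(str(total_hours))
        for model, rate in STREAMING_RATES.items()
    }


if __name__ == "__main__":
    # A 10-minute call on Universal-Streaming with Speaker Diarization.
    print(session_cost("universal-streaming", 600, Decimal("0.12")))
    for model, cost in compare_models(1000).items():
        print(f"{model}: ${cost}")
```

Under these assumptions, a 10-minute Universal-Streaming call with Speaker Diarization costs $0.045 ((0.15 + 0.12) × 600 / 3600), and 1,000 hours per month ranges from $150 on Universal-Streaming to $450 on Universal-3 Pro Streaming before add-ons.
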
A proprietary Voice AI stack, built end-to-end for production voice agents. Every layer tuned for how people actually talk—on top of the most accurate STT models in the industry.
| Models | Description | Pay as you go | Custom |
|---|---|---|---|
| Voice Agent API | The fastest path to a working voice agent, built on our industry-leading streaming speech-to-text. | $4.50/hr ($0.075/min) | Get custom rate limits, enhanced concurrency, and enterprise-grade flexibility tailored to your AI workloads. Contact us |

AI models that extract meaning from your transcripts. Identify speakers by name, detect sentiment, surface topics, generate summaries, and more.
| Models | Description | Pay as you go | Custom |
|---|---|---|---|
| Speaker Identification | Identify speakers by their actual names or roles | $0.02 /hr | Get custom rate limits, enhanced concurrency, and enterprise-grade flexibility tailored to your AI workloads. Contact us |
| Translation | Convert your content from one language to another | $0.06 /hr | |
| Custom Formatting | Standardize and format specific types of information | $0.03 /hr | |
| Entity Detection | Identify entities that are spoken, such as names or email addresses | $0.08 /hr | |
| Sentiment Analysis | Detect the sentiment of each sentence spoken | $0.02 /hr | |
| Auto Chapters | Generate a summary over time for audio and video files | $0.08 /hr | |
| Key Phrases | Identify significant words and phrases | $0.01 /hr | |
| Topic Detection | Label the topics spoken in standardized IAB taxonomy | $0.15 /hr | |
| Summarization | Generate a summary of audio files at scale | $0.03 /hr | |

Guardrails ensures only high-quality, safe, and compliant content flows through your applications.
| Models | Description | Pay as you go | Custom |
|---|---|---|---|
| Profanity Filtering | Filter out profanity from your transcripts | $0.01 /hr | Get custom rate limits, enhanced concurrency, and enterprise-grade flexibility tailored to your AI workloads. Contact us |
| PII Audio Redaction | Identify and remove PII from the audio file before it is returned to you | $0.05 /hr | |
| PII Text Redaction | Identify and remove PII from the transcription text before it is returned to you | $0.08 /hr | |
| Content Moderation | Detect sensitive content in your audio and video files | $0.15 /hr | |

Apply powerful language models directly to your audio data through a single API. Ask questions, generate insights, and build custom workflows, all without managing LLM infrastructure.
| Models | Input | Output | Custom |
|---|---|---|---|
| GPT-5.5 | $5.00 / 1M | $30.00 / 1M | Get custom rate limits, enhanced concurrency, and enterprise-grade flexibility tailored to your AI workloads. Contact us |
| GPT-5.2 | $1.75 / 1M | $14.00 / 1M | |
| GPT-5.1 | $1.25 / 1M | $10.00 / 1M | |
| Claude 4.7 Opus | $5.50 / 1M | $27.50 / 1M | |
| Claude 4.6 Sonnet | $3.00 / 1M | $15.00 / 1M |
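
As a rough illustration of how the input and output rates combine, the sketch below estimates the cost of a single LLM request. It assumes "/ 1M" means per one million tokens, which the page does not spell out, and it includes only a few rows from the table above. `TOKEN_RATES` and `llm_cost` are hypothetical names for illustration.

```python
from decimal import Decimal

# (input rate, output rate) in USD, assumed to be per 1M tokens.
# A subset of the rows in the table above.
TOKEN_RATES = {
    "GPT-5.5": (Decimal("5.00"), Decimal("30.00")),
    "GPT-5.1": (Decimal("1.25"), Decimal("10.00")),
    "Claude 4.6 Sonnet": (Decimal("3.00"), Decimal("15.00")),
}

TOKENS_PER_UNIT = Decimal(1_000_000)


def llm_cost(model, input_tokens, output_tokens):
    """Estimate the USD cost of one request from its token counts."""
    input_rate, output_rate = TOKEN_RATES[model]
    total = input_rate * input_tokens + output_rate * output_tokens
    return total / TOKENS_PER_UNIT


if __name__ == "__main__":
    # A long transcript (20,000 input tokens) summarized into
    # roughly 1,000 output tokens.
    for model in TOKEN_RATES:
        print(f"{model}: ${llm_cost(model, 20_000, 1_000)}")
```

Under that assumption, the example request costs $0.075 on Claude 4.6 Sonnet (20,000 × $3.00 / 1M + 1,000 × $15.00 / 1M) and $0.035 on GPT-5.1. Output tokens are priced several times higher than input tokens, so long generated responses dominate the cost faster than long transcripts do.
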

| Models | Input | Output | Custom |
|---|---|---|---|
| GPT-5.5 | $5.00 / 1M | $30.00 / 1M | Get custom rate limits, enhanced concurrency, and enterprise-grade flexibility tailored to your AI workloads. Contact us |
| GPT-5.2 | $1.75 / 1M | $14.00 / 1M | |
| GPT-5.1 | $1.25 / 1M | $10.00 / 1M | |
| GPT-5 | $1.25 / 1M | $10.00 / 1M | |
| GPT-5-Mini | $0.25 / 1M | $2.00 / 1M | |
| GPT-5 Nano | $0.05 / 1M | $0.40 / 1M | |
| GPT 4.1 | $2.00 / 1M | $8.00 / 1M | |
| gpt-oss-20b | $0.07 / 1M | $0.30 / 1M | |
| gpt-oss-120b | $0.15 / 1M | $0.60 / 1M |

| Models | Input | Output | Custom |
|---|---|---|---|
| Claude 4.6 Sonnet | $3.00 / 1M | $15.00 / 1M | Get custom rate limits, enhanced concurrency, and enterprise-grade flexibility tailored to your AI workloads. Contact us |
| Claude 4.5 Sonnet | $3.00 / 1M | $15.00 / 1M | |
| Claude 4.5 Haiku | $1.00 / 1M | $5.00 / 1M | |
| Claude 4 Sonnet | $3.00 / 1M | $15.00 / 1M | |
| Claude 4.7 Opus | $5.50 / 1M | $27.50 / 1M | |
| Claude 4.6 Opus | $5.00 / 1M | $25.00 / 1M | |
| Claude 4.5 Opus | $5.00 / 1M | $25.00 / 1M | |
| Claude 4 Opus | $15.00 / 1M | $75.00 / 1M |

| Models | Input | Output | Custom |
|---|---|---|---|
| Gemini 3 Flash | $0.50 / 1M | $3.00 / 1M | Get custom rate limits, enhanced concurrency, and enterprise-grade flexibility tailored to your AI workloads. Contact us |
| Gemini 2.5 Flash | $0.30 / 1M | $2.50 / 1M | |
| Gemini 2.5 Flash Lite | $0.10 / 1M | $0.40 / 1M | |
| Gemini 2.5 Pro | $1.25 / 1M | $10.00 / 1M |

| Models | Input | Output | Custom |
|---|---|---|---|
| Qwen3 Next 80B A3B | $0.15 / 1M | $1.20 / 1M | Get custom rate limits, enhanced concurrency, and enterprise-grade flexibility tailored to your AI workloads. Contact us |
| Qwen3 32B | $0.15 / 1M | $0.60 / 1M | |
| Kimi K2.5 | $0.60 / 1M | $3.00 / 1M |

Put our Voice AI models to the test in our no-code playground.
2x increase in free-to-paid conversion rate
“We needed a provider that could scale with us — offering unlimited concurrent streams, fair pricing, and responsive support.”
80% increase in customer satisfaction
“Assembly has saved us countless hours managing models, and provided exceptional accuracy.”
75% engineering time savings on infrastructure