Products Overview | AssemblyAI

Platform overview

The Voice AI infrastructure for every workflow

Production Voice AI from a single API: models, intelligence, deployment.

How does your audio arrive?

Your audio type determines the right product.

Pre-recorded

Pre-recorded Speech-to-Text API

Turn raw audio into structured, actionable intelligence. Purpose-built models extract meaning from speech in a single API call.

Models: Universal-3.5 Pro, Universal-2

Features: speaker diarization, natural language prompting, key terms, 99+ languages, PII redaction, LLM-powered insights

Add-on

Medical Mode

Optimize transcription for medical terminology and healthcare conversations with significantly improved accuracy. Available on both the Universal-3.5 Pro and Universal-2 models. HIPAA compliance and a BAA are available.

Beyond transcription

Turn transcripts into structured intelligence

Add modular intelligence and safety layers on top of any transcript.

Speech Understanding API

Extract structured insights from speech without building custom NLP pipelines.

Speaker diarization, audio events, sentiment analysis, entity detection, topic detection, translation

LLM Gateway

Send transcripts directly to GPT, Claude, Gemini, or open-source models via a single API.

Automatic fallbacks, 0% markup, Gemini 2.5 Flash, Qwen3 32B, 20+ models

Guardrails

Compliance-grade safety controls at the transcription layer.

PII text redaction, PII audio redaction, content moderation, profanity filtering

Infrastructure

Voice AI infrastructure that scales with you

Run on AssemblyAI's managed cloud or deploy on your own infrastructure. Same models, same API.

Managed

AssemblyAI Cloud

Production-grade Voice AI infrastructure with automatic scaling from zero to thousands of concurrent streams. No GPU provisioning, no capacity planning.

Self-hosted

Self-Hosted Voice AI

Run AssemblyAI models on your infrastructure for data sovereignty, reduced network latency, and full stack control.

What teams build

Purpose-built for the hardest Voice AI problems

The same API powers voice agents, clinical documentation, meeting notes, and contact centers at scale.

Voice Agents

Entity-accurate real-time transcription with turn detection and short-utterance handling — the model stack that wins competitive voice agent evals.

Realtime Speech-to-Text API, Voice Agent API

AI Notetakers

Highest accuracy with speaker diarization, custom output formatting via prompting, and LLM Gateway for automatic summaries, chapters, and action items.

Pre-recorded Speech-to-Text API, LLM Gateway

AI Scribe

Ambient clinical documentation powered by Medical Mode — ~20% reduction in missed entities on drug names, conditions, and procedures. HIPAA BAA available in minutes.

Medical Mode

Conversation Intelligence

Turn every customer conversation into structured data — sentiment analysis, entity detection, topic classification, and key phrases extracted automatically from transcripts.

Pre-recorded Speech-to-Text API, Speech Understanding API

Agent Assist

Real-time streaming transcription that powers live agent coaching, suggested responses, and compliance monitoring during active customer calls.

Realtime Speech-to-Text API, LLM Gateway

Call Analytics

Post-call transcription with speaker diarization, sentiment tracking, and LLM-powered QA scoring. Process call recordings at scale for trends, compliance, and coaching insights.

Pre-recorded Speech-to-Text API, LLM Gateway

Common questions

We offer speech-to-text models for both pre-recorded audio and real-time transcription. For pre-recorded audio, Universal-3.5 Pro is our most accurate model, delivering best-in-class transcription across a wide range of audio types and languages. Universal-2 offers excellent accuracy at a lower price point. Our streaming models are optimized for real-time use cases, with Universal-3.5 Pro Realtime delivering the highest accuracy and Universal-Streaming providing a cost-effective option.

Yes, there is a free tier. It includes up to 185 hours of pre-recorded audio and 333 hours of streaming audio. No credit card is required.

Yes, we offer volume discounts for customers processing large amounts of audio. Contact our sales team for details.

You are billed based on the amount of audio you process each month. We offer pay-as-you-go pricing, with volume discounts available.

You can try our models directly in the browser via the AssemblyAI Playground: upload audio, test features, and see results in real time.

AssemblyAI supports 99 languages with its Universal model, covering all major world languages, plus automatic language detection.
