

Core Transcription

AI models to accurately convert audio files, video files, and live audio streams into text at scale.

Explore Key Features

Why AssemblyAI

Quickly process large volumes of data
Our API processes millions of audio files every day for hundreds of customers, including dozens of Fortune 500 enterprises.
Built on the latest AI breakthroughs
Access speech transcription models built on the latest breakthroughs in Transformers and trained on enormous amounts of data.
Securely analyze your data at scale
Easily access our state-of-the-art AI models through our secure API, which is SOC 2 certified and GDPR compliant.

All Core Transcription features

Async Transcription
Transcribe pre-recorded audio and/or video files in seconds, with human-level accuracy. Highly scalable to tens of thousands of files in parallel.
Real-Time Transcription
If you're working with live audio streams, you can stream your audio data in real time. We stream transcripts back to you within a few hundred milliseconds and revise them for accuracy as more context arrives.
Custom Vocabulary
Boost accuracy for vocabulary that is unique or custom to your specific use case or product.
Speaker Labels
The AssemblyAI API can automatically detect the number of speakers in your audio file, and each word in the transcription text can be associated with its speaker.
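To make the per-word speaker association concrete, here is a minimal sketch that merges consecutive words from the same speaker into turns. It assumes each word arrives as a dictionary with `text` and `speaker` keys; those field names, the helper `speaker_turns`, and the sample data are illustrative assumptions, not a confirmed response schema.

```python
from itertools import groupby

# Hypothetical sample: one dict per word, each tagged with an assumed "speaker" label.
words = [
    {"text": "Hi", "speaker": "A"},
    {"text": "there.", "speaker": "A"},
    {"text": "Hello!", "speaker": "B"},
    {"text": "How", "speaker": "A"},
    {"text": "are", "speaker": "A"},
    {"text": "you?", "speaker": "A"},
]


def speaker_turns(words):
    """Merge consecutive words from the same speaker into (speaker, text) turns."""
    turns = []
    for speaker, group in groupby(words, key=lambda w: w["speaker"]):
        turns.append((speaker, " ".join(w["text"] for w in group)))
    return turns


for speaker, text in speaker_turns(words):
    print(f"Speaker {speaker}: {text}")
```

Grouping only consecutive words keeps the conversation order intact, so a speaker who talks twice produces two separate turns.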
International Language Support
We support over 12 languages and counting, including Global English (English and all of its accents).
All Audio and Video Formats Accepted
Don't worry about file formats or sampling rates: our API supports virtually all audio and video files, with no transcoding required.
Automatic Punctuation and Casing
Punctuation and the casing of proper nouns are automatically added to the transcription text.
Confidence Scores
Get a confidence score for each word in the transcript.
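One common use of per-word confidence scores is flagging words for human review. The sketch below assumes each word is a dictionary with `text` and `confidence` keys and that confidence ranges from 0 to 1; the field names, the 0.6 threshold, and the helper `low_confidence_words` are assumptions for illustration only.

```python
# Hypothetical sample: "text" and "confidence" are assumed field names.
words = [
    {"text": "Send", "confidence": 0.98},
    {"text": "it", "confidence": 0.95},
    {"text": "to", "confidence": 0.97},
    {"text": "Kaelyn", "confidence": 0.41},
]


def low_confidence_words(words, threshold=0.6):
    """Return the text of every word whose confidence is below the threshold."""
    return [w["text"] for w in words if w["confidence"] < threshold]


print(low_confidence_words(words))
```

The right threshold depends on your audio and use case, so treat the default here as a starting point rather than a recommendation.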
Word Timings
Get word-by-word timestamps across the entire transcript text.
Paragraph Detection
Export your transcription broken down into automatically generated paragraphs.
Export as Captions
Easily export your transcription in SRT or VTT format, to be plugged into a video player for subtitles and closed captions.
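For readers unfamiliar with the SRT format, the sketch below builds SRT text locally from timed words. It is not how the API produces its caption export; it only illustrates the format's shape (cue number, `HH:MM:SS,mmm --> HH:MM:SS,mmm` time range, caption text). The `text`, `start`, and `end` keys, the millisecond units, and the fixed words-per-cue rule are all assumptions.

```python
# Hypothetical sample: "start" and "end" are assumed to be millisecond offsets.
words = [
    {"text": "Welcome", "start": 0, "end": 400},
    {"text": "to", "start": 420, "end": 500},
    {"text": "the", "start": 520, "end": 600},
    {"text": "show.", "start": 620, "end": 1100},
    {"text": "Let's", "start": 1500, "end": 1800},
    {"text": "begin.", "start": 1820, "end": 2300},
]


def srt_timestamp(ms):
    """Format a millisecond offset as an SRT timestamp (HH:MM:SS,mmm)."""
    hours, rest = divmod(ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    seconds, millis = divmod(rest, 1_000)
    return f"{hours:02}:{minutes:02}:{seconds:02},{millis:03}"


def words_to_srt(words, words_per_cue=4):
    """Build SRT text from timed words, placing a fixed number of words in each cue."""
    cues = []
    for i in range(0, len(words), words_per_cue):
        chunk = words[i:i + words_per_cue]
        start, end = chunk[0]["start"], chunk[-1]["end"]
        text = " ".join(w["text"] for w in chunk)
        cues.append(
            f"{len(cues) + 1}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text}\n"
        )
    # Cues are separated by a blank line.
    return "\n".join(cues)


print(words_to_srt(words))
```

A production captioner would break cues on sentence boundaries and line length rather than a fixed word count; the fixed count keeps this sketch short.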
Dual-Channel Transcription
The API can split your dual-channel audio files and provide a transcription for each unique channel.
Language Detection
Automatically detect if the dominant language of the spoken audio is supported by our API and route it to the appropriate model for transcription.
Filler Words
Optionally include disfluencies in the transcripts of your audio files.
Profanity Filtering
Automatically detect and replace profanity in the transcription text.
Word Search
Search over your completed transcripts for specific words and phrases.
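As a rough picture of what a word search returns, the sketch below looks up a single word in a list of timed words and reports where it was spoken. It runs locally on sample data and does not call the API; the `text`, `start`, and `end` keys, the millisecond units, and the helper `find_word` are assumptions, and phrase matching is left out for brevity.

```python
import string

# Hypothetical sample: "start" and "end" are assumed to be millisecond offsets.
words = [
    {"text": "The", "start": 0, "end": 200},
    {"text": "API", "start": 220, "end": 600},
    {"text": "is", "start": 620, "end": 700},
    {"text": "fast.", "start": 720, "end": 1100},
    {"text": "Try", "start": 1500, "end": 1700},
    {"text": "the", "start": 1720, "end": 1800},
    {"text": "API.", "start": 1820, "end": 2300},
]


def find_word(words, term):
    """Return (start, end) timings for each word matching term,
    ignoring case and surrounding punctuation."""
    target = term.lower()
    return [
        (w["start"], w["end"])
        for w in words
        if w["text"].strip(string.punctuation).lower() == target
    ]


print(find_word(words, "api"))
```

Returning timings rather than just a count lets you jump a media player straight to each mention.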
Privacy Protection
Files sent to the API for transcription are never stored, and you can request permanent deletion of transcription text from our database.
Custom Spelling
Specify how you would like certain words to be spelled or formatted in the transcription text.

Want to do more with your audio?

Explore our Audio Intelligence models that can summarize speech, detect spoken topics, and more.

Learn more about Audio Intelligence
Summarization
Leverage our AI-powered Summarization models to automatically summarize audio/video data in your products at scale.
Entity Detection
Identify a wide range of entities that are spoken in your files, such as person and company names, email addresses, and locations.

All with one simple API

Built for developers

