Core Transcription

Automatically convert audio and video files, as well as live audio streams, into text with advanced AI models through our simple Speech-to-Text APIs.

Async Transcription

Transcribe pre-recorded audio and video files in seconds with human-level accuracy. The service scales to tens of thousands of files in parallel.
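Asynchronous transcription typically means submitting a job and then checking on it until it finishes. The following is a minimal sketch of that polling loop, not an official client. The `fetch_status` callable and the status strings (`completed`, `error`, and anything else meaning "still working") are assumptions made for illustration; check the API reference for the actual endpoint and values.

```python
import time
from typing import Callable, Dict


def wait_for_transcript(
    fetch_status: Callable[[], Dict],
    poll_interval: float = 3.0,
    timeout: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict:
    """Poll a transcription job until it completes, fails, or times out.

    `fetch_status` is a hypothetical callable that returns the job as a dict
    with a "status" key; in a real integration it would wrap an HTTP GET.
    """
    deadline = time.monotonic() + timeout
    while True:
        job = fetch_status()
        status = job.get("status")
        if status == "completed":
            return job
        if status == "error":
            raise RuntimeError(job.get("error", "transcription failed"))
        # Any other status is treated as "still in progress".
        if time.monotonic() >= deadline:
            raise TimeoutError("transcript was not ready before the timeout")
        sleep(poll_interval)
```

Passing `sleep` as a parameter keeps the loop easy to test without real delays; in production the default `time.sleep` is used.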

Real-Time Transcription

If you're working with live audio streams, you can stream your audio data in real time. We stream transcripts back to you within a few hundred milliseconds, then revise them for greater accuracy as more context arrives.
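Because early transcripts are revised as more context arrives, a client usually needs to replace its current guess rather than append to it. Here is a rough sketch of one way to do that. The message shape (a dict with `message_type` and `text`, where `FinalTranscript` marks a segment that will no longer change) is an assumption for illustration and should be checked against the real-time API reference.

```python
from typing import Dict, List


class LiveTranscript:
    """Accumulate streamed transcript messages into a running text."""

    def __init__(self) -> None:
        self.finalized: List[str] = []  # segments that will no longer change
        self.pending: str = ""          # latest revisable guess for the current segment

    def apply(self, message: Dict) -> None:
        """Apply one streamed message (assumed shape, see lead-in)."""
        text = message.get("text", "")
        if message.get("message_type") == "FinalTranscript":
            if text:
                self.finalized.append(text)
            self.pending = ""
        else:
            # A partial result supersedes the previous partial for this segment.
            self.pending = text

    def text(self) -> str:
        """Return finalized segments followed by the current partial, if any."""
        parts = self.finalized + ([self.pending] if self.pending else [])
        return " ".join(parts)
```

Keeping finalized and pending text separate means a UI can render the pending portion differently (for example, greyed out) until it settles.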

Speaker Labels (Speaker Diarization)

The AssemblyAI API can automatically detect the number of speakers in your audio file, and each word in the transcription text can be associated with its speaker.
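When each word carries a speaker label, consecutive words from the same speaker can be merged into readable turns. The sketch below assumes each word is a dict with `text`, `start`, `end`, and `speaker` keys; that shape is illustrative and may differ from the actual response format.

```python
from itertools import groupby
from typing import Dict, List


def group_by_speaker(words: List[Dict]) -> List[Dict]:
    """Merge consecutive words from the same speaker into one turn.

    Assumes (hypothetically) that every word dict has "text", "start",
    "end", and "speaker" keys and that `words` is in spoken order.
    """
    turns = []
    for speaker, run in groupby(words, key=lambda w: w["speaker"]):
        run = list(run)
        turns.append({
            "speaker": speaker,
            "start": run[0]["start"],
            "end": run[-1]["end"],
            "text": " ".join(w["text"] for w in run),
        })
    return turns
```

Note that `groupby` only merges adjacent items, so a speaker who talks, is interrupted, and resumes yields two separate turns, which is usually what a transcript view wants.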

Custom Vocabulary

Boost accuracy for vocabulary that is unique or custom to your specific use case or product. Simply include a list of terms to boost in your API calls.
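As a rough illustration, a request body carrying a list of boosted terms might be assembled as follows. The field names `audio_url` and `word_boost` are assumptions in this sketch; confirm the exact parameter names in the API reference before relying on them.

```python
from typing import Dict, Iterable


def build_transcript_request(audio_url: str, boost_terms: Iterable[str]) -> Dict:
    """Build a JSON-serializable request body with a boosted-term list.

    The "audio_url" and "word_boost" field names are assumed for
    illustration, not taken from the official documentation.
    """
    # Strip whitespace, drop empty entries, and de-duplicate while keeping order.
    terms = list(dict.fromkeys(t.strip() for t in boost_terms if t.strip()))
    body: Dict = {"audio_url": audio_url}
    if terms:
        body["word_boost"] = terms
    return body
```

Cleaning the list before sending avoids wasting the boost on blank or repeated entries, which is easy to do when terms come from a product catalog or user input.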

International Language Support

We support over 15 languages and counting, including Global English (English and all of its accents).

And more...

All Audio and Video Formats Accepted. Don't worry about file formats or sampling rates: our API supports virtually all audio and video files without any transcoding required.

Automatic Punctuation and Casing. Punctuation and the casing of proper nouns are added to the transcription text automatically, making transcripts produced by the API more readable.

Confidence Scores. Get a confidence score for each word in the transcript.

Word Timings. Get word-by-word timestamps across the entire transcript text.

Paragraph Detection. Export your transcription broken down into automatically generated paragraphs.

Export as Captions (SRT/VTT). Easily export your transcription in SRT or VTT format, to be plugged into a video player for subtitles and closed captions.

Dual Channel Transcription. Working with dual channel phone call recordings? The API can split your audio files into separate channels and provide a transcription for each unique channel.

Language Detection. Our advanced AI models can automatically detect whether the dominant language of the spoken audio is supported by our API and route the audio to the appropriate model for transcription.

Filler Words. Optionally include disfluencies like “umm” and “uhh” in the transcripts of your audio files.

Profanity Filtering. Automatically detect and replace profanity in the transcription text.

Word Search. Search over your completed transcripts for specific words and phrases.

Privacy Protection. We are not in the business of monetizing your data. Files sent to the API for transcription are never stored, and you can request permanent deletion of transcription text from our database.

Custom Spelling. Specify how you would like certain words to be spelled or formatted in the transcription text.
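As one illustration of how the per-word confidence scores and word timings listed above could be combined, the sketch below flags words that may need human review, each tagged with its position in the audio. It assumes each word is a dict with `text`, `start` (in milliseconds), and `confidence` (between 0 and 1); those names and units are assumptions, not confirmed response fields.

```python
from typing import Dict, List


def format_ms(ms: int) -> str:
    """Render a millisecond offset as HH:MM:SS.mmm."""
    seconds, millis = divmod(ms, 1000)
    minutes, seconds = divmod(seconds, 60)
    hours, minutes = divmod(minutes, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}.{millis:03d}"


def low_confidence_words(words: List[Dict], threshold: float = 0.6) -> List[str]:
    """List words scoring below `threshold`, each tagged with its start time.

    The word dict shape ("text", "start", "confidence") is hypothetical.
    """
    return [
        f'{format_ms(w["start"])} {w["text"]} ({w["confidence"]:.2f})'
        for w in words
        if w["confidence"] < threshold
    ]
```

The threshold of 0.6 is an arbitrary starting point; a sensible value depends on the audio quality and how costly a missed error is for your use case.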

Audio Intelligence

Build powerful applications with features like Summarization, Entity Detection, Sentiment Analysis, PII Redaction, and more.

