Speech-to-Text Features

Out-of-the-box features for any Speech-to-Text application, at any scale.

Core Transcription

Ready-to-use features for any project that requires Speech-to-Text.

Top-Rated Accuracy

Our API is built on the latest advances in deep learning and draws on cutting-edge research from our in-house team of AI researchers.

Automatic Punctuation and Casing

Punctuation and casing, including capitalization of proper nouns, are automatically added to the transcription text to make transcripts produced by the API more readable.

Word Timings

Word-by-word timestamps across the entire transcript text.

Confidence Scores

Get a confidence score for each word in the transcript.
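
As a rough illustration of how word timings and confidence scores might be used together, the sketch below flags words that fall under a confidence threshold so they can be reviewed. The `Word` shape (`text`, `start`, `end`, `confidence`), the millisecond unit, and the 0.0 to 1.0 score range are assumptions made for this example, not the documented response format.

```python
from typing import TypedDict


class Word(TypedDict):
    # Hypothetical per-word record; field names are assumptions for this sketch.
    text: str
    start: int  # assumed: milliseconds from the start of the audio
    end: int
    confidence: float  # assumed: 0.0 (least sure) to 1.0 (most sure)


def low_confidence_words(words: list[Word], threshold: float = 0.6) -> list[Word]:
    """Return the words whose confidence falls below the threshold."""
    return [w for w in words if w["confidence"] < threshold]


# Made-up sample data, for illustration only.
sample: list[Word] = [
    {"text": "Hello", "start": 0, "end": 420, "confidence": 0.98},
    {"text": "wurld", "start": 480, "end": 900, "confidence": 0.41},
]

for w in low_confidence_words(sample):
    print(f'{w["text"]} ({w["start"]}-{w["end"]} ms): {w["confidence"]:.2f}')
```

The threshold of 0.6 is arbitrary; a suitable value depends on how much manual review your application can afford.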

All Audio and Video Formats Accepted

Don't worry about file formats or sampling rates. Our API supports virtually all audio and video files, with no transcoding required.

Speaker Labels

The number of speakers in the audio can be detected automatically, and each transcribed word is associated with its speaker.
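
Once each word carries a speaker label, consecutive words from the same speaker can be merged into readable turns. The sketch below shows one way to do that on the client side. The `speaker` and `text` keys are hypothetical names chosen for this example and may not match the actual response.

```python
from itertools import groupby


def speaker_turns(words: list[dict]) -> list[tuple[str, str]]:
    """Merge consecutive words with the same speaker label into one turn."""
    turns = []
    # groupby only groups adjacent items, which is what we want here:
    # a speaker who talks twice produces two separate turns.
    for speaker, group in groupby(words, key=lambda w: w["speaker"]):
        turns.append((speaker, " ".join(w["text"] for w in group)))
    return turns


# Made-up sample data; the "speaker" and "text" keys are assumptions.
sample_words = [
    {"speaker": "A", "text": "Hi"},
    {"speaker": "A", "text": "there."},
    {"speaker": "B", "text": "Hello!"},
    {"speaker": "A", "text": "Ready?"},
]

for speaker, text in speaker_turns(sample_words):
    print(f"Speaker {speaker}: {text}")
```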


Boost accuracy for your specific use case with these simple customization options.

Dual-Channel Support

Each channel of a phone call recorded in stereo/dual channel is transcribed separately, so you get a transcript for each channel.

Keyword Boosts

Boost accuracy for key terms and phrases like person or product names, places, and other vocabulary unique to your application.

Multiple Models for Accents

Select a customized model for Australian, UK, South African, and other dialects to boost accuracy on your data.


Get more out of your transcripts with these easy-to-use features.

Automatic Transcript Highlights

The API can automatically detect key phrases and words in your transcription text, which is useful for tagging content and for summarizing the transcript.

Export as Captions (SRT/VTT)

Easily export your transcription in SRT or VTT format, to be plugged into a video player for subtitles and closed captions.
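
For readers unfamiliar with the SRT layout, the sketch below builds SRT cues by hand from start and end times. It only illustrates the file format (a cue number, a `HH:MM:SS,mmm --> HH:MM:SS,mmm` line, the caption text, then a blank line). It is not the API's export feature, and the `(start_ms, end_ms, text)` input shape is an assumption made for this example.

```python
def srt_timestamp(ms: int) -> str:
    """Format a millisecond offset as an SRT timestamp (HH:MM:SS,mmm)."""
    hours, rem = divmod(ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    seconds, millis = divmod(rem, 1_000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d},{millis:03d}"


def to_srt(cues: list[tuple[int, int, str]]) -> str:
    """Render (start_ms, end_ms, text) tuples as the contents of an SRT file."""
    blocks = []
    for index, (start, end, text) in enumerate(cues, start=1):
        blocks.append(
            f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text}\n"
        )
    # Cues are separated by a blank line.
    return "\n".join(blocks)


# Made-up cues, for illustration only.
print(to_srt([(0, 1500, "Hello and welcome."), (1600, 4000, "Let's get started.")]))
```

VTT is similar but starts with a `WEBVTT` header line and uses a period instead of a comma before the milliseconds.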

PII Redaction

Automatically detect and replace sensitive data, like credit card numbers and social security numbers, in the transcription text and source audio.
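
To make the idea of replacing sensitive data in text concrete, here is a deliberately simple sketch that masks digits in strings shaped like US social security numbers and 16-digit card numbers. It uses plain regular expressions as a stand-in and is not how the API detects PII. It also covers transcript text only, not audio, and the patterns and the `#` mask character are assumptions chosen for this example.

```python
import re

# Simplistic patterns, for illustration only; real PII detection is far more robust.
SSN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
CARD = re.compile(r"\b\d{4}[ -]?\d{4}[ -]?\d{4}[ -]?\d{4}\b")


def mask_digits(match: re.Match) -> str:
    """Replace every digit in the matched text with '#', keeping separators."""
    return re.sub(r"\d", "#", match.group(0))


def redact(text: str) -> str:
    """Mask card-like and SSN-like numbers in a transcript string."""
    for pattern in (CARD, SSN):
        text = pattern.sub(mask_digits, text)
    return text


print(redact("My card is 4111 1111 1111 1111 and my SSN is 123-45-6789."))
```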

Profanity Filtering

Automatically detect and replace profanity in the transcription text.

Security and Reliability

Best-practice security standards to keep your data safe, secure, and private.

Advanced Security & Privacy

We adhere to best-practice guidelines like encryption in transit and at rest.

Data Privacy

We are not in the business of monetizing your data. Files sent to the API for transcription are never stored, and you can request permanent deletion of transcription text from our database.

99.9%+ Uptime

Thousands of developers and startups trust AssemblyAI to power core features in production, with over 99.9% uptime and transcript completion rate.

24x7 Support

We're here to work with you as much, or as little, as you'd like. Talk to us over live chat, Slack, email, or phone.

Ready to start building?

Begin testing in under 2 minutes