
EXPLORE OUR PRODUCTION-READY MODELS

Audio Intelligence

AI models to summarize speech, detect hateful content, identify spoken topics, and more.

Explore popular models

Why AssemblyAI

Get more from your audio data
Accelerate your roadmap and quickly build valuable features and products for your users.
Built on the latest AI breakthroughs
Access AI models for understanding speech that are built on the latest breakthroughs in Transformers and Large Language Models.
Securely analyze your data at scale
Easily access our state-of-the-art AI models with our secure SOC 2 and GDPR-certified API.

All Audio Intelligence models

Summarization
Leverage our AI-powered Summarization models to automatically summarize audio/video data in your products at scale. Customize the summary types to best fit your use case.
Sentiment Analysis
With Sentiment Analysis, AssemblyAI can detect the sentiment of each sentence of speech spoken in your audio files.
Entity Detection
Identify a wide range of entities that are spoken in your audio files, such as person and company names, email addresses, dates, and locations.
Auto Chapters
Automatically generate summaries over time for sections of your audio and video files.
Content Moderation
Detect sensitive content in your audio and video files, such as hate speech, violence, sensitive social issues, alcohol, drugs, and more.
PII Redaction
Identify and remove Personally Identifiable Information, such as phone numbers and social security numbers, from the transcription text before it is returned to you.
Detect Important Phrases and Words
Automatically detect important phrases and words in your transcription text.
Topic Detection
Label the topics that are spoken in your audio and video files. The predicted topic labels follow the standardized IAB Taxonomy, which makes them suitable for contextual targeting.
Emotion Detection
Analyze the emotion in your audio files to determine whether your speakers are happy, sad, angry, frustrated, and more.
Translation
Convert your transcription into 80+ other languages.
Intent Recognition
Identify the intent in your audio files.
Ad Detection
Automatically identify the start and end time of voice and video ads, while surfacing all related sponsors and offers.

Get state-of-the-art transcription

Convert your audio and video files, and live audio streams, into text automatically with advanced AI models using our simple Speech-to-Text APIs.

Async Transcription
Transcribe pre-recorded audio or video files in seconds, with human-level accuracy. Highly scalable to tens of thousands of files in parallel.
Real-Time Transcription
Stream your audio data in real time, and we will stream transcripts back to you within a few hundred milliseconds.

All with one simple API

Built for developers

POWERED BY AI
