Audio Intelligence

AI models to summarize speech, detect hateful content, identify spoken topics, and more.

Why AssemblyAI

Get more from your audio data
Accelerate your roadmap and quickly build valuable features and products for your users.
Built on the latest AI breakthroughs
Access AI models for understanding speech that are built on the latest breakthroughs in Transformers and Large Language Models.
Securely analyze your data at scale
Easily access our state-of-the-art AI models through our secure, SOC 2 certified and GDPR-compliant API.

All Audio Intelligence models

Summarization
Use our AI-powered Summarization models to automatically summarize audio and video data in your products at scale. Customize the summary types to best fit your use case.
Sentiment Analysis
With Sentiment Analysis, AssemblyAI can detect the sentiment of each sentence spoken in your audio files.
Entity Detection
Identify a wide range of entities that are spoken in your audio files, such as person and company names, email addresses, dates, and locations.
Auto Chapters
Automatically segment audio and video files into chapters and generate a summary for each one.
Content Moderation
Detect sensitive content in your audio and video files, such as hate speech, violence, sensitive social issues, alcohol, drugs, and more.
PII Redaction
Identify and remove Personally Identifiable Information, such as phone numbers and social security numbers, from the transcription text before it is returned to you.
Detect Important Phrases and Words
Automatically detect important phrases and words in your transcription text.
Topic Detection
Label the topics that are spoken in your audio and video files. The predicted topic labels follow the standardized IAB Taxonomy, which makes them suitable for contextual targeting.

Get state-of-the-art transcription

Automatically convert your audio files, video files, and live audio streams into text with advanced AI models, using our simple Speech-to-Text APIs.

Async Transcription
Transcribe pre-recorded audio and video files in seconds, with human-level accuracy. Scales to tens of thousands of files in parallel.
Real-Time Transcription
Stream your audio data in real time, and we will stream transcripts back to you within a few hundred milliseconds.
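
To give a rough idea of how the Audio Intelligence models might be switched on for an async transcription, here is a minimal Python sketch that only builds a request body and sends nothing. The endpoint URL, the `audio_url` field, and every flag name in `MODEL_FLAGS` are assumptions that do not appear on this page; check them against the current API reference before use. The helper `build_transcript_request` is a hypothetical name used for illustration.

```python
import json

# Assumed endpoint for submitting a pre-recorded file (not stated on this page).
TRANSCRIPT_ENDPOINT = "https://api.assemblyai.com/v2/transcript"

# Assumed mapping from the model names on this page to request flags.
# Verify each flag name against the API reference.
MODEL_FLAGS = {
    "Summarization": "summarization",
    "Sentiment Analysis": "sentiment_analysis",
    "Entity Detection": "entity_detection",
    "Auto Chapters": "auto_chapters",
    "Content Moderation": "content_safety",
    "PII Redaction": "redact_pii",
    "Detect Important Phrases and Words": "auto_highlights",
    "Topic Detection": "iab_categories",
}


def build_transcript_request(audio_url, models):
    """Return a JSON-serializable request body with the chosen models enabled.

    Hypothetical helper: it only assembles a dictionary and performs no I/O.
    """
    unknown = [name for name in models if name not in MODEL_FLAGS]
    if unknown:
        raise ValueError(f"Unknown model(s): {unknown}")
    body = {"audio_url": audio_url}
    for name in models:
        body[MODEL_FLAGS[name]] = True
    return body


if __name__ == "__main__":
    body = build_transcript_request(
        "https://example.com/meeting.mp3",  # placeholder URL
        ["Sentiment Analysis", "Topic Detection"],
    )
    print(f"POST {TRANSCRIPT_ENDPOINT}")
    print(json.dumps(body, indent=2))
```

Some models may need additional options or may not be combinable in a single request; treat the API reference as the authority on that.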
