Speech-to-Text

Experience industry-leading speech-to-text accuracy with Speech AI models at the cutting edge of AI research, accessible through a simple API.

Use our API Contact sales

Call Transcript (04.02.2024)

Thank you for calling Acme Corporation, Sarah speaking. How may I assist you today? Hi Sarah, this is John. I’m having trouble with my Acme Widget. It seems to be malfunctioning. I’m sorry to hear that, John. Let’s get that sorted out for you. Could you please provide me with the serial number of your widget? Thank you, John. Now, could you describe the issue you’re experiencing with your widget? Well, it’s not turning on at all, even though I’ve replaced the batteries. Let’s try a few troubleshooting steps. Have you checked if the batteries are inserted correctly? Yes, I’ve double-checked that.

Universal-1

State-of-the-art multilingual speech-to-text model

>92.5%

Accuracy*

30.4s

Latency on 30 min audio file

12.5M

Hours of multilingual training data

Industry’s lowest Word Error Rate (WER)

See how Universal-1 performs against other Automatic Speech Recognition providers.

Read our research

[Chart: Word Error Rate by provider, comparing AssemblyAI, OpenAI, Azure, Deepgram, AWS, and Google; axis marked at 12%]

See it in action

Learn more

*Benchmark performed across 11 datasets, including 8 academic datasets & 3 internally curated datasets representing real world English audio.
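For readers unfamiliar with the metric, Word Error Rate is conventionally the number of word substitutions, deletions, and insertions needed to turn the transcript into the reference, divided by the number of reference words. The sketch below illustrates that definition only. The function name `word_error_rate` and the lowercase, whitespace-split normalization are assumptions made for the example and are not the benchmark's actual methodology.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    # Simplified normalization (an assumption): lowercase and split on whitespace.
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


reference = "it is not turning on at all"
hypothesis = "it is not turning at all"
print(f"WER: {word_error_rate(reference, hypothesis):.3f}")  # one dropped word out of seven
```

A lower value is better: 0.0 means the transcript matches the reference word for word.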

Harness best-in-class accuracy and powerful Speech AI capabilities

Async Speech-to-Text

The AssemblyAI API can transcribe pre-recorded audio and video files in seconds, with human-level accuracy. It scales to tens of thousands of files in parallel.

See how in docs

Custom Vocabulary

Boost accuracy for vocabulary that is unique or custom to your specific use case or product.

See how in docs

Speaker Diarization

Detect the number of speakers in your audio file, with each word in the text associated with its speaker.

See how in docs
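As an illustration of working with per-word speaker labels, the sketch below merges consecutive words from the same speaker into turns. The word entries follow the shape of the "words" array in the sample response at the end of this page. The third entry and the helper name `speaker_turns` are hypothetical, added only to show a speaker change.

```python
from itertools import groupby

# Shaped like the "words" array in the sample response on this page.
# The third entry is hypothetical and exists only to show a second speaker.
words = [
    {"text": "Runner's", "start": 0, "end": 550, "speaker": "A", "confidence": 0.98113},
    {"text": "knee", "start": 580, "end": 1130, "speaker": "A", "confidence": 0.95417},
    {"text": "Really?", "start": 1400, "end": 1900, "speaker": "B", "confidence": 0.91},
]


def speaker_turns(words):
    """Merge consecutive words from the same speaker into one turn."""
    turns = []
    for speaker, run in groupby(words, key=lambda w: w["speaker"]):
        run = list(run)
        turns.append({
            "speaker": speaker,
            "start": run[0]["start"],
            "end": run[-1]["end"],
            "text": " ".join(w["text"] for w in run),
        })
    return turns


for turn in speaker_turns(words):
    print(f'Speaker {turn["speaker"]}: {turn["text"]}')
```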

International Language Support

Transcribe audio in 99+ languages and counting, including Global English (English and all of its accents).

See how in docs

Auto Punctuation and Casing

Automatically add punctuation and the casing of proper nouns to the transcription text.

See how in docs

Confidence Scores

Get a confidence score for each word in the transcript.

See how in docs
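One possible use of per-word confidence is flagging words for human review. The sketch below assumes word entries shaped like the sample response at the end of this page. The helper name `low_confidence_words` and the 0.96 threshold are arbitrary choices for illustration, not recommended values.

```python
# The two word entries from the sample response on this page.
words = [
    {"text": "Runner's", "start": 0, "end": 550, "speaker": "A", "confidence": 0.98113},
    {"text": "knee", "start": 580, "end": 1130, "speaker": "A", "confidence": 0.95417},
]


def low_confidence_words(words, threshold=0.96):
    """Return the text of every word whose confidence is below the threshold."""
    return [w["text"] for w in words if w["confidence"] < threshold]


print(low_confidence_words(words))
```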

Word Timings

View word-by-word timestamps across the entire transcript text.

See how in docs
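Word-level timestamps make it straightforward to build captions or jump-to-word playback. The sketch below converts the "start" and "end" values from the sample response at the end of this page into SubRip-style timestamps. It assumes those values are in milliseconds, and `format_timestamp` is a hypothetical helper name.

```python
# The two word entries from the sample response on this page.
# Assumption: "start" and "end" are offsets in milliseconds.
words = [
    {"text": "Runner's", "start": 0, "end": 550},
    {"text": "knee", "start": 580, "end": 1130},
]


def format_timestamp(ms: int) -> str:
    """Format a millisecond offset as HH:MM:SS,mmm."""
    hours, rem = divmod(ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    seconds, millis = divmod(rem, 1_000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d},{millis:03d}"


for word in words:
    start = format_timestamp(word["start"])
    end = format_timestamp(word["end"])
    print(f'{start} --> {end}  {word["text"]}')
```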

Filler Words

Optionally include disfluencies in the transcripts of your audio files.

See how in docs

Profanity Filtering

Detect and replace profanity in the transcription text with ease.

See how in docs

Automatic Language Detection

Automatically detect if the dominant language of the spoken audio is supported by our API and route it to the appropriate model for transcription.

See how in docs

Custom Spelling

Specify how you would like certain words to be spelled or formatted in the transcription text.

See how in docs

See everything in docs
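Several of the capabilities above are typically switched on through options in the transcription request. The sketch below assembles such a request body as a plain dictionary. The option names (`speaker_labels`, `word_boost`, `disfluencies`, `filter_profanity`, `language_detection`, `custom_spelling`) are recalled from AssemblyAI's public REST API and should be treated as assumptions to verify against the docs. The helper `build_transcript_request` and the audio URL are hypothetical.

```python
import json


def build_transcript_request(audio_url, **options):
    """Return a JSON-ready request body, dropping options left as None."""
    body = {"audio_url": audio_url}
    body.update({key: value for key, value in options.items() if value is not None})
    return body


# Option names are assumptions; check them against the API reference.
request_body = build_transcript_request(
    "https://example.com/call.mp3",  # placeholder audio URL
    speaker_labels=True,             # Speaker Diarization
    word_boost=["Acme Widget"],      # Custom Vocabulary
    disfluencies=True,               # Filler Words
    filter_profanity=True,           # Profanity Filtering
    language_detection=True,         # Automatic Language Detection
    custom_spelling=[{"from": ["acme"], "to": "ACME"}],  # Custom Spelling
)
print(json.dumps(request_body, indent=2))
```

Keeping the options in one dictionary makes it easy to log or diff exactly what was requested for each file.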

Continuously up-to-date and secure

Monthly updates and improvements

View weekly product and accuracy improvements in our changelog.

View changelog

Enterprise-grade security

AssemblyAI is committed to the highest standards of security practices to keep your data and your customers' data safe.

AssemblyAI's accuracy is better than any other tools in the market (and we have tried them all).

Vedant Maheshwari, Co-Founder and CEO

Explore more

Streaming Speech-to-Text

Transcribe audio streams synchronously with high accuracy and low latency.

Learn more

Speech Understanding

Extract maximum value from voice data with Audio Intelligence, and leverage Large Language Models with LeMUR.

Learn more

START BUILDING WITH AI

Get started in seconds


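import json

import assemblyai as aai

aai.settings.api_key = "YOUR_API_KEY"  # placeholder; replace with your own key

# Placeholder inputs: an audio URL and the transcription options to apply.
URL = "https://example.com/audio.mp3"
config = aai.TranscriptionConfig(speaker_labels=True)

transcriber = aai.Transcriber()
transcript = transcriber.transcribe(URL, config)

# A Transcript object is not JSON-serializable, so dump its raw API response.
print(json.dumps(transcript.json_response, indent=2))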

{
  "id": "6rlr37h8f4-e310-4e23-bbf3-ea5f347dc684",
  "language_code": "en_us",
  "status": "completed",
  "text": "Runner's knee is a condition characterized by pain behind or around the kneecap...",
  "confidence": 0.98122,
  "audio_duration": 3200,
  "words": [
    { "text": "Runner's", "start": 0, "end": 550, "speaker": "A", "confidence": 0.98113 },
    { "text": "knee", "start": 580, "end": 1130, "speaker": "A", "confidence": 0.95417 }
  ]
}