Speech-to-Text

Get industry-leading speech-to-text accuracy from Speech AI models at the cutting edge of AI research, accessible through a simple API.

Call Transcript (04.02.2024)

Thank you for calling Acme Corporation, Sarah speaking. How may I assist you today? Hi Sarah, this is John. I’m having trouble with my Acme Widget. It seems to be malfunctioning. I’m sorry to hear that, John. Let’s get that sorted out for you. Could you please provide me with the serial number of your widget? Thank you, John. Now, could you describe the issue you’re experiencing with your widget? Well, it’s not turning on at all, even though I’ve replaced the batteries. Let’s try a few troubleshooting steps. Have you checked if the batteries are inserted correctly? Yes, I’ve double-checked that.

Universal-1

State-of-the-art multilingual speech-to-text model

Accuracy*: >92.5%

Latency on a 30-minute audio file: 30.4 s

Multilingual training data: 12.5M hours

Industry’s lowest Word Error Rate (WER)

See how Universal-1 performs against other Automatic Speech Recognition providers.

[Chart: Word Error Rate (WER) by provider, axis from 0% to 12%. Providers compared: AssemblyAI, OpenAI, Azure, Deepgram, AWS, Google.]
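For readers unfamiliar with the metric, Word Error Rate is conventionally defined as the number of word-level substitutions, deletions, and insertions needed to turn the transcript into the reference, divided by the number of words in the reference. The sketch below is a minimal illustration of that definition, not the benchmark's actual methodology: the function name `word_error_rate` is hypothetical, and real evaluations usually normalize casing and punctuation first, which this example does not do.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length.

    Illustrative only: splits on whitespace and applies no text
    normalization (casing, punctuation), unlike most real benchmarks.
    """
    ref_words = reference.split()
    hyp_words = hypothesis.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp_words) + 1))
    for i, ref_word in enumerate(ref_words, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp_words, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + substitution_cost,  # match or substitution
            ))
        prev = curr

    return prev[-1] / len(ref_words)


if __name__ == "__main__":
    reference = "the cat sat on the mat"
    hypothesis = "the cat sit on mat"
    # One substitution (sat -> sit) and one deletion (the) over six words.
    print(f"WER: {word_error_rate(reference, hypothesis):.2%}")
```

A lower value is better: 0% means the transcript matches the reference word for word.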

*Benchmark performed across 11 datasets, including 8 academic datasets and 3 internally curated datasets representing real-world English audio.

Harness best-in-class accuracy and powerful Speech AI capabilities

Streaming Speech-to-Text

Transcribe live audio streams in real time with high accuracy and low latency.

Speech Understanding

Extract maximum value from voice data with Audio Intelligence, and leverage Large Language Models with LeMUR.

START BUILDING WITH AI

Get started in seconds

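import assemblyai as aai

aai.settings.api_key = "YOUR_API_KEY"  # placeholder: use your own key
URL = "https://example.com/audio.mp3"  # placeholder: any reachable audio URL
config = aai.TranscriptionConfig(speaker_labels=True)  # adds per-word speaker labels

transcriber = aai.Transcriber()
transcript = transcriber.transcribe(URL, config)

print(transcript.text)

Example response (JSON):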
{
  "id": "6rlr37h8f4-e310-4e23-bbf3-ea5f347dc684",
  "language_code": "en_us",
  "status": "completed",
  "text": "Runner's knee is a condition characterized by pain behind or around the kneecap...",
  "confidence": 0.98122,
  "audio_duration": 3200,
  "words": [
    { "text": "Runner's", "start": 0, "end": 550, "speaker": "A", "confidence": 0.98113 },
    { "text": "knee", "start": 580, "end": 1130, "speaker": "A", "confidence": 0.95417 }
  ]
}
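As a rough illustration of how a response shaped like the one above might be consumed, the sketch below groups consecutive words by speaker and flags low-confidence words. It works on a plain dictionary parsed from JSON rather than on the SDK's objects. The helper names `speaker_turns` and `low_confidence_words` are hypothetical, and the code assumes that `start` and `end` are offsets in milliseconds.

```python
import json

# Abridged copy of the sample response, kept inline so the example runs offline.
SAMPLE_RESPONSE = """
{
  "status": "completed",
  "confidence": 0.98122,
  "words": [
    {"text": "Runner's", "start": 0, "end": 550, "speaker": "A", "confidence": 0.98113},
    {"text": "knee", "start": 580, "end": 1130, "speaker": "A", "confidence": 0.95417}
  ]
}
"""


def speaker_turns(response: dict) -> list[tuple[str, float, float, str]]:
    """Merge consecutive words from the same speaker into turns.

    Returns (speaker, start_seconds, end_seconds, text) tuples.
    Assumes word offsets are in milliseconds.
    """
    turns: list[list] = []
    for word in response.get("words", []):
        if turns and turns[-1][0] == word["speaker"]:
            # Same speaker as the previous word: extend the current turn.
            turns[-1][2] = word["end"]
            turns[-1][3].append(word["text"])
        else:
            turns.append([word["speaker"], word["start"], word["end"], [word["text"]]])
    return [
        (speaker, start / 1000, end / 1000, " ".join(texts))
        for speaker, start, end, texts in turns
    ]


def low_confidence_words(response: dict, threshold: float) -> list[str]:
    """Return the text of every word whose confidence is below the threshold."""
    return [
        word["text"]
        for word in response.get("words", [])
        if word["confidence"] < threshold
    ]


if __name__ == "__main__":
    response = json.loads(SAMPLE_RESPONSE)
    for speaker, start, end, text in speaker_turns(response):
        print(f"Speaker {speaker} [{start:.2f}s to {end:.2f}s]: {text}")
    print("Words to review:", low_confidence_words(response, threshold=0.96))
```

Flagging words below a confidence threshold is one simple way to decide which parts of a transcript deserve human review; the 0.96 cutoff here is arbitrary.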