Streaming Speech-to-Text

Convert live audio streams into text synchronously with nearly 90% accuracy and <600ms latency.

Hey, adventurers! Welcome to today's exciting livestream event, where we're embarking on an expedition to uncover the secrets of the Lost Temple hidden deep within this mysterious jungle! I'm your host, Emily, and I'm thrilled to have you all joining me on this epic adventure. Just look at this incredible jungle landscape, teeming with life and brimming with secrets waiting to be discovered! Who knows what ancient mysteries lie within this dense foliage? [Camera zooms in on a vine-covered ruin peeking through the trees] Emily: And there it is, folks! Our destination, the Lost Temple, a relic of a long-forgotten civilization lost to time. Legend has it that this temple holds untold riches and powerful artifacts beyond imagination!

Automatically turn live audio into text

Transcribe conversations, meetings, and live events synchronously and elevate live interactions instantly.

Try in the Playground

An illustration of the AssemblyAI realtime playground. On top, there's a button with the Text "Start talking". Below, there's a timestamp and output with text "Hello today is"

Industry-leading quality at low latency

Low latency

Automatically transcribe live audio nearly instantaneously, with customizable endpoint control.

Industry-leading quality

Retrieve highly accurate results.

High concurrency

Easily process a high volume of audio files at scale.

Advanced punctuation & casing

Automatically add punctuation and proper-noun casing to the transcript text.

Feature-rich Streaming Speech-to-Text

Streaming Transcription

Transcribe live audio with high accuracy and low latency.

Auto Punctuation and Casing

Automatically add punctuation and proper-noun casing to the transcript text.

Custom Vocabulary

Boost accuracy for vocabulary that is unique or custom to your specific use case or product.

ITN/Formatting

Automatically convert spoken form text into its proper written format to increase transcript readability.
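
To make the idea concrete, here is a toy sketch of inverse text normalization (ITN) in Python. It is only an illustration under stated assumptions, not AssemblyAI's implementation: the `toy_itn` function and its lookup tables are hypothetical, and it handles only number words below 100 and the word "percent".

```python
# Hypothetical lookup tables for this illustration only.
UNITS = {
    "zero": 0, "one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
    "six": 6, "seven": 7, "eight": 8, "nine": 9, "ten": 10,
    "eleven": 11, "twelve": 12, "thirteen": 13, "fourteen": 14,
    "fifteen": 15, "sixteen": 16, "seventeen": 17, "eighteen": 18,
    "nineteen": 19,
}
TENS = {
    "twenty": 20, "thirty": 30, "forty": 40, "fifty": 50,
    "sixty": 60, "seventy": 70, "eighty": 80, "ninety": 90,
}


def toy_itn(text: str) -> str:
    """Rewrite spoken number words below 100 as digits, and 'percent' as '%'."""
    out = []
    tokens = text.split()
    i = 0
    while i < len(tokens):
        word = tokens[i].lower()
        if word in TENS:
            value = TENS[word]
            # Absorb a following unit from 1 to 9, as in "twenty five" -> 25.
            if i + 1 < len(tokens) and 0 < UNITS.get(tokens[i + 1].lower(), 0) < 10:
                value += UNITS[tokens[i + 1].lower()]
                i += 1
            out.append(str(value))
        elif word in UNITS:
            out.append(str(UNITS[word]))
        elif word == "percent" and out and out[-1].isdigit():
            # Attach the percent sign to the preceding number.
            out[-1] += "%"
        else:
            out.append(tokens[i])
        i += 1
    return " ".join(out)


print(toy_itn("the meeting starts at nine and twenty five percent of seats remain"))
# -> the meeting starts at 9 and 25% of seats remain
```

A rule table this small misfires on ordinary words such as "one" used as a pronoun, which is one reason production formatting relies on context rather than simple lookup.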

End of Utterance Detection

Customize End of Utterance Detection to more accurately detect when one speaker finishes an utterance in Streaming Speech-to-Text.
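
One common way to think about end-of-utterance detection is as a silence threshold: a pause longer than some limit closes the current utterance. The Python sketch below illustrates that idea on word timestamps. It is an assumption-laden illustration, not the product's algorithm: `split_utterances` and `silence_threshold_ms` are hypothetical names, and a real system would typically work on the audio signal itself.

```python
from typing import Dict, List


def split_utterances(words: List[Dict], silence_threshold_ms: int = 700) -> List[str]:
    """Group timestamped words into utterances separated by long pauses."""
    utterances: List[List[str]] = []
    previous_end = None
    for word in words:
        # Start a new utterance on the first word or after a long enough pause.
        if previous_end is None or word["start"] - previous_end >= silence_threshold_ms:
            utterances.append([])
        utterances[-1].append(word["text"])
        previous_end = word["end"]
    return [" ".join(u) for u in utterances]


# Hypothetical word timings in milliseconds.
words = [
    {"text": "Hello", "start": 0, "end": 400},
    {"text": "today", "start": 450, "end": 800},
    {"text": "is", "start": 1700, "end": 1850},
    {"text": "Monday", "start": 1900, "end": 2400},
]

print(split_utterances(words, silence_threshold_ms=700))
# -> ['Hello today', 'is Monday']
print(split_utterances(words, silence_threshold_ms=1000))
# -> ['Hello today is Monday']
```

Raising the threshold tolerates longer pauses within a single utterance, while lowering it returns results sooner at the risk of cutting a speaker off mid-thought.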

See everything in docs

Continuously up-to-date and secure

Monthly updates and improvements

View product and accuracy improvements in our changelog.

Enterprise-grade security

AssemblyAI is committed to the highest standards of security practices to keep your data and your customers' data safe.

Everything in statistics comes down to garbage in and garbage out. So depending on the quality of your natural language processing and your speech-to-text, that’s going to impact the quality of your analysis.

Mike Adams, CEO & Co-founder at Grain

Explore more

Speech-to-Text

Build on top of the most accurate Speech-to-Text model on the market with >92.5% accuracy.

Speech Understanding

Extract maximum value from voice data with Audio Intelligence, and leverage Large Language Models with LeMUR.

START BUILDING WITH AI

Get started in seconds

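import json
import assemblyai as aai

aai.settings.api_key = "YOUR_API_KEY"  # placeholder: use your own key
URL = "https://example.com/audio.mp3"  # placeholder audio URL
config = aai.TranscriptionConfig(speaker_labels=True)  # yields the "speaker" fields

transcriber = aai.Transcriber()
transcript = transcriber.transcribe(URL, config)

# The Transcript object itself is not JSON-serializable; print its raw response.
print(json.dumps(transcript.json_response, indent=2))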

{
  "id": "6rlr37h8f4-e310-4e23-bbf3-ea5f347dc684",
  "language_code": "en_us",
  "status": "completed",
  "text": "Runner's knee is a condition characterized by pain behind or around the kneecap...",
  "confidence": 0.98122,
  "audio_duration": 3200,
  "words": [
    { "text": "Runner's", "start": 0, "end": 550, "speaker": "A", "confidence": 0.98113 },
    { "text": "knee", "start": 580, "end": 1130, "speaker": "A", "confidence": 0.95417 }
  ]
}
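
As a rough sketch of how a response like this might be consumed, the Python snippet below turns the `words` array into one readable line per word. The `describe_words` helper is hypothetical, and it assumes that `start` and `end` are offsets in milliseconds.

```python
import json

# A trimmed copy of the sample response shown above.
RESPONSE = json.loads("""
{
  "status": "completed",
  "words": [
    {"text": "Runner's", "start": 0, "end": 550, "speaker": "A", "confidence": 0.98113},
    {"text": "knee", "start": 580, "end": 1130, "speaker": "A", "confidence": 0.95417}
  ]
}
""")


def describe_words(response: dict) -> list:
    """Return one line per word: speaker, time range in seconds, and text."""
    lines = []
    for word in response["words"]:
        # Assumption: "start" and "end" are offsets in milliseconds.
        start_s = word["start"] / 1000
        end_s = word["end"] / 1000
        lines.append(f'{word["speaker"]} [{start_s:.2f}s-{end_s:.2f}s] {word["text"]}')
    return lines


for line in describe_words(RESPONSE):
    print(line)
# -> A [0.00s-0.55s] Runner's
# -> A [0.58s-1.13s] knee
```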