Speech Understanding

Transform raw transcripts into structured, actionable data. These pre-built, LLM-powered features turn transcripts into intelligence instantly.

Powering the world’s most trusted Voice AI products

Your product experience is only as good as the foundation that powers it. Make sure you build on the best.

The accuracy and quality that Voice AI apps require

Our speech-to-text models redefine what quality means for Voice AI, delivering transcripts that are consistently trustworthy and built for real-world performance.

  • Capture meaning, not just sound, with contextual prompting across every conversation
  • 57% better recognition of key terms like names, codes, and medical terms
  • 64% reduction in speaker counting errors compared to competitors
  • 1,500+ words of context-aware prompting for domain expertise, more than other leading speech-to-text models

Don't just transcribe speech, understand it

Analyze dialog flow and speaker relationships to capture who is speaking, what they mean, and why it matters.

  • Track dialogue and speaker context across turns for natural comprehension, labeling speakers by name or role
  • Resolve ambiguity automatically — distinguishing between similar terms and acronyms
  • Capture non-speech audio events, tone, corrections, and implied meaning to preserve intent
  • Ensure reliable performance in overlapping, multi-speaker, and noisy environments

Reliable performance across the globe

We speak your customers' language, so you can serve a global customer base.

  • 99-language support with automatic detection and code-switching between English and other languages
  • Adapt to regional accents, dialects, and cultural expressions
  • Language-aware formatting for global date, number, and punctuation standards
  • Handle medical, legal, and technical terms easily with specialized vocabulary and use-case-specific prompting

A single platform for all things Voice AI

Innovate, ship, and scale faster than ever, all on a developer-first API.

  • Unified speech-to-intelligence pipeline through LLM Gateway
  • Seamlessly integrate with leading LLMs, including OpenAI, Anthropic, and Google
  • Reliable, enterprise-grade infrastructure with zero rate limits
  • No-contract, usage-based pricing that's built to scale with you
Speech Understanding

Feature-rich AI models

Translation

Leverage our AI-powered Translation models to automatically translate transcripts in your products at scale, with support for over 99 languages.

See how in docs
Advanced Speaker Identification

Go beyond "Speaker A" and "Speaker B" by leveraging our Advanced Speaker Identification, labeling speakers by name through audio context.

See how in docs
Custom Formatting

Automatically detect and normalize key text elements in transcripts — including dates, phone numbers, and email addresses — to standardized, machine-readable formats.

See how in docs
Summarization

Leverage our AI-powered Summarization models to automatically summarize audio/video data in your products at scale. Customize the summary types to best fit your use case.

See how in docs
Sentiment Analysis

With Sentiment Analysis, AssemblyAI detects the sentiment of each sentence spoken in your audio files.

See how in docs
Entity Detection

Identify a wide range of entities that are spoken in your audio files, such as person and company names, email addresses, dates, and locations.

See how in docs
Topic Detection (IAB Classification)

Label the topics that are spoken in your audio and video files. The predicted topic labels follow the standardized IAB Taxonomy, which makes them suitable for contextual targeting.

See how in docs
Auto Chapters

Automatically segment audio and video files into chapters, each with a time-stamped summary.

See how in docs
Key Phrases

Accurately identify significant words and phrases, enabling you to extract the most pertinent concepts or highlights from your audio/video file.

See how in docs
See all in docs

Built on Voice AI infrastructure that scales with you

Universal-3 Pro is part of AssemblyAI's complete Voice AI platform, featuring everything you need to build, deploy, and scale voice applications.

Predictable usage-based pricing

Pay only for what you use with transparent per-second billing. No minimum commitments, annual contracts, or surprise infrastructure costs.

Complete Voice AI Platform

Access constantly improving models plus the complete toolchain for every voice use case, from transcription to speech-to-speech, all through our Voice AI Cloud.

Proven reliability and security

Deploy with confidence on infrastructure that processes millions of hours daily. Built for production scale with 99.9% uptime and SOC 2 compliance.

Frequently Asked Questions

How does Speech Understanding convert transcripts into actionable data?

Speech Understanding applies LLM‑powered tasks to completed transcripts via the LLM Gateway. You send a transcript_id and a speech_understanding.request (e.g., translation, speaker_identification, custom formatting). The service returns the original transcript augmented with structured outputs, like translated texts and updated utterance speaker labels, so raw text becomes machine‑readable, actionable fields for downstream workflows.
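
For illustration, a minimal sketch of that call in Python, assuming the request shape described above; the translation task body and its target_languages field are placeholders, not a documented schema:

    # Sketch only: the payload shape follows the description above; the
    # translation task config and target_languages field are assumptions.
    import requests

    API_KEY = "YOUR_API_KEY"

    response = requests.post(
        "https://llm-gateway.assemblyai.com/v1/understanding",
        headers={"authorization": API_KEY},
        json={
            "transcript_id": "YOUR_TRANSCRIPT_ID",
            "speech_understanding": {
                "request": {"translation": {"target_languages": ["es"]}}
            },
        },
    )
    response.raise_for_status()
    print(response.json())  # transcript augmented with structured outputs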

How does AssemblyAI automatically label speakers by name?

AssemblyAI first diarizes audio into speaker clusters using embeddings. Then Advanced Speaker Identification maps these clusters to real names or roles via the Speech Understanding API, using audio context and optional known_values you provide. Enable speaker_labels and request speaker_identification to return utterances labeled by name.
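
As a rough sketch, the pieces above can be combined in one transcription request; speaker_labels is a documented parameter, while the inline speech_understanding task body and the known_values shape are illustrative assumptions:

    import requests

    API_KEY = "YOUR_API_KEY"

    response = requests.post(
        "https://api.assemblyai.com/v2/transcript",
        headers={"authorization": API_KEY},
        json={
            "audio_url": "https://example.com/call.mp3",
            "speaker_labels": True,  # diarize audio into speaker clusters
            "speech_understanding": {
                "request": {
                    # Optional hints for mapping clusters to names or roles.
                    "speaker_identification": {"known_values": ["Dr. Lee", "Patient"]}
                }
            },
        },
    )
    print(response.json()["id"])  # poll this transcript until status == "completed"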

Does AssemblyAI provide customizable summary types?

Yes. AssemblyAI lets you choose summary types and styles. Set summary_type to bullets, bullets_verbose, gist, headline, or paragraph, and pair it with a summary_model (informative, conversational, or catchy). If you specify one, you must specify the other. For fully custom formats, use LeMUR via LLM Gateway.
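
A minimal sketch of those parameters on a standard transcription request (the audio URL is a placeholder):

    import requests

    API_KEY = "YOUR_API_KEY"

    response = requests.post(
        "https://api.assemblyai.com/v2/transcript",
        headers={"authorization": API_KEY},
        json={
            "audio_url": "https://example.com/meeting.mp3",
            "summarization": True,
            "summary_model": "conversational",  # or: informative, catchy
            "summary_type": "bullets",  # or: bullets_verbose, gist, headline, paragraph
        },
    )
    transcript_id = response.json()["id"]
    # Poll GET /v2/transcript/{transcript_id}; the "summary" field is populated
    # once the transcript status is "completed".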

How do I get started with Speech Understanding?

Get an AssemblyAI API key. Create a transcript (POST /v2/transcript). Add Speech Understanding either 1) inline, by including speech_understanding.request (e.g., translation) in the transcription request, or 2) afterward, by POSTing to https://llm-gateway.assemblyai.com/v1/understanding with transcript_id and your speech_understanding.request. Tasks include translation, speaker identification, and custom formatting.
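
Sketched end to end in Python for the second option; the speech_understanding payload shape follows the description above and is an assumption, not a documented schema:

    import time

    import requests

    API_KEY = "YOUR_API_KEY"
    headers = {"authorization": API_KEY}

    # 1) Create a transcript.
    transcript = requests.post(
        "https://api.assemblyai.com/v2/transcript",
        headers=headers,
        json={"audio_url": "https://example.com/audio.mp3"},
    ).json()

    # Wait for transcription to finish before requesting understanding tasks.
    while transcript["status"] not in ("completed", "error"):
        time.sleep(3)
        transcript = requests.get(
            f"https://api.assemblyai.com/v2/transcript/{transcript['id']}",
            headers=headers,
        ).json()

    # 2) Send the finished transcript to the LLM Gateway.
    understanding = requests.post(
        "https://llm-gateway.assemblyai.com/v1/understanding",
        headers=headers,
        json={
            "transcript_id": transcript["id"],
            "speech_understanding": {
                "request": {"translation": {"target_languages": ["fr"]}}
            },
        },
    ).json()
    print(understanding)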

What does Speech Understanding cost?

Speech Understanding is billed by audio duration and the models you enable. Pay-as-you-go rates per hour of audio:

  • Speaker Identification: $0.02
  • Translation: $0.06
  • Custom Formatting: $0.03
  • Entity Detection: $0.08
  • Sentiment Analysis: $0.02
  • Auto Chapters: $0.08
  • Key Phrases: $0.01
  • Topic Detection: $0.15
  • Summarization: $0.03
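
For example, at these rates, enabling Translation ($0.06/hr) and Summarization ($0.03/hr) on 100 hours of audio adds 100 × $0.09 = $9.00 of Speech Understanding charges on top of the underlying transcription cost.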

What speech-to-text models does Speech Understanding work with?

Speech Understanding runs on the transcript produced by the speech-to-text model you choose. For pre-recorded audio, it works with Universal (default) or Slam‑1. You can specify the model in your request, and pricing follows the model used.
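
A minimal sketch of selecting the model on a transcription request; "slam-1" follows the documented speech_model values, and the audio URL is a placeholder:

    import requests

    response = requests.post(
        "https://api.assemblyai.com/v2/transcript",
        headers={"authorization": "YOUR_API_KEY"},
        json={
            "audio_url": "https://example.com/audio.mp3",
            "speech_model": "slam-1",  # omit to use the default Universal model
        },
    )
    print(response.json()["id"])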

Unlock the value of voice data

Build what’s next on the platform powering thousands of the industry’s leading Voice AI apps.