Speech Understanding API

Turn raw audio into structured, actionable intelligence. Purpose-built models extract meaning from speech in a single API call.

Delphi
Happy Scribe
Granola
Supernormal
Runway
Ashby
Jiminny
JotPsych
Earmark
EdgeTier
Genio
Grain
Loop
Calabrio
Veed.io
Dovetail
WhatConverts
CallRail
Models

Nine models, one API.

Enable any combination of features with a single request. Mix, match, and scale as your application grows.

Speaker identification

Label speakers by name using audio context. Supports custom profiles and multi-speaker conversations.

Voice Agents, AI Notetaker, Call Analytics, Agent Assist, AI Scribe, Conversation Intelligence

Sentiment analysis

Detect positive, neutral, or negative sentiment at the sentence level for granular conversation analytics.

Voice Agents, AI Notetaker, Call Analytics, Agent Assist, Conversation Intelligence

Key phrases

Automatically extract the most significant concepts and phrases from every transcript.

Media Monitoring, AI Notetaker, Call Analytics, Agent Assist, Conversation Intelligence

Summarization

Condense every transcript into a concise summary of the conversation.

AI Scribe, AI Notetaker, Call Analytics, Conversation Intelligence

Custom formatting

Normalize dates, phone numbers, and email addresses to machine-readable formats automatically.

Agent Assist, AI Notetaker, Call Analytics, Voice Agents, Conversation Intelligence

Translation

Transcribe and translate across 99+ languages in the same API request.

Voice Agents, AI Notetaker, Agent Assist, AI Scribe, Conversation Intelligence

Auto chapters

Break long recordings into timestamped sections with auto-generated summaries for each chapter.

Content Creation, AI Notetaker, Conversation Intelligence

Entity detection

Identify names, organizations, locations, dates, and other entities across every transcript.

Voice Agents, AI Notetaker, Call Analytics, Agent Assist, AI Scribe, Conversation Intelligence

Topic detection

Classify audio content against the IAB standard taxonomy — ideal for content moderation, routing, and analytics.

Media Monitoring, AI Notetaker, Call Analytics, Agent Assist, AI Scribe, Conversation Intelligence
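
To make "any combination of features with a single request" concrete, here is a minimal sketch of how one request body might switch on several models at once. The flag names, the `audio_url` field, and the `build_request` helper are all hypothetical and chosen for illustration; the actual parameter names are defined in the API reference.

```python
from __future__ import annotations

import json

# Hypothetical flag names, one per model listed above. The real API
# may use different parameter names.
FEATURE_FLAGS = {
    "speaker_identification",
    "sentiment_analysis",
    "key_phrases",
    "summarization",
    "custom_formatting",
    "translation",
    "auto_chapters",
    "entity_detection",
    "topic_detection",
}


def build_request(audio_url: str, features: list[str]) -> dict:
    """Return one request body that enables every requested feature."""
    unknown = set(features) - FEATURE_FLAGS
    if unknown:
        raise ValueError(f"unknown features: {sorted(unknown)}")
    body: dict = {"audio_url": audio_url}  # assumed field name
    for name in features:
        body[name] = True
    return body


if __name__ == "__main__":
    payload = build_request(
        "https://example.com/call.mp3",
        ["speaker_identification", "sentiment_analysis", "summarization"],
    )
    print(json.dumps(payload, indent=2))
```

Adding another model later means adding one more name to the list; the request count stays at one.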

The second pillar of your Voice AI pipeline

Results

Conversion improvement for teams using structured conversation intelligence vs. raw transcripts alone.

Efficiency 90%

Reduction in manual QA review time after automating complaint and sentiment detection workflows.

Speed <60s

From audio file to fully structured intelligence — speaker labels, sentiment, entities, summary — in one request.

Intelligence 9+

Models for transcription, summarization, sentiment, entities, and more — all in one API.

Speaker labels

Know who said what

  • Label voices by name or role

  • Attribute summaries to real people

  • Flag compliance issues by speaker

  • Cut hours of manual call review

Sentiment + topics + entities

Surface meaning automatically

  • Detect sentiment at the sentence level

  • Extract named entities automatically

  • Pull key phrases that matter most

  • Classify content by IAB topic
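
Sentence-level sentiment becomes most useful once it is rolled up by speaker. The sketch below assumes each result is a dictionary with `speaker`, `text`, and `sentiment` keys; that shape and the `sentiment_by_speaker` helper are illustrative assumptions, not the documented response format.

```python
from collections import Counter, defaultdict


def sentiment_by_speaker(sentences):
    """Count positive, neutral, and negative sentences for each speaker."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        counts[sentence["speaker"]][sentence["sentiment"]] += 1
    return {speaker: dict(tally) for speaker, tally in counts.items()}


# Hypothetical sentence-level results for a short support call.
sample = [
    {"speaker": "Agent", "text": "Thanks for calling.", "sentiment": "positive"},
    {"speaker": "Customer", "text": "My order never arrived.", "sentiment": "negative"},
    {"speaker": "Customer", "text": "I ordered it last week.", "sentiment": "neutral"},
    {"speaker": "Agent", "text": "I can fix that right away.", "sentiment": "positive"},
]

if __name__ == "__main__":
    for speaker, tally in sentiment_by_speaker(sample).items():
        print(speaker, tally)
```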

Summaries + chapters + formatting

Structure every conversation

  • Turn audio into structured data

  • Break recordings into timestamped chapters

  • Condense calls into one-paragraph briefs

  • Format dates, numbers, and contacts
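
Timestamped chapters map naturally onto a clickable outline. This sketch assumes each chapter carries `start` and `end` offsets in milliseconds plus a `headline`; the field names and both helpers are hypothetical and may not match the real response.

```python
def format_timestamp(ms: int) -> str:
    """Convert a millisecond offset to MM:SS, or H:MM:SS past one hour."""
    total_seconds = ms // 1000
    hours, remainder = divmod(total_seconds, 3600)
    minutes, seconds = divmod(remainder, 60)
    if hours:
        return f"{hours}:{minutes:02d}:{seconds:02d}"
    return f"{minutes:02d}:{seconds:02d}"


def chapter_outline(chapters):
    """Return one 'timestamp headline' line per chapter, in order."""
    return [f"{format_timestamp(c['start'])} {c['headline']}" for c in chapters]


# Hypothetical chapter data for a recording of just over an hour.
sample_chapters = [
    {"start": 0, "end": 125000, "headline": "Introductions"},
    {"start": 125000, "end": 3725000, "headline": "Pricing discussion"},
    {"start": 3725000, "end": 3900000, "headline": "Next steps"},
]

if __name__ == "__main__":
    print("\n".join(chapter_outline(sample_chapters)))
```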

Translation + languages

Go global without switching vendors

  • Translate 99+ languages in one API

  • Run sentiment and entities on multilingual audio

  • Skip separate translation vendors

  • Run analysis on multilingual audio in one pass

Common questions