Speaker Identification
Supported languages
en, en_au, en_uk, en_us, es, fr, de, it, pt, nl, hi, ja, zh, fi, ko, pl, ru, tr, uk, vi, af, sq, am, ar, hy, as, az, ba, eu, be, bn, bs, br, bg, ca, hr, cs, da, et, fo, gl, ka, el, gu, ht, ha, haw, he, hu, is, id, jw, kn, kk, lo, la, lv, ln, lt, lb, mk, mg, ms, ml, mt, mi, mr, mn, ne, no, nn, oc, pa, ps, fa, ro, sa, sr, sn, sd, si, sk, sl, so, su, sw, sv, tl, tg, ta, tt, te, tk, ur, uz, cy, yi, yo
Supported models
slam-1, universal
Supported regions
US only
Overview
Speaker Identification allows you to identify speakers by their actual names or roles, transforming generic labels like “Speaker A” or “Speaker B” into meaningful identifiers that you provide. Speaker identities are inferred based on the conversation content.
Example transformation (the speaker names shown are illustrative):
Before:
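```
Speaker A: Thanks for calling support. My name is Maya. How can I help?
Speaker B: Hi Maya, this is Raj. I'm having trouble logging in.
```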
After:
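```
Maya: Thanks for calling support. My name is Maya. How can I help?
Raj: Hi Maya, this is Raj. I'm having trouble logging in.
```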
Speaker Identification requires that a file be transcribed with Speaker Diarization enabled. See the Speaker Diarization section of our documentation to learn more about that feature.
To reliably identify speakers, your audio should contain clear, distinguishable voices and sufficient spoken audio from each speaker. The accuracy of Speaker Diarization depends on the quality of the audio and the distinctiveness of each speaker’s voice, which will have a downstream effect on the quality of Speaker Identification.
How to use Speaker Identification
There are two ways to use Speaker Identification:
- Transcribe and identify in one request - Best when you’re starting a new transcription and want speaker identification included automatically
- Transcribe and identify in separate requests - Best when you already have a completed transcript or for more complex workflows where you might want to perform other tasks between the transcription and speaker identification process
Method 1: Transcribe and identify in one request
This method is ideal when you’re starting fresh and want both transcription and speaker identification in a single workflow.
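Below is a minimal Python sketch of this flow against the REST API using the `requests` library. The `speaker_labels` flag and polling pattern are standard; the exact nesting of the `speech_understanding` block and the name of the known-speakers field are assumptions, so consult the request parameters below for the authoritative schema.

```python
import time

import requests

API_KEY = "YOUR_API_KEY"  # replace with your API key
BASE_URL = "https://api.assemblyai.com/v2"
headers = {"authorization": API_KEY}

# Speaker Identification requires Speaker Diarization, so speaker_labels
# must be enabled. The speaker_identification block below is a sketch:
# its field names are assumptions, not the authoritative schema.
payload = {
    "audio_url": "https://example.com/call.mp3",  # hypothetical audio file
    "speaker_labels": True,
    "speech_understanding": {
        "speaker_identification": {
            "speaker_type": "name",
            "known_values": ["Maya", "Raj"],  # hypothetical speaker names
        }
    },
}

transcript_id = requests.post(
    f"{BASE_URL}/transcript", json=payload, headers=headers
).json()["id"]

# Poll until transcription (and identification) finishes.
while True:
    result = requests.get(
        f"{BASE_URL}/transcript/{transcript_id}", headers=headers
    ).json()
    if result["status"] in ("completed", "error"):
        break
    time.sleep(3)

for utterance in result.get("utterances") or []:
    print(f"{utterance['speaker']}: {utterance['text']}")
```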
Method 2: Transcribe and identify in separate requests
This method is useful when you already have a completed transcript or for more complex workflows where you need to separate transcription from speaker identification.
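A Python sketch of the two-step flow follows. Retrieving a transcript by ID is standard; the Speech Understanding endpoint path and the request shape are assumptions for illustration, so check the API reference below for the exact URL and schema.

```python
import requests

API_KEY = "YOUR_API_KEY"  # replace with your API key
BASE_URL = "https://api.assemblyai.com/v2"
headers = {"authorization": API_KEY}

# Step 1: fetch a transcript that already completed with Speaker
# Diarization enabled.
transcript_id = "YOUR_TRANSCRIPT_ID"  # ID of an existing transcript
transcript = requests.get(
    f"{BASE_URL}/transcript/{transcript_id}", headers=headers
).json()
assert transcript["status"] == "completed", "transcript must be completed first"

# Step 2: send it to the Speech Understanding API for identification.
# The endpoint path and request shape here are assumptions for
# illustration -- see the API reference below for the exact schema.
identification_request = {
    "transcript_id": transcript_id,
    "speaker_identification": {
        "speaker_type": "name",
        "known_values": ["Maya", "Raj"],  # hypothetical speaker names
    },
}
response = requests.post(
    f"{BASE_URL}/speech-understanding",  # hypothetical endpoint path
    json=identification_request,
    headers=headers,
)

for utterance in response.json().get("utterances") or []:
    print(f"{utterance['speaker']}: {utterance['text']}")
```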
Output format details
Here is how the structure of the objects in the utterances key differs when only Speaker Diarization is used versus when Speaker Identification is also applied (the values below are illustrative):
Before (Speaker Diarization only):
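```json
{
  "utterances": [
    {
      "speaker": "A",
      "text": "Thanks for calling support. My name is Maya. How can I help?",
      "start": 250,
      "end": 3100,
      "confidence": 0.95,
      "words": []
    }
  ]
}
```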
After (with Speaker Identification):
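```json
{
  "utterances": [
    {
      "speaker": "Maya",
      "text": "Thanks for calling support. My name is Maya. How can I help?",
      "start": 250,
      "end": 3100,
      "confidence": 0.95,
      "words": []
    }
  ]
}
```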
Advanced usage
Identifying speakers by role
Instead of identifying speakers by name as shown in the examples above, you can also identify speakers by role.
This can be useful in customer service calls, AI interactions, or any scenario where you may not know the specific names of the speakers but still want to identify them by something more than a generic identifier like A, B, or C.
To identify speakers by role, use the speaker_type parameter with a value of “role”:
Example
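A minimal Python sketch of the identification settings using roles; `speaker_type` comes from the documentation above, while the surrounding field name is an assumption:

```python
# Identification settings using roles instead of names. Only
# `speaker_type` is documented above; `known_values` is an assumed
# field name for the list of identifiers you provide.
speaker_identification = {
    "speaker_type": "role",
    "known_values": ["Agent", "Customer"],  # roles for a support call
}
```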
Common role combinations
["Agent", "Customer"]- Customer service calls["AI Assistant", "User"]- AI chatbot interactions["Support", "Customer"]- Technical support calls["Interviewer", "Interviewee"]- Interview recordings["Host", "Guest"]- Podcast or show recordings["Moderator", "Panelist"]- Panel discussions
API reference
Request
Method 1: Transcribe and identify in one request
When creating a new transcription, include the speech_understanding parameter directly in your transcription request:
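A sketch of the request body (the nesting inside `speech_understanding` and the known-speakers field name are assumptions; see the request parameters below for the authoritative schema):

```json
{
  "audio_url": "https://example.com/call.mp3",
  "speaker_labels": true,
  "speech_understanding": {
    "speaker_identification": {
      "speaker_type": "name",
      "known_values": ["Maya", "Raj"]
    }
  }
}
```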
Method 2: Add identification to existing transcripts
For existing transcripts, retrieve the completed transcript and send it to the Speech Understanding API:
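A sketch of the follow-up request, assuming the transcript ID and identification settings are sent together (field names other than `speaker_type` are assumptions):

```json
{
  "transcript_id": "YOUR_TRANSCRIPT_ID",
  "speaker_identification": {
    "speaker_type": "name",
    "known_values": ["Maya", "Raj"]
  }
}
```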
Request parameters
Response
The Speaker Identification API returns a modified version of your transcript with updated speaker labels in the utterances key.
Response fields
Key differences from standard transcription
- The speaker field contains the names or roles you provided instead of generic letters like "A" or "B".
- All other fields (text, start, end, confidence, words) remain unchanged from the original transcript.