Speaker Identification
Supported languages
en
en_au
en_uk
en_us
es
fr
de
it
pt
nl
hi
ja
zh
fi
ko
pl
ru
tr
uk
vi
af
sq
am
ar
hy
as
az
ba
eu
be
bn
bs
br
bg
ca
hr
cs
da
et
fo
gl
ka
el
gu
ht
ha
haw
he
hu
is
id
jw
kn
kk
lo
la
lv
ln
lt
lb
mk
mg
ms
ml
mt
mi
mr
mn
ne
no
nn
oc
pa
ps
fa
ro
sa
sr
sn
sd
si
sk
sl
so
su
sw
sv
tl
tg
ta
tt
te
tk
ur
uz
cy
yi
yo
Supported models
slam-1
universal
Supported regions
US only
Overview
Speaker Identification allows you to identify speakers by their actual names or roles, transforming generic labels like “Speaker A” or “Speaker B” into meaningful identifiers that you provide. Speaker identities are inferred based on the conversation content.
Example transformation:
Before: utterances are labeled with generic identifiers such as "Speaker A" and "Speaker B".
After: the same utterances are labeled with the names or roles you provided, such as "Maya" or "Agent".
Speaker Identification requires that a file be transcribed with Speaker Diarization enabled. See the Speaker Diarization documentation to learn more about that feature.
To reliably identify speakers, your audio should contain clear, distinguishable voices and sufficient spoken audio from each speaker. The accuracy of Speaker Diarization depends on the quality of the audio and the distinctiveness of each speaker’s voice, which will have a downstream effect on the quality of Speaker Identification.
How to use Speaker Identification
There are two ways to use Speaker Identification:
- Transcribe and identify in one request - Best when you’re starting a new transcription and want speaker identification included automatically
- Transcribe and identify in separate requests - Best when you already have a completed transcript or for more complex workflows where you might want to perform other tasks between the transcription and speaker identification process
Method 1: Transcribe and identify in one request
This method is ideal when you’re starting fresh and want both transcription and speaker identification in a single workflow.
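As a sketch of what such a request body might look like in Python (the endpoint URL and the exact nesting of the speech_understanding block are assumptions for illustration; only speaker_labels, speech_understanding, and speaker_type come from this page):

```python
import json

# Build a single transcription request that also asks for Speaker
# Identification. speaker_labels enables Speaker Diarization, which
# Speaker Identification requires. The speech_understanding layout
# below is illustrative; consult the API reference for the exact schema.
payload = {
    "audio_url": "https://example.com/meeting.mp3",  # placeholder audio URL
    "speaker_labels": True,
    "speech_understanding": {
        "request": {
            "speaker_identification": {
                "speaker_type": "name",  # identify speakers by name
            }
        }
    },
}

# The request itself would be sent with your API key, e.g.:
#   import requests
#   resp = requests.post(
#       "https://api.assemblyai.com/v2/transcript",
#       headers={"authorization": "<YOUR_API_KEY>"},
#       json=payload,
#   )
print(json.dumps(payload, indent=2))
```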
Method 2: Transcribe and identify in separate requests
This method is useful when you already have a completed transcript or for more complex workflows where you need to separate transcription from speaker identification.
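A minimal Python sketch of the second-request payload, assuming you already have a completed transcript (the transcript ID and field names below are placeholders and assumptions; the Speech Understanding endpoint's exact schema is in the API reference):

```python
import json

# Step 1 (already done elsewhere): the transcript was created with
# speaker_labels=True and has finished processing, e.g.
#   requests.get(f"https://api.assemblyai.com/v2/transcript/{transcript_id}", ...)
transcript_id = "<EXISTING_TRANSCRIPT_ID>"  # placeholder

# Step 2: build the follow-up Speaker Identification request that would be
# POSTed to the Speech Understanding API with your API key. Field nesting
# is illustrative.
understanding_request = {
    "transcript_id": transcript_id,
    "speaker_identification": {
        "speaker_type": "name",  # or "role" -- see Advanced usage below
    },
}

print(json.dumps(understanding_request, indent=2))
```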
Output format details
Here is how the structure of the utterances in the utterances key differs when Speaker Diarization is used versus when Speaker Identification is used:
Before (Speaker Diarization only):
After (with Speaker Identification):
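As a sketch (the utterance values below are made up), the only field that changes is the speaker label:

```python
# Utterance as returned with Speaker Diarization only: a generic "A"/"B" label.
before = {
    "speaker": "A",
    "text": "Thanks for calling. How can I help you today?",
    "start": 250,
    "end": 2480,
    "confidence": 0.97,
}

# With Speaker Identification, the speaker label becomes the name or role
# you provided; every other field is unchanged.
after = {**before, "speaker": "Agent"}

print(before["speaker"], "->", after["speaker"])
```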
Advanced usage
Identifying speakers by role
Instead of identifying speakers by name as shown in the examples above, you can also identify speakers by role.
This can be useful in customer service calls, AI interactions, or any scenario where you may not know the specific names of the speakers but still want to identify them by something more than a generic identifier like A, B, or C.
To identify speakers by role, use the speaker_type parameter with a value of “role”:
Example
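A minimal sketch of the speaker_identification portion of a request configured for role-based identification (the field nesting and the commented-out field name are assumptions; speaker_type with the value "role" comes from this page):

```python
# Illustrative speaker_identification block for role-based identification.
speaker_identification = {
    "speaker_type": "role",
    # You could also constrain the roles you expect, e.g. ["Agent", "Customer"];
    # the field name for that list is not shown on this page, so check the
    # API reference for the exact schema.
}

print(speaker_identification)
```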
Common role combinations
["Agent", "Customer"] - Customer service calls
["AI Assistant", "User"] - AI chatbot interactions
["Support", "Customer"] - Technical support calls
["Interviewer", "Interviewee"] - Interview recordings
["Host", "Guest"] - Podcast or show recordings
["Moderator", "Panelist"] - Panel discussions
API reference
Request
Method 1: Transcribe and identify in one request
When creating a new transcription, include the speech_understanding parameter directly in your transcription request:
Method 2: Add identification to existing transcripts
For existing transcripts, retrieve the completed transcript and send it to the Speech Understanding API:
Request parameters
Response
The Speaker Identification API returns a modified version of your transcript with updated speaker labels in the utterances key.
Response fields
Key differences from standard transcription
All other fields (text, start, end, confidence, words) remain unchanged from the original transcript.