Using Speaker Identification on an existing transcript
Overview
If you already have a completed transcript, you can add Speaker Identification in a separate request to the Speech Understanding API. This is especially useful when you want to re-identify speakers with different parameters, or when your workflow separates transcription from post-processing.
Speaker Identification requires Speaker Diarization. Your original transcription request must have set speaker_labels: true.
To transcribe and identify speakers in a single request, see the main Speaker Identification page.
Choosing how to identify speakers
You can identify speakers by name or by role:
- Know the speakers' names? Use speaker_type: "name" with the names in known_values or speakers.
- Know their roles but not names? Use speaker_type: "role" with roles like "Interviewer" or "Agent" in known_values or speakers.
- Need better accuracy? Use speakers with description fields that provide context about what each speaker typically discusses.
How to use Speaker Identification on an existing transcript
First, transcribe your audio with speaker_labels: true. Once the transcription is complete, send the transcript_id along with your speaker identification configuration to the Speech Understanding API.
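The first step can be sketched as follows. The transcript endpoint, authorization header, and speaker_labels parameter follow AssemblyAI's standard API; the audio URL and API key are placeholders.

```python
import time
import requests

BASE_URL = "https://api.assemblyai.com/v2"  # standard AssemblyAI API base
HEADERS = {"authorization": "<YOUR_API_KEY>"}

# speaker_labels must be true: Speaker Identification depends on
# Speaker Diarization. The audio URL is a placeholder.
transcription_body = {
    "audio_url": "https://example.com/meeting.mp3",
    "speaker_labels": True,
}

def wait_for_transcript(transcript_id: str, poll_seconds: float = 3.0) -> dict:
    """Poll until the transcription finishes, then return it."""
    while True:
        transcript = requests.get(
            f"{BASE_URL}/transcript/{transcript_id}", headers=HEADERS
        ).json()
        if transcript["status"] in ("completed", "error"):
            return transcript
        time.sleep(poll_seconds)

# Usage (requires a valid API key):
# transcript = requests.post(
#     f"{BASE_URL}/transcript", json=transcription_body, headers=HEADERS
# ).json()
# completed = wait_for_transcript(transcript["id"])
```

Once the transcript's status is completed, its id is the transcript_id you pass to the Speech Understanding API in the requests below.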
Identify by name
To identify speakers by name, use speaker_type: "name" with a list of speaker names in known_values. This is the most common approach when you know who is speaking in the audio.
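A minimal sketch of the name-based request body, using hypothetical speaker names and a placeholder transcript ID:

```python
import json

# transcript_id comes from a completed transcription that was created
# with speaker_labels: true. The names below are hypothetical.
understanding_body = {
    "transcript_id": "<TRANSCRIPT_ID>",
    "speaker_identification": {
        "speaker_type": "name",
        "known_values": ["Maya Chen", "Tom Reyes"],
    },
}

# POST this body as JSON to the Speech Understanding API.
print(json.dumps(understanding_body, indent=2))
```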
Identify by role
To identify speakers by role instead of name, use speaker_type: "role" with role labels in known_values. This is useful for customer service calls, interviews, or any scenario where you know the roles but not the names.
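A minimal sketch of the role-based request body, using a placeholder transcript ID:

```python
import json

# Same shape as the name-based request, but speaker_type is "role" and
# known_values lists roles instead of names.
understanding_body = {
    "transcript_id": "<TRANSCRIPT_ID>",
    "speaker_identification": {
        "speaker_type": "role",
        "known_values": ["Agent", "Customer"],
    },
}

# POST this body as JSON to the Speech Understanding API.
print(json.dumps(understanding_body, indent=2))
```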
Common role combinations
- ["Agent", "Customer"] - Customer service calls
- ["AI Assistant", "User"] - AI chatbot interactions
- ["Support", "Customer"] - Technical support calls
- ["Interviewer", "Interviewee"] - Interview recordings
- ["Host", "Guest"] - Podcast or show recordings
- ["Moderator", "Panelist"] - Panel discussions
Adding speaker metadata
For more accurate identification, use the speakers parameter instead of known_values to provide descriptions and metadata. The examples below show the understanding_body payload sent to the Speech Understanding API. For setup, transcription, and polling code, see the full examples above.
Examples in this section are shown in Python for brevity. The same speaker_identification configuration works in any language.
At its simplest, you can provide a description alongside each speaker’s name or role:
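A sketch of such a payload, with hypothetical speaker names and descriptions:

```python
import json

# speakers replaces known_values; each entry pairs a name (or role)
# with a free-text description that gives the model extra context.
understanding_body = {
    "transcript_id": "<TRANSCRIPT_ID>",
    "speaker_identification": {
        "speaker_type": "name",
        "speakers": [
            {
                "name": "Maya Chen",
                "description": "Product manager leading the roadmap discussion",
            },
            {
                "name": "Tom Reyes",
                "description": "Engineer answering technical questions",
            },
        ],
    },
}

print(json.dumps(understanding_body, indent=2))
```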
For even more fine-tuned identification, you can include any additional custom properties on each speaker object, such as company, title, department, or any other fields that help describe the speaker:
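A sketch with custom properties added to each speaker object; the names, companies, and titles below are hypothetical:

```python
import json

# Any extra keys on a speaker object (company, title, department, ...)
# are passed along as additional context for identification.
understanding_body = {
    "transcript_id": "<TRANSCRIPT_ID>",
    "speaker_identification": {
        "speaker_type": "name",
        "speakers": [
            {
                "name": "Maya Chen",
                "description": "Discusses product strategy and timelines",
                "company": "Acme Corp",
                "title": "Product Manager",
                "department": "Product",
            },
            {
                "name": "Tom Reyes",
                "description": "Answers questions about backend architecture",
                "company": "Acme Corp",
                "title": "Staff Engineer",
                "department": "Engineering",
            },
        ],
    },
}

print(json.dumps(understanding_body, indent=2))
```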
You can use the same custom properties with role-based identification by replacing name with role in each speaker object.
API reference
Request
Retrieve the completed transcript and send it to the Speech Understanding API:
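The request flow can be sketched as below. The transcript endpoint follows AssemblyAI's standard API; the Speech Understanding endpoint path is an assumption for illustration, so check the Speaker Identification API reference for the exact URL.

```python
import requests

BASE_URL = "https://api.assemblyai.com/v2"  # standard AssemblyAI API base
HEADERS = {"authorization": "<YOUR_API_KEY>"}
# Assumed endpoint path for illustration; see the API reference.
UNDERSTANDING_URL = f"{BASE_URL}/understanding"

def build_request_body(transcript_id: str, config: dict) -> dict:
    """Pair a completed transcript with a speaker_identification config."""
    return {"transcript_id": transcript_id, "speaker_identification": config}

def identify_speakers(transcript_id: str, config: dict) -> dict:
    """Verify the transcript completed, then submit it for
    speaker identification."""
    transcript = requests.get(
        f"{BASE_URL}/transcript/{transcript_id}", headers=HEADERS
    ).json()
    if transcript["status"] != "completed":
        raise RuntimeError(f"Transcript not ready: {transcript['status']}")
    response = requests.post(
        UNDERSTANDING_URL,
        json=build_request_body(transcript_id, config),
        headers=HEADERS,
    )
    response.raise_for_status()
    return response.json()

# Usage (requires a valid API key and a completed transcript):
# result = identify_speakers(
#     "<TRANSCRIPT_ID>",
#     {"speaker_type": "role", "known_values": ["Agent", "Customer"]},
# )
```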
Request parameters
For the full list of request parameters, see the Speaker Identification API reference.
Response
For the response format and fields, see the Speaker Identification response reference.