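Supported languages: en, en_au, en_uk, en_us, es, fr, de, it, pt, nl, hi, ja, zh, fi, ko, pl, ru, tr, uk, vi, af, sq, am, ar, hy, as, az, ba, eu, be, bn, bs, br, bg, ca, hr, cs, da, et, fo, gl, ka, el, gu, ht, ha, haw, he, hu, is, id, jw, kn, kk, lo, la, lv, ln, lt, lb, mk, mg, ms, ml, mt, mi, mr, mn, ne, no, nn, oc, pa, ps, fa, ro, sa, sr, sn, sd, si, sk, sl, so, su, sw, sv, tl, tg, ta, tt, te, tk, ur, uz, cy, yi, yo
Supported models: universal-3-pro, universal-2
Supported regions: US & EU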
Replace generic “Speaker A” and “Speaker B” labels with real names or roles, with no voice enrollment needed. Speaker Identification infers who is speaking from the content of the conversation and applies the identifiers you provide.
Example transformation:
Before:
After (by name):
After (by role):
Speaker Identification requires Speaker Diarization. You must set speaker_labels: true in your transcription request.
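Because Speaker Identification only works on top of diarization, it can help to check a request body before sending it. The helper below is a hypothetical client-side check (not part of any SDK); the audio URL and names are placeholders.

```python
# Hypothetical client-side check (not part of any SDK): a request that asks for
# Speaker Identification must also enable Speaker Diarization.
def check_diarization_enabled(body: dict) -> None:
    wants_identification = (
        body.get("speech_understanding", {})
        .get("request", {})
        .get("speaker_identification")
        is not None
    )
    if wants_identification and body.get("speaker_labels") is not True:
        raise ValueError("Speaker Identification requires speaker_labels: true")


body = {
    "audio_url": "https://example.com/call.mp3",  # placeholder URL
    "speaker_labels": True,  # Speaker Diarization must be enabled
    "speech_understanding": {
        "request": {
            "speaker_identification": {
                "speaker_type": "name",
                "known_values": ["Michel Martin", "Jane Doe"],  # placeholder names
            }
        }
    },
}

check_diarization_enabled(body)  # passes: diarization is on
```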
To identify speakers reliably, your audio should contain clear, distinguishable voices and enough speech from each speaker. Speaker Diarization accuracy depends on audio quality and on how distinct each speaker’s voice is, and diarization errors carry over into Speaker Identification.
You can identify speakers by name or by role, and optionally describe each speaker for more context:
- By name: set speaker_type: "name" and list the names in known_values or speakers.
- By role: set speaker_type: "role" and list roles such as "Interviewer" or "Agent" in known_values or speakers.
- With descriptions: use speakers with description fields that give context about what each speaker typically discusses.
To identify speakers, include the speech_understanding parameter in your transcription request.
Already have a completed transcript? You can add Speaker Identification to an existing transcript in a separate request.
To identify speakers by name, use speaker_type: "name" with a list of speaker names in known_values. This is the most common approach when you know who is speaking in the audio.
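A minimal sketch of the name-based configuration. This dict goes under speech_understanding.request.speaker_identification in the request body; the names are placeholders for the people you expect in the recording.

```python
# Name-based Speaker Identification settings, nested under
# speech_understanding.request.speaker_identification in the request body.
speaker_identification = {
    "speaker_type": "name",
    # Placeholder names: list the people you expect to hear in the audio.
    "known_values": ["Michel Martin", "Jane Doe"],
}
```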
To identify speakers by role instead of name, use speaker_type: "role" with role labels in known_values. This is useful for customer service calls, interviews, or any scenario where you know the roles but not the names.
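The role-based form changes only speaker_type and the values. A sketch for a customer service call, nested the same way as the name-based configuration:

```python
# Role-based Speaker Identification settings, nested under
# speech_understanding.request.speaker_identification in the request body.
speaker_identification = {
    "speaker_type": "role",
    # Roles you expect in the recording, e.g. a customer service call.
    "known_values": ["Agent", "Customer"],
}
```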
Common role pairs:
- ["Agent", "Customer"]: customer service calls
- ["AI Assistant", "User"]: AI chatbot interactions
- ["Support", "Customer"]: technical support calls
- ["Interviewer", "Interviewee"]: interview recordings
- ["Host", "Guest"]: podcast or show recordings
- ["Moderator", "Panelist"]: panel discussions
For more accurate speaker identification, you can use the speakers parameter instead of known_values. The speakers parameter lets you provide additional metadata about each speaker, which helps the model identify speakers from conversational context.
This is particularly useful when:
Each speaker object must include either a name or a role, matching speaker_type. Beyond that, you can add any other properties you want. The name and role fields are reserved and must be strings; all other properties are flexible and can have any structure.
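The rules above can be checked client-side before a request is sent. This validator is a hypothetical helper written for illustration, not an SDK function; it only enforces what this section states (the field matching speaker_type is present, name and role are strings, everything else is free-form).

```python
# Hypothetical validator for the speakers list (not part of any SDK).
def validate_speakers(speaker_type: str, speakers: list) -> None:
    if speaker_type not in ("name", "role"):
        raise ValueError(f"speaker_type must be 'name' or 'role', got {speaker_type!r}")
    for i, speaker in enumerate(speakers):
        # The field matching speaker_type is required on every speaker object.
        if speaker_type not in speaker:
            raise ValueError(f"speakers[{i}] is missing {speaker_type!r}")
        # name and role are reserved and must be strings wherever they appear.
        for key in ("name", "role"):
            if key in speaker and not isinstance(speaker[key], str):
                raise ValueError(f"speakers[{i}][{key!r}] must be a string")
        # All other properties are left as-is; any structure is allowed.


validate_speakers("name", [
    {"name": "Michel Martin", "profile": {"title": "Host", "topics": ["news"]}},
])
```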
Examples in this section are shown in Python for brevity. The same speaker_identification configuration works in any language.
At its simplest, you can provide a description alongside each speaker’s name or role:
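A sketch of speakers with descriptions, used in place of known_values. The names and descriptions are placeholders; descriptions work best when they say what each person typically talks about.

```python
# speakers replaces known_values; each entry pairs a name with a description.
speaker_identification = {
    "speaker_type": "name",
    "speakers": [
        {
            "name": "Michel Martin",  # placeholder
            "description": "Show host; introduces guests and asks most of the questions",
        },
        {
            "name": "Jane Doe",  # placeholder
            "description": "Guest scientist; explains research findings",
        },
    ],
}
```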
For finer-grained identification, you can add custom properties to each speaker object, such as company, title, department, or any other fields that help describe the speaker:
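A sketch with custom properties. Every field other than name is an illustrative choice (including the nested affiliation object, which shows that non-string structures are allowed); all values are placeholders.

```python
# Custom properties beyond name are free-form and can be nested.
speaker_identification = {
    "speaker_type": "name",
    "speakers": [
        {
            "name": "Michel Martin",
            "title": "Host",
            "company": "Example Public Radio",
            "description": "Introduces the segment and asks questions",
        },
        {
            "name": "Jane Doe",
            "title": "Research Scientist",
            "department": "Atmospheric Sciences",
            "affiliation": {"organization": "Example University"},
            "description": "Explains how wildfire smoke affects air quality",
        },
    ],
}
```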
You can use the same custom properties with role-based identification by replacing name with role in each speaker object.
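The role-based equivalent swaps name for role in each speaker object and sets speaker_type to "role". The extra fields and their values are placeholders.

```python
# Role-based speakers: each object carries a role instead of a name.
speaker_identification = {
    "speaker_type": "role",
    "speakers": [
        {
            "role": "Agent",
            "company": "Example Corp",
            "department": "Billing support",
            "description": "Greets the caller, verifies the account, and resolves billing questions",
        },
        {
            "role": "Customer",
            "description": "Calls about an unexpected charge on an invoice",
        },
    ],
}
```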
Include the speech_understanding parameter directly in your transcription request (shown here with name-based identification):
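A sketch of a complete name-based request using only Python's standard library. Several details are assumptions not stated in this section: the endpoint https://api.assemblyai.com/v2/transcript, the authorization header, the ASSEMBLYAI_API_KEY environment variable name, and the placeholder audio URL and names. EU projects use a different base URL, so check the API reference for your region.

```python
import json
import os
import urllib.request

# Assumed US endpoint; verify against the API reference for your region.
API_URL = "https://api.assemblyai.com/v2/transcript"


def build_request(audio_url: str) -> dict:
    """Request body with diarization and name-based Speaker Identification."""
    return {
        "audio_url": audio_url,
        "speaker_labels": True,  # required for Speaker Identification
        "speech_understanding": {
            "request": {
                "speaker_identification": {
                    "speaker_type": "name",
                    "known_values": ["Michel Martin", "Jane Doe"],  # placeholders
                }
            }
        },
    }


def submit(body: dict, api_key: str) -> dict:
    """POST the body and return the parsed JSON response."""
    req = urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={"authorization": api_key, "content-type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


if __name__ == "__main__":
    key = os.environ.get("ASSEMBLYAI_API_KEY")  # assumed variable name
    if key:  # skip the network call when no key is configured
        print(submit(build_request("https://example.com/interview.mp3"), key))
```

Transcription runs asynchronously, so the POST response describes a queued job rather than the finished transcript; you would poll the transcript by its id until it completes and then read utterances. See the API reference for the exact polling details.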
The following parameters are nested under speech_understanding.request.speaker_identification:
The Speaker Identification API returns a modified version of your transcript with updated speaker labels in the utterances key.
With Speaker Identification, the speaker field in utterances and words contains the identified name or role (e.g., "Michel Martin" or "Agent") instead of generic labels like "A", "B", "C". All other fields (text, start, end, confidence, words) remain unchanged from the standard transcription response.
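Because only the speaker values change, existing code that reads utterances keeps working. A minimal sketch that prints each utterance with its identified speaker; the sample transcript is hypothetical and trimmed to the fields named above.

```python
# Format utterances from a completed transcript as "speaker: text" lines.
def format_utterances(transcript: dict) -> list:
    return [f'{u["speaker"]}: {u["text"]}' for u in transcript.get("utterances") or []]


# Hypothetical completed transcript with identified speakers.
sample = {
    "utterances": [
        {"speaker": "Michel Martin", "text": "Welcome to the show.",
         "start": 0, "end": 1200, "confidence": 0.97, "words": []},
        {"speaker": "Jane Doe", "text": "Thanks for having me.",
         "start": 1300, "end": 2400, "confidence": 0.95, "words": []},
    ]
}

for line in format_utterances(sample):
    print(line)
```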