Replace generic “Speaker A” and “Speaker B” labels with real names or roles, with no voice enrollment needed. Speaker Identification uses conversation content to infer who’s speaking and applies the identifiers you provide.

Example transformation:

Before:
Speaker A: Good morning, and welcome to the show.
Speaker B: Thanks for having me.
Speaker A: Let's dive into today's topic...

After (by name):
Michel Martin: Good morning, and welcome to the show.
Peter DeCarlo: Thanks for having me.
Michel Martin: Let's dive into today's topic...

After (by role):
Interviewer: Good morning, and welcome to the show.
Interviewee: Thanks for having me.
Interviewer: Let's dive into today's topic...
Speaker Identification requires Speaker Diarization. You must set speaker_labels: true in your transcription request.
To reliably identify speakers, your audio should contain clear, distinguishable voices and sufficient spoken audio from each speaker. The accuracy of Speaker Diarization depends on the quality of the audio and the distinctiveness of each speaker’s voice, which will have a downstream effect on the quality of Speaker Identification.
Know the speakers’ names? Use speaker_type: "name" with the names in known_values or speakers.
Know their roles but not names? Use speaker_type: "role" with roles like "Interviewer" or "Agent" in known_values or speakers.
Need better accuracy? Use speakers with description fields that provide context about what each speaker typically discusses.
To identify speakers by name, use speaker_type: "name" with a list of speaker names in known_values. This is the most common approach when you know who is speaking in the audio.
Python

import requests
import time

base_url = "https://api.assemblyai.com"
headers = {
    "authorization": "<YOUR_API_KEY>"
}

# Need to transcribe a local file? Learn more here: https://www.assemblyai.com/docs/getting-started/transcribe-an-audio-file
upload_url = "https://assembly.ai/wildfires.mp3"

# Configure transcript with speaker identification
data = {
    "audio_url": upload_url,
    "speech_models": ["universal-3-pro", "universal-2"],
    "language_detection": True,
    "speaker_labels": True,
    "speech_understanding": {
        "request": {
            "speaker_identification": {
                "speaker_type": "name",
                "known_values": ["Michel Martin", "Peter DeCarlo"]  # Change these values to match the names of the speakers in your file
            }
        }
    }
}

# Submit the transcription request
response = requests.post(base_url + "/v2/transcript", headers=headers, json=data)
transcript_id = response.json()["id"]
polling_endpoint = base_url + f"/v2/transcript/{transcript_id}"

# Poll for transcription results
while True:
    transcript = requests.get(polling_endpoint, headers=headers).json()

    if transcript["status"] == "completed":
        break
    elif transcript["status"] == "error":
        raise RuntimeError(f"Transcription failed: {transcript['error']}")
    else:
        time.sleep(3)

# Access the results and print utterances to the terminal
for utterance in transcript["utterances"]:
    print(f"{utterance['speaker']}: {utterance['text']}")

JavaScript

const baseUrl = "https://api.assemblyai.com";

const headers = {
  "authorization": "<YOUR_API_KEY>",
  "content-type": "application/json"
};

// Need to transcribe a local file? Learn more here: https://www.assemblyai.com/docs/getting-started/transcribe-an-audio-file
const uploadUrl = "https://assembly.ai/wildfires.mp3";

// Configure transcript with speaker identification
const data = {
  audio_url: uploadUrl,
  speech_models: ["universal-3-pro", "universal-2"],
  language_detection: true,
  speaker_labels: true,
  speech_understanding: {
    request: {
      speaker_identification: {
        speaker_type: "name",
        known_values: ["Michel Martin", "Peter DeCarlo"] // Change these values to match the names of the speakers in your file
      }
    }
  }
};

async function main() {
  // Submit the transcription request
  const response = await fetch(`${baseUrl}/v2/transcript`, {
    method: "POST",
    headers: headers,
    body: JSON.stringify(data)
  });

  const { id: transcriptId } = await response.json();
  const pollingEndpoint = `${baseUrl}/v2/transcript/${transcriptId}`;

  // Poll for transcription results
  while (true) {
    const pollingResponse = await fetch(pollingEndpoint, { headers });
    const transcript = await pollingResponse.json();

    if (transcript.status === "completed") {
      // Access the results and print utterances to the console
      for (const utterance of transcript.utterances) {
        console.log(`${utterance.speaker}: ${utterance.text}`);
      }
      break;
    } else if (transcript.status === "error") {
      throw new Error(`Transcription failed: ${transcript.error}`);
    } else {
      await new Promise(resolve => setTimeout(resolve, 3000));
    }
  }
}

main().catch(console.error);

Python SDK

import assemblyai as aai

aai.settings.api_key = "<YOUR_API_KEY>"

# Need to transcribe a local file? Learn more here: https://www.assemblyai.com/docs/getting-started/transcribe-an-audio-file
audio_url = "https://assembly.ai/wildfires.mp3"

# Configure transcript with speaker identification
config = aai.TranscriptionConfig(
    speech_models=["universal-3-pro", "universal-2"],
    language_detection=True,
    speaker_labels=True,
    speech_understanding=aai.SpeechUnderstandingRequest(
        request=aai.SpeechUnderstandingFeatureRequests(
            speaker_identification=aai.SpeakerIdentificationRequest(
                speaker_type="name",
                known_values=["Michel Martin", "Peter DeCarlo"]  # Change these values to match the names of the speakers in your file
            )
        )
    )
)

transcriber = aai.Transcriber()
transcript = transcriber.transcribe(audio_url, config)

# Access the results and print utterances to the terminal
for utterance in transcript.utterances:
    print(f"{utterance.speaker}: {utterance.text}")

JavaScript SDK

import { AssemblyAI } from "assemblyai";

const client = new AssemblyAI({
  apiKey: "<YOUR_API_KEY>"
});

// Need to transcribe a local file? Learn more here: https://www.assemblyai.com/docs/getting-started/transcribe-an-audio-file
const audioUrl = "https://assembly.ai/wildfires.mp3";

// Configure transcript with speaker identification
const params = {
  audio: audioUrl,
  speech_models: ["universal-3-pro", "universal-2"],
  language_detection: true,
  speaker_labels: true,
  speech_understanding: {
    request: {
      speaker_identification: {
        speaker_type: "name",
        known_values: ["Michel Martin", "Peter DeCarlo"] // Change these values to match the names of the speakers in your file
      }
    }
  }
};

const transcript = await client.transcripts.transcribe(params);

// Access the results and print utterances to the console
for (const utterance of transcript.utterances) {
  console.log(`${utterance.speaker}: ${utterance.text}`);
}
To identify speakers by role instead of name, use speaker_type: "role" with role labels in known_values. This is useful for customer service calls, interviews, or any scenario where you know the roles but not the names.
Python

import requests
import time

base_url = "https://api.assemblyai.com"
headers = {
    "authorization": "<YOUR_API_KEY>"
}

# Need to transcribe a local file? Learn more here: https://www.assemblyai.com/docs/getting-started/transcribe-an-audio-file
upload_url = "https://assembly.ai/wildfires.mp3"

# Configure transcript with role-based speaker identification
data = {
    "audio_url": upload_url,
    "speech_models": ["universal-3-pro", "universal-2"],
    "language_detection": True,
    "speaker_labels": True,
    "speech_understanding": {
        "request": {
            "speaker_identification": {
                "speaker_type": "role",
                "known_values": ["Interviewer", "Interviewee"]  # Change these values to match the roles of the speakers in your file
            }
        }
    }
}

# Submit the transcription request
response = requests.post(base_url + "/v2/transcript", headers=headers, json=data)
transcript_id = response.json()["id"]
polling_endpoint = base_url + f"/v2/transcript/{transcript_id}"

# Poll for transcription results
while True:
    transcript = requests.get(polling_endpoint, headers=headers).json()

    if transcript["status"] == "completed":
        break
    elif transcript["status"] == "error":
        raise RuntimeError(f"Transcription failed: {transcript['error']}")
    else:
        time.sleep(3)

# Access the results and print utterances to the terminal
for utterance in transcript["utterances"]:
    print(f"{utterance['speaker']}: {utterance['text']}")

JavaScript

const baseUrl = "https://api.assemblyai.com";

const headers = {
  "authorization": "<YOUR_API_KEY>",
  "content-type": "application/json"
};

// Need to transcribe a local file? Learn more here: https://www.assemblyai.com/docs/getting-started/transcribe-an-audio-file
const uploadUrl = "https://assembly.ai/wildfires.mp3";

// Configure transcript with role-based speaker identification
const data = {
  audio_url: uploadUrl,
  speech_models: ["universal-3-pro", "universal-2"],
  language_detection: true,
  speaker_labels: true,
  speech_understanding: {
    request: {
      speaker_identification: {
        speaker_type: "role",
        known_values: ["Interviewer", "Interviewee"] // Change these values to match the roles of the speakers in your file
      }
    }
  }
};

async function main() {
  // Submit the transcription request
  const response = await fetch(`${baseUrl}/v2/transcript`, {
    method: "POST",
    headers: headers,
    body: JSON.stringify(data)
  });

  const { id: transcriptId } = await response.json();
  const pollingEndpoint = `${baseUrl}/v2/transcript/${transcriptId}`;

  // Poll for transcription results
  while (true) {
    const pollingResponse = await fetch(pollingEndpoint, { headers });
    const transcript = await pollingResponse.json();

    if (transcript.status === "completed") {
      // Access the results and print utterances to the console
      for (const utterance of transcript.utterances) {
        console.log(`${utterance.speaker}: ${utterance.text}`);
      }
      break;
    } else if (transcript.status === "error") {
      throw new Error(`Transcription failed: ${transcript.error}`);
    } else {
      await new Promise(resolve => setTimeout(resolve, 3000));
    }
  }
}

main().catch(console.error);

Python SDK

import assemblyai as aai

aai.settings.api_key = "<YOUR_API_KEY>"

audio_url = "https://assembly.ai/wildfires.mp3"

config = aai.TranscriptionConfig(
    speech_models=["universal-3-pro", "universal-2"],
    language_detection=True,
    speaker_labels=True,
    speech_understanding=aai.SpeechUnderstandingRequest(
        request=aai.SpeechUnderstandingFeatureRequests(
            speaker_identification=aai.SpeakerIdentificationRequest(
                speaker_type="role",
                known_values=["Interviewer", "Interviewee"]  # Change these values to match the roles of the speakers in your file
            )
        )
    )
)

transcriber = aai.Transcriber()
transcript = transcriber.transcribe(audio_url, config)

for utterance in transcript.utterances:
    print(f"{utterance.speaker}: {utterance.text}")

JavaScript SDK

import { AssemblyAI } from "assemblyai";

const client = new AssemblyAI({
  apiKey: "<YOUR_API_KEY>"
});

const audioUrl = "https://assembly.ai/wildfires.mp3";

const params = {
  audio: audioUrl,
  speech_models: ["universal-3-pro", "universal-2"],
  language_detection: true,
  speaker_labels: true,
  speech_understanding: {
    request: {
      speaker_identification: {
        speaker_type: "role",
        known_values: ["Interviewer", "Interviewee"] // Change these values to match the roles of the speakers in your file
      }
    }
  }
};

const transcript = await client.transcripts.transcribe(params);

for (const utterance of transcript.utterances) {
  console.log(`${utterance.speaker}: ${utterance.text}`);
}
If you already have a completed transcript, you can add Speaker Identification in a separate request to the Speech Understanding API. This is useful when you want to re-identify speakers with different parameters, or when your workflow separates transcription from post-processing.

First, transcribe your audio with speaker_labels: true. Once transcription is complete, send the transcript_id along with your speaker identification configuration to the Speech Understanding API.
Python

import requests
import time

base_url = "https://api.assemblyai.com"
headers = {
    "authorization": "<YOUR_API_KEY>"
}

# Need to transcribe a local file? Learn more here: https://www.assemblyai.com/docs/getting-started/transcribe-an-audio-file
upload_url = "https://assembly.ai/wildfires.mp3"

data = {
    "audio_url": upload_url,
    "speech_models": ["universal-3-pro", "universal-2"],
    "language_detection": True,
    "speaker_labels": True
}

# Transcribe file
response = requests.post(base_url + "/v2/transcript", headers=headers, json=data)
transcript_id = response.json()["id"]
polling_endpoint = base_url + f"/v2/transcript/{transcript_id}"

# Poll for transcription results
while True:
    transcript = requests.get(polling_endpoint, headers=headers).json()

    if transcript["status"] == "completed":
        break
    elif transcript["status"] == "error":
        raise RuntimeError(f"Transcription failed: {transcript['error']}")
    else:
        time.sleep(3)

# Enable speaker identification
understanding_body = {
    "transcript_id": transcript_id,
    "speech_understanding": {
        "request": {
            "speaker_identification": {
                "speaker_type": "name",
                "known_values": ["Michel Martin", "Peter DeCarlo"]  # Change these values to match the names of the speakers in your file
            }
        }
    }
}

# Send the modified transcript to the Speech Understanding API
result = requests.post(
    "https://llm-gateway.assemblyai.com/v1/understanding",
    headers=headers,
    json=understanding_body
).json()

# Access the results and print utterances to the terminal
for utterance in result["utterances"]:
    print(f"{utterance['speaker']}: {utterance['text']}")

JavaScript

const baseUrl = "https://api.assemblyai.com";
const apiKey = "<YOUR_API_KEY>";

const headers = {
  "authorization": apiKey,
  "content-type": "application/json"
};

// Need to transcribe a local file? Learn more here: https://www.assemblyai.com/docs/getting-started/transcribe-an-audio-file
const uploadUrl = "https://assembly.ai/wildfires.mp3";

async function transcribeAndIdentifySpeakers() {
  // Transcribe file
  const transcriptResponse = await fetch(`${baseUrl}/v2/transcript`, {
    method: 'POST',
    headers: headers,
    body: JSON.stringify({
      audio_url: uploadUrl,
      speech_models: ["universal-3-pro", "universal-2"],
      language_detection: true,
      speaker_labels: true
    })
  });

  const { id: transcriptId } = await transcriptResponse.json();
  const pollingEndpoint = `${baseUrl}/v2/transcript/${transcriptId}`;

  // Poll for transcription results
  while (true) {
    const pollingResponse = await fetch(pollingEndpoint, { headers });
    const transcript = await pollingResponse.json();

    if (transcript.status === "completed") {
      break;
    } else if (transcript.status === "error") {
      throw new Error(`Transcription failed: ${transcript.error}`);
    } else {
      await new Promise(resolve => setTimeout(resolve, 3000));
    }
  }

  // Enable speaker identification
  const understandingBody = {
    transcript_id: transcriptId,
    speech_understanding: {
      request: {
        speaker_identification: {
          speaker_type: "name",
          known_values: ["Michel Martin", "Peter DeCarlo"] // Change these values to match the names of the speakers in your file
        }
      }
    }
  };

  // Send the modified transcript to the Speech Understanding API
  const understandingResponse = await fetch(
    "https://llm-gateway.assemblyai.com/v1/understanding",
    {
      method: 'POST',
      headers: headers,
      body: JSON.stringify(understandingBody)
    }
  );

  const result = await understandingResponse.json();

  // Access the results and print utterances to the terminal
  for (const utterance of result.utterances) {
    console.log(`${utterance.speaker}: ${utterance.text}`);
  }
}

transcribeAndIdentifySpeakers();
The example above identifies speakers by name. To identify by role, keep the same two-step flow and set speaker_type: "role" with role labels in known_values (see Identify by role). The speakers metadata approach works with this flow too.
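As a minimal sketch of the role-based variant of this flow, the second request body could be assembled as follows. The helper name build_understanding_body and the transcript ID "abc123" are illustrative placeholders, not part of the API or SDK; only the body shape is taken from the examples above.

```python
import json


def build_understanding_body(transcript_id, speaker_type, known_values):
    """Build the JSON body for the second (Speech Understanding) request.

    Hypothetical helper for illustration; it only mirrors the body shape
    used in the two-step examples above.
    """
    return {
        "transcript_id": transcript_id,
        "speech_understanding": {
            "request": {
                "speaker_identification": {
                    "speaker_type": speaker_type,
                    "known_values": list(known_values),
                }
            }
        },
    }


# "abc123" stands in for the ID of a transcript that was created with
# speaker_labels set to true and has already completed.
body = build_understanding_body("abc123", "role", ["Interviewer", "Interviewee"])
print(json.dumps(body, indent=2))
```

The resulting dictionary would be posted to the Speech Understanding endpoint in the same way as understanding_body in the example above.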
For more accurate speaker identification, you can use the speakers parameter instead of known_values. The speakers parameter lets you provide additional metadata about each speaker to help the model identify speakers based on conversational context.

This is particularly useful when:
Speakers have similar voices but distinct roles or topics
You want to provide contextual clues about what each speaker typically discusses
You need more precise identification in complex multi-speaker scenarios
Each speaker object must include either a name or role (depending on speaker_type). Beyond that, you can add any additional properties you want. The name and role fields are reserved as strings, but all other properties are flexible and can be any structure.
Examples in this section are shown in Python for brevity. The same speaker_identification configuration works in any language.
At its simplest, you can provide a description alongside each speaker’s name or role:
data = {
    "audio_url": upload_url,
    "speaker_labels": True,
    "speech_understanding": {
        "request": {
            "speaker_identification": {
                "speaker_type": "role",
                "speakers": [
                    {
                        "role": "interviewer",
                        "description": "Hosts the program and interviews the guests"
                    },
                    {
                        "role": "guest",
                        "description": "Answers questions from the interview"
                    }
                ]
            }
        }
    }
}
For even more fine-tuned identification, you can include any additional custom properties on each speaker object, such as company, title, department, or any other fields that help describe the speaker:
data = {
    "audio_url": upload_url,
    "speaker_labels": True,
    "speech_understanding": {
        "request": {
            "speaker_identification": {
                "speaker_type": "name",
                "speakers": [
                    {
                        "name": "Michel Martin",
                        "description": "Hosts the program and interviews the guests",
                        "company": "NPR",
                        "title": "Host Morning Edition"
                    },
                    {
                        "name": "Peter DeCarlo",
                        "description": "Answers questions from the interview",
                        "company": "Johns Hopkins University",
                        "title": "Professor and Vice Chair of Environmental Health and Engineering"
                    }
                ]
            }
        }
    }
}
You can use the same custom properties with role-based identification by replacing name with role in each speaker object.
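The swap from name to role could look like the sketch below. The helper as_role_speakers and the role_by_name mapping are hypothetical and exist only for this illustration; the speaker metadata is copied from the example above, and the role labels are the ones used earlier on this page.

```python
def as_role_speakers(named_speakers, role_by_name):
    """Return copies of speaker objects with "name" replaced by "role".

    Hypothetical helper: every other property (description, company,
    title, ...) is carried over unchanged.
    """
    converted = []
    for speaker in named_speakers:
        extras = {key: value for key, value in speaker.items() if key != "name"}
        converted.append({"role": role_by_name[speaker["name"]], **extras})
    return converted


named_speakers = [
    {
        "name": "Michel Martin",
        "description": "Hosts the program and interviews the guests",
        "company": "NPR",
        "title": "Host Morning Edition",
    },
    {
        "name": "Peter DeCarlo",
        "description": "Answers questions from the interview",
        "company": "Johns Hopkins University",
        "title": "Professor and Vice Chair of Environmental Health and Engineering",
    },
]

# Assumed mapping from each speaker to a role label.
role_by_name = {"Michel Martin": "Interviewer", "Peter DeCarlo": "Interviewee"}

speaker_identification = {
    "speaker_type": "role",
    "speakers": as_role_speakers(named_speakers, role_by_name),
}
print(speaker_identification)
```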
Transcribe with speaker_labels: true, then send the completed transcript_id to the Speech Understanding API:
# Step 1: Submit transcription job
curl -X POST "https://api.assemblyai.com/v2/transcript" \
  -H "authorization: <YOUR_API_KEY>" \
  -H "Content-Type: application/json" \
  -d '{
    "audio_url": "https://assembly.ai/wildfires.mp3",
    "speaker_labels": true
  }'

# Save the transcript_id from the response above, then use it in the following commands

# Step 2: Poll for transcription status (repeat until status is "completed")
curl -X GET "https://api.assemblyai.com/v2/transcript/{transcript_id}" \
  -H "authorization: <YOUR_API_KEY>"

# Step 3: Once transcription is completed, enable speaker identification
curl -X POST "https://llm-gateway.assemblyai.com/v1/understanding" \
  -H "authorization: <YOUR_API_KEY>" \
  -H "Content-Type: application/json" \
  -d '{
    "transcript_id": "{transcript_id}",
    "speech_understanding": {
      "request": {
        "speaker_identification": {
          "speaker_type": "name",
          "known_values": ["Michel Martin", "Peter DeCarlo"]
        }
      }
    }
  }'
The following parameters are nested under speech_understanding.request.speaker_identification:
| Key | Type | Required? | Description |
| --- | --- | --- | --- |
| speaker_type | string | Yes | The type of speakers being identified. Accepted values are “name” for actual names or “role” for roles/titles. |
| known_values | array | Conditional | List of speaker names or roles. Required when speaker_type is set to “role” and speakers is not provided. Optional when speaker_type is set to “name”. Each value must be 35 characters or fewer. Use known_values or speakers, not both. |
| speakers | array | Conditional | An array of speaker objects with metadata. Use as an alternative to known_values when you want to provide additional context about each speaker. You can include any additional custom properties beyond name/role and description. Use speakers or known_values, not both. |
| speakers[].role | string | Conditional | The role of the speaker. Required when speaker_type is “role”. |
| speakers[].name | string | Conditional | The name of the speaker. Required when speaker_type is “name”. |
| speakers[].description | string | No | A description of the speaker to help the model identify them based on conversational context. |
| speakers[].<custom> | any | No | Any additional custom properties (e.g., company, title, department) to provide more context about the speaker. The name and role fields are reserved as strings, but all other properties are flexible. |
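The constraints listed for these parameters can be checked on the client before a request is sent. The following is a minimal sketch under that assumption: build_speaker_identification and MAX_KNOWN_VALUE_LENGTH are hypothetical names introduced here, not part of the API or SDK, and the API remains the authority on what it accepts.

```python
MAX_KNOWN_VALUE_LENGTH = 35  # limit stated for known_values entries


def build_speaker_identification(speaker_type, known_values=None, speakers=None):
    """Return a speaker_identification dict, or raise ValueError.

    Hypothetical client-side check that mirrors the documented rules.
    """
    if speaker_type not in ("name", "role"):
        raise ValueError('speaker_type must be "name" or "role"')
    if known_values and speakers:
        raise ValueError("use known_values or speakers, not both")
    if speaker_type == "role" and not known_values and not speakers:
        raise ValueError('speaker_type "role" needs known_values or speakers')

    config = {"speaker_type": speaker_type}

    if known_values:
        too_long = [v for v in known_values if len(v) > MAX_KNOWN_VALUE_LENGTH]
        if too_long:
            raise ValueError(f"values longer than 35 characters: {too_long}")
        config["known_values"] = list(known_values)

    if speakers:
        for speaker in speakers:
            # Each speaker object needs a string under the key that
            # matches speaker_type ("name" or "role").
            if not isinstance(speaker.get(speaker_type), str):
                raise ValueError(f'each speaker needs a string "{speaker_type}"')
        config["speakers"] = [dict(speaker) for speaker in speakers]

    return config


config = build_speaker_identification(
    "name", known_values=["Michel Martin", "Peter DeCarlo"]
)
print(config)
```

The returned dictionary is meant to be placed under speech_understanding.request.speaker_identification in the request body.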
The status of the speaker identification request (e.g., “success”).
| Key | Type | Description |
| --- | --- | --- |
| utterances | array | A turn-by-turn temporal sequence of the transcript, where the i-th element is an object containing information about the i-th utterance in the audio file. |
| utterances[i].confidence | number | The confidence score for the transcript of this utterance. |
| utterances[i].end | number | The ending time, in milliseconds, of the utterance in the audio file. |
| utterances[i].speaker | string | The identified speaker name or role for this utterance. |
| utterances[i].start | number | The starting time, in milliseconds, of the utterance in the audio file. |
| utterances[i].text | string | The transcript for this utterance. |
| utterances[i].words | array | A sequential array for the words in the transcript, where the j-th element is an object containing information about the j-th word in the utterance. |
| utterances[i].words[j].text | string | The text of the j-th word in the i-th utterance. |
| utterances[i].words[j].start | number | The starting time for when the j-th word is spoken in the i-th utterance, in milliseconds. |
| utterances[i].words[j].end | number | The ending time for when the j-th word is spoken in the i-th utterance, in milliseconds. |
| utterances[i].words[j].confidence | number | The confidence score for the transcript of the j-th word in the i-th utterance. |
| utterances[i].words[j].speaker | string | The identified speaker name or role who uttered the j-th word in the i-th utterance. |
With Speaker Identification, the speaker field in utterances and words contains the identified name or role (e.g., "Michel Martin" or "Agent") instead of generic labels like "A", "B", "C". All other fields (text, start, end, confidence, words) remain unchanged from the standard transcription response.
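Because the speaker field carries the identified name or role, utterances can be aggregated per speaker directly. The sketch below totals speaking time from the start and end fields; talk_time_by_speaker is a hypothetical helper, and the sample utterances use made-up timings for illustration rather than real output for any audio file.

```python
from collections import defaultdict


def talk_time_by_speaker(utterances):
    """Sum utterance durations (in milliseconds) per identified speaker."""
    totals = defaultdict(int)
    for utterance in utterances:
        totals[utterance["speaker"]] += utterance["end"] - utterance["start"]
    return dict(totals)


# Illustrative data only: the timings below are invented for this sketch.
sample_utterances = [
    {"speaker": "Michel Martin", "start": 0, "end": 4000,
     "text": "Good morning, and welcome to the show."},
    {"speaker": "Peter DeCarlo", "start": 4200, "end": 6000,
     "text": "Thanks for having me."},
    {"speaker": "Michel Martin", "start": 6300, "end": 9000,
     "text": "Let's dive into today's topic..."},
]

totals = talk_time_by_speaker(sample_utterances)
for speaker, milliseconds in totals.items():
    print(f"{speaker}: {milliseconds / 1000:.1f} s")
```

With a real response, the same function could be called on the utterances array returned by either the transcript endpoint or the Speech Understanding API.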