Speaker Identification

Supported languages

Language	Code
Global English	`en`
Australian English	`en_au`
British English	`en_uk`
US English	`en_us`
Spanish	`es`
French	`fr`
German	`de`
Italian	`it`
Portuguese	`pt`
Dutch	`nl`
Hindi	`hi`
Japanese	`ja`
Chinese	`zh`
Finnish	`fi`
Korean	`ko`
Polish	`pl`
Russian	`ru`
Turkish	`tr`
Ukrainian	`uk`
Vietnamese	`vi`
Afrikaans	`af`
Albanian	`sq`
Amharic	`am`
Arabic	`ar`
Armenian	`hy`
Assamese	`as`
Azerbaijani	`az`
Bashkir	`ba`
Basque	`eu`
Belarusian	`be`
Bengali	`bn`
Bosnian	`bs`
Breton	`br`
Bulgarian	`bg`
Catalan	`ca`
Croatian	`hr`
Czech	`cs`
Danish	`da`
Estonian	`et`
Faroese	`fo`
Galician	`gl`
Georgian	`ka`
Greek	`el`
Gujarati	`gu`
Haitian	`ht`
Hausa	`ha`
Hawaiian	`haw`
Hebrew	`he`
Hungarian	`hu`
Icelandic	`is`
Indonesian	`id`
Javanese	`jw`
Kannada	`kn`
Kazakh	`kk`
Lao	`lo`
Latin	`la`
Latvian	`lv`
Lingala	`ln`
Lithuanian	`lt`
Luxembourgish	`lb`
Macedonian	`mk`
Malagasy	`mg`
Malay	`ms`
Malayalam	`ml`
Maltese	`mt`
Maori	`mi`
Marathi	`mr`
Mongolian	`mn`
Nepali	`ne`
Norwegian	`no`
Norwegian Nynorsk	`nn`
Occitan	`oc`
Panjabi	`pa`
Pashto	`ps`
Persian	`fa`
Romanian	`ro`
Sanskrit	`sa`
Serbian	`sr`
Shona	`sn`
Sindhi	`sd`
Sinhala	`si`
Slovak	`sk`
Slovenian	`sl`
Somali	`so`
Sundanese	`su`
Swahili	`sw`
Swedish	`sv`
Tagalog	`tl`
Tajik	`tg`
Tamil	`ta`
Tatar	`tt`
Telugu	`te`
Turkmen	`tk`
Urdu	`ur`
Uzbek	`uz`
Welsh	`cy`
Yiddish	`yi`
Yoruba	`yo`

Supported models

Model	Identifier
Universal-3 Pro	`universal-3-pro`
Universal-2	`universal-2`

Supported regions

US & EU

Overview

Replace generic “Speaker A” and “Speaker B” labels with real names or roles. No voice enrollment is needed: Speaker Identification uses the conversation content to infer who is speaking and applies the identifiers you provide.

Example transformation:

Before:

Speaker A: Good morning, and welcome to the show.
Speaker B: Thanks for having me.
Speaker A: Let's dive into today's topic...

After (by name):

Michel Martin: Good morning, and welcome to the show.
Peter DeCarlo: Thanks for having me.
Michel Martin: Let's dive into today's topic...

After (by role):

Interviewer: Good morning, and welcome to the show.
Interviewee: Thanks for having me.
Interviewer: Let's dive into today's topic...

Speaker Identification requires Speaker Diarization. You must set speaker_labels: true in your transcription request.

To reliably identify speakers, your audio should contain clear, distinguishable voices and sufficient spoken audio from each speaker. The accuracy of Speaker Diarization depends on the quality of the audio and the distinctiveness of each speaker’s voice, which will have a downstream effect on the quality of Speaker Identification.

Choosing how to identify speakers

You can identify speakers by name or by role:

Know the speakers’ names? Use speaker_type: "name" with the names in known_values or speakers. See the “Identify by name” section.
Know their roles but not names? Use speaker_type: "role" with roles like "Interviewer" or "Agent" in known_values or speakers. See the “Identify by role” section.
Need better accuracy? Use speakers with description fields that provide context about what each speaker typically discusses. See the “Adding speaker metadata” section.
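
As a rough sketch of how these options map onto a request body, the helper below assembles the payload for either approach. The function name build_transcript_request and its signature are hypothetical (they are not part of any AssemblyAI SDK); the field names follow the request examples in this document.

```python
from typing import Optional


def build_transcript_request(
    audio_url: str,
    speaker_type: str,
    known_values: Optional[list] = None,
    speakers: Optional[list] = None,
) -> dict:
    """Hypothetical helper: build a request body with Speaker Identification enabled."""
    identification = {"speaker_type": speaker_type}
    if speakers is not None:
        # Speaker objects carry extra context such as a description.
        identification["speakers"] = speakers
    elif known_values is not None:
        # Plain list of names or roles.
        identification["known_values"] = known_values

    return {
        "audio_url": audio_url,
        "speaker_labels": True,  # Speaker Identification requires Speaker Diarization
        "speech_understanding": {
            "request": {"speaker_identification": identification}
        },
    }


audio = "https://assembly.ai/wildfires.mp3"
by_name = build_transcript_request(
    audio, "name", known_values=["Michel Martin", "Peter DeCarlo"]
)
by_role = build_transcript_request(
    audio, "role", known_values=["Interviewer", "Interviewee"]
)
with_context = build_transcript_request(
    audio,
    "role",
    speakers=[{"role": "Agent", "description": "Answers billing questions"}],
)
```

Each returned dictionary can be passed as the JSON body of the transcription request.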

How to use Speaker Identification

Include the speech_understanding parameter in your transcription request to identify speakers.

Already have a completed transcript? You can add Speaker Identification to an existing transcript in a separate request.

Identify by name

To identify speakers by name, use speaker_type: "name" with a list of speaker names in known_values. This is the most common approach when you know who is speaking in the audio.

import requests
import time

base_url = "https://api.assemblyai.com"

headers = {
    "authorization": "<YOUR_API_KEY>"
}

# Need to transcribe a local file? Learn more here: https://www.assemblyai.com/docs/getting-started/transcribe-an-audio-file
upload_url = "https://assembly.ai/wildfires.mp3"

# Configure transcript with speaker identification
data = {
    "audio_url": upload_url,
    "speech_models": ["universal-3-pro", "universal-2"],
    "language_detection": True,
    "speaker_labels": True,
    "speech_understanding": {
        "request": {
            "speaker_identification": {
                "speaker_type": "name",
                "known_values": ["Michel Martin", "Peter DeCarlo"]  # Change these values to match the names of the speakers in your file
            }
        }
    }
}

# Submit the transcription request
response = requests.post(base_url + "/v2/transcript", headers=headers, json=data)
transcript_id = response.json()["id"]
polling_endpoint = base_url + f"/v2/transcript/{transcript_id}"

# Poll for transcription results
while True:
    transcript = requests.get(polling_endpoint, headers=headers).json()

    if transcript["status"] == "completed":
        break

    elif transcript["status"] == "error":
        raise RuntimeError(f"Transcription failed: {transcript['error']}")

    else:
        time.sleep(3)

# Access the results and print utterances to the terminal
for utterance in transcript["utterances"]:
    print(f"{utterance['speaker']}: {utterance['text']}")
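
The polling loop above keeps running until the transcript completes or fails. If you would rather bound the wait, one possible approach is sketched below. The function wait_for_transcript, its parameters, and the default timeout are hypothetical, and the "queued" and "processing" values in the demonstration are placeholders for any status other than "completed" or "error". The fetch callable is injected so the sketch runs without network access.

```python
import time
from typing import Callable


def wait_for_transcript(
    fetch: Callable[[], dict],
    interval_s: float = 3.0,
    timeout_s: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll fetch() until status is "completed"; raise on "error" or timeout."""
    deadline = time.monotonic() + timeout_s
    while True:
        transcript = fetch()
        status = transcript.get("status")
        if status == "completed":
            return transcript
        if status == "error":
            raise RuntimeError(f"Transcription failed: {transcript.get('error')}")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Transcript not ready after {timeout_s} seconds")
        sleep(interval_s)


# Offline demonstration: canned responses stand in for real HTTP calls.
_responses = iter([
    {"status": "queued"},
    {"status": "processing"},
    {"status": "completed", "utterances": []},
])
result = wait_for_transcript(lambda: next(_responses), sleep=lambda _: None)
```

In a real script, fetch could be `lambda: requests.get(polling_endpoint, headers=headers).json()`, reusing the variables from the example above.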

Identify by role

To identify speakers by role instead of name, use speaker_type: "role" with role labels in known_values. This is useful for customer service calls, interviews, or any scenario where you know the roles but not the names.

import requests
import time

base_url = "https://api.assemblyai.com"

headers = {
    "authorization": "<YOUR_API_KEY>"
}

# Need to transcribe a local file? Learn more here: https://www.assemblyai.com/docs/getting-started/transcribe-an-audio-file
upload_url = "https://assembly.ai/wildfires.mp3"

# Configure transcript with role-based speaker identification
data = {
    "audio_url": upload_url,
    "speech_models": ["universal-3-pro", "universal-2"],
    "language_detection": True,
    "speaker_labels": True,
    "speech_understanding": {
        "request": {
            "speaker_identification": {
                "speaker_type": "role",
                "known_values": ["Interviewer", "Interviewee"]  # Change these values to match the roles of the speakers in your file
            }
        }
    }
}

# Submit the transcription request
response = requests.post(base_url + "/v2/transcript", headers=headers, json=data)
transcript_id = response.json()["id"]
polling_endpoint = base_url + f"/v2/transcript/{transcript_id}"

# Poll for transcription results
while True:
    transcript = requests.get(polling_endpoint, headers=headers).json()

    if transcript["status"] == "completed":
        break

    elif transcript["status"] == "error":
        raise RuntimeError(f"Transcription failed: {transcript['error']}")

    else:
        time.sleep(3)

# Access the results and print utterances to the terminal
for utterance in transcript["utterances"]:
    print(f"{utterance['speaker']}: {utterance['text']}")

Common role combinations

["Agent", "Customer"] - Customer service calls
["AI Assistant", "User"] - AI chatbot interactions
["Support", "Customer"] - Technical support calls
["Interviewer", "Interviewee"] - Interview recordings
["Host", "Guest"] - Podcast or show recordings
["Moderator", "Panelist"] - Panel discussions
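
If your application handles several of these scenarios, one option is to keep the combinations in a small lookup table. The sketch below is illustrative only: ROLE_PRESETS, its scenario keys, and role_identification are hypothetical names, while the role values come from the list above.

```python
# Hypothetical scenario keys mapped to the role combinations listed above.
ROLE_PRESETS = {
    "customer_service": ["Agent", "Customer"],
    "ai_chatbot": ["AI Assistant", "User"],
    "technical_support": ["Support", "Customer"],
    "interview": ["Interviewer", "Interviewee"],
    "podcast": ["Host", "Guest"],
    "panel": ["Moderator", "Panelist"],
}


def role_identification(scenario: str) -> dict:
    """Return a speaker_identification config for a known scenario."""
    roles = ROLE_PRESETS[scenario]  # raises KeyError for unknown scenarios
    return {"speaker_type": "role", "known_values": list(roles)}


config = role_identification("podcast")
print(config)
```

The resulting dictionary goes under speech_understanding.request.speaker_identification in the transcription request.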

Adding speaker metadata

For more accurate speaker identification, you can use the speakers parameter instead of known_values. The speakers parameter lets you provide additional metadata about each speaker to help the model identify speakers based on conversational context.

This is particularly useful when:

Speakers have similar voices but distinct roles or topics
You want to provide contextual clues about what each speaker typically discusses
You need more precise identification in complex multi-speaker scenarios

Each speaker object must include either a name or role (depending on speaker_type). Beyond that, you can add any additional properties you want. The name and role fields are reserved as strings, but all other properties are flexible and can be any structure.
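
A small client-side check can catch speaker objects that are missing the required field before you send the request. This is a sketch under the rule stated above, not the validation the API itself performs; check_speaker_objects is a hypothetical helper, and it assumes speaker_type is already "name" or "role".

```python
def check_speaker_objects(speaker_type: str, speakers: list) -> list:
    """Return a list of problems; an empty list means every object looks valid."""
    problems = []
    for index, speaker in enumerate(speakers):
        value = speaker.get(speaker_type)  # looks up "name" or "role"
        if value is None:
            problems.append(f"speakers[{index}] is missing '{speaker_type}'")
        elif not isinstance(value, str):
            problems.append(f"speakers[{index}].{speaker_type} must be a string")
    return problems


speakers = [
    # Custom properties such as "company" or "topics" may be any structure.
    {"name": "Michel Martin", "company": "NPR", "topics": ["news", "interviews"]},
    {"description": "Answers questions"},  # no "name" field
    {"name": 42},  # "name" is reserved as a string
]
problems = check_speaker_objects("name", speakers)
```

Here problems contains one entry for the second object (missing name) and one for the third (non-string name).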

Examples in this section are shown in Python for brevity. The same speaker_identification configuration works in any language.

At its simplest, you can provide a description alongside each speaker’s name or role:

data = {
  "audio_url": upload_url,
  "speaker_labels": True,
  "speech_understanding": {
    "request": {
      "speaker_identification": {
        "speaker_type": "role",
        "speakers": [
          {
            "role": "interviewer",
            "description": "Hosts the program and interviews the guests"
          },
          {
            "role": "guest",
            "description": "Answers questions from the interview"
          }
        ]
      }
    }
  }
}

For even more fine-tuned identification, you can include any additional custom properties on each speaker object, such as company, title, department, or any other fields that help describe the speaker:

data = {
  "audio_url": upload_url,
  "speaker_labels": True,
  "speech_understanding": {
    "request": {
      "speaker_identification": {
        "speaker_type": "name",
        "speakers": [
          {
            "name": "Michel Martin",
            "description": "Hosts the program and interviews the guests",
            "company": "NPR",
            "title": "Host Morning Edition"
          },
          {
            "name": "Peter DeCarlo",
            "description": "Answers questions from the interview",
            "company": "Johns Hopkins University",
            "title": "Professor and Vice Chair of Environmental Health and Engineering"
          }
        ]
      }
    }
  }
}

You can use the same custom properties with role-based identification by replacing name with role in each speaker object.
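
To illustrate that swap, the sketch below converts name-based speaker objects into role-based ones while keeping their custom properties. The helper as_role_speakers and the role assignments ("Host", "Guest") are hypothetical and chosen only for this example.

```python
def as_role_speakers(named_speakers: list, roles_by_name: dict) -> list:
    """Replace each object's "name" key with a "role" key, keeping other fields."""
    converted = []
    for speaker in named_speakers:
        extras = {key: value for key, value in speaker.items() if key != "name"}
        converted.append({"role": roles_by_name[speaker["name"]], **extras})
    return converted


named = [
    {"name": "Michel Martin", "company": "NPR"},
    {"name": "Peter DeCarlo", "company": "Johns Hopkins University"},
]
role_speakers = as_role_speakers(
    named, {"Michel Martin": "Host", "Peter DeCarlo": "Guest"}
)

speaker_identification = {"speaker_type": "role", "speakers": role_speakers}
```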

API reference

Request

Include the speech_understanding parameter directly in your transcription request (shown here with name-based identification):

curl -X POST \
  "https://api.assemblyai.com/v2/transcript" \
  -H "Authorization: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "audio_url": "https://assembly.ai/wildfires.mp3",
    "speaker_labels": true,
    "speech_understanding": {
      "request": {
        "speaker_identification": {
          "speaker_type": "name",
          "known_values": ["Michel Martin", "Peter DeCarlo"]
        }
      }
    }
  }'

Request parameters

The following parameters are nested under speech_understanding.request.speaker_identification:

Key	Type	Required?	Description
`speaker_type`	string	Yes	The type of speakers being identified. Accepted values are “name” for actual names and “role” for roles or titles.
`known_values`	array	Conditional	List of speaker names or roles. Required when `speaker_type` is set to “role” and `speakers` is not provided. Optional when `speaker_type` is set to “name”. Each value must be 35 characters or fewer. Use `known_values` or `speakers`, not both.
`speakers`	array	Conditional	An array of speaker objects with metadata. Use as an alternative to `known_values` when you want to provide additional context about each speaker. You can include any additional custom properties beyond `name`/`role` and `description`. Use `speakers` or `known_values`, not both.
`speakers[].role`	string	Conditional	The role of the speaker. Required when `speaker_type` is “role”.
`speakers[].name`	string	Conditional	The name of the speaker. Required when `speaker_type` is “name”.
`speakers[].description`	string	No	A description of the speaker to help the model identify them based on conversational context.
`speakers[].<custom>`	any	No	Any additional custom properties (e.g., `company`, `title`, `department`) to provide more context about the speaker. The `name` and `role` fields are reserved as strings, but all other properties are flexible.
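
The rules in this table can be checked locally before a request is sent. The function below is a minimal sketch of such a check, not the API's own validation: validate_speaker_identification is a hypothetical name, and measuring the 35-character limit with Python's len() is an assumption about how the limit is counted.

```python
MAX_VALUE_LENGTH = 35  # limit stated in the known_values row above


def validate_speaker_identification(config: dict) -> list:
    """Return a list of rule violations; an empty list means none were found."""
    errors = []
    speaker_type = config.get("speaker_type")
    known_values = config.get("known_values")
    speakers = config.get("speakers")

    if speaker_type not in ("name", "role"):
        errors.append('speaker_type must be "name" or "role"')
    if known_values and speakers:
        errors.append("use known_values or speakers, not both")
    if speaker_type == "role" and not known_values and not speakers:
        errors.append("role identification needs known_values or speakers")
    for value in known_values or []:
        if len(value) > MAX_VALUE_LENGTH:
            errors.append(f"known_values entry is too long: {value!r}")
    return errors


ok = validate_speaker_identification(
    {"speaker_type": "name", "known_values": ["Michel Martin", "Peter DeCarlo"]}
)
missing = validate_speaker_identification({"speaker_type": "role"})
too_long = validate_speaker_identification(
    {"speaker_type": "role", "known_values": ["x" * 36]}
)
```

In this run, ok is empty, while missing and too_long each contain one message.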

Response

The Speaker Identification API returns a modified version of your transcript with updated speaker labels in the utterances key.

{
  "speech_understanding": {
    "request": {
      "speaker_identification": {
        "speaker_type": "name",
        "known_values": ["Michel Martin", "Peter DeCarlo"]
      }
    },
    "response": {
      "speaker_identification": {
        "mapping": {
          "A": "Michel Martin",
          "B": "Peter DeCarlo"
        },
        "status": "success"
      }
    }
  },
  "utterances": [
    {
      "speaker": "Michel Martin",
      "text": "Smoke from hundreds of wildfires in Canada is triggering air quality alerts...",
      "start": 240,
      "end": 26560,
      "confidence": 0.9815734,
      "words": [
        {
          "text": "Smoke",
          "start": 240,
          "end": 640,
          "confidence": 0.90152997,
          "speaker": "Michel Martin"
        }
        // ... more words
      ]
    }
    // ... more utterances
  ]
}
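
To read the label mapping out of a response shaped like the one above, something along these lines may be enough. The helper speaker_mapping is hypothetical, and treating every status other than "success" as a failure is an assumption, since "success" is the only status value this document shows.

```python
def speaker_mapping(transcript: dict) -> dict:
    """Return the generic-label-to-identifier mapping from a transcript response."""
    result = (
        transcript.get("speech_understanding", {})
        .get("response", {})
        .get("speaker_identification", {})
    )
    status = result.get("status")
    if status != "success":  # assumption: anything else means no usable mapping
        raise RuntimeError(f"Speaker identification did not succeed: {status!r}")
    return result.get("mapping", {})


# Trimmed-down copy of the example response above.
transcript = {
    "speech_understanding": {
        "response": {
            "speaker_identification": {
                "mapping": {"A": "Michel Martin", "B": "Peter DeCarlo"},
                "status": "success",
            }
        }
    }
}
mapping = speaker_mapping(transcript)
for generic_label, identified in mapping.items():
    print(f"Speaker {generic_label} -> {identified}")
```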

Response fields

Key	Type	Description
`speech_understanding.response.speaker_identification.mapping`	object	A mapping of the original generic speaker labels (e.g., “A”, “B”) to the identified speaker names or roles.
`speech_understanding.response.speaker_identification.status`	string	The status of the speaker identification request (e.g., “success”).
`utterances`	array	A turn-by-turn temporal sequence of the transcript, where the i-th element is an object containing information about the i-th utterance in the audio file.
`utterances[i].confidence`	number	The confidence score for the transcript of this utterance.
`utterances[i].end`	number	The ending time, in milliseconds, of the utterance in the audio file.
`utterances[i].speaker`	string	The identified speaker name or role for this utterance.
`utterances[i].start`	number	The starting time, in milliseconds, of the utterance in the audio file.
`utterances[i].text`	string	The transcript for this utterance.
`utterances[i].words`	array	A sequential array for the words in the transcript, where the j-th element is an object containing information about the j-th word in the utterance.
`utterances[i].words[j].text`	string	The text of the j-th word in the i-th utterance.
`utterances[i].words[j].start`	number	The starting time for when the j-th word is spoken in the i-th utterance, in milliseconds.
`utterances[i].words[j].end`	number	The ending time for when the j-th word is spoken in the i-th utterance, in milliseconds.
`utterances[i].words[j].confidence`	number	The confidence score for the transcript of the j-th word in the i-th utterance.
`utterances[i].words[j].speaker`	string	The identified speaker name or role who uttered the j-th word in the i-th utterance.

With Speaker Identification, the speaker field in utterances and words contains the identified name or role (e.g., "Michel Martin" or "Agent") instead of generic labels like "A", "B", "C". All other fields (text, start, end, confidence, words) remain unchanged from the standard transcription response.
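
Because each utterance carries the identified speaker together with start and end times in milliseconds, per-speaker statistics are straightforward to derive. The sketch below totals speaking time per speaker; talk_time_ms is a hypothetical helper, and only the first sample utterance uses timestamps from the example response (the other two are invented for illustration).

```python
from collections import defaultdict


def talk_time_ms(utterances: list) -> dict:
    """Sum utterance durations (end - start, in milliseconds) per speaker."""
    totals = defaultdict(int)
    for utterance in utterances:
        totals[utterance["speaker"]] += utterance["end"] - utterance["start"]
    return dict(totals)


sample_utterances = [
    {"speaker": "Michel Martin", "start": 240, "end": 26560},
    {"speaker": "Peter DeCarlo", "start": 27000, "end": 30000},  # invented timestamps
    {"speaker": "Michel Martin", "start": 30500, "end": 31500},  # invented timestamps
]
totals = talk_time_ms(sample_utterances)
for speaker, milliseconds in totals.items():
    print(f"{speaker}: {milliseconds / 1000:.1f} s")
```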

