Identifying speakers in audio recordings
When you apply the Speaker Diarization model, the transcript contains speaker labels alongside the text, which makes the output easier to structure and read.
In this step-by-step guide, you'll learn how to apply the model. In short, you send the speaker_labels parameter in your request, and then find the results inside a field called utterances.
Before we begin, make sure you have an AssemblyAI account and an API key. You can sign up for a free account and get your API key from your dashboard.
The complete source code for this guide can be viewed here.
Create a new file and import the necessary libraries for making an HTTP request.
Set up the API endpoint and headers. The headers should include your API token.
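The first two steps might look like the minimal sketch below. It uses only Python's standard library (urllib) rather than any particular third-party HTTP client, and the helper name build_headers and the placeholder key are illustrative assumptions. The base URL and the authorization header name follow AssemblyAI's public v2 REST API as far as we know; check them against the current API reference.

```python
import json
import urllib.request

# Base URL of the AssemblyAI v2 REST API (assumed; verify in the API reference).
BASE_URL = "https://api.assemblyai.com/v2"

# Placeholder value; replace it with the API key from your dashboard.
API_KEY = "YOUR_API_KEY"


def build_headers(api_key: str) -> dict:
    """Return the headers sent with every request (hypothetical helper)."""
    return {"authorization": api_key}


headers = build_headers(API_KEY)
```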
Upload your local file to the AssemblyAI API.
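A hedged sketch of the upload step follows. The function names are hypothetical, the /v2/upload endpoint and the upload_url response field are taken from AssemblyAI's public API as we understand it, and the file is read fully into memory for simplicity, which may not suit very large recordings.

```python
import json
import urllib.request

# Upload endpoint of the AssemblyAI v2 API (assumed; verify in the API reference).
UPLOAD_ENDPOINT = "https://api.assemblyai.com/v2/upload"


def build_upload_request(audio_bytes: bytes, api_key: str) -> urllib.request.Request:
    """Build (but do not send) the POST request that uploads raw audio bytes."""
    return urllib.request.Request(
        UPLOAD_ENDPOINT,
        data=audio_bytes,
        headers={
            "authorization": api_key,
            "content-type": "application/octet-stream",
        },
        method="POST",
    )


def upload_file(path: str, api_key: str) -> str:
    """Upload a local file and return the upload_url from the JSON response."""
    with open(path, "rb") as audio_file:
        request = build_upload_request(audio_file.read(), api_key)
    with urllib.request.urlopen(request) as response:
        return json.load(response)["upload_url"]
```

Calling upload_file("./my-audio.mp3", "YOUR_API_KEY") would send the file; both arguments here are placeholders.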
Use the upload_url returned by the AssemblyAI API to create a JSON payload containing the audio_url parameter and the speaker_labels parameter set to true. Make a POST request to the AssemblyAI API endpoint with the payload and headers.
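The payload and the POST request could be sketched as follows. The helper names are hypothetical, and the /v2/transcript endpoint and the id field of the response are assumptions based on AssemblyAI's public API. Only audio_url and speaker_labels come from this guide.

```python
import json
import urllib.request

# Transcript endpoint of the AssemblyAI v2 API (assumed; verify in the API reference).
TRANSCRIPT_ENDPOINT = "https://api.assemblyai.com/v2/transcript"


def build_payload(upload_url: str) -> dict:
    """Create the JSON payload that enables speaker labels for the audio."""
    return {"audio_url": upload_url, "speaker_labels": True}


def build_transcript_request(upload_url: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) the POST request that starts a transcript job."""
    body = json.dumps(build_payload(upload_url)).encode("utf-8")
    return urllib.request.Request(
        TRANSCRIPT_ENDPOINT,
        data=body,
        headers={"authorization": api_key, "content-type": "application/json"},
        method="POST",
    )


def request_transcript(upload_url: str, api_key: str) -> str:
    """Send the request and return the ID of the new transcript job."""
    request = build_transcript_request(upload_url, api_key)
    with urllib.request.urlopen(request) as response:
        return json.load(response)["id"]
```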
After making the request, you'll receive an ID for the transcription. Use it to poll the API every few seconds to check the status of the transcript job. Once the status is completed, you can retrieve the transcript from the API response, using the utterances key to access the results.
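One way to sketch the polling loop is shown below. The function names, the three-second interval, and the injectable fetch argument (there so the loop can be tried without network access) are illustrative choices. The error status and error field are assumptions based on AssemblyAI's public API, while the completed status comes from this guide.

```python
import json
import time
import urllib.request

# Transcript endpoint of the AssemblyAI v2 API (assumed; verify in the API reference).
TRANSCRIPT_ENDPOINT = "https://api.assemblyai.com/v2/transcript"


def fetch_transcript(transcript_id: str, api_key: str) -> dict:
    """GET the current state of a transcript job as a dictionary."""
    request = urllib.request.Request(
        f"{TRANSCRIPT_ENDPOINT}/{transcript_id}",
        headers={"authorization": api_key},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def wait_for_transcript(transcript_id, api_key, fetch=fetch_transcript, interval=3.0):
    """Poll until the job is completed; raise if the API reports an error."""
    while True:
        result = fetch(transcript_id, api_key)
        if result["status"] == "completed":
            return result
        if result["status"] == "error":
            raise RuntimeError(f"Transcription failed: {result.get('error')}")
        # Any other status means the job is still pending, so wait and retry.
        time.sleep(interval)
```

The returned dictionary is the full transcript response, so its utterances key holds the speaker-labeled results.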
Understanding the response
The speaker label information is included in the utterances key of the response. Each utterance object in the list includes a speaker field, which contains a string identifier for the speaker (e.g., "A", "B", etc.). The utterances list also contains a text field for each utterance containing the spoken text, and confidence scores both for utterances and their individual words.
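As a small illustration of reading these fields, the sketch below formats each utterance as a "Speaker X: text" line. The function name format_utterances is hypothetical, and the sample dictionary is invented for demonstration; it is not real API output.

```python
def format_utterances(transcript: dict) -> list:
    """Turn the utterances list into one 'Speaker X: text' line per utterance."""
    lines = []
    # Fall back to an empty list if the key is missing or null.
    for utterance in transcript.get("utterances") or []:
        lines.append(f"Speaker {utterance['speaker']}: {utterance['text']}")
    return lines


# Invented sample that mirrors the fields described above.
sample = {
    "utterances": [
        {"speaker": "A", "text": "Hi, thanks for joining.", "confidence": 0.97},
        {"speaker": "B", "text": "Happy to be here.", "confidence": 0.95},
    ]
}

for line in format_utterances(sample):
    print(line)
```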
Automatically identifying different speakers from an audio recording, also called speaker diarization, is a multi-step process. It can unlock additional value from many genres of recording, including conference call transcripts, broadcast media, podcasts, and more. You can learn more about use cases for speaker diarization and the underlying research from the AssemblyAI blog.