Speaker Diarization

AssemblyAI's new streaming diarization model delivers 66% fewer false speakers and 91% fewer phantom turns than our previous version, leading the competition on production metrics.  

Word-level speaker labels. Streaming and async. Production-ready.
Contact Center Call
[00:01] Lauren: Thank you for calling Nissan. My name is Lauren. Can I have your name?
[00:04] John Smith: Yeah, my name is John Smith.
[00:07] Lauren: Thank you, John. How can I help you?
[00:09] John Smith: I was just calling about to see how much it would cost to update the map in my car.
[00:13] Lauren: I'd be happy to help you with that today. Did you receive a mailer from us?
[00:17] John Smith: I did. Do you need the customer number?
[00:19] Lauren: Yes, please.
[00:20] John Smith: Okay, it's 15243.
[00:23] Lauren: Thank you. And the year, make and model of your vehicle?
[00:26] John Smith: Yeah, I have a 2009 Nissan Altima.
[00:30] Lauren: Oh, nice car.
[00:31] John Smith: Yeah, thank you. We really enjoy it.
[00:34] Lauren: Okay, I think I found your profile here. Can I have you verify your address and phone number, please?
[00:39] John Smith: Yes, it's 1255 North Research Way. That's in Orem, Utah, 84097. And my phone number is 801-431-1000.
[00:51] Lauren: Thanks, John. I located your information. The newest version we have available for your vehicle is version 7.3, which was released in March of 2012. The price of the new map is $99 plus shipping and tax. Let me go ahead and set up this order for you.
[01:08] John Smith: Well, can we wait just a second? I'm not really sure if I can afford it right now.
[01:12] Lauren: All right, well, here are a few reasons to consider purchasing today. It looks as though you haven't updated your vehicle for three years. So that would be the equivalent of getting three years worth of updates for the price of one.
[01:24] John Smith: Oh, okay.
[01:25] Lauren: In addition, special offers like the current promotion don't come around too. I would definitely recommend taking advantage of the extra $50 off before it expires.
[01:35] John Smith: Yeah, that does sound pretty good.
[01:37] Lauren: If I set this order up for you now, it'll ship out today and for $50 less. Do you have your credit card handy and I can place this order for you now?
[01:46] John Smith: Yeah, let's go ahead and use a Visa. My number is
Speakers Identified: Lauren, John Smith
Sentiment Analysis
We identified 11 positive sentences and 3 negative ones.
Overall Call Outcome
The call successfully transitioned from an informational inquiry to a sales transaction, with Lauren using sales techniques (emphasizing value, creating urgency, and overcoming objections) to secure the purchase before John could reconsider.

Get started with production-quality speaker diarization

Enable speaker_labels and get word-level attribution across streaming and async audio. No model configuration required.
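As a rough illustration of what word-level attribution makes possible, the sketch below merges consecutive words that share a speaker label into turns. It assumes each word is a dict with `text`, `start`, `end` (milliseconds), and `speaker` keys, in the style of the `words` array of a transcript returned with `speaker_labels` enabled. The helper name `group_words_into_turns` and the sample data are hypothetical and exist only for this example.

```python
from itertools import groupby


def group_words_into_turns(words):
    """Merge consecutive words with the same speaker label into turns."""
    turns = []
    for speaker, group in groupby(words, key=lambda w: w["speaker"]):
        members = list(group)
        turns.append({
            "speaker": speaker,
            "start": members[0]["start"],   # start of the first word in the turn
            "end": members[-1]["end"],      # end of the last word in the turn
            "text": " ".join(w["text"] for w in members),
        })
    return turns


# Hypothetical sample in the assumed word format (times in milliseconds).
sample_words = [
    {"text": "Thank", "start": 1000, "end": 1200, "speaker": "A"},
    {"text": "you,", "start": 1200, "end": 1400, "speaker": "A"},
    {"text": "John.", "start": 1400, "end": 1800, "speaker": "A"},
    {"text": "Okay.", "start": 2000, "end": 2300, "speaker": "B"},
    {"text": "Great.", "start": 2500, "end": 2900, "speaker": "A"},
]

turns = group_words_into_turns(sample_words)
for turn in turns:
    print(f"Speaker {turn['speaker']} [{turn['start']}-{turn['end']} ms]: {turn['text']}")
```

Because the grouping only looks at adjacent words, a speaker who returns later in the conversation gets a new turn rather than being merged into an earlier one.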

Get started with the API
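
Python

import time
import requests

base_url = "https://api.assemblyai.com"
headers = {"authorization": "<YOUR_API_KEY>"}

# Upload a local audio file and get back a URL the API can read
with open("./my-audio.mp3", "rb") as f:
    response = requests.post(base_url + "/v2/upload", headers=headers, data=f)
upload_url = response.json()["upload_url"]

data = {
    "audio_url": upload_url,  # You can also use a URL to an audio or video file on the web
    "speech_models": ["universal-3-pro", "universal-2"],
    "language_detection": True,
    "speaker_labels": True,  # Enable speaker diarization
}

# Submit the transcription request, then poll until it finishes
response = requests.post(base_url + "/v2/transcript", headers=headers, json=data)
polling_url = base_url + "/v2/transcript/" + response.json()["id"]
while True:
    transcript = requests.get(polling_url, headers=headers).json()
    if transcript["status"] == "completed":
        for utterance in transcript["utterances"]:
            print(f"Speaker {utterance['speaker']}: {utterance['text']}")
        break
    if transcript["status"] == "error":
        raise RuntimeError(f"Transcription failed: {transcript['error']}")
    time.sleep(3)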

JavaScript

import { AssemblyAI } from 'assemblyai'

const client = new AssemblyAI({
  apiKey: 'YOUR_API_KEY'
})

const audioUrl =
  'https://assembly.ai/sports_injuries.mp3'

const params = {
  audio: audioUrl,
  speaker_labels: true // Enable speaker diarization
}

const run = async () => {
  const transcript = await client.transcripts.transcribe(params)
  console.log(transcript.text)

  // Fall back to an empty list if no utterances were returned
  for (const utterance of transcript.utterances ?? []) {
    console.log(`Speaker ${utterance.speaker}: ${utterance.text}`)
  }
}

run()

Go

package main

import (
    "context"
    "fmt"
    "os"

    aai "github.com/AssemblyAI/assemblyai-go-sdk"
)

func main() {
    ctx := context.Background()

    audioURL := "https://assembly.ai/sports_injuries.mp3"

    client := aai.NewClient("YOUR_API_KEY")

    // Enable speaker diarization
    params := &aai.TranscriptOptionalParams{
        SpeakerLabels: aai.Bool(true),
    }

    transcript, err := client.Transcripts.TranscribeFromURL(ctx, audioURL, params)
    if err != nil {
        fmt.Println("Something bad happened:", err)
        os.Exit(1)
    }

    fmt.Println(*transcript.Text)

    // Print each utterance with its speaker label
    for _, utterance := range transcript.Utterances {
        fmt.Printf("Speaker %v: %v\n", *utterance.Speaker, *utterance.Text)
    }
}

Java

import com.assemblyai.api.AssemblyAI;
import com.assemblyai.api.resources.transcripts.types.*;

public final class App {
    public static void main(String[] args) {
        AssemblyAI client = AssemblyAI.builder()
                .apiKey("YOUR_API_KEY")
                .build();

        String audioUrl = "https://assembly.ai/sports_injuries.mp3";

        // Enable speaker diarization
        var params = TranscriptOptionalParams.builder()
                .speakerLabels(true)
                .build();

        Transcript transcript = client.transcripts().transcribe(audioUrl, params);

        System.out.println(transcript.getText().get());

        // Print each utterance with its speaker label
        transcript.getUtterances().get().forEach(utterance ->
            System.out.println("Speaker " + utterance.getSpeaker() + ": " + utterance.getText())
        );
    }
}

Ruby

require 'assemblyai'

client = AssemblyAI::Client.new(api_key: 'YOUR_API_KEY')

audio_url = 'https://assembly.ai/sports_injuries.mp3'

# Enable speaker diarization
transcript = client.transcripts.transcribe(
  audio_url: audio_url,
  speaker_labels: true
)

abort transcript.error if transcript.status == AssemblyAI::Transcripts::TranscriptStatus::ERROR

puts transcript.text

# Print each utterance on its own line with its speaker label
transcript.utterances.each do |utterance|
  printf("Speaker %<speaker>s: %<text>s\n", speaker: utterance.speaker, text: utterance.text)
end

Improve transcription quality and readability

Sentence-level speaker consistency ensures the last word of a sentence never bleeds into the next speaker's turn. Correctly transcribe and diarize conversations with 30+ speakers in 95+ languages.
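One way to spot-check sentence-level consistency in your own transcripts is sketched below. This is an illustrative audit, not a description of how the model works internally. It assumes word dicts with `text` and `speaker` keys, splits sentences naively at `.`, `?`, or `!`, and flags any sentence whose words carry more than one speaker label. The function names and sample data are hypothetical.

```python
def split_sentences(words):
    """Split a word list into sentences at words ending in '.', '?', or '!'."""
    sentences, current = [], []
    for word in words:
        current.append(word)
        if word["text"].endswith((".", "?", "!")):
            sentences.append(current)
            current = []
    if current:  # trailing words without closing punctuation
        sentences.append(current)
    return sentences


def mixed_speaker_sentences(words):
    """Return the text of each sentence whose words have more than one speaker label."""
    flagged = []
    for sentence in split_sentences(words):
        if len({w["speaker"] for w in sentence}) > 1:
            flagged.append(" ".join(w["text"] for w in sentence))
    return flagged


# Hypothetical word lists in the assumed format.
consistent = [
    {"text": "How", "speaker": "A"}, {"text": "can", "speaker": "A"},
    {"text": "I", "speaker": "A"}, {"text": "help?", "speaker": "A"},
    {"text": "Yes,", "speaker": "B"}, {"text": "please.", "speaker": "B"},
]
bleeding = [
    {"text": "How", "speaker": "A"}, {"text": "can", "speaker": "A"},
    {"text": "I", "speaker": "A"},
    {"text": "help?", "speaker": "B"},  # last word attributed to the next speaker
    {"text": "Yes,", "speaker": "B"}, {"text": "please.", "speaker": "B"},
]

print(mixed_speaker_sentences(consistent))  # []
print(mixed_speaker_sentences(bleeding))    # ['How can I help?']
```

The punctuation-based splitter is deliberately simple and will also split at abbreviations such as "Mr.", so treat flagged sentences as candidates for review rather than confirmed errors.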

Short utterances like 'Yeah,' 'Okay,' and single-word responses are correctly attributed in multi-speaker settings.

Get started for free

Make every voice count. Built for what production apps actually need.

Eliminate phantom turns
Detect speaker changes mid-sentence
Real-time diarization for streaming apps
Accurate summarization per speaker
Ship AI notetakers that just work
Stop users from repeating themselves
Reliable contact center inputs
95+ languages, streaming and async
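For use cases like per-speaker summarization or contact center analytics, diarized utterances are typically regrouped by speaker before being passed downstream. The sketch below shows one possible way to do that. It assumes utterance dicts with `speaker`, `text`, `start`, and `end` (milliseconds) keys; the helper name `per_speaker_inputs` and the timings in the sample are hypothetical.

```python
from collections import defaultdict


def per_speaker_inputs(utterances):
    """Collect each speaker's combined text and total talk time in milliseconds."""
    stats = defaultdict(lambda: {"talk_time_ms": 0, "texts": []})
    for u in utterances:
        stats[u["speaker"]]["talk_time_ms"] += u["end"] - u["start"]
        stats[u["speaker"]]["texts"].append(u["text"])
    return {
        speaker: {"talk_time_ms": s["talk_time_ms"], "text": " ".join(s["texts"])}
        for speaker, s in stats.items()
    }


# Hypothetical utterances; the timings are invented for illustration.
sample_utterances = [
    {"speaker": "A", "start": 7000, "end": 9000,
     "text": "Thank you, John. How can I help you?"},
    {"speaker": "B", "start": 9000, "end": 13000,
     "text": "I was just calling to see how much it would cost to update the map in my car."},
    {"speaker": "A", "start": 13000, "end": 17000,
     "text": "I'd be happy to help you with that today."},
]

by_speaker = per_speaker_inputs(sample_utterances)
for speaker, info in by_speaker.items():
    print(f"Speaker {speaker}: {info['talk_time_ms']} ms, text: {info['text']}")
```

Each speaker's combined text can then be sent to a summarization step on its own, and the talk-time totals give a simple talk ratio for the call.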
AssemblyAI’s managed API endpoint and diarization won me over—something Whisper couldn’t provide.
Josh Mohrer, Founder at Wave.co

Start building with the most accurate diarization API

66% fewer false speakers. 91% fewer phantom turns. Word-level attribution out of the box. Streaming and async, free to start.