Speaker Diarization

AssemblyAI's new streaming diarization model delivers 66% fewer false speakers and 91% fewer phantom turns than our previous version, leading the competition on production metrics.

Word-level speaker labels. streaming and async. Production-ready.

Try our API for free

Contact Center Call

00:00

01:46

00:01

Lauren

Thank you for calling Nissan. My name is Lauren. Can I have your name?

00:04

John Smith

Yeah, my name is John Smith.

00:07

Lauren

Thank you, John. How can I help you?

00:09

John Smith

I was just calling about to see how much it would cost to update the map in my car.

00:13

Lauren

I'd be happy to help you with that today. Did you receive a mailer from us?

00:17

John Smith

I did. Do you need the customer number?

00:19

Lauren

Yes, please.

00:20

John Smith

Okay, it's 15243.

00:23

Lauren

Thank you. And the year, make and model of your vehicle?

00:26

John Smith

Yeah, I have a 2009 Nissan Altima.

00:30

Lauren

Oh, nice car.

00:31

John Smith

Yeah, thank you. We really enjoy it.

00:34

Lauren

Okay, I think I found your profile here. Can I have you verify your address and phone number, please?

00:39

John Smith

Yes, it's 1255 North Research Way. That's in Orem, Utah, 84097. And my phone number is 801-431-1000.

00:51

Lauren

Thanks, John. I located your information. The newest version we have available for your vehicle is version 7.3, which was released in March of 2012. The price of the new map is $99 plus shipping and tax. Let me go ahead and set up this order for you.

01:08

John Smith

Well, can we wait just a second? I'm not really sure if I can afford it right now.

01:12

Lauren

All right, well, here are a few reasons to consider purchasing today. It looks as though you haven't updated your vehicle for three years. So that would be the equivalent of getting three years worth of updates for the price of one.

01:24

John Smith

Oh, okay.

01:25

Lauren

In addition, special offers like the current promotion don't come around too. I would definitely recommend taking advantage of the extra $50 off before it expires.

01:35

John Smith

Yeah, that does sound pretty good.

01:37

Lauren

If I set this order up for you now, it'll ship out today and for $50 less. Do you have your credit card handy and I can place this order for you now?

01:46

John Smith

Yeah, let's go ahead and use a Visa. My number is

Speakers Identified

Lauren

John Smith

Sentiment Analysis

We identified 11 positive sentences, and 3 negative ones.

Overall Call Outcome

The call successfully transitioned from an informational inquiry to a sales transaction, with Lauren using sales techniques (emphasizing value, creating urgency, and overcoming objections) to secure the purchase before John could reconsider.

Get started with production-quality speaker diarization

Enable speaker_labels and get word-level attribution across streaming and async audio. No model configuration required.

Get started with the API

1base_url = "https://api.assemblyai.com"
2headers = {
3    "authorization": "<YOUR_API_KEY>"
4}
5with open("./my-audio.mp3", "rb") as f:
6  response = requests.post(base_url + "/v2/upload",
7                          headers=headers,
8                          data=f)
9upload_url = response.json()["upload_url"]
10data = {
11    "audio_url": upload_url, # You can also use a URL to an audio or video file on the web
12    "speech_models": ["universal-3-pro", "universal-2"],
13    "language_detection": True,
14    "speaker_labels": True
15}

1import { AssemblyAI } from 'assemblyai'
2
3const client = new AssemblyAI({
4  apiKey: 'YOUR_API_KEY'
5})
6
7const audioUrl =
8  'https://assembly.ai/sports_injuries.mp3'
9
10const params = {
11  audio: audioUrl,
12  speaker_labels: true
13}
14
15const run = async () => {
16  const transcript = await client.transcripts.transcribe(params)
17  console.log(transcript.text)
18
19  for (let utterance of transcript.utterances!) {
20    console.log(`Speaker ${utterance.speaker}: ${utterance.text}`)
21  }
22}
23
24run()

1package main
2
3import (
4    "context"
5    "fmt"
6    "os"
7
8    aai "github.com/AssemblyAI/assemblyai-go-sdk"
9)
10
11func main() {
12    ctx := context.Background()
13
14    audioURL := "https://assembly.ai/sports_injuries.mp3"
15
16    client := aai.NewClient("YOUR_API_KEY")
17
18    params := &aai.TranscriptOptionalParams{
19        SpeakerLabels: aai.Bool(true),
20    }
21
22    transcript, err := client.Transcripts.TranscribeFromURL(ctx, audioURL, params)
23    if err != nil {
24        fmt.Println("Something bad happened:", err)
25        os.Exit(1)
26    }
27
28    fmt.Println(*transcript.Text)
29
30    for _, utterance := range transcript.Utterances {
31        fmt.Printf("Speaker %v: %v
32", *utterance.Speaker, *utterance.Text)
33    }
34}

1import com.assemblyai.api.AssemblyAI;
2import com.assemblyai.api.resources.transcripts.types.*;
3
4public final class App {
5    public static void main(String[] args) {
6        AssemblyAI client = AssemblyAI.builder()
7                .apiKey("YOUR_API_KEY")
8                .build();
9
10        String audioUrl = "https://assembly.ai/sports_injuries.mp3";
11
12        var params = TranscriptOptionalParams.builder()
13                .speakerLabels(true)
14                .build();
15
16        Transcript transcript = client.transcripts().transcribe(audioUrl, params);
17
18        System.out.println(transcript.getText().get());
19
20        transcript.getUtterances().get().forEach(utterance ->
21            System.out.println("Speaker " + utterance.getSpeaker() + ": " + utterance.getText())
22        );
23    }
24}

1require 'assemblyai'
2
3client = AssemblyAI::Client.new(api_key: 'YOUR_API_KEY')
4
5audio_url = 'https://assembly.ai/sports_injuries.mp3'
6
7transcript = client.transcripts.transcribe(
8  audio_url: audio_url,
9  speaker_labels: true
10)
11
12abort transcript.error if transcript.status == AssemblyAI::Transcripts::TranscriptStatus::ERROR
13
14puts transcript.text
15
16transcript.utterances.each do |utterance|
17  printf('Speaker %<speaker>s: %<text>s', speaker: utterance.speaker, text: utterance.text)
18end

Improve transcription quality and readability

Sentence-level speaker consistency ensures the last word of a sentence never bleeds into the next speaker. Correctly transcribe and diarize conversations with 30+ speakers in 95+ languages.

Short utterances like 'Yeah,' 'Okay,' and single-word responses are correctly attributed in multi-speaker settings.

Get started for free

Build confidently with the most accurate, production-grade speaker diarization

Word-Level Speaker Attribution

Unlike turn-level diarization, every word carries its own speaker label so when speakers overlap or change mid-sentence, attribution stays accurate. No more misattributed words corrupting your transcripts.

Outperforms Competitors in Head-to-Head Testing

2x better cpWER than Deepgram Nova-3 on streaming two-speaker telephony. 13% better on four-speaker meetings. 42% fewer false-alarm speakers. Built and benchmarked for the conversations that matter most in production.

Make every voice count - built for what production apps actually need.

Eliminate phantom turns	Detect speaker changes mid-sentence	Real-time diarization for streaming apps	Accurate summarization per speaker
Ship AI notetakers that just work	Stop users from repeating themselves	Reliable contact center inputs	95+ languages, streaming and async

Get Started With Diarization

AssemblyAI’s managed API endpoint and diarization won me over—something Whisper couldn’t provide.

Josh Mohrer, Founder at Wave.co

Start building with the most accurate diarization API

66% fewer false speakers. 91% fewer phantom turns. Word-level attribution out of the box. Streaming and async, free to start.

Try our API for free Contact sales