Build & Learn
August 15, 2025

Build a call center analytics pipeline in Python with AssemblyAI

Learn to build a Python call center analytics pipeline using AssemblyAI's Speech AI. Automatically transcribe audio, identify speakers, analyze sentiment, and create data visualizations from call recordings.

Kelsey Foster
Growth

Call centers generate thousands of hours of audio daily, but most valuable information remains locked in unstructured recordings. Speech AI transforms these conversations into actionable insights—revealing customer sentiment patterns, identifying common issues, and improving agent performance.

In this tutorial, you'll build a complete call center analytics pipeline that automatically transcribes audio recordings, identifies speakers, analyzes sentiment, and creates compelling data visualizations. We'll use AssemblyAI's Speech AI models to handle the complex audio processing, then structure and visualize the results using Python.

What you'll build

By the end of this guide, you'll have a working system that can:

  • Transcribe call center recordings with speaker diarization
  • Map generic speaker labels to actual names using AI
  • Perform sentiment analysis on each conversation segment
  • Generate interactive heatmap visualizations showing sentiment patterns
  • Export structured data for further analysis

The complete workflow transforms raw audio into structured insights that call center managers can actually use to improve operations.

Prerequisites and setup

Before diving into the code, you'll need to set up your development environment and get the necessary API credentials.

System requirements

  • Python 3.7 or higher
  • Jupyter notebook environment (Google Colab recommended)
  • Internet connection for API calls

Get your AssemblyAI API key

New users receive $50 in free credits, covering this tutorial and initial experimentation.

  1. Visit the AssemblyAI dashboard and create a free account
  2. Navigate to the API Keys section in the left sidebar
  3. Click Create new API key and give it a descriptive name
  4. Copy the generated API key—you'll need this in the next step

Store this key securely since you'll be using it throughout the tutorial.

Clone the GitHub repository

The tutorial uses sample audio files and a complete Jupyter notebook from the official repository:

git clone https://github.com/dataprofessor/assemblyai
cd assemblyai

The repository contains:

  • 04-call-center-analytics.ipynb - The main tutorial notebook
  • Sample audio files for testing
  • Additional examples and utilities

If you prefer working directly in Google Colab, you can download the notebook file and upload it to your Colab environment.

Install required dependencies

The tutorial primarily uses AssemblyAI's Python SDK, with most other libraries already available in standard Python environments:

pip install assemblyai

Additional libraries we'll use (typically pre-installed in Jupyter environments):

  • pandas - Data manipulation and analysis
  • altair - Data visualization
  • spacy - Natural language processing
  • IPython - Audio playback widgets
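
If any of these are missing from your environment, they can be installed the same way (safe to re-run):

pip install pandas altair spacy ipython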

Setting up the analytics pipeline

Now that your environment is ready, let's build the analytics pipeline step by step.

Configure API authentication

First, set up secure access to your AssemblyAI API key. In Google Colab, use the secrets manager to store your credentials safely:

import assemblyai as aai
from google.colab import userdata

# Load the API key from Colab secrets; 'AI_KEY' must match the secret name you created
aai_key = userdata.get('AI_KEY')
aai.settings.api_key = aai_key

For local development, set the API key directly (use environment variables in production):

import assemblyai as aai

aai.settings.api_key = "your_api_key_here"
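
A safer pattern for local work is to read the key from an environment variable. Here's a minimal sketch; the variable name ASSEMBLYAI_API_KEY is our own convention, not something the SDK picks up automatically:

import os

import assemblyai as aai

# ASSEMBLYAI_API_KEY is an assumed variable name: set it in your shell
# (e.g. export ASSEMBLYAI_API_KEY="...") before launching the notebook
aai.settings.api_key = os.environ["ASSEMBLYAI_API_KEY"]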

Load and preview the audio data

The sample audio file contains a realistic call center conversation between a customer service agent and a satisfied customer. This gives us both positive and neutral sentiment data to work with:

from IPython.display import display, Audio

# Load audio file from the repository
audio_input = "https://github.com/dataprofessor/assemblyai/raw/main/call-center.wav"

# Play the audio in the notebook
display(Audio(audio_input))

The conversation features Sarah (customer service agent) speaking with Michael Johnson (satisfied electric vehicle owner) providing positive feedback about his purchase experience.

Configure transcription parameters

AssemblyAI's transcription service offers several advanced features beyond basic speech-to-text. For call center analytics, we need speaker diarization (speaker labels) and sentiment analysis:

config = aai.TranscriptionConfig(
    speaker_labels=True,
    sentiment_analysis=True
)

These configuration options enable the AI models to automatically detect speaker changes and analyze the emotional tone of each conversation segment.

Transcribing and processing the audio

Perform the transcription

With the configuration set, we can now transcribe the audio file. AssemblyAI's AI models handle complex audio processing automatically, with transcription typically completing in under 45 seconds:

transcriber = aai.Transcriber()
transcript = transcriber.transcribe(audio_input, config=config)

# Confirm the transcription succeeded before inspecting the results
if transcript.status == aai.TranscriptStatus.error:
    raise RuntimeError(transcript.error)

print(f"Call duration: {transcript.audio_duration} seconds")
print(f"Total words: {len(transcript.words)}")

The transcript object contains rich metadata including word-level timestamps, confidence scores, and speaker labels.
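
For example, you can peek at the first few words and their millisecond timestamps, a quick sketch using the SDK's word objects:

# Inspect word-level metadata (start/end are in milliseconds)
for word in transcript.words[:5]:
    print(f"{word.text}: {word.start}-{word.end} ms, confidence {word.confidence:.2f}")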

Process the transcript with speaker labels

The initial transcription labels speakers generically as "A" and "B". Build a speaker-labelled transcript string that later steps can work with:

text_with_speaker_labels = ""

for utt in transcript.utterances:
    text_with_speaker_labels += f"Speaker {utt.speaker}: {utt.text}\n"

print(text_with_speaker_labels)

Map speakers to real names

Rather than using generic speaker labels, we can use AssemblyAI's LeMUR framework to automatically identify speakers from the conversation content:

unique_speakers = set(utterance.speaker for utterance in transcript.utterances)

questions = []
for speaker in unique_speakers:
    questions.append(
        aai.LemurQuestion(
            question=f"Who is speaker {speaker}?",
            answer_format="<First Name> <Last Name if applicable>"
        )
    )

result = aai.Lemur().question(
    questions,
    input_text=text_with_speaker_labels,
    final_model=aai.LemurModel.claude_sonnet_4_20250514,
    context="Your task is to infer the speaker's name from the speaker-labelled transcript"
)

print(result.response)

This approach automatically extracts speaker names from the conversation context, making the system more robust and requiring less manual configuration.

Create speaker mapping

Once we've identified the speakers, create a mapping dictionary to replace generic labels with actual names:

import re

speaker_mapping = {}

for qa_response in result.response:
    pattern = r"Who is speaker (\w)\?"
    match = re.search(pattern, qa_response.question)
    if match and match.group(1) not in speaker_mapping:
        speaker_mapping[match.group(1)] = qa_response.answer

for utterance in transcript.utterances:
    speaker_name = speaker_mapping[utterance.speaker]
    print(f"{speaker_name}: {utterance.text}...")

This creates a clean, readable conversation format with proper speaker identification.

Performing sentiment analysis

Extract sentiment data

AssemblyAI's sentiment analysis runs automatically when enabled in the configuration. Results provide sentence-level sentiment analysis:

# Access sentiment analysis results
sentiment_results = transcript.sentiment_analysis

# Preview sentiment data structure
for i, result in enumerate(sentiment_results[:3]):
    print(f"Segment {i+1}:")
    print(f"  Speaker: {speaker_mapping.get(result.speaker,
result.speaker)}")
    print(f"  Text: {result.text}")
    print(f"  Sentiment: {result.sentiment}")
    print(f"  Confidence: {result.confidence:.2f}")
    print()

Each segment includes the speaker, text content, sentiment classification (positive/neutral/negative), and a confidence score for the analysis.
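
For instance, you can filter out just the negative segments for a manager to review. A small sketch; note that sentiment is an enum, so we compare its string .value (the same accessor used later in this tutorial):

# Collect negative segments for manual review
negative_segments = [r for r in sentiment_results if r.sentiment.value == "NEGATIVE"]
for seg in negative_segments:
    print(f"{speaker_mapping.get(seg.speaker, seg.speaker)}: {seg.text}")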

Structure data for analysis

Convert the sentiment results into a structured format that's easier to analyze and visualize:

import pandas as pd

# Create structured dataframe
sentiment_data = []
for result in sentiment_results:
    sentiment_data.append({
        'speaker': speaker_mapping.get(result.speaker, result.speaker),
        'text': result.text,
        'sentiment': result.sentiment,
        'confidence': result.confidence
    })

df = pd.DataFrame(sentiment_data)
print(f"Created dataframe with {len(df)} conversation segments")
print(df['sentiment'].value_counts())

This DataFrame structure makes it easy to perform aggregate analysis and create visualizations.
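
As a quick example of aggregate analysis, here's a sketch that computes each speaker's sentiment mix as proportions:

# Share of each sentiment per speaker (proportions sum to 1 per speaker)
sentiment_share = (
    df.groupby('speaker')['sentiment']
      .value_counts(normalize=True)
      .rename('share')
      .reset_index()
)
print(sentiment_share)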

Creating data visualizations

Generate sentiment overview heatmap

To build the visualizations, we'll reshape the results from the transcript's sentiment_analysis attribute into a DataFrame, this time attaching a running index so each segment can be plotted in conversation order:

# Create a DataFrame of Speaker and Sentiment
data = []
index_value = 0  # Initialize an index counter

for sentiment in transcript.sentiment_analysis:
    speaker = speaker_mapping[sentiment.speaker]  # Apply our speaker mapping
    sentiment_value = sentiment.sentiment.value
    text = sentiment.text
    data.append({'speaker': speaker, 'sentiment': sentiment_value, 'text': text, 'index': index_value})
    index_value += 1  # Increment the index

df = pd.DataFrame(data)

Next, we count the occurrences of each speaker-sentiment combination and build the aggregated heatmap:

import altair as alt

# Count the occurrences of each speaker-sentiment combination
heatmap_data = df.groupby(['speaker', 'sentiment']).size().reset_index(name='count')

font_size = 14

# Create the base chart
base = alt.Chart(heatmap_data).encode(
    x=alt.X('speaker', axis=alt.Axis(title='Speaker', titleFontSize=font_size, labelFontSize=font_size)),
    y=alt.Y('sentiment', axis=alt.Axis(title='Sentiment', titleFontSize=font_size, labelFontSize=font_size))
)

# Create the heatmap rectangles
heatmap = base.mark_rect().encode(
    color=alt.Color('count', title='Count', scale=alt.Scale(range='heatmap')),
    tooltip=['speaker', 'sentiment', 'count']
)

# Add the text labels
text = base.mark_text(fontSize=font_size, fontWeight='bold').encode(
    text=alt.Text('count'),
    color=alt.condition(
        alt.datum.count > heatmap_data['count'].max() / 2,  # Adjust the threshold as needed
        alt.value('white'),
        alt.value('black')
    )
)

# Combine the heatmap and text
chart = (heatmap + text).properties(
    # title='Sentiment by Speaker',
    width=300,
    height=300
).interactive()

With the structured data in place, render the chart to see sentiment occurrence as a function of speaker:

chart

This heatmap provides a quick overview of conversation dynamics, showing how much positive, neutral, or negative sentiment each speaker expressed.

Heatmap of Sentiment Analysis

For deeper analysis, we can zoom into the individual sentences and see the sentiment for sequences of words as spoken in the transcript.

font_size = 12

# Define the color scale for sentiment
sentiment_colors = {
    'POSITIVE': '#4CAF50',  # Green
    'NEUTRAL': '#9E9E9E',   # Gray
    'NEGATIVE': '#F44336'    # Red
}

# Create the base chart
base = alt.Chart(df).encode(
    x=alt.X('speaker:N', axis=alt.Axis(title='Speaker', titleFontSize=font_size, labelFontSize=font_size)),
    y=alt.Y('index:O', axis=alt.Axis(title=None, labels=False))  # Use 'index' for the y-axis, hide labels
)

# Create the heatmap rectangles with black stroke (border)
heatmap = base.mark_rect(stroke='black').encode(
    color=alt.Color('sentiment:N',
                    scale=alt.Scale(domain=list(sentiment_colors.keys()), range=list(sentiment_colors.values())),
                    legend=alt.Legend(orient='bottom')),  # Move the legend to the bottom
    tooltip=['speaker:N', 'sentiment:N', 'text:N']
).properties(
    width=200,  # Reduced width for the heatmap
    height=df.shape[0] * 20  # Adjust height based on the number of rows
)

# Add the text column to the right of the chart and hide its y-axis
text_right = alt.Chart(df).mark_text(align='left', baseline='middle', dx=5).encode(
    y=alt.Y('index:O', axis=None),  # Remove the y-axis from the text column
    text=alt.Text('text:N'),
    color=alt.value('black')
).properties(
    width=10,  # Narrow width for the text column
    height=df.shape[0] * 20  # Ensure consistent height
)

# Combine the heatmap and the text
chart = alt.concat(
    heatmap,
    text_right
).properties(
    # title='Call Center Data Visualization',
).configure_axis(
    labelFontSize=font_size,
    titleFontSize=font_size
).configure_view(
    strokeOpacity=0  # Hide the default view border
).interactive()

chart
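
Finally, to cover the export step promised at the start, you can persist both the structured data and the chart. A minimal sketch; the file names here are placeholders:

# Save the structured sentiment segments for downstream analysis
df.to_csv('call_sentiment_segments.csv', index=False)

# Altair charts can be saved as self-contained HTML files
chart.save('call_sentiment_visualization.html')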

Key takeaways

Speech AI transforms unstructured call center audio into actionable business intelligence. The complete pipeline automatically handles transcription, speaker identification, sentiment analysis, and visualization—tasks that would require significant manual effort otherwise.

The approach scales effectively from individual call analysis to enterprise-wide conversation intelligence systems. AssemblyAI's AI models handle complex audio processing challenges, letting you focus on extracting business value from structured results.

Whether you're analyzing customer satisfaction trends, monitoring agent performance, or identifying common support issues, this foundation provides the technical framework to build more sophisticated call center analytics systems. The complete code and sample files are available in the GitHub repository. Start building with $50 in free credits for new accounts.
