June 23, 2026

Automatic summarization with LLMs in Python

Learn how to perform automatic summarization with Python using LLMs in this easy-to-follow tutorial.

Ryan O'Connor

Senior Developer Educator

Tutorial

Summarization

LLMs

You've got a three-hour podcast, a backlog of sales calls, and a stack of webinar recordings. Nobody's going to watch all of it. What everyone actually wants is the 200-word version that tells them whether it's worth their time.

That's the job summarization does, and with the right models you can ship it as a feature in an afternoon.

Here's the trick most tutorials skip: summarizing spoken content isn't one problem, it's two. You transcribe the audio to text, then you run a large language model over that text. Get either step wrong and the whole thing falls apart. Garbage transcript in, garbage summary out.

In this tutorial we'll build the full pipeline in Python — transcribe with AssemblyAI's speech-to-text API, then summarize through the LLM Gateway. We'll cover plain text, video, audio files, evaluating quality, long-form handling, async speed, and custom prompts.

Getting started

The pipeline has two stages: transcribe audio to text, then process that text with an LLM. The Universal-3 family of models handles transcription, and the LLM Gateway handles the summary.

We'll use the AssemblyAI Python SDK. Spin up a virtual environment first.

python -m venv transcriber  # you may have to use `python3`

Activate it on macOS or Linux:

source ./transcriber/bin/activate

On Windows:

.\transcriber\Scripts\activate.bat

Install the SDK and requests:

pip install assemblyai requests

You'll need an API key — grab one free from the dashboard. To call the LLM Gateway you'll need to add a credit card to the account. Set the key as an environment variable.

On Linux and macOS:

export ASSEMBLYAI_API_KEY=your-key-here

On Windows:

set ASSEMBLYAI_API_KEY=your-key-here

Don't hardcode the key in your script. If you want it in code, load it from a .env file with python-dotenv and keep that file out of source control.
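Whichever way the key reaches the environment, it helps to fail loudly when it's missing instead of sending an empty header. Here's a minimal sketch, assuming a hypothetical helper named `require_api_key` (not part of the SDK). If you go the .env route, call python-dotenv's `load_dotenv()` before it.

```python
import os


def require_api_key(name="ASSEMBLYAI_API_KEY"):
    """Return the API key from the environment, or raise a clear error."""
    # Hypothetical helper for this tutorial, not an SDK function.
    key = os.getenv(name, "").strip()
    if not key:
        raise RuntimeError(
            f"{name} is not set; export it or load it from a .env file"
        )
    return key
```

Use the return value anywhere the later snippets read `os.getenv("ASSEMBLYAI_API_KEY")`.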

Extractive vs. abstractive: which one you actually want

There are two ways to summarize text, and they're not equal.

Extractive pulls important sentences straight out of the source. It's fast and it never hallucinates because every word came from the original. But it reads like a ransom note — disjointed, choppy, no connective tissue.

Abstractive generates new sentences that capture the meaning. This is what modern LLMs do, and it's what you reach through the Gateway. The output reads like a human wrote it. The tradeoff is you have to watch for the model inventing facts, which is why we'll evaluate quality later.

For most product features (meeting recaps, video descriptions, call summaries) you want abstractive. So that's what we'll build.
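To make the contrast concrete, here's a toy extractive summarizer. It's a sketch under simple assumptions (naive sentence splitting, word-frequency scoring), not a production method, and the name `extractive_summary` is made up for illustration. Every sentence it returns is copied verbatim from the input, which is why it can't invent facts and also why the output reads choppy.

```python
import re
from collections import Counter


def extractive_summary(text, num_sentences=2):
    """Return the highest-scoring sentences, in their original order."""
    # Naive splitter: break after ., ! or ? followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    # Word frequencies across the whole text act as an importance signal.
    freqs = Counter(re.findall(r"[a-z']+", text.lower()))

    def score(sentence):
        words = re.findall(r"[a-z']+", sentence.lower())
        # Average frequency, so long sentences don't win on length alone.
        return sum(freqs[w] for w in words) / len(words) if words else 0.0

    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    keep = sorted(ranked[:num_sentences])
    return [sentences[i] for i in keep]
```

An abstractive summary has no equivalent few-line implementation; that's the part the LLM handles.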

Direct text summarization with Python

If you already have text, summarizing it is one API call. The LLM Gateway is OpenAI-compatible, so if you've ever hit a chat-completions endpoint, this'll feel familiar.

import requests
import os

api_key = os.getenv("ASSEMBLYAI_API_KEY")

text = """<your long document or article here>"""

prompt = f"""Summarize the following text in three to five bullet points.
Keep it tight and skip anything that isn't a main idea.

Text:
{text}
"""

response = requests.post(
    "https://llm-gateway.assemblyai.com/v1/chat/completions",
    headers={"authorization": api_key},
    json={
        "model": "claude-sonnet-4-5-20250929",
        "messages": [{"role": "user", "content": prompt}],
    },
)

result = response.json()
print(result["choices"][0]["message"]["content"])

That's the whole pattern: build a prompt, POST it, read the response. Everything else in this tutorial is variations on this. EU customers swap the host for llm-gateway.eu.assemblyai.com. The Gateway gives you 25+ models behind one endpoint, so swapping claude-sonnet-4-5-20250929 for a different model is a one-line change.
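Since the rest of the tutorial repeats this pattern, it can be worth factoring it into two small helpers. The sketch below is one way to do it; `build_summary_request` and `extract_summary` are hypothetical names, the EU URL assumes the same path as the default host, and the check for a missing `choices` key is an assumption about what a failed response looks like.

```python
import os

# Hosts taken from the text above; the shared path is an assumption.
GATEWAY_URLS = {
    "us": "https://llm-gateway.assemblyai.com/v1/chat/completions",
    "eu": "https://llm-gateway.eu.assemblyai.com/v1/chat/completions",
}


def build_summary_request(text, model="claude-sonnet-4-5-20250929",
                          api_key=None, region="us"):
    """Return keyword arguments suitable for requests.post(**kwargs)."""
    api_key = api_key or os.getenv("ASSEMBLYAI_API_KEY")
    if not api_key:
        raise RuntimeError("Set ASSEMBLYAI_API_KEY before calling the Gateway")
    prompt = (
        "Summarize the following text in three to five bullet points.\n"
        "Keep it tight and skip anything that isn't a main idea.\n\n"
        f"Text:\n{text}\n"
    )
    return {
        "url": GATEWAY_URLS[region],
        "headers": {"authorization": api_key},
        "json": {"model": model, "messages": [{"role": "user", "content": prompt}]},
        "timeout": 120,  # seconds; an arbitrary choice for this sketch
    }


def extract_summary(result):
    """Pull the generated text out of a chat-completions style response dict."""
    choices = result.get("choices")
    if not choices:
        # Assumed failure shape: an error payload without a "choices" list.
        raise RuntimeError(f"Unexpected Gateway response: {result}")
    return choices[0]["message"]["content"]
```

With these in place, the call becomes `response = requests.post(**build_summary_request(text))`, followed by `response.raise_for_status()` and `print(extract_summary(response.json()))`.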

Plain text is the easy case, though. Conversational audio and video carry speaker turns, crosstalk, and more.

Summarizing a video with Python

Let's do a real one. We'll summarize an episode of the Lex Fridman podcast where Lex talks with Guido van Rossum, the creator of Python.

Transcription

First, transcribe the video. Create autosummarize.py:

import assemblyai as aai
import requests
import os

aai.settings.api_key = os.getenv("ASSEMBLYAI_API_KEY")

config = aai.TranscriptionConfig(
    speech_models=["universal-3-pro", "universal-2"]
)

transcriber = aai.Transcriber(config=config)
transcript = transcriber.transcribe(
    "https://storage.googleapis.com/aai-web-samples/lex_guido.webm"
)

A few things worth knowing here. The speech_models parameter takes a list: pass ["universal-3-pro", "universal-2"] and the API uses the first model your account and language support. The old singular speech_model param is deprecated, so don't reach for it. Universal-3 Pro covers six languages and supports keyterms prompting; Universal-2 covers 99 languages. Pricing starts at $0.21/hr for Universal-3 Pro and $0.15/hr for Universal-2; check the pricing page for current rates.

Every transcript gets a unique ID at transcript.id. Hang onto it — you can re-fetch a transcript later instead of re-transcribing the same file, which saves you money and time when you want to run new summaries over old audio.
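One lightweight way to hang onto IDs is a small JSON file that maps each source URL to its transcript ID. This is only a sketch: `load_ids` and `remember_id` are hypothetical helpers, and a real app would more likely use its database.

```python
import json
from pathlib import Path


def load_ids(path):
    """Load the URL -> transcript ID mapping, or an empty dict if none exists."""
    path = Path(path)
    return json.loads(path.read_text()) if path.exists() else {}


def remember_id(path, source_url, transcript_id):
    """Record a transcript ID for a source URL and write the mapping to disk."""
    ids = load_ids(path)
    ids[source_url] = transcript_id
    Path(path).write_text(json.dumps(ids, indent=2))
    return ids
```

After transcribing, call `remember_id("transcripts.json", url, transcript.id)`. On a later run, look the ID up and re-fetch with the SDK's `aai.Transcript.get_by_id(...)` instead of transcribing again; check the SDK reference for the current name of that lookup.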

Now add error handling:

if transcript.status == aai.TranscriptStatus.error:
    raise Exception(f"Transcription error: {transcript.error}")

The summary

We've got the transcript text. Now we write a prompt and send it to the Gateway. Good prompts have three parts: a role, an instruction, and the content.

prompt = f"""You are an expert at summarizing podcast episodes.
Provide a summary of the following transcript. It's from an episode of
the Lex Fridman podcast, where he speaks with Guido van Rossum, the
creator of the Python programming language.

Format the summary in markdown using this structure for each topic:
**<topic header>**
<topic summary>

Transcript:
{transcript.text}
"""

response = requests.post(
    "https://llm-gateway.assemblyai.com/v1/chat/completions",
    headers={"authorization": aai.settings.api_key},
    json={
        "model": "claude-sonnet-4-5-20250929",
        "messages": [{"role": "user", "content": prompt}],
    },
)

result = response.json()
print(result["choices"][0]["message"]["content"])

Run it:

python autosummarize.py

After a minute or two you'll get something like this:

**Python's design choices**
Guido explains why Python uses indentation over curly braces — less
clutter, friendlier for beginners — while acknowledging most programmers
arrive used to braces from other languages.

**Improving CPython's performance**
CPython started simple. Over time, more optimized algorithms landed,
illustrating the classic time-space tradeoff.

**Asynchronous I/O in Python**
The standard library's early async modules went stale. Around 2012–2014,
Guido drove a redesign, working with third-party library authors, and the
result has been especially successful for web clients.

**The global interpreter lock (GIL)**
The GIL made multi-threading viable on single-core CPUs but became a pain
point as multi-core hardware spread. Removing it remains a long-term option.

**Guido's experience as BDFL**
The role gave the community direction but cost Guido personally. The newer
steering-council structure has kept things steady.

That's enough to make the call: skip the episode or queue it. Built on two API calls.

Summarizing audio files with Python

The SDK takes audio files the same way it takes video, so the code barely changes. Point it at an MP3 instead — here's a GitLab logistics meeting recording:

import assemblyai as aai
import requests
import os

aai.settings.api_key = os.getenv("ASSEMBLYAI_API_KEY")

config = aai.TranscriptionConfig(
    speech_models=["universal-3-pro", "universal-2"]
)
transcriber = aai.Transcriber(config=config)
transcript = transcriber.transcribe(
    "https://storage.googleapis.com/aai-web-samples/meeting.mp3"
)

if transcript.status == aai.TranscriptStatus.error:
    raise Exception(f"Transcription error: {transcript.error}")

prompt = f"""You are an expert at summarizing meetings.
Summarize the following transcript from a GitLab logistics meeting.

Format the summary in markdown using this structure for each topic:
**<topic header>**
<topic summary>

Transcript:
{transcript.text}
"""

response = requests.post(
    "https://llm-gateway.assemblyai.com/v1/chat/completions",
    headers={"authorization": aai.settings.api_key},
    json={
        "model": "claude-sonnet-4-5-20250929",
        "messages": [{"role": "user", "content": prompt}],
    },
)

result = response.json()
print(result["choices"][0]["message"]["content"])

Output:

**Engineering key review**
A proposal to split the engineering key review into four departmental
reviews on a two-month rotation, so discussion goes deeper without piling
on meetings. Supported.

**R&D merge request rates**
Clarified the difference between the wider rate (includes community
contributions) and the overall rate. The team will track community
percentage over time.

**Postgres replication**
Work is underway on replication lag for data engineering — a dedicated
host, database tuning, demand improvements. More to come next review.

**Key metrics**
NPS decline has slowed. The narrow merge request rate sits below target but
ahead of last year, with a rebound expected in March.

Same two calls. The only thing that changed was the file and the prompt's framing. That's the point — once the pipeline exists, every new content type is a prompt tweak.
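Since the two prompts differ only in their framing, you can generate them from one template. A minimal sketch, with `build_prompt` as a hypothetical helper that mirrors the role, instruction, and content structure used in both scripts:

```python
def build_prompt(role, instruction, transcript_text):
    """Assemble a summarization prompt from a role, an instruction, and content."""
    return (
        f"You are an expert at {role}.\n"
        f"{instruction}\n\n"
        "Format the summary in markdown using this structure for each topic:\n"
        "**<topic header>**\n"
        "<topic summary>\n\n"
        f"Transcript:\n{transcript_text}\n"
    )


# Example: the meeting prompt from this section, with placeholder transcript text.
meeting_prompt = build_prompt(
    "summarizing meetings",
    "Summarize the following transcript from a GitLab logistics meeting.",
    "<transcript text goes here>",
)
```

Swapping in the podcast framing is then a matter of changing the first two arguments.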

For richer summaries on multi-speaker audio, turn on speaker diarization so the transcript carries speaker labels. Feeding "Speaker A said X, Speaker B pushed back" into the model produces summaries that actually follow the conversation instead of mashing everyone into one voice. This is the backbone of most conversation intelligence features.
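Here's a sketch of turning diarized output into speaker-labeled text for the prompt. It assumes you enabled speaker labels in the transcription config (in the Python SDK that's `speaker_labels=True`) and that each utterance exposes `speaker` and `text` attributes, as `transcript.utterances` does. The `format_utterances` helper is hypothetical, and the sample lines are invented stand-ins so the snippet runs without an API call.

```python
from types import SimpleNamespace


def format_utterances(utterances):
    """Render utterances as one 'Speaker X: text' line per speaker turn."""
    return "\n".join(f"Speaker {utt.speaker}: {utt.text}" for utt in utterances)


# Stand-in objects shaped like diarized utterances (invented sample content).
sample = [
    SimpleNamespace(speaker="A", text="Should we split the review into four?"),
    SimpleNamespace(speaker="B", text="Yes, on a two-month rotation."),
]
print(format_utterances(sample))
```

Pass the formatted string into the prompt in place of `transcript.text`.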

Evaluating summary quality

Here's the part people skip and then regret in production.

You can't ship summaries you haven't measured. The classic metrics, ROUGE and BLEU, count word overlap between your summary and a reference summary. They're fast, and they're nearly useless for abstractive summaries, because a great summary can use entirely different words than the reference and still be perfect.

What you actually want is an LLM grading the output. Send the original transcript and the generated summary back to the Gateway and ask it three questions:

Does the summary contain anything that isn't in the original? (That's a hallucination.)
Are the key points represented accurately?
Is anything critical missing?

eval_prompt = f"""You are evaluating a summary for factual accuracy.
Compare the summary against the source transcript and answer:
1. Does the summary contain any information not present in the source?
2. Are the key points represented accurately?
3. Is anything critical missing?

Respond with a JSON object: {{"hallucinations": [], "accurate": true/false, "missing": []}}

SOURCE:
{transcript.text}

SUMMARY:
{generated_summary}
"""

Run this on a sample of outputs and you'll catch the failure modes before your users do. It costs a second API call per summary, which is cheap insurance against shipping a recap that invents a quarterly target nobody mentioned.
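The grader's reply is still just text, so you need to parse it before acting on it. The sketch below assumes the model returns the JSON shape requested in the prompt, possibly wrapped in extra words. `parse_eval` and `passes` are hypothetical helpers, and the pass rule (accurate, no hallucinations) is one reasonable choice, not the only one.

```python
import json

REQUIRED_KEYS = {"hallucinations", "accurate", "missing"}


def parse_eval(raw):
    """Extract and validate the JSON verdict from the grader's raw reply."""
    # Models sometimes add text around the JSON, so slice from the first
    # opening brace to the last closing brace before parsing.
    start, end = raw.find("{"), raw.rfind("}")
    if start == -1 or end < start:
        raise ValueError("no JSON object found in model output")
    verdict = json.loads(raw[start:end + 1])
    absent = REQUIRED_KEYS - verdict.keys()
    if absent:
        raise ValueError(f"verdict is missing keys: {sorted(absent)}")
    return verdict


def passes(verdict):
    """A summary passes if it is accurate and has no hallucinations."""
    return verdict["accurate"] is True and not verdict["hallucinations"]
```

Log every summary that fails, along with its `hallucinations` and `missing` lists, so you can see which prompts or content types cause trouble.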

Handling long-form content

LLMs have a finite context window. You can't drop a three-hour transcript into one prompt and expect it to fit — and even when it technically fits, quality degrades on very long inputs.

The fix is chunking. Break the transcript into segments that fit comfortably in the model's window, summarize each chunk, then summarize the summaries. This map-reduce pattern handles arbitrarily long content.

When you use the LLM Gateway, you own this chunking logic. The Gateway is a thin, OpenAI-compatible layer over the model — it doesn't silently split your input for you. That's a feature, not a gap: you control exactly how the transcript gets segmented, which matters when you want chunks to break on speaker turns or topic shifts instead of mid-sentence. Read the LLM Gateway docs for request limits.

A simple chunker:

def chunk_text(text, max_chars=12000):
    words = text.split()
    chunks, current = [], []
    length = 0
    for word in words:
        if current and length + len(word) + 1 > max_chars:
            chunks.append(" ".join(current))
            current, length = [], 0
        current.append(word)
        length += len(word) + 1
    if current:
        chunks.append(" ".join(current))
    return chunks
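The word-based chunker can cut a speaker off mid-sentence. If you have speaker-labeled turns, a variant that only breaks between turns keeps each chunk coherent. This is a sketch: `chunk_by_turns` is a hypothetical name, and it assumes `turns` is a list of strings such as "Speaker A: ...".

```python
def chunk_by_turns(turns, max_chars=12000):
    """Group whole speaker turns into chunks of at most about max_chars."""
    chunks, current, length = [], [], 0
    for turn in turns:
        # Start a new chunk only between turns, never inside one.
        if current and length + len(turn) + 1 > max_chars:
            chunks.append("\n".join(current))
            current, length = [], 0
        current.append(turn)
        length += len(turn) + 1  # +1 accounts for the joining newline
    if current:
        chunks.append("\n".join(current))
    return chunks
```

Note that a single turn longer than `max_chars` stays whole and becomes its own oversized chunk; split those with the word-based chunker if that matters for your content.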

Summarize each chunk, concatenate the partial summaries, then run one final summarization pass over the combined result.
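The map-reduce flow can be sketched as one function that takes your summarizer as an argument. Everything here is illustrative: `summarize_long` and `split_into_chunks` are hypothetical names, and `summarize` stands for whatever function you write to send a string to the Gateway and return the summary text. The chunker is repeated so the snippet runs on its own.

```python
def split_into_chunks(text, max_chars=12000):
    """Split text on whitespace into chunks of at most about max_chars."""
    chunks, current, length = [], [], 0
    for word in text.split():
        if current and length + len(word) + 1 > max_chars:
            chunks.append(" ".join(current))
            current, length = [], 0
        current.append(word)
        length += len(word) + 1
    if current:
        chunks.append(" ".join(current))
    return chunks


def summarize_long(text, summarize, max_chars=12000):
    """Summarize each chunk (map), then summarize the summaries (reduce)."""
    chunks = split_into_chunks(text, max_chars)
    if len(chunks) <= 1:
        return summarize(text)  # short input: a single call is enough
    partials = [summarize(chunk) for chunk in chunks]  # map step
    combined = "\n\n".join(partials)
    return summarize(combined)  # reduce step: one final pass
```

This does a single reduce pass, so it assumes the combined partial summaries fit in the model's window. For extremely long recordings you'd repeat the reduce step until they do.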

Speeding things up with async processing

Transcription and summarization both take real time — seconds to minutes depending on file length. You don't want your web request blocking on that.

The AssemblyAI SDK supports async submission: kick off a transcription, get an ID back immediately, and poll or use webhooks to retrieve the result later. Pair that with a background queue and your API stays responsive while the heavy work happens out of band.

For the summarization step, fire your Gateway calls concurrently. If you're summarizing 50 chunks, run them in parallel with asyncio and aiohttp instead of one at a time. On a long podcast, the total wait drops from roughly the sum of all the calls to roughly the slowest one, subject to your rate limits.
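Here's a minimal sketch of that concurrency pattern using only asyncio. The `summarize` argument is a hypothetical coroutine you'd write yourself (for example, an aiohttp POST to the Gateway), and the semaphore limit of 5 is an arbitrary assumption; tune it to your rate limits.

```python
import asyncio


async def summarize_chunks(chunks, summarize, max_concurrency=5):
    """Run summarize(chunk) for every chunk concurrently, preserving order."""
    # The semaphore caps how many requests are in flight at once.
    semaphore = asyncio.Semaphore(max_concurrency)

    async def worker(chunk):
        async with semaphore:
            return await summarize(chunk)

    # gather returns results in the same order as the input chunks.
    return await asyncio.gather(*(worker(chunk) for chunk in chunks))
```

Call it with `asyncio.run(summarize_chunks(chunks, summarize))`, then feed the partial summaries into your final summarization pass.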

Getting better summaries with custom prompts

Your summary is only as good as your prompt. The model is excellent; vague instructions waste it.

The biggest lever is structured output. Instead of "summarize this," tell the model the exact shape you want back. Want a JSON object your frontend can render directly? Ask for one:

prompt = f"""Summarize the following transcript. Return a JSON object with
two keys: "main_points" and "action_items".

Transcript:
{transcript.text}

Respond with valid JSON only:
{{
  "main_points": ["<each main point as a string>"],
  "action_items": ["<each action item as a string>"]
}}
"""

Now you're getting back data, not prose — bullet-point summaries, action items, sentiment tags, whatever your product needs. The same prompt pattern powers video titles, SEO descriptions, and social copy. Change the instruction, keep the pipeline.

If you're building something more conversational on top of this — a voice agent that summarizes calls in real time, say — the same Gateway models slot right in.

Where this goes next

The interesting shift isn't that summarization got easy — it's that the transcript stopped being the deliverable. A transcript used to be the product. Now it's the substrate. Once your audio is text and you've got a Gateway call wired up, summaries are just the first thing you build. Next come chapter markers, searchable archives, auto-generated clips, and agents that answer questions about hours of recordings nobody has time to watch.

The teams winning here aren't the ones with the fanciest prompt. They're the ones treating every recording as queryable data from the moment it lands. Build the pipeline once, and every feature after that is a prompt away.

Frequently asked questions

Is there an API to create bullet-point summaries from transcripts?

Yes — AssemblyAI's LLM Gateway generates bullet-point summaries from any transcript with a single POST request. Transcribe your audio with the speech-to-text API, then send the transcript text plus a prompt asking for bullet points to https://llm-gateway.assemblyai.com/v1/chat/completions. You control the exact format through the prompt, so you can request bullets, paragraphs, JSON, or any structure your app needs.

What is AssemblyAI's LLM Gateway and how does it work?

The LLM Gateway is an OpenAI-compatible API that gives you access to 25+ large language models through a single endpoint. You send a standard chat-completions request with a model field like claude-sonnet-4-5-20250929 and your authorization header, and it routes the call to that model. It replaced LeMUR, and it's the current way to run summarization, Q&A, and other LLM tasks over AssemblyAI transcripts.

Does AssemblyAI provide customizable summary types?

Yes — because summaries run through the LLM Gateway, you define the type entirely in your prompt. You can ask for a one-line TL;DR, topic-by-topic breakdowns, action items, chapter summaries, or a structured JSON object with named keys. There are no fixed summary "modes" to choose from; the prompt is the configuration.

How do I request a summary as part of the API call?

You make two calls: one to transcribe the audio and one to summarize. Submit the file to the transcription endpoint, then pass the returned transcript text into an LLM Gateway chat-completions request along with your summarization prompt. Wrap both in a function and it's effectively a single operation from your app's point of view.

Can AssemblyAI integrate with LLMs?

Yes — the LLM Gateway is purpose-built for exactly this, offering OpenAI-compatible access to models from multiple providers behind one API and one bill. You pass the model name in the request, so you can switch between providers without rewriting your integration. It's the same endpoint whether you're summarizing, classifying, or extracting structured data from transcripts.
