

AssemblyAI's Summarization model provides a powerful tool for quickly distilling important information from an audio file.


In the Summarizing virtual meetings guide, the client uploads an audio file and configures the API request to create a summary of the content. By processing the audio, the Summarization model generates a brief yet informative overview that highlights the essential points of the recording.

You can explore the full JSON response here:


You can run this code snippet in Colab here, or view the full source code here.

Note that Auto Chapters and Summarization cannot both be used in the same request.

Understanding the response

The JSON object above contains all of the information about the transcription. Depending on which models are used to analyze the audio, the attributes of this object will vary. For example, in the quickstart above we didn't enable Entity Detection, which is reflected by the entity_detection: false key-value pair in the JSON above. Had we enabled Entity Detection, the entities key would contain the detected entities (and additional details) rather than the current null value.

To access the Summarization information, we use the summarization, summary, summary_type, and summary_model keys:
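As a minimal sketch, the snippet below parses a sample of the relevant fields from the JSON response and reads those four keys (the field values here are illustrative, not taken from a real transcript):

```python
import json

# Sample of the Summarization-related fields from a transcript response.
# Values are illustrative placeholders, not real API output.
response_json = """
{
  "summarization": true,
  "summary": "- The human brain has nearly tripled in mass in two million years.",
  "summary_type": "bullets",
  "summary_model": "informative"
}
"""

results = json.loads(response_json)

if results["summarization"]:
    print("Summary type:", results["summary_type"])
    print("Summary model:", results["summary_model"])
    print(results["summary"])
```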

The reference table below lists all relevant attributes along with their descriptions, where we've called the JSON response object results. Object attributes are accessed via dot notation, and arbitrary array elements are denoted with [i]. For example, results.words[i].text refers to the text attribute of the i-th element of the words array in the JSON results object.

Attribute | Type | Description
results.summarization | boolean | Whether Summarization was enabled in the transcription request
results.summary | string | The summary of the audio file
results.summary_type | string | The type of summary selected for output (see below for additional details)
results.summary_model | string | The summarization model selected (see below for additional details)

Types and models

The tables below show which summary types and summary models are compatible, as well as their best use cases.

If you specify one of summary_model or summary_type, then you must also specify the other.

bullets (default)
A bulleted summary with the most important points.
Example:
- The human brain has nearly tripled in mass in two million years.
- One of the main reasons that our brain got so big is because it got a new part, called the frontal lobe.

bullets_verbose
A longer bullet point list summarizing the entire transcription text.
Example: Dan Gilbert is a psychologist and a happiness expert. His talk is recorded live at Ted conference. He explains why the human brain has nearly tripled in size in 2 million years. He also explains the difference between winning the lottery and becoming a paraplegic. - In 1994, Pete Best said he's happier than he would have been with the Beatles. In the free choice paradigm, monet prints are ranked from the one they like the most to the one that they don't. People prefer the third one over the fourth one because it's a little better. - People synthesize happiness when they change their affective. Hedonic aesthetic to make up your mind and change your mind is the friend of natural happiness. But it's the enemy of synthetic happiness. The psychological immune system works best when we are stuck. This is the difference between dating and marriage. People don't know this about themselves and it can work to their disadvantage. - In a photography course at Harvard, 66% of students choose not to take the course where they have the opportunity to change their mind. Adam Smith said that some things are better than others. Dan Gilbert recorded at Ted, 2004 in Monterey, California, 2004.

gist
A few words summarizing the entire transcription text.
Example: A big brain

headline
A single sentence summarizing the entire transcription text.
Example: The human brain has nearly tripled in mass in two million years.

paragraph
A single paragraph summarizing the entire transcription text.
Example: The human brain has nearly tripled in mass in two million years. It went from the one-and-a-quarter-pound brain of our ancestor, habilis, to the almost three-pound meatloaf everybody here has between their ears.
informative (default)
Best for: files with a single speaker, such as presentations or lectures
Compatible summary types: bullets, bullets_verbose, headline, or paragraph
Required parameters: punctuate and format_text set to true

conversational
Best for: any two-person conversation, such as customer/agent or interviewer/interviewee calls
Compatible summary types: bullets, bullets_verbose, headline, or paragraph
Required parameters: punctuate, format_text, and speaker_labels or dual_channel set to true

catchy
Best for: creating video, podcast, or media titles
Compatible summary types: headline or gist
Required parameters: punctuate and format_text set to true
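The pairing rule and the compatibility constraints above can be sketched as a small client-side helper that builds the JSON body for a transcript request. The helper name and the validation logic are illustrative, not part of the API; the request fields (summarization, summary_model, summary_type, punctuate, format_text) are the documented parameters:

```python
# Compatibility between models and types, encoded from the tables above.
COMPATIBLE_TYPES = {
    "informative": {"bullets", "bullets_verbose", "headline", "paragraph"},
    "conversational": {"bullets", "bullets_verbose", "headline", "paragraph"},
    "catchy": {"headline", "gist"},
}


def build_summarization_request(audio_url, summary_model=None, summary_type=None):
    """Build a transcript request body with Summarization enabled.

    Hypothetical helper: enforces that summary_model and summary_type
    are specified together and that the pair is compatible.
    """
    if (summary_model is None) != (summary_type is None):
        raise ValueError(
            "If you specify one of summary_model or summary_type, "
            "you must also specify the other."
        )
    body = {
        "audio_url": audio_url,
        "summarization": True,
        "punctuate": True,   # required by all summary models
        "format_text": True,  # required by all summary models
    }
    if summary_model is not None:
        if summary_type not in COMPATIBLE_TYPES[summary_model]:
            raise ValueError(
                f"summary_type {summary_type!r} is not compatible "
                f"with summary_model {summary_model!r}"
            )
        body["summary_model"] = summary_model
        body["summary_type"] = summary_type
    return body
```

For example, `build_summarization_request("https://example.com/audio.mp3", "catchy", "headline")` produces a valid body, while passing `"catchy"` with `"bullets"` raises a ValueError before the request is ever sent.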


How can I customize the summarization output to fit my specific use case?

You can use the summary_model and summary_type parameters to customize the output. See the tables in the Types and models section for more information on the available options.

Can I have both Auto Chapters and Summarization active in the same request?

No, you can't have both Auto Chapters and Summarization active in the same request. If you enable both models in a single request, you'll receive an error message.
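If you want to catch this conflict before sending the request, a simple client-side guard is enough. This is a hedged sketch (the guard function is hypothetical); auto_chapters and summarization are the documented request fields:

```python
def validate_models(body):
    """Hypothetical client-side check: Auto Chapters and Summarization
    are mutually exclusive in a single transcript request."""
    if body.get("auto_chapters") and body.get("summarization"):
        raise ValueError(
            "auto_chapters and summarization cannot both be enabled "
            "in the same request"
        )
    return body
```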

How long does it take to generate a summarization output?

The inference speed of the Summarization model depends on the desired output length. However, a single batch can be processed in less than 1 second.

Can I extract individual words and their corresponding speaker labels with Summarization?

No, Summarization only generates a single abstractive summary of the entire audio file, and doesn't provide word-level information or speaker labels. If you need word-level information, consider using AssemblyAI's Speech Recognition or Speaker Diarization models instead.