November 11, 2021

Introducing Auto Chapters - Summarize Audio and Video Files


Dylan Fox

Founder, CEO


Today, we're excited to officially announce our newest feature at AssemblyAI - Auto Chapters.

The Auto Chapters, or Text Summarization, feature provides a "summary over time" for audio content transcribed with AssemblyAI's Speech-to-Text API. It works by first breaking audio/video files into logical "chapters" as the topic of conversation changes, and then generating a summary for each "chapter" of content.

The graphic above shows the "chapters" that were extracted, and the summaries that were generated, from Joe Biden's State of the Union address. For each "chapter" that is detected, the API returns a JSON object like the one below:

"chapters": [
    {
        "start": 0,
        "end": 20000,
        "summary": "The American job plan is going to create millions of good paying jobs. jobs created in an American jobs plan do not require a College degree. 75% don't require an associate's degree.",
        "headline": "The American job plan is going to create millions of good paying jobs."
    },
    ...
]

As you can see above, for each "chapter" that was detected the API responds with the start and end timestamps (in milliseconds), a summary of a few sentences covering the content spoken during that timeframe, and a short headline, which can be thought of as a "summary of the summary".
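Because the timestamps are in milliseconds, turning the chapters into a readable outline takes only a little arithmetic. The snippet below is a minimal sketch, not part of the API: the helper names `format_timestamp` and `chapter_outline` are made up for illustration, and the sample data is copied from the response fragment above.

```python
def format_timestamp(ms: int) -> str:
    """Convert milliseconds to a minutes:seconds label such as 44:28."""
    minutes, seconds = divmod(ms // 1000, 60)
    return f"{minutes}:{seconds:02d}"


def chapter_outline(chapters: list) -> list:
    """Return one 'start time: headline' line per chapter."""
    return [f"{format_timestamp(c['start'])}: {c['headline']}" for c in chapters]


# Sample chapter copied from the response fragment in this post.
sample_chapters = [
    {
        "start": 0,
        "end": 20000,
        "summary": "The American job plan is going to create millions of good paying jobs. "
                   "jobs created in an American jobs plan do not require a College degree. "
                   "75% don't require an associate's degree.",
        "headline": "The American job plan is going to create millions of good paying jobs.",
    }
]

for line in chapter_outline(sample_chapters):
    print(line)
```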

#Auto Chapters In Action

Below is the entire 1 hour and 43 minute State of the Union address that Biden gave to Congress on April 28, 2021, and the "chapters" that were detected by the AssemblyAI API, along with their summaries.

1:45: I have the high privilege and distinct honor to present to you the President of the United States.
31:42: 90% of Americans now live within 5 miles of a vaccination site.
44:28: The American job plan is going to create millions of good paying jobs.
47:59: No one working 40 hours a week should live below the poverty line.
48:22: American jobs finally be the biggest increase in non defense research and development.
49:21: The National Institute of Health, the NIH, should create a similar advanced research Projects agency for Health.
50:31: It would have a singular purpose to develop breakthroughs to prevent, detect and treat diseases like Alzheimer's, diabetes and cancer.
51:29: I wanted to lay out before the Congress my plan.
52:19: When this nation made twelve years of public education universal in the last century, it made us the best educated, best prepared nation in the world.
54:25: The American Family's Plan guarantees four additional years of public education for every person in America, starting as early as we can.
57:08: American Family's Plan will provide access to quality, affordable childcare.
61:58: I will not impose any tax increase on people making less than $400,000.
67:34: He said the U.S. will become an Arsenal for vaccines for other countries.
74:12: After 20 years of value, Valor and sacrifice, it's time to bring those troops home.
76:01: We have to come together to heal the soul of this nation.
80:02: Gun violence has become an epidemic in America.
84:23: If you believe we need to secure the border, pass it.
85:00: Congress needs to pass legislation this year to finally secure protection for dreamers.
87:02: If we want to restore the soul of America, we need to protect the right to vote.

#How Auto Chapters Works

Behind the Auto Chapters feature is a set of powerful Machine Learning models working in sequence. The first model segments an audio file into "chapters" (i.e., it detects when the topic changes), and the second model condenses each of those chapters into a bite-sized summary.
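To make the two-stage shape concrete, here is a toy pipeline. It is emphatically not AssemblyAI's implementation: both stages are naive stand-ins invented for this sketch (word-overlap segmentation instead of a trained segmentation model, and "first sentences" instead of a trained summarizer), and the transcript sentences are made up. Only the structure, segment first and then summarize each segment, mirrors the description above.

```python
import re


def words_of(sentence: str) -> set:
    """Lowercased words longer than three characters, as a crude topic signature."""
    return {w for w in re.findall(r"[a-z']+", sentence.lower()) if len(w) > 3}


def segment(sentences: list) -> list:
    """Stage 1 (toy): start a new chapter when a sentence shares no
    signature words with the sentence before it."""
    chapters = []
    for sentence in sentences:
        if chapters and words_of(sentence) & words_of(chapters[-1][-1]):
            chapters[-1].append(sentence)
        else:
            chapters.append([sentence])
    return chapters


def summarize(chapter: list) -> dict:
    """Stage 2 (toy): first sentence as headline, first two as summary."""
    return {"headline": chapter[0], "summary": " ".join(chapter[:2])}


# Made-up transcript sentences, for illustration only.
transcript = [
    "The jobs plan will create millions of jobs.",
    "Most of those jobs do not require a college degree.",
    "Gun violence has become an epidemic.",
    "Congress should act on gun violence now.",
]

for chapter in segment(transcript):
    print(summarize(chapter)["headline"])
```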

#Use Cases

Below are some of the ways our customers are already using the Auto Chapters feature:

Video Platforms - Automatically create "video chapters" to make videos easier for users to click around, and to jump to the content they're looking for.
Podcast Players - Extract interesting segments of a podcast episode, and make podcast episodes more searchable so users can jump to key parts of an episode to "sample" an episode before listening to the entire thing.
Virtual Meeting Platforms - Offer summaries of the key parts of a meeting, and make meeting recordings easier to consume after the fact.
Telephony - Make phone calls easier to navigate, especially when doing QA within contact centers.
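For the video platform case, one plausible approach is to turn the detected chapters into a WebVTT chapters track that a web video player can load. The sketch below assumes each chapter has the `start`, `end` (milliseconds) and `headline` fields described earlier; the function names `vtt_time` and `chapters_to_webvtt` are hypothetical, and nothing here is provided by the AssemblyAI API itself.

```python
def vtt_time(ms: int) -> str:
    """Format milliseconds as a WebVTT timestamp (hh:mm:ss.ttt)."""
    hours, rem = divmod(ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    seconds, millis = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}.{millis:03d}"


def chapters_to_webvtt(chapters: list) -> str:
    """Build a WebVTT document with one cue per chapter, titled by its headline."""
    lines = ["WEBVTT", ""]
    for chapter in chapters:
        lines.append(f"{vtt_time(chapter['start'])} --> {vtt_time(chapter['end'])}")
        lines.append(chapter["headline"])
        lines.append("")  # blank line terminates the cue
    return "\n".join(lines)


# Sample chapter taken from the response shown in this post.
example = [
    {
        "start": 0,
        "end": 20000,
        "headline": "The American job plan is going to create millions of good paying jobs.",
    }
]
print(chapters_to_webvtt(example))
```

The same chapter list could equally feed a podcast player's segment list or a meeting recap; only the output format changes.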

#Using the Auto Chapters Feature

When requesting a transcription with the AssemblyAI API, simply include the auto_chapters: true parameter in your POST requests. For example, in cURL:

curl --request POST \
  --url https://api.assemblyai.com/v2/transcript \
  --header 'authorization: YOUR-API-TOKEN' \
  --header 'content-type: application/json' \
  --data '{"audio_url": "https://foo.bar/7510.mp3", "auto_chapters": true}'
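The same request can be built in Python using only the standard library. This is a sketch rather than an official client: the helper name `build_transcript_request` is made up, and the URL, headers, and body simply mirror the cURL example. The network call itself is left commented out so the snippet runs without a token.

```python
import json
import urllib.request

API_URL = "https://api.assemblyai.com/v2/transcript"


def build_transcript_request(api_token: str, audio_url: str) -> urllib.request.Request:
    """Build (but do not send) the POST request from the cURL example."""
    payload = {"audio_url": audio_url, "auto_chapters": True}
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"authorization": api_token, "content-type": "application/json"},
        method="POST",
    )


request = build_transcript_request("YOUR-API-TOKEN", "https://foo.bar/7510.mp3")
print(request.get_method(), request.full_url)

# To actually submit the job (needs a valid token and network access):
# with urllib.request.urlopen(request) as response:
#     transcript = json.load(response)
#     print(transcript["id"], transcript["status"])
```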

When your transcription is completed, you'll see a chapters key in the JSON response, like below:

{
    "audio_duration": 12.0960090702948,
    "audio_url": "https://s3-us-west-2.amazonaws.com/blog.assemblyai.com/audio/8-7-2018-post/7510.mp3",
    "confidence": 0.956,
    "id": "5551722-f677-48a6-9287-39c0aafd9ac1",
    "status": "completed",
    "text": "The American job plan ...",
    # auto chapter results can be found in the JSON result here
    "chapters": [
        {
            "start": 0,
            "end": 20000,
            "summary": "The American job plan is going to create millions of good paying jobs. jobs created in an American jobs plan do not require a College degree. 75% don't require an associate's degree.",
            "headline": "The American job plan is going to create millions of good paying jobs."
        },
        ...
    ],
    "words": [
        {
            "confidence": 1.0,
            "end": 440,
            "start": 0,
            "text": "You"
        },
        ...
    ]
}
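Transcription is asynchronous, so client code typically checks the transcript until its status is "completed". The loop below is a sketch under stated assumptions: this post only shows the "completed" status, so the intermediate "queued" and "processing" values, the "error" status, and the idea of fetching the transcript by its id are assumptions that should be checked against the API documentation. The `fetch` callable is a hypothetical stand-in for that GET request, which keeps the example runnable offline.

```python
import time
from typing import Callable


def wait_for_transcript(
    fetch: Callable[[], dict],
    interval_seconds: float = 3.0,
    max_attempts: int = 100,
) -> dict:
    """Call fetch() until the transcript status is 'completed' or 'error'."""
    for _ in range(max_attempts):
        transcript = fetch()
        if transcript["status"] == "completed":
            return transcript
        if transcript["status"] == "error":  # assumed failure status
            raise RuntimeError(transcript.get("error", "transcription failed"))
        time.sleep(interval_seconds)
    raise TimeoutError("transcript did not complete in time")


# Canned responses standing in for real API calls.
fake_responses = iter([
    {"status": "queued"},
    {"status": "processing"},
    {"status": "completed", "chapters": [{"start": 0, "end": 20000}]},
])
result = wait_for_transcript(lambda: next(fake_responses), interval_seconds=0)
print(result["status"], len(result["chapters"]))
```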

Isolating the chapters key for a moment: for each chapter that was detected, the API includes the start and end timestamps (in milliseconds), a summary of a few sentences covering the content spoken during that timeframe, and a short headline, which can be thought of as a "summary of the summary".
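Since both chapters and words carry millisecond timestamps, the two can be lined up on the client side, for example to show which chapter is playing or to pull out the transcript text for one chapter. The sketch below assumes chapter ranges are half-open (start inclusive, end exclusive), which the post does not specify. The helper names are hypothetical, and apart from the first chapter and the first word, the sample data is made up.

```python
from typing import Optional


def chapter_at(chapters: list, position_ms: int) -> Optional[dict]:
    """Return the chapter containing a playback position, or None."""
    for chapter in chapters:
        if chapter["start"] <= position_ms < chapter["end"]:
            return chapter
    return None


def words_in_chapter(chapter: dict, words: list) -> str:
    """Join the text of every word whose start time falls inside the chapter."""
    return " ".join(
        w["text"] for w in words if chapter["start"] <= w["start"] < chapter["end"]
    )


# The first chapter and first word mirror the sample response;
# the second chapter and the other words are invented for this sketch.
chapters = [
    {"start": 0, "end": 20000,
     "headline": "The American job plan is going to create millions of good paying jobs."},
    {"start": 20000, "end": 45000, "headline": "Hypothetical second chapter."},
]
words = [
    {"start": 0, "end": 440, "text": "You"},
    {"start": 450, "end": 900, "text": "know"},
    {"start": 20100, "end": 20500, "text": "Next"},
]

print(chapter_at(chapters, 21000)["headline"])
print(words_in_chapter(chapters[0], words))
```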