Auto Chapters

The Auto Chapters model summarizes audio data over time into chapters. Chapters make it easy for users to navigate and find specific information.

Each chapter contains the following:

  • Summary
  • One-line gist
  • Headline
  • Start and end timestamps

Auto Chapters and Summarization

You can enable either Auto Chapters or Summarization in a single transcription, but not both.


Enable Auto Chapters by setting auto_chapters to true in the transcription config. The punctuate option must also be enabled to use Auto Chapters; it is enabled by default.
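The constraints above can be checked before a request is sent. The following is a minimal sketch, not part of any official SDK: build_transcript_request is a hypothetical helper, and the summarization key is an assumed name for the Summarization option. Only audio_url, auto_chapters, and punctuate appear on this page.

```python
import json


def build_transcript_request(audio_url: str,
                             auto_chapters: bool = True,
                             summarization: bool = False,
                             punctuate: bool = True) -> str:
    """Return a JSON request body (hypothetical helper, not an SDK function)."""
    # Auto Chapters and Summarization cannot be enabled together.
    if auto_chapters and summarization:
        raise ValueError("Enable either auto_chapters or summarization, not both")
    # Auto Chapters requires punctuation, which is on by default.
    if auto_chapters and not punctuate:
        raise ValueError("auto_chapters requires punctuate to be enabled")

    body = {"audio_url": audio_url, "auto_chapters": auto_chapters}
    if summarization:
        body["summarization"] = True  # assumed parameter name
    if not punctuate:
        body["punctuate"] = False
    return json.dumps(body)


print(build_transcript_request("YOUR_AUDIO_URL"))
```

Validating on the client side is optional; it only surfaces a conflicting configuration earlier than the API would.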

Example output

250-28840: Smoke from hundreds of wildfires in Canada is triggering air quality alerts across US
29610-280340: High particulate matter in wildfire smoke can lead to serious health problems
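Lines in this shape can be produced from the chapters array with a few lines of Python. This is a sketch that assumes the text after the colon is each chapter's headline field and that the chapters have already been decoded from JSON into dictionaries; format_chapters is a hypothetical name.

```python
def format_chapters(chapters):
    """Render each chapter as 'start-end: headline' (times in milliseconds)."""
    return [f"{c['start']}-{c['end']}: {c['headline']}" for c in chapters]


# Sample data copied from the example output above.
chapters = [
    {"start": 250, "end": 28840,
     "headline": "Smoke from hundreds of wildfires in Canada is triggering air quality alerts across US"},
    {"start": 29610, "end": 280340,
     "headline": "High particulate matter in wildfire smoke can lead to serious health problems"},
]

for line in format_chapters(chapters):
    print(line)
```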

API reference


curl \
--header "Authorization: YOUR_API_KEY" \
--header "Content-Type: application/json" \
--data '{
"audio_url": "YOUR_AUDIO_URL",
"auto_chapters": true
}'

auto_chapters (boolean): Enable Auto Chapters.


chapters (array): An array of temporally sequential chapters for the audio file.
chapters[i].gist (string): A short summary, in a few words, of the content spoken in the i-th chapter.
chapters[i].headline (string): A single-sentence summary of the content spoken during the i-th chapter.
chapters[i].summary (string): A one-paragraph summary of the content spoken during the i-th chapter.
chapters[i].start (number): The starting time, in milliseconds, for the i-th chapter.
chapters[i].end (number): The ending time, in milliseconds, for the i-th chapter.

The response also includes the request parameters used to generate the transcript.
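To illustrate how these fields support navigation, here is a minimal sketch that loads the chapters array into typed records, converts millisecond offsets into readable timestamps, and looks up the chapter that covers a playback position. The Chapter class and the helper names are hypothetical, the gist and summary strings in the sample are placeholders, and only the field names and timestamps come from this page.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Chapter:
    gist: str
    headline: str
    summary: str
    start: int  # milliseconds
    end: int    # milliseconds


def parse_chapters(response: dict) -> List[Chapter]:
    """Build Chapter records; a missing or null 'chapters' value yields []."""
    return [
        Chapter(c["gist"], c["headline"], c["summary"], c["start"], c["end"])
        for c in (response.get("chapters") or [])
    ]


def ms_to_timestamp(ms: int) -> str:
    """Convert milliseconds to HH:MM:SS, discarding the fractional second."""
    minutes, seconds = divmod(ms // 1000, 60)
    hours, minutes = divmod(minutes, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}"


def chapter_at(chapters: List[Chapter], position_ms: int) -> Optional[Chapter]:
    """Return the chapter covering position_ms, or None if it falls in a gap."""
    for chapter in chapters:
        if chapter.start <= position_ms <= chapter.end:
            return chapter
    return None


# Placeholder response: timestamps from the example output, other text invented.
response = {
    "chapters": [
        {"gist": "placeholder gist 1", "headline": "placeholder headline 1",
         "summary": "placeholder summary 1", "start": 250, "end": 28840},
        {"gist": "placeholder gist 2", "headline": "placeholder headline 2",
         "summary": "placeholder summary 2", "start": 29610, "end": 280340},
    ]
}

chapters = parse_chapters(response)
for ch in chapters:
    print(ms_to_timestamp(ch.start), ms_to_timestamp(ch.end), ch.gist)
```

A linear scan is enough for the handful of chapters a typical file produces. Because chapters are temporally sequential, a binary search over the start times would also work for very long recordings.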

Frequently asked questions

Can I specify the number of chapters to be generated by the Auto Chapters model?

No, the number of chapters generated by the Auto Chapters model isn't configurable by the user. The model automatically segments the audio file into logical chapters as the topic of conversation changes.


Why am I not getting any chapter predictions for my audio file?

One possible reason is that the audio file doesn't contain enough variety in topic or tone for the model to identify separate chapters. Another is that background noise or low-quality audio is interfering with the model's analysis.