Identifying highlights in audio and video files

The Key Phrases model identifies significant words and phrases in your transcript, letting you extract the most important concepts or highlights from your audio or video file.

For example, if you run a call center, you can analyze highlights from recorded phone calls.

In this step-by-step guide, you'll learn how to apply the model. You'll send the auto_highlights parameter in your request, and then use the auto_highlights_result property in the response.

Get started

Before we begin, make sure you have an AssemblyAI account and an API key. You can sign up for a free account and get your API key from your dashboard.

The complete source code for this guide can be viewed here.

Here's an audio sample for this guide:

https://github.com/AssemblyAI-Examples/audio-examples/raw/main/20230607_me_canadian_wildfires.mp3

Step-by-step instructions

1. Create a new file and import the necessary libraries for making an HTTP request.
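A minimal sketch of this step, assuming you're working in Python with the requests library:

```python
import requests
import time
```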

2. Set up the API endpoint and headers. The headers should include your API key.
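For example, using AssemblyAI's v2 REST endpoint (replace the placeholder with your own API key):

```python
base_url = "https://api.assemblyai.com/v2"

headers = {
    "authorization": "<YOUR_API_KEY>"
}
```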

3. Upload your local file to the AssemblyAI API.
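A sketch of the upload request; the local file path below is a placeholder for your own audio file:

```python
# Upload the local audio file and read the hosted URL from the response.
with open("./my-audio.mp3", "rb") as f:
    response = requests.post(base_url + "/upload", headers=headers, data=f)

upload_url = response.json()["upload_url"]
```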

4. Use the upload_url returned by the AssemblyAI API to create a JSON payload containing the audio_url parameter and the auto_highlights parameter set to True.
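For example:

```python
data = {
    "audio_url": upload_url,
    "auto_highlights": True
}
```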

5. Make a POST request to the AssemblyAI API endpoint with the payload and headers.
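A sketch of the transcription request, which returns the transcript ID used in the next step:

```python
response = requests.post(base_url + "/transcript", json=data, headers=headers)
transcript_id = response.json()["id"]
```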

6. After making the request, you'll receive an ID for the transcription. Use it to poll the API every few seconds to check the status of the transcript job. Once the status is completed, you can retrieve the transcript from the API response, as well as the auto highlight results.
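A minimal polling loop, assuming the variables defined in the previous steps:

```python
polling_endpoint = f"{base_url}/transcript/{transcript_id}"

while True:
    transcript = requests.get(polling_endpoint, headers=headers).json()

    if transcript["status"] == "completed":
        # Print the full transcript text and each detected highlight.
        print(transcript["text"])
        for result in transcript["auto_highlights_result"]["results"]:
            print(f"Highlight: {result['text']}, Count: {result['count']}, Rank: {result['rank']}")
        break
    elif transcript["status"] == "error":
        raise RuntimeError(f"Transcription failed: {transcript['error']}")
    else:
        # Still queued or processing; wait before checking again.
        time.sleep(3)
```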

Understanding the response

The auto_highlights_result key in the response contains a list of all the highlights found in the transcription text. For each entry, the results include the text of the phrase or word detected (text), how many times it occurred in the text (count), its relevancy score (rank), and a list of all the timestamps (timestamps), in milliseconds, in the audio where the phrase or word is spoken.
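As an illustration (assuming the transcript dictionary from the polling step above), you could iterate over these fields to see where each highlight occurs:

```python
for result in transcript["auto_highlights_result"]["results"]:
    # Each timestamp entry marks one occurrence of the phrase, in milliseconds.
    occurrences = ", ".join(
        f"{t['start']}-{t['end']} ms" for t in result["timestamps"]
    )
    print(f"'{result['text']}' (count={result['count']}, rank={result['rank']}) at {occurrences}")
```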

For more information about the API response, see API/Model reference.

Conclusion

Automatically highlighting relevant phrases in calls is a great way to focus on important information at a glance. In general, adding AI to Conversation Intelligence tools can augment them with actionable summaries that speed up call review, surface insights, monitor for concerns, increase engagement, and more. Our AI summarization model has several customizable parameters that you can experiment with for other types of recordings.

To learn more about how to use AI summarization for call coaching, see the AssemblyAI blog.