Key phrase detection in audio files using Python

Key phrase detection identifies significant words and phrases in your transcript and lets you extract the most important highlights from your audio or video file.

With the AssemblyAI Python SDK, it takes only a few lines of Python code to achieve this. Let's learn step by step how to obtain transcripts together with their key phrases.

Project Prerequisites

We will use the following dependencies to complete this tutorial:

Python 3.8 or newer
The AssemblyAI Python SDK, version 0.19.0 or greater
An AssemblyAI API key, which can be copied from the AssemblyAI dashboard

All code in this blog post is also available on GitHub under the key phrases guide of the AssemblyAI cookbook repository.

Getting Started

Create a new folder for your project. Then, navigate to your project directory in your terminal and create a new virtual environment:

# Mac/Linux:
python3 -m venv venv
. venv/bin/activate

# Windows:
python -m venv venv
.\venv\Scripts\activate.bat

Install the AssemblyAI Python package:

pip install assemblyai

Set your AssemblyAI API key as an environment variable named ASSEMBLYAI_API_KEY. If you don't have a key yet, you can sign up for a free one and copy it from the AssemblyAI dashboard.

# Mac/Linux:
export ASSEMBLYAI_API_KEY=<YOUR_KEY>

# Windows:
set ASSEMBLYAI_API_KEY=<YOUR_KEY>
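Because the variable only exists in the shell session where you exported it, a quick check can save a confusing authentication error later. The following is a minimal sketch that uses only the standard library; the helper name require_api_key is hypothetical and is not part of the AssemblyAI SDK.

```python
import os


def require_api_key(name: str = "ASSEMBLYAI_API_KEY") -> str:
    """Return the API key from the environment, or raise if it is missing."""
    # Hypothetical helper for illustration; not part of the AssemblyAI SDK.
    key = os.environ.get(name, "").strip()
    if not key:
        raise RuntimeError(
            f"{name} is not set; export it in the shell that runs the script."
        )
    return key
```

You could call require_api_key() at the top of keyphrases.py so the script fails early with a clear message if the variable was set in a different terminal.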

Key Phrase Detection Python Code

After installing the dependencies and setting the environment variable, let's write the Python code to handle the transcription and key phrase detection.

Create a new file named keyphrases.py and copy in the following code:

import assemblyai as aai

# If the API key is not set as an environment variable named
# ASSEMBLYAI_API_KEY, you can also set it like this:
# aai.settings.api_key = "YOUR_API_KEY"

# The URL of the audio file. Can also be a path to a local file.
URL = "https://github.com/AssemblyAI-Examples/audio-examples/raw/main/20230607_me_canadian_wildfires.mp3"

# Configuration settings with `auto_highlights` enabled.
config = aai.TranscriptionConfig(auto_highlights=True)

# Create a Transcriber object and start the transcription.
# This calls the API and blocks until the transcription is finished.
transcript = aai.Transcriber().transcribe(URL, config)

# Iterate over all key phrases
for result in transcript.auto_highlights.results:
    print(f"Highlight: '{result.text}', Count: {result.count}, Rank: {result.rank}, Timestamps: {result.timestamps}")

# print(transcript.text)  # Print the full text

The above code imports the assemblyai Python package, using aai as a shorthand reference. Then the script sets the URL variable, which must be either a publicly accessible URL of a file or a path to a local file.

The third statement creates a TranscriptionConfig object with auto_highlights set to True. This enables the Key Phrases model, which detects the important phrases in the transcript.

The fourth statement instantiates the Transcriber object, which is the main class for calling AssemblyAI's transcription service. Note that the Transcriber automatically reads the environment variable named ASSEMBLYAI_API_KEY and uses its value as the API key if one is set. You can override this by setting the API key explicitly with aai.settings.api_key = "YOUR_API_KEY".

In the same line, we call the transcribe() method on the Transcriber object and pass in the URL and the config variable with our desired settings. This method executes the transcription API call and blocks program execution until the transcription is finished. Alternatively, you can set a webhook to receive the result when it's ready instead of waiting.

The returned value is an aai.Transcript object whose auto_highlights.results attribute holds a list of all detected key phrases. We iterate over this list and, for each key phrase, print the text of the highlight together with its count, rank, and timestamps. You can also print the complete transcribed text with print(transcript.text).
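The timestamps are offsets in milliseconds, which are hard to scan in raw form. The sketch below converts them to seconds for a more readable summary line. It assumes only the attributes the tutorial already reads (text, count, rank, and timestamps with start and end); the Timestamp and Highlight dataclasses are hypothetical stand-ins so the example runs without an API call, and in keyphrases.py you would pass the objects from transcript.auto_highlights.results to describe() instead.

```python
from dataclasses import dataclass
from typing import List


# Hypothetical stand-ins mirroring only the attributes used in this tutorial.
@dataclass
class Timestamp:
    start: int  # milliseconds
    end: int    # milliseconds


@dataclass
class Highlight:
    text: str
    count: int
    rank: float
    timestamps: List[Timestamp]


def format_seconds(ms: int) -> str:
    """Convert a millisecond offset to seconds with one decimal place."""
    return f"{ms / 1000:.1f}s"


def describe(result) -> str:
    """Build a one-line summary of a key phrase with readable time ranges."""
    spans = ", ".join(
        f"{format_seconds(ts.start)}-{format_seconds(ts.end)}"
        for ts in result.timestamps
    )
    return f"{result.text!r} (count {result.count}, rank {result.rank}): {spans}"


example = Highlight(
    text="air quality alerts",
    count=1,
    rank=0.08,
    timestamps=[Timestamp(start=3978, end=5114)],
)
print(describe(example))  # 'air quality alerts' (count 1, rank 0.08): 4.0s-5.1s
```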

Run the Key Phrase Detection Code

Ensure the keyphrases.py file is saved and that your virtual environment is still activated. Navigate to the project directory in a terminal and run the keyphrases.py file with the following command:

python keyphrases.py

Once the script has finished executing, you should see output similar to the following in your terminal:

Highlight: 'air quality alerts', Count: 1, Rank: 0.08, Timestamps: [Timestamp(start=3978, end=5114)]
Highlight: 'wide ranging air quality consequences', Count: 1, Rank: 0.08, Timestamps: [Timestamp(start=235388, end=238838)]
Highlight: 'more fires', Count: 1, Rank: 0.07, Timestamps: [Timestamp(start=184716, end=185186)]
Highlight: 'more wildfires', Count: 1, Rank: 0.07, Timestamps: [Timestamp(start=231036, end=232354)]
Highlight: 'air pollution', Count: 1, Rank: 0.07, Timestamps: [Timestamp(start=156004, end=156910)]
Highlight: 'weather systems', Count: 3, Rank: 0.07, Timestamps: [Timestamp(start=47344, end=47958), Timestamp(start=205268, end=205818), Timestamp(start=211588, end=213434)]
Highlight: 'high levels', Count: 2, Rank: 0.06, Timestamps: [Timestamp(start=121128, end=121646), Timestamp(start=155412, end=155866)]
Highlight: 'health conditions', Count: 1, Rank: 0.06, Timestamps: [Timestamp(start=152134, end=152666)]
Highlight: 'New York City', Count: 1, Rank: 0.06, Timestamps: [Timestamp(start=125768, end=126274)]
Highlight: 'respiratory conditions', Count: 1, Rank: 0.05, Timestamps: [Timestamp(start=153028, end=153786)]
Highlight: 'New York', Count: 3, Rank: 0.05, Timestamps: [Timestamp(start=125768, end=126034), Timestamp(start=171448, end=171970), Timestamp(start=175944, end=176322)]
Highlight: 'climate change', Count: 3, Rank: 0.05, Timestamps: [Timestamp(start=229548, end=230230), Timestamp(start=244576, end=245162), Timestamp(start=263332, end=263982)]
Highlight: 'heart conditions', Count: 1, Rank: 0.05, Timestamps: [Timestamp(start=153988, end=154506)]
Highlight: 'Smoke', Count: 6, Rank: 0.05, Timestamps: [Timestamp(start=250, end=650), Timestamp(start=49168, end=49398), Timestamp(start=55284, end=55594), Timestamp(start=168888, end=169118), Timestamp(start=215108, end=215386), Timestamp(start=225944, end=226170)]
Highlight: 'air quality warnings', Count: 1, Rank: 0.05, Timestamps: [Timestamp(start=12324, end=13434)]

Sort Highlights by Timestamps

By default, the highlights are sorted by rank in descending order, so the highest-ranked highlight comes first.
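If your code depends on rank order, for example to keep only the top few phrases, you can sort by rank explicitly rather than relying on the order of the returned list. This is a minimal sketch; the Highlight dataclass and the top_by_rank helper are hypothetical stand-ins that expose only the text and rank attributes used in this tutorial.

```python
from dataclasses import dataclass
from typing import List


# Hypothetical stand-in exposing only the attributes used here.
@dataclass
class Highlight:
    text: str
    rank: float


def top_by_rank(results, n: int = 3) -> List[str]:
    """Return the texts of the n highest-ranked key phrases."""
    # sorted() is stable even with reverse=True, so equal ranks keep their input order.
    ranked = sorted(results, key=lambda r: r.rank, reverse=True)
    return [r.text for r in ranked[:n]]


results = [
    Highlight("more fires", 0.07),
    Highlight("air quality alerts", 0.08),
    Highlight("Smoke", 0.05),
    Highlight("wide ranging air quality consequences", 0.08),
]
print(top_by_rank(results, n=2))
```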

If you want to sort the highlights by timestamps, you can use the built-in sorted function in Python with the key argument being the start attribute of the highlight's first timestamp.

key_phrases = transcript.auto_highlights.results

key_phrases = sorted(key_phrases, key=lambda x: x.timestamps[0].start)

for result in key_phrases:
    print(f"Highlight: '{result.text}', Count: {result.count}, Rank: {result.rank}, Timestamps: {result.timestamps}")

If you save the file and run python keyphrases.py again, you should get this output:

Highlight: 'Smoke', Count: 6, Rank: 0.05, Timestamps: [Timestamp(start=250, end=650), Timestamp(start=49168, end=49398), Timestamp(start=55284, end=55594), Timestamp(start=168888, end=169118), Timestamp(start=215108, end=215386), Timestamp(start=225944, end=226170)]
Highlight: 'air quality alerts', Count: 1, Rank: 0.08, Timestamps: [Timestamp(start=3978, end=5114)]
Highlight: 'air quality warnings', Count: 1, Rank: 0.05, Timestamps: [Timestamp(start=12324, end=13434)]
...

As you can see, the first occurrence of Smoke starts at 250 ms, the smallest start time of any highlight, so it now comes first in the list.
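Sorting by the first timestamp places each phrase in the list only once, even when it occurs several times. If you want every occurrence in chronological order, you can flatten the timestamps before sorting. The sketch below assumes the same text and timestamps attributes used above; the dataclasses and the occurrence_timeline helper are hypothetical stand-ins, and in your script you would pass transcript.auto_highlights.results instead.

```python
from dataclasses import dataclass
from typing import List, Tuple


# Hypothetical stand-ins mirroring the attributes used in this tutorial.
@dataclass
class Timestamp:
    start: int  # milliseconds
    end: int    # milliseconds


@dataclass
class Highlight:
    text: str
    timestamps: List[Timestamp]


def occurrence_timeline(results) -> List[Tuple[int, str]]:
    """List every (start_ms, text) occurrence across all key phrases, in time order."""
    events = [(ts.start, r.text) for r in results for ts in r.timestamps]
    # Tuples sort by start time first; equal start times fall back to the text.
    return sorted(events)


results = [
    Highlight("Smoke", [Timestamp(250, 650), Timestamp(49168, 49398)]),
    Highlight("weather systems", [Timestamp(47344, 47958)]),
    Highlight("air quality alerts", [Timestamp(3978, 5114)]),
]
for start, text in occurrence_timeline(results):
    print(f"{start:>6} ms  {text}")
```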