Transcribe audio and video files with Python and Universal-1

Learn how to transcribe audio and video files in your Python applications with AssemblyAI's Universal-1 speech recognition model.

Our recently announced speech model Universal-1 sets a new standard for automated speech recognition (ASR) accuracy. Universal-1 demonstrates near-human accuracy, even with accented speech, background noise, and difficult phrases like flight numbers and email addresses. The model is now accessible through the same web API as our previous ASR models.

Along with Universal-1, we’ve also introduced two new pricing tiers: Best and Nano. The Best tier of Universal-1 is designed for the highest accuracy possible. Nano is our new cost-effective tier with support for 99 different languages.

This tutorial will explain how to quickly transcribe audio or video files in Python applications using the Best and Nano tiers with our Speech-to-Text API.

Install the AssemblyAI Python SDK

The easiest way to start transcribing audio is by using one of our official SDKs.

Install the AssemblyAI Python SDK with the following command:

pip install --upgrade assemblyai

Sign up for a new AssemblyAI account or log in to your existing one, then copy the API key from your account dashboard. You'll need this key to authorize the API calls in your Python script.

Transcribe an audio file using Universal-1

To start transcribing an audio file from a URL using the Best tier, create a new file named transcribe.py and import the SDK in your Python code:

import assemblyai as aai

Configure a new authenticated SDK client with the API key found in your account dashboard.

aai.settings.api_key = "YOUR_API_KEY"
transcriber = aai.Transcriber()
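
If you'd prefer not to hardcode the key, one option is to load it from an environment variable instead. Here is a minimal sketch, assuming you have exported the key as ASSEMBLYAI_API_KEY (the variable name is our choice for this example):

import os

import assemblyai as aai

# load the API key from the environment instead of hardcoding it;
# ASSEMBLYAI_API_KEY is just the name we chose for this example
aai.settings.api_key = os.environ["ASSEMBLYAI_API_KEY"]
transcriber = aai.Transcriber()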

By default, all transcriptions use the Best tier, so you’ll always get the highest accuracy without any extra configuration.

Continue by specifying either an audio or video file URL, or the path to a local file, with the following code:

# you can use an audio file located at a publicly-accessible URL
audio_file = "https://storage.googleapis.com/aai-web-samples/5_common_sports_injuries.mp3"

# or you can pass the path to a file on your local file system
audio_file = "/Users/matt/Downloads/5_common_sports_injuries.mp3"

# "audio_url" variable is set for a remote URL, or "audio_file" for local file
transcript = transcriber.transcribe(audio_file)

if transcript.error:
    print(transcript.error)
else:
    print(transcript.text)

On the command line, run the script with the following command:

python transcribe.py

You should now see the transcript produced by Universal-1 printed to your terminal, and you can reuse this code to transcribe audio and video files in your Python applications.
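
If you plan to call this from elsewhere in an application, you could wrap the steps above in a small helper function. This is only a sketch; the function name and the choice to raise on errors are illustrative:

import assemblyai as aai

aai.settings.api_key = "YOUR_API_KEY"


def transcribe_file(location: str) -> str:
    """Transcribe a local file path or a publicly accessible URL and return the text."""
    transcriber = aai.Transcriber()
    transcript = transcriber.transcribe(location)
    if transcript.error:
        raise RuntimeError(transcript.error)
    return transcript.text


print(transcribe_file("https://storage.googleapis.com/aai-web-samples/5_common_sports_injuries.mp3"))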

Nano—a cost-effective alternative

Switching between Best and Nano requires only a small tweak: a TranscriptionConfig passed into the Transcriber object. To use Nano, set the speech_model parameter to "nano" when instantiating the TranscriptionConfig object:

config = aai.TranscriptionConfig(speech_model="nano")
transcriber = aai.Transcriber(config=config)
transcript = transcriber.transcribe(audio_file)
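
Depending on your SDK version, you can likely also pass the config to an individual transcribe() call rather than to the Transcriber constructor, which makes it easy to mix tiers with a single client. A brief sketch:

# reuse one Transcriber and choose the tier per request (sketch)
transcriber = aai.Transcriber()

nano_config = aai.TranscriptionConfig(speech_model="nano")
transcript = transcriber.transcribe(audio_file, config=nano_config)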

Here is what the completed script with both Best and Nano options looks like:

import assemblyai as aai


aai.settings.api_key = "YOUR_API_KEY"
transcriber = aai.Transcriber()

# you can use an audio file located at a publicly-accessible URL
audio_file = "https://storage.googleapis.com/aai-web-samples/5_common_sports_injuries.mp3"

# this code will run the "Best" tier
transcript = transcriber.transcribe(audio_file)

if transcript.error:
    print(transcript.error)
else:
    print("Best tier output:")
    print(transcript.text)

# this is how you can run Nano by setting the speech_model parameter
config = aai.TranscriptionConfig(speech_model="nano")
transcriber = aai.Transcriber(config=config)
transcript = transcriber.transcribe(audio_file)

if transcript.error:
    print(transcript.error)
else:
    print("\nNano tier output:")
    print(transcript.text)

When you run the above script, you should see output like the following (note that this output is abbreviated after "37th minute"):

Best tier output:
Runner's knee runner's knee is a condition characterized by pain behind or around the kneecap. It is caused by overuse, muscle imbalance and inadequate stretching. Symptoms include pain under or around the kneecap, pain when walking sprained ankle one nil here in the 37th minute...

Nano tier output:
Runner's knee runner's knee is a condition characterized by pain behind or around the kneecap. It is caused by overuse, muscle imbalance and inadequate stretching. Symptoms include pain under or around the kneecap, pain when walking sprained ankle one nil here in the 37th minute...

Best, Nano and More with Audio Intelligence

We just used Universal-1 through both the Best and Nano pricing tiers to transcribe audio.

AssemblyAI also offers many features beyond transcription to explore next, such as the following (a brief configuration sketch follows the list):

  • Entity detection to automatically identify and categorize key information.
  • Content moderation for detecting inappropriate content in audio files to ensure that your content is safe for all audiences.
  • PII redaction to minimize sensitive information about individuals by automatically identifying and removing it from your transcript.
  • LeMUR for applying Large Language Models (LLMs) to audio data in a single line of code.
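
Many of these features are turned on through the same TranscriptionConfig used earlier. As a rough sketch, here is how enabling entity detection might look (the parameter and attribute names follow the SDK's documented options and may differ slightly by version):

# request entity detection alongside the transcript
config = aai.TranscriptionConfig(entity_detection=True)
transcript = aai.Transcriber(config=config).transcribe(audio_file)

# each detected entity has a type and the matching text span
for entity in transcript.entities:
    print(entity.entity_type, entity.text)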

You can also learn more about our approach to creating superhuman Speech AI models on our Research page.