How to Get YouTube Video Transcripts

Learn how to get the transcript of a YouTube video with this easy-to-follow Python tutorial. We'll use yt-dlp to download YouTube videos, and automatically transcribe them with AssemblyAI.

In this guide, we'll show you how to get YouTube video transcripts automatically using Python. This comes in handy, for example, if you need to take notes from a video lecture. Getting YouTube transcripts is a two-part process: first, we use the yt-dlp library to download the YouTube video, and then we transcribe it with the AssemblyAI API.

yt-dlp is a fork of the popular youtube-dl library with additional features and fixes. It is more actively maintained and is now generally preferred over youtube-dl.

AssemblyAI's API allows us to easily get transcripts for both local and remote audio files. Therefore, to get YouTube transcripts, we must either download the YouTube video locally or extract its publicly accessible URL.

In this guide we’ll show three different approaches:

  • Option 1: Download YouTube videos via the CLI
  • Option 2: Download YouTube videos via Python code
  • Option 3: Extract the YouTube audio URL without downloading the file

Let’s get started!

Prerequisites to get YouTube transcripts

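We will use the following dependencies to complete this tutorial:

  • yt-dlp to download YouTube videos or extract their audio URLs
  • The AssemblyAI Python SDK to transcribe the audio
  • FFmpeg (ffmpeg and ffprobe) to convert the download into an audio file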

All code in this blog post is also available on GitHub under the YouTube transcripts guide of the AssemblyAI cookbook repository.

Project Setup

Make sure that you have Python 3.8 or newer already installed on your system and create a new folder for your project. Then, navigate to your project directory in your terminal and create a new virtual environment:

# Mac/Linux:
python3 -m venv venv
. venv/bin/activate

# Windows:
python -m venv venv
.\venv\Scripts\activate.bat

Install yt-dlp and the AssemblyAI Python package.

pip install -U yt-dlp assemblyai

Set your AssemblyAI API key as an environment variable named ASSEMBLYAI_API_KEY. You can get a free API key by signing up on the AssemblyAI website.

# Mac/Linux:
export ASSEMBLYAI_API_KEY=<YOUR_KEY>

# Windows:
set ASSEMBLYAI_API_KEY=<YOUR_KEY>
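
Before running the scripts in this guide, it can help to confirm that the variable is actually visible to Python. The following is a minimal sketch using only the standard library; the helper name `require_api_key` is hypothetical and not part of the AssemblyAI SDK.

```python
import os


def require_api_key(name="ASSEMBLYAI_API_KEY"):
    """Return the API key stored in the environment variable `name`.

    Hypothetical helper for this tutorial, not part of the AssemblyAI SDK.
    """
    key = os.environ.get(name, "").strip()
    if not key:
        raise RuntimeError(f"Environment variable {name} is not set or empty.")
    return key
```

Calling `require_api_key()` at the top of a script makes it fail early with a clear message instead of failing later during the transcription request.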

Lastly, ffmpeg and ffprobe are required. FFmpeg is free, open-source software for handling video, audio, and other multimedia files. We’ll use it in conjunction with yt-dlp to convert the video we download into an audio file. (Note: The transcription may work without FFmpeg, but installing it is strongly recommended so that you get a correct audio file on your machine.)

On Mac, you can easily install it with homebrew (brew install ffmpeg) and on Linux with apt install ffmpeg, but for all operating systems you can also install it by downloading the executables:

  • Visit the official FFmpeg download page and select the packages or executables for your operating system. Then download the latest versions of ffmpeg and ffprobe.
  • Unzip the downloaded file and you’ll see executable files. Move the executable files of ffmpeg and ffprobe to a directory of your choice and make a note of the directory path.
  • Add the directory to your PATH variable. For example, if the FFmpeg binary is in /Users/test/local, type: export PATH=$PATH:/Users/test/local.
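
To check that both executables can be found after installation, a small sketch like the following can be used. It relies only on `shutil.which` from the standard library; the function name `find_missing_tools` is made up for this example.

```python
import shutil


def find_missing_tools(tools=("ffmpeg", "ffprobe")):
    """Return the names in `tools` that cannot be found on the PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


missing = find_missing_tools()
if missing:
    print("Not found on PATH:", ", ".join(missing))
else:
    print("ffmpeg and ffprobe are available.")
```

If a tool is reported as missing, re-check the PATH entry from the last step and open a new terminal so that the change takes effect.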

Option 1: Download YouTube videos via the CLI

In this approach, we download the YouTube video via the command line and then transcribe it via the AssemblyAI API. As an example, we use the video with the ID wtolixa9XTg (https://www.youtube.com/watch?v=wtolixa9XTg).

To download it, use the yt-dlp command with the following options:

  • -f m4a/bestaudio: Prefer the best version in m4a format, and fall back to the best available audio format if no m4a version exists.
  • -o "%(id)s.%(ext)s": The output name should be the video ID followed by the extension. In this example, the video gets saved to "wtolixa9XTg.m4a".
  • wtolixa9XTg: The ID of the video.

This is the complete command to download the video:

yt-dlp -f m4a/bestaudio -o "%(id)s.%(ext)s" wtolixa9XTg
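
The output template uses the same `%(name)s` placeholders as Python's %-style string formatting. As a rough illustration (not how yt-dlp is invoked, and ignoring the extra template features yt-dlp supports), filling in the two fields by hand yields the filename mentioned above:

```python
# Plain Python %-formatting, used here only to illustrate the template;
# yt-dlp fills in the fields itself when it downloads the video.
template = "%(id)s.%(ext)s"
fields = {"id": "wtolixa9XTg", "ext": "m4a"}

filename = template % fields
print(filename)  # wtolixa9XTg.m4a
```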

Next, set up the AssemblyAI SDK and transcribe the file. Make sure that the path you pass to the transcribe() function corresponds to the saved filename.

import assemblyai as aai

# If the API key is not set as an environment variable named
# ASSEMBLYAI_API_KEY, you can also set it like this:
# aai.settings.api_key = "YOUR_API_KEY"

transcriber = aai.Transcriber()
transcript = transcriber.transcribe("wtolixa9XTg.m4a")

print(transcript.text)

After the transcription has finished, you can access the transcribed text through the `transcript.text` attribute. That’s how easy it is to transcribe a file with AssemblyAI!
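
A transcription can also fail, for example if the audio file cannot be read. Below is a minimal sketch of a check, assuming the transcript object exposes an `error` attribute that is empty on success, as the AssemblyAI Python SDK's transcript objects do at the time of writing. The helper name `text_or_raise` is hypothetical.

```python
def text_or_raise(transcript):
    """Return transcript.text, or raise if the transcript reports an error.

    Hypothetical helper: assumes `transcript` has `text` and `error`
    attributes and that `error` is None or empty when transcription succeeded.
    """
    error = getattr(transcript, "error", None)
    if error:
        raise RuntimeError(f"Transcription failed: {error}")
    return transcript.text


# Usage with the objects from the snippet above:
# transcript = transcriber.transcribe("wtolixa9XTg.m4a")
# print(text_or_raise(transcript))
```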

Option 2: Download YouTube videos via Python code

In this approach, we download the video with a Python script instead of the command line.

You can download the file with the following code that utilizes yt-dlp:

import yt_dlp

URLS = ['https://www.youtube.com/watch?v=wtolixa9XTg']

ydl_opts = {
    'format': 'm4a/bestaudio/best',  # Prefer m4a, then the best audio, then the best overall format
    'outtmpl': '%(id)s.%(ext)s',  # The output name is the video ID followed by the extension
    'postprocessors': [{  # Extract audio using ffmpeg
        'key': 'FFmpegExtractAudio',
        'preferredcodec': 'm4a',
    }]
}

with yt_dlp.YoutubeDL(ydl_opts) as ydl:
    error_code = ydl.download(URLS)

After downloading, you can again use the AssemblyAI SDK code from the previous section to transcribe the file.
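
With the options above, the file is saved under the video ID with the .m4a extension. If you want to derive that path from the URL instead of typing it by hand, the following sketch may help. It assumes a standard `watch?v=<id>` URL and the `'%(id)s.%(ext)s'` output template; the function name `expected_audio_path` is made up for this example.

```python
from urllib.parse import urlparse, parse_qs


def expected_audio_path(url, ext="m4a"):
    """Build the '<id>.<ext>' filename for a 'watch?v=<id>' YouTube URL.

    Assumes the URL carries the video ID in its `v` query parameter and that
    the output template is '%(id)s.%(ext)s', as in the options above.
    """
    video_id = parse_qs(urlparse(url).query).get("v", [None])[0]
    if video_id is None:
        raise ValueError(f"No 'v' query parameter in URL: {url}")
    return f"{video_id}.{ext}"


print(expected_audio_path("https://www.youtube.com/watch?v=wtolixa9XTg"))
# wtolixa9XTg.m4a
```

The returned path can then be passed to transcribe() in place of the hard-coded filename.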

(Note: If you don’t have FFmpeg installed, simply remove the postprocessors key-value pair from the options. The download is then not converted to a proper .m4a file, but the transcription may still work.)

Option 3: Extract the YouTube audio URL without downloading the file

In this approach, we don't download the file at all. Instead, we use yt-dlp to extract a publicly accessible URL of the video that the AssemblyAI API can handle.

import yt_dlp

URL = 'https://www.youtube.com/watch?v=wtolixa9XTg'

with yt_dlp.YoutubeDL() as ydl:
    info = ydl.extract_info(URL, download=False)

Next, iterate over all formats. The formats are already sorted from worst to best, so we iterate in reverse and stop at the first audio-only .m4a entry. This is the best audio version available in m4a format.

The URL to the hosted file can then be accessed with the `url` key.

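url = None  # Stays None if no audio-only .m4a format is found
for fmt in info["formats"][::-1]:
    # Keep only audio-only entries in .m4a format
    if fmt.get("resolution") == "audio only" and fmt.get("ext") == "m4a":
        url = fmt["url"]
        break

print(url)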
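
The same selection can be wrapped in a small function so that it is easy to reuse and to try out without a network call. This is only a sketch: the function name `pick_m4a_audio_url` is hypothetical, and the sample entries are made up to mimic the shape of `info["formats"]`.

```python
def pick_m4a_audio_url(formats):
    """Return the URL of the last audio-only .m4a entry, or None if absent.

    `formats` is a list of dicts shaped like info["formats"]; entries are
    assumed to be ordered from worst to best, so we scan from the end.
    """
    for fmt in reversed(formats):
        if fmt.get("resolution") == "audio only" and fmt.get("ext") == "m4a":
            return fmt.get("url")
    return None


# Made-up sample entries, only to show the selection logic.
sample_formats = [
    {"ext": "webm", "resolution": "audio only", "url": "https://example.com/a.webm"},
    {"ext": "m4a", "resolution": "audio only", "url": "https://example.com/low.m4a"},
    {"ext": "m4a", "resolution": "audio only", "url": "https://example.com/high.m4a"},
    {"ext": "mp4", "resolution": "1280x720", "url": "https://example.com/video.mp4"},
]
print(pick_m4a_audio_url(sample_formats))  # https://example.com/high.m4a
```

With real data, the call would be `pick_m4a_audio_url(info["formats"])`. A return value of None signals that the video has no audio-only m4a version, and you can then fall back to one of the download approaches.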

Now, use the same AssemblyAI SDK code again to transcribe the file. The transcribe() function accepts either a path to a local file or a publicly accessible URL, as in this case. Hence, the code is the same as in the other approaches:

import assemblyai as aai

# If the API key is not set as an environment variable named
# ASSEMBLYAI_API_KEY, you can also set it like this:
# aai.settings.api_key = "YOUR_API_KEY"

transcriber = aai.Transcriber()
transcript = transcriber.transcribe(url)

print(transcript.text)

Wrapping Up

In this tutorial, you’ve learned how to download YouTube videos with yt-dlp and transcribe YouTube videos with AssemblyAI using Python. You’ve learned three approaches that let you download videos via the CLI, download videos via a Python script, or only extract the audio URL with a script so you can use it for transcription.

To learn more about other features of AssemblyAI’s API and about AI in general, check out other content on our blog or YouTube channel, or feel free to join us on Twitter or Discord to stay in the loop when we release new content.