How to Create VTT Files for Videos in Python

VTT is a widely used subtitle file format for videos. In this guide, we'll show you how to create VTT (.vtt) files for videos in Python.

What is a VTT file?

VTT files are text files saved in the Web Video Text Tracks format, also known as WebVTT. A VTT file is generally saved with the .vtt extension and contains supplementary information about the video, such as subtitles or captions. VTT is one of the most common file formats used for video subtitles.

The syntax is similar to SRT files but has some differences:

The file should begin with the header WEBVTT.
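The start and end times are given in the format hours:minutes:seconds.milliseconds, where the hours part is optional, and are separated by -->. Unlike SRT, the milliseconds follow a period rather than a comma.
Entries are still separated by a blank line, as in SRT.
No index numbers are required.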
This format is supported by many modern browsers and can be used with the HTML5 <track> element to add subtitles to a <video> element.
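
To make the layout concrete, here is a minimal sketch that formats cue times and assembles a small WebVTT document by hand. The helper names format_timestamp and build_vtt are hypothetical and not part of any library. The sketch always writes the hours field, which WebVTT allows but does not require.

```python
from typing import List, Tuple


def format_timestamp(total_ms: int) -> str:
    """Format a millisecond offset as a WebVTT timestamp (HH:MM:SS.mmm)."""
    if total_ms < 0:
        raise ValueError("timestamps must be non-negative")
    seconds, ms = divmod(total_ms, 1000)
    minutes, seconds = divmod(seconds, 60)
    hours, minutes = divmod(minutes, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}.{ms:03d}"


def build_vtt(cues: List[Tuple[int, int, str]]) -> str:
    """Build a WebVTT document from (start_ms, end_ms, text) tuples."""
    blocks = ["WEBVTT"]  # the mandatory header comes first
    for start_ms, end_ms, text in cues:
        timing = f"{format_timestamp(start_ms)} --> {format_timestamp(end_ms)}"
        blocks.append(f"{timing}\n{text}")
    # A blank line separates the header and each cue.
    return "\n\n".join(blocks) + "\n"


if __name__ == "__main__":
    print(build_vtt([(170, 4234, "Hello"), (4282, 8106, "world")]))
```

In practice you rarely need to build the file yourself, but the sketch shows how little structure the format requires.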

VTT files make it possible to add subtitles to video content after it is produced. For example, they can be uploaded to YouTube videos to add missing subtitles or replace existing ones with higher-quality ones.

Example of a VTT file

This is what the first lines of the VTT file for this YouTube video look like:

WEBVTT

00:00.170 --> 00:04.234
AssemblyAI is building AI systems to help you build AI applications

00:04.282 --> 00:08.106
with spoken data. We create superhuman AI models for speech
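
If you want to read a file like this back in Python, the following sketch shows one possible approach. It is a simplified reader rather than a complete WebVTT parser: the function name parse_cues and the regular expression are our own assumptions, and features such as cue settings, NOTE blocks, and styling are ignored.

```python
import re
from typing import List, Optional, Tuple

# Matches "[HH:]MM:SS.mmm --> [HH:]MM:SS.mmm"; the hours field is optional.
TIMING_RE = re.compile(
    r"^(?:(\d{2,}):)?(\d{2}):(\d{2})\.(\d{3})\s+-->\s+"
    r"(?:(\d{2,}):)?(\d{2}):(\d{2})\.(\d{3})"
)


def _to_ms(hours: Optional[str], minutes: str, seconds: str, millis: str) -> int:
    """Convert the captured timestamp parts to whole milliseconds."""
    total_seconds = int(hours or 0) * 3600 + int(minutes) * 60 + int(seconds)
    return total_seconds * 1000 + int(millis)


def parse_cues(vtt_text: str) -> List[Tuple[int, int, str]]:
    """Return (start_ms, end_ms, text) for each cue found in vtt_text."""
    cues = []
    # Blocks are separated by blank lines; blocks without a timing line
    # (such as the WEBVTT header) are skipped.
    for block in re.split(r"\n\s*\n", vtt_text.strip()):
        lines = block.splitlines()
        for i, line in enumerate(lines):
            match = TIMING_RE.match(line.strip())
            if match:
                parts = match.groups()
                text = "\n".join(lines[i + 1:])
                cues.append((_to_ms(*parts[:4]), _to_ms(*parts[4:]), text))
                break
    return cues
```

Times are returned as integer milliseconds to avoid floating-point rounding when comparing or shifting cues.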

Prerequisites to get VTT files in Python

We will use the following dependencies to complete this tutorial:

The AssemblyAI Python SDK
A free AssemblyAI API key, which can be copied from your AssemblyAI dashboard

All code in this blog post is also available on GitHub under the subtitle generation guide of the AssemblyAI cookbook repository.

If you want a working code example that transcribes and generates subtitles for YouTube videos, you can check out this Google Colab.

Project Setup for VTT generation

Make sure that you have Python 3.8 or newer already installed on your system and create a new folder for your project. Then, navigate to your project directory in your terminal and create a new virtual environment:

# Mac/Linux:
python3 -m venv venv
. venv/bin/activate

# Windows:
python -m venv venv
.\venv\Scripts\activate.bat

Install the AssemblyAI Python package:

pip install assemblyai

Set your AssemblyAI API key as an environment variable named ASSEMBLYAI_API_KEY. You can get a free API key here.

# Mac/Linux:
export ASSEMBLYAI_API_KEY=<YOUR_KEY>

# Windows:
set ASSEMBLYAI_API_KEY=<YOUR_KEY>
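
If you'd like your script to fail early with a clear message when the variable is missing, a small check like the one below may help. This is only a sketch: get_api_key is a hypothetical helper of our own, not part of the AssemblyAI SDK, and it takes an optional mapping so that it can be tried out without touching your real environment.

```python
import os
from typing import Mapping, Optional


def get_api_key(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the key stored in ASSEMBLYAI_API_KEY or raise a clear error."""
    env = os.environ if env is None else env
    key = env.get("ASSEMBLYAI_API_KEY", "").strip()
    if not key:
        raise RuntimeError("ASSEMBLYAI_API_KEY is not set in this environment")
    return key
```

Calling get_api_key() at the top of a script surfaces a missing key before any request is sent.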

Create the VTT files for videos in Python

AssemblyAI can produce subtitles as both SRT and VTT files. We also have a guide on how to generate SRT files for videos.

First, you'll need the video file for which you want to create the VTT file. You can use either a path to a local file or a URL to a publicly accessible file. The AssemblyAI API supports most common audio and video file formats, so you can submit either audio or video files to generate VTT files. You’ll find all supported file formats in our API documentation.

Create a new file named main.py and insert the following code:

import assemblyai as aai

# If the API key is not set as an environment variable named
# ASSEMBLYAI_API_KEY, you can also set it like this:
# aai.settings.api_key = "YOUR_API_KEY"

transcriber = aai.Transcriber()
transcript = transcriber.transcribe("https://storage.googleapis.com/aai-web-samples/aai-overview.mp4")

vtt = transcript.export_subtitles_vtt()

# Save it to a file
with open("subtitle_example.vtt", "w", encoding="utf-8") as f:
    f.write(vtt)

The above code first imports the assemblyai Python package. Next, it instantiates the Transcriber object, which is used to call AssemblyAI's transcription service.

Calling transcriber.transcribe() starts the transcription process on the specified video file. Here, we used a remote URL, but you can replace it with the path to your own file.

When the transcription is finished, the result is stored in the transcript object. Calling transcript.export_subtitles_vtt() then generates the subtitles in VTT format.

Lastly, the script dumps the VTT string into a .vtt file. You can modify the output filename to your liking.
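
The script above assumes that the transcription succeeds. As a hedged sketch, the export and save steps could be wrapped in a helper that stops when something went wrong. The name save_vtt is hypothetical, and the helper assumes that a failed transcript exposes a non-empty error attribute; check the SDK documentation for the exact error-handling interface.

```python
def save_vtt(transcript, path: str) -> str:
    """Export a finished transcript as VTT, write it to path, and return it."""
    # Assumption: a failed transcript carries a non-empty `error` message.
    error = getattr(transcript, "error", None)
    if error:
        raise RuntimeError(f"Transcription failed: {error}")
    vtt = transcript.export_subtitles_vtt()
    # Write UTF-8 explicitly so non-ASCII captions are saved the same way
    # regardless of the platform's default encoding.
    with open(path, "w", encoding="utf-8") as f:
        f.write(vtt)
    return vtt
```

With the script above, this could be called as save_vtt(transcript, "subtitle_example.vtt") in place of the final lines.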

Running the subtitle generation code

Ensure that the main.py file is saved, that your virtual environment is still activated, and that your API key is set. Navigate to the project directory in a terminal and run the main.py file with the following command:

python main.py

Once the script has finished executing, you should see a new .vtt file with the generated subtitles in your folder.

Specify the number of characters per caption in the VTT file

You can also customize the maximum number of characters per caption by specifying the chars_per_caption parameter. For example:

vtt = transcript.export_subtitles_vtt(chars_per_caption=32)

The captions are then limited to 32 characters:

WEBVTT

00:00.170 --> 00:01.754
AssemblyAI is building AI

00:01.802 --> 00:03.514
systems to help you build AI
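
To double-check that a generated file respects the limit you chose, you could measure the caption text yourself. The sketch below uses a hypothetical helper, longest_caption_line, that looks at each line of caption text. It assumes every cue starts with a timing line and does not handle NOTE or STYLE blocks.

```python
def longest_caption_line(vtt_text: str) -> int:
    """Return the length of the longest caption text line in vtt_text."""
    longest = 0
    for line in vtt_text.splitlines():
        stripped = line.strip()
        # Skip blank lines, the header, and cue timing lines.
        if not stripped or stripped.startswith("WEBVTT") or "-->" in stripped:
            continue
        longest = max(longest, len(stripped))
    return longest


if __name__ == "__main__":
    with open("subtitle_example.vtt", encoding="utf-8") as f:
        print(longest_caption_line(f.read()))
```

For the two captions shown above, the longest line is 28 characters, which is within the limit of 32.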

Wrapping Up

In this tutorial, you’ve learned how to generate VTT files for videos with AssemblyAI using Python.

To learn more about AI and what you can do with transcripts and AssemblyAI’s Speech AI models, check out other content on our blog or YouTube channel, or feel free to join us on Twitter or Discord to stay in the loop when we release new content.