Skip to main content

Generating subtitles for videos

You can export your completed transcripts in SRT or VTT format, which can be used for subtitles and closed captions in videos. Once your transcript status shows as completed, you can make a request to the appropriate endpoint to export your transcript in SRT or VTT format.

In this guide, we'll walk through the process of generating subtitles for videos using the AssemblyAI API.

SRT and VTT Subtitles Formats

How do SRT files work?

SRT (SubRip Text) files are commonly used to store subtitles for videos. The format is plain text, and it contains the timing information for each subtitle along with the subtitle text itself.

Here's a breakdown of how the format works:

  • Each subtitle entry consists of an index number, start time, end time, and text.
  • The index number is a sequential number starting from 1.
  • The start and end times are given in the format hours:minutes:seconds,milliseconds and are separated by -->.
  • The text that follows the timing information is the subtitle text itself, and it may span multiple lines.
  • Entries are separated by a blank line.

How do VTT formats work?

WEBVTT (Web Video Text Tracks), which is a standard format for displaying timed text tracks (such as subtitles or captions) within HTML5 video.

The syntax is similar to SRT but has some differences:

  • The file should begin with the header WEBVTT.
  • Timing is done with a period (.) separating seconds and milliseconds instead of a comma (,).
  • No blank lines are needed between entries.
  • No index numbers are required.
  • This format is supported by many modern browsers and can be used with the HTML5 <track> element to add subtitles to a <video> element.

If you're planning to upload this file to YouTube, you should be able to use it just like an SRT file. YouTube supports various subtitle formats, including WEBVTT.

Get started

Before we begin, make sure you have an AssemblyAI account and an API key. You can sign up for a free account and get your API key from your dashboard.

The entire source code of this guide can be viewed here.

Step-by-step instructions

  1. 1

    Create a new file and import the necessary libraries for making an HTTP request.

  2. 2

    Set up the API endpoint and headers. The headers should include your API token.

  3. 3
  4. 4

    Use the upload_url returned by the AssemblyAI API to create a JSON payload containing the audio_url parameter.

  5. 5

    Make a POST request to the AssemblyAI API endpoint with the payload and headers.

  6. 6

    After making the request, you'll receive an ID for the transcription. Use it to poll the API every few seconds to check the status of the transcript job. Once the status is completed, you can retrieve the transcript from the API response.

  7. 7

    Export your complete transcripts in SRT or VTT format, to be plugged into a video player for subtitles and closed captions.

    To get the subtitles, send a GET request to the /v2/transcript/:id/:subtitle_format endpoint. The format is either srt or vtt.

Advanced usage

You can also customize the maximum number of characters per caption using the chars_per_caption URL parameter in your API requests to either the SRT or VTT endpoints. For example, adding ?chars_per_caption=32 to the SRT endpoint URL ensures that each caption has no more than 32 characters.

API/Model Reference


The AssemblyAI produces subtitles as both .srt and .vtt files. These are standard subtitle formats, and can be used with videos both on and off the web. For example, after generating your subtitle file, you can add it to a Mux video using their platform, or you can use ffmpeg to embed it in a local video file. Subtitle formats contain plain text, so you can import these formatted captions to most video editors, or fine-tune them as needed.