Transcribe and generate subtitles for YouTube videos with Node.js
Learn how to transcribe YouTube videos and generate SRT subtitles with Node.js and AssemblyAI in this easy-to-follow guide.



Programmatically transcribing YouTube videos with Node.js lets you build scalable workflows for searchable data, subtitles, and LLM analysis.
This guide shows you how to extract YouTube audio, transcribe it with AssemblyAI's Voice AI models, and generate subtitles—with production-ready error handling.
How YouTube transcription works with Node.js
YouTube transcription requires extracting the audio stream using yt-dlp, then sending it to AssemblyAI's speech-to-text API for processing.
Prerequisites:
- Node.js 18+
- Python 3.7+
- Free AssemblyAI account
Step 1: Set up your development environment
Install Node.js 18+ and initialize your project:
mkdir transcribe-youtube-video
cd transcribe-youtube-video
npm init -y
Open the package.json file and add "type": "module" to the list of properties.
{
...
"type": "module",
...
}
This enables ES Module syntax for importing and exporting modules.
Install dependencies:
- assemblyai - JavaScript SDK for API interaction
- youtube-dl-exec - yt-dlp wrapper for YouTube audio extraction
- tsx - TypeScript execution without setup
npm install --save assemblyai youtube-dl-exec tsx
Install Python 3.7+ (required by youtube-dl-exec).
Get your AssemblyAI API key from the dashboard or sign up for free:
# Mac/Linux:
export ASSEMBLYAI_API_KEY=<your_key>
# Windows:
set ASSEMBLYAI_API_KEY=<your_key>
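Since a missing key only surfaces later as a confusing authentication error, it can help to fail fast at startup. Here's a minimal sketch; the requireEnv helper is our own, not part of any SDK:

```typescript
// Fail fast if a required environment variable is missing,
// instead of getting a confusing 401 from the API later.
function requireEnv(name: string): string {
  const value = process.env[name];
  if (!value) {
    throw new Error(`Missing required environment variable: ${name}`);
  }
  return value;
}

// Usage: const apiKey = requireEnv("ASSEMBLYAI_API_KEY");
```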
You can find the full source code of this application in this GitHub repository.
Step 2: Retrieve the audio of a YouTube video
To transcribe a video with AssemblyAI, you need either a public URL to the media file or to upload the file to AssemblyAI. Since only the audio track is needed to generate a transcript, you can use a public URL to the audio track or upload just the audio to AssemblyAI.
YouTube stores the audio and video of a YouTube video in separate files, which you can retrieve in different formats and quality levels. The easiest way to list the available formats is the yt-dlp CLI tool. The youtube-dl-exec module wraps yt-dlp so you can retrieve this information from Node.js.
Create a file called index.ts and add the following code:
import { youtubeDl } from "youtube-dl-exec";
const youtubeVideoUrl = "https://www.youtube.com/watch?v=wtolixa9XTg";
console.log("Retrieving audio URL from YouTube video");
const videoInfo = await youtubeDl(youtubeVideoUrl, {
dumpSingleJson: true,
preferFreeFormats: true,
addHeader: ["referer:youtube.com", "user-agent:googlebot"],
});
const audioUrl = videoInfo.formats.reverse().find(
(format) => format.resolution === "audio only" && format.ext === "m4a",
)?.url;
if (!audioUrl) {
throw new Error("No audio only format found");
}
console.log("Audio URL retrieved successfully");
console.log("Audio URL:", audioUrl);
This script extracts the optimal audio stream:
- videoInfo contains all available formats
- Formats are ordered worst to best quality
- The script finds the highest-quality M4A audio-only stream
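The selection logic can also be isolated into a small pure function. Note that Array.prototype.reverse() mutates the array in place, so this sketch (with types simplified from what yt-dlp actually returns) copies the array first:

```typescript
interface AudioFormat {
  resolution: string;
  ext: string;
  url: string;
}

// Walk the formats from best to worst quality and return the first
// audio-only M4A stream. Copy before reversing so the caller's array
// is left untouched.
function pickBestAudioUrl(formats: AudioFormat[]): string | undefined {
  return [...formats]
    .reverse()
    .find((f) => f.resolution === "audio only" && f.ext === "m4a")?.url;
}
```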
Step 3: Transcribe YouTube videos with AssemblyAI
With the audio URL, you can now send the file to AssemblyAI for transcription. Our AI models will process the audio and return a highly accurate transcript.
First, import the AssemblyAI client at the top of your file:
import { AssemblyAI } from 'assemblyai';
Next, initialize the client with your API key and call the transcribe method. We'll wrap this in a try...catch block to prepare for error handling, which we'll cover in a later step.
const client = new AssemblyAI({ apiKey: process.env.ASSEMBLYAI_API_KEY });
try {
console.log('Transcribing audio...');
const transcript = await client.transcripts.transcribe({ audio: audioUrl });
if (transcript.status === 'error') {
console.error('Transcription failed:', transcript.error);
process.exit(1); // top-level `return` isn't allowed in ES modules
}
console.log('Transcription complete!');
// The transcript text is available in transcript.text
} catch (error) {
console.error('An error occurred during transcription:', error);
}
Step 4: Save the transcript and subtitles
Now that you have a transcript, you can save the transcript text to a file. Add the following import which you'll need to save files to disk.
import { writeFile } from "fs/promises"
Then add the following code to save the transcript to disk.
console.log("Saving transcript to file");
await writeFile("./transcript.txt", transcript.text!);
console.log("Transcript saved to file transcript.txt");
You can also generate SRT subtitles from the transcript and save them to disk like this:
console.log("Retrieving transcript as SRT subtitles");
const subtitles = await transcript.subtitles("srt");
await writeFile("./subtitles.srt", subtitles);
console.log("Subtitles saved to file subtitles.srt");
WebVTT Subtitle Format
WebVTT (Web Video Text Tracks) is another widely supported and popular subtitle format. To generate WebVTT, replace "srt" with "vtt" and save the file with the .vtt extension.
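Swapping the format is a one-word change: transcript.subtitles("vtt"). A valid WebVTT file always begins with the WEBVTT signature line, so a quick sanity check before saving could look like this (the helper is our own sketch, not part of the SDK):

```typescript
// Every valid WebVTT file begins with the "WEBVTT" signature,
// optionally preceded by a UTF-8 byte order mark.
function looksLikeWebVtt(content: string): boolean {
  return content.replace(/^\uFEFF/, "").startsWith("WEBVTT");
}
```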
Step 5: Handle errors and edge cases
Handle common failure scenarios: private videos, network issues, and API errors.
Expand the try...catch block:
try {
// Code to retrieve audio URL...
// Code to transcribe...
} catch (error) {
console.error('An unexpected error occurred:', error);
}
Check for invalid URLs or issues from yt-dlp before attempting to transcribe. If audioUrl is null, a suitable audio format wasn't found—stop execution to avoid sending a bad request to the AssemblyAI API.
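One cheap pre-flight check is to validate the URL shape before invoking yt-dlp at all. Here's a minimal sketch; the accepted hostnames are an assumption, so adjust them for your needs:

```typescript
// Reject obviously malformed input before spawning yt-dlp.
function isYouTubeUrl(input: string): boolean {
  let url: URL;
  try {
    url = new URL(input);
  } catch {
    return false; // not a parseable URL at all
  }
  const hosts = ["www.youtube.com", "youtube.com", "m.youtube.com", "youtu.be"];
  return hosts.includes(url.hostname);
}
```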
Performance considerations for large videos
Long videos require asynchronous processing:
- API returns transcript ID immediately
- Job processes in background
- Configure webhooks instead of polling
- More efficient for production systems
You can also balance speed and cost by choosing an appropriate audio quality. While the highest quality audio stream often yields the best results, a lower-bitrate stream might be sufficient for your use case and process faster.
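As a sketch of that trade-off, you could select the lowest-bitrate audio-only stream above some floor instead of the highest-quality one. yt-dlp reports the average audio bitrate in the abr field; the 48 kbps floor used here is an arbitrary assumption:

```typescript
interface FormatInfo {
  resolution: string;
  ext: string;
  url: string;
  abr?: number; // average audio bitrate in kbps, as reported by yt-dlp
}

// Pick the cheapest audio-only M4A stream whose bitrate is still at or
// above `minAbr`, to speed up downloads of very long videos.
function pickEconomyAudioUrl(formats: FormatInfo[], minAbr = 48): string | undefined {
  return formats
    .filter((f) => f.resolution === "audio only" && f.ext === "m4a")
    .filter((f) => typeof f.abr === "number" && f.abr >= minAbr)
    .sort((a, b) => a.abr! - b.abr!)[0]?.url;
}
```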
Step 6: Run the script
To run the script, go back to your shell and run:
npx tsx index.ts
After a short wait, you'll see the transcript text and subtitles appear on your disk. Longer videos take longer to process.
Bonus: Summarize a YouTube video with the LLM Gateway
AssemblyAI makes it very easy to build generative AI features on your transcripts using our LLM Gateway.
You can write a prompt to instruct an LLM on what to do with a given transcript, and the model will generate a response.
For example, you can write a prompt that tells the LLM to summarize the video using bullet points.
console.log("Summarizing the video with the LLM Gateway...");
const prompt = "Summarize this video using bullet points";
const llmResponse = await client.chat.completions.create({
model: "claude-sonnet-4-5-20250929",
messages: [
{ role: "user", content: `${prompt}\n\nTranscript: ${transcript.text}` }
],
max_tokens: 1000
});
console.log(prompt + ": " + llmResponse.choices[0].message.content);
You can find the various supported models listed in the LLM Gateway documentation.
If you add this code and run the script again, you'll get a generated summary that looks like this:
Here is a bullet point summary of the key points from the video:
- Lay the math foundation with Khan Academy courses on basics like linear algebra, calculus, statistics etc. Come back later to fill gaps.
- Learn Python - do a beginner and intermediate level course to get a solid base. Python skills are essential.
- Learn key machine learning Python libraries like NumPy, Pandas, Matplotlib. Follow a crash course for each.
- Do Andrew Ng's machine learning specialization course on Coursera. Recently updated to include Python and libraries like NumPy, Scikit-learn, TensorFlow.
- Implement some algorithms from scratch in Python to better understand concepts. An updated ML from scratch course will be released.
- Do Kaggle's intro and intermediate ML courses to learn more data preparation with Pandas.
- Practice on Kaggle with competitions and datasets. Helps build portfolio and CV. Focus on learning over winning.
- Specialize as per industry requirements in CV, NLP etc. Look at job descriptions. Consider learning MLOps.
- Start a blog to write tutorials and share your projects. Helps cement knowledge and build CV.
- Useful books referenced: Hands-On Machine Learning with Scikit-Learn, Keras and TensorFlow, Machine Learning Yearning by Andrew Ng.
Building production-ready YouTube transcription
You've now built a functional Node.js script to transcribe YouTube videos and generate subtitles. This is a great foundation for building more advanced applications.
Companies like Veed, Searchie, and Podchaser rely on AssemblyAI's core transcription engine to power features for their users. To move from this script to a production-grade application, focus on building a robust job queue, managing API keys securely, and creating a user interface around this core functionality.
With our reliable infrastructure handling the complexities of Voice AI, your team can focus on building the unique features that deliver value to your customers. To see what's possible, try our API for free and explore our documentation for advanced features like Speaker Diarization, Sentiment Analysis, and more.
Frequently asked questions about YouTube transcription with Node.js
How do I handle private or region-locked YouTube videos?
Pass authentication cookies to yt-dlp through its options, making sure you comply with YouTube's terms of service.
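As a sketch, the options object from Step 2 can be extended with yt-dlp's --cookies flag; youtube-dl-exec camelCases flags, so the key is cookies, and the file path here is a placeholder:

```typescript
// Build the youtube-dl-exec options, attaching a cookies file only when
// one is provided. The file must be in the Netscape cookies format that
// yt-dlp's --cookies flag expects.
function buildYtDlpOptions(cookiesPath?: string) {
  const options: Record<string, unknown> = {
    dumpSingleJson: true,
    preferFreeFormats: true,
    addHeader: ["referer:youtube.com", "user-agent:googlebot"],
  };
  if (cookiesPath) {
    options.cookies = cookiesPath;
  }
  return options;
}

// Usage: await youtubeDl(youtubeVideoUrl, buildYtDlpOptions("cookies.txt"));
```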
What's the best audio format for transcription accuracy?
M4A offers a good balance of quality and file size for accurate transcription.
How can I process multiple YouTube videos concurrently?
Use Promise.all() with wrapped transcription functions, staying within your account's concurrency limits.
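A bare Promise.all() fires every request at once. A small batching helper keeps the number of in-flight jobs bounded; this is a minimal sketch, and a queue library such as p-limit would handle it more gracefully:

```typescript
// Process items in batches of `limit`, awaiting each batch before
// starting the next. Results are returned in input order.
async function mapWithLimit<T, R>(
  items: T[],
  limit: number,
  fn: (item: T) => Promise<R>,
): Promise<R[]> {
  const results: R[] = [];
  for (let i = 0; i < items.length; i += limit) {
    const batch = items.slice(i, i + limit);
    results.push(...(await Promise.all(batch.map(fn))));
  }
  return results;
}
```

You would then call mapWithLimit(videoUrls, 5, transcribeOne), where transcribeOne wraps Steps 2 through 4 for a single URL (both names are hypothetical).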