Transcribe and generate subtitles for YouTube videos with Node.js
Learn how to transcribe YouTube videos and generate SRT subtitles with Node.js and AssemblyAI in this easy-to-follow guide.



Programmatically transcribing YouTube videos with Node.js lets you build scalable workflows for searchable data, subtitles, and LLM analysis.
This guide shows you how to extract YouTube audio, transcribe it with AssemblyAI's Voice AI models, and generate subtitles—with production-ready error handling.
How YouTube transcription works with Node.js
YouTube transcription requires extracting the audio stream using yt-dlp, then sending it to AssemblyAI's speech-to-text API for processing.
Prerequisites:
- Node.js 18+
- Python 3.7+
- Free AssemblyAI account
Step 1: Set up your development environment
Install Node.js 18+ and initialize your project:
mkdir transcribe-youtube-video
cd transcribe-youtube-video
npm init -y
Open the package.json file and add "type": "module" to the list of properties.
{
...
"type": "module",
...
}
This enables ES Module syntax for importing and exporting modules.
Install dependencies:
- assemblyai - JavaScript SDK for API interaction
- youtube-dl-exec - yt-dlp wrapper for YouTube audio extraction
- tsx - TypeScript execution without setup
npm install --save assemblyai youtube-dl-exec tsx
Install Python 3.7+ (required by youtube-dl-exec).
Get your AssemblyAI API key from the dashboard or sign up for free:
# Mac/Linux:
export ASSEMBLYAI_API_KEY=<your_key>
# Windows:
set ASSEMBLYAI_API_KEY=<your_key>
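Since a missing key only surfaces later as a confusing authentication error, it can help to fail fast at startup. Here's a minimal sketch; the requireEnv helper is our own, not part of any SDK:

```typescript
// Fail fast if a required environment variable is missing,
// instead of getting a confusing 401 from the API later.
function requireEnv(name: string): string {
  const value = process.env[name];
  if (!value) {
    throw new Error(`Missing required environment variable: ${name}`);
  }
  return value;
}

// Usage: const apiKey = requireEnv("ASSEMBLYAI_API_KEY");
```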
You can find the full source code of this application in this GitHub repository.
Step 2: Retrieve the audio of a YouTube video
To transcribe a video with AssemblyAI, you need either a public URL to the media file or to upload the file to AssemblyAI. Since only the audio track is needed to generate a transcript, you can use a public URL to the audio track or upload just the audio to AssemblyAI.
YouTube stores the audio and video of a YouTube video in separate files, which you can retrieve in different formats and quality levels. The easiest way to list the available formats is the yt-dlp CLI tool. The youtube-dl-exec module wraps yt-dlp so you can retrieve this information from Node.js.
Create a file called index.ts and add the following code:
import { youtubeDl } from "youtube-dl-exec";
const youtubeVideoUrl = "https://www.youtube.com/watch?v=wtolixa9XTg";
console.log("Retrieving audio URL from YouTube video");
const videoInfo = await youtubeDl(youtubeVideoUrl, {
dumpSingleJson: true,
preferFreeFormats: true,
addHeader: ["referer:youtube.com", "user-agent:googlebot"],
});
const audioUrl = videoInfo.formats.reverse().find(
(format) => format.resolution === "audio only" && format.ext === "m4a",
)?.url;
if (!audioUrl) {
throw new Error("No audio only format found");
}
console.log("Audio URL retrieved successfully");
console.log("Audio URL:", audioUrl);
This script extracts the optimal audio stream:
- videoInfo contains all available formats
- Formats are ordered worst to best quality
- The script finds the highest-quality M4A audio-only stream
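The selection logic can also be isolated into a small pure function. Note that Array.prototype.reverse() mutates the array in place, so this sketch (with types simplified from what yt-dlp actually returns) copies the array first:

```typescript
interface AudioFormat {
  resolution: string;
  ext: string;
  url: string;
}

// Walk the formats from best to worst quality and return the first
// audio-only M4A stream. Copy before reversing so the caller's array
// is left untouched.
function pickBestAudioUrl(formats: AudioFormat[]): string | undefined {
  return [...formats]
    .reverse()
    .find((f) => f.resolution === "audio only" && f.ext === "m4a")?.url;
}
```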
Step 3: Transcribe YouTube videos with AssemblyAI
With the audio URL, you can now send the file to AssemblyAI for transcription. Our AI models will process the audio and return a highly accurate transcript.
First, import the AssemblyAI client at the top of your file:
import { AssemblyAI } from 'assemblyai';
Next, initialize the client with your API key and call the transcribe method. We'll wrap this in a try...catch block to prepare for error handling, which we'll cover in a later step.
const client = new AssemblyAI({ apiKey: process.env.ASSEMBLYAI_API_KEY });
try {
console.log('Transcribing audio...');
const transcript = await client.transcripts.transcribe({ audio: audioUrl });
if (transcript.status === 'error') {
console.error('Transcription failed:', transcript.error);
process.exit(1); // top-level `return` isn't allowed in ES modules
}
console.log('Transcription complete!');
// The transcript text is available in transcript.text
} catch (error) {
console.error('An error occurred during transcription:', error);
}
Step 4: Save the transcript and subtitles
Now that you have a transcript, you can save the transcript text to a file. Add the following import which you'll need to save files to disk.
import { writeFile } from "fs/promises"
Then add the following code to save the transcript to disk.
console.log("Saving transcript to file");
await writeFile("./transcript.txt", transcript.text!);
console.log("Transcript saved to file transcript.txt");
You can also generate SRT subtitles from the transcript and save them to disk like this:
console.log("Retrieving transcript as SRT subtitles");
const subtitles = await transcript.subtitles("srt");
await writeFile("./subtitles.srt", subtitles);
console.log("Subtitles saved to file subtitles.srt");
WebVTT Subtitle Format
WebVTT (Web Video Text Tracks) is another widely supported and popular subtitle format. To generate WebVTT, replace "srt" with "vtt" and save the file with the .vtt extension.
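Swapping the format is a one-word change: transcript.subtitles("vtt"). A valid WebVTT file always begins with the WEBVTT signature line, so a quick sanity check before saving could look like this (the helper is our own sketch, not part of the SDK):

```typescript
// Every valid WebVTT file begins with the "WEBVTT" signature,
// optionally preceded by a UTF-8 byte order mark.
function looksLikeWebVtt(content: string): boolean {
  return content.replace(/^\uFEFF/, "").startsWith("WEBVTT");
}
```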
Step 5: Handle errors and edge cases
Handle common failure scenarios: private videos, network issues, and API errors.
Expand the try...catch block:
try {
// Code to retrieve audio URL...
// Code to transcribe...
} catch (error) {
console.error('An unexpected error occurred:', error);
}
Check for invalid URLs or issues from yt-dlp before attempting to transcribe. If audioUrl is null, a suitable audio format wasn't found—stop execution to avoid sending a bad request to the AssemblyAI API.
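One cheap pre-flight check is to validate the URL shape before invoking yt-dlp at all. Here's a minimal sketch; the accepted hostnames are an assumption, so adjust them for your needs:

```typescript
// Reject obviously malformed input before spawning yt-dlp.
function isYouTubeUrl(input: string): boolean {
  let url: URL;
  try {
    url = new URL(input);
  } catch {
    return false; // not a parseable URL at all
  }
  const hosts = ["www.youtube.com", "youtube.com", "m.youtube.com", "youtu.be"];
  return hosts.includes(url.hostname);
}
```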
Performance considerations for large videos
Long videos require asynchronous processing:
- API returns transcript ID immediately
- Job processes in background
- Configure webhooks instead of polling
- More efficient for production systems
You can also balance speed and cost by choosing an appropriate audio quality. While the highest quality audio stream often yields the best results, a lower-bitrate stream might be sufficient for your use case and process faster.
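As a sketch of that trade-off, you could select the lowest-bitrate audio-only stream above some floor instead of the highest-quality one. yt-dlp reports the average audio bitrate in the abr field; the 48 kbps floor used here is an arbitrary assumption:

```typescript
interface FormatInfo {
  resolution: string;
  ext: string;
  url: string;
  abr?: number; // average audio bitrate in kbps, as reported by yt-dlp
}

// Pick the cheapest audio-only M4A stream whose bitrate is still at or
// above `minAbr`, to speed up downloads of very long videos.
function pickEconomyAudioUrl(formats: FormatInfo[], minAbr = 48): string | undefined {
  return formats
    .filter((f) => f.resolution === "audio only" && f.ext === "m4a")
    .filter((f) => typeof f.abr === "number" && f.abr >= minAbr)
    .sort((a, b) => a.abr! - b.abr!)[0]?.url;
}
```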
Step 6: Run the script
To run the script, go back to your shell and run:
npx tsx index.ts
After a short wait, you'll see the transcript text and subtitles appear on your disk. Longer videos take longer to process.
Bonus: Summarize a YouTube video with the LLM Gateway
AssemblyAI makes it very easy to build generative AI features on your transcripts using our LLM Gateway.
You can write a prompt to instruct an LLM on what to do with a given transcript, and the model will generate a response.
For example, you can write a prompt that tells the LLM to summarize the video using bullet points.
console.log("Summarizing the video with the LLM Gateway...");
const prompt = "Summarize this video using bullet points";
const llmResponse = await client.chat.completions.create({
model: "claude-sonnet-4-5-20250929",
messages: [
{ role: "user", content: `${prompt}\n\nTranscript: ${transcript.text}` }
],
max_tokens: 1000
});
console.log(prompt + ": " + llmResponse.choices[0].message.content);
You can find the various supported models listed in the LLM Gateway documentation.
If you add this code and run the script again, you'll get a generated summary that looks like this:
Here is a bullet point summary of the key points from the video:
- Lay the math foundation with Khan Academy courses on basics like linear algebra, calculus, statistics etc. Come back later to fill gaps.
- Learn Python - do a beginner and intermediate level course to get a solid base. Python skills are essential.
- Learn key machine learning Python libraries like NumPy, Pandas, Matplotlib. Follow a crash course for each.
- Do Andrew Ng's machine learning specialization course on Coursera. Recently updated to include Python and libraries like NumPy, Scikit-learn, TensorFlow.
- Implement some algorithms from scratch in Python to better understand concepts. An updated ML from scratch course will be released.
- Do Kaggle's intro and intermediate ML courses to learn more data preparation with Pandas.
- Practice on Kaggle with competitions and datasets. Helps build portfolio and CV. Focus on learning over winning.
- Specialize as per industry requirements in CV, NLP etc. Look at job descriptions. Consider learning MLOps.
- Start a blog to write tutorials and share your projects. Helps cement knowledge and build CV.
- Useful books referenced: Hands-On Machine Learning with Scikit-Learn, Keras and TensorFlow, Machine Learning Yearning by Andrew Ng.
Building production-ready YouTube transcription
You've now built a functional Node.js script to transcribe YouTube videos and generate subtitles. This is a great foundation for building more advanced applications.
Companies like Veed, Searchie, and Podchaser rely on AssemblyAI's core transcription engine to power features for their users. To move from this script to a production-grade application, focus on building a robust job queue, managing API keys securely, and creating a user interface around this core functionality.
With our reliable infrastructure handling the complexities of Voice AI, your team can focus on building the unique features that deliver value to your customers. To see what's possible, try our API for free and explore our documentation for advanced features like Speaker Diarization, Sentiment Analysis, and more.
Frequently asked questions about YouTube transcription with Node.js
How do I handle private or region-locked YouTube videos?
Pass authentication cookies to yt-dlp through its options, making sure you comply with YouTube's terms of service.
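As a sketch, the options object from Step 2 can be extended with yt-dlp's --cookies flag; youtube-dl-exec camelCases flags, so the key is cookies, and the file path here is a placeholder:

```typescript
// Build the youtube-dl-exec options, attaching a cookies file only when
// one is provided. The file must be in the Netscape cookies format that
// yt-dlp's --cookies flag expects.
function buildYtDlpOptions(cookiesPath?: string) {
  const options: Record<string, unknown> = {
    dumpSingleJson: true,
    preferFreeFormats: true,
    addHeader: ["referer:youtube.com", "user-agent:googlebot"],
  };
  if (cookiesPath) {
    options.cookies = cookiesPath;
  }
  return options;
}

// Usage: await youtubeDl(youtubeVideoUrl, buildYtDlpOptions("cookies.txt"));
```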
What's the best audio format for transcription accuracy?
M4A offers a good balance of quality and file size for accurate transcription.
How can I process multiple YouTube videos concurrently?
Use Promise.all() with wrapped transcription functions, staying within your account's concurrency limits.
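A bare Promise.all() fires every request at once. A small batching helper keeps the number of in-flight jobs bounded; this is a minimal sketch, and a queue library such as p-limit would handle it more gracefully:

```typescript
// Process items in batches of `limit`, awaiting each batch before
// starting the next. Results are returned in input order.
async function mapWithLimit<T, R>(
  items: T[],
  limit: number,
  fn: (item: T) => Promise<R>,
): Promise<R[]> {
  const results: R[] = [];
  for (let i = 0; i < items.length; i += limit) {
    const batch = items.slice(i, i + limit);
    results.push(...(await Promise.all(batch.map(fn))));
  }
  return results;
}
```

You would then call mapWithLimit(videoUrls, 5, transcribeOne), where transcribeOne wraps Steps 2 through 4 for a single URL (both names are hypothetical).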