How to integrate spoken audio into LlamaIndex.TS using AssemblyAI

Learn how to apply LLMs to speech with AssemblyAI's new integration for LlamaIndex.TS, using TypeScript and Node.js.


In this tutorial, you'll learn how to create an application that can answer your questions about an audio file, using LlamaIndex.TS and AssemblyAI's new integration with LlamaIndex.

LlamaIndex is a flexible data framework for connecting custom data sources to Large Language Models (LLMs).
With LlamaIndex, you can easily store and index your data, and then use it with LLMs to build applications.
However, LLMs only operate on textual data and do not understand what is said in audio files. That's why we contributed an integration with AssemblyAI to LlamaIndex.TS so developers can transcribe audio files to text in LlamaIndex.

Prerequisites for working with audio data in LlamaIndex

To follow along, you'll need the following:

- Node.js installed on your system.
- An AssemblyAI API key. Sign up for an AssemblyAI account to get an API key for free, or sign in to your existing AssemblyAI account, then grab your API key from the dashboard.
- An OpenAI API key, since the application uses OpenAI's LLM to answer questions.

Set up your TypeScript Node.js project

Next, if you don't already have a Node.js application, create a new directory and run npm init.

mkdir transcript-llamaindex
cd transcript-llamaindex
npm init -y

Next, add tsx, a tool to easily execute TypeScript:

npm install --save-dev tsx

Create a file named index.ts and add the following code:

console.log("Hello World!")

Then, run npx tsx index.ts to execute your TypeScript:

npx tsx index.ts

The output should be Hello World!.

Configure environment variables

The application you're building needs your OpenAI API key and AssemblyAI API key. A common way to do this is by storing them into a .env file and loading the .env file at the start of your application.

First, create a new .env file with the following contents:

OPENAI_API_KEY=<YOUR_OPENAI_API_KEY>
ASSEMBLYAI_API_KEY=<YOUR_ASSEMBLYAI_API_KEY>

Then replace <YOUR_OPENAI_API_KEY> with your OpenAI API key, and <YOUR_ASSEMBLYAI_API_KEY> with your AssemblyAI API key.

Then, install the dotenv module and the @types/node module for some TypeScript types that you'll use:

npm install dotenv --save
npm i --save-dev @types/node

Update the index.ts code to load the environment variables and print them to the console:

import 'dotenv/config';

console.log(process.env);

Finally, run the code to see the result:

npx tsx index.ts

You'll see all your environment variables printed out, including the OPENAI_API_KEY and ASSEMBLYAI_API_KEY.
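Under the hood, the dotenv/config import simply reads the .env file and copies each KEY=VALUE pair into process.env. The following is a simplified sketch of that parsing step; parseEnv is a hypothetical name, and the real dotenv also handles quoted values, escapes, and multiline entries:

```typescript
// Simplified sketch of what importing "dotenv/config" does.
// `parseEnv` is a hypothetical helper; the real dotenv additionally
// supports quoting, escaping, and multiline values.
function parseEnv(contents: string): Record<string, string> {
  const vars: Record<string, string> = {};
  for (const line of contents.split("\n")) {
    const trimmed = line.trim();
    // Skip blank lines and comments
    if (!trimmed || trimmed.startsWith("#")) continue;
    const eq = trimmed.indexOf("=");
    if (eq === -1) continue;
    vars[trimmed.slice(0, eq).trim()] = trimmed.slice(eq + 1).trim();
  }
  return vars;
}

// dotenv then merges the parsed pairs into process.env:
// Object.assign(process.env, parseEnv(fileContents));
```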


Make sure never to check your API keys and other secrets into source control, whether by hard-coding them or by accidentally committing your .env file. Keep those secrets safe and secure!
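In the same spirit, it's worth failing fast at startup when a key is missing, rather than hitting a confusing API error later. A minimal sketch of such a guard; requireEnv is a hypothetical helper, not part of dotenv or llamaindex:

```typescript
// Minimal sketch: fail fast when a required secret is missing.
// `requireEnv` is a hypothetical helper, not part of any library used here.
function requireEnv(name: string): string {
  const value = process.env[name];
  if (!value) {
    throw new Error(`Missing required environment variable: ${name}`);
  }
  return value;
}

// Usage at the top of your application:
// const openAiKey = requireEnv("OPENAI_API_KEY");
// const assemblyAiKey = requireEnv("ASSEMBLYAI_API_KEY");
```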

Add LlamaIndex.TS and create a Q&A chain

First, add LlamaIndex.TS using NPM or your preferred package manager:

npm install llamaindex --save

Next, update the index.ts code with the following question and answers (Q&A) sample:

import "dotenv/config";
import { Document, VectorStoreIndex } from "llamaindex";

async function main() {
  // Transcribe audio and store transcript in documents
  const docs = [new Document({ text: "I'm using LlamaIndex to work with LLMs" })];

  // Split text and create embeddings. Store them in a VectorStoreIndex
  const index = await VectorStoreIndex.fromDocuments(docs);

  // Query the index
  const queryEngine = index.asQueryEngine();
  const response = await queryEngine.query("What LLM framework am I using?");

  // Output response
  console.log(response.toString());
}

main().catch(console.error);
The code above hardcodes a document to provide context to the LLM. The VectorStoreIndex creates embeddings for the hardcoded document and indexes it. The application then queries the index with the question "What LLM framework am I using?". The query engine pulls the relevant context from the index and sends the document and question to OpenAI's LLM. The response from the LLM is then printed to the console.
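To make the "pull the relevant context" step concrete, here is a simplified, self-contained sketch of how a vector index picks the best-matching document using cosine similarity. This is an illustration of the idea only, not LlamaIndex's actual implementation: real systems use learned embeddings (such as OpenAI's), while this toy embed function just builds a bag-of-words vector over a fixed vocabulary.

```typescript
// Illustration only: how a vector index retrieves relevant context.
// Real indexes use learned embedding models; this toy `embed` counts
// occurrences of words from a small fixed vocabulary.
const vocabulary = ["llamaindex", "llm", "framework", "audio", "transcript"];

function embed(text: string): number[] {
  const words = text.toLowerCase().split(/\W+/);
  return vocabulary.map((v) => words.filter((w) => w === v).length);
}

function cosineSimilarity(a: number[], b: number[]): number {
  const dot = a.reduce((sum, x, i) => sum + x * b[i], 0);
  const norm = (v: number[]) => Math.sqrt(v.reduce((s, x) => s + x * x, 0));
  return dot / (norm(a) * norm(b) || 1);
}

// Return the document whose embedding is most similar to the query's.
function retrieve(query: string, docs: string[]): string {
  const queryVec = embed(query);
  let best = docs[0];
  let bestScore = -Infinity;
  for (const doc of docs) {
    const score = cosineSimilarity(queryVec, embed(doc));
    if (score > bestScore) {
      bestScore = score;
      best = doc;
    }
  }
  return best;
}
```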

If you run the code using npx tsx index.ts, the output will be something like "You are using the LlamaIndex framework.".

In the upcoming section, you'll use AssemblyAI's transcript reader to transcribe an audio file and put the transcript in a document instead of the hard-coded document above.

Transcribe audio with the LlamaIndex AudioTranscriptReader

The llamaindex module comes with the new readers from AssemblyAI. To use the readers, import them from the module. Update the index.ts file with the code below:

import "dotenv/config";
import { VectorStoreIndex, AudioTranscriptReader } from "llamaindex";

async function main() {
  const reader = new AudioTranscriptReader();
  // Transcribe audio and store transcript in documents
  const docs = await reader.loadData({
    // You can also use a local path to an audio file, like ./sports_injuries.mp3
    audio: "https://storage.googleapis.com/aai-docs-samples/sports_injuries.mp3",
    language_code: "en_us",
  });

  // Split text and create embeddings. Store them in a VectorStoreIndex
  const index = await VectorStoreIndex.fromDocuments(docs);

  // Query the index
  const queryEngine = index.asQueryEngine();
  const response = await queryEngine.query("What is a runner's knee?");

  // Output response
  console.log(response.toString());
}

main().catch(console.error);

The AudioTranscriptReader uses AssemblyAI's API to transcribe the audio file passed to audio. The AudioTranscriptReader.loadData() function creates an array with a single document containing the transcript text.
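Since the transcript arrives as one long document, the "split text" step performed by the VectorStoreIndex can be pictured as breaking the transcript into fixed-size chunks before embedding them. A simplified sketch follows; chunkText is a hypothetical name, and LlamaIndex's actual text splitter is sentence-aware and supports overlap between chunks:

```typescript
// Simplified sketch of the transcript-splitting step.
// `chunkText` is a hypothetical helper; LlamaIndex's real splitter is
// sentence-aware and configurable (chunk size, overlap, separators).
function chunkText(text: string, maxChars: number): string[] {
  const chunks: string[] = [];
  const words = text.split(/\s+/).filter(Boolean);
  let current = "";
  for (const word of words) {
    // Start a new chunk when adding this word would exceed the limit
    if (current && current.length + word.length + 1 > maxChars) {
      chunks.push(current);
      current = word;
    } else {
      current = current ? `${current} ${word}` : word;
    }
  }
  if (current) chunks.push(current);
  return chunks;
}
```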


You can also pass a local file path to the audio property, and the loader will upload the file to AssemblyAI's CDN for you.

Now that the AudioTranscriptReader transcribed the sports injuries audio file, the LLM can give a well-informed answer about the audio.

Run the application again using npx tsx index.ts. The output should look like this:

Runner's knee is a condition characterized by pain behind or around the kneecap. It is caused by overuse, muscle imbalance, and inadequate stretching. Symptoms include pain under or around the kneecap and pain when walking.


Check out the LlamaIndex.TS integration docs to learn how to use the additional readers provided by AssemblyAI.


In this tutorial, you learned about the new AssemblyAI integration that was added to LlamaIndex.TS.
You created a Q&A application that can answer questions about an audio file, by leveraging AssemblyAI's speech-to-text APIs, OpenAI, and LlamaIndex.TS.