As Automatic Speech Recognition (ASR) technologies become faster and more accurate, demand for automatic transcription is higher than ever. Whether used for accessibility, like automatically generating subtitles for YouTube videos, or for convenience, like automatically transcribing lectures and meetings, ASR finds use cases at all levels across all industries.
In this tutorial, we will learn how to create a JavaScript Audio Transcript application powered by AssemblyAI’s Speech-to-Text API. The code in this tutorial is available on both GitHub and Replit.
Dependencies
- Node.js ^16.14.2
- Axios ^0.26.1
- AssemblyAI API Key
What Are We Building?
We will be using Node.js and Axios to build a simple JavaScript Audio Transcript app. When provided with a URL that specifies the location of an audio file, the app will transcribe the audio to text and store the result inside of a variable as a string.
The final result will look like this:

Step 1 - Creating A New App
We only need to write one JavaScript file to create our application. Before we start coding, we need two things installed: Node.js and Axios. If Node.js is not installed, download and install it from the Node.js website, keeping the default options during installation. We'll install Axios in a moment.
To keep things organized, create a new folder called javascript-audio-transcript-aai and navigate into it:
mkdir javascript-audio-transcript-aai
cd javascript-audio-transcript-aai
Now create a JavaScript file called audio.js and install Axios:
touch audio.js
npm install axios
This process automatically creates a package.json file inside of the project folder, which should now look like this:

Now that the application folder is properly set up, we can move on to setting up credential authentication for AssemblyAI's API.
Step 2 - Setting Up Authentication
AssemblyAI’s free Speech-to-Text API is what will actually generate the transcripts of our audio files. To use the API, we will require an API key. An API key is sent along with an API request to ensure that the account making the request is authorized to use the API.
Create an AssemblyAI account and go to the AssemblyAI dashboard. On the dashboard, under Your API Key, copy the value of the API key. Note that this key is uniquely associated with your account and should be kept secret.

Next, in the audio.js file, write the following code to handle authentication with AssemblyAI, replacing the value "ASSEMBLYAI-API-KEY" with the value of the key just copied from the dashboard.
const axios = require("axios")

const audioURL = "https://bit.ly/3yxKEIY"
const APIKey = "ASSEMBLYAI-API-KEY"

const assembly = axios.create({
  baseURL: "https://api.assemblyai.com/v2",
  headers: {
    authorization: APIKey,
    "content-type": "application/json",
  },
})
Code Breakdown
- const axios = require("axios") imports Axios.
- const audioURL = "https://bit.ly/3yxKEIY" creates the audioURL variable that links to the audio file we use for this example.
- const APIKey = "ASSEMBLYAI-API-KEY" holds the value of your AssemblyAI API Key. Make sure to replace it with the actual API Key value from the AssemblyAI dashboard.
- const assembly = axios.create({... handles authentication for every request that is sent to AssemblyAI. That is, it tells AssemblyAI that we are authorized to access its APIs. This variable holds the baseURL for the transcription API endpoint and the value of the APIKey. It also sets the content-type to application/json. We create this variable so that we don’t need to write a lot of repetitive code for each of our requests.
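Since the API key should be kept secret, one option is to read it from an environment variable instead of writing it into audio.js. The following is a minimal sketch of that idea, not part of the original app: the variable name ASSEMBLYAI_API_KEY and the helper names resolveApiKey and buildClientConfig are assumptions made for illustration.

```javascript
// Reads the API key from an environment variable instead of hardcoding it.
// ASSEMBLYAI_API_KEY is an assumed variable name, not one the API mandates.
function resolveApiKey(env = process.env) {
  const key = env.ASSEMBLYAI_API_KEY
  if (typeof key !== "string" || key.trim() === "") {
    throw new Error("Set the ASSEMBLYAI_API_KEY environment variable first")
  }
  return key.trim()
}

// Builds the same options object that audio.js passes to axios.create().
function buildClientConfig(apiKey) {
  return {
    baseURL: "https://api.assemblyai.com/v2",
    headers: {
      authorization: apiKey,
      "content-type": "application/json",
    },
  }
}
```

With these helpers, the client could be created with axios.create(buildClientConfig(resolveApiKey())), and the app started with the key set in the shell environment rather than stored in the source file.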
Testing Authentication
Now that we have API authentication set up, we can test if we are able to authenticate with AssemblyAI and actually get a response.
Add the following code at the bottom of audio.js, under the existing code:
...
assembly
  .post("/transcript", {
    audio_url: audioURL
  })
  .then((res) => console.log(res.data))
  .catch((err) => console.error(err));
This code uses Axios to send a POST request containing our audioURL to the AssemblyAI Speech-to-Text API. The API returns an object, which we log to the console.
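If the request fails, for instance because the API key is wrong, the catch handler above prints the whole error object, which can be hard to read. As a hedged sketch, a small helper (describeRequestError is a hypothetical name) could turn an Axios error into a one-line message. It relies on the response, request and message properties that Axios sets on its error objects; the shape of the response body is not assumed and is simply printed as JSON.

```javascript
// Turns an Axios error into a short, readable message.
// describeRequestError is a hypothetical helper, not part of Axios.
function describeRequestError(err) {
  if (err.response) {
    // The server answered with a non-2xx status code.
    const body = JSON.stringify(err.response.data)
    return `API responded with status ${err.response.status}: ${body}`
  }
  if (err.request) {
    // The request was sent, but no response arrived.
    return "No response received from the API"
  }
  // Something went wrong before the request could be sent.
  return `Request could not be sent: ${err.message}`
}
```

It could be used in place of the plain logging call, for example .catch((err) => console.error(describeRequestError(err))).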
Open a terminal, navigate to the project folder, and run the JavaScript file with the following command:
node audio.js
The output in the console is the response of the POST request and should look similar to this:
{
  id: 'o49rt8v5ea-bfae-4b39-b276-9eb69d172ade',
  language_model: 'assemblyai_default',
  acoustic_model: 'assemblyai_default',
  language_code: 'en_us',
  status: 'queued',
  audio_url: 'https://bit.ly/3yxKEIY',
  text: null,
  words: null,
  utterances: null,
  ...
}
In this object, we can see a few elements that we will need:
- id holds the transcript id. We need this id to check the status of our transcript.
- status tells us the current status of our transcription. We can send a GET request to the AssemblyAI Speech-to-Text API to check it. The status values we will encounter in this tutorial are "queued", "processing" and "completed".
- audio_url holds audioURL, which specifies the location of the audio file we wish to transcribe.
- text will hold the transcribed text once the transcription is finished.
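The fields listed above can be pulled out of the response with object destructuring. The helper below is a sketch with a hypothetical name, summarizeTranscript; it only relies on the id, status, audio_url and text fields shown in the sample response.

```javascript
// Picks the fields we care about out of a transcript response object.
// summarizeTranscript is a hypothetical helper used for illustration only.
function summarizeTranscript(data) {
  const { id, status, audio_url: audioUrl, text } = data
  const isFinished = status === "completed"
  return {
    id,
    status,
    audioUrl,
    isFinished,
    // text is only meaningful once the transcription is finished
    text: isFinished ? text : null,
  }
}
```

Calling summarizeTranscript(res.data) inside the .then() handler would give an object whose isFinished flag is false for the "queued" response shown above.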
Remove the code segment we just added for testing, since we no longer need it.
Step 3 - Automating JavaScript Audio Transcript Fetching
To get our transcript, we need to send requests to check if the status of our transcription has changed from “processing” to “completed”.
Rather than manually sending GET requests until we receive a “completed” status, we will automate the check. There are different ways to do this, such as using webhooks or polling on an interval. In this example, we cover the interval method.
We will add some code to audio.js first and then break it down below. This is what the complete code looks like now:
const axios = require("axios")

const audioURL = "https://bit.ly/3yxKEIY"
const APIKey = "ASSEMBLYAI-API-KEY"
const refreshInterval = 5000

// Setting up the AssemblyAI headers
const assembly = axios.create({
  baseURL: "https://api.assemblyai.com/v2",
  headers: {
    authorization: APIKey,
    "content-type": "application/json",
  },
})

const getTranscript = async () => {
  // Sends the audio file to AssemblyAI for transcription
  const response = await assembly.post("/transcript", {
    audio_url: audioURL,
  })

  // Interval for checking transcript completion
  const checkCompletionInterval = setInterval(async () => {
    const transcript = await assembly.get(`/transcript/${response.data.id}`)
    const transcriptStatus = transcript.data.status

    if (transcriptStatus !== "completed") {
      console.log(`Transcript Status: ${transcriptStatus}`)
    } else if (transcriptStatus === "completed") {
      console.log("\nTranscription completed!\n")
      let transcriptText = transcript.data.text
      console.log(`Your transcribed text:\n\n${transcriptText}`)
      clearInterval(checkCompletionInterval)
    }
  }, refreshInterval)
}

getTranscript()
Code Breakdown
Let’s break the code down to better understand how this JavaScript Audio Transcript app works:
- const refreshInterval = 5000 sets the time interval in milliseconds between transcription status checks.
- const getTranscript is an asynchronous JavaScript function that contains the app's logic. Since we are working with an API that returns data to us over the network, it is good practice to use an asynchronous function.
- const response sends a POST request containing our audioURL to the AssemblyAI Speech-to-Text API and stores its response.
- const checkCompletionInterval stores the interval ID returned by the built-in setInterval JavaScript method, which repeatedly runs the asynchronous callback function we pass to it.
- const transcript = await assembly.get(`/transcript/${response.data.id}`) sends a GET request to the AssemblyAI Speech-to-Text API using the id of our transcript, which we insert into the template string with ${response.data.id}. The response is stored inside the transcript variable.
- const transcriptStatus = transcript.data.status stores the completion status of our transcription.
- Next, the callback contains an if/else statement that evaluates whether the transcription has been completed. In this example, the value of transcriptStatus is either “queued”, “processing” or “completed”.
  - if (transcriptStatus !== "completed") checks whether the transcript is still unfinished and, if so, logs the transcriptStatus to the console.
  - else if (transcriptStatus === "completed") evaluates whether the transcriptStatus is equal to “completed”, which indicates that the transcription has finished. If it has:
    - console.log("\nTranscription completed!\n") logs the completion status to the console.
    - transcriptText = transcript.data.text sets the value of transcriptText to the returned transcript.
    - console.log(`Your transcribed text:\n\n${transcriptText}`) logs the transcribed text to the console. The two \n\n characters are newline characters that give the output some space when printed in the terminal.
    - Finally, clearInterval(checkCompletionInterval) clears the interval, stopping it from running again.
- The refreshInterval variable is passed as the second argument to setInterval. This argument is required here and tells setInterval how often, in milliseconds, to run the callback. To change this time, simply change the value of the refreshInterval variable at the top of the code.
- All the way at the bottom of our code, we call getTranscript() to run the app.
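The interval approach above keeps polling forever if the transcription never reaches “completed”, and it only logs the text rather than handing it back to the caller. As a hedged alternative sketch, the polling can be written as a loop that returns the text and gives up after a set number of checks. The names waitForTranscript, fetchTranscript, intervalMs and maxAttempts are hypothetical, and the "error" status with its error field is an assumption about the API that this tutorial does not cover.

```javascript
// Resolves after the given number of milliseconds.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms))

// Polls until the transcript is completed and resolves with its text.
// fetchTranscript is any async function that returns the transcript object,
// e.g. async () => (await assembly.get(`/transcript/${id}`)).data
async function waitForTranscript(
  fetchTranscript,
  { intervalMs = 5000, maxAttempts = 60 } = {}
) {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const transcript = await fetchTranscript()

    if (transcript.status === "completed") {
      return transcript.text
    }
    // Assumption: a failed transcription reports status "error"
    // together with an error message field.
    if (transcript.status === "error") {
      throw new Error(`Transcription failed: ${transcript.error}`)
    }

    console.log(`Transcript Status: ${transcript.status}`)
    await sleep(intervalMs)
  }
  throw new Error(`Transcript not ready after ${maxAttempts} checks`)
}
```

Because each check is awaited before the next one starts, two status requests never overlap, and the caller can store the result in a variable with const text = await waitForTranscript(...) inside a try/catch block.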
If we run the app now, we will see an output similar to the first example image at the beginning of this article.
Conclusion
Using AssemblyAI’s Speech-to-Text API makes it easy to create a JavaScript Audio Transcript app. There are a lot of different ways to implement Speech Recognition in an application nowadays - check out our other articles on how to do Speech-to-Text with Golang, or how to implement React Speech Recognition.