How to Build a JavaScript Audio Transcript Application

Learn how to build a JavaScript Audio Transcript application using Node.js and Axios with this step-by-step beginner's guide.

With the ongoing development of fast and accurate Automatic Speech Recognition technologies, the demand for automatic transcription is higher than it has ever been. Whether it is used for accessibility, like automatically generating subtitles for YouTube videos, or for convenience, like automatically transcribing lectures and meetings, Automatic Speech Recognition finds use cases across all industries.

In this tutorial, we will learn how to create a JavaScript Audio Transcript application powered by AssemblyAI’s Speech-to-Text API. The code in this tutorial is available on both GitHub and Replit.

What Are We Building?

We will be using Node.js and Axios to build a simple JavaScript Audio Transcript app. When provided with a URL that specifies the location of an audio file, the app will transcribe the audio to text and store the result inside of a variable as a string.

The final result will look like this:

Final Result

Step 1 - Creating A New App

We will only need to write one JavaScript file to create our application. Before we start coding, we need two things installed: the Node.js runtime and the Axios package. If Node.js is not installed, download and install it from the Node.js website, leaving everything at its default settings during the installation. We'll install Axios momentarily.

To keep things organized, create a new folder called javascript-audio-transcript-aai and navigate into it:

mkdir javascript-audio-transcript-aai
cd javascript-audio-transcript-aai

Now create a JavaScript file called audio.js, initialize the project with npm, and install Axios:

touch audio.js
npm init -y
npm install axios

Running npm init creates a package.json file inside of the project folder, and installing Axios adds a node_modules folder. The project folder should now look like this:

Project Folder

Now that the application folder is properly set up, we can move on to setting up credential authentication for AssemblyAI's API.

Step 2 - Setting Up Authentication

AssemblyAI’s free Speech-to-Text API is what will actually generate the transcripts of our audio files. To use the API, we will require an API key. An API key is sent along with an API request to ensure that the account making the request is authorized to use the API.

Create an AssemblyAI account and go to the AssemblyAI dashboard. On the dashboard, under Your API Key, copy the value of the API key. Note that this key should be kept secret and is uniquely associated with your account.

AssemblyAI Dashboard

Next, in the audio.js file, write the following code to handle authentication with AssemblyAI, replacing the value of "ASSEMBLYAI-API-KEY" with the API key you just copied from the dashboard.

const axios = require("axios")
const audioURL = "https://bit.ly/3yxKEIY"
const APIKey = "ASSEMBLYAI-API-KEY"

const assembly = axios.create({
  baseURL: "https://api.assemblyai.com/v2",
  headers: {
    authorization: APIKey,
    "content-type": "application/json",
  },
})
audio.js

Code Breakdown

  • const axios = require("axios") imports Axios.
  • const audioURL = "https://bit.ly/3yxKEIY" creates the audioURL variable that links to the audio file we use for this example.
  • const APIKey = "ASSEMBLYAI-API-KEY" holds the value of your AssemblyAI API key. Make sure to replace it with the actual API key value from the AssemblyAI dashboard (see the sketch after this list for a way to avoid hard-coding the key).
  • const assembly = axios.create({... handles authentication for every request that is sent to AssemblyAI. That is, it tells AssemblyAI that we are authorized to access its APIs. This variable holds the baseURL for the transcription API endpoint and the value of the APIKey. It also sets the content-type to application/json. We create this variable so that we don’t need to write a lot of repetitive code for each of our requests.
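
Since the API key should stay secret, you may prefer not to hard-code it into audio.js at all. A minimal alternative, not used in the rest of this tutorial, is to read the key from an environment variable; the variable name ASSEMBLYAI_API_KEY below is just a name chosen for illustration:

// Read the key from an environment variable instead of hard-coding it,
// e.g. run the script with: ASSEMBLYAI_API_KEY="your-key-here" node audio.js
const APIKey = process.env.ASSEMBLYAI_API_KEY

if (!APIKey) {
  console.error("Please set the ASSEMBLYAI_API_KEY environment variable")
  process.exit(1)
}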

Testing Authentication

Now that we have API authentication set up, we can test if we are able to authenticate with AssemblyAI and actually get a response.

Add the following code at the bottom of audio.js, under the existing code:

...
assembly
    .post("/transcript", {
        audio_url: audioURL
    })
    .then((res) => console.log(res.data))
    .catch((err) => console.error(err));
audio.js

This code uses Axios to send a POST request containing our audioURL to the AssemblyAI Speech-to-Text API, then logs the returned response object to the console.

Open a terminal, navigate to the project folder, and run the JavaScript file with the following command:

node audio.js

The output in the console is the response of the POST request and should look similar to this:

{
  id: 'o49rt8v5ea-bfae-4b39-b276-9eb69d172ade',
  language_model: 'assemblyai_default',
  acoustic_model: 'assemblyai_default',
  language_code: 'en_us',
  status: 'queued',
  audio_url: 'https://bit.ly/3yxKEIY',
  text: null,
  words: null,
  utterances: null,
...
}
Response

In this object, we can see a few fields that we will need:

  • id holds the transcript id. We need this id to check the status of our transcript.
  • status tells us the current status of our transcription. To check it later, we can send a GET request to the AssemblyAI Speech-to-Text API (as sketched after this list). The main status values are "queued", "processing", and "completed".
  • audio_url holds audioURL, which specifies the location of the audio file we wish to transcribe.
  • text will hold the transcribed text once the transcription is finished.
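
For illustration only (you don't need to add this to audio.js), a one-off status check could look roughly like the sketch below, where transcriptId is a placeholder for the id value returned by the POST request above:

assembly
    .get(`/transcript/${transcriptId}`)
    .then((res) => console.log(res.data.status))
    .catch((err) => console.error(err));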

Now remove the test code we added in this section from audio.js, since we don’t need it any longer.

Step 3 - Automating JavaScript Audio Transcript Fetching

To get our transcript, we need to send requests to check if the status of our transcription has changed from “processing” to “completed”.

Rather than doing this manually by repeatedly executing GET requests until we receive a “completed” status, we will automate the checking process. There are different ways to automate it, such as using webhooks or using an interval. In this example, we cover the interval method.
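
Webhooks are beyond the scope of this tutorial, but for reference, AssemblyAI can notify a server you control once a transcript is ready if you include a webhook_url in the transcription request. The sketch below shows the general idea; the URL is a placeholder for an endpoint you would need to host yourself:

// Alternative approach (not used in this tutorial): instead of polling,
// ask AssemblyAI to call your server when the transcript is done.
// (Response handling omitted in this sketch.)
assembly.post("/transcript", {
  audio_url: audioURL,
  webhook_url: "https://example.com/assemblyai-webhook", // placeholder endpoint
})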

We will add some code to audio.js first and then break it down below. This is what the complete code looks like now:

const axios = require("axios")
const audioURL = "https://bit.ly/3yxKEIY"
const APIKey = "ASSEMBLYAI-API-KEY"
const refreshInterval = 5000

// Setting up the AssemblyAI headers
const assembly = axios.create({
  baseURL: "https://api.assemblyai.com/v2",
  headers: {
    authorization: APIKey,
    "content-type": "application/json",
  },
})

const getTranscript = async () => {
  // Sends the audio file to AssemblyAI for transcription
  const response = await assembly.post("/transcript", {
    audio_url: audioURL,
  })

  // Interval for checking transcript completion
  const checkCompletionInterval = setInterval(async () => {
    const transcript = await assembly.get(`/transcript/${response.data.id}`)
    const transcriptStatus = transcript.data.status

    if (transcriptStatus !== "completed") {
      console.log(`Transcript Status: ${transcriptStatus}`)
    } else if (transcriptStatus === "completed") {
      console.log("\nTranscription completed!\n")
      let transcriptText = transcript.data.text
      console.log(`Your transcribed text:\n\n${transcriptText}`)
      clearInterval(checkCompletionInterval)
    }
  }, refreshInterval)
}

getTranscript()
audio.js

Code Breakdown

Let’s break the code down to better understand how this JavaScript Audio Transcript app works:

  • const refreshInterval = 5000 sets the time interval in milliseconds between performing transcription status checks.
  • const getTranscript is an asynchronous JavaScript function that contains the code's logic. Since we are working with an API that returns data to us, it is a good practice to use an asynchronous function.
    • const response sends a POST request containing our audioURL to the AssemblyAI Speech-to-Text API and stores its response.
    • const checkCompletionInterval stores the interval created with the built-in setInterval JavaScript method; its callback is another asynchronous function.
      • const transcript = await assembly.get(`/transcript/${response.data.id}`) sends a GET request to the AssemblyAI Speech-to-Text API using the id of our transcript, which we insert into the URL with the template literal placeholder ${response.data.id}. The response is stored inside the transcript variable.
      • const transcriptStatus = transcript.data.status stores the completion status of our transcription.
      • Next, the interval callback contains an if/else statement that evaluates whether the transcription has been completed. The value of transcriptStatus is either “queued”, “processing”, or “completed” (see the note after this breakdown for handling a failed transcription).
        • if (transcriptStatus !== "completed") checks whether the transcript has been completed yet and, if it hasn’t, logs the transcriptStatus to the console.
        • else if (transcriptStatus === "completed") checks whether transcriptStatus equals “completed”, which indicates that the transcription has finished. If it has:
          • console.log("\nTranscription completed!\n") logs the completion status to the console.
          • transcriptText = transcript.data.text sets the value of transcriptText to the returned transcript.
          • console.log(`Your transcribed text:\n\n${transcriptText}`) logs the transcribed text to the console. The \n\n sequence consists of two newline characters that give the output some space when printed in the terminal.
          • Finally, clearInterval(checkCompletionInterval) clears the interval, stopping it from continuing.
    • The refreshInterval variable is passed to setInterval as its second argument. This argument is required and tells the setInterval method how often, in milliseconds, to run the code inside the callback. To change this time, simply change the value of the refreshInterval variable at the top of the code.
  • All the way at the bottom of our code, we call getTranscript() to actually run the function.
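
One thing the code above does not handle is a failed transcription: AssemblyAI can also report a status of "error" along with an error message, in which case the interval would keep polling forever. As a possible extension (an assumption on our part, not part of the original tutorial; check AssemblyAI's API documentation for the current list of status values), the if/else inside the interval callback could be rearranged like this:

    if (transcriptStatus === "completed") {
      console.log("\nTranscription completed!\n")
      console.log(`Your transcribed text:\n\n${transcript.data.text}`)
      clearInterval(checkCompletionInterval)
    } else if (transcriptStatus === "error") {
      // Stop polling and report the failure using the error field
      // returned by the AssemblyAI API.
      console.error(`\nTranscription failed: ${transcript.data.error}\n`)
      clearInterval(checkCompletionInterval)
    } else {
      console.log(`Transcript Status: ${transcriptStatus}`)
    }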

If we run the app now, we will see an output similar to the first example image at the beginning of this article.

Conclusion

Using AssemblyAI’s Speech-to-Text API makes it easy to create a JavaScript Audio Transcript app. There are a lot of different ways to implement Speech Recognition in an application nowadays - check out our other articles on how to do Speech-to-Text with Golang, or how to implement React Speech Recognition.