Transcribe streaming audio from a microphone in TypeScript

Learn how to transcribe streaming audio in TypeScript.

Overview

By the end of this tutorial, you’ll be able to transcribe audio from your microphone in TypeScript.

Supported languages

Streaming Speech-to-Text is only available for English.

Before you begin

To complete this tutorial, you need:

  • Node.js installed. You can check whether it's installed with node -v.
  • TypeScript installed. You can check whether it's installed with tsc -v.
  • An AssemblyAI account with a credit card set up.

Here’s the full sample code for what you’ll build in this tutorial:

import { Readable } from 'stream'
import { AssemblyAI } from 'assemblyai'
import recorder from 'node-record-lpcm16'

const run = async () => {
  const client = new AssemblyAI({
    apiKey: '<YOUR_API_KEY>'
  })

  const transcriber = client.realtime.transcriber({
    sampleRate: 16_000
  })

  transcriber.on('open', ({ sessionId }) => {
    console.log(`Session opened with ID: ${sessionId}`)
  })

  transcriber.on('error', (error) => {
    console.error('Error:', error)
  })

  transcriber.on('close', (code, reason) =>
    console.log('Session closed:', code, reason)
  )

  transcriber.on('transcript', (transcript) => {
    if (!transcript.text) {
      return
    }

    if (transcript.message_type === 'PartialTranscript') {
      console.log('Partial:', transcript.text)
    } else {
      console.log('Final:', transcript.text)
    }
  })

  try {
    console.log('Connecting to real-time transcript service')
    await transcriber.connect()

    console.log('Starting recording')
    const recording = recorder.record({
      channels: 1,
      sampleRate: 16_000,
      audioType: 'wav' // Linear PCM
    })

    Readable.toWeb(recording.stream()).pipeTo(transcriber.stream())

    // Stop recording and close connection using Ctrl-C.
    process.on('SIGINT', async function () {
      console.log()
      console.log('Stopping recording')
      recording.stop()

      console.log('Closing real-time transcript connection')
      await transcriber.close()

      process.exit()
    })
  } catch (error) {
    console.error(error)
  }
}

run()

Step 1: Install dependencies and import packages

1. Run npm init to create an NPM package, and then install the necessary packages via NPM:

$ npm install assemblyai node-record-lpcm16
2. Create a file called main.ts and import the packages at the top of your file:

import { Readable } from 'stream'
import { AssemblyAI } from 'assemblyai'
import recorder from 'node-record-lpcm16'

Step 2: Configure the API key

In this step, you’ll create an SDK client and configure it to use your API key.

1. In your file, define an async function.

const run = async () => {

}
2. Browse to API Keys in your dashboard, and then copy your API key.

3. In your async function, create an SDK client and configure it to use your API key by replacing <YOUR_API_KEY> with your copied API key.

const run = async () => {
  const client = new AssemblyAI({
    apiKey: '<YOUR_API_KEY>'
  })
}

Step 3: Create a streaming service

1. Create a new streaming service from the AssemblyAI client and add it to your async function. If you don’t set a sample rate, it defaults to 16 kHz.

const transcriber = client.realtime.transcriber({
  sampleRate: 16_000
})
Sample rate

The sampleRate is the number of audio samples per second, measured in hertz (Hz). Higher sample rates produce higher-quality audio, which can lead to better transcripts, but also increase the amount of data sent over the network.

We recommend the following sample rates:

  • Minimum quality: 8_000 (8 kHz)
  • Medium quality: 16_000 (16 kHz)
  • Maximum quality: 48_000 (48 kHz)
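To see why higher sample rates mean more network traffic: for single-channel, 16-bit PCM, each sample is 2 bytes, so the payload scales linearly with the sample rate. A quick sketch:

```typescript
// Bytes per second of raw audio for mono, 16-bit (2-byte) PCM.
const bytesPerSecond = (sampleRate: number): number => sampleRate * 2

console.log(bytesPerSecond(8_000))  // 16000 bytes/s at minimum quality
console.log(bytesPerSecond(16_000)) // 32000 bytes/s at medium quality
console.log(bytesPerSecond(48_000)) // 96000 bytes/s at maximum quality
```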
2. Add the following event handlers to your async function.

transcriber.on('open', ({ sessionId }) => {
  console.log(`Session opened with ID: ${sessionId}`)
})

transcriber.on('error', (error) => {
  console.error('Error:', error)
})

transcriber.on('close', (code, reason) =>
  console.log('Session closed:', code, reason)
)
3. Create another event handler to handle incoming transcripts. The real-time transcriber returns two types of transcripts: partial and final.

  • Partial transcripts are returned as the audio is being streamed to AssemblyAI.
  • Final transcripts are returned when the service detects a pause in speech.
End of utterance controls

You can configure the silence threshold for automatic utterance detection and programmatically force the end of an utterance to immediately get a Final transcript.
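As a sketch, both controls are method calls on an open transcriber. The method names below are assumptions that may vary by SDK version, so verify them against your installed assemblyai package:

```typescript
// Assumed SDK methods; check your assemblyai version before relying on them.
// Wait 700 ms of silence before automatically ending an utterance:
transcriber.configureEndUtteranceSilenceThreshold(700)

// Immediately end the current utterance and receive a Final transcript:
transcriber.forceEndUtterance()
```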

transcriber.on('transcript', (transcript) => {
  if (!transcript.text) {
    return
  }

  if (transcript.message_type === 'PartialTranscript') {
    console.log('Partial:', transcript.text)
  } else {
    console.log('Final:', transcript.text)
  }
})

You can also use the on('transcript.partial') and on('transcript.final') callbacks to handle partial and final transcripts separately.
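For example, the branching on message_type above could be replaced with two dedicated handlers:

```typescript
// Handle each transcript type with its own callback
// instead of checking message_type inside a single handler.
transcriber.on('transcript.partial', (transcript) => {
  console.log('Partial:', transcript.text)
})

transcriber.on('transcript.final', (transcript) => {
  console.log('Final:', transcript.text)
})
```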

Step 4: Connect the streaming service

Streaming Speech-to-Text uses WebSockets to stream audio to AssemblyAI. This requires first establishing a connection to the API.

try {
  console.log('Connecting to real-time transcript service')
  await transcriber.connect()
} catch (error) {
  console.error(error)
}

Step 5: Record audio from microphone

1. Create a new microphone stream after the transcriber connects. The sampleRate must be the same value you configured on the real-time transcriber.

console.log('Starting recording')
const recording = recorder.record({
  channels: 1,
  sampleRate: 16_000,
  audioType: 'wav' // Linear PCM
})
Audio data format

The recorder formats the audio data for you. If you want to stream data from elsewhere, make sure that your audio data is in the following format:

  • Single channel
  • 16-bit signed integer PCM or mu-law encoding

By default, the Streaming STT service expects PCM16-encoded audio. If you want to use mu-law encoding, see Specifying the encoding.
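If you do stream mu-law audio, the encoding is set when you create the transcriber. This is a sketch: the 'pcm_mulaw' value is an assumption, so confirm the exact option name in the encoding documentation:

```typescript
// Assumed encoding value; PCM16 is the default when encoding is omitted.
const transcriber = client.realtime.transcriber({
  sampleRate: 8_000,
  encoding: 'pcm_mulaw'
})
```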

2. Pipe the recording stream to the real-time stream to send the audio for transcription.

Readable.toWeb(recording.stream()).pipeTo(transcriber.stream())
Send audio buffers

If you don’t use streams, you can also send buffers of audio data using transcriber.sendAudio(buffer).
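For example, instead of piping, you could forward each chunk from the recorder's data events (a sketch; assumes the transcriber is already connected and the chunks match the configured sample rate and encoding):

```typescript
// Forward raw audio buffers manually instead of using streams.
recording.stream().on('data', (chunk: Buffer) => {
  transcriber.sendAudio(chunk)
})
```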

Step 6: Disconnect the real-time service

When you are done, disconnect the transcriber to close the connection.

process.on('SIGINT', async function () {
  console.log()
  console.log('Stopping recording')
  recording.stop()

  console.log('Closing real-time transcript connection')
  await transcriber.close()

  process.exit()
})

To run the program, use tsc main.ts to compile the TypeScript file to JavaScript, and then run it with node main.js.

Next steps

To learn more about Streaming Speech-to-Text, see the Streaming Speech-to-Text documentation.

Need some help?

If you get stuck, or have any other questions, we’d love to help you out. Contact our support team at support@assemblyai.com or create a support ticket.