Transcribe streaming audio | AssemblyAI

Overview

By the end of this tutorial, you’ll be able to transcribe audio from your microphone.

Supported languages

Streaming Speech-to-Text is only available for English.

Before you begin

To complete this tutorial, you need:

Python or Node.js installed.
An AssemblyAI API key (see Step 2).
A microphone connected to your computer.

Here’s the full sample code of what you’ll build in this tutorial:

import logging
from typing import Type

import assemblyai as aai
from assemblyai.streaming.v3 import (
    BeginEvent,
    StreamingClient,
    StreamingClientOptions,
    StreamingError,
    StreamingEvents,
    StreamingParameters,
    StreamingSessionParameters,
    TerminationEvent,
    TurnEvent,
)

api_key = "<YOUR_API_KEY>"

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)


def on_begin(self: Type[StreamingClient], event: BeginEvent):
    print(f"Session started: {event.id}")


def on_turn(self: Type[StreamingClient], event: TurnEvent):
    print(f"{event.transcript} ({event.end_of_turn})")

    if event.end_of_turn and not event.turn_is_formatted:
        params = StreamingSessionParameters(
            format_turns=True,
        )

        self.set_params(params)


def on_terminated(self: Type[StreamingClient], event: TerminationEvent):
    print(
        f"Session terminated: {event.audio_duration_seconds} seconds of audio processed"
    )


def on_error(self: Type[StreamingClient], error: StreamingError):
    print(f"Error occurred: {error}")


def main():
    client = StreamingClient(
        StreamingClientOptions(
            api_key=api_key,
            api_host="streaming.assemblyai.com",
        )
    )

    client.on(StreamingEvents.Begin, on_begin)
    client.on(StreamingEvents.Turn, on_turn)
    client.on(StreamingEvents.Termination, on_terminated)
    client.on(StreamingEvents.Error, on_error)

    client.connect(
        StreamingParameters(
            sample_rate=16000,
            format_turns=True,
        )
    )

    try:
        client.stream(
            aai.extras.MicrophoneStream(sample_rate=16000)
        )
    finally:
        client.disconnect(terminate=True)


if __name__ == "__main__":
    main()

Step 1: Install and import dependencies


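Install the AssemblyAI Python SDK with pip. This tutorial records from your microphone with aai.extras.MicrophoneStream, which depends on PyAudio, so install the SDK together with its extras:

$ pip install "assemblyai[extras]"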
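If the microphone helper fails to import later, a quick check can tell you which dependency is missing. This is a minimal, standard-library-only sketch; it assumes the SDK's import name is assemblyai and that PyAudio (the package the microphone helper is assumed to rely on) imports as pyaudio. The function name missing_modules is hypothetical.

```python
import importlib.util


def missing_modules(names=("assemblyai", "pyaudio")):
    """Return the module names that cannot be found on the current import path."""
    # find_spec returns None when a top-level module is not installed.
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("assemblyai and pyaudio are both importable.")
```

Run it with the same interpreter you'll use for main.py, since each virtual environment has its own installed packages.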

Create a file called main.py and import the following packages at the top of your file:

import logging
from typing import Type

import assemblyai as aai
from assemblyai.streaming.v3 import (
    BeginEvent,
    StreamingClient,
    StreamingClientOptions,
    StreamingError,
    StreamingEvents,
    StreamingParameters,
    StreamingSessionParameters,
    TerminationEvent,
    TurnEvent,
)

Step 2: Configure the API key

In this step, you’ll configure your AssemblyAI API key to authenticate your application and enable access to the streaming transcription service.

Browse to API Keys in your dashboard, and then copy your API key.


Configure the SDK to use your API key. Replace <YOUR_API_KEY> with your copied API key.

api_key = "<YOUR_API_KEY>"
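Hard-coding the key is fine for a quick test, but it's easy to commit by accident. A minimal alternative, sketched under the assumption that you export the key in an environment variable, reads it at startup and fails with a clear message if it's missing. The variable name ASSEMBLYAI_API_KEY and the helper load_api_key are hypothetical choices for this sketch, not names the SDK requires.

```python
import os


def load_api_key(env_var: str = "ASSEMBLYAI_API_KEY") -> str:
    """Read the API key from an environment variable instead of hard-coding it."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        # Fail early with an actionable message rather than a later auth error.
        raise RuntimeError(
            f"Set the {env_var} environment variable to your AssemblyAI API key."
        )
    return key
```

In main.py you could then write api_key = load_api_key() in place of the string literal.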

Step 3: Set up audio and websocket configuration


The Python SDK handles audio configuration automatically. You specify the sample rate in StreamingParameters when you connect the streaming client. If you don’t set a sample rate, it defaults to 16 kHz.

Sample rate

Use a sample rate of 16 kHz and pcm_s16le encoding (16-bit signed little-endian PCM). Other sample rates are supported, but 16 kHz with pcm_s16le is recommended because our STT model operates at 16 kHz. If the incoming audio uses a different rate, we resample it under the hood, which might marginally increase latency.
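To make the recommended format concrete: pcm_s16le stores each sample as a 2-byte signed integer in little-endian order, so one second of 16 kHz mono audio is 16,000 × 2 = 32,000 bytes, and a 50 ms chunk is 1,600 bytes. The sketch below shows that arithmetic and the encoding with the standard library. It assumes mono audio and float samples in [-1.0, 1.0]; the helper names are hypothetical, and the SDK's microphone stream already produces this format for you.

```python
import math
import struct
from typing import Sequence

SAMPLE_RATE = 16_000   # samples per second, as recommended above
BYTES_PER_SAMPLE = 2   # pcm_s16le: one 16-bit signed integer per sample
CHANNELS = 1           # mono (assumption for this sketch)


def chunk_size_bytes(duration_ms: int, sample_rate: int = SAMPLE_RATE) -> int:
    """Number of bytes in a chunk of mono pcm_s16le audio of the given length."""
    samples = sample_rate * duration_ms // 1000
    return samples * BYTES_PER_SAMPLE * CHANNELS


def to_pcm_s16le(samples: Sequence[float]) -> bytes:
    """Encode floats in [-1.0, 1.0] as 16-bit signed little-endian PCM."""
    clipped = [max(-1.0, min(1.0, s)) for s in samples]  # avoid integer overflow
    ints = [round(s * 32767) for s in clipped]
    return struct.pack(f"<{len(ints)}h", *ints)  # "<" = little-endian, "h" = int16


if __name__ == "__main__":
    # 50 ms of a 440 Hz sine tone at 16 kHz
    n = SAMPLE_RATE * 50 // 1000
    tone = [0.5 * math.sin(2 * math.pi * 440 * i / SAMPLE_RATE) for i in range(n)]
    pcm = to_pcm_s16le(tone)
    print(len(pcm), chunk_size_bytes(50))  # prints: 1600 1600
```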

Step 4: Create event handlers

In this step, you’ll define event handlers to manage the different types of events emitted during the streaming session. The handlers will respond to session lifecycle events, transcription turns, errors, and session termination.


Implement basic event handlers. These handlers let your app respond to key streaming events:

on_begin – Logs when the session starts.
on_turn – Prints each transcript with its end_of_turn flag and, when a turn ends without formatting, requests formatted turns with set_params().
on_terminated – Logs when the session ends and how much audio was processed.
on_error – Captures and prints any errors during streaming.

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)


def on_begin(self: Type[StreamingClient], event: BeginEvent):
    print(f"Session started: {event.id}")


def on_turn(self: Type[StreamingClient], event: TurnEvent):
    print(f"{event.transcript} ({event.end_of_turn})")

    if event.end_of_turn and not event.turn_is_formatted:
        params = StreamingSessionParameters(
            format_turns=True,
        )

        self.set_params(params)


def on_terminated(self: Type[StreamingClient], event: TerminationEvent):
    print(
        f"Session terminated: {event.audio_duration_seconds} seconds of audio processed"
    )


def on_error(self: Type[StreamingClient], error: StreamingError):
    print(f"Error occurred: {error}")

Message sequence and turn events

Turn events are how you receive your transcripts. To learn more about the turn event and the order in which messages arrive, see our Message Sequence Breakdown page.
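Because a turn is reported several times as it progresses, printing every event (as on_turn does) shows partial transcripts too. If you only want finished text, you can filter on the two flags on_turn already checks. The sketch below uses a hypothetical stand-in dataclass with just those fields so it runs without the SDK, and the event sequence in the example is illustrative only; check the Message Sequence Breakdown page for the actual order.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class FakeTurnEvent:
    """Hypothetical stand-in carrying the TurnEvent fields used in on_turn."""
    transcript: str
    end_of_turn: bool
    turn_is_formatted: bool


def final_transcripts(events: Iterable[FakeTurnEvent]) -> List[str]:
    """Keep only completed, formatted turns; drop in-progress partials."""
    return [e.transcript for e in events if e.end_of_turn and e.turn_is_formatted]


if __name__ == "__main__":
    events = [
        FakeTurnEvent("hello", False, False),
        FakeTurnEvent("hello world", False, False),
        FakeTurnEvent("hello world", True, False),
        FakeTurnEvent("Hello world.", True, True),
    ]
    print(final_transcripts(events))  # prints: ['Hello world.']
```

Inside a real handler, the same condition (event.end_of_turn and event.turn_is_formatted) decides whether to store or display event.transcript.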

Step 5: Connect and start transcription

Streaming Speech-to-Text uses WebSockets to stream audio to AssemblyAI. This requires first establishing a connection to the API.
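The SDK opens the WebSocket for you, but it can help to see how connection settings such as sample_rate and format_turns travel with the connection request. This sketch encodes them as URL query parameters. The endpoint wss://streaming.assemblyai.com/v3/ws and the lowercase boolean convention are assumptions here, so confirm both in the streaming API reference; the sketch also leaves out authentication, which the SDK handles with your API key.

```python
from urllib.parse import urlencode

# Assumed v3 streaming endpoint; verify against the API reference.
STREAMING_URL = "wss://streaming.assemblyai.com/v3/ws"


def build_streaming_url(sample_rate: int = 16_000, format_turns: bool = True) -> str:
    """Encode connection parameters as query-string values on the WebSocket URL."""
    query = urlencode({
        "sample_rate": sample_rate,
        "format_turns": str(format_turns).lower(),  # send booleans as "true"/"false"
    })
    return f"{STREAMING_URL}?{query}"


if __name__ == "__main__":
    print(build_streaming_url())
    # prints: wss://streaming.assemblyai.com/v3/ws?sample_rate=16000&format_turns=true
```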


In the main function, create a client, register the event handlers, and connect to the streaming service:

    client = StreamingClient(
        StreamingClientOptions(
            api_key=api_key,
            api_host="streaming.assemblyai.com",
        )
    )

    client.on(StreamingEvents.Begin, on_begin)
    client.on(StreamingEvents.Turn, on_turn)
    client.on(StreamingEvents.Termination, on_terminated)
    client.on(StreamingEvents.Error, on_error)

    client.connect(
        StreamingParameters(
            sample_rate=16000,
            format_turns=True,
        )
    )

Next, create a microphone stream and start transcribing. Make sure the sample_rate you pass to MicrophoneStream matches the sample_rate in the StreamingParameters you passed to client.connect().

    try:
        client.stream(
            aai.extras.MicrophoneStream(sample_rate=16000)
        )
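If you want to test without a microphone, you could stream a prerecorded file instead. The sketch below reads a WAV file in 50 ms chunks and refuses files whose format doesn't match the 16 kHz, 16-bit mono pcm_s16le audio the client was connected with. It assumes client.stream() accepts any iterable of raw audio bytes, as it does with MicrophoneStream; verify that in the SDK reference. The helper name wav_chunks and the file name sample.wav are hypothetical.

```python
import wave
from typing import Iterator


def wav_chunks(path: str, chunk_ms: int = 50, expected_rate: int = 16_000) -> Iterator[bytes]:
    """Yield raw PCM chunks from a 16-bit mono WAV file at the expected sample rate."""
    with wave.open(path, "rb") as wav:
        # The audio must match the sample_rate passed in StreamingParameters.
        if wav.getframerate() != expected_rate:
            raise ValueError(f"Expected {expected_rate} Hz, got {wav.getframerate()} Hz")
        # WAV stores 16-bit samples as signed little-endian PCM (pcm_s16le).
        if wav.getsampwidth() != 2 or wav.getnchannels() != 1:
            raise ValueError("Expected 16-bit mono audio (pcm_s16le)")
        frames_per_chunk = expected_rate * chunk_ms // 1000
        while True:
            data = wav.readframes(frames_per_chunk)
            if not data:  # end of file
                break
            yield data
```

Under that assumption, the call in main would become client.stream(wav_chunks("sample.wav")). Note that a file is read much faster than real time, so a production version might pause between chunks.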

Step 6: Close the connection


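Disconnect the client when you’re done. Calling disconnect() in a finally block ensures it runs even if streaming stops because of an error: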

    finally:
        client.disconnect(terminate=True)

Pressing Ctrl+C also stops the session: Python raises KeyboardInterrupt inside client.stream(), and the finally block still runs. In both cases, disconnect(terminate=True) ends the streaming session and closes the connection.
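The reason the cleanup happens on Ctrl+C is ordinary Python semantics: a finally block runs even when KeyboardInterrupt propagates out of the try block. The sketch below demonstrates this with a hypothetical FakeClient stand-in (not the SDK's StreamingClient) whose stream() simulates the user pressing Ctrl+C.

```python
class FakeClient:
    """Hypothetical stand-in for StreamingClient that records disconnect calls."""

    def __init__(self):
        self.disconnected = False

    def stream(self, chunks):
        # Simulate the user pressing Ctrl+C while audio is streaming.
        raise KeyboardInterrupt

    def disconnect(self, terminate=False):
        self.disconnected = terminate


def run(client):
    try:
        client.stream([b"\x00\x00"])
    finally:
        # Runs whether stream() returns normally or is interrupted.
        client.disconnect(terminate=True)


if __name__ == "__main__":
    client = FakeClient()
    try:
        run(client)
    except KeyboardInterrupt:
        pass  # the interrupt still propagates after the finally block
    print(client.disconnected)  # prints: True
```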

Next steps

To learn more about Streaming Speech-to-Text, see the following resources:

Need some help?

If you get stuck, or have any other questions, we’d love to help you out. Contact our support team at support@assemblyai.com or create a support ticket.