
Transcribe streaming audio

Learn how to transcribe streaming audio.

Overview

By the end of this tutorial, you’ll be able to transcribe audio from your microphone.

Supported languages

Streaming Speech-to-Text is only available for English.

Before you begin

To complete this tutorial, you need:

  • An AssemblyAI account and API key.
  • Python 3 installed, along with a working microphone.

Here’s the full sample code of what you’ll build in this tutorial:

import logging
from typing import Type

import assemblyai as aai
from assemblyai.streaming.v3 import (
    BeginEvent,
    StreamingClient,
    StreamingClientOptions,
    StreamingError,
    StreamingEvents,
    StreamingParameters,
    StreamingSessionParameters,
    TerminationEvent,
    TurnEvent,
)

api_key = "<YOUR_API_KEY>"

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)


def on_begin(self: Type[StreamingClient], event: BeginEvent):
    print(f"Session started: {event.id}")


def on_turn(self: Type[StreamingClient], event: TurnEvent):
    print(f"{event.transcript} ({event.end_of_turn})")

    if event.end_of_turn and not event.turn_is_formatted:
        params = StreamingSessionParameters(
            format_turns=True,
        )

        self.set_params(params)


def on_terminated(self: Type[StreamingClient], event: TerminationEvent):
    print(
        f"Session terminated: {event.audio_duration_seconds} seconds of audio processed"
    )


def on_error(self: Type[StreamingClient], error: StreamingError):
    print(f"Error occurred: {error}")


def main():
    client = StreamingClient(
        StreamingClientOptions(
            api_key=api_key,
            api_host="streaming.assemblyai.com",
        )
    )

    client.on(StreamingEvents.Begin, on_begin)
    client.on(StreamingEvents.Turn, on_turn)
    client.on(StreamingEvents.Termination, on_terminated)
    client.on(StreamingEvents.Error, on_error)

    client.connect(
        StreamingParameters(
            sample_rate=16000,
            format_turns=True,
        )
    )

    try:
        client.stream(
            aai.extras.MicrophoneStream(sample_rate=16000)
        )
    finally:
        client.disconnect(terminate=True)


if __name__ == "__main__":
    main()
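Once you've worked through the steps below and main.py contains this script, run it from your terminal (assuming Python 3 is on your PATH):

$ python main.py

Speak into your microphone to see transcripts print as you talk, and press Ctrl+C to stop.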

Step 1: Install and import dependencies

1. Install the AssemblyAI Python SDK via pip:

$ pip install assemblyai
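This tutorial streams audio from your microphone with aai.extras.MicrophoneStream, which relies on PyAudio. If the base install doesn't provide it in your environment, installing the SDK's extras variant is one way to pull it in (check your SDK version's documentation if this package name differs):

$ pip install "assemblyai[extras]"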
2. Create a file called main.py and import the following packages at the top of your file:

import logging
from typing import Type

import assemblyai as aai
from assemblyai.streaming.v3 import (
    BeginEvent,
    StreamingClient,
    StreamingClientOptions,
    StreamingError,
    StreamingEvents,
    StreamingParameters,
    StreamingSessionParameters,
    TerminationEvent,
    TurnEvent,
)

Step 2: Configure the API key

In this step, you’ll configure your AssemblyAI API key to authenticate your application and enable access to the streaming transcription service.

1. Browse to API Keys in your dashboard, and then copy your API key.

2. Configure the SDK to use your API key. Replace <YOUR_API_KEY> with your copied API key:

api_key = "<YOUR_API_KEY>"
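Hardcoding the key is fine for a quick test, but for anything you share or commit you may prefer to read it from an environment variable. A minimal sketch; the variable name ASSEMBLYAI_API_KEY is only a convention here, not required by this code:

import os

# Fall back to the placeholder so the script still runs as-is;
# set ASSEMBLYAI_API_KEY in your shell to avoid hardcoding the key.
api_key = os.environ.get("ASSEMBLYAI_API_KEY", "<YOUR_API_KEY>")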

Step 3: Set up audio and WebSocket configuration

The Python SDK handles audio configuration automatically. You’ll specify the sample rate when connecting to the transcriber. If you don’t set a sample rate, it defaults to 16 kHz.

Sample rate

Use a sample rate of 16 kHz and encoding of pcm_s16le. While all sampling rates are supported, using 16 kHz and pcm_s16le is recommended for the best experience, as our STT model operates at a 16 kHz sample rate. If the incoming audio uses a different rate, we perform additional sampling rate conversion under the hood, which might marginally increase latency.
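For example, a 16 kHz session is declared through StreamingParameters, which you'll pass to client.connect() in Step 5. A minimal sketch using only parameters shown in this tutorial:

# The sample rate declared here must match the audio you actually send
# (the microphone stream later in this tutorial also uses 16000 Hz).
params = StreamingParameters(
    sample_rate=16000,
)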

Step 4: Create event handlers

In this step, you’ll define event handlers to manage the different types of events emitted during the streaming session. The handlers will respond to session lifecycle events, transcription turns, errors, and session termination.

Implement basic event handlers. These handlers let your app respond to key streaming events:

  • on_begin – Logs when the session starts.
  • on_turn – Handles each transcription turn and optionally enables formatted turns.
  • on_terminated – Logs when the session ends and how much audio was processed.
  • on_error – Captures and prints any errors during streaming.
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)


def on_begin(self: Type[StreamingClient], event: BeginEvent):
    print(f"Session started: {event.id}")


def on_turn(self: Type[StreamingClient], event: TurnEvent):
    print(f"{event.transcript} ({event.end_of_turn})")

    if event.end_of_turn and not event.turn_is_formatted:
        params = StreamingSessionParameters(
            format_turns=True,
        )

        self.set_params(params)


def on_terminated(self: Type[StreamingClient], event: TerminationEvent):
    print(
        f"Session terminated: {event.audio_duration_seconds} seconds of audio processed"
    )


def on_error(self: Type[StreamingClient], error: StreamingError):
    print(f"Error occurred: {error}")

Message sequence and turn events

To better understand turn events and the message sequence, check out our Message Sequence Breakdown page. The TurnEvent object is how you'll receive your transcripts.
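As a rough illustration of how turn messages evolve over a session, here's a variant handler. The name log_turn is just for illustration, and it uses only the event fields already shown above:

def log_turn(self: Type[StreamingClient], event: TurnEvent):
    # Partial transcripts arrive while the speaker is still talking.
    if not event.end_of_turn:
        logger.info("partial: %s", event.transcript)
    # When a turn ends, an unformatted transcript arrives first; a formatted
    # version follows if format_turns is enabled for the session.
    elif event.turn_is_formatted:
        logger.info("final (formatted): %s", event.transcript)
    else:
        logger.info("final (unformatted): %s", event.transcript)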

Step 5: Connect and start transcription

Streaming Speech-to-Text uses WebSockets to stream audio to AssemblyAI. This requires first establishing a connection to the API.

1. In the main function, create a client, register your event handlers, and connect to the streaming service:

    client = StreamingClient(
        StreamingClientOptions(
            api_key=api_key,
            api_host="streaming.assemblyai.com",
        )
    )

    client.on(StreamingEvents.Begin, on_begin)
    client.on(StreamingEvents.Turn, on_turn)
    client.on(StreamingEvents.Termination, on_terminated)
    client.on(StreamingEvents.Error, on_error)

    client.connect(
        StreamingParameters(
            sample_rate=16000,
            format_turns=True,
        )
    )
2. Next, create a microphone stream and begin transcribing audio. Make sure the sample_rate matches the value you specified in StreamingParameters when connecting.

    try:
        client.stream(
            aai.extras.MicrophoneStream(sample_rate=16000)
        )

Step 6: Close the connection

Disconnect the client when you’re done:

    finally:
        client.disconnect(terminate=True)

The connection also closes when you stop the script with Ctrl+C. In both cases, the finally block calls .disconnect(), which cleans up the audio resources.
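If you'd rather exit quietly on Ctrl+C instead of letting a KeyboardInterrupt traceback print, you can catch it around the streaming call. This is an optional variation on the code above, not something the SDK requires:

    try:
        client.stream(
            aai.extras.MicrophoneStream(sample_rate=16000)
        )
    except KeyboardInterrupt:
        # Ctrl+C stops the microphone stream; fall through to cleanup.
        print("Stopping...")
    finally:
        # Ends the session server-side and releases the microphone.
        client.disconnect(terminate=True)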

Next steps

To learn more about Streaming Speech-to-Text, explore the rest of the Streaming Speech-to-Text documentation and the API reference.

Need some help?

If you get stuck, or have any other questions, we’d love to help you out. Contact our support team at support@assemblyai.com or create a support ticket.