Terminate Streaming Session After Inactivity

An often-overlooked aspect of implementing AssemblyAI’s Streaming Speech-to-Text (STT) service is terminating transcription sessions efficiently. In this cookbook, you will learn how to terminate a Streaming session after a fixed duration of silence.

For the full code, refer to this GitHub gist.

Quickstart

import logging
from datetime import datetime
from typing import Type

import assemblyai as aai
from assemblyai.streaming.v3 import (
    BeginEvent,
    StreamingClient,
    StreamingClientOptions,
    StreamingError,
    StreamingEvents,
    StreamingParameters,
    StreamingSessionParameters,
    TerminationEvent,
    TurnEvent,
)

api_key = "<YOUR_API_KEY>"

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

last_transcript_received = datetime.now()
terminated = False


def on_begin(self: Type[StreamingClient], event: BeginEvent):
    print(f"Session started: {event.id}")


def on_turn(self: Type[StreamingClient], event: TurnEvent):
    global last_transcript_received, terminated

    if terminated:
        return

    print(f"{event.transcript} ({event.end_of_turn})")

    if event.transcript.strip():
        last_transcript_received = datetime.now()

    silence_duration = (datetime.now() - last_transcript_received).total_seconds()
    if silence_duration > 5:
        print("No transcription received in 5 seconds. Terminating session...")
        self.disconnect(terminate=True)
        terminated = True
        return

    if event.end_of_turn and not event.turn_is_formatted:
        self.set_params(StreamingSessionParameters(format_turns=True))


def on_terminated(self: Type[StreamingClient], event: TerminationEvent):
    print(f"Session terminated after {event.audio_duration_seconds:.2f} seconds")


def on_error(self: Type[StreamingClient], error: StreamingError):
    print(f"Error occurred: {error}")


def main():
    client = StreamingClient(
        StreamingClientOptions(
            api_key=api_key,
            api_host="streaming.assemblyai.com",
        )
    )

    client.on(StreamingEvents.Begin, on_begin)
    client.on(StreamingEvents.Turn, on_turn)
    client.on(StreamingEvents.Termination, on_terminated)
    client.on(StreamingEvents.Error, on_error)

    client.connect(
        StreamingParameters(
            sample_rate=16000,
            format_turns=True,
        )
    )

    try:
        client.stream(aai.extras.MicrophoneStream(sample_rate=16000))
    finally:
        if not terminated:
            client.disconnect(terminate=True)


if __name__ == "__main__":
    main()

Get Started

Before we begin, make sure you have an AssemblyAI account and an API key. You can sign up for an AssemblyAI account and get your API key from your dashboard.

Step-by-step instructions

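First, install AssemblyAI’s Python SDK. The microphone helper used in this cookbook (aai.extras.MicrophoneStream) depends on PyAudio; if the plain install below reports it as missing, install the SDK's extras with pip install "assemblyai[extras]".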

$ pip install assemblyai

import logging
from datetime import datetime
from typing import Type

import assemblyai as aai
from assemblyai.streaming.v3 import (
    BeginEvent,
    StreamingClient,
    StreamingClientOptions,
    StreamingError,
    StreamingEvents,
    StreamingParameters,
    StreamingSessionParameters,
    TerminationEvent,
    TurnEvent,
)

api_key = "<YOUR_API_KEY>"

Implementing Speech Activity Checks

The Streaming API emits a TurnEvent each time speech is processed. During periods of silence, no TurnEvent will be sent. You can use this behavior to detect inactivity and automatically terminate the session.

We track the timestamp of the most recent non-empty transcript as a datetime. On every TurnEvent, we:

Update the timestamp if meaningful speech is received
Check how many seconds have passed since the last valid transcript
If that exceeds your timeout (e.g. 5 seconds), terminate the session
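
The three steps above can be sketched independently of the SDK. The following is a minimal illustration, not part of the AssemblyAI SDK: InactivityTracker and its observe method are hypothetical names, and the clock is injectable so the logic can be exercised without waiting in real time. It uses time.monotonic rather than datetime.now, on the assumption that a clock unaffected by system time adjustments is preferable for measuring elapsed time.

```python
import time
from typing import Callable


class InactivityTracker:
    """Hypothetical helper: tracks time since the last non-empty transcript."""

    def __init__(
        self,
        timeout_seconds: float = 5.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.timeout_seconds = timeout_seconds
        self._clock = clock
        # Start the silence window when the tracker is created.
        self._last_speech = clock()

    def observe(self, transcript: str) -> bool:
        """Record one transcript; return True once the timeout is exceeded."""
        now = self._clock()
        if transcript.strip():
            # Step 1: meaningful speech resets the silence window.
            self._last_speech = now
        # Steps 2 and 3: compare the elapsed silence against the timeout.
        return (now - self._last_speech) > self.timeout_seconds
```

Inside on_turn, such a tracker could stand in for the global timestamp: call tracker.observe(event.transcript) and disconnect when it returns True.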

Key Variables

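last_transcript_received = datetime.now()
terminated = False

last_transcript_received is refreshed whenever a TurnEvent carries a non-empty transcript, and terminated is set to True once the session has been disconnected so that any later events are ignored.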

Turn event logic

def on_turn(self: Type[StreamingClient], event: TurnEvent):
    global last_transcript_received, terminated

    if terminated:
        return

    print(f"{event.transcript} ({event.end_of_turn})")

    if event.transcript.strip():
        last_transcript_received = datetime.now()

    silence_duration = (datetime.now() - last_transcript_received).total_seconds()
    if silence_duration > 5:
        print("No transcription received in 5 seconds. Terminating session...")
        self.disconnect(terminate=True)
        terminated = True
        return

    if event.end_of_turn and not event.turn_is_formatted:
        self.set_params(StreamingSessionParameters(format_turns=True))

This pattern ensures sessions are cleanly terminated after inactivity.
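
Note that the check above runs inside on_turn, so it only executes when an event actually arrives. If events stop entirely during silence in your setup, termination could be delayed until the next event. One possible alternative is a timer that polls the timestamp independently of incoming events. The sketch below is an illustration under that assumption: SilenceWatchdog, touch, and poll_interval are hypothetical names, not SDK features, and it has not been verified whether client.disconnect is safe to call from a separate thread.

```python
import threading
import time
from typing import Callable


class SilenceWatchdog:
    """Hypothetical helper: fires a callback after a period without activity."""

    def __init__(
        self,
        timeout_seconds: float,
        on_timeout: Callable[[], None],
        poll_interval: float = 0.5,
    ) -> None:
        self._timeout = timeout_seconds
        self._on_timeout = on_timeout
        self._poll_interval = poll_interval
        self._last_activity = time.monotonic()
        self._lock = threading.Lock()
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def start(self) -> None:
        """Reset the silence window and begin polling in the background."""
        self.touch()
        self._thread.start()

    def touch(self) -> None:
        """Record activity, e.g. when a non-empty transcript arrives."""
        with self._lock:
            self._last_activity = time.monotonic()

    def stop(self) -> None:
        """Stop polling without firing the callback."""
        self._stop.set()

    def _run(self) -> None:
        # Event.wait returns False on timeout and True once stop() is called.
        while not self._stop.wait(self._poll_interval):
            with self._lock:
                idle = time.monotonic() - self._last_activity
            if idle > self._timeout:
                self._stop.set()
                self._on_timeout()
                return
```

In this arrangement, on_turn would call watchdog.touch() for each non-empty transcript, and the on_timeout callback would request the disconnect.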

What You’ll Observe

Live transcription continues as long as there’s speech
After 5 seconds of silence, the session ends automatically

You can change the timeout to suit your needs by replacing the 5 in the silence_duration > 5 check (and in the accompanying log message) with your own value in seconds.
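
If the timeout needs to vary between deployments, it could be read from configuration instead of being hard-coded. The sketch below assumes an environment variable named SILENCE_TIMEOUT_SECONDS; that name and the load_silence_timeout helper are hypothetical and chosen only for illustration. Invalid or non-positive values fall back to the 5-second default used in this cookbook.

```python
import os
from typing import Mapping

DEFAULT_SILENCE_TIMEOUT_SECONDS = 5.0


def load_silence_timeout(env: Mapping[str, str] = os.environ) -> float:
    """Return the silence timeout in seconds, falling back to the default."""
    raw = env.get("SILENCE_TIMEOUT_SECONDS")  # hypothetical variable name
    if raw is None:
        return DEFAULT_SILENCE_TIMEOUT_SECONDS
    try:
        value = float(raw)
    except ValueError:
        # Unparseable input: keep the default rather than failing at startup.
        return DEFAULT_SILENCE_TIMEOUT_SECONDS
    # Reject zero, negative, and NaN values.
    return value if value > 0 else DEFAULT_SILENCE_TIMEOUT_SECONDS
```

The returned value would then replace the literal in the comparison, for example silence_duration > timeout.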