Terminate Streaming Session After Inactivity

An often-overlooked aspect of implementing AssemblyAI’s Streaming Speech-to-Text (STT) service is terminating transcription sessions efficiently. In this cookbook, you will learn how to terminate a Streaming session after a fixed duration of silence.

For the full code, refer to this GitHub gist.

Quickstart

import logging
from datetime import datetime
from typing import Type

import assemblyai as aai
from assemblyai.streaming.v3 import (
    BeginEvent,
    StreamingClient,
    StreamingClientOptions,
    StreamingError,
    StreamingEvents,
    StreamingParameters,
    StreamingSessionParameters,
    TerminationEvent,
    TurnEvent,
)

api_key = "<YOUR_API_KEY>"

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

# Time of the most recent non-empty transcript, and whether we already disconnected.
last_transcript_received = datetime.now()
terminated = False


def on_begin(self: Type[StreamingClient], event: BeginEvent):
    print(f"Session started: {event.id}")


def on_turn(self: Type[StreamingClient], event: TurnEvent):
    global last_transcript_received, terminated

    # Ignore any events that arrive after we have disconnected.
    if terminated:
        return

    print(f"{event.transcript} ({event.end_of_turn})")

    # Only non-empty transcripts count as speech activity.
    if event.transcript.strip():
        last_transcript_received = datetime.now()

    # Terminate once more than 5 seconds have passed since the last speech.
    silence_duration = (datetime.now() - last_transcript_received).total_seconds()
    if silence_duration > 5:
        print("No transcription received in 5 seconds. Terminating session...")
        self.disconnect(terminate=True)
        terminated = True
        return

    if event.end_of_turn and not event.turn_is_formatted:
        self.set_params(StreamingSessionParameters(format_turns=True))


def on_terminated(self: Type[StreamingClient], event: TerminationEvent):
    print(f"Session terminated after {event.audio_duration_seconds:.2f} seconds")


def on_error(self: Type[StreamingClient], error: StreamingError):
    print(f"Error occurred: {error}")


def main():
    client = StreamingClient(
        StreamingClientOptions(
            api_key=api_key,
            api_host="streaming.assemblyai.com",
        )
    )

    # Register a handler for each streaming event type.
    client.on(StreamingEvents.Begin, on_begin)
    client.on(StreamingEvents.Turn, on_turn)
    client.on(StreamingEvents.Termination, on_terminated)
    client.on(StreamingEvents.Error, on_error)

    client.connect(
        StreamingParameters(
            sample_rate=16000,
            format_turns=True,
        )
    )

    try:
        client.stream(aai.extras.MicrophoneStream(sample_rate=16000))
    finally:
        # Disconnect here only if on_turn has not already done so.
        if not terminated:
            client.disconnect(terminate=True)


if __name__ == "__main__":
    main()

Get Started

Before we begin, make sure you have an AssemblyAI account and an API key. You can sign up for an AssemblyAI account and get your API key from your dashboard.

Step-by-step instructions

First, install AssemblyAI’s Python SDK.

$ pip install assemblyai

Next, import the required packages and set your API key.

import logging
from datetime import datetime
from typing import Type

import assemblyai as aai
from assemblyai.streaming.v3 import (
    BeginEvent,
    StreamingClient,
    StreamingClientOptions,
    StreamingError,
    StreamingEvents,
    StreamingParameters,
    StreamingSessionParameters,
    TerminationEvent,
    TurnEvent,
)

api_key = "<YOUR_API_KEY>"
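
If you would rather not hardcode the key in the script, one option is to read it from an environment variable. The sketch below is illustrative only: the helper `load_api_key` is hypothetical, and the variable name `ASSEMBLYAI_API_KEY` is an assumed naming convention rather than something this cookbook prescribes.

```python
import os


def load_api_key(env=os.environ, var_name="ASSEMBLYAI_API_KEY"):
    """Return the API key stored in `var_name`, or raise if it is missing.

    Both this helper and the default variable name are assumptions
    made for this sketch.
    """
    key = env.get(var_name, "").strip()
    if not key:
        raise RuntimeError(f"Set the {var_name} environment variable first")
    return key
```

With this in place, `api_key = load_api_key()` could replace the hardcoded placeholder, and the script fails early with a clear message if the variable is not set.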

Implementing Speech Activity Checks

Our Streaming API emits a Turn event each time speech is processed. During periods of silence, no Turn events are sent. You can use this behavior to detect inactivity and automatically terminate the session.

We can track the timestamp of the most recent non-empty transcript using a datetime. On every Turn Event, we:

  • Update the timestamp if meaningful speech is received

  • Check how many seconds have passed since the last valid transcript

  • If that exceeds your timeout (e.g. 5 seconds), terminate the session

Key Variables

last_transcript_received = datetime.now()
terminated = False

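last_transcript_received is refreshed whenever a Turn event carries a non-empty transcript. terminated is set to True once the session has been disconnected, so that any later Turn events are ignored.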

Turn event logic

def on_turn(self: Type[StreamingClient], event: TurnEvent):
    global last_transcript_received, terminated

    # Ignore any events that arrive after we have disconnected.
    if terminated:
        return

    print(f"{event.transcript} ({event.end_of_turn})")

    # Only non-empty transcripts count as speech activity.
    if event.transcript.strip():
        last_transcript_received = datetime.now()

    # Terminate once more than 5 seconds have passed since the last speech.
    silence_duration = (datetime.now() - last_transcript_received).total_seconds()
    if silence_duration > 5:
        print("No transcription received in 5 seconds. Terminating session...")
        self.disconnect(terminate=True)
        terminated = True
        return

    if event.end_of_turn and not event.turn_is_formatted:
        self.set_params(StreamingSessionParameters(format_turns=True))

This pattern ensures sessions are cleanly terminated after inactivity.
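
Note that the check above only runs when a Turn event arrives, so a stretch of silence that produces no events at all would not trigger it. One possible complement, which is not part of this cookbook's code, is a background countdown that is restarted whenever speech is seen. The sketch below uses only the standard library; the class name `SilenceWatchdog` is hypothetical.

```python
import threading


class SilenceWatchdog:
    """Calls `on_timeout` if `reset()` is not called again within `timeout_seconds`."""

    def __init__(self, timeout_seconds, on_timeout):
        self._timeout = timeout_seconds
        self._on_timeout = on_timeout
        self._lock = threading.Lock()
        self._timer = None

    def reset(self):
        """(Re)start the countdown; call this on every turn that contains speech."""
        with self._lock:
            if self._timer is not None:
                self._timer.cancel()
            self._timer = threading.Timer(self._timeout, self._on_timeout)
            self._timer.daemon = True  # do not keep the process alive
            self._timer.start()

    def cancel(self):
        """Stop the countdown, for example when the session ends normally."""
        with self._lock:
            if self._timer is not None:
                self._timer.cancel()
                self._timer = None
```

In this setup you would call `reset()` once after connecting and again inside `on_turn` whenever the transcript is non-empty, with `on_timeout` set to a function that disconnects the client. Whether `disconnect` can safely be called from a timer thread is an assumption you should verify against the SDK before relying on this approach.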

What You’ll Observe

  • Live transcription continues as long as there’s speech

  • After 5 seconds of silence, the session ends automatically

You can change the timeout value to suit your needs by modifying the silence_duration > 5 check.
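
If you expect to tune the timeout often, it may be convenient to pass it in as a parameter instead of editing the literal. The following is a minimal sketch under a few assumptions: `make_on_turn` is a hypothetical helper, the handler keeps its state in a closure rather than in module-level globals, and it relies only on the `event.transcript` attribute and the `disconnect(terminate=True)` call that the code above already uses. It measures elapsed time with `time.monotonic`, which is not affected by system clock adjustments.

```python
import time


def make_on_turn(timeout_seconds=5.0, clock=time.monotonic):
    """Build a Turn handler that disconnects after `timeout_seconds` of silence."""
    state = {"last_speech": clock(), "terminated": False}

    def on_turn(client, event):
        # Ignore events that arrive after the session has been closed.
        if state["terminated"]:
            return

        now = clock()
        if event.transcript.strip():
            # Non-empty transcript: restart the silence window.
            state["last_speech"] = now

        if now - state["last_speech"] > timeout_seconds:
            print(f"No transcription received in {timeout_seconds} seconds. Terminating session...")
            client.disconnect(terminate=True)
            state["terminated"] = True

    return on_turn
```

You could then register it with `client.on(StreamingEvents.Turn, make_on_turn(timeout_seconds=10))`. Because the terminated flag now lives inside the closure, the `if not terminated` check in `main()` would need its own way to learn that the session was already closed.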