Transcribe streaming audio
Learn how to transcribe streaming audio.
Overview
By the end of this tutorial, you’ll be able to transcribe audio from your microphone.
Supported languages
Streaming Speech-to-Text is only available for English.
Before you begin
To complete this tutorial, you need:
Here’s the full sample code of what you’ll build in this tutorial:
Step 1: Install and import dependencies
Step 2: Configure the API key
In this step, you’ll configure your AssemblyAI API key to authenticate your application and enable access to the streaming transcription service.
Browse to API Keys in your dashboard, and then copy your API key.
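A common way to keep the key out of your source code is to read it from an environment variable. The sketch below assumes a variable named ASSEMBLYAI_API_KEY. That name is our choice for illustration, not something the SDK requires.

```python
import os
from typing import Mapping, Optional


def load_api_key(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the AssemblyAI API key from the environment.

    ASSEMBLYAI_API_KEY is a hypothetical variable name chosen for this sketch.
    Pass a mapping as `env` to read from somewhere other than os.environ.
    """
    source = os.environ if env is None else env
    key = source.get("ASSEMBLYAI_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "Set ASSEMBLYAI_API_KEY to the key copied from the API Keys page "
            "of your dashboard."
        )
    return key
```

Call load_api_key() once at startup and pass the result to the client, so a missing key fails fast instead of surfacing later as an authentication error.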
Step 3: Set up audio and WebSocket configuration
The Python SDK handles audio configuration automatically. You’ll specify the sample rate when connecting to the transcriber. If you don’t set a sample rate, it defaults to 16 kHz.
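If you connect without the SDK, the audio settings travel as query parameters on the WebSocket URL. The sketch below assumes the endpoint wss://streaming.assemblyai.com/v3/ws and the parameter names sample_rate, encoding, and format_turns. Check these against the API reference before relying on them. Like the SDK, the sketch defaults to 16 kHz.

```python
from urllib.parse import urlencode

# Assumed streaming endpoint; verify against the API reference.
BASE_URL = "wss://streaming.assemblyai.com/v3/ws"
DEFAULT_SAMPLE_RATE = 16_000  # 16 kHz, the recommended rate


def build_streaming_url(sample_rate: int = DEFAULT_SAMPLE_RATE,
                        encoding: str = "pcm_s16le",
                        format_turns: bool = True) -> str:
    """Build the WebSocket URL with the audio settings as query parameters."""
    if sample_rate <= 0:
        raise ValueError("sample_rate must be positive")
    params = {
        "sample_rate": sample_rate,
        "encoding": encoding,
        # Query strings carry text, so the boolean is sent as "true"/"false".
        "format_turns": "true" if format_turns else "false",
    }
    return f"{BASE_URL}?{urlencode(params)}"
```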
Sample rate
Use a sample rate of 16 kHz and pcm_s16le encoding (16-bit signed little-endian PCM). All sample rates are supported, but 16 kHz with pcm_s16le is recommended for the best experience because our STT model operates at a 16 kHz sample rate. If the incoming audio uses a different rate, we resample it under the hood, which might marginally increase latency.
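In pcm_s16le, each sample is a signed 16-bit integer stored little-endian. One second of 16 kHz mono audio is therefore 16,000 × 2 = 32,000 bytes. The sketch below converts floating-point samples in [-1.0, 1.0] to that format and computes the byte size of a chunk. It assumes mono (single-channel) audio, and the chunk durations in the usage are illustrative, not documented requirements.

```python
import struct
from typing import Iterable

SAMPLE_RATE = 16_000   # samples per second (recommended rate)
BYTES_PER_SAMPLE = 2   # pcm_s16le uses 16-bit samples


def floats_to_pcm_s16le(samples: Iterable[float]) -> bytes:
    """Encode mono float samples in [-1.0, 1.0] as 16-bit little-endian PCM."""
    ints = []
    for s in samples:
        s = max(-1.0, min(1.0, s))          # clip out-of-range values
        ints.append(int(round(s * 32767)))  # scale to the int16 range
    return struct.pack(f"<{len(ints)}h", *ints)  # "<" = little-endian, "h" = int16


def chunk_bytes(duration_ms: int, sample_rate: int = SAMPLE_RATE) -> int:
    """Number of bytes in a mono pcm_s16le chunk of the given duration."""
    return sample_rate * duration_ms // 1000 * BYTES_PER_SAMPLE
```

For example, chunk_bytes(50) is 1,600 bytes. If you record at a different rate, pass it as sample_rate to see how the chunk size changes.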
Step 4: Create event handlers
In this step, you’ll define event handlers to manage the different types of events emitted during the streaming session. The handlers will respond to session lifecycle events, transcription turns, errors, and session termination.
Implement basic event handlers. These handlers let your app respond to key streaming events:
on_begin – Logs when the session starts.
on_turn – Handles each transcription turn and optionally enables formatted turns.
on_terminated – Logs when the session ends and how much audio was processed.
on_error – Captures and prints any errors during streaming.
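The SDK dispatches these events for you. To show the registration-and-dispatch pattern without depending on the SDK, here is a minimal stand-alone dispatcher. The EventDispatcher class, the event-type strings, and the dictionary payloads are hypothetical. The field names (id, transcript, end_of_turn, audio_duration_seconds, error) are assumptions modeled on the streaming messages. The handlers append to a list instead of printing so their behavior is easy to check.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, List

Event = Dict[str, Any]
Handler = Callable[[Event], None]


class EventDispatcher:
    """Hypothetical stand-in for the SDK's event registration."""

    def __init__(self) -> None:
        self._handlers: Dict[str, List[Handler]] = defaultdict(list)

    def on(self, event_type: str, handler: Handler) -> None:
        """Register a handler for one event type."""
        self._handlers[event_type].append(handler)

    def dispatch(self, event_type: str, event: Event) -> None:
        """Call every handler registered for event_type, in order."""
        for handler in self._handlers.get(event_type, []):
            handler(event)


log: List[str] = []  # a real app would print or display these messages


def on_begin(event: Event) -> None:
    log.append(f"Session started: {event['id']}")


def on_turn(event: Event) -> None:
    log.append(f"Turn: {event['transcript']} (end_of_turn={event['end_of_turn']})")


def on_terminated(event: Event) -> None:
    log.append(f"Session ended after {event['audio_duration_seconds']} s of audio")


def on_error(event: Event) -> None:
    log.append(f"Error: {event['error']}")


dispatcher = EventDispatcher()
dispatcher.on("Begin", on_begin)
dispatcher.on("Turn", on_turn)
dispatcher.on("Termination", on_terminated)
dispatcher.on("Error", on_error)
```

Keeping each handler small and registering it by event type means you can swap one behavior, such as error reporting, without touching the others.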
Message sequence and turn events
Turn events are how you'll receive your transcripts. To better understand turn events and the order in which messages arrive, check out our Message Sequence Breakdown page.
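Because a single turn can produce several messages while the speaker is still talking, an app usually keeps only the last version of each turn. The sketch below assumes three things about the messages: all messages for a turn share a turn_order value, the final message has end_of_turn set, and a formatted version is marked with turn_is_formatted. Treat these field names as assumptions to confirm on the Message Sequence Breakdown page.

```python
from typing import Any, Dict, Iterable, List


def final_transcripts(turns: Iterable[Dict[str, Any]],
                      require_formatted: bool = False) -> List[str]:
    """Return one transcript per completed turn, in turn order.

    Field names (turn_order, transcript, end_of_turn, turn_is_formatted)
    are assumptions for this sketch.
    """
    latest: Dict[int, str] = {}
    for turn in turns:
        if not turn["end_of_turn"]:
            continue  # partial result; the turn is still in progress
        if require_formatted and not turn.get("turn_is_formatted", False):
            continue  # wait for the formatted version of this turn
        latest[turn["turn_order"]] = turn["transcript"]  # later messages overwrite
    return [latest[k] for k in sorted(latest)]
```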
Step 5: Connect and start transcription
Streaming Speech-to-Text uses WebSockets to stream audio to AssemblyAI. This requires first establishing a connection to the API.
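Once the connection is open, audio is sent as a series of binary messages. The helper below splits a pcm_s16le buffer into fixed-duration frames ready to send. The 50 ms frame length is an assumption. The sending step itself, for example with a WebSocket client library, is left out to keep the sketch dependency-free.

```python
from typing import Iterator

BYTES_PER_SAMPLE = 2  # pcm_s16le: 16-bit samples


def iter_frames(pcm: bytes, sample_rate: int = 16_000,
                frame_ms: int = 50) -> Iterator[bytes]:
    """Yield consecutive frames of mono pcm_s16le audio.

    The final frame may be shorter than frame_ms if the buffer does not
    divide evenly.
    """
    frame_size = sample_rate * frame_ms // 1000 * BYTES_PER_SAMPLE
    if frame_size <= 0:
        raise ValueError("frame_ms is too small for this sample rate")
    for start in range(0, len(pcm), frame_size):
        yield pcm[start:start + frame_size]
```

Because frame_size is a whole number of 2-byte samples, no sample is ever split across two frames.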
Step 6: Close the connection
Disconnect the client when you’re done:
The connection also closes automatically when you press Ctrl+C. In both cases, .disconnect() cleans up the audio resources.
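The pattern behind this is a try/finally block. Whether streaming finishes normally or is interrupted with Ctrl+C (which raises KeyboardInterrupt in Python), the cleanup step still runs. FakeClient below is a hypothetical stand-in for the real client, used only to show the control flow.

```python
from typing import Iterable, List


class FakeClient:
    """Hypothetical stand-in for a streaming client."""

    def __init__(self) -> None:
        self.sent: List[bytes] = []
        self.connected = True

    def send(self, frame: bytes) -> None:
        if not self.connected:
            raise RuntimeError("client is disconnected")
        self.sent.append(frame)

    def disconnect(self) -> None:
        # A real client would close the WebSocket and release the microphone here.
        self.connected = False


def stream_audio(client: FakeClient, frames: Iterable[bytes]) -> None:
    """Send frames until the source is exhausted or the user presses Ctrl+C."""
    try:
        for frame in frames:
            client.send(frame)
    except KeyboardInterrupt:
        pass  # Ctrl+C: stop streaming without a traceback
    finally:
        client.disconnect()  # always runs, so resources are released
```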
Next steps
To learn more about Streaming Speech-to-Text, see the following resources:
Need some help?
If you get stuck, or have any other questions, we’d love to help you out. Contact our support team at support@assemblyai.com or create a support ticket.