Transcribe streaming audio
Overview
By the end of this tutorial, you’ll be able to transcribe audio from your microphone.
Before you begin
To complete this tutorial, you need:
Here’s the full sample code of what you’ll build in this tutorial:
Python
Python SDK
JavaScript
JavaScript SDK
Step 1: Install and import dependencies
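As a minimal sketch for the Python variant, assuming the sample depends on the third-party packages `websocket-client` (imported as `websocket`) and `pyaudio` for microphone capture, you could install them with `pip install websocket-client pyaudio` and then confirm that they are importable. The package choice and the `missing_dependencies` helper are illustrative assumptions, not part of the official sample.

```python
import importlib.util

# Assumed third-party modules for the Python variant (not confirmed by this tutorial):
#   websocket -> pip install websocket-client
#   pyaudio   -> pip install pyaudio
REQUIRED_MODULES = ["websocket", "pyaudio"]


def missing_dependencies(modules):
    """Return the module names from `modules` that cannot be found for import."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


missing = missing_dependencies(REQUIRED_MODULES)
if missing:
    print("Missing modules:", ", ".join(missing))
else:
    print("All dependencies are installed.")
```

Running the check before opening a session surfaces a missing package early, rather than partway through streaming.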
Step 2: Configure the API key
In this step, you’ll configure your AssemblyAI API key to authenticate your application and enable access to the streaming transcription service.
Browse to API Keys in your dashboard, and then copy your API key.
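One way to keep the copied key out of your source code is to read it from the environment. The sketch below assumes a hypothetical environment variable named `ASSEMBLYAI_API_KEY` and assumes the key is sent in an `Authorization` header; the helper names are illustrative only.

```python
import os

# Hypothetical environment variable name; use whatever your deployment defines.
API_KEY_ENV_VAR = "ASSEMBLYAI_API_KEY"


def load_api_key(env=os.environ):
    """Return the API key from `env`, or raise if it is missing or blank."""
    key = env.get(API_KEY_ENV_VAR, "").strip()
    if not key:
        raise RuntimeError(f"Set the {API_KEY_ENV_VAR} environment variable first.")
    return key


def auth_headers(api_key):
    """Build the request headers; the Authorization header name is an assumption."""
    return {"Authorization": api_key}
```

Failing fast on a missing key gives a clearer error than a rejected connection later in the session.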
Authenticate with a temporary token
If you need to authenticate on the client, you can avoid exposing your API key by using temporary authentication tokens.
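The usual pattern is for your own server to exchange the API key for a short-lived token and hand only the token to the client. The following is a sketch under stated assumptions: the token URL, the `expires_in_seconds` query parameter, and the `token` response field are not given in this tutorial and should be confirmed against the API reference.

```python
import json
import urllib.request
from urllib.parse import urlencode

# Assumed token endpoint; confirm the URL and parameter names in the API reference.
TOKEN_URL = "https://streaming.assemblyai.com/v3/token"


def build_token_request(api_key, expires_in_seconds=60):
    """Build (but do not send) the request for a short-lived token."""
    query = urlencode({"expires_in_seconds": expires_in_seconds})
    return urllib.request.Request(
        f"{TOKEN_URL}?{query}",
        headers={"Authorization": api_key},
        method="GET",
    )


def fetch_temporary_token(api_key, expires_in_seconds=60):
    """Run on your server: send the request and return the token string."""
    request = build_token_request(api_key, expires_in_seconds)
    with urllib.request.urlopen(request, timeout=10) as response:
        payload = json.load(response)
    return payload["token"]  # assumed response field name
```

Only `fetch_temporary_token` touches the network, and it should run server-side so that the API key never reaches the browser or device.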
Step 3: Set up audio and WebSocket configuration
Set the parameters that control how your client connects to AssemblyAI’s streaming transcription API. These options include the audio sample rate and whether final transcripts are punctuated and formatted.
See Streaming endpoints and data zones for more information on endpoints for Streaming STT.
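The configuration described above can be sketched as a set of constants. The endpoint URL and the `sample_rate` and `format_turns` parameter names are assumptions for illustration, and the 50 ms chunk length is an arbitrary example value.

```python
from urllib.parse import urlencode

# Parameter names and endpoint URL are assumptions for illustration.
CONNECTION_PARAMS = {
    "sample_rate": 16000,    # audio sample rate in Hz
    "format_turns": "true",  # request punctuated, formatted final transcripts
}
API_ENDPOINT_BASE_URL = "wss://streaming.assemblyai.com/v3/ws"
API_ENDPOINT = f"{API_ENDPOINT_BASE_URL}?{urlencode(CONNECTION_PARAMS)}"

# Audio capture settings that must agree with the sample rate sent to the API.
SAMPLE_RATE = CONNECTION_PARAMS["sample_rate"]
CHANNELS = 1              # mono
CHUNK_DURATION_MS = 50    # illustrative chunk length
FRAMES_PER_BUFFER = SAMPLE_RATE * CHUNK_DURATION_MS // 1000

print(API_ENDPOINT)
print(FRAMES_PER_BUFFER)
```

Deriving the capture settings from `CONNECTION_PARAMS` keeps the microphone sample rate and the rate declared to the API from drifting apart.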
Step 4: Create event handlers
In this step, you’ll define event handlers to manage the different types of events emitted during the streaming session. The handlers will respond to session lifecycle events, transcription turns, errors, and session termination.
Implement basic event handlers. These handlers let your app respond to key streaming events:
on_open – Starts streaming microphone audio in a background thread.
on_message – Handles transcription events like Begin, Turn, and Termination.
on_error – Logs any connection or streaming errors and triggers cleanup.
on_close – Cleans up audio resources and saves a WAV recording when the session ends.
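The message, error, and close handlers above can be sketched as follows, using the callback signatures of the `websocket-client` library. The JSON field names (`type`, `id`, `transcript`, `audio_duration_seconds`) are assumptions, and `on_message` returns a summary string only to make the sketch easy to check; microphone streaming in `on_open` and the WAV recording are omitted.

```python
import json


def on_message(ws, message):
    """Dispatch one JSON message by its type and return a printable summary."""
    try:
        data = json.loads(message)
    except json.JSONDecodeError:
        return "Ignored a non-JSON message"
    if not isinstance(data, dict):
        return "Ignored a non-object message"

    msg_type = data.get("type")
    if msg_type == "Begin":
        return f"Session started: {data.get('id')}"
    if msg_type == "Turn":
        return f"Transcript: {data.get('transcript', '')}"
    if msg_type == "Termination":
        return f"Session ended after {data.get('audio_duration_seconds', 0)} s of audio"
    return f"Unhandled message type: {msg_type}"


def on_error(ws, error):
    """Log connection or streaming errors."""
    print(f"WebSocket error: {error}")


def on_close(ws, close_status_code, close_msg):
    """Report why the session closed; audio cleanup would go here."""
    print(f"Disconnected: status={close_status_code}, message={close_msg}")
```

For example, `on_message(None, '{"type": "Turn", "transcript": "hello world"}')` returns `"Transcript: hello world"`.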
Message sequence and turn events
The turn object is how you’ll receive your transcripts. To get a better understanding of the turn event and the message sequences, check out our Message Sequence Breakdown page.
Step 5: Connect and start transcription
Streaming Speech-to-Text uses WebSockets to stream audio to AssemblyAI. This requires first establishing a connection to the API.
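A minimal sketch of establishing the connection, assuming the `websocket-client` library and an `Authorization` header that carries the API key. `run_session` and its `app_factory` parameter are hypothetical names; the factory exists only so that the function can be exercised without a network connection.

```python
def run_session(endpoint, api_key, handlers, app_factory=None):
    """Open the WebSocket connection and block until it closes.

    `handlers` maps callback names (on_open, on_message, on_error, on_close)
    to functions. `app_factory` defaults to websocket.WebSocketApp.
    """
    if app_factory is None:
        # Imported lazily; third-party package: pip install websocket-client
        import websocket

        app_factory = websocket.WebSocketApp

    ws_app = app_factory(
        endpoint,
        header={"Authorization": api_key},  # assumed authentication header
        **handlers,
    )
    ws_app.run_forever()  # returns once the connection has closed
    return ws_app
```

Because `run_forever()` blocks, an application that needs to do other work while streaming would typically call `run_session` from a separate thread.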
Step 6: Close the connection
Close the WebSocket connection when you’re done:
The connection will also close automatically when you press Ctrl+C. In both cases, the close handler (on_close in the Python example) cleans up the audio resources.
Note: Pricing is based on session duration, so close sessions properly to avoid unexpected usage and cost.
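The shutdown path can be sketched as follows, assuming the session is ended by sending a JSON message of type `Terminate` before closing the socket. The message shape and both helper names are assumptions; `ws` is any object with `send()` and `close()` methods, such as a `websocket-client` `WebSocketApp`.

```python
import json


def close_session(ws):
    """Ask the server to end the session, then close the socket."""
    try:
        ws.send(json.dumps({"type": "Terminate"}))  # assumed termination message
    finally:
        ws.close()  # always close, even if the send fails


def stream_until_interrupted(ws, run):
    """Call `run()` and close the session when it returns or on Ctrl+C."""
    try:
        run()
    except KeyboardInterrupt:
        print("Ctrl+C received, stopping...")
    finally:
        close_session(ws)
```

Putting the cleanup in `finally` means the session is closed on every exit path, which matters because an open session keeps accruing billable time.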
Next steps
To learn more about Streaming Speech-to-Text, see the following resources:
Need some help?
If you get stuck, or have any other questions, we’d love to help you out. Contact our support team at support@assemblyai.com or create a support ticket.