Overview
By the end of this guide, you’ll have a working script that transcribes your microphone live, printing each turn as you speak. Build it with an AI coding agent or write it yourself; both are below. Prefer to try it first? Transcribe audio without writing any code in the AssemblyAI Playground.

Streaming is billed per session
Streaming Speech-to-Text is billed on the total duration that your WebSocket connection stays open, not on the amount of audio you send. Always send a termination message when you’re done with a stream: sessions that aren’t closed auto-close after 3 hours and are billed for the full duration. See Billing and pricing for details.
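Because billing follows how long the connection stays open, it helps to make termination unconditional. The sketch below shows one way to do that with a context manager. FakeSession and its terminate method are hypothetical stand-ins, not the AssemblyAI SDK's API; swap in whatever call your client uses to send the termination message.

```python
import time
from contextlib import contextmanager


class FakeSession:
    """Hypothetical stand-in for a streaming session (not the real SDK)."""

    def __init__(self):
        self.opened_at = time.monotonic()
        self.closed_at = None

    def terminate(self):
        # A real client would send the termination message here.
        self.closed_at = time.monotonic()

    @property
    def is_open(self):
        return self.closed_at is None


@contextmanager
def streaming_session():
    """Yield a session and terminate it even if the body raises."""
    session = FakeSession()
    try:
        yield session
    finally:
        session.terminate()


def run_demo():
    """Simulate an interrupted stream and return the session for inspection."""
    captured = None
    try:
        with streaming_session() as session:
            captured = session
            raise KeyboardInterrupt  # simulate pressing Ctrl+C mid-stream
    except KeyboardInterrupt:
        pass
    return captured


if __name__ == "__main__":
    print("session still open:", run_demo().is_open)
```

The finally clause runs on normal exit, on exceptions, and on Ctrl+C, so the session is never left open to run up to the 3-hour auto-close.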
Before you begin
You’ll need:
- An API key. Grab one from your dashboard. Every example below reads it from the ASSEMBLYAI_API_KEY environment variable, so set it once in your shell before running anything.
- Python 3.8+ or Node.js 18+, depending on which SDK you use.
- A working microphone — these examples capture live audio from it.
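Since every example reads the key from the environment, a script can check for it up front and fail with a clear message instead of an authentication error later. This is a minimal sketch; load_api_key is a hypothetical helper name, not part of any SDK.

```python
import os


def load_api_key(env=None):
    """Return the API key from ASSEMBLYAI_API_KEY, or raise a clear error."""
    env = os.environ if env is None else env
    key = env.get("ASSEMBLYAI_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "ASSEMBLYAI_API_KEY is not set; export it before running the examples."
        )
    return key
```

Passing a plain dict as env makes the helper easy to exercise without touching the real environment.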
Transcribe streaming audio
Prefer to write it yourself? Follow these steps to stream your microphone live. The AssemblyAI SDK manages the WebSocket connection, microphone capture, and session termination for you.
Step 1: Install the SDK
- Python SDK
- JavaScript SDK
Step 2: Run your first transcriber
Save this as transcribe.py (Python) or transcribe.js (JavaScript). It streams your microphone and prints each turn until you press Ctrl+C:
- Python SDK
- JavaScript SDK
Run it with python transcribe.py or node transcribe.js, then start speaking. Each turn prints as you talk, and the session closes when you press Ctrl+C.
What you get back
The transcriber emits JSON messages (the SDK surfaces them as open / turn / close events). The one you handle most is Turn, which is sent repeatedly as someone speaks. In each Turn message, end_of_turn: true marks a finalized turn, and transcript is the text so far.
You also receive a Begin message when the session opens ({ "type": "Begin", "id": "...", "expires_at": ... }) and a Termination message when it closes ({ "type": "Termination", "audio_duration_seconds": 10, "session_duration_seconds": 12 }). Word timings are in milliseconds. See the message sequence breakdown for the full event flow.
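The three message types described above can be handled with a small dispatch on the type field. The sketch below uses only the fields named in this section; describe_message is a hypothetical helper, and the sample payloads are illustrative values, not captured output.

```python
import json


def describe_message(raw):
    """Turn one JSON message from the stream into a printable line."""
    message = json.loads(raw)
    kind = message.get("type")
    if kind == "Begin":
        return f"session {message['id']} opened"
    if kind == "Turn":
        # end_of_turn: true marks a finalized turn; otherwise the text is partial.
        label = "final" if message.get("end_of_turn") else "partial"
        return f"[{label}] {message.get('transcript', '')}"
    if kind == "Termination":
        return (
            f"closed: {message['audio_duration_seconds']}s of audio, "
            f"{message['session_duration_seconds']}s session"
        )
    return f"unhandled message type: {kind}"


# Illustrative payloads only; real messages carry more fields.
samples = [
    '{"type": "Begin", "id": "abc123", "expires_at": 1700000000}',
    '{"type": "Turn", "end_of_turn": false, "transcript": "hello"}',
    '{"type": "Turn", "end_of_turn": true, "transcript": "Hello world."}',
    '{"type": "Termination", "audio_duration_seconds": 10, "session_duration_seconds": 12}',
]

if __name__ == "__main__":
    for raw in samples:
        print(describe_message(raw))
```

In the Termination sample, the session lasted 12 seconds for 10 seconds of audio; the session duration is the figure that billing follows.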
Using the WebSocket API directly
Not using an SDK? Connect to the streaming WebSocket at wss://streaming.assemblyai.com/v3/ws directly. Authenticate with your key in the Authorization header (no Bearer prefix), and manage the connection, microphone capture, the Begin / Turn / Termination messages, and session termination yourself. The SDK above does all of this for you. See the message sequence breakdown for the event flow and endpoints and data zones for regional endpoints.
Both examples read your key from the same ASSEMBLYAI_API_KEY environment variable you set in Before you begin.
Streaming from a browser?
Don’t ship your API key to client-side code. Authenticate from the browser with a short-lived temporary token instead.
- Python
- JavaScript
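As a rough sketch of the connection setup described in this section, the helpers below build the URL and headers a WebSocket client would connect with, without opening a connection. Two details are assumptions rather than statements from this guide: that sample_rate is passed as a query parameter, and that the client's termination message has the shape {"type": "Terminate"}. Check both against the WebSocket API reference. build_connection and terminate_message are hypothetical helper names.

```python
import json
import os
from urllib.parse import urlencode

STREAMING_URL = "wss://streaming.assemblyai.com/v3/ws"


def build_connection(api_key, sample_rate=16000):
    """Return the (url, headers) pair a WebSocket client would connect with."""
    # Assumption: sample_rate is sent as a query parameter.
    url = f"{STREAMING_URL}?{urlencode({'sample_rate': sample_rate})}"
    # The key goes in the Authorization header as-is, with no "Bearer " prefix.
    headers = {"Authorization": api_key}
    return url, headers


def terminate_message():
    """Assumed shape of the client's termination message."""
    return json.dumps({"type": "Terminate"})


if __name__ == "__main__":
    url, headers = build_connection(os.environ.get("ASSEMBLYAI_API_KEY", "<your-key>"))
    print(url)
```

Keeping URL and header construction separate from the socket code makes it easy to point the same client at a regional endpoint later.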
Limits
- Session length: a streaming session auto-closes after 3 hours.
- Audio: mono 16-bit PCM; set sample_rate to match your source (16 kHz in these examples).
- Rate limit: new-session rate limits scale automatically with usage (default 5 for free accounts). Check yours on the rate limits page.
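The audio limit above translates into simple arithmetic: mono 16-bit PCM at 16 kHz is 16,000 samples × 2 bytes = 32,000 bytes per second. The sketch below computes chunk sizes and packs float samples into that format. The 50 ms chunk length and the little-endian byte order are assumptions for illustration, and both helper names are hypothetical.

```python
import struct

SAMPLE_RATE = 16_000      # Hz, as in the examples
BYTES_PER_SAMPLE = 2      # 16-bit PCM
CHANNELS = 1              # mono


def chunk_size_bytes(chunk_ms, sample_rate=SAMPLE_RATE):
    """Bytes of mono 16-bit PCM in a chunk of chunk_ms milliseconds."""
    samples = sample_rate * chunk_ms // 1000
    return samples * BYTES_PER_SAMPLE * CHANNELS


def floats_to_pcm16(samples):
    """Pack floats in [-1.0, 1.0] as little-endian signed 16-bit PCM."""
    clipped = [max(-1.0, min(1.0, s)) for s in samples]
    ints = [int(round(s * 32767)) for s in clipped]
    return struct.pack(f"<{len(ints)}h", *ints)


if __name__ == "__main__":
    print(chunk_size_bytes(50))    # 1600 bytes for a 50 ms chunk at 16 kHz
    print(chunk_size_bytes(1000))  # 32000 bytes per second
```

If your source records at a different rate, pass that rate to chunk_size_bytes and set sample_rate to the same value for the session.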
Next steps
To learn more about Streaming Speech-to-Text, see the following resources:
- Streaming Speech-to-Text overview
- Message sequence breakdown: understand the Begin, Turn, and Termination events
- WebSocket API reference