
Overview

By the end of this guide, you’ll have a working script that transcribes your microphone live, printing each turn as you speak. Build it with an AI coding agent, or write it yourself — both are below. Prefer to try it first? Transcribe audio without writing any code in the AssemblyAI Playground.
Streaming is billed per session
Streaming Speech-to-Text is billed on the total duration that your WebSocket connection stays open, not on the amount of audio you send. Always send a termination message when you’re done with a stream. Sessions that aren’t closed auto-close after 3 hours and are billed for the full duration. See Billing and pricing for details.

Before you begin

You’ll need:
  • An API key — grab one from your dashboard. Every example below reads it from an environment variable, so set it once:
    export ASSEMBLYAI_API_KEY=<your-key>
    
  • Python 3.8+ or Node.js 18+, depending on which SDK you use.
  • A working microphone — these examples capture live audio from it.
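Because every example reads the key from the environment, it can help to confirm that Python sees it before you start a session. The following is a minimal sketch; the helper name load_api_key is hypothetical and not part of the AssemblyAI SDK:

```python
import os


def load_api_key(env=os.environ) -> str:
    """Return the API key from the environment, or raise with a clear message.

    `load_api_key` is an illustrative helper, not an SDK function.
    """
    key = env.get("ASSEMBLYAI_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "ASSEMBLYAI_API_KEY is not set; "
            "run `export ASSEMBLYAI_API_KEY=<your-key>` first."
        )
    return key
```

Calling load_api_key() at the top of a script fails fast with a readable message instead of a bare KeyError from os.environ.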
Building with an AI coding agent? Wire it up to AssemblyAI’s live docs (MCP server) and the AssemblyAI skill so it writes correct, up-to-date code instead of relying on stale training data:
claude mcp add --transport http --scope user assemblyai-docs https://mcp.assemblyai.com/docs
npx skills add AssemblyAI/assemblyai-skill --global
Then describe what you want to build. To get the same result as the steps below, paste:
Use the AssemblyAI Python SDK to transcribe my microphone in real time and print each turn.

Transcribe streaming audio

Prefer to write it yourself? Follow these steps to stream your microphone live. The AssemblyAI SDK manages the WebSocket connection, microphone capture, and session termination for you.

Step 1: Install the SDK

pip install assemblyai sounddevice

Step 2: Run your first transcriber

Save this as transcribe.py. It streams your microphone and prints each turn until you press Ctrl+C:
import os

import sounddevice as sd
from assemblyai.streaming.v3 import (
    BeginEvent,
    StreamingClient,
    StreamingClientOptions,
    StreamingError,
    StreamingEvents,
    StreamingParameters,
    TerminationEvent,
    TurnEvent,
)

SAMPLE_RATE = 16000


def on_begin(client: StreamingClient, event: BeginEvent):
    print(f"Session started: {event.id}")


def on_turn(client: StreamingClient, event: TurnEvent):
    print(event.transcript)


def on_terminated(client: StreamingClient, event: TerminationEvent):
    print(f"Session terminated: {event.audio_duration_seconds}s of audio processed")


def on_error(client: StreamingClient, error: StreamingError):
    print(f"Error: {error}")


def mic_stream():
    # sounddevice bundles PortAudio in its wheel — no system install needed.
    with sd.RawInputStream(
        samplerate=SAMPLE_RATE, channels=1, dtype="int16", blocksize=800
    ) as mic:
        while True:
            frames, _ = mic.read(800)  # ~50 ms of audio
            yield bytes(frames)


def main():
    client = StreamingClient(
        StreamingClientOptions(api_key=os.environ["ASSEMBLYAI_API_KEY"])
    )

    client.on(StreamingEvents.Begin, on_begin)
    client.on(StreamingEvents.Turn, on_turn)
    client.on(StreamingEvents.Termination, on_terminated)
    client.on(StreamingEvents.Error, on_error)

    client.connect(
        StreamingParameters(speech_model="u3-rt-pro", sample_rate=SAMPLE_RATE)
    )

    try:
        client.stream(mic_stream())
    except KeyboardInterrupt:
        pass
    finally:
        client.disconnect(terminate=True)


if __name__ == "__main__":
    main()
Then run it with python transcribe.py and start speaking. Each turn prints as you talk, and the session closes when you press Ctrl+C:
Session started: 7f3a9c2e-...
Smoke from hundreds of wildfires in Canada is triggering air quality alerts...
Session terminated: 12.0s of audio processed
That’s a full real-time transcriber. Prefer raw WebSockets? See Using the WebSocket API directly below.

What you get back

The transcriber emits JSON messages (the SDK surfaces them as open / turn / close events). The one you handle most is Turn, sent repeatedly as someone speaks — end_of_turn: true marks a finalized turn, and transcript is the text so far:
{
  "type": "Turn",
  "turn_order": 0,
  "end_of_turn": true,
  "turn_is_formatted": true,
  "end_of_turn_confidence": 1.0,
  "transcript": "Smoke from hundreds of wildfires in Canada is triggering air quality alerts...",
  "words": [
    { "text": "Smoke", "start": 0, "end": 399, "confidence": 0.99, "word_is_final": true }
  ]
}
You also receive a Begin message when the session opens ({ "type": "Begin", "id": "...", "expires_at": ... }) and a Termination message when it closes ({ "type": "Termination", "audio_duration_seconds": 10, "session_duration_seconds": 12 }). Word timings are in milliseconds. See the message sequence breakdown for the full event flow.
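If you handle the raw JSON yourself, the messages described above can be dispatched on their type field. The sketch below assumes only the fields shown in this section; the function name handle_message and the final_turns list are illustrative, not part of any SDK:

```python
import json


def handle_message(raw: str, final_turns: list) -> None:
    """Dispatch one raw JSON message by its "type" field (illustrative helper)."""
    msg = json.loads(raw)
    kind = msg.get("type")
    if kind == "Begin":
        print(f"Session {msg.get('id')} started")
    elif kind == "Turn":
        # Turn messages repeat while someone speaks; keep only finalized turns.
        if msg.get("end_of_turn"):
            final_turns.append(msg.get("transcript", ""))
            for word in msg.get("words", []):
                # Word timings are in milliseconds; convert to seconds for display.
                start_s = word["start"] / 1000
                end_s = word["end"] / 1000
                print(f"  {word['text']}: {start_s:.3f}s to {end_s:.3f}s")
    elif kind == "Termination":
        audio = msg.get("audio_duration_seconds")
        session = msg.get("session_duration_seconds")
        print(f"Processed {audio}s of audio over a {session}s session")
```

Collecting only turns with end_of_turn set to true avoids storing the same text several times as it is refined. The Termination branch prints both durations, which is useful because billing follows the session duration rather than the audio duration.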

Using the WebSocket API directly

Not using an SDK? Connect to the streaming WebSocket at wss://streaming.assemblyai.com/v3/ws directly. Authenticate with your key in the Authorization header (no Bearer prefix), and manage the connection, microphone capture, the Begin / Turn / Termination messages, and session termination yourself; the SDK above does all of this for you. See the message sequence breakdown for the event flow, and Endpoints and data zones for regional endpoints. The example below reads your key from the same ASSEMBLYAI_API_KEY environment variable you set in Before you begin.
Streaming from a browser? Don’t ship your API key to client-side code. Authenticate from the browser with a short-lived temporary token instead.
pip install sounddevice websocket-client
import json
import os
import threading
from urllib.parse import urlencode

import sounddevice as sd
import websocket

API_KEY = os.environ["ASSEMBLYAI_API_KEY"]
SAMPLE_RATE = 16000
CONNECTION_PARAMS = {"speech_model": "u3-rt-pro", "sample_rate": SAMPLE_RATE}
API_ENDPOINT = f"wss://streaming.assemblyai.com/v3/ws?{urlencode(CONNECTION_PARAMS)}"

stop = threading.Event()


def on_open(ws):
    print("Connected. Speak into your microphone; press Ctrl+C to stop.")

    def stream_audio():
        # sounddevice bundles PortAudio in its wheel — no system install needed.
        with sd.RawInputStream(
            samplerate=SAMPLE_RATE, channels=1, dtype="int16", blocksize=800
        ) as mic:
            while not stop.is_set():
                frames, _ = mic.read(800)  # ~50 ms of audio
                ws.send(bytes(frames), websocket.ABNF.OPCODE_BINARY)

    threading.Thread(target=stream_audio, daemon=True).start()


def on_message(ws, message):
    data = json.loads(message)
    if data.get("type") == "Turn":
        print(data.get("transcript", ""), end="\n" if data.get("end_of_turn") else "\r")


def on_error(ws, error):
    # On a normal shutdown, websocket-client hands the server's close frame to
    # on_error; ignore it and let on_close report the disconnect. Real failures
    # arrive as exceptions, not close frames.
    if isinstance(error, websocket.ABNF) and error.opcode == websocket.ABNF.OPCODE_CLOSE:
        return
    print(f"\nError: {error}")
    stop.set()


def on_close(ws, status, msg):
    stop.set()
    print("\nDisconnected.")


def main():
    ws = websocket.WebSocketApp(
        API_ENDPOINT,
        header={"Authorization": API_KEY},
        on_open=on_open,
        on_message=on_message,
        on_error=on_error,
        on_close=on_close,
    )

    ws_thread = threading.Thread(target=ws.run_forever, daemon=True)
    ws_thread.start()

    try:
        while ws_thread.is_alive():
            ws_thread.join(0.1)
    except KeyboardInterrupt:
        stop.set()
        if ws.sock and ws.sock.connected:
            ws.send(json.dumps({"type": "Terminate"}))  # close the session
        ws.close()


if __name__ == "__main__":
    main()

Limits

  • Session length: a streaming session auto-closes after 3 hours.
  • Audio: mono 16-bit PCM; set sample_rate to match your source (16 kHz in these examples).
  • Rate limit: new-session rate limits scale automatically with usage (default 5 for free accounts). Check yours on the rate limits page.
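If your source is not 16 kHz, the chunk size used in the examples (800 frames, about 50 ms) needs to change with the sample rate. The arithmetic can be sketched as follows; the helper names chunk_frames and chunk_bytes are illustrative, and the 50 ms chunk length is simply the value used in the examples above:

```python
def chunk_frames(sample_rate: int, chunk_ms: int) -> int:
    """Number of audio frames in chunk_ms milliseconds at the given sample rate."""
    return sample_rate * chunk_ms // 1000


def chunk_bytes(
    sample_rate: int, chunk_ms: int, channels: int = 1, sample_width: int = 2
) -> int:
    """Size in bytes of one PCM chunk (16-bit audio uses 2 bytes per sample)."""
    return chunk_frames(sample_rate, chunk_ms) * channels * sample_width


print(chunk_frames(16000, 50))  # 800, the blocksize used in the examples
print(chunk_bytes(16000, 50))   # 1600 bytes per mono 16-bit chunk
```

For a 48 kHz microphone, for instance, the same 50 ms chunk works out to 2400 frames, and sample_rate in the connection parameters would be set to 48000 to match.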

Next steps

To learn more about Streaming Speech-to-Text, see the following resources:

Need some help?

If you get stuck, or have any other questions, we’d love to help you out. Contact our support team at support@assemblyai.com or create a support ticket.