Quickstart - AssemblyAI

Overview

By the end of this guide, you’ll have a working script that transcribes your microphone live, printing each turn as you speak. Build it with an AI coding agent, or write it yourself — both are below. Prefer to try it first? Transcribe audio without writing any code in the AssemblyAI Playground.

Note: Streaming is billed per session. Streaming Speech-to-Text is billed on the total duration that your WebSocket connection stays open, not on the amount of audio you send. Always send a termination message when you’re done with a stream: sessions that aren’t closed auto-close after 3 hours and are billed for the full duration. See Billing and pricing for details.
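To make the billing rule concrete, here is a minimal sketch of how billed time relates to connection time. The function names are hypothetical and no prices are modeled. The only facts taken from this guide are that billing follows open-connection time and that unclosed sessions auto-close after 3 hours.

```python
MAX_SESSION_SECONDS = 3 * 60 * 60  # sessions auto-close after 3 hours


def billed_seconds(connection_open_seconds: float) -> float:
    """Billed duration follows how long the WebSocket stayed open,
    capped at the 3-hour auto-close, regardless of how much audio was sent."""
    if connection_open_seconds < 0:
        raise ValueError("duration cannot be negative")
    return min(connection_open_seconds, MAX_SESSION_SECONDS)


def unused_seconds(connection_open_seconds: float, audio_seconds: float) -> float:
    """Billed time during which no audio was being sent (illustrative only)."""
    return max(billed_seconds(connection_open_seconds) - audio_seconds, 0)
```

For example, two minutes of speech on a connection that is terminated promptly bills about two minutes, while the same two minutes on a connection that is never terminated bills the full three hours.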

Before you begin

You’ll need:

An API key — grab one from your dashboard. Every example below reads it from an environment variable, so set it once:
```
export ASSEMBLYAI_API_KEY=<your-key>
```
Python 3.8+ or Node.js 18+, depending on which SDK you use.
A working microphone — these examples capture live audio from it.
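Before running the examples, a small preflight check can catch a missing key or an old interpreter early. This is a sketch only: `load_api_key` and `check_python_version` are hypothetical helper names, not part of the AssemblyAI SDK.

```python
import os
import sys


def load_api_key(env=os.environ) -> str:
    """Return the key from ASSEMBLYAI_API_KEY, or raise a readable error."""
    key = env.get("ASSEMBLYAI_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "ASSEMBLYAI_API_KEY is not set. Run: export ASSEMBLYAI_API_KEY=<your-key>"
        )
    return key


def check_python_version(version_info=sys.version_info) -> None:
    """This guide asks for Python 3.8 or newer."""
    if tuple(version_info[:2]) < (3, 8):
        raise RuntimeError("Python 3.8+ is required for these examples")
```

Calling `check_python_version()` and then `load_api_key()` at the top of a script fails fast with a clear message instead of a `KeyError` later on.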

Building with an AI coding agent? Wire it up to AssemblyAI’s live docs (MCP server) and the AssemblyAI skill so it writes correct, up-to-date code instead of relying on stale training data:

claude mcp add --transport http --scope user assemblyai-docs https://assemblyai.com/docs/mcp
npx skills add AssemblyAI/assemblyai-skill --global

Then describe what you want to build. To get the same result as the steps below, paste:

Use the AssemblyAI Python SDK to transcribe my microphone in real time and print each turn.

Transcribe streaming audio

Prefer to write it yourself? Follow these steps to stream your microphone live. The AssemblyAI SDK manages the WebSocket connection, microphone capture, and session termination for you.

Step 1: Install the SDK

Python SDK:

pip install assemblyai sounddevice

JavaScript SDK:

npm install assemblyai @picovoice/pvrecorder-node

Step 2: Stream your first session

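Save the Python script as transcribe.py or the JavaScript script as transcribe.js. Either one streams your microphone and prints each turn until you press Ctrl+C. The JavaScript version uses ES module import syntax, so if Node.js rejects the import statements, save it as transcribe.mjs or set "type": "module" in your package.json.

Python SDK: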

import os

import sounddevice as sd
from assemblyai.streaming.v3 import (
    BeginEvent,
    StreamingClient,
    StreamingClientOptions,
    StreamingError,
    StreamingEvents,
    StreamingParameters,
    TerminationEvent,
    TurnEvent,
)

SAMPLE_RATE = 16000


def on_begin(client: StreamingClient, event: BeginEvent):
    print(f"Session started: {event.id}")
    print("Connected. Speak into your microphone; press Ctrl+C to stop.")


def on_turn(client: StreamingClient, event: TurnEvent):
    print(event.transcript)


def on_terminated(client: StreamingClient, event: TerminationEvent):
    print(f"Session terminated: {event.audio_duration_seconds}s of audio processed")


def on_error(client: StreamingClient, error: StreamingError):
    print(f"Error: {error}")


def mic_stream():
    # sounddevice bundles PortAudio in its wheel — no system install needed.
    with sd.RawInputStream(
        samplerate=SAMPLE_RATE, channels=1, dtype="int16", blocksize=800
    ) as mic:
        while True:
            frames, _ = mic.read(800)  # ~50 ms of audio
            yield bytes(frames)


def main():
    client = StreamingClient(
        StreamingClientOptions(api_key=os.environ["ASSEMBLYAI_API_KEY"])
    )

    client.on(StreamingEvents.Begin, on_begin)
    client.on(StreamingEvents.Turn, on_turn)
    client.on(StreamingEvents.Termination, on_terminated)
    client.on(StreamingEvents.Error, on_error)

    client.connect(
        StreamingParameters(speech_model="u3-rt-pro", sample_rate=SAMPLE_RATE)
    )

    try:
        client.stream(mic_stream())
    except KeyboardInterrupt:
        pass
    finally:
        client.disconnect(terminate=True)


if __name__ == "__main__":
    main()

JavaScript SDK:

import { PvRecorder } from "@picovoice/pvrecorder-node";
import { AssemblyAI } from "assemblyai";

const client = new AssemblyAI({ apiKey: process.env.ASSEMBLYAI_API_KEY });

const transcriber = client.streaming.transcriber({
  speechModel: "u3-rt-pro",
  sampleRate: 16_000,
});

transcriber.on("open", ({ id }) => console.log(`Session opened with ID: ${id}`));
transcriber.on("error", (error) => console.error("Error:", error));
transcriber.on("close", (code, reason) => console.log("Session closed:", code, reason));
transcriber.on("turn", (turn) => {
  if (turn.transcript) {
    console.log("Turn:", turn.transcript);
  }
});

// PvRecorder ships prebuilt native binaries — no SoX or system audio install needed.
const recorder = new PvRecorder(800, -1); // 800 samples ≈ 50 ms at 16 kHz

let running = true;
process.on("SIGINT", () => {
  running = false;
});

const run = async () => {
  await transcriber.connect();
  recorder.start();
  console.log("Recording — press Ctrl+C to stop.");

  while (running) {
    const frame = await recorder.read();
    transcriber.sendAudio(Buffer.from(frame.buffer, frame.byteOffset, frame.byteLength));
  }

  recorder.stop();
  recorder.release();
  await transcriber.close();
};

run();

Then run it with python transcribe.py or node transcribe.js and start speaking. Each turn prints as you talk, and the session closes when you press Ctrl+C. The Python script prints output like this:

Session started: 7f3a9c2e-...
Connected. Speak into your microphone; press Ctrl+C to stop.
Smoke from hundreds of wildfires in Canada is triggering air quality alerts...
Session terminated: 12.0s of audio processed

That’s a full real-time transcriber. Prefer raw WebSockets? See Using the WebSocket API directly below.

What you get back

The transcriber emits JSON messages. The JavaScript SDK surfaces them as open / turn / close events, and the Python SDK as Begin / Turn / Termination events. The one you handle most is Turn, sent repeatedly as someone speaks: end_of_turn: true marks a finalized turn, and transcript is the text so far:

{
  "type": "Turn",
  "turn_order": 0,
  "end_of_turn": true,
  "turn_is_formatted": true,
  "end_of_turn_confidence": 1.0,
  "transcript": "Smoke from hundreds of wildfires in Canada is triggering air quality alerts...",
  "words": [
    { "text": "Smoke", "start": 0, "end": 399, "confidence": 0.99, "word_is_final": true }
  ]
}
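One way to consume Turn messages is to keep the latest partial text separate from finalized turns. The sketch below assumes raw JSON strings shaped like the example above; `TurnCollector` is a hypothetical helper, not an SDK class. Finalized turns are keyed by turn_order so that a repeated final message for the same turn overwrites the earlier entry rather than duplicating it.

```python
import json
from typing import Dict, Optional


class TurnCollector:
    """Tracks the in-progress turn and the finalized turns seen so far."""

    def __init__(self) -> None:
        self.finalized: Dict[int, str] = {}
        self.partial: Optional[str] = None

    def handle(self, raw_message: str) -> None:
        data = json.loads(raw_message)
        if data.get("type") != "Turn":
            return  # Begin, Termination, and other messages are ignored here
        transcript = data.get("transcript", "")
        if data.get("end_of_turn"):
            # Finalized: store under its turn_order and clear the partial.
            self.finalized[data.get("turn_order", len(self.finalized))] = transcript
            self.partial = None
        else:
            # Still speaking: keep only the most recent partial text.
            self.partial = transcript

    def text(self) -> str:
        """Finalized turns joined in turn order."""
        return " ".join(self.finalized[k] for k in sorted(self.finalized))
```

Pass each incoming message to `handle()` and read `partial` for a live preview or `text()` for the finalized transcript.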

You also receive a Begin message when the session opens ({ "type": "Begin", "id": "...", "expires_at": ... }) and a Termination message when it closes ({ "type": "Termination", "audio_duration_seconds": 10, "session_duration_seconds": 12 }). Word timings are in milliseconds. See the message sequence breakdown for the full event flow.
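Because word timings arrive in milliseconds and the Termination message reports two durations, a couple of small helpers can make both easier to read. This is a sketch with hypothetical function names, using only the fields shown in the messages above. Reading the gap between the two durations as open-connection time without processed audio is an assumption.

```python
from typing import List, Tuple


def word_spans(turn: dict) -> List[Tuple[str, float, float]]:
    """Return (text, start_seconds, end_seconds) for each word in a Turn message."""
    return [
        (word["text"], word["start"] / 1000, word["end"] / 1000)
        for word in turn.get("words", [])
    ]


def session_overhead_seconds(termination: dict) -> float:
    """Session time beyond the audio that was processed (assumed interpretation)."""
    return (
        termination["session_duration_seconds"]
        - termination["audio_duration_seconds"]
    )
```

With the sample messages from this section, the single word "Smoke" spans 0.0 to 0.399 seconds, and the Termination message shows 2 seconds of session time beyond the processed audio.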

Using the WebSocket API directly

Not using an SDK? Connect to the streaming WebSocket at wss://streaming.assemblyai.com/v3/ws directly. Authenticate with your key in the Authorization header (no Bearer prefix), and manage the connection, microphone capture, the Begin / Turn / Termination messages, and session termination yourself — the SDK above does all of this for you. See the message sequence breakdown for the event flow and endpoints and data zones for regional endpoints. Both examples read your key from the same ASSEMBLYAI_API_KEY environment variable you set in Before you begin.

Streaming from a browser? Don’t ship your API key to client-side code. Authenticate from the browser with a short-lived temporary token instead.

Python:

pip install sounddevice websocket-client

import json
import os
import threading
from urllib.parse import urlencode

import sounddevice as sd
import websocket

API_KEY = os.environ["ASSEMBLYAI_API_KEY"]
SAMPLE_RATE = 16000
CONNECTION_PARAMS = {"speech_model": "u3-rt-pro", "sample_rate": SAMPLE_RATE}
API_ENDPOINT = f"wss://streaming.assemblyai.com/v3/ws?{urlencode(CONNECTION_PARAMS)}"

stop = threading.Event()


def on_open(ws):
    print("Connected. Speak into your microphone; press Ctrl+C to stop.")

    def stream_audio():
        # sounddevice bundles PortAudio in its wheel — no system install needed.
        with sd.RawInputStream(
            samplerate=SAMPLE_RATE, channels=1, dtype="int16", blocksize=800
        ) as mic:
            while not stop.is_set():
                frames, _ = mic.read(800)  # ~50 ms of audio
                ws.send(bytes(frames), websocket.ABNF.OPCODE_BINARY)

    threading.Thread(target=stream_audio, daemon=True).start()


def on_message(ws, message):
    data = json.loads(message)
    if data.get("type") == "Turn":
        print(data.get("transcript", ""), end="\n" if data.get("end_of_turn") else "\r")


def on_error(ws, error):
    # On a normal shutdown, websocket-client hands the server's close frame to
    # on_error; ignore it and let on_close report the disconnect. Real failures
    # arrive as exceptions, not close frames.
    if isinstance(error, websocket.ABNF) and error.opcode == websocket.ABNF.OPCODE_CLOSE:
        return
    print(f"\nError: {error}")
    stop.set()


def on_close(ws, status, msg):
    stop.set()
    print("\nDisconnected.")


def main():
    ws = websocket.WebSocketApp(
        API_ENDPOINT,
        header={"Authorization": API_KEY},
        on_open=on_open,
        on_message=on_message,
        on_error=on_error,
        on_close=on_close,
    )

    ws_thread = threading.Thread(target=ws.run_forever, daemon=True)
    ws_thread.start()

    try:
        while ws_thread.is_alive():
            ws_thread.join(0.1)
    except KeyboardInterrupt:
        stop.set()
        if ws.sock and ws.sock.connected:
            ws.send(json.dumps({"type": "Terminate"}))  # close the session
        ws.close()


if __name__ == "__main__":
    main()

JavaScript:

npm install ws @picovoice/pvrecorder-node

const WebSocket = require("ws");
const querystring = require("querystring");
const { PvRecorder } = require("@picovoice/pvrecorder-node");

const API_KEY = process.env.ASSEMBLYAI_API_KEY;
const SAMPLE_RATE = 16000;
const params = { speech_model: "u3-rt-pro", sample_rate: SAMPLE_RATE };
const endpoint = `wss://streaming.assemblyai.com/v3/ws?${querystring.stringify(params)}`;

// PvRecorder ships prebuilt native binaries — no SoX or system audio install needed.
const recorder = new PvRecorder(800, -1); // 800 samples ≈ 50 ms at 16 kHz
const ws = new WebSocket(endpoint, { headers: { Authorization: API_KEY } });

let running = true;
process.on("SIGINT", () => {
  running = false;
});

ws.on("open", async () => {
  console.log("Connected. Speak into your microphone; press Ctrl+C to stop.");
  recorder.start();
  while (running && ws.readyState === WebSocket.OPEN) {
    const frame = await recorder.read();
    ws.send(Buffer.from(frame.buffer, frame.byteOffset, frame.byteLength));
  }
  recorder.stop();
  recorder.release();
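  if (ws.readyState === WebSocket.OPEN) {
    ws.send(JSON.stringify({ type: "Terminate" })); // close the session
    ws.close();
  }
  // No process.exit() here: exiting immediately could cut off the Terminate
  // message. The process ends on its own once the socket has closed.
});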

ws.on("message", (message) => {
  const data = JSON.parse(message);
  if (data.type === "Turn") {
    process.stdout.write(data.end_of_turn ? `${data.transcript}\n` : `\r${data.transcript}`);
  }
});

ws.on("error", (error) => console.error("\nError:", error));
ws.on("close", () => console.log("\nDisconnected."));

Limits

Session length: a streaming session auto-closes after 3 hours.
Audio: mono 16-bit PCM; set sample_rate to match your source (16 kHz in these examples).
Rate limit: new-session rate limits scale automatically with usage (the default is 5 new sessions per minute for free accounts). Check yours on the rate limits page.
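The audio requirement above fixes the arithmetic behind the 800-sample reads used in the examples: mono 16-bit PCM is 2 bytes per sample, so 50 ms at 16 kHz is 800 samples, or 1,600 bytes. Here is a minimal sketch of that calculation, plus a splitter for PCM you already hold in memory. Both helper names are hypothetical.

```python
from typing import Iterator

BYTES_PER_SAMPLE = 2  # 16-bit PCM
CHANNELS = 1          # mono


def chunk_bytes(sample_rate: int, chunk_ms: int) -> int:
    """Size in bytes of one mono 16-bit PCM chunk of the given length."""
    samples = sample_rate * chunk_ms // 1000
    return samples * BYTES_PER_SAMPLE * CHANNELS


def split_pcm(pcm: bytes, sample_rate: int, chunk_ms: int = 50) -> Iterator[bytes]:
    """Yield fixed-size chunks of raw PCM; the final chunk may be shorter."""
    size = chunk_bytes(sample_rate, chunk_ms)
    for offset in range(0, len(pcm), size):
        yield pcm[offset:offset + size]
```

If your source runs at a different rate, pass that rate here and set sample_rate to the same value when you open the session.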

Next steps

To learn more about Streaming Speech-to-Text, see the following resources:

Streaming Speech-to-Text overview
Message sequence breakdown — understand the Begin, Turn, and Termination events
WebSocket API reference

Need some help?

If you get stuck, or have any other questions, we’d love to help you out. Contact our support team at support@assemblyai.com or create a support ticket.
