
# Quickstart

> Learn how to transcribe streaming audio.

## Overview

By the end of this guide, you'll have a working script that transcribes your microphone live, printing each turn as you speak. Build it with an AI coding agent, or write it yourself — both are below.

Prefer to try it first? Transcribe audio without writing any code in the [AssemblyAI Playground](https://www.assemblyai.com/playground).

<Note>
  **Streaming is billed per session**

  Streaming Speech-to-Text is billed on the total duration that your WebSocket connection stays open, not on the amount of audio you send. Always send a termination message when you're done with a stream — sessions that aren't closed auto-close after 3 hours and are billed for the full duration. See [Billing and pricing](/billing-and-pricing) for details.
</Note>

## Before you begin

You'll need:

* **An API key** — grab one from [your dashboard](https://www.assemblyai.com/dashboard/api-keys). Every example below reads it from an environment variable, so set it once:

  ```bash theme={null}
  export ASSEMBLYAI_API_KEY=<your-key>
  ```

* **Python 3.8+ or Node.js 18+**, depending on which SDK you use.

* **A working microphone** — these examples capture live audio from it.

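**Building with an AI coding agent?** Connect it to AssemblyAI's live docs (MCP server) and install the AssemblyAI skill so it writes correct, up-to-date code instead of relying on stale training data. The first command below uses the Claude Code CLI; other agents register MCP servers through their own configuration: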

```bash theme={null}
claude mcp add --transport http --scope user assemblyai-docs https://assemblyai.com/docs/mcp
npx skills add AssemblyAI/assemblyai-skill --global
```

Then describe what you want to build. To get the same result as the steps below, paste:

```text theme={null}
Use the AssemblyAI Python SDK to transcribe my microphone in real time and print each turn.
```

## Transcribe streaming audio

Prefer to write it yourself? Follow these steps to stream your microphone live. The AssemblyAI SDK manages the WebSocket connection, microphone capture, and session termination for you.

### Step 1: Install the SDK

<Tabs groupId="language">
  <Tab language="python-sdk" title="Python SDK" default>
    ```bash theme={null}
    pip install assemblyai sounddevice
    ```
  </Tab>

  <Tab language="javascript-sdk" title="JavaScript SDK">
    ```bash theme={null}
    npm install assemblyai @picovoice/pvrecorder-node
    ```
  </Tab>
</Tabs>

### Step 2: Stream your first session

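Save this as `transcribe.py` (Python) or `transcribe.js` (JavaScript). The JavaScript example uses ES module `import` syntax, so add `"type": "module"` to your `package.json` if your Node.js version doesn't detect module syntax automatically. The script streams your microphone and prints each turn until you press Ctrl+C: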

<Tabs groupId="language">
  <Tab language="python-sdk" title="Python SDK" default>
    ```python expandable theme={null}
    import os

    import sounddevice as sd
    from assemblyai.streaming.v3 import (
        BeginEvent,
        StreamingClient,
        StreamingClientOptions,
        StreamingError,
        StreamingEvents,
        StreamingParameters,
        TerminationEvent,
        TurnEvent,
    )

    SAMPLE_RATE = 16000


    def on_begin(client: StreamingClient, event: BeginEvent):
        print(f"Session started: {event.id}")
        print("Connected. Speak into your microphone; press Ctrl+C to stop.")


    def on_turn(client: StreamingClient, event: TurnEvent):
        print(event.transcript)


    def on_terminated(client: StreamingClient, event: TerminationEvent):
        print(f"Session terminated: {event.audio_duration_seconds}s of audio processed")


    def on_error(client: StreamingClient, error: StreamingError):
        print(f"Error: {error}")


    def mic_stream():
        # sounddevice's pip wheels bundle PortAudio on macOS and Windows; on Linux,
        # install PortAudio with your package manager first.
        with sd.RawInputStream(
            samplerate=SAMPLE_RATE, channels=1, dtype="int16", blocksize=800
        ) as mic:
            while True:
                frames, _ = mic.read(800)  # ~50 ms of audio
                yield bytes(frames)


    def main():
        client = StreamingClient(
            StreamingClientOptions(api_key=os.environ["ASSEMBLYAI_API_KEY"])
        )

        client.on(StreamingEvents.Begin, on_begin)
        client.on(StreamingEvents.Turn, on_turn)
        client.on(StreamingEvents.Termination, on_terminated)
        client.on(StreamingEvents.Error, on_error)

        client.connect(
            StreamingParameters(speech_model="universal-3-5-pro", sample_rate=SAMPLE_RATE)
        )

        try:
            client.stream(mic_stream())
        except KeyboardInterrupt:
            pass
        finally:
            client.disconnect(terminate=True)


    if __name__ == "__main__":
        main()
    ```
  </Tab>

  <Tab language="javascript-sdk" title="JavaScript SDK">
    ```javascript expandable theme={null}
    import { PvRecorder } from "@picovoice/pvrecorder-node";
    import { AssemblyAI } from "assemblyai";

    const client = new AssemblyAI({ apiKey: process.env.ASSEMBLYAI_API_KEY });

    const transcriber = client.streaming.transcriber({
      speechModel: "universal-3-5-pro",
      sampleRate: 16_000,
    });

    transcriber.on("open", ({ id }) => console.log(`Session opened with ID: ${id}`));
    transcriber.on("error", (error) => console.error("Error:", error));
    transcriber.on("close", (code, reason) => console.log("Session closed:", code, reason));
    transcriber.on("turn", (turn) => {
      if (turn.transcript) {
        console.log("Turn:", turn.transcript);
      }
    });

    // PvRecorder ships prebuilt native binaries — no SoX or system audio install needed.
    const recorder = new PvRecorder(800, -1); // 800 samples ≈ 50 ms at 16 kHz

    let running = true;
    process.on("SIGINT", () => {
      running = false;
    });

    const run = async () => {
      await transcriber.connect();
      recorder.start();
      console.log("Recording — press Ctrl+C to stop.");

      while (running) {
        const frame = await recorder.read();
        transcriber.sendAudio(Buffer.from(frame.buffer, frame.byteOffset, frame.byteLength));
      }

      recorder.stop();
      recorder.release();
      await transcriber.close();
    };

    run();
    ```
  </Tab>
</Tabs>

Then run it — `python transcribe.py` or `node transcribe.js` — and start speaking. Each turn prints as you talk, and the session closes when you press Ctrl+C:

```text theme={null}
Session started: 7f3a9c2e-...
Smoke from hundreds of wildfires in Canada is triggering air quality alerts...
Session terminated: 12.0s of audio processed
```

That's a full real-time transcriber. Prefer raw WebSockets? See [Using the WebSocket API directly](#using-the-websocket-api-directly) below.

## What you get back

The transcriber emits JSON messages, which the SDKs surface as events: `Begin` / `Turn` / `Termination` in the Python SDK and `open` / `turn` / `close` in the JavaScript SDK. The one you handle most is `Turn`, sent repeatedly as someone speaks. In each `Turn` message, `transcript` is the text so far, and `end_of_turn: true` marks a finalized turn:

```json theme={null}
{
  "type": "Turn",
  "turn_order": 0,
  "end_of_turn": true,
  "turn_is_formatted": true,
  "end_of_turn_confidence": 1.0,
  "transcript": "Smoke from hundreds of wildfires in Canada is triggering air quality alerts...",
  "words": [
    { "text": "Smoke", "start": 0, "end": 399, "confidence": 0.99, "word_is_final": true }
  ]
}
```

You also receive a `Begin` message when the session opens (`{ "type": "Begin", "id": "...", "expires_at": ... }`) and a `Termination` message when it closes (`{ "type": "Termination", "audio_duration_seconds": 10, "session_duration_seconds": 12 }`). Word timings are in milliseconds. See the [message sequence breakdown](/streaming/getting-started/transcribe-streaming-audio) for the full event flow.
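
As a rough illustration of consuming these messages, the sketch below parses a `Turn` payload with Python's standard `json` module, keeps it only when `end_of_turn` is true, and converts the word timings from milliseconds to seconds. The helper name `summarize_turn` and the shortened sample payload are made up for this example; only the field names come from the message shown above.

```python
import json

# Shortened sample payload; field names match the Turn message above.
SAMPLE_TURN = """
{
  "type": "Turn",
  "turn_order": 0,
  "end_of_turn": true,
  "transcript": "Smoke from hundreds of wildfires",
  "words": [
    {"text": "Smoke", "start": 0, "end": 399, "confidence": 0.99, "word_is_final": true}
  ]
}
"""


def summarize_turn(raw):
    """Return a summary dict for a finalized Turn, or None for anything else.

    Hypothetical helper for illustration; it is not part of the SDK.
    """
    message = json.loads(raw)
    if message.get("type") != "Turn" or not message.get("end_of_turn"):
        return None  # Ignore partial turns and non-Turn messages.
    return {
        "turn_order": message["turn_order"],
        "transcript": message["transcript"],
        # Word timings arrive in milliseconds; convert them to seconds.
        "words": [
            (w["text"], w["start"] / 1000, w["end"] / 1000)
            for w in message.get("words", [])
        ],
    }


if __name__ == "__main__":
    print(summarize_turn(SAMPLE_TURN))
```

Returning `None` for partial turns keeps the calling code simple: it can skip anything falsy and act only on finalized text.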

## Using the WebSocket API directly

Not using an SDK? Connect directly to the streaming WebSocket at `wss://streaming.assemblyai.com/v3/ws`. Authenticate by sending your API key in the `Authorization` header (no `Bearer` prefix). You then manage everything the SDK above handles for you: the connection, microphone capture, the `Begin` / `Turn` / `Termination` messages, and session termination. See the [message sequence breakdown](/streaming/getting-started/transcribe-streaming-audio) for the event flow and [endpoints and data zones](/streaming/endpoints-and-data-zones) for regional endpoints.

Both examples read your key from the same `ASSEMBLYAI_API_KEY` environment variable you set in [Before you begin](#before-you-begin).

<Note>
  **Streaming from a browser?**

  Don't ship your API key to client-side code. Authenticate from the browser with a
  short-lived [temporary token](/streaming/authenticate-with-a-temporary-token) instead.
</Note>

<Tabs groupId="language">
  <Tab language="python" title="Python" default>
    ```bash theme={null}
    pip install sounddevice websocket-client
    ```

    ```python expandable theme={null}
    import json
    import os
    import threading
    from urllib.parse import urlencode

    import sounddevice as sd
    import websocket

    API_KEY = os.environ["ASSEMBLYAI_API_KEY"]
    SAMPLE_RATE = 16000
    CONNECTION_PARAMS = {"speech_model": "universal-3-5-pro", "sample_rate": SAMPLE_RATE}
    API_ENDPOINT = f"wss://streaming.assemblyai.com/v3/ws?{urlencode(CONNECTION_PARAMS)}"

    stop = threading.Event()


    def on_open(ws):
        print("Connected. Speak into your microphone; press Ctrl+C to stop.")

        def stream_audio():
            # sounddevice's pip wheels bundle PortAudio on macOS and Windows; on Linux,
            # install PortAudio with your package manager first.
            with sd.RawInputStream(
                samplerate=SAMPLE_RATE, channels=1, dtype="int16", blocksize=800
            ) as mic:
                while not stop.is_set():
                    frames, _ = mic.read(800)  # ~50 ms of audio
                    ws.send(bytes(frames), websocket.ABNF.OPCODE_BINARY)

        threading.Thread(target=stream_audio, daemon=True).start()


    def on_message(ws, message):
        data = json.loads(message)
        if data.get("type") == "Turn":
            print(data.get("transcript", ""), end="\n" if data.get("end_of_turn") else "\r")


    def on_error(ws, error):
        # On a normal shutdown, websocket-client hands the server's close frame to
        # on_error; ignore it and let on_close report the disconnect. Real failures
        # arrive as exceptions, not close frames.
        if isinstance(error, websocket.ABNF) and error.opcode == websocket.ABNF.OPCODE_CLOSE:
            return
        print(f"\nError: {error}")
        stop.set()


    def on_close(ws, status, msg):
        stop.set()
        print("\nDisconnected.")


    def main():
        ws = websocket.WebSocketApp(
            API_ENDPOINT,
            header={"Authorization": API_KEY},
            on_open=on_open,
            on_message=on_message,
            on_error=on_error,
            on_close=on_close,
        )

        ws_thread = threading.Thread(target=ws.run_forever, daemon=True)
        ws_thread.start()

        try:
            while ws_thread.is_alive():
                ws_thread.join(0.1)
        except KeyboardInterrupt:
            stop.set()
            if ws.sock and ws.sock.connected:
                ws.send(json.dumps({"type": "Terminate"}))  # close the session
            ws.close()


    if __name__ == "__main__":
        main()
    ```
  </Tab>

  <Tab language="javascript" title="JavaScript">
    ```bash theme={null}
    npm install ws @picovoice/pvrecorder-node
    ```

    ```javascript expandable theme={null}
    const WebSocket = require("ws");
    const querystring = require("querystring");
    const { PvRecorder } = require("@picovoice/pvrecorder-node");

    const API_KEY = process.env.ASSEMBLYAI_API_KEY;
    const SAMPLE_RATE = 16000;
    const params = { speech_model: "universal-3-5-pro", sample_rate: SAMPLE_RATE };
    const endpoint = `wss://streaming.assemblyai.com/v3/ws?${querystring.stringify(params)}`;

    // PvRecorder ships prebuilt native binaries — no SoX or system audio install needed.
    const recorder = new PvRecorder(800, -1); // 800 samples ≈ 50 ms at 16 kHz
    const ws = new WebSocket(endpoint, { headers: { Authorization: API_KEY } });

    let running = true;
    process.on("SIGINT", () => {
      running = false;
    });

    ws.on("open", async () => {
      console.log("Connected. Speak into your microphone; press Ctrl+C to stop.");
      recorder.start();
      while (running && ws.readyState === WebSocket.OPEN) {
        const frame = await recorder.read();
        ws.send(Buffer.from(frame.buffer, frame.byteOffset, frame.byteLength));
      }
      recorder.stop();
      recorder.release();
      if (ws.readyState === WebSocket.OPEN) {
        ws.send(JSON.stringify({ type: "Terminate" })); // close the session
        ws.close();
      }
      process.exit();
    });

    ws.on("message", (message) => {
      const data = JSON.parse(message);
      if (data.type === "Turn") {
        process.stdout.write(data.end_of_turn ? `${data.transcript}\n` : `\r${data.transcript}`);
      }
    });

    ws.on("error", (error) => console.error("\nError:", error));
    ws.on("close", () => console.log("\nDisconnected."));
    ```
  </Tab>
</Tabs>
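
If you handle the raw messages yourself, one way to keep the handler readable is to route on the `type` field. The sketch below is a minimal illustration under that assumption: `describe_message` is a hypothetical helper, the field names follow the `Begin`, `Turn`, and `Termination` shapes shown in [What you get back](#what-you-get-back), and the idle figure is simply `session_duration_seconds` minus `audio_duration_seconds`.

```python
import json


def describe_message(raw):
    """Return a one-line description of a streaming message.

    Hypothetical helper for illustration; it is not part of any SDK.
    """
    message = json.loads(raw)
    kind = message.get("type")
    if kind == "Begin":
        return f"session {message['id']} started"
    if kind == "Turn":
        state = "final" if message.get("end_of_turn") else "partial"
        return f"{state} turn: {message.get('transcript', '')}"
    if kind == "Termination":
        audio = message["audio_duration_seconds"]
        session = message["session_duration_seconds"]
        # Billing follows session duration, so the gap is roughly the time
        # the connection stayed open without audio being processed.
        return f"terminated: {audio}s audio, {session - audio}s idle"
    return f"unhandled message type: {kind}"


if __name__ == "__main__":
    print(describe_message('{"type": "Begin", "id": "abc", "expires_at": 0}'))
```

Logging the idle figure on every `Termination` message is a cheap way to spot sessions that stay open longer than the audio they carry.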

## Limits

* **Session length:** a streaming session auto-closes after 3 hours.
* **Audio:** mono 16-bit PCM; set `sample_rate` to match your source (16 kHz in these examples).
* **Rate limit:** new-session rate limits scale automatically with usage (default 5 for free accounts). Check yours on the [rate limits page](https://www.assemblyai.com/dashboard/rate-limits).
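
The arithmetic behind the 800-sample reads used in the examples can be sketched as follows. The function name `chunk_bytes` is hypothetical; it assumes mono 16-bit PCM as listed above, so each sample occupies 2 bytes.

```python
BYTES_PER_SAMPLE = 2  # 16-bit PCM
CHANNELS = 1          # mono


def chunk_bytes(sample_rate, chunk_ms):
    """Return (samples, bytes) for one chunk of mono 16-bit PCM audio."""
    samples = sample_rate * chunk_ms // 1000
    return samples, samples * BYTES_PER_SAMPLE * CHANNELS


if __name__ == "__main__":
    samples, size = chunk_bytes(16000, 50)
    print(samples, size)  # 800 samples, 1600 bytes
```

If your source records at a different rate, recompute the chunk size with the same formula and pass the matching `sample_rate` when you connect.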

## Next steps

To learn more about Streaming Speech-to-Text, see the following resources:

* [Streaming Speech-to-Text overview](/streaming)
* [Message sequence breakdown](/streaming/getting-started/transcribe-streaming-audio) — understand the `Begin`, `Turn`, and `Termination` events
* [WebSocket API reference](/api-reference/streaming-api/universal-streaming)

## Need some help?

If you get stuck, or have any other questions, we'd love to help you out. Contact our support team at [support@assemblyai.com](mailto:support@assemblyai.com) or create a [support ticket](https://www.assemblyai.com/contact/support).
