> ## Documentation Index
> Fetch the complete documentation index at: https://assemblyai.com/docs/llms.txt
> Use this file to discover all available pages before exploring further.

# Self-Hosted Streaming

The **AssemblyAI Self-Hosted Streaming Solution** provides a secure, low-latency real-time transcription service that can be deployed within your own infrastructure. Audio, transcripts, and PII never leave your network — only license validation and usage metadata are transmitted back to AssemblyAI.

<Info>
  Self-hosted streaming requires an upfront commercial commitment of \$20,000. [Contact our sales team](https://www.assemblyai.com/contact/sales) to discuss your needs and learn more about our self-hosted offering.
</Info>

The deployment instructions, Compose files, nginx configuration, and example clients are maintained in the public [`streaming-self-hosting-stack`](https://github.com/AssemblyAI/streaming-self-hosting-stack) repository. This page covers what self-hosted streaming is, what you need to run it, and how the stack is shaped. **Go to the repo for the actual setup steps.**

## What you can self-host

Self-hosted streaming ships as **two separate stacks**. Each stack serves one model family, runs from its own Docker Compose file, and uses its own GPU. You pick the stack that matches the model you want to serve — they are not designed to run side by side.

| Stack                           | Model(s) served                            | Compose file                                                                                                                | Best for                                                                      |
| ------------------------------- | ------------------------------------------ | --------------------------------------------------------------------------------------------------------------------------- | ----------------------------------------------------------------------------- |
| **Universal Streaming**         | Universal Streaming English + Multilingual | [`docker-compose.yml`](https://github.com/AssemblyAI/streaming-self-hosting-stack/blob/main/docker-compose.yml)             | English and multilingual transcription workloads, telephony, captioning       |
| **Universal-3.5 Pro Streaming** | Universal-3.5 Pro                          | [`docker-compose.u3pro.yml`](https://github.com/AssemblyAI/streaming-self-hosting-stack/blob/main/docker-compose.u3pro.yml) | Voice agents — short utterances, low end-of-turn latency, continuous partials |

For product capabilities and accuracy details, see the [Universal Streaming](/streaming) and [Universal-3.5 Pro Streaming](/streaming/getting-started/transcribe-streaming-audio) overview pages.

## Core principle

* **Complete data isolation.** Audio, transcripts, and PII stay inside your infrastructure. The only outbound traffic is license validation and (for usage-based contracts) usage metadata to `https://usage-tracker.assemblyai.com`.

## System requirements

### Hardware

* **Universal Streaming.** NVIDIA T4 or newer per ASR container. We recommend at least 4 CPU and 16 GB RAM per ASR container.
* **Universal-3.5 Pro Streaming.** NVIDIA L4, A10, A100, L40S, H100, or equivalent with **at least 24 GB VRAM**. The container also bundles \~14 GB of model weights, so plan disk accordingly. T4 GPUs are **not** sufficient for U3.5 Pro.

### Software

* **Operating system.** Linux
* **Container runtime.** Docker and Docker Compose (v2 — the `docker compose` command, not `docker-compose`)
* **NVIDIA Container Toolkit.** Required for Docker to access the GPU
* **AWS credentials.** AssemblyAI provisions a scoped AWS access key for your team so you can pull container images from our private ECR registry

## Architecture

Both stacks share the same gateway, load balancer, and license proxy — they only differ in the ASR backend.

### Shared services (both stacks)

* **`streaming-api`** — Gateway WebSocket service that clients connect to. Handles session lifecycle, audio framing, and routing to the ASR backend.
* **`license-and-usage-proxy`** — Validates the license file at startup and reports usage metadata (for usage-based contracts).
* **`streaming-asr-lb`** — `nginx:alpine` load balancer that routes ASR gRPC requests to the right backend based on the `X-Model-Version` header.

### Universal Streaming stack

Adds two ASR backends:

* **`streaming-asr-english`** — English speech recognition.
* **`streaming-asr-multilang`** — Multilingual speech recognition.

### Universal-3.5 Pro Streaming stack

Adds a single ASR backend:

* **`streaming-asr-u3pro`** — Universal-3.5 Pro speech recognition. Available as of v0.6.0.

### Connection flow

```
WebSocket client ──→ streaming-api:8080
                          │
                          ├─ License validation ──→ license-and-usage-proxy:8080
                          │                              │
                          ├─ Usage reporting ────────────┴──→ https://usage-tracker.assemblyai.com
                          │                                  (usage-based billing only)
                          │
                          └─ ASR (gRPC) ──────────→ streaming-asr-lb:80
                                                       │
                                                       └─ Header-based routing (X-Model-Version):
                                                          ├── en-default → streaming-asr-english:50051 (gRPC)
                                                          ├── ml-default → streaming-asr-multilang:50051 (gRPC)
                                                          └── u3-pro     → streaming-asr-u3pro:50051 (gRPC)
```

The load balancer only forwards to backends that are actually deployed in the running stack — Universal Streaming routes for `en-default` and `ml-default`, U3.5 Pro routes for `u3-pro`.

## Getting started

Follow the [upstream repo's README](https://github.com/AssemblyAI/streaming-self-hosting-stack#setup-instructions) for the actual setup steps. At a high level:

1. **Get credentials and a license file** from your AssemblyAI representative — an AWS access key scoped to ECR, and a `license.jwt` file. The same license file works for both stacks.
2. **Install Docker, Docker Compose, and the NVIDIA Container Toolkit.** See the README's [setup section](https://github.com/AssemblyAI/streaming-self-hosting-stack#setup-instructions) for verification commands.
3. **Authenticate to ECR** with the provided AWS credentials.
4. **Pick a stack and configure `.env`** with the image references from the repo's [`.env.example`](https://github.com/AssemblyAI/streaming-self-hosting-stack/blob/main/.env.example).
5. **Start the stack** with `docker compose up -d` (Universal Streaming) or `docker compose -f docker-compose.u3pro.yml up -d` (Universal-3.5 Pro Streaming).

<Note>
  Universal Streaming ASR containers take roughly **2 minutes** to become ready and log `Ready to serve!`. The Universal-3.5 Pro Streaming ASR container takes roughly **5 minutes** and logs `U3Pro ASR Server ready!`. Health checks may report `unhealthy` during startup — that is expected.
</Note>

## Running a test client

The [repo](https://github.com/AssemblyAI/streaming-self-hosting-stack) ships an example Python client under [`streaming_example`](https://github.com/AssemblyAI/streaming-self-hosting-stack/tree/main/streaming_example) that streams a pre-recorded WAV file to the WebSocket endpoint. It supports all three speech models via the `--speech-model` flag:

* `universal-streaming-english` — Universal Streaming, English
* `universal-streaming-multilingual` — Universal Streaming, multilingual
* `universal-3-5-pro` — Universal-3.5 Pro Streaming

The client routes to the correct ASR backend automatically via the `X-Model-Version` header. Make sure the value you pass matches a backend deployed in the stack you started.

## Switching between stacks

The two stacks listen on the same ports (`streaming-api` on `8080`, ASR load balancer on the gRPC backend), so they cannot run simultaneously. To switch:

```bash theme={null}
# Stop the running stack first.
docker compose down                                # if Universal Streaming is running
docker compose -f docker-compose.u3pro.yml down    # if Universal-3.5 Pro Streaming is running

# Update .env for the new stack (image vars differ), then start.
docker compose up -d                               # Universal Streaming
docker compose -f docker-compose.u3pro.yml up -d   # Universal-3.5 Pro Streaming
```

## Production deployment

Per-service deployment strategy, resource sizing, autoscaling thresholds, health-check tuning, and the `license-and-usage-proxy` `/v1/status` endpoint reference all live in the repo's [Production Deployment Recommendations](https://github.com/AssemblyAI/streaming-self-hosting-stack#production-deployment-recommendations) section.

## Release notes and changelog

Release notes for the self-hosted stack — including per-version model improvements, API additions, and breaking changes — live in the repo's [README changelog](https://github.com/AssemblyAI/streaming-self-hosting-stack#changelog). Tagged releases are visible on the [Releases page](https://github.com/AssemblyAI/streaming-self-hosting-stack/releases).

## Support

For deployment questions, image access, license issues, or to report bugs, contact your AssemblyAI representative.