Self-Hosted Async Transcription

The AssemblyAI Self-Hosted Async Solution is a secure transcription service that you deploy within your own infrastructure. It is designed for partners who need complete control over their data and infrastructure while maintaining high-quality speech-to-text capabilities.

Core principle

Complete data isolation: No audio data, transcript data, or personally identifiable information (PII) is ever sent to AssemblyAI servers. Only usage metadata and licensing information are transmitted.

System requirements

Hardware requirements

CPU: 28 cores minimum
RAM: 128 GB minimum
GPU: NVIDIA GPU with CUDA Compute Capability 7.0 or higher is required

Compatible GPU models: T4, V100, A40, A100, L100

Older GPUs such as M10, K80, and GPUs based on Maxwell or Kepler architecture are NOT compatible with this solution.
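The minimums above can be checked on the target host before deployment. The following is a minimal sketch, assuming a Linux host with `nproc`, `/proc/meminfo`, and an `nvidia-smi` version recent enough to support the `compute_cap` query field. The helper names (`cc_supported`, `at_least`, `preflight`) are illustrative and not part of any AssemblyAI tooling.

```shell
#!/bin/sh
# Pre-flight check against the documented minimums:
# 28 CPU cores, 128 GB RAM, CUDA Compute Capability 7.0.
# All function names here are hypothetical helpers.

# Succeeds when a compute capability such as "7.5" is at least 7.0.
cc_supported() {
  major=${1%%.*}
  [ "$major" -ge 7 ] 2>/dev/null
}

# Succeeds when integer $1 is at least integer $2.
at_least() {
  [ "$1" -ge "$2" ] 2>/dev/null
}

preflight() {
  cores=$(nproc)
  # MemTotal is reported in kB; convert to whole GiB.
  # A nominal 128 GB host usually reports slightly less, so this only warns.
  mem_gb=$(awk '/^MemTotal:/ {print int($2 / 1024 / 1024)}' /proc/meminfo)
  at_least "$cores" 28 || echo "WARN: $cores CPU cores found (28 required)"
  at_least "$mem_gb" 128 || echo "WARN: ${mem_gb} GB RAM found (128 GB required)"
  # One line per GPU, for example "Tesla T4, 7.5".
  nvidia-smi --query-gpu=name,compute_cap --format=csv,noheader |
  while IFS=, read -r name cap; do
    cap=$(echo "$cap" | tr -d ' ')
    if cc_supported "$cap"; then
      echo "OK: $name (compute capability $cap)"
    else
      echo "FAIL: $name (compute capability $cap is below 7.0)"
    fi
  done
}
```

Source the file and run `preflight` on the host. A Kepler or Maxwell card reports a capability below 7.0 and is flagged as FAIL.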

Software requirements

Operating System: Linux
Container Runtime: Docker required
AWS Account: Required for pulling container images from our ECR registry

Prerequisites

Active enterprise contract with AssemblyAI
AWS account credentials for container registry access
Linux environment with Docker installed
NVIDIA Container Toolkit for GPU support

Setup and deployment

1. Docker runtime with GPU support

1.1 Verify NVIDIA drivers are installed:

nvidia-smi

1.2 Install NVIDIA Container Toolkit: Follow the NVIDIA Container Toolkit installation guide to set up GPU support for Docker.

1.3 Verify the Docker runtime has GPU access:

docker run --rm --gpus all nvidia/cuda:11.8.0-base-ubuntu22.04 nvidia-smi

2. Obtain credentials

AWS ECR Access: AssemblyAI will manually provision AWS account credentials for your team to pull container images from our private Amazon ECR registry. Contact your AssemblyAI representative to obtain these credentials.

3. AWS ECR authentication

Authenticate with AWS ECR using the provided credentials:

aws ecr get-login-password --region us-west-2 | docker login --username AWS --password-stdin 344839248844.dkr.ecr.us-west-2.amazonaws.com

4. Pull the Docker image

Pull the self-hosted ML container image:

docker pull 344839248844.dkr.ecr.us-west-2.amazonaws.com/self-hosted-ml-prod:release-v0.6
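Steps 3 and 4 can be combined into one repeatable function. This is a sketch under a few assumptions: the provisioned credentials are already available to the AWS CLI (for example through `aws configure` or the `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` environment variables), and the helper names `image_ref` and `login_and_pull` are illustrative only.

```shell
#!/bin/sh
# Registry, region, and repository are taken from the commands above.
REGION="us-west-2"
REGISTRY="344839248844.dkr.ecr.${REGION}.amazonaws.com"
REPOSITORY="self-hosted-ml-prod"

# Build the full image reference for a given tag, e.g. release-v0.6.
image_ref() {
  printf '%s/%s:%s' "$REGISTRY" "$REPOSITORY" "$1"
}

# Hypothetical helper: authenticate against ECR, then pull the image.
login_and_pull() {
  tag=${1:-release-v0.6}
  # Confirm the credentials resolve before attempting the Docker login.
  aws sts get-caller-identity >/dev/null || {
    echo "AWS credentials are missing or invalid" >&2
    return 1
  }
  aws ecr get-login-password --region "$REGION" |
    docker login --username AWS --password-stdin "$REGISTRY" || return 1
  docker pull "$(image_ref "$tag")"
}
```

ECR authorization tokens expire after 12 hours, so a pull on a later day needs a fresh login; calling `login_and_pull release-v0.6` each time covers both steps.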

5. Obtain license file

AssemblyAI will provide a license file (license.jwt) that is required to run the container. The license file contains:

Expiration date
Usage limits
Customer identification

The license file provided for testing is valid for 30 days. For production deployments, contact AssemblyAI to obtain a production license.
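The `.jwt` extension suggests the license is a JSON Web Token. If it follows the standard `header.payload.signature` layout (an assumption, since the file format is not documented here), the payload can be decoded locally to see which fields it carries. The sketch below does not verify the signature and makes no claim about the field names AssemblyAI uses; `jwt_payload` is a hypothetical helper.

```shell
#!/bin/sh
# Print the decoded payload of a JWT file.
# Assumes the standard three-part layout; does NOT verify the signature.
jwt_payload() {
  # Take the middle segment and convert base64url to standard base64.
  payload=$(tr -d '\r\n' < "$1" | cut -d. -f2 | tr '_-' '/+')
  # base64url omits padding; restore it so `base64 -d` accepts the input.
  case $(( ${#payload} % 4 )) in
    2) payload="${payload}==" ;;
    3) payload="${payload}=" ;;
  esac
  printf '%s' "$payload" | base64 -d
}
```

Running `jwt_payload /absolute/local/path/to/license.jwt` prints the payload as JSON, which is a quick way to confirm the file is intact before mounting it into the container.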

6. Run the container

Start the self-hosted ML container with GPU support:

docker run --gpus all -p 8000:8000 \
  -e NVIDIA_DRIVER_CAPABILITIES=all \
  -v /absolute/local/path/to/license.jwt:/app/license.jwt \
  344839248844.dkr.ecr.us-west-2.amazonaws.com/self-hosted-ml-prod:release-v0.6

Parameters explained:

--gpus all: Enables GPU access for the container
-p 8000:8000: Maps port 8000 from the container to the host
-e NVIDIA_DRIVER_CAPABILITIES=all: Enables all NVIDIA driver capabilities
-v /absolute/local/path/to/license.jwt:/app/license.jwt: Mounts the license file into the container

Replace /absolute/local/path/to/license.jwt with the actual absolute path to your license file on the host system.
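The absolute path matters because Docker interprets a bare name in `-v` as a named volume rather than a host file, and creates an empty directory when the host path does not exist. The sketch below guards against both cases before starting the container in the background. The container name `assemblyai-selfhosted` and the read-only `:ro` mount are assumptions for illustration; drop `:ro` if the container turns out to need write access to the license.

```shell
#!/bin/sh
IMAGE="344839248844.dkr.ecr.us-west-2.amazonaws.com/self-hosted-ml-prod:release-v0.6"

# Succeeds only for an absolute path to an existing regular file.
is_abs_file() {
  case $1 in
    /*) [ -f "$1" ] ;;
    *) return 1 ;;
  esac
}

# Hypothetical wrapper around the documented `docker run` command.
run_container() {
  license=$1
  is_abs_file "$license" || {
    echo "license path must be absolute and point to a file: $license" >&2
    return 1
  }
  # -d runs in the background; --name gives later commands a stable handle.
  docker run -d --name assemblyai-selfhosted --gpus all -p 8000:8000 \
    -e NVIDIA_DRIVER_CAPABILITIES=all \
    -v "$license":/app/license.jwt:ro \
    "$IMAGE"
}
```

With a named container, logs are available through `docker logs assemblyai-selfhosted` without looking up the container ID.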

Using the API

Once the container is running, you can interact with it using HTTP requests.

Check container health

Verify that the container is ready to accept requests:

curl "http://localhost:8000/health"

A successful response indicates the container is ready to process transcription requests.
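Because the container may need some time to initialize after it starts, a script usually has to poll the endpoint rather than call it once. The following sketch retries until `/health` returns a successful HTTP status. The attempt count and delay are arbitrary choices, and treating any 2xx status as "ready" is an assumption, since the response body is not documented here.

```shell
#!/bin/sh
# Retry a command until it succeeds: retry ATTEMPTS DELAY_SECONDS CMD...
retry() {
  attempts=$1 delay=$2
  shift 2
  i=1
  while [ "$i" -le "$attempts" ]; do
    "$@" && return 0
    i=$((i + 1))
    # Sleep only if another attempt follows.
    [ "$i" -le "$attempts" ] && sleep "$delay"
  done
  return 1
}

# Poll the health endpoint for up to about five minutes.
# -f makes curl exit non-zero on HTTP errors; -sS hides progress but keeps errors.
wait_for_health() {
  retry 60 5 curl -fsS -o /dev/null "http://localhost:8000/health"
}
```

A deployment script can then gate the first transcription request with `wait_for_health || exit 1`.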

Transcribe an audio file

Submit an audio file for transcription:

curl -X POST "http://localhost:8000/predict" \
  -F "file=@/path/to/file.mp3" \
  -F 'payload={"language": "en"}'

Parameters:

file: The audio file to transcribe (supports common audio formats like MP3, WAV, M4A, etc.)
payload: JSON object containing transcription parameters
- language: Language code for the audio (e.g., "en" for English)

Example response:

{
  "text": "This is the transcribed text from your audio file.",
  "words": [
    {
      "text": "This",
      "start": 0,
      "end": 200,
      "confidence": 0.98
    }
  ]
}
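A response of this shape is straightforward to post-process on the command line. The sketch below assumes `jq` is installed and that the response was saved to a file (for example by adding `-o response.json` to the curl command). The function names are illustrative, and the 0.5 default threshold is an arbitrary example value.

```shell
#!/bin/sh
# Print the full transcript text from a saved /predict response.
transcript_text() {
  jq -r '.text' "$1"
}

# Print one word per line as: text <TAB> start <TAB> end <TAB> confidence
words_tsv() {
  jq -r '.words[] | [.text, .start, .end, .confidence] | @tsv' "$1"
}

# List words whose confidence is below a threshold (default 0.5).
low_confidence_words() {
  jq -r --argjson t "${2:-0.5}" \
    '.words[] | select(.confidence < $t) | .text' "$1"
}
```

For example, `low_confidence_words response.json 0.8` lists the words most worth a manual review.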

Supported languages

The self-hosted async solution supports multiple languages. Specify the language code in the payload parameter when making transcription requests. Common language codes:

en: English
es: Spanish
fr: French
de: German
it: Italian
pt: Portuguese
nl: Dutch

For a complete list of supported languages, contact your AssemblyAI representative.
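When transcribing many files, it helps to build the payload from a variable and catch typos in the language code before the request is sent. This sketch checks against the seven codes listed above only, so a code missing from `KNOWN_LANGS` produces a note rather than a hard failure. The helper names are hypothetical.

```shell
#!/bin/sh
# Codes listed in this section; the complete list comes from AssemblyAI.
KNOWN_LANGS="en es fr de it pt nl"

# Succeeds when $1 is one of the codes in KNOWN_LANGS.
is_known_lang() {
  case $1 in
    ''|*' '*) return 1 ;;
  esac
  case " $KNOWN_LANGS " in
    *" $1 "*) return 0 ;;
    *) return 1 ;;
  esac
}

# Build the JSON payload for the /predict request.
build_payload() {
  printf '{"language": "%s"}' "$1"
}

# transcribe FILE LANG: submit one file and print the JSON response.
transcribe() {
  file=$1 lang=$2
  is_known_lang "$lang" ||
    echo "note: '$lang' is not in the local list of codes" >&2
  curl -fsS -X POST "http://localhost:8000/predict" \
    -F "file=@${file}" \
    -F "payload=$(build_payload "$lang")"
}
```

A batch run over a directory of Spanish recordings might look like `for f in /path/to/audio/*.mp3; do transcribe "$f" es > "$f.json"; done`.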

Troubleshooting

Container fails to start

Issue: Container exits immediately after starting.

Solution: Verify that:

The license file path is correct and the file exists
The license file is not expired
GPU drivers are properly installed (nvidia-smi should work)
NVIDIA Container Toolkit is installed

Health check fails

Issue: The /health endpoint returns an error or times out.

Solution:

Wait a few moments for the container to fully initialize
Check container logs: docker logs <container_id>
Verify GPU access: Ensure the container can access the GPU
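The checks above can be gathered into one diagnostic function. This sketch uses only standard Docker CLI commands. It assumes the container was started from the image tag used in this guide, and that `nvidia-smi` is available inside the container, which is normally the case when the NVIDIA Container Toolkit injects the driver utilities. The function names are illustrative.

```shell
#!/bin/sh
IMAGE="344839248844.dkr.ecr.us-west-2.amazonaws.com/self-hosted-ml-prod:release-v0.6"

# ID of the newest container created from the image (running or exited).
latest_container() {
  docker ps -a -q --filter "ancestor=$IMAGE" | head -n 1
}

# diagnose [CONTAINER_ID]: print state, recent logs, and GPU visibility.
diagnose() {
  id=${1:-$(latest_container)}
  [ -n "$id" ] || { echo "no container found for $IMAGE" >&2; return 1; }
  docker inspect \
    --format 'status={{.State.Status}} exit_code={{.State.ExitCode}}' "$id"
  docker logs --tail 50 "$id"
  # Works only while the container is running.
  docker exec "$id" nvidia-smi
}
```

A non-zero `exit_code` together with the last log lines usually narrows the problem to the license, the GPU runtime, or the model startup.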

Transcription request fails

Issue: The /predict endpoint returns an error.

Solution:

Verify the audio file format is supported
Check that the language parameter is valid
Ensure the file path in the curl command is correct
Review container logs for detailed error messages

Support

For technical support or questions about the self-hosted async solution, contact your AssemblyAI representative or reach out to the AssemblyAI support team.

Simplified installation

AssemblyAI is working on packaging solutions and installation scripts to simplify the deployment process for customers. For the latest information on simplified installation options, contact your AssemblyAI representative.

