The AssemblyAI Self-Hosted Async Solution is a secure transcription service that you deploy within your own infrastructure. It is designed for partners who need complete control over their data and infrastructure without giving up high-quality speech-to-text capabilities.

Core principle

  • Complete data isolation: No audio data, transcript data, or personally identifiable information (PII) is ever sent to AssemblyAI servers. Only usage metadata and licensing information are transmitted.

System requirements

Hardware requirements

  • CPU: 28 cores minimum
  • RAM: 128 GB minimum
  • GPU: NVIDIA GPU with CUDA Compute Capability 7.0 or higher is required
Compatible GPU models: T4, V100, A40, A100, H100
Older GPUs such as M10, K80, and GPUs based on Maxwell or Kepler architecture are NOT compatible with this solution.
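
To check a host against the Compute Capability 7.0 minimum, a minimal sketch follows. It assumes a driver recent enough for nvidia-smi to support the compute_cap query field; on older drivers, look the GPU model up in NVIDIA's published CUDA GPU list instead. The function names cc_supported and check_gpus are illustrative and not part of the product.

```shell
# Succeeds when the given compute capability (e.g. "7.5") is at least 7.0.
cc_supported() {
  awk -v cc="$1" 'BEGIN { exit !(cc + 0 >= 7.0) }'
}

# Print one line per GPU with its name, compute capability, and a verdict.
# Assumption: this nvidia-smi build supports the compute_cap query field.
check_gpus() {
  nvidia-smi --query-gpu=name,compute_cap --format=csv,noheader |
  while IFS=, read -r name cc; do
    cc=$(printf '%s' "$cc" | tr -d ' ')
    if cc_supported "$cc"; then
      echo "$name: compute capability $cc, supported"
    else
      echo "$name: compute capability $cc, NOT supported"
    fi
  done
}
```

Run check_gpus on the target host; each installed GPU is reported on its own line.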

Software requirements

  • Operating System: Linux
  • Container Runtime: Docker required
  • AWS Account: Required for pulling container images from our ECR registry

Prerequisites

  • Active enterprise contract with AssemblyAI
  • AWS account credentials for container registry access
  • Linux environment with Docker installed
  • NVIDIA Container Toolkit for GPU support
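
Before starting the setup, it can help to confirm that the command-line tools used in this guide are on the PATH. The sketch below is one way to do that; missing_tools is a hypothetical helper name, and the tool list in the example comment is simply the set of commands this guide invokes.

```shell
# Print each named command that is not found on PATH.
# Succeeds only if none are missing.
missing_tools() {
  missing=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool"
      missing=1
    fi
  done
  return "$missing"
}

# Example: check the tools this guide relies on.
# missing_tools docker aws nvidia-smi curl
```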

Setup and deployment

1. Docker runtime with GPU support

1.1 Verify NVIDIA drivers are installed:
nvidia-smi
1.2 Install NVIDIA Container Toolkit: Follow the NVIDIA Container Toolkit installation guide to set up GPU support for Docker.
1.3 Verify the Docker runtime has GPU access:
docker run --rm --gpus all nvidia/cuda:11.8.0-base-ubuntu22.04 nvidia-smi

2. Obtain credentials

AWS ECR Access: AssemblyAI will manually provision AWS account credentials for your team to pull container images from our private Amazon ECR registry. Contact your AssemblyAI representative to obtain these credentials.

3. AWS ECR authentication

Authenticate with AWS ECR using the provided credentials:
aws ecr get-login-password --region us-west-2 | docker login --username AWS --password-stdin 344839248844.dkr.ecr.us-west-2.amazonaws.com

4. Pull the Docker image

Pull the self-hosted ML container image:
docker pull 344839248844.dkr.ecr.us-west-2.amazonaws.com/self-hosted-ml-prod:release-v0.6
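
The login and pull steps can be combined into one script so that the account ID, region, and tag are defined in a single place. This is a sketch, not an official installer: the variable and function names are illustrative, and it assumes the AWS CLI can already find the credentials AssemblyAI provided (for example through the AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY environment variables).

```shell
# Values taken from the commands in this guide; adjust if AssemblyAI gives you others.
ECR_ACCOUNT="344839248844"
ECR_REGION="us-west-2"
IMAGE_REPO="self-hosted-ml-prod"
IMAGE_TAG="release-v0.6"

# Print the registry hostname for an account ID and region.
ecr_registry() {
  echo "$1.dkr.ecr.$2.amazonaws.com"
}

# Log in to the registry, then pull the image if the login succeeded.
login_and_pull() {
  registry=$(ecr_registry "$ECR_ACCOUNT" "$ECR_REGION")
  aws ecr get-login-password --region "$ECR_REGION" |
    docker login --username AWS --password-stdin "$registry" &&
  docker pull "$registry/$IMAGE_REPO:$IMAGE_TAG"
}
```

Call login_and_pull once the credentials are configured. Upgrading to a later release then only requires changing IMAGE_TAG.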

5. Obtain license file

AssemblyAI will provide a license file (license.jwt) that is required to run the container. The license file contains:
  • Expiration date
  • Usage limits
  • Customer identification
The license file provided for testing is valid for 30 days. For production deployments, contact AssemblyAI to obtain a production license.
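
If the license is a standard JSON Web Token, as the .jwt extension suggests, its middle segment is base64url-encoded JSON that can be inspected locally before starting the container. The sketch below rests entirely on that assumption. The claim names inside the payload are not documented here and may differ from the registered JWT claims, and decoding does not verify the signature. jwt_payload is an illustrative name.

```shell
# Print the decoded payload (middle segment) of a JWT passed as a string.
# Assumption: the token has the usual header.payload.signature layout.
jwt_payload() {
  # Take the second dot-separated segment and map base64url to base64.
  seg=$(printf '%s' "$1" | cut -d. -f2 | tr '_-' '/+')
  # Restore the padding that base64url encoding strips.
  case $(( ${#seg} % 4 )) in
    2) seg="$seg==" ;;
    3) seg="$seg=" ;;
  esac
  printf '%s' "$seg" | base64 -d
}

# Example:
# jwt_payload "$(cat /absolute/local/path/to/license.jwt)"
```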

6. Run the container

Start the self-hosted ML container with GPU support:
docker run --gpus all -p 8000:8000 \
  -e NVIDIA_DRIVER_CAPABILITIES=all \
  -v /absolute/local/path/to/license.jwt:/app/license.jwt \
  344839248844.dkr.ecr.us-west-2.amazonaws.com/self-hosted-ml-prod:release-v0.6
Parameters explained:
  • --gpus all: Enables GPU access for the container
  • -p 8000:8000: Publishes the container's port 8000 on port 8000 of the host
  • -e NVIDIA_DRIVER_CAPABILITIES=all: Enables all NVIDIA driver capabilities
  • -v /absolute/local/path/to/license.jwt:/app/license.jwt: Mounts the license file into the container
Replace /absolute/local/path/to/license.jwt with the actual absolute path to your license file on the host system.
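
A mistyped license path is easy to miss, because when the host side of a -v mount does not exist, Docker creates an empty directory at that path instead of reporting an error. The wrapper below is a sketch that checks the path before launching; check_license_path and run_container are hypothetical helper names, and the docker run command itself is the one shown above.

```shell
# Succeed only if the path is absolute and points at a regular file.
check_license_path() {
  case "$1" in
    /*) ;;
    *) echo "license path must be absolute: $1" >&2; return 1 ;;
  esac
  if [ ! -f "$1" ]; then
    echo "license file not found: $1" >&2
    return 1
  fi
}

# Validate the license path, then start the container as shown above.
run_container() {
  check_license_path "$1" || return 1
  docker run --gpus all -p 8000:8000 \
    -e NVIDIA_DRIVER_CAPABILITIES=all \
    -v "$1":/app/license.jwt \
    344839248844.dkr.ecr.us-west-2.amazonaws.com/self-hosted-ml-prod:release-v0.6
}

# Example:
# run_container /absolute/local/path/to/license.jwt
```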

Using the API

Once the container is running, you can interact with it using HTTP requests.

Check container health

Verify that the container is ready to accept requests:
curl "http://localhost:8000/health"
A successful response indicates the container is ready to process transcription requests.
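
Because the container may need some time to initialize, a script that starts it and then submits audio can poll the health endpoint first. The following is a minimal retry sketch; wait_until is an illustrative helper, and the retry count and delay in the example comment are arbitrary values, not documented startup times.

```shell
# Retry a command up to $1 times, sleeping $2 seconds between attempts.
# Succeeds as soon as the command does; fails if every attempt fails.
wait_until() {
  attempts=$1
  delay=$2
  shift 2
  i=0
  while [ "$i" -lt "$attempts" ]; do
    if "$@"; then
      return 0
    fi
    i=$((i + 1))
    # Only sleep when another attempt will follow.
    if [ "$i" -lt "$attempts" ]; then
      sleep "$delay"
    fi
  done
  return 1
}

# Example: poll the health endpoint, 60 attempts spaced 5 seconds apart.
# curl -f turns HTTP error statuses into failures; -s hides the progress meter.
# wait_until 60 5 curl -fs -o /dev/null "http://localhost:8000/health"
```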

Transcribe an audio file

Submit an audio file for transcription:
curl -X POST "http://localhost:8000/predict" \
  -F "file=@/path/to/file.mp3" \
  -F 'payload={"language": "en"}'
Parameters:
  • file: The audio file to transcribe (common audio formats such as MP3, WAV, and M4A are supported)
  • payload: JSON object containing transcription parameters
    • language: Language code for the audio (e.g., "en" for English)
Example response:
{
  "text": "This is the transcribed text from your audio file.",
  "words": [
    {
      "text": "This",
      "start": 0,
      "end": 200,
      "confidence": 0.98
    }
  ]
}
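
To work with a response from the command line, one option is to save it to a file (for example by adding -o response.json to the curl command) and query it with jq. The sketch below assumes jq is installed, which this guide does not otherwise require, and that the response has the shape shown in the example above. The function names are illustrative.

```shell
# Print the full transcript text from a saved response file.
transcript_text() {
  jq -r '.text' "$1"
}

# Print one word per line as: start<TAB>end<TAB>confidence<TAB>text
transcript_words() {
  jq -r '.words[] | [.start, .end, .confidence, .text] | @tsv' "$1"
}

# Example:
# transcript_text response.json
# transcript_words response.json
```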

Supported languages

The self-hosted async solution supports multiple languages. Specify the language code in the payload parameter when making transcription requests.
Common language codes:
  • en: English
  • es: Spanish
  • fr: French
  • de: German
  • it: Italian
  • pt: Portuguese
  • nl: Dutch
For a complete list of supported languages, contact your AssemblyAI representative.
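
When requests are scripted, the payload can be generated from a language code and checked before it is sent. The sketch below only accepts the codes listed above, so extend the case pattern once you have the complete list from AssemblyAI. is_listed_language and build_payload are hypothetical helper names.

```shell
# Succeed if the code is one of the language codes listed above.
is_listed_language() {
  case "$1" in
    en|es|fr|de|it|pt|nl) return 0 ;;
    *) return 1 ;;
  esac
}

# Print the JSON payload for a language code, or fail for an unlisted code.
build_payload() {
  if ! is_listed_language "$1"; then
    echo "language code not in the documented list: $1" >&2
    return 1
  fi
  printf '{"language": "%s"}\n' "$1"
}

# Example:
# curl -X POST "http://localhost:8000/predict" \
#   -F "file=@/path/to/file.mp3" \
#   -F "payload=$(build_payload es)"
```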

Troubleshooting

Container fails to start

Issue: Container exits immediately after starting.
Solution: Verify that:
  1. The license file path is correct and the file exists
  2. The license file is not expired
  3. GPU drivers are properly installed (nvidia-smi should work)
  4. NVIDIA Container Toolkit is installed

Health check fails

Issue: The /health endpoint returns an error or times out.
Solution:
  1. Wait a few moments for the container to fully initialize
  2. Check container logs: docker logs <container_id>
  3. Verify GPU access: Ensure the container can access the GPU

Transcription request fails

Issue: The /predict endpoint returns an error.
Solution:
  1. Verify the audio file format is supported
  2. Check that the language parameter is valid
  3. Ensure the file path in the curl command is correct
  4. Review container logs for detailed error messages

Support

For technical support or questions about the self-hosted async solution, contact your AssemblyAI representative or reach out to the AssemblyAI support team.

Simplified installation

AssemblyAI is working on packaging solutions and installation scripts to simplify the deployment process for customers. For the latest information on simplified installation options, contact your AssemblyAI representative.