
Overview

AssemblyAI’s LLM Gateway is a unified, OpenAI SDK-compatible interface that lets you connect to multiple LLMs through a single API, with automatic fallbacks. We focus on text-to-text LLM providers with regional compliance, security, and zero data retention (ZDR). Use it as a standalone service, or take advantage of its optional but tight integration with AssemblyAI speech-to-text and other AssemblyAI products for a more seamless experience.

Quickstart

import requests

# Replace the placeholder with your AssemblyAI API key.
headers = {
    "authorization": "<YOUR_API_KEY>"
}

# Send a chat completion request to the US (default) endpoint.
response = requests.post(
    "https://llm-gateway.assemblyai.com/v1/chat/completions",
    headers=headers,
    json={
        "model": "claude-sonnet-4-6",
        "messages": [
            {"role": "user", "content": "What is the capital of France?"}
        ],
        "max_tokens": 1000
    }
)

# Parse the JSON body and print the text of the first choice.
result = response.json()
print(result["choices"][0]["message"]["content"])
Simply change the model parameter to use any of the available models listed in the Available models page.
Want to compare models side-by-side? Try the Model Comparison Tool, a Lovable application, to test different LLMs and see how they perform.

Non-standard LLM Enhancements

Here are some bonus features LLM Gateway provides over other routing options:
  • Automatic retries
    • 5xx responses are automatically retried on our side by default
    • Configure with fallback_config
  • Model Fallbacks
    • Configure alternative models/providers in case a model isn’t working
    • Even change the prompt and/or params to better suit a different model
    • See docs
  • Automatically apply LLM Gateway requests on each real-time STT turn for reduced E2E latency
  • Post Processing
    • Automatically apply post processing to the LLM response
    • Repair broken JSON in tool calls and structured output with json-repair
    • See docs
  • Inject Transcript Text
    • Call LLM Gateway with a transcript_id and we will handle fetching and adding the transcript text to the LLM message.
    • Useful if you don’t want to store the transcript.
    • See docs

Regional compliance

Endpoint | Base URL
US (default) | https://llm-gateway.assemblyai.com/v1/chat/completions
EU | https://llm-gateway.eu.assemblyai.com/v1/chat/completions
The LLM Gateway is available in both US and EU regions. Use the EU endpoint to ensure your data stays within the European Union. Currently, Anthropic Claude and most Google Gemini models are supported in the EU (except where otherwise noted). OpenAI models are only available in the US region. See Cloud Endpoints and Data Residency for more details.
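A minimal sketch of choosing the endpoint by region in client code. The two URLs are the ones listed above; the `GATEWAY_ENDPOINTS` mapping and the `gateway_url` helper are hypothetical names introduced here for illustration and are not part of any AssemblyAI SDK.

```python
# Base URLs copied from the endpoint table above.
GATEWAY_ENDPOINTS = {
    "us": "https://llm-gateway.assemblyai.com/v1/chat/completions",
    "eu": "https://llm-gateway.eu.assemblyai.com/v1/chat/completions",
}


def gateway_url(region: str = "us") -> str:
    """Return the chat completions URL for a region (hypothetical helper)."""
    key = region.strip().lower()
    if key not in GATEWAY_ENDPOINTS:
        raise ValueError(
            f"Unknown region {region!r}; expected one of {sorted(GATEWAY_ENDPOINTS)}"
        )
    return GATEWAY_ENDPOINTS[key]
```

Passing `gateway_url("eu")` as the URL in the Quickstart request would send the call to the EU endpoint. Failing loudly on an unknown region avoids silently falling back to the US endpoint when data residency matters.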
The LLM Gateway provides access to 25+ models across major AI providers with support for:
  • Basic Chat Completions - Simple request/response interactions
  • Streamed Responses - Stream output as it’s generated (OpenAI models)
  • Multi-turn Conversations - Maintain context across multiple exchanges
  • Structured Outputs - Constrain responses to a specific JSON schema
  • Tool/Function Calling - Enable models to execute custom functions
  • Agentic Workflows - Multi-step reasoning with automatic tool chaining
  • Unified Interface - One API for Claude, GPT, Gemini, Qwen, Kimi, and more
  • Post-processing - Automatically repair malformed JSON responses with built-in JSON repair
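As an illustration of a multi-turn conversation, the sketch below builds a follow-up request body by carrying earlier turns in `messages`. It assumes the OpenAI-style convention that the model's earlier reply is sent back with the `assistant` role; the helper names `append_turn` and `build_followup_request` are hypothetical.

```python
def append_turn(messages, role, content):
    """Return a new message list with one more turn appended."""
    return messages + [{"role": role, "content": content}]


def build_followup_request(model, history, reply_text, next_question, max_tokens=1000):
    """Build a request body that carries the earlier exchange as context.

    Assumption: prior model output is replayed with role "assistant",
    as in the OpenAI chat format the gateway is compatible with.
    """
    messages = append_turn(history, "assistant", reply_text)
    messages = append_turn(messages, "user", next_question)
    return {"model": model, "messages": messages, "max_tokens": max_tokens}
```

The returned dictionary can be passed as the `json` argument of the Quickstart request. The helpers return new lists rather than mutating `history`, so the caller decides when a turn is committed.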

Global model routing

If your requests do not require region-locked routing for latency or compliance reasons, you can set model_region to global in the API request body to receive a discount. See model pricing for which models have global discounts.
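A short sketch of opting into global routing. It assumes `model_region` is a top-level field of the request body with the string value `"global"`, which is how the sentence above reads; the `build_chat_request` helper is a hypothetical name.

```python
def build_chat_request(model, prompt, max_tokens=1000, global_routing=False):
    """Build a chat completions request body (hypothetical helper)."""
    body = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }
    if global_routing:
        # Assumption: model_region sits at the top level of the body.
        body["model_region"] = "global"
    return body
```

Leaving `global_routing` off omits the field entirely, so region-locked requests look exactly like the Quickstart example.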

Rate limits

LLM Gateway is rate limited per model, measured as requests within a 60-second window.
Account type | Rate limit (requests/min, per model)
Free | Not available
Paid | 30
Need a higher rate limit? Contact our support team.
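To stay under the limit on the client side, one option is a per-model sliding-window limiter like the sketch below. The defaults mirror the paid tier described above (30 requests per 60 seconds, per model). How the gateway itself counts the window is not specified here, so a sliding window is an assumption, and `PerModelRateLimiter` is a hypothetical class.

```python
import time
from collections import defaultdict, deque


class PerModelRateLimiter:
    """Client-side sliding-window limiter, tracked separately per model."""

    def __init__(self, max_requests=30, window_seconds=60.0, clock=time.monotonic):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._clock = clock
        self._sent = defaultdict(deque)  # model -> timestamps of recent sends

    def _prune(self, model, now):
        # Drop timestamps that have aged out of the window.
        sent = self._sent[model]
        while sent and now - sent[0] >= self.window_seconds:
            sent.popleft()
        return sent

    def try_acquire(self, model):
        """Record a send and return True, or return False if at the limit."""
        now = self._clock()
        sent = self._prune(model, now)
        if len(sent) >= self.max_requests:
            return False
        sent.append(now)
        return True

    def wait_time(self, model):
        """Seconds until the next send for this model would be allowed."""
        now = self._clock()
        sent = self._prune(model, now)
        if len(sent) < self.max_requests:
            return 0.0
        return self.window_seconds - (now - sent[0])
```

Call `try_acquire(model)` before each request and sleep for `wait_time(model)` when it returns `False`. The injectable `clock` keeps the limiter testable without real delays.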

Logging and troubleshooting

Every LLM Gateway response includes a request_id field, a unique identifier for that request. Persist it (along with the model, the API region, and a timestamp) for every call you make, not just when something goes wrong. If you contact support@assemblyai.com about a specific request (latency spikes, unexpected output, rate-limit errors, content moderation surprises), this ID lets us locate the exact request in our logs immediately. We recommend logging, at minimum:
  • request_id from the response body
  • The model parameter used
  • The API region (US: llm-gateway.assemblyai.com, EU: llm-gateway.eu.assemblyai.com)
  • A timestamp for when the request was sent
  • The full error response body when a non-2xx status code is returned
For details on debugging specific status codes (400/401/403/429/5xx) and what information to include when filing a support request, see the Troubleshooting page.
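The logging checklist above could be captured in a small helper like this sketch. It assumes the parsed response body is a dictionary and reads `request_id` with `.get`, since whether error responses also carry the field is not stated here; `build_log_record` and `log_gateway_call` are hypothetical names.

```python
import json
import logging
from datetime import datetime, timezone

logger = logging.getLogger("llm_gateway")


def build_log_record(model, region_host, status_code, body, sent_at=None):
    """Collect the recommended fields for one gateway call."""
    sent_at = sent_at or datetime.now(timezone.utc)
    record = {
        # request_id comes from the response body; may be absent on errors.
        "request_id": body.get("request_id") if isinstance(body, dict) else None,
        "model": model,
        "region": region_host,
        "sent_at": sent_at.isoformat(),
        "status_code": status_code,
    }
    if not 200 <= status_code < 300:
        # Keep the full error body for non-2xx responses.
        record["error_body"] = body
    return record


def log_gateway_call(model, region_host, status_code, body, sent_at=None):
    """Emit the record as one JSON line and return it."""
    record = build_log_record(model, region_host, status_code, body, sent_at)
    level = logging.ERROR if "error_body" in record else logging.INFO
    logger.log(level, json.dumps(record, default=str))
    return record
```

With the Quickstart code, this could be called as `log_gateway_call("claude-sonnet-4-6", "llm-gateway.assemblyai.com", response.status_code, response.json())`. Capture `sent_at` before sending the request if you want the timestamp to reflect send time, not log time.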

Next steps

The LLM Gateway API is separate from the Speech-to-Text and Speech Understanding APIs. It provides a unified interface to work with large language models across multiple providers.