Supported regions
Supported regions
US & EU
Overview
AssemblyAI’s LLM Gateway is a unified, Open AI sdk compatible, interface that allows you to connect across multiple LLMs through a single API with access to automatic fallbacks. Use as a standalone service or make use of our easy integration with AssemblyAI speech-to-text products. We focus on text to text LLM providers with regional compliance, security, and ZDR. Additionally, we have optional but tight integration with other AssemblyAI products for a more seamless experience.Quickstart
- Python
- JavaScript
- Py OpenAI SDK
model parameter to use any of the available models listed in the Available models page.
Want to compare models side-by-side? Try the Model Comparison
Tool, a Lovable
application, to test different LLM models and see how they perform.
Non-standard LLM Enhancements
Here are some bonus features LLM Gateway provides over other routing options- Automatic retries
- 5xx responses will automatically be retried on our side by default
- configure with fallback_config
- Model Fallbacks
- Configure alternative models/providers in case a model isn’t working
- Even change the prompt and/or params to better suit a different model
- See docs
- Automatically apply LLM Gateway requests on each real-time STT turn for reduced E2E latency
- see docs
- Post Processing
- Automatically apply post processing to the LLM response
- Repair broken JSON in tool calls and structured output with
json-repair - see docs
- Inject Transcript Text
- Call LLM Gateway with a
transcript_idand we will handle fetching and adding the transcript text to the LLM message. - Useful if you don’t want to store the transcript.
- see docs
- Call LLM Gateway with a
Regional compliance
| Endpoint | Base URL |
|---|---|
| US (default) | https://llm-gateway.assemblyai.com/v1/chat/completions |
| EU | https://llm-gateway.eu.assemblyai.com/v1/chat/completions |
The LLM Gateway is available in both US and EU regions. Use the EU endpoint to
ensure your data stays within the European Union. Currently, Anthropic Claude
and most Google Gemini models are supported in the EU (except where otherwise noted). OpenAI models are only
available in the US region. See Cloud Endpoints and Data
Residency for more details.
- Basic Chat Completions - Simple request/response interactions
- Streamed Responses - Stream output as it’s generated (OpenAI models)
- Multi-turn Conversations - Maintain context across multiple exchanges
- Structured Outputs - Constrain responses to a specific JSON schema
- Tool/Function Calling - Enable models to execute custom functions
- Agentic Workflows - Multi-step reasoning with automatic tool chaining
- Unified Interface - One API for Claude, GPT, Gemini, Qwen, Kimi, and more
- Post-processing - Automatically repair malformed JSON responses with built-in JSON repair
Global model routing
If your request/needs do not require region locked routing for either latency or compliance, you can opt to set themodel_region to global in the api request body for a discount.
See model pricing for which models have global discounts.
Rate limits
LLM Gateway is rate limited per model, measured as requests within a 60-second window.| Account type | Rate limit (requests/min, per model) |
|---|---|
| Free | Not available |
| Paid | 30 |
Need a higher rate limit?If you need a higher rate limit, contact our support team.
Logging and troubleshooting
Every LLM Gateway response includes arequest_id field — a unique identifier for that request. Persist it (along with the model, the API region, and a timestamp) for every call you make, not just when something goes wrong. If you contact support@assemblyai.com about a specific request (latency spikes, unexpected output, rate-limit errors, content moderation surprises), this ID lets us locate the exact request in our logs immediately.
We recommend logging, at minimum:
request_idfrom the response body- The
modelparameter used - The API region (US:
llm-gateway.assemblyai.com, EU:llm-gateway.eu.assemblyai.com) - A timestamp for when the request was sent
- The full error response body when a non-2xx status code is returned
Next steps
- Basic Chat Completions - Learn how to send simple messages and receive responses
- Multi-turn Conversations - Maintain context across multiple exchanges
- Structured Outputs - Constrain model responses to follow a specific JSON schema
- Tool Calling - Enable models to execute custom functions
- Agentic Workflows - Build multi-step reasoning applications
- Post-processing - Automatically repair malformed JSON in model responses
- Prompt Caching - Save money on similar calls with prompt caching
The LLM Gateway API is separate from the Speech-to-Text and Speech
Understanding APIs. It provides a unified interface to work with large
language models across multiple providers.