Choose the endpoint that best fits your application’s requirements—whether that’s using the default US region or ensuring your data stays within the European Union.

Default Endpoint

The default endpoint (llm-gateway.assemblyai.com) processes your LLM Gateway requests in the US region. If you don’t specify a base URL, this is the endpoint used by default.

EU Data Residency

The EU endpoint (llm-gateway.eu.assemblyai.com) guarantees your data never leaves the European Union. This is designed for organizations with strict data residency and governance requirements—your request and response data will remain entirely within the EU.

Endpoints

Endpoint     | Base URL                                                   | Description
US (default) | https://llm-gateway.assemblyai.com/v1/chat/completions    | Data stays in the US
EU           | https://llm-gateway.eu.assemblyai.com/v1/chat/completions | Data stays in the EU
Effective July 1, 2026, pricing for in-region LLM Gateway requests will increase by 10% as a direct pass-through of provider price increases, with no AssemblyAI upcharge. Opt into global routing to keep current pricing.

EU model availability

The EU endpoint currently supports Anthropic Claude and Google Gemini models. OpenAI GPT, Alibaba Cloud Qwen, and Moonshot AI Kimi models are only available through the US endpoint.
Provider           | US  | EU
Anthropic Claude   | Yes | Yes
Google Gemini      | Yes | Yes
OpenAI GPT         | Yes | No
Alibaba Cloud Qwen | Yes | No
Moonshot AI Kimi   | Yes | No
For a full list of available models, see the LLM Gateway Overview.
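
As a rough illustration, the availability table can be encoded as a lookup so that client code can check a provider before targeting a region. The `AVAILABILITY` dictionary and the `providers_in` helper are hypothetical names, not part of the LLM Gateway API, and the data is transcribed from the table above, so it may go stale as availability changes.

```python
# Provider availability per region, transcribed from the table above.
# AVAILABILITY and providers_in are illustrative names, not gateway APIs.
AVAILABILITY = {
    "Anthropic Claude": {"US": True, "EU": True},
    "Google Gemini": {"US": True, "EU": True},
    "OpenAI GPT": {"US": True, "EU": False},
    "Alibaba Cloud Qwen": {"US": True, "EU": False},
    "Moonshot AI Kimi": {"US": True, "EU": False},
}


def providers_in(region: str) -> list[str]:
    """Return the providers the table lists as available in `region`."""
    return [
        name
        for name, regions in AVAILABILITY.items()
        if regions.get(region, False)  # unknown regions yield no providers
    ]


print(providers_in("EU"))  # ['Anthropic Claude', 'Google Gemini']
```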

Which endpoint should I use?

  • No data residency requirements? Use the default endpoint. No configuration change is needed.
  • Need EU data residency? Use the EU endpoint to ensure your data stays within the European Union. Note that only Claude and Gemini models are available in the EU region.
  • No data residency, compliance, or latency needs, and want cheaper calls? Opt into global routing on top of the default endpoint to route requests to the provider’s global endpoints at lower cost.
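
The decision list above could be sketched as a small helper. This is only one possible reading of the guidance: `choose_routing` and its parameter names are hypothetical, and the only values taken from this page are the two endpoint URLs and the `"global"` value for `model_region`.

```python
US_URL = "https://llm-gateway.assemblyai.com/v1/chat/completions"
EU_URL = "https://llm-gateway.eu.assemblyai.com/v1/chat/completions"


def choose_routing(
    needs_eu_residency: bool,
    wants_lower_cost: bool = False,
    has_compliance_or_latency_needs: bool = False,
) -> dict:
    """Map the three questions above to a URL and an optional model_region.

    Hypothetical helper: the gateway itself exposes no such function.
    """
    if needs_eu_residency:
        # EU residency takes priority; stay in-region on the EU endpoint.
        return {"url": EU_URL, "model_region": None}
    if wants_lower_cost and not has_compliance_or_latency_needs:
        # Opt into global routing on top of the default endpoint.
        return {"url": US_URL, "model_region": "global"}
    # Default: US endpoint, in-region processing, no configuration change.
    return {"url": US_URL, "model_region": None}
```

A `model_region` of `None` here means the parameter should be left out of the request body entirely.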

How to use it

Update your request URL to your preferred endpoint. The US endpoint is the default and requires no configuration change:
import requests

headers = {
    "authorization": "<YOUR_API_KEY>"
}

# No URL change needed: US is the default
response = requests.post(
    "https://llm-gateway.assemblyai.com/v1/chat/completions",
    headers=headers,
    json={
        "model": "claude-sonnet-4-6",
        "messages": [
            {"role": "user", "content": "What is the capital of France?"}
        ],
        "max_tokens": 1000
    }
)

result = response.json()
print(result["choices"][0]["message"]["content"])
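
For the EU endpoint, the same request should only need a different URL. The sketch below assumes that the `claude-sonnet-4-6` model used in the US example is also offered in the EU (this page confirms Claude as a provider there, not specific model IDs). The `build_eu_request` and `ask_eu` helpers are hypothetical names introduced for illustration.

```python
EU_URL = "https://llm-gateway.eu.assemblyai.com/v1/chat/completions"


def build_eu_request(api_key: str, prompt: str) -> dict:
    """Build keyword arguments for requests.post targeting the EU endpoint."""
    return {
        "url": EU_URL,
        "headers": {"authorization": api_key},
        "json": {
            # Assumption: this Claude model ID is available in the EU region.
            "model": "claude-sonnet-4-6",
            "messages": [{"role": "user", "content": prompt}],
            "max_tokens": 1000,
        },
    }


def ask_eu(api_key: str, prompt: str) -> str:
    """Send the prompt through the EU endpoint and return the reply text."""
    import requests  # third-party; imported here so the builder has no dependency

    response = requests.post(**build_eu_request(api_key, prompt))
    response.raise_for_status()  # surface HTTP errors before parsing the body
    return response.json()["choices"][0]["message"]["content"]
```

Calling `ask_eu("<YOUR_API_KEY>", "What is the capital of France?")` would then mirror the US example above, with only the host changed.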

Global routing

Global routing is an opt-in option that routes your request to the provider’s global (non-region) endpoints for lower-cost processing. Set model_region to "global" in your request body to enable it. Omit the parameter for default in-region processing. "global" is the only valid value for model_region.
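
Because `"global"` is the only valid value, a client may want to validate the parameter before sending a request. The following is a minimal sketch of that idea; `build_payload` is a hypothetical helper, and how the gateway itself responds to other values is not described on this page.

```python
def build_payload(model, messages, max_tokens=1000, model_region=None):
    """Build a chat completions request body with optional global routing.

    Hypothetical helper: omits model_region for in-region processing and
    rejects any value other than "global" on the client side.
    """
    if model_region not in (None, "global"):
        raise ValueError('model_region must be "global" or omitted')

    payload = {"model": model, "messages": messages, "max_tokens": max_tokens}
    if model_region is not None:
        # Only include the key when opting in; omitting it keeps in-region routing.
        payload["model_region"] = model_region
    return payload


messages = [{"role": "user", "content": "What is the capital of France?"}]
in_region = build_payload("claude-sonnet-4-6", messages)
global_routed = build_payload("claude-sonnet-4-6", messages, model_region="global")
```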

Who should use global routing?

Global routing is designed for customers who:
  • Do not have data residency or compliance requirements that tie processing to a specific region.
  • Do not have strict latency requirements that depend on regional proximity.
  • Want lower-cost calls by using the provider’s global endpoints.
Global routing is always opt-in. If you don’t set model_region, requests continue to be processed in-region (US or EU), so existing data residency and compliance behavior is unaffected. If you have data residency, compliance, or latency needs, leave model_region unset and keep using the US or EU endpoint.

Availability

Global routing is live for Anthropic Claude models. Support for Google Gemini 3 series models is coming soon.
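
Since global routing currently covers only Claude models, a client could add `model_region` conditionally. The sketch below assumes that Claude model IDs start with `claude-`, as the `claude-sonnet-4-6` example on this page does. That prefix check is a guess, not a documented rule, and both helper names are hypothetical. This page does not say how the gateway handles `model_region` for unsupported models.

```python
def supports_global_routing(model: str) -> bool:
    """Assumption: Claude model IDs begin with "claude-"."""
    return model.startswith("claude-")


def with_global_routing_if_available(payload: dict) -> dict:
    """Return a copy of the payload, opting into global routing when possible."""
    routed = dict(payload)  # shallow copy so the caller's dict is not mutated
    if supports_global_routing(routed.get("model", "")):
        routed["model_region"] = "global"
    return routed
```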

Usage tracking

Global-routed usage appears as a new Global region in the spend and usage dashboard, separate from US and EU usage.

How to use it

Include model_region: "global" in your request body alongside model and messages.
import requests

headers = {
    "authorization": "<YOUR_API_KEY>"
}

response = requests.post(
    "https://llm-gateway.assemblyai.com/v1/chat/completions",
    headers=headers,
    json={
        "model": "claude-sonnet-4-6",
        "messages": [
            {"role": "user", "content": "What is the capital of France?"}
        ],
        "model_region": "global",
        "max_tokens": 1000
    }
)

result = response.json()
print(result["choices"][0]["message"]["content"])