# Quickstart

## Overview

**AssemblyAI's LLM Gateway** is a unified, OpenAI SDK-compatible interface that lets you connect to multiple LLMs through a single API, with support for automatic fallbacks.
Use it as a standalone service, or integrate it with AssemblyAI speech-to-text products.

We focus on text-to-text LLM providers that offer regional compliance, security, and zero data retention (ZDR). LLM Gateway also offers optional, tight integration with other AssemblyAI products for a more seamless experience.

## Quickstart

<Tabs>
  <Tab title="Python" default>
    ```python {11} theme={null}
    import requests

    headers = {
        "authorization": "<YOUR_API_KEY>"
    }

    response = requests.post(
        "https://llm-gateway.assemblyai.com/v1/chat/completions",
        headers=headers,
        json={
            "model": "claude-sonnet-4-6",
            "messages": [
                {"role": "user", "content": "What is the capital of France?"}
            ],
            "max_tokens": 1000
        }
    )
    response.raise_for_status()  # Raise on 4xx/5xx instead of failing later on a missing key

    result = response.json()
    print(result["choices"][0]["message"]["content"])
    ```
  </Tab>

  <Tab title="JavaScript">
    ```javascript {10} theme={null}
    const response = await fetch(
      "https://llm-gateway.assemblyai.com/v1/chat/completions",
      {
        method: "POST",
        headers: {
          authorization: "<YOUR_API_KEY>",
          "content-type": "application/json",
        },
        body: JSON.stringify({
          model: "claude-sonnet-4-6",
          messages: [{ role: "user", content: "What is the capital of France?" }],
          max_tokens: 1000,
        }),
      }
    );

    const result = await response.json();
    console.log(result.choices[0].message.content);
    ```
  </Tab>

  <Tab title="Py OpenAI SDK">
    ```python {10} theme={null}
    from openai import OpenAI

    client = OpenAI(
        base_url="https://llm-gateway.assemblyai.com/v1",
        api_key="<YOUR_API_KEY>",
    )

    messages = [{"role": "user", "content": "What is the capital of France?"}]
    response = client.chat.completions.create(
        model="claude-sonnet-4-6",
        messages=messages,
        max_tokens=1000,
    )
    print(response.choices[0].message.content)
    ```
  </Tab>
</Tabs>

To use a different model, change the `model` parameter to any model listed on the [Available models](/llm-gateway/available-models) page.

<Note>
  Want to compare models side-by-side? Try the [Model Comparison
  Tool](https://assemblyai-model-comparison-llmgw.lovable.app/), a Lovable
  application, to test different LLM models and see how they perform.
</Note>

## Non-standard LLM Enhancements

LLM Gateway provides the following features beyond what standard LLM routing offers:

* Automatic retries
  * Requests that return a 5xx response are retried automatically on our side by default
  * Configure this behavior with [fallback\_config](/api-reference/llm-gateway/create-chat-completion#body-fallback-config)
* Model fallbacks
  * Configure alternative models or providers to use when a model isn't working
  * Optionally change the prompt and/or parameters to better suit the fallback model
  * See [docs](/llm-gateway/fallback)
* Real-time speech-to-text integration
  * Automatically run an LLM Gateway request on each real-time speech-to-text (STT) turn to reduce end-to-end (E2E) latency
  * See [docs](/guides/real_time_llm_gateway)
* Post-processing
  * Automatically apply post-processing to the LLM response
  * Repair broken JSON in tool calls and structured output with `json-repair`
  * See [docs](/llm-gateway/structured-outputs#post-processing)
* Inject transcript text
  * Call LLM Gateway with a `transcript_id`, and we fetch the transcript text and add it to the LLM message for you
  * Useful if you don't want to store the transcript yourself
  * See [docs](/llm-gateway/chat-completions#inject-a-transcript-by-id)

## Regional compliance

| Region       | Chat completions URL                                        |
| ------------ | ----------------------------------------------------------- |
| US (default) | `https://llm-gateway.assemblyai.com/v1/chat/completions`    |
| EU           | `https://llm-gateway.eu.assemblyai.com/v1/chat/completions` |

<Note>
  The LLM Gateway is available in both US and EU regions. Use the EU endpoint to
  ensure your data stays within the European Union. Currently, Anthropic Claude
  models and most Google Gemini models are supported in the EU, except where
  otherwise noted. OpenAI models are available only in the US region. See [Cloud
  Endpoints and Data Residency](/llm-gateway/cloud-endpoints-and-data-residency)
  for more details.
</Note>
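
To pick an endpoint in code, a small lookup keyed by region is enough. The following is a minimal sketch: the `"us"`/`"eu"` keys and the `chat_completions_url` helper are hypothetical names, and only the URLs come from the table above. The US base URL (without `/chat/completions`) matches the `base_url` used in the OpenAI SDK quickstart; the EU base URL is assumed to work the same way.

```python
# Hypothetical region keys; the URLs are taken from the table above.
LLM_GATEWAY_BASE_URLS = {
    "us": "https://llm-gateway.assemblyai.com/v1",
    "eu": "https://llm-gateway.eu.assemblyai.com/v1",
}


def chat_completions_url(region: str = "us") -> str:
    """Return the chat completions URL for a region ("us" or "eu")."""
    try:
        base_url = LLM_GATEWAY_BASE_URLS[region.lower()]
    except KeyError:
        raise ValueError(
            f"Unknown region {region!r}; expected one of {sorted(LLM_GATEWAY_BASE_URLS)}"
        ) from None
    return f"{base_url}/chat/completions"


print(chat_completions_url("eu"))
# prints https://llm-gateway.eu.assemblyai.com/v1/chat/completions
```

Failing loudly on an unknown region, rather than silently falling back to the US endpoint, avoids accidentally sending EU-bound data to the US region.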

The LLM Gateway provides access to 25+ models across major AI providers with support for:

* **Basic Chat Completions** - Simple request/response interactions
* **Streamed Responses** - Stream output as it's generated (OpenAI models)
* **Multi-turn Conversations** - Maintain context across multiple exchanges
* **Structured Outputs** - Constrain responses to a specific JSON schema
* **Tool/Function Calling** - Enable models to execute custom functions
* **Agentic Workflows** - Multi-step reasoning with automatic tool chaining
* **Unified Interface** - One API for Claude, GPT, Gemini, Qwen, Kimi, and more
* **Post-processing** - Automatically repair malformed JSON responses with built-in JSON repair

### Global model routing

If your requests don't require region-locked routing for latency or compliance, you can set `model_region` to `global`
in the API request body to receive a discount.

See [pricing](https://www.assemblyai.com/pricing) for which models offer a global routing discount.
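
As a sketch, opting in to global routing only means adding one field to the request body. The `build_chat_request` helper below is hypothetical; `model_region` and its `global` value come from this section, and the rest of the body mirrors the Python quickstart.

```python
def build_chat_request(prompt: str, model: str = "claude-sonnet-4-6",
                       global_routing: bool = False) -> dict:
    """Build a chat completions request body, optionally opting in to global routing."""
    body = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 1000,
    }
    if global_routing:
        # Give up region-locked routing in exchange for the global discount.
        body["model_region"] = "global"
    return body


body = build_chat_request("What is the capital of France?", global_routing=True)
```

Pass the returned dictionary as `json=` to `requests.post` in the Python quickstart. Keeping `global_routing` off by default means region-locked routing stays the norm unless a caller explicitly opts out.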

## Rate limits

LLM Gateway is rate limited per model, measured as requests within a 60-second window. For account-specific limits and how to request a higher limit, see [Rate limits](/llm-gateway/rate-limits).
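
If you send many requests to one model, you may want to pace them on the client before the gateway starts returning 429 errors. The following is a minimal sketch of a per-model sliding-window limiter. The `PerModelRateLimiter` class is hypothetical, `max_requests` must come from your account's limits, and the sketch assumes a rolling 60-second window, which may be stricter than how the gateway actually counts requests. It complements, rather than replaces, handling 429 responses.

```python
import time
from collections import defaultdict, deque
from typing import Callable, Deque, Dict


class PerModelRateLimiter:
    """Client-side sliding window: at most `max_requests` per model per `window` seconds."""

    def __init__(self, max_requests: int, window: float = 60.0,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        self.max_requests = max_requests
        self.window = window
        self._clock = clock
        self._sleep = sleep
        # Send timestamps per model, oldest first.
        self._sent: Dict[str, Deque[float]] = defaultdict(deque)

    def acquire(self, model: str) -> None:
        """Block until a request for `model` fits in the window, then record it."""
        sent = self._sent[model]
        while True:
            now = self._clock()
            # Drop timestamps that have left the window.
            while sent and now - sent[0] >= self.window:
                sent.popleft()
            if len(sent) < self.max_requests:
                sent.append(now)
                return
            # Wait until the oldest request in the window expires.
            self._sleep(self.window - (now - sent[0]))
```

Call `limiter.acquire(model)` immediately before each request. Limits are tracked per model, matching how the gateway applies them, and the injectable `clock` and `sleep` arguments let you test the limiter without real waiting.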

## Logging and troubleshooting

Every LLM Gateway response includes a `request_id` field, a unique identifier for that request. Persist it (along with the model, the API region, and a timestamp) for every call you make, not just when something goes wrong. If you contact [support@assemblyai.com](mailto:support@assemblyai.com) about a specific request (latency spikes, unexpected output, rate-limit errors, content moderation surprises), this ID lets us locate the exact request in our logs immediately.

We recommend logging, at minimum:

* `request_id` from the response body
* The `model` parameter used
* The API region (US: `llm-gateway.assemblyai.com`, EU: `llm-gateway.eu.assemblyai.com`)
* A timestamp for when the request was sent
* The full error response body when a non-2xx status code is returned
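
The following is a minimal sketch of one structured log record covering the fields listed above, using the standard `logging` module. The `log_llm_gateway_call` helper and its record keys are hypothetical, and the sketch assumes `request_id` is a top-level field of the JSON response body. If the body isn't JSON (for example, a plain-text 5xx page), pass the raw text instead.

```python
import json
import logging
from datetime import datetime
from typing import Any

logger = logging.getLogger("llm_gateway")


def log_llm_gateway_call(model: str, region_host: str, sent_at: datetime,
                         status_code: int, body: Any) -> dict:
    """Build and emit one structured log record for an LLM Gateway call."""
    record = {
        # Assumes request_id is top-level in JSON bodies; text bodies have none.
        "request_id": body.get("request_id") if isinstance(body, dict) else None,
        "model": model,
        "region": region_host,
        "sent_at": sent_at.isoformat(),
        "status_code": status_code,
    }
    if not 200 <= status_code < 300:
        # Keep the full error body for non-2xx responses.
        record["error_body"] = body
        logger.error(json.dumps(record))
    else:
        logger.info(json.dumps(record))
    return record
```

In the Python quickstart, record `datetime.now(timezone.utc)` just before `requests.post(...)`, then pass `response.status_code` and the parsed body. Emitting one JSON line per call keeps the records easy to search by `request_id` when you contact support.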

For details on debugging specific status codes (400/401/403/429/5xx) and what information to include when filing a support request, see the [Troubleshooting](/llm-gateway/troubleshooting) page.

## Next steps

* [Basic Chat Completions](/llm-gateway/chat-completions) - Learn how to send simple messages and receive responses
* [Multi-turn Conversations](/llm-gateway/conversations) - Maintain context across multiple exchanges
* [Structured Outputs](/llm-gateway/structured-outputs) - Constrain model responses to follow a specific JSON schema
* [Tool Calling](/llm-gateway/tool-calling) - Enable models to execute custom functions
* [Agentic Workflows](/llm-gateway/agentic-workflows) - Build multi-step reasoning applications
* [Post-processing](/llm-gateway/structured-outputs#post-processing) - Automatically repair malformed JSON in model responses
* [Prompt Caching](/llm-gateway/prompt-caching) - Save money on similar calls with prompt caching

<Note>
  The LLM Gateway API is separate from the Speech-to-Text and Speech
  Understanding APIs. It provides a unified interface to work with large
  language models across multiple providers.
</Note>
