# Model Fallbacks

## Overview

Fallbacks let you specify one or more backup models that the LLM Gateway automatically tries if your primary model fails. This keeps your application resilient without complex retry logic on your end.

<Note>
  The LLM Gateway is available in both US and EU regions. Fallback behavior works the same way on both endpoints. See [Cloud endpoints and data residency](/llm-gateway/cloud-endpoints-and-data-residency) for more details.
</Note>

## Basic usage

To add a fallback, include a `fallbacks` array in your request. Each entry specifies an alternative `model` to use if the primary model fails:

<Tabs>
  <Tab title="Python" language="python">
    ```python expandable theme={null}
    import requests

    headers = {
        "authorization": "<YOUR_API_KEY>"
    }

    response = requests.post(
        "https://llm-gateway.assemblyai.com/v1/chat/completions",
        headers=headers,
        json={
            "model": "kimi-k2.5",
            "messages": [
                {"role": "user", "content": "Tell me a fairy tale."}
            ],
            "fallbacks": [
                {"model": "claude-sonnet-4-6"}
            ]
        }
    )

    result = response.json()
    print(result["choices"][0]["message"]["content"])
    ```
  </Tab>

  <Tab title="JavaScript" language="javascript">
    ```javascript theme={null}
    const response = await fetch(
      "https://llm-gateway.assemblyai.com/v1/chat/completions",
      {
        method: "POST",
        headers: {
          authorization: "<YOUR_API_KEY>",
          "content-type": "application/json",
        },
        body: JSON.stringify({
          model: "kimi-k2.5",
          messages: [{ role: "user", content: "Tell me a fairy tale." }],
          fallbacks: [
            { model: "claude-sonnet-4-6" }
          ],
        }),
      }
    );

    const result = await response.json();
    console.log(result.choices[0].message.content);
    ```
  </Tab>
</Tabs>

If `kimi-k2.5` fails, the LLM Gateway automatically retries the request using `claude-sonnet-4-6`.

<Note>
  By default, `fallback_config.depth` is `1`, so only the first fallback is tried. You can chain up to two fallback models by setting `fallback_config.depth` to `2`. The LLM Gateway then tries each fallback in order until one succeeds.
</Note>
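
As a rough illustration of the chaining described in the note, the sketch below builds a request body with two fallbacks and a matching `depth`. The helper `build_chained_payload` and the `<SECOND_FALLBACK_MODEL>` placeholder are hypothetical names used only for this example; substitute a real model ID from the Available models list.

```python
import json

# Placeholder (assumption): replace with a second model ID you have access to.
SECOND_FALLBACK_MODEL = "<SECOND_FALLBACK_MODEL>"


def build_chained_payload(primary, fallback_models, messages):
    """Build a request body that lists each fallback model in order."""
    if not 1 <= len(fallback_models) <= 2:
        raise ValueError("expected one or two fallback models")
    return {
        "model": primary,
        "messages": messages,
        "fallbacks": [{"model": name} for name in fallback_models],
        # depth defaults to 1, so raise it to reach the second entry.
        "fallback_config": {"depth": len(fallback_models)},
    }


payload = build_chained_payload(
    "kimi-k2.5",
    ["claude-sonnet-4-6", SECOND_FALLBACK_MODEL],
    [{"role": "user", "content": "Tell me a fairy tale."}],
)
print(json.dumps(payload, indent=2))
```

The resulting dictionary can be passed as the `json=` argument of `requests.post`, in the same way as the basic example.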

## Override fields per fallback

You can also override specific request fields for each fallback model. For example, you can change the `messages` or `temperature` that the fallback uses:

<Tabs>
  <Tab title="Python" language="python">
    ```python expandable theme={null}
    import requests

    headers = {
        "authorization": "<YOUR_API_KEY>"
    }

    response = requests.post(
        "https://llm-gateway.assemblyai.com/v1/chat/completions",
        headers=headers,
        json={
            "model": "kimi-k2.5",
            "messages": [
                {"role": "user", "content": "Tell me a fairy tale."}
            ],
            "temperature": 0.2,
            "fallbacks": [
                {
                    "model": "claude-sonnet-4-6",
                    "messages": [
                        {"role": "user", "content": "Tell me a fairy tale, but be very concise."}
                    ],
                    "temperature": 0.4
                }
            ]
        }
    )

    result = response.json()
    print(result["choices"][0]["message"]["content"])
    ```
  </Tab>

  <Tab title="JavaScript" language="javascript">
    ```javascript expandable theme={null}
    const response = await fetch(
      "https://llm-gateway.assemblyai.com/v1/chat/completions",
      {
        method: "POST",
        headers: {
          authorization: "<YOUR_API_KEY>",
          "content-type": "application/json",
        },
        body: JSON.stringify({
          model: "kimi-k2.5",
          messages: [{ role: "user", content: "Tell me a fairy tale." }],
          temperature: 0.2,
          fallbacks: [
            {
              model: "claude-sonnet-4-6",
              messages: [
                { role: "user", content: "Tell me a fairy tale, but be very concise." },
              ],
              temperature: 0.4,
            },
          ],
        }),
      }
    );

    const result = await response.json();
    console.log(result.choices[0].message.content);
    ```
  </Tab>
</Tabs>

Any field you don't override in the fallback inherits the value from the original request. Only the fields you explicitly set in the fallback object are changed.
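
One way to picture this inheritance rule is as a dictionary merge in which the fallback's keys win. The sketch below is only a client-side model of the behavior described above, not the gateway's implementation; the helper name `effective_fallback_request` is hypothetical, and dropping `fallbacks` and `fallback_config` from the merged result is an assumption made for readability.

```python
def effective_fallback_request(original, fallback):
    """Return the request body a fallback attempt would effectively use."""
    # Start from the original request, minus the fallback-control fields
    # (assumption: these are not forwarded to the fallback model).
    base = {
        key: value
        for key, value in original.items()
        if key not in ("fallbacks", "fallback_config")
    }
    # Fields set on the fallback object replace the inherited values.
    return {**base, **fallback}


original = {
    "model": "kimi-k2.5",
    "messages": [{"role": "user", "content": "Tell me a fairy tale."}],
    "temperature": 0.2,
    "max_tokens": 500,
    "fallbacks": [{"model": "claude-sonnet-4-6", "temperature": 0.4}],
}

merged = effective_fallback_request(original, original["fallbacks"][0])
print(merged["model"], merged["temperature"], merged["max_tokens"])
```

Here `model` and `temperature` come from the fallback object, while `messages` and `max_tokens` carry over from the original request.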

## Retry behavior

If no fallbacks are set, the API automatically retries the LLM request once after 500 ms. This happens because `fallback_config.retry` defaults to `true`, which gives you a zero-config way to handle transient failures.

For more control over retries, set `fallback_config.retry` to `false` and implement your own exponential backoff:

```json theme={null}
{
  "model": "kimi-k2.5",
  "messages": [{"role": "user", "content": "Tell me a fairy tale."}],
  "fallback_config": {
    "retry": false
  }
}
```
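
With the built-in retry disabled, a client-side backoff loop might look like the following sketch. The helper names `call_with_backoff` and `send_once`, the attempt count, the delay values, and the choice to treat every exception or HTTP error status as retryable are all assumptions for illustration; adjust them to the failures you actually want to retry.

```python
import random
import time


def call_with_backoff(send, max_attempts=4, base_delay=0.5, max_delay=8.0,
                      sleep=time.sleep):
    """Call send() until it succeeds, doubling the wait after each failure."""
    for attempt in range(max_attempts):
        try:
            return send()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            delay = min(max_delay, base_delay * 2 ** attempt)
            # A little jitter spreads out retries from clients that failed together.
            sleep(delay + random.uniform(0, delay / 10))


def send_once():
    """Make one attempt against the gateway with the built-in retry disabled."""
    import requests  # third-party; imported here so call_with_backoff stays stdlib-only

    response = requests.post(
        "https://llm-gateway.assemblyai.com/v1/chat/completions",
        headers={"authorization": "<YOUR_API_KEY>"},
        json={
            "model": "kimi-k2.5",
            "messages": [{"role": "user", "content": "Tell me a fairy tale."}],
            "fallback_config": {"retry": False},
        },
        timeout=60,
    )
    response.raise_for_status()  # treat any HTTP error status as a failure
    return response.json()
```

Calling `call_with_backoff(send_once)` then returns the parsed response body, or re-raises the last error once all attempts are used up.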

## Response behavior

When a fallback is used, the response looks exactly as if you had made the original request with the fallback model. The `model` field in the response reflects the fallback model that was used, and you are billed only for that model.
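
Because the `model` field reports the model that actually served the request, you can compare it with the request to tell whether a fallback was used. The sketch below assumes the response echoes the same model ID string you sent, which may not hold if the gateway returns a versioned or aliased name; `resolve_serving_model` is a hypothetical helper, and the response dictionaries are trimmed stand-ins rather than real API output.

```python
def resolve_serving_model(request_body, response_body):
    """Return (served_model, used_fallback) for a completed request."""
    served = response_body["model"]
    fallback_models = [f["model"] for f in request_body.get("fallbacks", [])]
    used_fallback = served != request_body["model"] and served in fallback_models
    return served, used_fallback


request_body = {
    "model": "kimi-k2.5",
    "messages": [{"role": "user", "content": "Tell me a fairy tale."}],
    "fallbacks": [{"model": "claude-sonnet-4-6"}],
}

# Trimmed, hypothetical response bodies for illustration only.
primary_response = {"model": "kimi-k2.5", "choices": []}
fallback_response = {"model": "claude-sonnet-4-6", "choices": []}

print(resolve_serving_model(request_body, primary_response))
print(resolve_serving_model(request_body, fallback_response))
```

Logging this flag per request is one way to track how often the primary model fails.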

## API reference

### Request parameters

| Key                     | Type    | Required? | Description                                                                                                                     |
| ----------------------- | ------- | --------- | ------------------------------------------------------------------------------------------------------------------------------- |
| `model`                 | string  | Yes       | The primary model to use for completion. See [Available models](/llm-gateway/quickstart#available-models) for supported values. |
| `messages`              | array   | Yes       | An array of message objects representing the conversation history.                                                              |
| `fallbacks`             | array   | No        | An array of fallback objects. Each object must include a `model` and can override any field available in the original request.  |
| `fallback_config`       | object  | No        | Configuration for fallback behavior.                                                                                            |
| `fallback_config.retry` | boolean | No        | Whether to automatically retry the request once after 500 ms on failure. Defaults to `true`.                                    |
| `fallback_config.depth` | number  | No        | Maximum number of fallbacks to try, in order. Defaults to `1`; the maximum is `2`.                                              |

### Fallback object

Each object in the `fallbacks` array must include a `model` and can override any field available in the original request. The table below shows a few examples:

| Key           | Type   | Required? | Description                                                                                                       |
| ------------- | ------ | --------- | ----------------------------------------------------------------------------------------------------------------- |
| `model`       | string | Yes       | The fallback model to use. See [Available models](/llm-gateway/quickstart#available-models) for supported values. |
| `messages`    | array  | No        | Override the messages for the fallback request.                                                                   |
| `temperature` | number | No        | Override the temperature for the fallback request.                                                                |
| `max_tokens`  | number | No        | Override the max tokens for the fallback request.                                                                 |

Fields not specified in the fallback object inherit their values from the original request.
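
To catch mistakes before sending a request, you could check a request body against the constraints in the tables above. The sketch below is a hypothetical client-side check, not part of the API; the function name `validate_fallback_request` and its messages are made up for this example, and the gateway's own validation and error responses may differ.

```python
def validate_fallback_request(body):
    """Return a list of problems found in a request body (empty if none)."""
    problems = []
    if not isinstance(body.get("model"), str):
        problems.append("model: required string")
    if not isinstance(body.get("messages"), list):
        problems.append("messages: required array")
    for index, fallback in enumerate(body.get("fallbacks", [])):
        if not isinstance(fallback.get("model"), str):
            problems.append(f"fallbacks[{index}].model: required string")
    config = body.get("fallback_config", {})
    if "retry" in config and not isinstance(config["retry"], bool):
        problems.append("fallback_config.retry: must be a boolean")
    if "depth" in config:
        depth = config["depth"]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(depth, bool) or depth not in (1, 2):
            problems.append("fallback_config.depth: must be 1 or 2")
    return problems


good = {
    "model": "kimi-k2.5",
    "messages": [{"role": "user", "content": "Tell me a fairy tale."}],
    "fallbacks": [{"model": "claude-sonnet-4-6", "max_tokens": 200}],
    "fallback_config": {"retry": False, "depth": 1},
}
bad = {
    "model": "kimi-k2.5",
    "messages": [],
    "fallbacks": [{"temperature": 0.4}],
    "fallback_config": {"depth": 3},
}

print(validate_fallback_request(good))
print(validate_fallback_request(bad))
```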

## Next steps

* [Basic chat completions](/llm-gateway/chat-completions) - Send simple messages and receive responses
* [Multi-turn conversations](/llm-gateway/conversations) - Maintain context across multiple exchanges
* [Tool calling](/llm-gateway/tool-calling) - Enable models to execute custom functions
* [Cloud endpoints and data residency](/llm-gateway/cloud-endpoints-and-data-residency) - Learn about regional endpoint options
