Specify Fallback Models
Overview
The Fallback feature lets you specify one or more backup models that the LLM Gateway will automatically try if your primary model fails. This ensures your application stays resilient without requiring complex retry logic on your end.
The LLM Gateway is available in both US and EU regions. Fallback behavior works the same way on both endpoints. See Cloud endpoints and data residency for more details.
Basic usage
To add a fallback, include a fallbacks array in your request. Each entry specifies an alternative model to use if the primary model is unavailable:
Python
JavaScript
If kimi-k2.5 fails, the LLM Gateway automatically retries the request using claude-sonnet-4-6.
You can chain up to two fallback models by setting fallback_config.depth to 2. The LLM Gateway tries each fallback in order until one succeeds.
Override fields per fallback
In the advanced case, you can override specific request fields for each fallback model. For example, you can change the messages or temperature for the fallback:
Python
JavaScript
Any field you don’t override in the fallback inherits the value from the original request. Only the fields you explicitly set in the fallback object are changed.
Retry behavior
If no fallbacks are set, the API automatically retries the LLM request once after 500ms. This is because fallback_config.retry defaults to true, providing a zero-config way to handle transient failures.
For more control over retries, set retry to false and implement your own exponential backoff:
Response behavior
When a fallback is used, the response looks exactly as if you had made the original request with the fallback model. The model field in the response reflects the fallback model that was used, and billing is charged only for that model.
API reference
Request parameters
Fallback object
Each object in the fallbacks array must include a model and can override any field available in the original request. For example:
Any field from the original request can be included in a fallback object. Fields not specified in the fallback inherit the values from the original request.
Next steps
- Basic chat completions - Send simple messages and receive responses
- Multi-turn conversations - Maintain context across multiple exchanges
- Tool calling - Enable models to execute custom functions
- Cloud endpoints and data residency - Learn about regional endpoint options