LLM Gateway Overview


Overview

AssemblyAI’s LLM Gateway is a unified interface that lets you connect to multiple LLM providers, including Anthropic (Claude), OpenAI (GPT), Google (Gemini), and more. You can use the LLM Gateway to build sophisticated AI applications through a single API.

| Endpoint | Base URL |
| --- | --- |
| US (default) | https://llm-gateway.assemblyai.com/v1/chat/completions |
| EU | https://llm-gateway.eu.assemblyai.com/v1/chat/completions |

The LLM Gateway is available in both US and EU regions. Use the EU endpoint to ensure your data stays within the European Union. Currently, Anthropic Claude and most Google Gemini models are supported in the EU (except where otherwise noted). OpenAI models are only available in the US region. See Cloud Endpoints and Data Residency for more details.
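If data residency matters for your application, you can select the base URL by region at request time. A minimal sketch (the `gateway_url` helper and region names are illustrative, not part of the API):

```python
# Map each region to its LLM Gateway base URL (from the table above).
GATEWAY_URLS = {
    "us": "https://llm-gateway.assemblyai.com/v1/chat/completions",
    "eu": "https://llm-gateway.eu.assemblyai.com/v1/chat/completions",
}

def gateway_url(region: str = "us") -> str:
    """Return the chat-completions endpoint for a region ("us" or "eu")."""
    try:
        return GATEWAY_URLS[region.lower()]
    except KeyError:
        raise ValueError(f"Unknown region: {region!r}") from None
```

Remember that model availability differs by region, so a request routed to the EU endpoint should also use an EU-supported model.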

The LLM Gateway provides access to 25+ models across major AI providers with support for:

  • Basic Chat Completions - Simple request/response interactions
  • Streamed Responses - Stream output as it’s generated (OpenAI models)
  • Multi-turn Conversations - Maintain context across multiple exchanges
  • Structured Outputs - Constrain responses to a specific JSON schema
  • Tool/Function Calling - Enable models to execute custom functions
  • Agentic Workflows - Multi-step reasoning with automatic tool chaining
  • Unified Interface - One API for Claude, GPT, Gemini, Qwen, Kimi, and more
  • Post-processing - Automatically repair malformed JSON responses with built-in JSON repair
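Of these, multi-turn conversations are worth a quick illustration: each gateway call is stateless, so context is maintained by resending the full message history with every request. A minimal sketch (the `add_turn` helper is illustrative, not part of the API):

```python
def add_turn(messages, role, content):
    """Append one turn to a chat-completions message list.

    Context is maintained simply by resending the full history with
    each request; the gateway itself is stateless per call.
    """
    messages.append({"role": role, "content": content})
    return messages

history = []
add_turn(history, "user", "What is the capital of France?")
add_turn(history, "assistant", "The capital of France is Paris.")
add_turn(history, "user", "What is its population?")
# `history` is what you would send as the "messages" field of the next request,
# so the model can resolve "its" to Paris.
```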

Available models

By quality (LMArena Score)

| Model | Provider | Parameter | LMArena Score | Latency per 10,000 tokens |
| --- | --- | --- | --- | --- |
| Claude Opus 4.6 | Anthropic | claude-opus-4-6 | 1498 | 7.4s |
| Claude Opus 4.7 | Anthropic | claude-opus-4-7 | 1491 | TBD |
| GPT-5.5 | OpenAI | gpt-5.5 | 1475 | TBD |
| Gemini 3 Flash Preview | Google | gemini-3-flash-preview | 1474 | 4.2s |
| Claude Opus 4.5 | Anthropic | claude-opus-4-5-20251101 | 1468 | 3.9s |
| Claude Sonnet 4.6 | Anthropic | claude-sonnet-4-6 | 1466 | 7.2s |
| Claude 4.5 Sonnet | Anthropic | claude-sonnet-4-5-20250929 | 1453 | 5.6s |
| Gemini 2.5 Pro | Google | gemini-2.5-pro | 1448 | 4.0s |
| GPT-5.1 | OpenAI | gpt-5.1 | 1439 | 2.7s |
| Gemini 3.1 Flash Lite Preview | Google | gemini-3.1-flash-lite-preview | 1438 | TBD |
| GPT-5.2 | OpenAI | gpt-5.2 | 1437 | 1.6s |
| GPT-5 | OpenAI | gpt-5 | 1434 | 4.3s |
| Kimi K2.5 | Moonshot AI | kimi-k2.5 | 1432 | 1.2s |
| GPT-4.1 | OpenAI | gpt-4.1 | 1413 | 1.8s |
| Claude 4 Opus | Anthropic | claude-opus-4-20250514 | 1412 | 13.6s |
| Gemini 2.5 Flash | Google | gemini-2.5-flash | 1411 | 2.6s |
| Claude 4.5 Haiku | Anthropic | claude-haiku-4-5-20251001 | 1409 | 4.1s |
| Qwen3 Next 80B A3B | Alibaba Cloud | qwen3-next-80b-a3b | 1402 | 3.1s |
| GPT-5 mini | OpenAI | gpt-5-mini | 1390 | 3.8s |
| Claude 4 Sonnet | Anthropic | claude-sonnet-4-20250514 | 1389 | 5.1s |
| Gemini 2.5 Flash-Lite | Google | gemini-2.5-flash-lite | 1380 | 1.1s |
| gpt-oss-120b | OpenAI | gpt-oss-120b | 1353 | 1.4s |
| Qwen3 32B | Alibaba Cloud | qwen3-32B | 1347 | 3.7s |
| GPT-5 nano | OpenAI | gpt-5-nano | 1337 | 3.2s |
| gpt-oss-20b | OpenAI | gpt-oss-20b | 1317 | 1.1s |

By latency (per 10,000 tokens)

| Model | Provider | Parameter | Latency per 10,000 tokens | LMArena Score |
| --- | --- | --- | --- | --- |
| Gemini 2.5 Flash-Lite | Google | gemini-2.5-flash-lite | 1.1s | 1380 |
| gpt-oss-20b | OpenAI | gpt-oss-20b | 1.1s | 1317 |
| Kimi K2.5 | Moonshot AI | kimi-k2.5 | 1.2s | 1432 |
| gpt-oss-120b | OpenAI | gpt-oss-120b | 1.4s | 1353 |
| GPT-5.2 | OpenAI | gpt-5.2 | 1.6s | 1437 |
| GPT-4.1 | OpenAI | gpt-4.1 | 1.8s | 1413 |
| Gemini 2.5 Flash | Google | gemini-2.5-flash | 2.6s | 1411 |
| GPT-5.1 | OpenAI | gpt-5.1 | 2.7s | 1439 |
| Qwen3 Next 80B A3B | Alibaba Cloud | qwen3-next-80b-a3b | 3.1s | 1402 |
| GPT-5 nano | OpenAI | gpt-5-nano | 3.2s | 1337 |
| Qwen3 32B | Alibaba Cloud | qwen3-32B | 3.7s | 1347 |
| GPT-5 mini | OpenAI | gpt-5-mini | 3.8s | 1390 |
| Claude Opus 4.5 | Anthropic | claude-opus-4-5-20251101 | 3.9s | 1468 |
| Gemini 2.5 Pro | Google | gemini-2.5-pro | 4.0s | 1448 |
| Claude 4.5 Haiku | Anthropic | claude-haiku-4-5-20251001 | 4.1s | 1409 |
| Gemini 3 Flash Preview | Google | gemini-3-flash-preview | 4.2s | 1474 |
| GPT-5 | OpenAI | gpt-5 | 4.3s | 1434 |
| Claude 4 Sonnet | Anthropic | claude-sonnet-4-20250514 | 5.1s | 1389 |
| Claude 4.5 Sonnet | Anthropic | claude-sonnet-4-5-20250929 | 5.6s | 1453 |
| Claude Sonnet 4.6 | Anthropic | claude-sonnet-4-6 | 7.2s | 1466 |
| Claude Opus 4.6 | Anthropic | claude-opus-4-6 | 7.4s | 1498 |
| Claude 4 Opus | Anthropic | claude-opus-4-20250514 | 13.6s | 1412 |
| Claude Opus 4.7 | Anthropic | claude-opus-4-7 | TBD | 1491 |
| GPT-5.5 | OpenAI | gpt-5.5 | TBD | 1475 |
| Gemini 3.1 Flash Lite Preview | Google | gemini-3.1-flash-lite-preview | TBD | 1438 |
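Together, the two tables frame a quality-versus-latency tradeoff. As an illustration of how you might act on them programmatically, here is a hedged sketch that picks the highest-scoring model within a latency budget; the `MODELS` subset copies a few rows from the tables above, and `best_model_under` is a hypothetical helper, not part of the API:

```python
# A few (parameter, LMArena score, latency per 10,000 tokens) rows
# copied from the tables above.
MODELS = [
    ("claude-opus-4-6", 1498, 7.4),
    ("gemini-3-flash-preview", 1474, 4.2),
    ("gpt-5.2", 1437, 1.6),
    ("kimi-k2.5", 1432, 1.2),
    ("gemini-2.5-flash-lite", 1380, 1.1),
]

def best_model_under(latency_budget_s: float) -> str:
    """Return the highest-scoring model whose latency fits the budget."""
    candidates = [m for m in MODELS if m[2] <= latency_budget_s]
    if not candidates:
        raise ValueError("No model fits the latency budget")
    return max(candidates, key=lambda m: m[1])[0]
```

For example, a 2-second budget selects gpt-5.2, while relaxing the budget to 5 seconds selects gemini-3-flash-preview.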

By provider

Anthropic Claude

| Model | Parameter | LMArena Score | Latency per 10,000 tokens |
| --- | --- | --- | --- |
| Claude Opus 4.7 | claude-opus-4-7 | 1491 | TBD |
| Claude Opus 4.6 | claude-opus-4-6 | 1498 | 7.4s |
| Claude Sonnet 4.6 | claude-sonnet-4-6 | 1466 | 7.2s |
| Claude Opus 4.5 | claude-opus-4-5-20251101 | 1468 | 3.9s |
| Claude 4.5 Sonnet | claude-sonnet-4-5-20250929 | 1453 | 5.6s |
| Claude 4.5 Haiku | claude-haiku-4-5-20251001 | 1409 | 4.1s |
| Claude 4 Opus | claude-opus-4-20250514 | 1412 | 13.6s |
| Claude 4 Sonnet | claude-sonnet-4-20250514 | 1389 | 5.1s |

OpenAI GPT

| Model | Parameter | LMArena Score | Latency per 10,000 tokens |
| --- | --- | --- | --- |
| GPT-5.5 | gpt-5.5 | 1475 | TBD |
| GPT-5.2 | gpt-5.2 | 1437 | 1.6s |
| GPT-5.1 | gpt-5.1 | 1439 | 2.7s |
| GPT-5 | gpt-5 | 1434 | 4.3s |
| GPT-5 nano | gpt-5-nano | 1337 | 3.2s |
| GPT-5 mini | gpt-5-mini | 1390 | 3.8s |
| GPT-4.1 | gpt-4.1 | 1413 | 1.8s |
| gpt-oss-120b | gpt-oss-120b | 1353 | 1.4s |
| gpt-oss-20b | gpt-oss-20b | 1317 | 1.1s |

Google Gemini

| Model | Parameter | LMArena Score | Latency per 10,000 tokens |
| --- | --- | --- | --- |
| Gemini 3 Flash Preview | gemini-3-flash-preview | 1474 | 4.2s |
| Gemini 3.1 Flash Lite Preview | gemini-3.1-flash-lite-preview | 1438 | TBD |
| Gemini 2.5 Pro | gemini-2.5-pro | 1448 | 4.0s |
| Gemini 2.5 Flash | gemini-2.5-flash | 1411 | 2.6s |
| Gemini 2.5 Flash-Lite | gemini-2.5-flash-lite | 1380 | 1.1s |

Gemini 3.1 Flash Lite Preview is currently available in the US region only.

Alibaba Cloud Qwen

| Model | Parameter | LMArena Score | Latency per 10,000 tokens |
| --- | --- | --- | --- |
| Qwen3 Next 80B A3B | qwen3-next-80b-a3b | 1402 | 3.1s |
| Qwen3 32B | qwen3-32B | 1347 | 3.7s |

Moonshot AI Kimi

| Model | Parameter | LMArena Score | Latency per 10,000 tokens |
| --- | --- | --- | --- |
| Kimi K2.5 | kimi-k2.5 | 1432 | 1.2s |

Claude Opus 4.5 and Claude Opus 4.6 currently support context windows under 200k tokens via the LLM Gateway.

For information on data retention and model training policies for each provider, see Data Retention and Model Training.

Head to our Playground to test out the LLM Gateway without having to write any code!

Select a model

You can specify which model to use in your request by setting the model parameter. Here is an example showing how to use Claude Sonnet 4.6:

```python
import requests

headers = {
    "authorization": "<YOUR_API_KEY>"
}

response = requests.post(
    "https://llm-gateway.assemblyai.com/v1/chat/completions",
    headers=headers,
    json={
        "model": "claude-sonnet-4-6",
        "messages": [
            {"role": "user", "content": "What is the capital of France?"}
        ],
        "max_tokens": 1000
    }
)

result = response.json()
print(result["choices"][0]["message"]["content"])
```

Simply change the model parameter to use any of the available models listed in the Available models section above.
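Because only the model field changes between models, the same prompt can easily be prepared for several models at once. A minimal sketch (the `build_request` helper is illustrative, not part of the API):

```python
def build_request(model, prompt, max_tokens=1000):
    """Build a chat-completions payload; only `model` varies across models."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }

# The same prompt prepared for three different models; each payload can be
# POSTed to the same endpoint with the same headers as in the example above.
payloads = [
    build_request(m, "Summarize this transcript in one sentence.")
    for m in ("claude-sonnet-4-6", "gpt-5.1", "gemini-2.5-pro")
]
```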

Want to compare models side-by-side? Try the Model Comparison Tool, a Lovable application, to test different LLM models and see how they perform.

Next steps

The LLM Gateway API is separate from the Speech-to-Text and Speech Understanding APIs. It provides a unified interface to work with large language models across multiple providers.