What is LLM Gateway?
AssemblyAI's LLM Gateway explained: route transcripts to leading LLMs with one API key, unified billing, and speech-aware context so you can build production Voice AI apps faster.



Building a Voice AI application means combining speech-to-text with language model processing—and that second step is where complexity accumulates fast. Which model handles summarization best? What happens when a provider rate-limits you mid-call? How do you A/B test GPT-5 against Claude without rewriting your integration? AssemblyAI's LLM Gateway solves all of this from inside the same platform you already use for transcription.
This guide explains what AssemblyAI's LLM Gateway is, how it works with streaming and async transcription, and why it was built specifically for Voice AI—not as a generic multi-model proxy.
What is AssemblyAI's LLM Gateway?
AssemblyAI's LLM Gateway is a unified API, built directly into the AssemblyAI platform, that lets you route your transcripts through leading LLMs from Anthropic, OpenAI, and Google—without managing separate API keys, accounts, or billing relationships.
Unlike general-purpose LLM proxies, LLM Gateway is designed around spoken data. It understands the structure of a transcript—speaker labels, word-level timestamps, overlapping turns—and preserves that context when routing to downstream models. You're not sending raw text; you're sending speech-aware context.
Adding AssemblyAI's LLM Gateway to your Voice AI stack means one API key instead of many, one bill instead of several, and one request schema across every supported provider.
When do you need AssemblyAI's LLM Gateway?
You need the LLM Gateway when post-processing your transcripts becomes a multi-model problem—or when you want to move fast without accumulating vendor overhead.
Use it when you need to:
- A/B test models without rewriting integrations. Want to compare Claude 4.5 Sonnet to GPT-5 on summarization quality? Change one parameter and compare results. No new API keys, no separate accounts.
- Apply the same LLM workflow to async and real-time transcripts. Whether you're processing a completed audio file or streaming a live call, the Gateway's interface stays consistent.
- Consolidate your Voice AI stack. AssemblyAI VP Alex Kroman framed it as eliminating "managing dozens of API keys, vendor relationships, sales meetings, multiple billing systems" and replacing it with operational simplicity.
- Build agentic workflows on top of transcripts. The LLM Gateway supports tool/function calling and multi-step reasoning, so you can build agents that look up data, call APIs, or run custom logic—driven by what was said in the audio.
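The agentic pattern in the last point can be sketched as a tool-calling request. The payload below follows the OpenAI-style `tools` convention that the Gateway's chat completions schema is modeled on; the exact field names the Gateway expects may differ, and `lookup_order` is a hypothetical function used purely for illustration.

```python
# Hedged sketch of a tool-calling payload in the OpenAI-compatible style.
# The model identifier and tool definition are illustrative, not real APIs.
payload = {
    "model": "gpt-5",  # hypothetical identifier; check the supported model list
    "messages": [
        {
            "role": "user",
            "content": "Caller said: 'Where is my order, number 1042?'",
        }
    ],
    "tools": [
        {
            "type": "function",
            "function": {
                "name": "lookup_order",
                "description": "Look up an order's shipping status by ID.",
                "parameters": {
                    "type": "object",
                    "properties": {"order_id": {"type": "integer"}},
                    "required": ["order_id"],
                },
            },
        }
    ],
}
```

When the model decides the caller's question needs live data, it returns a tool call with the extracted `order_id` instead of a plain text answer, and your application executes the function and feeds the result back.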
Supported models
LLM Gateway currently supports 15+ models across all three major providers, with new models added as soon as they launch.
Note: Anthropic is retiring Claude 3.0 Haiku on April 20, 2026. Switch to Claude 4.5 Haiku (claude-haiku-4-5-20251001) before that date to avoid interruptions.
Key features
Unified interface with one API key
You access every supported model through AssemblyAI's standard chat completions API. The schema stays consistent across providers, so switching from GPT to Claude means changing one parameter, not rewriting integration code. The Gateway is also fully compatible with the LiteLLM Python library, which makes it easy to plug into existing LLM tooling.
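To make the "one parameter" claim concrete, here is a minimal sketch of calling the Gateway with Python's standard library. The endpoint URL, header name, and model identifiers are assumptions based on the OpenAI-compatible schema described above; check the LLM Gateway documentation for the exact values.

```python
import json
import os
import urllib.request

# Hypothetical endpoint; verify against the LLM Gateway docs.
GATEWAY_URL = "https://llm-gateway.assemblyai.com/v1/chat/completions"


def build_payload(model: str, transcript: str) -> dict:
    """OpenAI-style chat completion payload; only `model` changes per provider."""
    return {
        "model": model,
        "messages": [
            {"role": "system", "content": "Summarize this call transcript."},
            {"role": "user", "content": transcript},
        ],
    }


def summarize(model: str, transcript: str) -> str:
    """Send one chat completion request and return the model's reply text."""
    req = urllib.request.Request(
        GATEWAY_URL,
        data=json.dumps(build_payload(model, transcript)).encode(),
        headers={
            "authorization": os.environ["ASSEMBLYAI_API_KEY"],
            "content-type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

Switching providers is then a one-argument change: `summarize("gpt-5", text)` versus `summarize("claude-sonnet-4-5", text)`, with no new authentication or request structure.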
Speech-native context preservation
This is the core differentiator from generic LLM proxies. When you pass a transcript to the Gateway, it maintains speaker labels, timestamps, and conversation structure automatically. You don't need to flatten the output before prompting—the model receives speech-specific context that generic APIs often lose.
Full chat completion capabilities
LLM Gateway supports the full range of modern LLM capabilities:
- Basic Chat Completions — single prompt, single response
- Streamed Responses — output arrives as it's generated (supported on OpenAI models)
- Multi-turn Conversations — maintain context across multiple exchanges
- Structured Outputs — constrain responses to a specific JSON schema
- Tool/Function Calling — enable models to execute custom functions
- Agentic Workflows — multi-step reasoning with automatic tool chaining
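Of these, structured outputs are the easiest to get wrong by hand, so here is a hedged sketch of what constraining a response to a JSON schema might look like. It assumes the Gateway follows the OpenAI `response_format` convention named above; the field names and model identifier are illustrative.

```python
# JSON schema for extracting action items from a call transcript.
ACTION_ITEM_SCHEMA = {
    "type": "object",
    "properties": {
        "action_items": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "owner": {"type": "string"},
                    "task": {"type": "string"},
                },
                "required": ["owner", "task"],
            },
        }
    },
    "required": ["action_items"],
}

payload = {
    "model": "gpt-5",  # hypothetical identifier
    "messages": [
        {
            "role": "user",
            "content": "Extract action items from this transcript: ...",
        }
    ],
    "response_format": {
        "type": "json_schema",
        "json_schema": {"name": "action_items", "schema": ACTION_ITEM_SCHEMA},
    },
}
```

With the schema attached, the model's reply is guaranteed to parse as a list of `{owner, task}` objects rather than free-form prose, which makes downstream automation reliable.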
Unified billing
Usage from OpenAI, Anthropic, and Google is tracked in a single AssemblyAI account. There's no need to reconcile invoices from multiple vendors—charges appear per token at model-specific rates on AssemblyAI's pricing page.
Always up-to-date model support
New models are added to the Gateway as soon as providers release them. The team's response to the Gemini 3 Pro launch illustrates the pace: support shipped in the same week as the model's public release, with a comparison blog and demo video live within days.
Voice AI use cases
The LLM Gateway was built around how Voice AI teams actually use LLMs on transcripts. Here are the most common patterns:
Using LLM Gateway with streaming speech-to-text
LLM Gateway pairs directly with AssemblyAI's real-time Streaming Speech-to-Text API. A common pattern is to capture final transcript segments from the WebSocket stream and immediately route them to the Gateway for processing—keeping latency low across the full pipeline.
AssemblyAI's streaming STT returns final transcripts within a few hundred milliseconds (P50 latency ~300ms). Once a final segment arrives, you can pass it to the LLM Gateway for tasks like real-time summarization, sentiment tagging, or translation—without any additional infrastructure.
For non-streaming use cases—processing a completed call recording, for example—the Gateway accepts the full transcript text via a single chat completion request.
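The streaming pattern described above can be sketched as a simple buffer-and-summarize loop. Here `on_final_transcript` stands in for your WebSocket message handler and `summarize` for a Gateway chat-completion call; both are illustrative names, not AssemblyAI SDK APIs.

```python
# Minimal sketch: accumulate final transcript segments as they arrive
# from the streaming STT WebSocket, then re-run an LLM task on the
# running transcript after each one.
segments: list[str] = []


def summarize(text: str) -> str:
    # Placeholder for a Gateway chat-completion request.
    return f"summary({len(text.split())} words)"


def on_final_transcript(segment: str) -> str:
    """Called once per finalized segment from the streaming STT WebSocket."""
    segments.append(segment)
    return summarize(" ".join(segments))


# Simulated stream of final segments:
for s in ["Hi, thanks for calling.", "I'd like to update my address."]:
    latest = on_final_transcript(s)
```

Because final segments arrive within a few hundred milliseconds, each Gateway call starts almost immediately after the speaker finishes a sentence, keeping end-to-end latency low.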
The 80/20 vision: LLM Gateway and speech understanding tasks
LLM Gateway is the engine for custom Voice AI logic, not a replacement for purpose-built capabilities: pre-built speech understanding endpoints cover the common 80% of tasks, while the Gateway's multi-model routing handles the custom 20%, letting you balance cost and quality per task.
In practice: if you need a call summary, speaker identification, or PII redaction, those are handled by AssemblyAI's pre-built speech understanding endpoints—continuously optimized and always compatible with the latest models. If you need custom logic unique to your product, that's where LLM Gateway comes in.
Getting started
LLM Gateway is available to all AssemblyAI users. You use your existing AssemblyAI API key—no new accounts or vendor relationships required.
Key resources:
• LLM Gateway Overview — core documentation and model list
• Use LLM Gateway with Streaming STT — real-time integration guide
• Apply LLM Gateway to Audio Transcripts — async file processing
• Model Comparison Tool — compare LLM Gateway models side by side
Frequently Asked Questions
How is AssemblyAI's LLM Gateway different from LiteLLM or Portkey?
Generic LLM gateways route between providers for any use case. AssemblyAI's LLM Gateway is built specifically for Voice AI—it's integrated with your transcription pipeline, preserves speaker labels and timestamps automatically, and is billed through the same AssemblyAI account you already use. If your primary workflow is applying LLMs to spoken audio, you get that context-awareness without any extra setup.
Can I use LLM Gateway without AssemblyAI's transcription?
The Gateway's chat completions API accepts any text prompt, so you can route plain text through it. That said, it's designed to work best with AssemblyAI transcripts—the speech-native context handling adds the most value when you're passing structured transcript data.
How do I switch between models?
Change the model parameter in your API call. The schema stays identical across providers, so swapping from GPT-5 to Claude 4.5 Sonnet requires changing one line of code. No new authentication, no different request structure.
How does billing work?
All LLM usage routed through the Gateway appears in your AssemblyAI account. You're charged per token at model-specific rates listed on AssemblyAI's pricing page. There's no additional gateway fee on top of model costs.
What happens when new models are released?
AssemblyAI adds support for new models as soon as they're available. When Google released Gemini 3 Pro, the team shipped LLM Gateway support the same week. You don't need to make backend changes—new models appear in the supported list and are immediately accessible via the same API.




