June 29, 2026

What languages does the Voice Agent API support?

If you're building a voice agent for a global audience, language support is a gating question—there's no point in a brilliant conversation flow your callers can't speak. Here's the direct answer for AssemblyAI's Voice Agent API, plus the part most people miss: getting the agent to understand names, products, and jargon correctly matters at least as much as the language list itself.

Kelsey Foster

Growth

Voice Agent API

The six supported languages

The Voice Agent API supports six languages:

English, Spanish, French, German, Italian, and Portuguese — with native code-switching across all six, so a single agent can handle a caller who mixes, say, Spanish and English mid-sentence without you configuring anything special.

That coverage comes from what the API is built on. The Voice Agent API runs the speech-to-text step on Universal-3 Pro Streaming, AssemblyAI's flagship real-time model. The whole pipeline (speech-to-text, LLM, and text-to-speech) comes through one WebSocket at a flat $4.50/hr, with roughly one second of end-to-end latency.

Why six and not 99? Because for a voice agent, getting the input right is everything: if the agent mishears the caller, the LLM responds to the wrong thing. Universal-3 Pro Streaming is trained deeply on each of these languages rather than spread thin across a long tail, which is what delivers the entity accuracy and short-utterance handling ("yes," "no," account numbers) that voice agents live or die on.

Hear the Quality—Talk to a Live Agent

The fastest way to judge language and conversation quality is to have a conversation. Talk to the live demo agent—no signup required.

Beyond the language list: custom terminology

Here's the part that decides whether your agent feels sharp or sloppy. A model can support a caller's language perfectly and still butcher your product names, plan tiers, drug names, or the caller's own name, because those words aren't in any general vocabulary. For a voice agent, hearing "Provential" when the caller asked about their "Provigil" prescription isn't a typo; it derails the whole turn.

The fix is keyterms prompting, which is included on the Voice Agent API at no extra cost. Load your domain vocabulary (product names, account types, common customer names, industry acronyms) and the model is primed to hear those terms correctly. You can also update the list turn by turn, mid-conversation, to add context as the call progresses: for example, loading a returning caller's known products once you've identified them.

On the Voice Agent API you can also combine keyterms with a general prompt; the keyterms are automatically appended to your system prompt, so you get both broad instruction and precise vocabulary anchoring. For most agents, loading keyterms is the single highest-leverage accuracy step after picking the right language.
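As a rough sketch of the turn-by-turn pattern described in this section, the helper below keeps a running keyterm list for one call and serializes it as a JSON update. The class name `KeytermSession`, the message fields (`type`, `keyterms`), and the sample terms other than "Provigil" are assumptions made for illustration, not the documented Voice Agent API schema; check the API reference for the real field names before sending anything over the WebSocket.

```python
import json


class KeytermSession:
    """Tracks the domain terms loaded for one call (hypothetical helper)."""

    def __init__(self, base_terms):
        self._terms = []
        self.add(base_terms)

    def add(self, terms):
        """Add terms, skipping blanks and case-insensitive duplicates."""
        seen = {t.lower() for t in self._terms}
        for term in terms:
            cleaned = term.strip()
            if cleaned and cleaned.lower() not in seen:
                self._terms.append(cleaned)
                seen.add(cleaned.lower())

    def update_message(self):
        """Serialize the current terms. Field names are assumed, not official."""
        return json.dumps({"type": "keyterms.update", "keyterms": self._terms})


# Start the call with general domain vocabulary (placeholder terms).
session = KeytermSession(["Provigil", "Premium Plus"])

# Once the caller is identified, add terms specific to them.
session.add(["provigil", "Jordan Alvarez"])
message = session.update_message()
```

Deduplicating before each update keeps the list short as context accumulates over a long call; the resulting string is what you would send as the next update, in whatever shape the real API expects.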

One WebSocket, Keyterms Included

One WebSocket, JSON, no SDK, keyterms prompting included. Read the API reference and have an agent running today—free to start.

What about other languages?

If your callers speak outside those six languages, you have options on the broader AssemblyAI platform. For real-time speech-to-text in 99+ languages, Whisper-Streaming serves as the interim model while native streaming support expands — useful if you're assembling your own pipeline and need a language the Voice Agent API doesn't yet cover. And for pre-recorded analysis of calls in any of 99+ languages, Universal-2 handles transcription after the fact. The six-language Voice Agent bundle is the turnkey path; the wider platform covers the long tail when you need it.
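The choice laid out above can be sketched as a small routing function. This is an illustration only: the function name `pick_path` is hypothetical, the two-letter codes are standard ISO 639-1 codes rather than identifiers confirmed by the API, and the sketch does not check whether a language is actually among the 99+ that the fallback models cover.

```python
# ISO 639-1 codes for the six Voice Agent API languages named in this article.
VOICE_AGENT_LANGUAGES = {"en", "es", "fr", "de", "it", "pt"}


def pick_path(language, realtime=True):
    """Return the product this article suggests for a language and use case."""
    # Reduce a regional tag such as "es-MX" to its base language code.
    code = language.strip().lower().split("-")[0]
    if realtime and code in VOICE_AGENT_LANGUAGES:
        return "Voice Agent API"
    if realtime:
        # Interim real-time option while native streaming support expands.
        return "Whisper-Streaming"
    # Pre-recorded calls are transcribed after the fact.
    return "Universal-2"


print(pick_path("es-MX"))
print(pick_path("ja"))
print(pick_path("ja", realtime=False))
```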

Language expansion for the streaming models is on the roadmap, so the supported set is worth re-checking as you plan.

The short version

The Voice Agent API speaks six languages (English, Spanish, French, German, Italian, and Portuguese) with native code-switching. It is built on AssemblyAI's most accurate real-time model and priced at a flat $4.50/hr. But the language list is only half the answer. Load your keyterms so the agent hears your product and customer names correctly, and you've covered the two things that actually determine whether callers feel understood. The best way to confirm it for your use case is to talk to the live demo, then load your own vocabulary and test it on the names that matter to you.

Frequently asked questions

What languages does the AssemblyAI Voice Agent API support?

The Voice Agent API supports six languages — English, Spanish, French, German, Italian, and Portuguese — with native code-switching across all of them, so one agent can handle callers who mix languages mid-sentence. Support is built on Universal-3 Pro Streaming, AssemblyAI's most accurate real-time model.

Does the Voice Agent API handle code-switching between languages?

Yes. It handles native code-switching across all six supported languages within a single conversation, so a caller can switch between, for example, Spanish and English without any special configuration. This is handled by the underlying Universal-3 Pro Streaming model.

How do I make a voice agent recognize product names and custom vocabulary?

Use keyterms prompting, which is included free on the Voice Agent API and can be updated turn-by-turn during a conversation. Load your product names, plan tiers, acronyms, and common customer names so the model is primed to transcribe them correctly. You can combine keyterms with a general system prompt — the keyterms are automatically appended.

What if my callers speak a language outside the six supported?

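The six-language Voice Agent bundle is the turnkey option, and the wider AssemblyAI platform covers additional languages. For real-time speech-to-text in 99+ languages, Whisper-Streaming serves as the interim model while native streaming support expands. For pre-recorded calls in any of 99+ languages, Universal-2 handles transcription after the fact. Language expansion for the streaming models is on the roadmap, so the supported set is worth re-checking as you plan.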

How much does the Voice Agent API cost?

The Voice Agent API is a flat $4.50/hr that bundles speech-to-text, the LLM, and text-to-speech through a single WebSocket, billed by the minute. Keyterms prompting is included at no extra cost. There are no separate invoices for the three pipeline stages.
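To make the pricing concrete, here is a minimal estimate in Python. The $4.50/hr rate and per-minute billing come from the answer above; rounding a partial minute up to a whole minute is an assumption, since the article does not say how partial minutes are handled, and `estimate_cost` is a hypothetical helper name.

```python
import math
from decimal import Decimal, ROUND_HALF_UP

HOURLY_RATE = Decimal("4.50")  # flat rate quoted in this article, in USD


def estimate_cost(call_seconds):
    """Estimate the cost of one call in USD, rounded to the cent."""
    # Assumption: partial minutes are billed as whole minutes.
    minutes = math.ceil(call_seconds / 60)
    cost = HOURLY_RATE * minutes / 60
    return cost.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


print(estimate_cost(450))   # a 7.5-minute call, billed as 8 minutes
print(estimate_cost(3600))  # a full hour
```

At this rate each billed minute costs $0.075, so an 8-minute call comes to $0.60 under the round-up assumption.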
