The Voice Agent API’s input (speech recognition) and output (speech synthesis) cover different sets of languages.

Input languages

The Voice Agent API uses Universal-3.5 Pro Streaming for speech recognition. It supports the following input languages with native code-switching:
Language | Flag
English | 🇺🇸
Spanish | 🇪🇸
German | 🇩🇪
French | 🇫🇷
Portuguese | 🇵🇹
Italian | 🇮🇹
Turkish | 🇹🇷
Dutch | 🇳🇱
Swedish | 🇸🇪
Norwegian | 🇳🇴
Danish | 🇩🇰
Finnish | 🇫🇮
Hindi | 🇮🇳
Vietnamese | 🇻🇳
Arabic | 🇸🇦
Hebrew | 🇮🇱
Japanese | 🇯🇵
Chinese | 🇨🇳
For more on the underlying streaming model, see Universal-3.5 Pro Streaming.

Output languages

The Voice Agent API officially supports the following output languages, each backed by at least one voice with a matching primary or native accent:
Language | Flag | Recommended voices
English | 🇺🇸 | ivy, james, tyler, winter, bella, david, kyle, helen, martha, river, emma, victor, eleanor
French | 🇫🇷 | pierre
Italian | 🇮🇹 | giulia, luca
Spanish | 🇪🇸 | lucia, mateo, diego
Hindi | 🇮🇳 | arjun
Russian | 🇷🇺 | dmitri
See the Voices page for voice descriptions and audio samples.
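
The output table maps naturally onto a lookup from language to recommended voices. The sketch below is illustrative only: `RECOMMENDED_VOICES` and `pick_voice` are hypothetical names, not part of the Voice Agent API, and the voice names are copied from the table. How a voice is actually passed to the API is not shown here.

```python
from typing import Dict, List, Optional

# Recommended voices per output language, copied from the table on this page.
# This dict and the helper below are hypothetical, not part of any SDK.
RECOMMENDED_VOICES: Dict[str, List[str]] = {
    "English": [
        "ivy", "james", "tyler", "winter", "bella", "david", "kyle",
        "helen", "martha", "river", "emma", "victor", "eleanor",
    ],
    "French": ["pierre"],
    "Italian": ["giulia", "luca"],
    "Spanish": ["lucia", "mateo", "diego"],
    "Hindi": ["arjun"],
    "Russian": ["dmitri"],
}


def pick_voice(language: str, preferred: Optional[str] = None) -> str:
    """Return a recommended voice for `language`.

    If `preferred` is one of the recommended voices for that language it is
    returned; otherwise the first voice listed for the language is used.
    Raises ValueError when the language has no recommended voice listed.
    """
    voices = RECOMMENDED_VOICES.get(language)
    if not voices:
        raise ValueError(f"No recommended voice listed for {language!r}")
    if preferred is not None and preferred in voices:
        return preferred
    return voices[0]
```

For example, `pick_voice("Spanish", preferred="mateo")` returns `"mateo"`, while `pick_voice("German")` raises `ValueError` because German has no recommended voice in the table.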

Coming soon

Native-accent voices for the following languages are on the roadmap. If you’d like early access or want to register interest, contact support@assemblyai.com.
Language | Flag
German | 🇩🇪
Portuguese | 🇵🇹
Turkish | 🇹🇷
Dutch | 🇳🇱
Swedish | 🇸🇪
Norwegian | 🇳🇴
Danish | 🇩🇰
Finnish | 🇫🇮
Vietnamese | 🇻🇳
Arabic | 🇸🇦
Hebrew | 🇮🇱
Japanese | 🇯🇵
Chinese | 🇨🇳
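
Because the input and output lists differ, it can help to compare them as sets before choosing a language for an agent. The following is a minimal sketch that uses only the language names listed on this page; the variable and function names are hypothetical and not part of the Voice Agent API.

```python
# Language names copied from the input and output tables on this page.
INPUT_LANGUAGES = {
    "English", "Spanish", "German", "French", "Portuguese", "Italian",
    "Turkish", "Dutch", "Swedish", "Norwegian", "Danish", "Finnish",
    "Hindi", "Vietnamese", "Arabic", "Hebrew", "Japanese", "Chinese",
}
OUTPUT_LANGUAGES = {"English", "French", "Italian", "Spanish", "Hindi", "Russian"}

# Languages the agent can both recognize and speak with a native-accent voice.
both = INPUT_LANGUAGES & OUTPUT_LANGUAGES
# Languages recognized on input that have no native-accent output voice yet.
input_only = INPUT_LANGUAGES - OUTPUT_LANGUAGES
# Languages with an output voice that are not in the input list.
output_only = OUTPUT_LANGUAGES - INPUT_LANGUAGES


def coverage(language: str) -> str:
    """Classify a language by where it appears in the two lists (hypothetical helper)."""
    if language in both:
        return "input and output"
    if language in input_only:
        return "input only"
    if language in output_only:
        return "output only"
    return "not listed"


print(sorted(both))         # ['English', 'French', 'Hindi', 'Italian', 'Spanish']
print(sorted(output_only))  # ['Russian']
print(len(input_only))      # 13
```

With the lists as published here, the input-only set contains the same 13 languages as the Coming soon list, and Russian is the one language that appears only as an output language.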