Language detection
This page explains how to configure language settings for your transcription requests. By default, automatic language detection is enabled and will identify the dominant language in your audio. You can also set the language manually if you know it in advance.
Automatic language detection
Automatic language detection is enabled by default. When enabled, the model identifies the dominant language in your audio and transcribes accordingly. Language detection requires at least 15 seconds of spoken audio to identify the language accurately.
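As a sketch, a transcription request body with detection explicitly enabled might look like the following. The endpoint and the language_detection flag follow the API parameters described on this page; the audio URL is a placeholder:

```python
import json

# Minimal request body for a transcription with automatic language
# detection enabled. Detection is on by default; setting the flag
# explicitly is shown here for clarity. The audio URL is a placeholder.
request_body = {
    "audio_url": "https://example.com/meeting.mp3",
    "language_detection": True,
}

# This payload would be sent as the JSON body of the transcription request.
print(json.dumps(request_body, indent=2))
```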
Set a list of expected languages
If you’re confident the audio is in one of a few languages, provide that list via language_detection_options.expected_languages. Detection is restricted to these candidates, and the model chooses the language with the highest confidence from the list. This can eliminate scenarios where automatic language detection selects an unexpected language for transcription.
- Use the supported language codes (e.g., "en", "es", "fr").
- If expected_languages is not specified, it is set to ["all"] by default.
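Putting the two points above together, a request restricting detection to a short candidate list could be sketched like this (the audio URL is a placeholder; the parameter names follow this page):

```python
# Restrict language detection to English, Spanish, and French.
# The model will pick the candidate with the highest confidence.
request_body = {
    "audio_url": "https://example.com/podcast.mp3",  # placeholder
    "language_detection": True,
    "language_detection_options": {
        "expected_languages": ["en", "es", "fr"],
    },
}
```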
Choose a fallback language
Control what language transcription should fall back to when detection cannot confidently select a language from the expected_languages list.
- Set language_detection_options.fallback_language to a specific language code (e.g., "en"). fallback_language must be one of the language codes in expected_languages, or "auto".
- When fallback_language is unspecified, it is set to "auto" by default. This tells the model to choose the fallback language from expected_languages with the highest confidence score.
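A request combining expected languages with an explicit fallback could be sketched as follows (audio URL is a placeholder; parameter names follow this page):

```python
# Detect between English and Spanish; fall back to English when the
# model cannot confidently select a language from the candidates.
# fallback_language must be one of expected_languages, or "auto".
request_body = {
    "audio_url": "https://example.com/interview.mp3",  # placeholder
    "language_detection": True,
    "language_detection_options": {
        "expected_languages": ["en", "es"],
        "fallback_language": "en",
    },
}
```

Omitting fallback_language is equivalent to setting it to "auto", which lets the model pick the highest-confidence language from expected_languages as the fallback.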
Confidence score
If language detection is enabled, the API returns a confidence score for the detected language. The score ranges from 0.0 (low confidence) to 1.0 (high confidence).
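A sketch of inspecting the detected language and its confidence on a completed transcript — the field names follow this page, but the response here is a hand-written illustrative dict, not real API output:

```python
# Illustrative completed-transcript fields (values are made up).
transcript = {
    "language_code": "es",
    "language_confidence": 0.93,
}

# Scores range from 0.0 (low confidence) to 1.0 (high confidence).
if transcript["language_confidence"] < 0.5:
    print("Low confidence in detected language:", transcript["language_code"])
else:
    print("Detected", transcript["language_code"],
          "with confidence", transcript["language_confidence"])
```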
Set a language confidence threshold
You can set the confidence threshold that must be reached if language detection is enabled. An error will be returned if the language confidence is below this threshold. Valid values are in the range [0,1] inclusive.
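For example, a request that rejects transcriptions when detection confidence falls below 0.7 might look like this (audio URL is a placeholder; the parameter name follows this page):

```python
# If the detected language's confidence is below the threshold,
# the API returns an error instead of a transcript.
# Valid threshold values are in [0, 1] inclusive.
request_body = {
    "audio_url": "https://example.com/voicemail.mp3",  # placeholder
    "language_detection": True,
    "language_confidence_threshold": 0.7,
}
```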
Set language manually
If you already know the dominant language, you can use the language_code parameter to specify the language of the speech in your audio file. If you don’t include a language_code parameter in your request, it defaults to en_us.
See the Supported languages section below for all supported languages and their codes.
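When the language is known in advance, the request can skip detection entirely and set the code directly — a sketch, with a placeholder audio URL:

```python
# Transcribe as Spanish without running language detection.
# If language_code is omitted (and detection is disabled),
# it defaults to "en_us".
request_body = {
    "audio_url": "https://example.com/lecture.mp3",  # placeholder
    "language_code": "es",
}
```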
Supported languages
AssemblyAI offers two different levels of speech-to-text models for pre-recorded audio: Universal-3-Pro and Universal-2. Check out the Models page of our documentation to learn more about our different models and how to choose the best one for your use case.
Universal-3-Pro
Universal-2
Breakdown of Universal-2 language support
High accuracy (≤ 10% WER)
English, Spanish, French, German, Indonesian, Italian, Japanese, Dutch, Polish, Portuguese, Russian, Turkish, Ukrainian, Catalan
Good accuracy (>10% to ≤25% WER)
Arabic, Azerbaijani, Bulgarian, Bosnian, Mandarin Chinese, Czech, Danish, Greek, Estonian, Finnish, Filipino, Galician, Hindi, Croatian, Hungarian, Korean, Macedonian, Malay, Norwegian, Romanian, Slovak, Swedish, Swiss German, Thai, Urdu, Vietnamese
Moderate accuracy (>25% to ≤50% WER)
Afrikaans, Belarusian, Welsh, Persian (Farsi), Hebrew, Armenian, Icelandic, Kazakh, Lithuanian, Latvian, Māori, Marathi, Slovenian, Swahili, Tamil
Fair accuracy (>50% WER)
Amharic, Assamese, Bengali, Gujarati, Hausa, Javanese, Georgian, Khmer, Kannada, Luxembourgish, Lingala, Lao, Malayalam, Mongolian, Maltese, Burmese, Nepali, Occitan, Punjabi, Pashto, Sindhi, Shona, Somali, Serbian, Telugu, Tajik, Uzbek, Yoruba