Automatic Language Detection

Supported languages (language: code):
Global English: en
Australian English: en_au
British English: en_uk
US English: en_us
Spanish: es
French: fr
German: de
Italian: it
Portuguese: pt
Dutch: nl
Hindi: hi
Japanese: ja
Chinese: zh
Finnish: fi
Korean: ko
Polish: pl
Russian: ru
Turkish: tr
Ukrainian: uk
Vietnamese: vi
Afrikaans: af
Albanian: sq
Amharic: am
Arabic: ar
Armenian: hy
Assamese: as
Azerbaijani: az
Bashkir: ba
Basque: eu
Belarusian: be
Bengali: bn
Bosnian: bs
Breton: br
Bulgarian: bg
Burmese: my
Catalan: ca
Croatian: hr
Czech: cs
Danish: da
Estonian: et
Faroese: fo
Galician: gl
Georgian: ka
Greek: el
Gujarati: gu
Haitian: ht
Hausa: ha
Hawaiian: haw
Hebrew: he
Hungarian: hu
Icelandic: is
Indonesian: id
Javanese: jw
Kannada: kn
Kazakh: kk
Khmer: km
Lao: lo
Latin: la
Latvian: lv
Lingala: ln
Lithuanian: lt
Luxembourgish: lb
Macedonian: mk
Malagasy: mg
Malay: ms
Malayalam: ml
Maltese: mt
Maori: mi
Marathi: mr
Mongolian: mn
Nepali: ne
Norwegian: no
Norwegian Nynorsk: nn
Occitan: oc
Panjabi: pa
Pashto: ps
Persian: fa
Romanian: ro
Sanskrit: sa
Serbian: sr
Shona: sn
Sindhi: sd
Sinhala: si
Slovak: sk
Slovenian: sl
Somali: so
Sundanese: su
Swahili: sw
Swedish: sv
Tagalog: tl
Tajik: tg
Tamil: ta
Tatar: tt
Telugu: te
Thai: th
Tibetan: bo
Turkmen: tk
Urdu: ur
Uzbek: uz
Welsh: cy
Yiddish: yi
Yoruba: yo

Supported models: Universal (universal)

Supported regions: US & EU

Automatic Language Detection identifies the dominant language spoken in an audio file and uses it for transcription. When enabled, it can detect any of the supported languages listed above.

To reliably identify the dominant language, a file must contain at least 15 seconds of spoken audio, and results improve when it contains 15 to 90 seconds of spoken audio.

import assemblyai as aai

aai.settings.api_key = "<YOUR_API_KEY>"

# audio_file = "./local_file.mp3"
audio_file = "https://assembly.ai/wildfires.mp3"

# Enable Automatic Language Detection
config = aai.TranscriptionConfig(language_detection=True)

transcript = aai.Transcriber(config=config).transcribe(audio_file)

print(transcript.text)
# The detected language code, e.g. "en"
print(transcript.json_response["language_code"])
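
Once you have the detected language_code, you may want to display or route transcripts by language. The following is a minimal sketch, assuming a response dictionary shaped like transcript.json_response with the language_code and language_confidence keys used on this page. The helper describe_detection, the LANGUAGE_NAMES subset, and the sample values are hypothetical and not part of the AssemblyAI SDK.

```python
from typing import Any, Mapping

# Hypothetical: a small subset of the supported-languages table above.
LANGUAGE_NAMES = {
    "en": "Global English",
    "es": "Spanish",
    "fr": "French",
    "de": "German",
}


def describe_detection(response: Mapping[str, Any]) -> str:
    """Summarize detection results from a transcript JSON response.

    Hypothetical helper: `response` is assumed to look like
    `transcript.json_response`.
    """
    code = response.get("language_code")
    if code is None:
        return "no language code in response"
    name = LANGUAGE_NAMES.get(code, code)  # fall back to the raw code
    confidence = response.get("language_confidence")
    if confidence is None:
        return f"{name} ({code})"
    return f"{name} ({code}), confidence {confidence:.2f}"


# Sample values for illustration only, not real API output.
print(describe_detection({"language_code": "es", "language_confidence": 0.9731}))
```

Unknown codes are printed as-is, so the helper keeps working for languages missing from the subset.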

Set a list of expected languages

If you’re confident the audio is in one of a few languages, provide that list via language_detection_options.expected_languages. Detection is then restricted to these candidates, and the model chooses the language from this list with the highest confidence. This helps prevent Automatic Language Detection from selecting an unexpected language for transcription.

  • Use our language codes (e.g., "en", "es", "fr").
  • If expected_languages is not specified, it is set to ["all"] by default.
import assemblyai as aai

aai.settings.api_key = "<YOUR_API_KEY>"

# audio_file = "./local_file.mp3"
audio_file = "https://assembly.ai/wildfires.mp3"

# Restrict detection to these candidate languages
options = aai.LanguageDetectionOptions(
    expected_languages=["en", "es", "fr", "de"],
    fallback_language="auto"
)

config = aai.TranscriptionConfig(language_detection=True, language_detection_options=options)

transcript = aai.Transcriber(config=config).transcribe(audio_file)

print(transcript.text)
print(transcript.json_response["language_code"])
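
Conceptually, restricting detection to expected_languages means ignoring every candidate outside the list and picking the highest-confidence language that remains. The sketch below illustrates only that selection rule on made-up per-language scores; it is not AssemblyAI's model, and pick_expected_language and the scores are hypothetical. It treats ["all"], the documented default, as "no restriction".

```python
from typing import Mapping, Optional, Sequence


def pick_expected_language(
    scores: Mapping[str, float], expected: Sequence[str]
) -> Optional[str]:
    """Return the highest-scoring language code among `expected`.

    `scores` maps language codes to hypothetical detector confidences.
    Returns None if no scored language is in the expected list.
    """
    if list(expected) == ["all"]:
        candidates = dict(scores)  # default: every language is a candidate
    else:
        allowed = set(expected)
        candidates = {code: s for code, s in scores.items() if code in allowed}
    if not candidates:
        return None
    return max(candidates, key=candidates.get)


# Made-up scores: Portuguese narrowly beats Spanish overall.
scores = {"pt": 0.46, "es": 0.41, "en": 0.08, "fr": 0.05}
print(pick_expected_language(scores, ["all"]))                   # pt
print(pick_expected_language(scores, ["en", "es", "fr", "de"]))  # es
```

With the restricted list, Portuguese is never a candidate, so Spanish wins even though its raw score is lower.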

Choose a fallback language

Control which language transcription falls back to when detection cannot confidently select a language from the expected_languages list.

  • Set language_detection_options.fallback_language to a specific language code (e.g., "en").
  • fallback_language must be one of the language codes in expected_languages or "auto".
  • When fallback_language is unspecified, it is set to "auto" by default. This tells our model to choose the fallback language from expected_languages with the highest confidence score.
import assemblyai as aai

aai.settings.api_key = "<YOUR_API_KEY>"

# audio_file = "./local_file.mp3"
audio_file = "https://assembly.ai/wildfires.mp3"

options = aai.LanguageDetectionOptions(
    expected_languages=["en", "es", "fr", "de"],
    # "auto" (the default) picks the highest-confidence language from expected_languages
    fallback_language="auto"
)

config = aai.TranscriptionConfig(language_detection=True, language_detection_options=options)

transcript = aai.Transcriber(config=config).transcribe(audio_file)

print(transcript.text)
print(transcript.json_response["language_code"])
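
The rule that fallback_language must be "auto" or one of the codes in expected_languages can be checked client-side before sending a request. This is a minimal sketch assuming an explicit expected_languages list; validate_fallback is a hypothetical helper, and the API performs its own validation regardless.

```python
from typing import Sequence


def validate_fallback(
    expected_languages: Sequence[str], fallback_language: str = "auto"
) -> str:
    """Return the fallback language if it satisfies the documented rule.

    Hypothetical helper: "auto" (the default) is always accepted; any
    other value must appear in `expected_languages`.
    """
    if fallback_language == "auto" or fallback_language in expected_languages:
        return fallback_language
    raise ValueError(
        f"fallback_language {fallback_language!r} must be 'auto' "
        f"or one of {list(expected_languages)}"
    )


expected = ["en", "es", "fr", "de"]
print(validate_fallback(expected))        # auto
print(validate_fallback(expected, "en"))  # en
```

Calling validate_fallback(expected, "pt") raises ValueError, because "pt" is not in the expected list.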

Confidence score

If language detection is enabled, the API returns a confidence score for the detected language in the language_confidence field. The score ranges from 0.0 (low confidence) to 1.0 (high confidence).

import assemblyai as aai

aai.settings.api_key = "<YOUR_API_KEY>"

# audio_file = "./local_file.mp3"
audio_file = "https://assembly.ai/wildfires.mp3"

config = aai.TranscriptionConfig(language_detection=True)

transcript = aai.Transcriber(config=config).transcribe(audio_file)

print(transcript.text)
# Confidence in the detected language, from 0.0 to 1.0
print(transcript.json_response["language_confidence"])
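
When processing many files, you might collect the transcripts whose language_confidence is low and review them manually rather than failing them. The following is a minimal sketch, assuming each item looks like transcript.json_response with an id field. The helper low_confidence_ids, the 0.5 cutoff, and the sample batch are hypothetical illustrations, not AssemblyAI recommendations.

```python
from typing import Any, Iterable, List, Mapping


def low_confidence_ids(
    responses: Iterable[Mapping[str, Any]], cutoff: float = 0.5
) -> List[str]:
    """Return IDs of transcripts whose language_confidence is below `cutoff`.

    A missing score is treated as low confidence. The 0.5 default is an
    arbitrary example value.
    """
    flagged = []
    for response in responses:
        confidence = response.get("language_confidence")
        if confidence is None or confidence < cutoff:
            flagged.append(response["id"])
    return flagged


# Made-up responses for illustration only.
batch = [
    {"id": "t1", "language_code": "en", "language_confidence": 0.97},
    {"id": "t2", "language_code": "es", "language_confidence": 0.31},
    {"id": "t3", "language_code": "fr"},  # no score: flagged
]
print(low_confidence_ids(batch))  # ['t2', 't3']
```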

Set a language confidence threshold

When language detection is enabled, you can use language_confidence_threshold to set the minimum confidence the detected language must reach. If the language confidence is below this threshold, the transcript fails with an error. Valid values range from 0 to 1, inclusive.

import assemblyai as aai

aai.settings.api_key = "<YOUR_API_KEY>"

# audio_file = "./local_file.mp3"
audio_file = "https://assembly.ai/wildfires.mp3"

# Fail the transcript if the detected language's confidence is below 0.8
config = aai.TranscriptionConfig(language_detection=True, language_confidence_threshold=0.8)

transcript = aai.Transcriber(config=config).transcribe(audio_file)

if transcript.status == aai.TranscriptStatus.error:
    raise RuntimeError(f"Transcription failed: {transcript.error}")
else:
    print(transcript.json_response["language_confidence"])
    print(transcript.text)
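
To catch configuration mistakes early, you can validate the threshold against the documented [0, 1] range and mirror the documented rule (an error when the confidence is below the threshold) on your side. This is a minimal sketch; validate_threshold and meets_threshold are hypothetical helpers, and the API's own check remains authoritative.

```python
def validate_threshold(threshold: float) -> float:
    """Reject values outside the documented [0, 1] range, inclusive.

    NaN also fails, because every comparison with NaN is False.
    """
    if not 0.0 <= threshold <= 1.0:
        raise ValueError(
            f"language_confidence_threshold must be in [0, 1], got {threshold}"
        )
    return threshold


def meets_threshold(confidence: float, threshold: float) -> bool:
    """True unless the confidence is below the threshold (the error case)."""
    return confidence >= validate_threshold(threshold)


print(meets_threshold(0.85, 0.8))  # True
print(meets_threshold(0.62, 0.8))  # False
```

A confidence exactly equal to the threshold passes, since only values below it produce an error.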