Automatic Language Detection
Supported languages
en
en_au
en_uk
en_us
es
fr
de
it
pt
nl
hi
ja
zh
fi
ko
pl
ru
tr
uk
vi
af
sq
am
ar
hy
as
az
ba
eu
be
bn
bs
br
bg
my
ca
hr
cs
da
et
fo
gl
ka
el
gu
ht
ha
haw
he
hu
is
id
jw
kn
kk
km
lo
la
lv
ln
lt
lb
mk
mg
ms
ml
mt
mi
mr
mn
ne
no
nn
oc
pa
ps
fa
ro
sa
sr
sn
sd
si
sk
sl
so
su
sw
sv
tl
tg
ta
tt
te
th
bo
tk
ur
uz
cy
yi
yo
Supported models
universal
Supported regions
US & EU
Automatic Language Detection identifies the dominant language spoken in an audio file and uses that language for the transcription. When enabled, it can detect any of the supported languages.
To reliably identify the dominant language, a file must contain at least 15 seconds of spoken audio. Results improve when the file contains between 15 and 90 seconds of spoken audio.
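As a minimal sketch of enabling detection, the snippet below builds a request body and checks the 15-second minimum locally. The field names audio_url and language_detection are assumptions made for illustration, as are the helper names; check them against the API reference before use.

```python
import json

# Minimum amount of spoken audio the text above says is needed.
MIN_SPOKEN_SECONDS = 15


def build_detection_request(audio_url: str) -> dict:
    """Return a request body with language detection switched on.

    Both keys are assumed names, not confirmed by this page.
    """
    return {
        "audio_url": audio_url,
        "language_detection": True,
    }


def has_enough_speech(spoken_seconds: float) -> bool:
    """True when the file meets the documented 15-second minimum."""
    return spoken_seconds >= MIN_SPOKEN_SECONDS


if __name__ == "__main__":
    body = build_detection_request("https://example.com/audio.mp3")
    print(json.dumps(body, indent=2))
    print(has_enough_speech(12.0))
```

Checking the duration before submitting is optional; it only avoids sending files that are too short for reliable detection.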
Set a list of expected languages
If you’re confident the audio is in one of a few languages, provide that list via language_detection_options.expected_languages. Detection is restricted to these candidates, and the model chooses the language with the highest confidence from this list. This can eliminate scenarios where Automatic Language Detection selects an unexpected language for transcription.
- Use our language codes (e.g., "en", "es", "fr").
- If expected_languages is not specified, it is set to ["all"] by default.
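The options above might be assembled as follows. This is a sketch only: language_detection_options.expected_languages and the ["all"] default come from the text, while the language_detection flag and the helper name with_expected_languages are assumptions.

```python
from typing import Optional, Sequence


def with_expected_languages(
    expected_languages: Optional[Sequence[str]] = None,
) -> dict:
    """Build the detection part of a request body.

    When no list is given, fall back to ["all"], mirroring the
    documented default.
    """
    codes = list(expected_languages) if expected_languages else ["all"]
    return {
        "language_detection": True,  # assumed flag name
        "language_detection_options": {"expected_languages": codes},
    }


if __name__ == "__main__":
    # Restrict detection to English, Spanish, and French.
    print(with_expected_languages(["en", "es", "fr"]))
    # No list supplied: every supported language stays a candidate.
    print(with_expected_languages())
```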
Choose a fallback language
Control which language transcription falls back to when detection cannot confidently select a language from the expected_languages list.
- Set language_detection_options.fallback_language to a specific language code (e.g., "en").
- fallback_language must be one of the language codes in expected_languages or "auto".
- When fallback_language is unspecified, it is set to "auto" by default. This tells our model to choose the fallback language from expected_languages with the highest confidence score.
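The fallback rules above can be illustrated locally. The function below is a hypothetical helper, not part of the API: it validates fallback_language against expected_languages and, for "auto", picks the expected language with the highest score. The scores dictionary is made-up input; the real selection happens on the service side.

```python
from typing import Dict, Sequence


def resolve_fallback(
    scores: Dict[str, float],
    expected_languages: Sequence[str],
    fallback_language: str = "auto",
) -> str:
    """Return the language that transcription would fall back to."""
    if fallback_language != "auto":
        # An explicit fallback must be one of the expected languages.
        if fallback_language not in expected_languages:
            raise ValueError(
                f"fallback_language {fallback_language!r} must be in "
                f"expected_languages or be 'auto'"
            )
        return fallback_language
    # "auto": choose the expected language with the highest confidence.
    return max(expected_languages, key=lambda code: scores.get(code, 0.0))


if __name__ == "__main__":
    made_up_scores = {"en": 0.40, "es": 0.45, "fr": 0.10}
    print(resolve_fallback(made_up_scores, ["en", "es"]))
    print(resolve_fallback(made_up_scores, ["en", "es"], "en"))
```

Validating the fallback on the client side surfaces a misconfigured request before it is sent.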
Confidence score
If language detection is enabled, the API returns a confidence score for the detected language. The score ranges from 0.0 (low confidence) to 1.0 (high confidence).
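Reading the score from a response could look like the sketch below. The response shape and the field names language_code and language_confidence are assumptions for illustration, and SAMPLE_RESPONSE is invented data rather than real API output.

```python
import json
from typing import Tuple

# Invented example payload; the field names are assumed, not confirmed here.
SAMPLE_RESPONSE = '{"language_code": "es", "language_confidence": 0.93}'


def detected_language(response_text: str) -> Tuple[str, float]:
    """Extract the detected language code and its confidence score."""
    data = json.loads(response_text)
    code = data["language_code"]
    confidence = float(data["language_confidence"])
    # The text above states that scores range from 0.0 to 1.0.
    if not 0.0 <= confidence <= 1.0:
        raise ValueError(f"confidence out of range: {confidence}")
    return code, confidence


if __name__ == "__main__":
    code, confidence = detected_language(SAMPLE_RESPONSE)
    print(f"{code} ({confidence:.2f})")
```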
Set a language confidence threshold
When language detection is enabled, you can set a minimum confidence that the detected language must reach. An error is returned if the language confidence is below this threshold. Valid values are in the range [0, 1] inclusive.
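A request with a threshold might be built as in the sketch below. The parameter name language_confidence_threshold is an assumption, since this page does not name the field, and the other keys and the helper name are likewise illustrative.

```python
def with_confidence_threshold(audio_url: str, threshold: float) -> dict:
    """Build a request body that rejects low-confidence detections."""
    # Valid thresholds are in the inclusive range [0, 1].
    if not 0.0 <= threshold <= 1.0:
        raise ValueError(f"threshold must be in [0, 1], got {threshold}")
    return {
        "audio_url": audio_url,                      # assumed field name
        "language_detection": True,                  # assumed flag name
        "language_confidence_threshold": threshold,  # assumed field name
    }


if __name__ == "__main__":
    print(with_confidence_threshold("https://example.com/audio.mp3", 0.4))
```

Because a confidence below the threshold produces an error rather than a transcript, callers should be prepared to handle that error, for example by retrying with an explicit language.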