How does the API handle files that contain spoken audio in multiple languages?
Each of our language models is trained on a specific language, and generally speaking, a model will only transcribe audio spoken in that language. In some edge cases a model may transcribe a few words from a different language; this can happen when those words were present in the model's training data.
For instance, the Spanish model is trained on Spanish audio, but it is not uncommon for English words to be mixed into spoken Spanish. Because those English words appear in the training data used for the Spanish model, the model can recognize and transcribe them.
It’s important to note that a language model transcribing words from a different language is an edge case, not behavior that should be relied on. The expectation is that whichever language model a file is submitted to will transcribe the audio spoken in that language and ignore any other languages spoken in the audio.
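As a rough illustration of what this means in practice, here is a minimal sketch of submitting the same bilingual file once per language. The endpoint URL, the `language` parameter, and the response field below are hypothetical placeholders for illustration only, not this API's actual interface:

```python
import requests

API_URL = "https://api.example.com/v1/transcribe"  # hypothetical endpoint
API_KEY = "YOUR_API_KEY"                           # placeholder credential

def transcribe(audio_path: str, language: str) -> str:
    """Submit an audio file to the model for a single language.

    Only speech in `language` is expected in the transcript; speech in
    other languages is generally ignored, apart from the edge cases
    described above (e.g. common English loanwords in Spanish audio).
    """
    with open(audio_path, "rb") as audio_file:
        response = requests.post(
            API_URL,
            headers={"Authorization": f"Bearer {API_KEY}"},
            data={"language": language},           # hypothetical parameter name
            files={"audio": audio_file},
        )
    response.raise_for_status()
    return response.json()["text"]                 # hypothetical response field

# The same bilingual file yields different transcripts depending on
# which language model it is submitted to.
print(transcribe("meeting_es_en.wav", language="es"))  # Spanish speech only
print(transcribe("meeting_es_en.wav", language="en"))  # English speech only
```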