AssemblyAI’s Automatic Language Detection

Build powerful multilingual speech applications with our advanced Automatic Language Detection capabilities.
  • Industry-leading accuracy
  • Easy implementation
  • Available across 99 languages
An illustration on a blue background demonstrating AssemblyAI's Automatic Language Detection.

Get started with less than 10 lines of code

Simply enable automatic language detection in your transcription request to automatically detect the language of your audio file.
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
import assemblyai as aai

aai.settings.api_key = "YOUR_API_KEY"

transcriber = aai.Transcriber()

audio_url = (
    "https://storage.googleapis.com/aai-web-samples/5_common_sports_injuries.mp3"
)

config = aai.TranscriptionConfig(
    language_detection=True,
    # you can set an optional confidence threshold that must be reached
    # when language detection is enabled
    language_confidence_threshold=0.4
)

transcript = transcriber.transcribe(audio_url, config)

print(transcript.text)
# When language detection is enabled, the API returns a confidence score
# for the detected language
print(transcript.json_response["language_confidence"])

One API — 99 languages

Get automatic language detection with industry-leading accuracy across 99 languages, and improve the performance of your products with reduced transcription errors.
A table showing AssemblyAI's superior ALD benchmarks against competitors.

Build confidently with the most accurate Automatic Language Detection

Transcribe global content without language barriers

Automate multilingual call center transcriptions

Enable accurate subtitling for global video content

Streamline international meeting documentation

Simplify podcast transcription in many languages

Power multilingual voice assistants effortlessly

Join over 200,000 developers building with AssemblyAI

START BUILDING WITH AI

Get started in minutes

1
2
3
4
5
6
import assemblyai as aai

transcriber = aai.Transcriber()
transcript = transcriber.transcribe(URL, config)

print(transcript)
{
  "id": "6rlr37h8f4-e310-4e23-bbf3-ea5f347dc684",
  "language_code": "en_us",
  "status": "completed",
  "text": "Runner's knee is a condition characterized by pain behind or around the kneecap...",
  "confidence": 0.98122,
  "audio_duration": 3200,
  "words": [
    { "text": "Runner's", "start": 0, "end": 550, "speaker": "A", "confidence": 0.98113 },
    { "text": "knee", "start": 580, "end": 1130, "speaker": "A", "confidence": 0.95417 }
  ]
}