Translation | AssemblyAI

Supported languages

Language	Code
Global English	`en`
Australian English	`en_au`
British English	`en_uk`
US English	`en_us`
Spanish	`es`
French	`fr`
German	`de`
Italian	`it`
Portuguese	`pt`
Dutch	`nl`
Hindi	`hi`
Japanese	`ja`
Chinese	`zh`
Finnish	`fi`
Korean	`ko`
Polish	`pl`
Russian	`ru`
Turkish	`tr`
Ukrainian	`uk`
Vietnamese	`vi`
Afrikaans	`af`
Albanian	`sq`
Amharic	`am`
Arabic	`ar`
Armenian	`hy`
Assamese	`as`
Azerbaijani	`az`
Basque	`eu`
Belarusian	`be`
Bengali	`bn`
Bosnian	`bs`
Bulgarian	`bg`
Catalan	`ca`
Croatian	`hr`
Czech	`cs`
Danish	`da`
Estonian	`et`
Galician	`gl`
Georgian	`ka`
Greek	`el`
Gujarati	`gu`
Haitian	`ht`
Hausa	`ha`
Hawaiian	`haw`
Hebrew	`he`
Hungarian	`hu`
Icelandic	`is`
Indonesian	`id`
Javanese	`jw`
Kannada	`kn`
Kazakh	`kk`
Lao	`lo`
Latin	`la`
Latvian	`lv`
Lithuanian	`lt`
Luxembourgish	`lb`
Macedonian	`mk`
Malagasy	`mg`
Malay	`ms`
Malayalam	`ml`
Maltese	`mt`
Maori	`mi`
Marathi	`mr`
Mongolian	`mn`
Nepali	`ne`
Norwegian	`no`
Panjabi	`pa`
Pashto	`ps`
Persian	`fa`
Romanian	`ro`
Serbian	`sr`
Shona	`sn`
Sindhi	`sd`
Sinhala	`si`
Slovak	`sk`
Slovenian	`sl`
Somali	`so`
Sundanese	`su`
Swahili	`sw`
Swedish	`sv`
Tagalog	`tl`
Tajik	`tg`
Tamil	`ta`
Telugu	`te`
Urdu	`ur`
Uzbek	`uz`
Welsh	`cy`
Yiddish	`yi`
Yoruba	`yo`
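The API reference describes `target_languages` as codes taken from the supported languages table. If you build that list dynamically (for example, from user settings), a client-side check can catch typos before you submit a job. The sketch below hard-codes the codes from the table; the names `SUPPORTED_LANGUAGE_CODES` and `invalid_target_languages` are hypothetical, and the API itself remains the authority on what it accepts.

```python
# Codes copied from the Supported languages table (hypothetical client-side helper).
# The API is the source of truth; this only catches obvious typos early.
SUPPORTED_LANGUAGE_CODES = frozenset("""
en en_au en_uk en_us es fr de it pt nl hi ja zh fi ko pl ru tr uk vi
af sq am ar hy as az eu be bn bs bg ca hr cs da et gl ka el gu ht ha haw
he hu is id jw kn kk lo la lv lt lb mk mg ms ml mt mi mr mn ne no pa ps
fa ro sr sn sd si sk sl so su sw sv tl tg ta te ur uz cy yi yo
""".split())


def invalid_target_languages(target_languages):
    """Return the codes in target_languages that are not in the supported table."""
    return [code for code in target_languages if code not in SUPPORTED_LANGUAGE_CODES]


print(invalid_target_languages(["es", "de"]))       # []
print(invalid_target_languages(["es", "klingon"]))  # ['klingon']
```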

Supported models

Model	Identifier
Universal-3-Pro	`universal-3-pro`
Universal-2	`universal-2`

Supported regions

US only

Overview

The Translation feature automatically converts your transcribed audio content from one language to another, enabling you to reach global audiences without manual translation work. You can translate a transcript into any of the languages listed under Supported languages, including several target languages in a single API request.

Key capabilities:

Translate to multiple target languages simultaneously
Choose between formal and informal translation styles
Translate during transcription or add translations to existing transcripts
Get full-text translations that preserve the original meaning and context
Get per-speaker translated utterances when using Speaker Labels

Common use cases:

Creating multilingual subtitles for video content
Translating customer support calls for international teams
Localizing podcast episodes for different markets
Making educational content accessible in multiple languages
Generating multilingual meeting summaries

Quickstart

There are two ways to use Translation:

Transcribe and translate in one request - Best when you’re starting a new transcription and want to automatically translate the transcript text as part of that process
Transcribe and translate in separate requests - Best when you already have a completed transcript that you would like to translate, or for more complex workflows where you want to keep the transcription and translation tasks separate
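The two methods share the same `speech_understanding` object; what differs is where it is sent. In Method 1 it travels with `audio_url` to `/v2/transcript`, and in Method 2 it travels with `transcript_id` to the Speech Understanding endpoint. The sketch below builds both request bodies from one shared helper. The function names are hypothetical, and other transcription options (such as `speaker_labels` or `language_detection`) are left out for brevity.

```python
def translation_config(target_languages, formal=False, match_original_utterance=False):
    """Build the speech_understanding object used by both methods (hypothetical helper)."""
    translation = {"target_languages": list(target_languages), "formal": formal}
    if match_original_utterance:
        # Only meaningful when the transcript was created with speaker_labels enabled
        translation["match_original_utterance"] = True
    return {"request": {"translation": translation}}


def one_request_body(audio_url, target_languages, formal=False):
    """Method 1: body for POST https://api.assemblyai.com/v2/transcript."""
    return {
        "audio_url": audio_url,
        "speech_understanding": translation_config(target_languages, formal),
    }


def separate_request_body(transcript_id, target_languages, formal=False):
    """Method 2: body for POST https://llm-gateway.assemblyai.com/v1/understanding."""
    return {
        "transcript_id": transcript_id,
        "speech_understanding": translation_config(target_languages, formal),
    }


print(one_request_body("https://assembly.ai/wildfires.mp3", ["es", "de"], formal=True))
print(separate_request_body("YOUR_TRANSCRIPT_ID", ["es", "de"], formal=True))
```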

Method 1: Transcribe and translate in one request

This method is ideal when you’re starting fresh and want both transcription and translation in a single workflow.

Python

import requests
import time

base_url = "https://api.assemblyai.com"

headers = {
    "authorization": "YOUR_API_KEY"
}

# Need to transcribe a local file? Learn more here: https://www.assemblyai.com/docs/getting-started/transcribe-an-audio-file
audio_url = "https://assembly.ai/wildfires.mp3"

# Configure transcription with translation
data = {
    "audio_url": audio_url,
    "speech_models": ["universal-3-pro", "universal-2"],
    "language_detection": True,
    "speaker_labels": True,  # Enable speaker labels
    "speech_understanding": {
        "request": {
            "translation": {
                "target_languages": ["es", "de"],  # Translate to Spanish and German
                "formal": True  # Use formal language style
            }
        }
    }
}

# Submit transcription request
response = requests.post(base_url + "/v2/transcript", headers=headers, json=data)
response.raise_for_status()  # Fail fast on HTTP errors (e.g. an invalid API key)
transcript_id = response.json()["id"]
polling_endpoint = base_url + f"/v2/transcript/{transcript_id}"

# Poll transcription results
while True:
    transcript = requests.get(polling_endpoint, headers=headers).json()

    if transcript["status"] == "completed":
        break

    elif transcript["status"] == "error":
        raise RuntimeError(f"Transcription failed: {transcript['error']}")

    else:
        time.sleep(3)

# Access and display results
print("\n--- Original Transcript ---")
print(transcript["text"][:200] + "...\n")

print("--- Translations ---")
for language_code, translated_text in transcript["translated_texts"].items():
    print(f"{language_code.upper()}:")
    print(translated_text[:200] + "...\n")
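The polling loops on this page run forever if a job never reaches a terminal state. Below is a minimal sketch of a reusable polling helper with a timeout. `wait_for_transcript` is a hypothetical name, and the `fetch`, `sleep`, and `clock` parameters are injection points (an assumption of this sketch, not part of the API) so the helper can be exercised without network access.

```python
import time


def wait_for_transcript(fetch, poll_interval=3.0, timeout=600.0,
                        sleep=time.sleep, clock=time.monotonic):
    """Poll until the transcript is completed or errored, or the timeout expires.

    fetch: zero-argument callable returning the transcript JSON as a dict, e.g.
        lambda: requests.get(polling_endpoint, headers=headers).json()
    """
    deadline = clock() + timeout
    while True:
        transcript = fetch()
        status = transcript["status"]
        if status == "completed":
            return transcript
        if status == "error":
            raise RuntimeError(f"Transcription failed: {transcript['error']}")
        if clock() >= deadline:
            raise TimeoutError(f"Transcript still '{status}' after {timeout} seconds")
        sleep(poll_interval)


# Offline demonstration with canned responses instead of real HTTP calls
fake_responses = iter([
    {"status": "queued"},
    {"status": "processing"},
    {"status": "completed", "text": "Smoke from hundreds of wildfires..."},
])
transcript = wait_for_transcript(lambda: next(fake_responses), sleep=lambda seconds: None)
print(transcript["status"])  # completed
```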

Method 2: Transcribe and translate in separate requests

This method is useful when you already have a completed transcript that you would like to translate, or for more complex workflows where you want to keep the transcription and translation tasks separate.

Python

import requests
import time

base_url = "https://api.assemblyai.com"

headers = {
    "authorization": "YOUR_API_KEY"
}

# Need to transcribe a local file? Learn more here: https://www.assemblyai.com/docs/getting-started/transcribe-an-audio-file
audio_url = "https://assembly.ai/wildfires.mp3"

# Submit transcription request (without translation)
data = {
    "audio_url": audio_url,
    "speech_models": ["universal-3-pro", "universal-2"],
    "language_detection": True,
    "speaker_labels": True,
}

# Transcribe file
response = requests.post(base_url + "/v2/transcript", headers=headers, json=data)
response.raise_for_status()  # Fail fast on HTTP errors (e.g. an invalid API key)
transcript_id = response.json()["id"]
polling_endpoint = base_url + f"/v2/transcript/{transcript_id}"

# Poll for transcription completion
while True:
    transcript = requests.get(polling_endpoint, headers=headers).json()

    if transcript["status"] == "completed":
        print("Transcription completed!")
        break

    elif transcript["status"] == "error":
        raise RuntimeError(f"Transcription failed: {transcript['error']}")

    else:
        time.sleep(3)

# Add translation configuration to the completed transcript
understanding_body = {
    "transcript_id": transcript_id,
    "speech_understanding": {
        "request": {
            "translation": {
                "target_languages": ["es", "de"],  # Translate to Spanish and German
                "formal": True  # Use formal language style
            }
        }
    }
}

# Send to Speech Understanding API for translation
understanding_response = requests.post(
    "https://llm-gateway.assemblyai.com/v1/understanding",
    headers=headers,
    json=understanding_body
)
understanding_response.raise_for_status()
result = understanding_response.json()

# Access and display results
print("\n--- Original Transcript ---")
print(transcript["text"][:200] + "...\n")

print("--- Translations ---")
for language_code, translated_text in result["translated_texts"].items():
    print(f"{language_code.upper()}:")
    print(translated_text[:200] + "...\n")

Expected output:

--- Original Transcript ---
Smoke from hundreds of wildfires in Canada is triggering air quality alerts throughout the US...
--- Translations ---
ES:
El humo de cientos de incendios forestales en Canadá está provocando alertas de calidad del aire...
DE:
Rauch von Hunderten von Waldbränden in Kanada löst in den gesamten USA Luftqualitätswarnungen aus...

Output format

The Translation API returns translations in the translated_texts key of the response. This key contains an object where each property is a language code corresponding to one of your target languages, and the value is the full translated text.

Example response structure:

{
  "id": "735d90b6-2e8b-4748-b75d-d02b78eb7811",
  "status": "completed",
  "text": "Smoke from hundreds of wildfires in Canada is triggering air quality alerts...",
  "translated_texts": {
    "es": "El humo de cientos de incendios forestales en Canadá está provocando alertas de calidad del aire...",
    "de": "Rauch von Hunderten von Waldbränden in Kanada löst in den gesamten USA Luftqualitätswarnungen aus..."
  },
  "speech_understanding": {
    "request": {
      "translation": {
        "formal": true,
        "match_original_utterance": true,
        "target_languages": [
          "es",
          "de"
        ]
      }
    },
    "response": {
      "translation": {
        "status": "success"
      }
    }
  },
  "utterances": [
    {
      "speaker": "A",
      "text": "Smoke from hundreds of wildfires in Canada is triggering air quality alerts...",
      "confidence": 0.9815734,
      "start": 240,
      "end": 26560,
      "words": [
        {
          "text": "Smoke",
          "start": 240,
          "end": 640,
          "confidence": 0.90152997,
          "speaker": "A"
        },
        // ... more words
      ],
      "translated_texts": {
        "es": "El humo de cientos de incendios forestales en Canadá está provocando alertas de calidad del aire...",
        "de": "Rauch von Hunderten von Waldbränden in Kanada löst in den gesamten USA Luftqualitätswarnungen aus..."
      }
    },
    // ... more utterances
  ],
  ...
}
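Before reading `translated_texts`, you may want to confirm that the translation step reported success. The sketch below checks `speech_understanding.response.translation.status` and then returns the top-level translations. `get_translations` is a hypothetical helper, and because `"success"` is the only status value documented on this page, the sketch assumes that any other value (or a missing status) means the translation did not complete.

```python
def get_translations(response):
    """Return the top-level translated_texts dict after checking the translation status.

    Assumption: any status other than "success" is treated as a failure,
    since "success" is the only value documented on this page.
    """
    status = (
        response.get("speech_understanding", {})
        .get("response", {})
        .get("translation", {})
        .get("status")
    )
    if status != "success":
        raise ValueError(f"Translation did not succeed (status={status!r})")
    return response.get("translated_texts", {})


# Trimmed from the example response above
example_response = {
    "text": "Smoke from hundreds of wildfires in Canada...",
    "translated_texts": {
        "es": "El humo de cientos de incendios forestales en Canadá...",
        "de": "Rauch von Hunderten von Waldbränden in Kanada...",
    },
    "speech_understanding": {"response": {"translation": {"status": "success"}}},
}

for language_code, translated_text in get_translations(example_response).items():
    print(f"{language_code}: {translated_text}")
```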

Translation with speaker labels

When you use Translation with Speaker Labels, you can get translated text for each individual utterance by setting match_original_utterance to true. This is useful for creating speaker-specific subtitles or analyzing conversations in multiple languages while preserving speaker attribution.

Python

import requests
import time

base_url = "https://api.assemblyai.com"

headers = {
    "authorization": "YOUR_API_KEY"
}

audio_url = "https://assembly.ai/wildfires.mp3"

# Configure transcription with translation and speaker labels
data = {
    "audio_url": audio_url,
    "speaker_labels": True,  # Required for match_original_utterance
    # "language_detection": True,  # Enable language detection if you are processing files in a variety of languages
    "speech_understanding": {
        "request": {
            "translation": {
                "target_languages": ["es"],
                "match_original_utterance": True,  # Get translated text per utterance
                "formal": True
            }
        }
    }
}

# Submit transcription request
response = requests.post(base_url + "/v2/transcript", headers=headers, json=data)
response.raise_for_status()  # Fail fast on HTTP errors (e.g. an invalid API key)
transcript_id = response.json()["id"]
polling_endpoint = base_url + f"/v2/transcript/{transcript_id}"

# Poll transcription results
while True:
    transcript = requests.get(polling_endpoint, headers=headers).json()

    if transcript["status"] == "completed":
        break

    elif transcript["status"] == "error":
        raise RuntimeError(f"Transcription failed: {transcript['error']}")

    else:
        time.sleep(3)

# Access translated utterances
for utterance in transcript["utterances"]:
    print(f"Speaker {utterance['speaker']}:")
    print(f"  Original: {utterance['text'][:100]}...")
    print(f"  Spanish: {utterance['translated_texts']['es'][:100]}...")
    print()

Example response:

Each utterance in the utterances array includes a translated_texts object with the translation for that specific speaker’s utterance:

{
  "utterances": [
    {
      "speaker": "A",
      "text": "Smoke from hundreds of wildfires in Canada is triggering air quality alerts...",
      "confidence": 0.9815734,
      "start": 240,
      "end": 26560,
      "words": [...],
      "translated_texts": {
        "es": "El humo de cientos de incendios forestales en Canadá está activando alertas de calidad del aire..."
      }
    },
    {
      "speaker": "B",
      "text": "Good morning.",
      "confidence": 0.98217773,
      "start": 28060,
      "end": 28620,
      "words": [...],
      "translated_texts": {
        "es": "Buenos días."
      }
    }
  ]
}
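Because each utterance carries `start` and `end` timestamps in milliseconds, translated utterances map naturally onto subtitle cues. The sketch below writes SubRip (SRT) text from the `utterances` array. The function names are hypothetical, and a production subtitle pipeline would usually also split long utterances (like speaker A's 26-second one) into shorter cues.

```python
def ms_to_srt_time(ms):
    """Format a millisecond offset as an SRT timestamp (HH:MM:SS,mmm)."""
    hours, rem = divmod(int(ms), 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    seconds, millis = divmod(rem, 1_000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d},{millis:03d}"


def utterances_to_srt(utterances, language_code):
    """Build SRT subtitle text from utterances that carry translated_texts."""
    cues = []
    for index, utterance in enumerate(utterances, start=1):
        start = ms_to_srt_time(utterance["start"])
        end = ms_to_srt_time(utterance["end"])
        text = utterance["translated_texts"][language_code]
        cues.append(f"{index}\n{start} --> {end}\nSpeaker {utterance['speaker']}: {text}\n")
    # SRT separates cues with a blank line
    return "\n".join(cues)


# Trimmed from the example response above
utterances = [
    {"speaker": "A", "start": 240, "end": 26560,
     "translated_texts": {"es": "El humo de cientos de incendios forestales en Canadá..."}},
    {"speaker": "B", "start": 28060, "end": 28620,
     "translated_texts": {"es": "Buenos días."}},
]
print(utterances_to_srt(utterances, "es"))
```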

API reference

Request

Method 1: Transcribe and translate in one request

When creating a new transcription, include the speech_understanding parameter directly in your transcription request:

curl -X POST \
  "https://api.assemblyai.com/v2/transcript" \
  -H "Authorization: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "audio_url": "https://assembly.ai/wildfires.mp3",
    "speaker_labels": true,
    "language_detection": true,
    "speech_understanding": {
      "request": {
        "translation": {
          "target_languages": ["es", "de"],
          "formal": true
        }
      }
    }
  }'

Method 2: Add translation to existing transcripts

For existing transcripts, confirm that the transcript has completed, then send its ID to the Speech Understanding API:

TRANSCRIPT_ID="YOUR_TRANSCRIPT_ID"

# Step 1: Confirm the transcript has finished ("status": "completed")
curl -s -X GET \
  "https://api.assemblyai.com/v2/transcript/$TRANSCRIPT_ID" \
  -H "Authorization: YOUR_API_KEY"

# Step 2: Send the transcript ID with a translation request to the Speech Understanding API
curl -X POST \
  "https://llm-gateway.assemblyai.com/v1/understanding" \
  -H "Authorization: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "transcript_id": "'"$TRANSCRIPT_ID"'",
    "speech_understanding": {
      "request": {
        "translation": {
          "target_languages": ["es", "de"],
          "formal": true
        }
      }
    }
  }'

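Key	Type	Required?	Description
`transcript_id`	string	Method 2 only	ID of the completed transcript to translate. Sent to the Speech Understanding API in place of `audio_url`.
`speech_understanding`	object	Yes	Container for speech understanding requests.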
`speech_understanding.request`	object	Yes	The understanding request configuration.
`speech_understanding.request.translation`	object	Yes	Translation configuration.
`translation.target_languages`	array	Yes	Array of language codes to translate the transcript into. See the supported languages table for available language codes.
`translation.formal`	boolean	No	Whether to use formal language in translations. Defaults to `false`. When `true`, uses formal pronouns and grammatical forms.
`translation.match_original_utterance`	boolean	No	Whether to include translated texts for each utterance. Defaults to `false`. When `true`, returns a `translated_texts` key within each utterance in the `utterances` array. Requires `speaker_labels` to be set to `true` in the request.
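The constraints in the table above can be checked client-side before a Method 1 request is sent (in Method 1, `speaker_labels` sits in the same body as `speech_understanding`). The sketch below is a hypothetical helper, `validate_translation_request`, that checks only what the table documents; treating an empty `target_languages` array as an error is an additional assumption.

```python
def validate_translation_request(body):
    """Return a list of problems with a Method 1 request body (empty if none found).

    Checks only the constraints listed in the request table; the API may enforce more.
    """
    translation = (
        body.get("speech_understanding", {}).get("request", {}).get("translation")
    )
    if not isinstance(translation, dict):
        return ["speech_understanding.request.translation is required"]

    problems = []
    targets = translation.get("target_languages")
    # Assumption: an empty array is treated as invalid
    if not isinstance(targets, list) or not targets:
        problems.append("translation.target_languages must be a non-empty array")
    for key in ("formal", "match_original_utterance"):
        if key in translation and not isinstance(translation[key], bool):
            problems.append(f"translation.{key} must be a boolean")
    if translation.get("match_original_utterance") and not body.get("speaker_labels"):
        problems.append("match_original_utterance requires speaker_labels to be true")
    return problems


body = {
    "audio_url": "https://assembly.ai/wildfires.mp3",
    "speech_understanding": {
        "request": {"translation": {"target_languages": ["es"], "match_original_utterance": True}}
    },
}
print(validate_translation_request(body))
# ['match_original_utterance requires speaker_labels to be true']
```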

Response

The Translation API returns your original transcript response with an additional translated_texts key containing the translations. When match_original_utterance is enabled with speaker_labels, each utterance in the utterances array will also include its own translated_texts key.

{
  "id": "735d90b6-2e8b-4748-b75d-d02b78eb7811",
  "status": "completed",
  "text": "Smoke from hundreds of wildfires in Canada is triggering air quality alerts throughout the US...",
  "translated_texts": {
    "es": "El humo de cientos de incendios forestales en Canadá está provocando alertas de calidad del aire en todo Estados Unidos...",
    "de": "Rauch von Hunderten von Waldbränden in Kanada löst in den gesamten USA Luftqualitätswarnungen aus..."
  },
  "speech_understanding": {
    "request": {
      "translation": {
        "formal": true,
        "target_languages": ["es", "de"]
      }
    },
    "response": {
      "translation": {
        "status": "success"
      }
    }
  }
}

Key	Type	Description
`translated_texts`	object	An object containing the translated texts, where each key is a language code and each value is the full translated transcript text.
`utterances[].translated_texts`	object	(When `match_original_utterance` is `true`) An object containing the translations for this specific utterance, with language codes as keys.
`speech_understanding`	object	Container for speech understanding request and response information.
`speech_understanding.request`	object	The original translation request configuration that was submitted.
`speech_understanding.request.translation`	object	The translation parameters that were used.
`speech_understanding.response`	object	The response information from the translation process.
`speech_understanding.response.translation`	object	Status information about the translation.
`speech_understanding.response.translation.status`	string	The status of the translation. Will be `"success"` when translation completes successfully.

Key differences from standard transcription

Field	Standard Transcription	With Translation
`translated_texts`	Not present	Object with language codes as keys and translated texts as values
`speech_understanding`	Not present	Object containing the translation request and response details

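All other fields from the original transcript (text, words, utterances, confidence, etc.) remain unchanged, except that each utterance also gains its own translated_texts key when match_original_utterance is set to true.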