Supported languages
Supported models
Supported regions
US & EU
Overview
The Translation feature automatically converts your transcribed audio content from one language to another, enabling you to reach global audiences without manual translation work. You can translate transcripts into over 100 languages with a single API request.
Key capabilities:
- Translate to multiple target languages simultaneously
- Choose between formal and informal translation styles
- Translate during transcription or add translations to existing transcripts
- Get full-text translations that preserve the original meaning and context
- Get per-speaker translated utterances when using Speaker Labels
Common use cases:
- Creating multilingual subtitles for video content
- Translating customer support calls for international teams
- Localizing podcast episodes for different markets
- Making educational content accessible in multiple languages
- Generating multilingual meeting summaries
Quickstart
There are two ways to use Translation:
- Transcribe and translate in one request - Best when you’re starting a new transcription and want to automatically translate the transcript text as part of that process
- Transcribe and translate in separate requests - Best when you already have text that you would like to translate, or for more complex workflows where you want to separate the transcription and translation tasks
Method 1: Transcribe and translate in one request
This method is ideal when you’re starting fresh and want both transcription and translation in a single workflow.
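The one-request flow might look like the following minimal sketch, which uses only the Python standard library. The endpoint URL, the `authorization` header, and the `audio_url` field are assumptions about the transcription API and are not stated in this section. The language codes are illustrative. Only the `speech_understanding` structure follows the request parameters documented in the API reference.

```python
import json
import urllib.request

# Assumption: transcription endpoint; this section does not state it.
TRANSCRIPT_URL = "https://api.assemblyai.com/v2/transcript"


def build_transcribe_and_translate_body(audio_url, target_languages, formal=False):
    """Build a transcription request body that also asks for translation."""
    return {
        "audio_url": audio_url,  # assumption: field naming the audio to transcribe
        "speech_understanding": {
            "request": {
                "translation": {
                    "target_languages": list(target_languages),
                    "formal": formal,
                }
            }
        },
    }


def submit(body, api_key, url=TRANSCRIPT_URL):
    """POST the body as JSON and return the decoded response (network call)."""
    req = urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        # Assumption: the API key is sent in an `authorization` header.
        headers={"authorization": api_key, "content-type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Illustrative values only; nothing is sent until submit() is called.
body = build_transcribe_and_translate_body(
    "https://example.com/audio.mp3", ["es", "de"], formal=True
)
print(json.dumps(body, indent=2))
```

Transcription typically finishes asynchronously, so a real client would still need to fetch the completed transcript before reading `translated_texts`. That step is left out of this sketch.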
Method 2: Transcribe and translate in separate requests
This method is useful when you already have text that you would like to translate, or for more complex workflows where you want to separate the transcription and translation tasks.
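As a rough sketch of the separate-request flow, the body sent to the Speech Understanding API could be assembled as shown here. Referencing the existing transcript through a `transcript_id` field is an assumption, since this page does not name that field, and the endpoint itself is deliberately left out. `build_translation_request` is a hypothetical helper name.

```python
import json


def build_translation_request(transcript_id, target_languages, formal=False):
    """Body for translating an already-completed transcript.

    Assumption: the transcript is referenced by a `transcript_id` field;
    this page does not name the field.
    """
    return {
        "transcript_id": transcript_id,
        "speech_understanding": {
            "request": {
                "translation": {
                    "target_languages": list(target_languages),
                    "formal": formal,
                }
            }
        },
    }


# "abc123" is a made-up transcript ID used only for illustration.
request_body = build_translation_request("abc123", ["fr"])
print(json.dumps(request_body, indent=2))
```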
Output format
The Translation API returns translations in the `translated_texts` key of the response. This key contains an object where each property is a language code corresponding to one of your target languages, and the value is the full translated text.
Example response structure:
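A response of this shape can be read like any other mapping. The fragment in this sketch is illustrative: the language codes and translated strings are invented, and only the `translated_texts` layout is taken from the description above. `translations_by_language` is a hypothetical helper.

```python
# Illustrative response fragment; the values are invented for this example.
response = {
    "translated_texts": {
        "es": "Hola, gracias por llamar.",
        "de": "Hallo, danke für Ihren Anruf.",
    }
}


def translations_by_language(response):
    """Return {language_code: full translated text}, or {} if absent."""
    return dict(response.get("translated_texts") or {})


for code, text in sorted(translations_by_language(response).items()):
    print(f"{code}: {text}")
```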
Translation with speaker labels
When you use Translation with Speaker Labels, you can get translated text for each individual utterance by setting `match_original_utterance` to `true`. This is useful for creating speaker-specific subtitles or analyzing conversations in multiple languages while preserving speaker attribution.
Each utterance in the `utterances` array then includes a `translated_texts` object with the translation for that specific speaker’s utterance.
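Per-utterance translations can be paired with their speakers as in this minimal sketch. The sample data is invented, and the `speaker` and `text` fields on each utterance are assumptions, since this page only describes the `translated_texts` key. `speaker_lines` is a hypothetical helper.

```python
# Illustrative transcript fragment; speakers and text are invented.
transcript = {
    "utterances": [
        {
            "speaker": "A",  # assumption: utterances carry a speaker label
            "text": "Good morning.",
            "translated_texts": {"es": "Buenos días."},
        },
        {
            "speaker": "B",
            "text": "Thank you.",
            "translated_texts": {"es": "Gracias."},
        },
    ]
}


def speaker_lines(transcript, language_code):
    """Yield (speaker, translated utterance) pairs for one target language."""
    for utterance in transcript.get("utterances") or []:
        translated = (utterance.get("translated_texts") or {}).get(language_code)
        if translated is not None:
            yield utterance.get("speaker"), translated


for speaker, line in speaker_lines(transcript, "es"):
    print(f"Speaker {speaker}: {line}")
```

Utterances without a translation for the requested language are skipped rather than raising an error, which keeps subtitle generation tolerant of partial results.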
API reference
Request
Method 1: Transcribe and translate in one request
When creating a new transcription, include the `speech_understanding` parameter directly in your transcription request.
Method 2: Add translation to existing transcripts
For existing transcripts, retrieve the completed transcript and send it to the Speech Understanding API.

| Key | Type | Required? | Description |
|---|---|---|---|
| `speech_understanding` | object | Yes | Container for speech understanding requests. |
| `speech_understanding.request` | object | Yes | The understanding request configuration. |
| `speech_understanding.request.translation` | object | Yes | Translation configuration. |
| `translation.target_languages` | array | Yes | Array of language codes to translate the transcript into. See the supported languages table for available language codes. |
| `translation.formal` | boolean | No | Whether to use formal language in translations. Defaults to `false`. When `true`, uses formal pronouns and grammatical forms. |
| `translation.match_original_utterance` | boolean | No | Whether to include translated texts for each utterance. Defaults to `false`. When `true`, returns a `translated_texts` key within each utterance in the `utterances` array. Requires `speaker_labels` to be set to `true` in the request. |
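The constraints in the table can be checked on the client before a request is sent. This is a sketch only: `validate_translation_request` is a hypothetical helper, not part of any SDK. Treating `speaker_labels` as a top-level field of the request body and rejecting an empty `target_languages` array are both assumptions.

```python
def validate_translation_request(body):
    """Return a list of problems found in a request body (empty if none)."""
    translation = (
        body.get("speech_understanding", {}).get("request", {}).get("translation")
    )
    if not isinstance(translation, dict):
        return ["speech_understanding.request.translation is required"]

    problems = []

    # Assumption: an empty array is treated the same as a missing one.
    targets = translation.get("target_languages")
    if not isinstance(targets, list) or not targets:
        problems.append("translation.target_languages must be a non-empty array")

    # Both optional flags are documented as booleans.
    for key in ("formal", "match_original_utterance"):
        if key in translation and not isinstance(translation[key], bool):
            problems.append(f"translation.{key} must be a boolean")

    # Assumption: speaker_labels is a top-level field of the request body.
    wants_utterances = translation.get("match_original_utterance") is True
    if wants_utterances and body.get("speaker_labels") is not True:
        problems.append("match_original_utterance requires speaker_labels to be true")

    return problems


# Illustrative body that forgets to enable speaker_labels.
example = {
    "speech_understanding": {
        "request": {
            "translation": {
                "target_languages": ["es"],
                "match_original_utterance": True,
            }
        }
    }
}
print(validate_translation_request(example))
```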
Response
The Translation API returns your original transcript response with an additional `translated_texts` key containing the translations. When `match_original_utterance` is enabled with `speaker_labels`, each utterance in the `utterances` array will also include its own `translated_texts` key.

| Key | Type | Description |
|---|---|---|
| `translated_texts` | object | An object containing the translated texts, where each key is a language code and each value is the full translated transcript text. |
| `utterances[].translated_texts` | object | (When `match_original_utterance` is `true`) An object containing the translations for this specific utterance, with language codes as keys. |
| `speech_understanding` | object | Container for speech understanding request and response information. |
| `speech_understanding.request` | object | The original translation request configuration that was submitted. |
| `speech_understanding.request.translation` | object | The translation parameters that were used. |
| `speech_understanding.response` | object | The response information from the translation process. |
| `speech_understanding.response.translation` | object | Status information about the translation. |
| `speech_understanding.response.translation.status` | string | The status of the translation. Will be `"success"` when translation completes successfully. |
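Before reading `translated_texts`, a client may want to confirm the reported status. This sketch walks the nested keys from the table. The sample response is invented, `translation_succeeded` is a hypothetical helper, and any status value other than `"success"` is treated as not successful because this page documents no other values.

```python
def translation_succeeded(response):
    """True when speech_understanding.response.translation.status is "success"."""
    status = (
        response.get("speech_understanding", {})
        .get("response", {})
        .get("translation", {})
        .get("status")
    )
    return status == "success"


# Illustrative response fragment; the values are invented.
sample = {
    "translated_texts": {"es": "Hola."},
    "speech_understanding": {
        "request": {"translation": {"target_languages": ["es"]}},
        "response": {"translation": {"status": "success"}},
    },
}

if translation_succeeded(sample):
    print(sample["translated_texts"]["es"])
```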
Key differences from standard transcription
| Field | Standard Transcription | With Translation |
|---|---|---|
| `translated_texts` | Not present | Object with language codes as keys and translated texts as values |
| `speech_understanding` | Not present | Object containing the translation request and response details |
All other fields (`text`, `words`, `utterances`, `confidence`, etc.) remain unchanged.