Introducing Universal-3 Pro
Learn how to transcribe pre-recorded audio using Universal-3 Pro.
Overview
Universal-3 Pro is our most powerful Voice AI model yet, designed to capture the “hard stuff” that traditional ASR models struggle with.
Key Universal-3 Pro Capabilities:
- Keyterm Prompting: Improve recognition of domain-specific terminology, rare words, and proper nouns
- Prompting: Guide transcription style, formatting, and output characteristics
Out of the box, the model outperforms every ASR model on the market on accuracy, especially for entities and rare words. With prompting, you can get fully customized transcription output that approaches human-level quality.
Example prompts and behavior
Prompts let you customize how your transcriptions appear, from polished, readable output to fully verbatim capture. Here are three examples:
Sample prompt 1: Simple transcription
- Human-readable output without disfluencies or speech patterns
- Strong overall WER but higher error rate on rare words
- Best for general transcription prioritizing readability
Sample prompt 2: Enhanced accuracy
- Adds contextual correctness for rare words/entities
- Includes basic disfluencies and speech patterns
- Best for domain-specific content (medical, legal, technical)
Sample prompt 3: Full verbatim with entity accuracy
- Captures all speech patterns, cross-talk, and background noise
- Maximizes contextual and multilingual accuracy
- Best for verbatim needs across varied transcript types
To fine-tune to your use case, see the Prompting section.
Not sure where to start? Try our Prompt Generator.
Quick start
This example shows how to transcribe a pre-recorded audio file with the Universal-3 Pro model and print the transcript text to your terminal.
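As a minimal sketch of the request, the payload below uses the speech_models parameter and model names described in this document; the endpoint URL and the audio_url field name are placeholder assumptions, not confirmed API details.

```python
import json

# Placeholder endpoint: the real API base URL is not shown in this document.
API_ENDPOINT = "https://api.example.com/v2/transcript"

def build_transcript_request(audio_url: str) -> dict:
    """Build a transcription request payload for Universal-3 Pro.

    `speech_models` lists models in priority order: Universal-3 Pro first,
    with Universal-2 as the fallback for the remaining languages.
    """
    return {
        "audio_url": audio_url,  # assumed field name for the pre-recorded file
        "speech_models": ["universal-3-pro", "universal-2"],
    }

payload = build_transcript_request("https://example.com/meeting.mp3")
print(json.dumps(payload, indent=2))
```

In practice you would POST this payload with your API key and poll the returned transcript until processing completes.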
Language support
Universal-3 Pro supports English, Spanish, Portuguese, French, German, and Italian. To access all 99 languages, use "speech_models": ["universal-3-pro", "universal-2"] as shown in the code example. Read more here.
Keyterms prompting
Keyterms prompting lets you provide up to 1,000 words or phrases (up to 6 words per phrase) via the keyterms_prompt parameter. The model uses these to improve transcription accuracy for the terms themselves, as well as related variations and contextually similar phrases.
Here is an example showing how you can use keyterms prompting to improve transcription accuracy for a name with distinctive spelling and formatting.
Without keyterms prompting:
With keyterms prompting:
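A hedged sketch of attaching keyterms, with a helper (our own, not part of any SDK) that enforces the documented limits before adding the keyterms_prompt parameter:

```python
def add_keyterms(payload: dict, keyterms: list[str]) -> dict:
    """Attach keyterms_prompt, enforcing the documented limits:
    at most 1,000 entries, each at most 6 words."""
    if len(keyterms) > 1000:
        raise ValueError("keyterms_prompt accepts at most 1,000 words or phrases")
    for term in keyterms:
        if len(term.split()) > 6:
            raise ValueError(f"phrase exceeds 6 words: {term!r}")
    return {**payload, "keyterms_prompt": keyterms}

request = add_keyterms(
    {"speech_models": ["universal-3-pro"]},
    ["Anktiva", "Glycoside", "clinical history evaluation"],
)
```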
Prompting
Universal-3 Pro delivers great accuracy out of the box. To fine-tune transcription results to your use case, provide a prompt with up to 1,500 words of context in plain language. This helps the model consistently recognize domain-specific terminology, apply your preferred formatting conventions, handle code switching between languages, and better interpret ambiguous speech.
Verbatim transcription and disfluencies
Capture natural speech patterns exactly as spoken, including "um", "uh", false starts, repetitions, and stutters. Add examples of the verbatim elements you want transcribed to the prompt parameter to guide the model.
Without prompt:
With prompt, the model better captures filler words like “uh” and false starts like “we, we, we’re friends”.
Example prompts:
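As a hedged illustration (the wording is assumed, not an official example), a verbatim-style prompt could look like:

```python
# Illustrative prompt wording only.
verbatim_prompt = (
    "Transcribe verbatim. Keep filler words such as 'um' and 'uh', "
    "false starts like 'we, we, we're friends', repetitions, and stutters."
)

request = {
    "speech_models": ["universal-3-pro"],
    "prompt": verbatim_prompt,
}
```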
Output style and formatting
Control punctuation, capitalization, and readability without changing words.
Without prompt:
With prompt, the model accurately captures the speaker’s emotional state through punctuation, adding exclamation marks during moments of yelling and emphasis.
Example prompts:
Context-aware clues
Provide context the model can use for jargon, names, and domain expectations known about the audio ahead of time.
Without prompt:
With prompt, adding ‘clinical history evaluation’ as a context clue corrects spelling of ‘Glicoside’ to ‘Glycoside’.
Example prompts:
Entity accuracy and spelling
Improve transcript accuracy for proper nouns, brands, technical terms, and domain vocabulary with prompting.
Without prompt:
With prompt, the model corrects the misrecognition of “Anktiva,” which would otherwise be transcribed as “Entiva”.
Example prompts:
Caution: Over-instructing the model to follow examples can cause it to hallucinate those examples when similar audio is encountered.
Speaker attribution
Prompted speaker diarization identifies who said what in a conversation. It works especially well in cases where there are frequent interjections, such as quick acknowledgments or single-word responses, or when working with limited spoken audio, such as short-duration files.
Without prompt:
With prompt:
Without prompting, it may appear that speaker B said everything. But with prompting, the model correctly identifies this as 5 separate speaker turns, capturing utterances as short as a single word, like “good”.
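To make the five-turn example concrete, here is a small parser for a diarized transcript. The "Speaker: text" line format is an assumed rendering of diarized output, not the API's actual response schema:

```python
def speaker_turns(transcript: str) -> list[tuple[str, str]]:
    """Parse 'Speaker: text' lines into (speaker, utterance) turns."""
    turns = []
    for line in transcript.strip().splitlines():
        speaker, _, text = line.partition(":")
        turns.append((speaker.strip(), text.strip()))
    return turns

sample = """\
A: How have you been?
B: Good.
A: Glad to hear it.
B: And you?
A: Busy, but good."""

turns = speaker_turns(sample)  # 5 turns, including the single-word "Good."
```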
Audio event tags
Audio tags capture non-speech events like music, laughter, pauses, applause, background noise, and other sounds in your audio. Include examples of audio tags you want to transcribe in the prompt parameter to guide the model.
Without prompt:
With prompting, non-speech events like beeps are called out in the transcript.
Here are some examples of audio tags you can prompt for: [music], [laughter], [applause], [noise], [pause], [inaudible], [sigh], [gasp], [cheering], [sound], [screaming], [bell], [beep], [sound effect], [buzzer], and more.
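Downstream of transcription, bracketed tags like these are easy to extract. A small sketch (our own helper, assuming tags appear inline in the transcript text):

```python
import re

# Matches bracketed audio event tags such as [beep] or [sound effect].
AUDIO_TAG = re.compile(r"\[([a-z ]+)\]")

def extract_audio_tags(transcript: str) -> list[str]:
    """Return audio event tags found in a transcript string."""
    return AUDIO_TAG.findall(transcript)

tags = extract_audio_tags("Please hold [beep] thank you [sound effect] goodbye")
```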
Code-switching and multilingual
Handle audio where speakers switch between languages.
Without prompt:
With prompt, the model is able to preserve the speaker’s natural code switching between English and French, transcribing each language as spoken.
Example prompts:
Requires language_detection: true on your request. If a single language code is specified, the model will try to transcribe only that language.
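A hedged sketch of a code-switching request; the helper and prompt wording are illustrative, but language_detection and prompt are the parameters named in this document:

```python
def multilingual_request(prompt: str) -> dict:
    """Build a request for code-switched audio; language_detection
    must be enabled, per the note above."""
    return {
        "speech_models": ["universal-3-pro"],
        "language_detection": True,
        "prompt": prompt,
    }

request = multilingual_request(
    "The speaker alternates between English and French; "
    "transcribe each language as it is spoken."
)
```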
Numbers and measurements
Control how numbers, percentages, and measurements are formatted.
Without prompt:
With prompt:
Example prompts:
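As a hedged illustration (assumed wording, not the original examples), a number-formatting prompt could look like:

```python
# Illustrative prompt wording only.
numbers_prompt = (
    "Write numbers as digits, percentages with the % sign, and measurements "
    "with abbreviated units, e.g. 'twenty five percent' as '25%' and "
    "'five kilometers' as '5 km'."
)

request = {"speech_models": ["universal-3-pro"], "prompt": numbers_prompt}
```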
Difficult audio handling
Provide guidance for unclear audio, overlapping speech, and interruptions.
Without prompt:
With prompt:
Example prompts:
Temperature parameter
Control the amount of randomness injected into the model’s response using the temperature parameter.
The temperature parameter accepts values from 0.0 to 1.0, with a default value of 0.0.
Choosing the right temperature value
The temperature parameter controls how deterministic or exploratory the model’s decoding is. Lower values (e.g., 0.0) make the model fully deterministic, which can be useful for strict reproducibility. Slightly higher values (e.g., 0.1) introduce a small amount of exploration.
Low non-zero temperatures often produce better transcription accuracy (lower WER)—in some cases up to ~5% relative improvement—by allowing the model to recover from early decoding mistakes, while still remaining highly stable. Higher values (e.g., > 0.3) increase randomness and may reduce accuracy.
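A small sketch (our own helper, not an SDK function) that enforces the documented temperature range before adding it to a request:

```python
def set_temperature(payload: dict, temperature: float = 0.0) -> dict:
    """Set decoding temperature, enforcing the documented 0.0-1.0 range
    (default 0.0; small values like 0.1 add slight exploration)."""
    if not 0.0 <= temperature <= 1.0:
        raise ValueError("temperature must be between 0.0 and 1.0")
    return {**payload, "temperature": temperature}

request = set_temperature({"speech_models": ["universal-3-pro"]}, 0.1)
```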
99 languages coverage
With the speech_models parameter, you can list multiple speech models in priority order, allowing our system to automatically route your audio based on language support.
Model routing behavior: The system attempts to use the models in priority order, falling back to the next model when needed. For example, with ["universal-3-pro", "universal-2"], the system will try to use universal-3-pro for languages it supports (English, Spanish, Portuguese, French, German, and Italian), and automatically fall back to Universal-2 for all other languages. This ensures you get the best-performing transcription where available while maintaining the widest language coverage.
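The routing behavior described above can be sketched as a simple priority lookup. This is a client-side illustration of the rule, not the service's actual routing code, and the ISO language codes are our shorthand for the languages named above:

```python
def route_model(speech_models: list[str], language: str,
                support: dict) -> str:
    """Return the first model in priority order that supports `language`.
    A support value of None means the model covers all languages."""
    for model in speech_models:
        languages = support.get(model)
        if languages is None or language in languages:
            return model
    raise ValueError(f"no listed model supports language {language!r}")

SUPPORT = {
    "universal-3-pro": {"en", "es", "pt", "fr", "de", "it"},
    "universal-2": None,  # stands in for full 99-language coverage
}

route_model(["universal-3-pro", "universal-2"], "en", SUPPORT)  # "universal-3-pro"
route_model(["universal-3-pro", "universal-2"], "ja", SUPPORT)  # "universal-2"
```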
Best practices for prompt engineering
Check out this guide to learn even more about how to craft effective prompts for Universal-3 Pro speech transcription, which includes an AI prompt generator tool.