Universal-3.5 Pro is highly accurate out of the box, but for challenging audio like short clips with limited context, noisy environments, or audio with niche references, you can give the model information about your audio to improve transcription accuracy. There are two ways to give the model information about your audio:
  • Contextual prompting (prompt) — a natural-language description of what the audio is about: the domain, the scenario, or the full details of the conversation.
  • Keyterms prompting (keyterms_prompt) — an explicit list of terms you want the model to recognize accurately.

Contextual prompting

Use the prompt parameter to provide context about your audio: describe what is being transcribed, not how to transcribe it. Formatting and behavioral instructions are ignored. The model stays grounded in the audio, so irrelevant context won’t cause hallucinated words.

For example, this is a 2-second clip from a League of Legends pro interview.

Without prompt:
And so look who I've been a dear.
With prompt:
In solo queue, I ban Azir.
import requests
import time

base_url = "https://api.assemblyai.com"
headers = {"authorization": "<YOUR_API_KEY>"}

# Request body: the audio to transcribe plus a short contextual prompt.
data = {
    "audio_url": "https://assembly.ai/prompt-8",
    "language_detection": True,
    "prompt": "League of Legends roles"
}

# Submit the transcription request.
response = requests.post(base_url + "/v2/transcript", headers=headers, json=data)

if response.status_code != 200:
    print(f"Error: {response.status_code}, Response: {response.text}")
    response.raise_for_status()

transcript_response = response.json()
transcript_id = transcript_response["id"]
polling_endpoint = f"{base_url}/v2/transcript/{transcript_id}"

# Poll every 3 seconds until the transcript completes or fails.
while True:
    transcript = requests.get(polling_endpoint, headers=headers).json()
    if transcript["status"] == "completed":
        print(transcript["text"])
        break
    elif transcript["status"] == "error":
        raise RuntimeError(f"Transcription failed: {transcript['error']}")
    else:
        time.sleep(3)
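The loop above polls until the transcript either completes or fails, with no upper bound on how long it waits. As a minimal sketch of a bounded variant, the helper below wraps the same "completed" and "error" status checks in a function with a timeout. The names wait_for_transcript, fetch, interval, and timeout are hypothetical and not part of the API.

```python
import time
from typing import Callable


def wait_for_transcript(
    fetch: Callable[[], dict],
    interval: float = 3.0,
    timeout: float = 600.0,
) -> dict:
    """Poll `fetch` until the transcript completes, fails, or `timeout` elapses.

    `fetch` is a hypothetical zero-argument callable that returns the
    transcript JSON as a dict, for example a lambda around requests.get().
    """
    deadline = time.monotonic() + timeout
    while True:
        transcript = fetch()
        status = transcript.get("status")
        if status == "completed":
            return transcript
        if status == "error":
            raise RuntimeError(f"Transcription failed: {transcript.get('error')}")
        # Any other status is treated as "still in progress".
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Transcript not ready after {timeout} seconds")
        time.sleep(interval)
```

With the variables from the example above, this could be called as wait_for_transcript(lambda: requests.get(polling_endpoint, headers=headers).json()). Passing the fetch step in as a callable keeps the helper independent of any particular HTTP client.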

Prompting guide

Contextual prompts work at three levels of specificity. Use the least specific level that covers your use case, and add detail when your audio contains uncommon names or terms the model can’t otherwise know.
| Level | Length | What it contains | Example |
| --- | --- | --- | --- |
| Domain | 2–5 words | The domain only | Medical consultation call. |
| Scenario | 5–15 words | What the conversation is about | Cardiology consultation about chest pain symptoms. |
| Detailed | 20–50 words | Full description, including names, products, or identifiers | Cardiology consultation between Dr. Smith and an elderly patient regarding recurring chest pain, ECG results, and medication adjustment for hypertension. |

Guidelines for writing contextual prompts:
  • Write plain, complete sentences that describe the audio.
  • Keep it to one short block of text. Don’t pack lists of keywords into the contextual prompt.
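To make the three levels concrete, here is a minimal sketch that maps a prompt to the nearest level by word count. The function name prompt_level and the exact cutoffs (5 and 15 words) are assumptions chosen to resolve the overlapping ranges listed above; the API does not require or check them.

```python
def prompt_level(prompt: str) -> str:
    """Return the closest level for a contextual prompt, based on word count.

    The cutoffs are an assumption: the documented ranges overlap at 5 words
    and leave a gap between 15 and 20, so this is only a rough guide.
    """
    word_count = len(prompt.split())
    if word_count <= 5:
        return "Domain"
    if word_count <= 15:
        return "Scenario"
    return "Detailed"


if __name__ == "__main__":
    for example in (
        "Medical consultation call.",
        "Cardiology consultation about chest pain symptoms.",
    ):
        print(f"{prompt_level(example)}: {example}")
```

A check like this can be useful when prompts are generated from metadata, to flag ones that have grown longer than the use case needs.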

Keyterms prompting

Keyterms prompting lets you provide up to 1,000 words or phrases (maximum 6 words per phrase) through the keyterms_prompt parameter. This improves transcription accuracy for those terms and for related variations or contextually similar phrases.

The following example uses keyterms prompting to improve transcription accuracy for a name with distinctive spelling and formatting.

Without keyterms prompting:
Hi, this is Kelly Byrne Donahue
With keyterms prompting:
Hi, this is Kelly Byrne-Donoghue
import requests
import time

base_url = "https://api.assemblyai.com"
headers = {"authorization": "<YOUR_API_KEY>"}

# Request body: the audio to transcribe plus the terms to recognize accurately.
data = {
    "audio_url": "https://assemblyaiassets.com/audios/keyterms_prompting.wav",
    "language_detection": True,
    "keyterms_prompt": ["Kelly Byrne-Donoghue"]
}

# Submit the transcription request.
response = requests.post(base_url + "/v2/transcript", headers=headers, json=data)

if response.status_code != 200:
    print(f"Error: {response.status_code}, Response: {response.text}")
    response.raise_for_status()

transcript_response = response.json()
transcript_id = transcript_response["id"]
polling_endpoint = f"{base_url}/v2/transcript/{transcript_id}"

# Poll every 3 seconds until the transcript completes or fails.
while True:
    transcript = requests.get(polling_endpoint, headers=headers).json()
    if transcript["status"] == "completed":
        print(transcript["text"])
        break
    elif transcript["status"] == "error":
        raise RuntimeError(f"Transcription failed: {transcript['error']}")
    else:
        time.sleep(3)

Keyword count limits

While we support up to 1,000 key words and phrases, actual capacity may be lower due to internal tokenization and implementation constraints. Key points to remember:
  • Each word in a multi-word phrase counts towards the 1,000-keyword limit
  • Capitalization affects capacity (uppercase tokens consume more than lowercase)
  • Longer words consume more capacity than shorter words
For optimal results, use shorter phrases when possible and be mindful of your total token count when approaching the keyword limit.
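
The documented limits can be checked on the client before sending a request. The sketch below is a rough pre-check under stated assumptions: check_keyterms and both constants are hypothetical names, and words are counted by splitting on whitespace. Because real capacity depends on internal tokenization, a list that passes this check may still exceed the actual limit.

```python
from typing import List

MAX_TOTAL_WORDS = 1000      # documented overall limit; each word in a phrase counts
MAX_WORDS_PER_PHRASE = 6    # documented per-phrase limit


def check_keyterms(keyterms: List[str]) -> List[str]:
    """Return a list of problems found; an empty list means the check passed."""
    problems = []
    total_words = 0
    for term in keyterms:
        words = term.split()
        if not words:
            problems.append("empty keyterm")
            continue
        if len(words) > MAX_WORDS_PER_PHRASE:
            problems.append(
                f"{term!r} has {len(words)} words (limit {MAX_WORDS_PER_PHRASE})"
            )
        total_words += len(words)
    if total_words > MAX_TOTAL_WORDS:
        problems.append(f"{total_words} words in total (limit {MAX_TOTAL_WORDS})")
    return problems


if __name__ == "__main__":
    print(check_keyterms(["Kelly Byrne-Donoghue"]))
```

Returning a list of problems rather than raising on the first one makes it easier to report every offending term in a single pass.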

Need help?

If you’d like help building or optimizing a prompt for your audio, our team can help: open a live chat or email us through the widget in the bottom-right corner of the page.