Insights & Use Cases
February 3, 2026

Prompt engineering for Universal-3 Pro: A practical guide

Learn how to use prompt engineering with Universal-3 Pro to control transcription behavior and improve domain-specific accuracy without custom model training.

Martin Schweiger
Senior API Support Engineer

Universal-3 Pro combines the reliability of traditional ASR with the flexibility of instruction-following models. Through natural language prompting, you can control transcription behavior without post-processing pipelines or custom model training.

This guide shows you what you can control through prompting and how to do it effectively.

What you can control through prompting

Our team extensively tested more than 10,000 prompt variations, and Universal-3 Pro responds reliably to prompts in six key areas:

1. Context and domain adaptation

Describe your audio in plain language and the model optimizes accordingly. Simple context descriptions can improve Word Error Rate (WER) by approximately 2% on domain-specific content.

What works:

  • "This is medical audio"
  • "This is phone conversations"
  • "Short audio excerpt from corporate earnings call"

What doesn't work:

  • Overly verbose context descriptions
  • Vague instructions like "Use correct medical terminology"

Best practice: Keep context short and descriptive. One sentence is usually enough.
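
To give a sense of how a context description might travel with a request, here is a minimal sketch using AssemblyAI's transcript endpoint. The speech_model, prompt, and temperature field names are assumptions for illustration only; check the Universal-3 Pro documentation for the exact parameter names.

import requests

API_KEY = "YOUR_API_KEY"  # your AssemblyAI API key

# NOTE: "speech_model", "prompt", and "temperature" are assumed field names
# used for illustration only; confirm them in the Universal-3 Pro docs.
payload = {
    "audio_url": "https://example.com/earnings-call.mp3",
    "speech_model": "universal-3-pro",
    "prompt": "Short audio excerpt from corporate earnings call",
    "temperature": 0.0,
}

response = requests.post(
    "https://api.assemblyai.com/v2/transcript",
    json=payload,
    headers={"authorization": API_KEY},
)
print(response.json()["id"])  # transcript ID to poll for the finished result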

2. Verbatim-ness and disfluencies

Control whether you want clean, readable transcripts or full verbatim capture of every speech pattern.

For clean transcription:

Transcribe verbatim.

For capturing filler words and hesitations:

Transcribe verbatim. Include spoken filler words, hesitations, plus repetitions and false starts when clearly spoken.

For maximum verbatim (legal, compliance, linguistic analysis):

Transcribe verbatim:
- Fillers: yes (um, uh, like, you know)
- Repetitions: yes (I I, the the the)
- Stutters: yes (th-that, b-but)
- False starts: yes (I was- I went)
- Colloquial: yes (gonna, wanna, gotta)
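
If you reuse the maximum-verbatim configuration across many jobs, it can help to generate the prompt from a small template instead of copying it by hand. The sketch below is purely illustrative; the option names and example phrases simply mirror the list above.

VERBATIM_OPTIONS = {
    "Fillers": "um, uh, like, you know",
    "Repetitions": "I I, the the the",
    "Stutters": "th-that, b-but",
    "False starts": "I was- I went",
    "Colloquial": "gonna, wanna, gotta",
}

def build_verbatim_prompt(include=tuple(VERBATIM_OPTIONS)):
    # Assemble the "Transcribe verbatim:" prompt from the selected options.
    lines = ["Transcribe verbatim:"]
    for name in include:
        lines.append(f"- {name}: yes ({VERBATIM_OPTIONS[name]})")
    return "\n".join(lines)

print(build_verbatim_prompt())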

3. Spelling and terminology accuracy

The model learns error patterns from examples, not just specific words. When you show 2-3 error patterns, it applies that correction across ALL similar terms in your transcript.

Effective approach:

Non-negotiable: Pharmaceutical accuracy required (omeprazole not omeprizole, metformin not metforman)

This teaches the model to fix vowel substitution errors in pharmaceutical terms. Testing shows up to 45% accuracy improvements when using explicit examples.

What works:

  • Concrete examples showing the error pattern: (correct not incorrect)
  • 2-3 examples that demonstrate the same type of error
  • Authoritative language: "Non-negotiable:", "Mandatory:", "Required:"

What doesn't work:

  • Generic instructions without examples
  • Polite language like "Please try to" or "If possible"
  • Listing terms without showing the error pattern
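
Because the pattern is always the same, (correct not incorrect) examples are easy to assemble programmatically. The helper below is an illustrative sketch only; the function name and signature are not part of any API.

def terminology_instruction(domain, pairs, label="Non-negotiable"):
    # pairs: (correct, incorrect) spellings that demonstrate the error pattern
    examples = ", ".join(f"{correct} not {wrong}" for correct, wrong in pairs)
    return f"{label}: {domain} accuracy required ({examples})"

print(terminology_instruction(
    "Pharmaceutical",
    [("omeprazole", "omeprizole"), ("metformin", "metforman")],
))
# -> Non-negotiable: Pharmaceutical accuracy required
#    (omeprazole not omeprizole, metformin not metforman)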

4. Multiple languages and code-switching

Universal-3 Pro handles six languages (English, Spanish, German, French, Italian, Portuguese) with native code-switching capability.

Essential instruction:

Transcribe in the original language mix (code-switching), preserving the words in the language they are spoken.

Critical: Without this instruction, the model will translate or omit non-English content. Always include the code-switching instruction in your base prompt when working with multilingual audio.

5. Speaker information

Universal-3 Pro supports three types of speaker labeling through prompting:

Mark speaker turns.
Label speakers by gender (Male, Female).
Label speakers Doctor (interviewing a patient and giving medical advice) and Patient (describing their symptoms and asking questions about treatments).

Or for explicit roles:

Mark speakers by their role (Interviewer, Candidate).

6. Audio events and tags

Control how the model handles non-speech audio.

Basic audio tagging:

Tag sounds: [laughter], [silence], [noise]

Important: Audio event detection shows limited effectiveness compared to other capabilities. Use when you need to distinguish speech from non-speech audio, but don't expect highly detailed audio event capture.

The winning base prompt

After testing thousands of variations, this prompt consistently delivers strong results across all file types:

Transcribe this audio with beautiful punctuation and formatting. Include spoken filler words, hesitations, plus repetitions and false starts when clearly spoken. Use standard spelling and the most contextually correct spelling of all words and names, brands, drug names, medical terms, person names, and all proper nouns. Transcribe in the original language mix (code-switching), preserving the words in the language they are spoken.

Then append specific capabilities as needed from the sections above.

Temperature settings: Use 0.0

Default to temperature 0.0 for production. Our internal tests show that:

  • Temperature 0.0 performed best overall on real-world evaluation datasets
  • Temperature 0.1 showed a slight improvement on diarization metrics specifically
  • Higher temperatures make results unreliable and increase WER

Exception: If you're trying to make the model deviate significantly from basic transcription, temperatures 0.1-0.3 may help certain prompts. But for standard transcription accuracy, stick with 0.0.
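
Putting the base prompt and temperature together, here is a rough end-to-end sketch of the submit-and-poll flow against AssemblyAI's v2 transcript endpoint. The speech_model, prompt, and temperature request fields are assumptions for illustration; verify the exact names in the Universal-3 Pro documentation.

import time
import requests

API_KEY = "YOUR_API_KEY"  # your AssemblyAI API key
BASE_URL = "https://api.assemblyai.com/v2"
HEADERS = {"authorization": API_KEY}

BASE_PROMPT = (
    "Transcribe this audio with beautiful punctuation and formatting. "
    "Include spoken filler words, hesitations, plus repetitions and false starts "
    "when clearly spoken. Use standard spelling and the most contextually correct "
    "spelling of all words and names, brands, drug names, medical terms, person "
    "names, and all proper nouns. Transcribe in the original language mix "
    "(code-switching), preserving the words in the language they are spoken."
)

# NOTE: "speech_model", "prompt", and "temperature" are assumed request fields
# shown for illustration; confirm the exact names in the Universal-3 Pro docs.
payload = {
    "audio_url": "https://example.com/sample-call.mp3",
    "speech_model": "universal-3-pro",
    "prompt": BASE_PROMPT,
    "temperature": 0.0,
}

transcript = requests.post(f"{BASE_URL}/transcript", json=payload, headers=HEADERS).json()

# Poll until the transcript finishes processing.
while transcript["status"] not in ("completed", "error"):
    time.sleep(3)
    transcript = requests.get(
        f"{BASE_URL}/transcript/{transcript['id']}", headers=HEADERS
    ).json()

print(transcript.get("text") or transcript.get("error"))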

Best practices

Keep prompts concise:

  • 3-5 instructions maximum
  • 50-80 words is the sweet spot
  • Each instruction should be actionable and specific

Show, don't just tell:

  • Always include 2-3 concrete examples of error patterns
  • Format as: (correct not incorrect, correct2 not incorrect2)
  • Examples teach the model to recognize and fix entire categories of errors

Use authoritative language:

  • "Non-negotiable:", "Mandatory:", "Strict requirement:", "Required:"
  • Avoid polite, tentative phrasing

Build incrementally:

  • Start with the base prompt above
  • Add one capability at a time (speaker labels, audio tags, etc.)
  • Test each addition to verify it works as expected
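
One way to follow this workflow is to generate the prompt variants up front, each adding a single capability on top of the last. The sketch below is illustrative only; the capability strings are examples taken from earlier sections, and you would swap in your own.

# Each variant adds one capability on top of the previous one, so any change
# in output quality can be traced to a single instruction.
BASE_PROMPT = "Transcribe this audio with beautiful punctuation and formatting."  # use the full base prompt from above

CAPABILITIES = [
    "Mark speaker turns.",
    "Tag sounds: [laughter], [silence], [noise]",
    "Mandatory: Company name accuracy (Salesforce not sales force, HubSpot not hub spot)",
]

prompt_variants = []
current = BASE_PROMPT
for capability in CAPABILITIES:
    current = f"{current}\n{capability}"
    prompt_variants.append(current)

for i, variant in enumerate(prompt_variants, start=1):
    print(f"--- variant {i} ---\n{variant}\n")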

Industry-specific examples

Medical consultations

Transcribe this audio with beautiful punctuation and formatting. Include spoken filler words, hesitations, plus repetitions and false starts when clearly spoken. Use standard spelling and the most contextually correct spelling of all words and names, brands, drug names, medical terms, person names, and all proper nouns. Transcribe in the original language mix (code-switching), preserving the words in the language they are spoken.
Non-negotiable: Pharmaceutical accuracy required (omeprazole not omeprizole, metformin not metforman, acetaminophen not acetaminophin)
Label speakers Doctor and Patient.

Sales calls and customer support

Transcribe this audio with beautiful punctuation and formatting. Include spoken filler words, hesitations, plus repetitions and false starts when clearly spoken. Use standard spelling and the most contextually correct spelling of all words and names, brands, drug names, medical terms, person names, and all proper nouns.
Mandatory: Company name accuracy (Salesforce not sales force, HubSpot not hub spot)
Mark speaker turns.

Legal depositions

Transcribe verbatim:
- Fillers: yes (um, uh, like, you know)
- Repetitions: yes (I I, the the the)
- Stutters: yes (th-that, b-but)
- False starts: yes (I was- I went)
Non-negotiable: Legal terminology accuracy (plaintiff not plantiff, subpoena not supena)

Testing your prompts

This is a rough overview of how we tested our prompts:

  1. Use our Playground - We've made a ton of improvements to our Playground to help you iterate on your prompts quickly
  2. Start with the base prompt - Use the default formula above
  3. Add one capability - Append speaker labels OR audio tags OR specific terminology
  4. Test on 2-3 sample files - Verify the behavior matches your expectations
  5. Listen to verify - The model often captures verbatim transcription that humans miss. Don't assume errors without listening to the audio
  6. Iterate based on patterns - If you see consistent errors, identify the error type (vowel substitution, sound-alike confusion, etc.) and add examples
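
For step 6, a small script can surface recurring error patterns across your sample transcripts, telling you which (correct not incorrect) examples to add to the prompt. This is a purely illustrative sketch; the term list and transcripts are placeholders.

# Placeholder term list: correct spelling -> misspellings seen in past output.
KNOWN_TERMS = {
    "omeprazole": ["omeprizole"],
    "metformin": ["metforman"],
}

def find_recurring_errors(transcripts):
    # Count how often each known misspelling appears across the sample set.
    counts = {}
    for text in transcripts:
        lowered = text.lower()
        for correct, misspellings in KNOWN_TERMS.items():
            for wrong in misspellings:
                if wrong in lowered:
                    counts[(correct, wrong)] = counts.get((correct, wrong), 0) + 1
    return counts

sample_transcripts = [
    "The patient was prescribed omeprizole twice daily.",
    "Continue metforman 500 mg with meals.",
]

for (correct, wrong), n in find_recurring_errors(sample_transcripts).items():
    print(f"'{wrong}' appeared {n}x -> add example: ({correct} not {wrong})")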

Getting started

Universal-3 Pro eliminates the need for post-processing pipelines and custom model training. Start with the base prompt, add specific capabilities for your use case, and test with temperature 0.0.

For implementation details and API documentation, visit AssemblyAI's Universal-3 Pro documentation.

Test Universal-3 Pro in the playground

Test Universal-3 Pro for free in AssemblyAI's no-code playground
