Overview

This guide walks you through transcribing your first audio file with AssemblyAI. You will learn how to submit an audio file for transcription and retrieve the results using the AssemblyAI API.
Building a medical scribe or clinical documentation app? Check out the Medical Scribe guides for post-visit and real-time transcription workflows with Medical Mode, HIPAA-compliant configuration, and SOAP note generation.
When transcribing an audio file, there are three main things you will want to specify:
  1. The speech models you would like to use (required).
  2. The region you would like to use (optional).
  3. Other models you would like to use like Speaker Diarization or PII Redaction (optional).
speech_models is required
You must include the speech_models parameter in every transcription request. There is no default model for pre-recorded transcription. If you omit speech_models, the request will fail. See Model selection to learn about available models.
Recommended model
We recommend Universal-3 Pro for pre-recorded audio transcription. It delivers the highest accuracy and fastest transcription out of the box, with optional prompting for when you need more control. For the broadest language coverage (99 languages), use ["universal-3-pro", "universal-2"] to automatically fall back to Universal-2 for unsupported languages.
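As a rough illustration of the three settings above, the sketch below assembles the endpoint URL and request body in one place and rejects a missing speech_models list before any request is sent. The helper name build_transcript_request and the region labels "us" and "eu" are assumptions made for this example; they are not part of the AssemblyAI API.

```python
# Region labels are illustrative; the URLs are the two base URLs used in this guide.
REGION_BASE_URLS = {
    "us": "https://api.assemblyai.com",
    "eu": "https://api.eu.assemblyai.com",
}


def build_transcript_request(audio_url, speech_models, region="us", **options):
    """Return (endpoint_url, payload) for a transcription request.

    Hypothetical helper: it only builds the request, it does not send it.
    """
    if not speech_models:
        # The API has no default model, so fail early on the client side.
        raise ValueError("speech_models is required and must not be empty")
    if region not in REGION_BASE_URLS:
        raise ValueError(f"Unknown region: {region!r}")

    payload = {"audio_url": audio_url, "speech_models": list(speech_models)}
    payload.update(options)  # e.g. speaker_labels=True
    return REGION_BASE_URLS[region] + "/v2/transcript", payload
```

The returned pair can be passed to requests.post(endpoint_url, headers=headers, json=payload) once the credentials are configured.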

Prerequisites

Before you begin, make sure you have:
  • An AssemblyAI API key (get one by signing up at assemblyai.com)
  • Python 3.6 or later installed
  • The requests library (pip install requests)

Step 1: Set up your API credentials

First, configure your API endpoint and authentication:
import requests
import time

base_url = "https://api.assemblyai.com"
headers = {"authorization": "YOUR_API_KEY"}
Replace YOUR_API_KEY with your actual AssemblyAI API key.
Need EU data residency?
Use our EU endpoint by changing base_url to "https://api.eu.assemblyai.com".
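Hard-coding the key is fine for a quick test. For anything you commit or share, one common alternative is to read the key from an environment variable. The sketch below assumes a variable named ASSEMBLYAI_API_KEY and a helper named build_headers; both names are illustrative choices for this example, not something the API requires.

```python
import os


def build_headers(env_var="ASSEMBLYAI_API_KEY"):
    """Return request headers using an API key read from the environment.

    The variable name is an assumption for this sketch; use whatever
    name your own setup defines.
    """
    api_key = os.environ.get(env_var)
    if not api_key:
        raise RuntimeError(f"Set the {env_var} environment variable first")
    return {"authorization": api_key}
```

With the variable exported in your shell, headers = build_headers() can replace the hard-coded dictionary.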

Step 2: Specify your audio source

You can transcribe audio files in two ways:
Option A: Use a publicly accessible URL
audio_file = "https://assembly.ai/wildfires.mp3"
Option B: Upload a local file
If your audio file is stored locally, upload it to AssemblyAI first:
with open("./example.mp3", "rb") as f:
    response = requests.post(base_url + "/v2/upload", headers=headers, data=f)

    if response.status_code != 200:
        print(f"Error: {response.status_code}, Response: {response.text}")
        response.raise_for_status()

    upload_json = response.json()
    audio_file = upload_json["upload_url"]

Step 3: Submit the transcription request

Create a request with your audio URL and desired configuration options:
data = {
    "audio_url": audio_file,
    "speech_models": ["universal-3-pro", "universal-2"],
    "language_detection": True,
    "speaker_labels": True
}

response = requests.post(base_url + "/v2/transcript", headers=headers, json=data)

if response.status_code != 200:
    print(f"Error: {response.status_code}, Response: {response.text}")
    response.raise_for_status()

transcript_json = response.json()
transcript_id = transcript_json["id"]
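This configuration:
  • Uses Universal-3 Pro and falls back to Universal-2 for languages that Universal-3 Pro does not support.
  • Enables automatic language detection (language_detection).
  • Enables speaker labels (speaker_labels), which Step 5 uses for speaker diarization.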
Log the transcript ID for every request
The id field returned from POST /v2/transcript is the transcript ID. Persist it (along with a timestamp and the API region) for every transcription request, not just when you hit an error. The transcript ID is required to fetch results, retry, or delete the transcript later — and it’s the first thing support@assemblyai.com will ask for when troubleshooting a specific request. See Troubleshoot common errors for the full debugging flow.
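One lightweight way to persist the transcript ID is to append a record to a local JSON Lines file. This is only a sketch: the helper name log_transcript_request, the file name transcript_log.jsonl, and the record field names are all assumptions for illustration. In production you would more likely write to your existing logging or database layer.

```python
import json
from datetime import datetime, timezone


def log_transcript_request(transcript_id, base_url, log_path="transcript_log.jsonl"):
    """Append one JSON line per transcription request (hypothetical helper)."""
    record = {
        "transcript_id": transcript_id,
        # The base URL identifies which API region handled the request.
        "region_base_url": base_url,
        "logged_at": datetime.now(timezone.utc).isoformat(),
    }
    with open(log_path, "a", encoding="utf-8") as log_file:
        log_file.write(json.dumps(record) + "\n")
    return record
```

Calling log_transcript_request(transcript_id, base_url) right after the POST request succeeds keeps a record even for requests that never fail.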
Model Pricing
Pricing can vary based on the speech model used in the request.
If you already have an account with us, you can find your specific pricing on the Billing page of your dashboard. If you are a new customer, you can find general pricing information here.

Step 4: Poll for the transcription result

Transcription happens asynchronously. Poll the API until the transcription is complete:
polling_endpoint = f"{base_url}/v2/transcript/{transcript_id}"

while True:
    transcript = requests.get(polling_endpoint, headers=headers).json()

    if transcript["status"] == "completed":
        print(f"\nFull Transcript:\n\n{transcript['text']}")
        break
    elif transcript["status"] == "error":
        raise RuntimeError(f"Transcription failed: {transcript['error']}")
    else:
        time.sleep(3)
The polling loop checks the transcription status every 3 seconds and prints the full transcript once processing is complete.
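The loop above runs until the transcript completes or fails. If you would rather give up after a fixed time, a variant with a deadline might look like the sketch below. The function name wait_for_transcript, the 600-second default, and the injected fetch_transcript callable are assumptions for this example; the status handling mirrors the loop above.

```python
import time


def wait_for_transcript(fetch_transcript, timeout_seconds=600,
                        interval_seconds=3, sleep=time.sleep):
    """Poll until the transcript completes, errors, or the timeout elapses.

    fetch_transcript is any zero-argument callable that returns the
    transcript JSON as a dict. Injecting it keeps this sketch free of
    network code.
    """
    deadline = time.monotonic() + timeout_seconds
    while True:
        transcript = fetch_transcript()
        status = transcript["status"]
        if status == "completed":
            return transcript
        if status == "error":
            raise RuntimeError(f"Transcription failed: {transcript['error']}")
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"Transcript still '{status}' after {timeout_seconds} seconds"
            )
        sleep(interval_seconds)
```

With the variables from this step, a call might look like transcript = wait_for_transcript(lambda: requests.get(polling_endpoint, headers=headers).json()).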

Step 5: Access speaker diarization (optional)

If you enabled speaker labels, you can access the speaker-separated utterances:
for utterance in transcript['utterances']:
    print(f"Speaker {utterance['speaker']}: {utterance['text']}")
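The loop above assumes the utterances field is present. As a defensive sketch, the helper below treats a missing or null utterances field as an empty list, which is an assumption about how a response without speaker labels looks rather than documented behavior. The name format_utterances is illustrative.

```python
def format_utterances(transcript):
    """Return one 'Speaker X: text' line per utterance, or [] if there are none."""
    # Assumption: utterances may be absent or None when speaker labels are off.
    utterances = transcript.get("utterances") or []
    return [f"Speaker {u['speaker']}: {u['text']}" for u in utterances]
```

Printing the result is then a matter of print("\n".join(format_utterances(transcript))).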

Complete example

Here is the full working code:
import requests
import time

base_url = "https://api.assemblyai.com"
headers = {"authorization": "YOUR_API_KEY"}

# Use a publicly-accessible URL
audio_file = "https://assembly.ai/wildfires.mp3"

# Or upload a local file:
# with open("./example.mp3", "rb") as f:
#     response = requests.post(base_url + "/v2/upload", headers=headers, data=f)
#     if response.status_code != 200:
#         print(f"Error: {response.status_code}, Response: {response.text}")
#         response.raise_for_status()
#     upload_json = response.json()
#     audio_file = upload_json["upload_url"]

data = {
    "audio_url": audio_file,
    "speech_models": ["universal-3-pro", "universal-2"],
    "language_detection": True,
    "speaker_labels": True
}

response = requests.post(base_url + "/v2/transcript", headers=headers, json=data)

if response.status_code != 200:
    print(f"Error: {response.status_code}, Response: {response.text}")
    response.raise_for_status()

transcript_json = response.json()
transcript_id = transcript_json["id"]
polling_endpoint = f"{base_url}/v2/transcript/{transcript_id}"

while True:
    transcript = requests.get(polling_endpoint, headers=headers).json()
    if transcript["status"] == "completed":
        print(f"\nFull Transcript:\n\n{transcript['text']}")

        # Optionally print speaker diarization results
        # for utterance in transcript['utterances']:
        #     print(f"Speaker {utterance['speaker']}: {utterance['text']}")
        break
    elif transcript["status"] == "error":
        raise RuntimeError(f"Transcription failed: {transcript['error']}")
    else:
        time.sleep(3)

Next steps

Now that you have transcribed your first audio file, check out the full API reference documentation for more information.