
Overview

By the end of this guide, you’ll have a working script that transcribes an audio file in a single SDK call. Build it with an AI coding agent, or write it yourself — both are below. Prefer to try it first? Transcribe audio without writing any code in the AssemblyAI Playground.

Before you begin

You’ll need:
  • An API key — grab one from your dashboard. Every example below reads it from an environment variable, so set it once:
    export ASSEMBLYAI_API_KEY=<your-key>
    
  • Python 3.8+ or Node.js 18+, depending on which SDK you use.
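If the variable is missing, os.environ["ASSEMBLYAI_API_KEY"] fails with a bare KeyError. A minimal sketch of a fail-fast check follows; the helper name load_api_key is hypothetical and not part of any SDK:

```python
import os


def load_api_key(env=os.environ):
    """Return the AssemblyAI key from the environment, or raise a clear error.

    load_api_key is a hypothetical helper for this guide, not an SDK function.
    """
    key = env.get("ASSEMBLYAI_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "ASSEMBLYAI_API_KEY is not set. Run: export ASSEMBLYAI_API_KEY=<your-key>"
        )
    return key


# Demo with a stand-in mapping instead of the real environment.
print(load_api_key({"ASSEMBLYAI_API_KEY": "example-key"}))
```

In a real script you would call load_api_key() with no arguments and assign the result to aai.settings.api_key.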
Building with an AI coding agent? Wire it up to AssemblyAI’s live docs (MCP server) and the AssemblyAI skill so it writes correct, up-to-date code instead of relying on stale training data:
claude mcp add --transport http --scope user assemblyai-docs https://mcp.assemblyai.com/docs
npx skills add AssemblyAI/assemblyai-skill --global
Then describe what you want to build. To get the same result as the steps below, paste:
Use the AssemblyAI Python SDK to transcribe https://assembly.ai/wildfires.mp3 and print the transcript text.

Transcribe your first file

Prefer to write it yourself? Follow these steps to transcribe our hosted sample file. The SDK uploads, submits, and polls for you in a single call.

Step 1: Install the SDK

pip install assemblyai

Step 2: Run your first transcription

Save this as transcribe.py (Python) or transcribe.js (JavaScript):
import os
import assemblyai as aai

aai.settings.api_key = os.environ["ASSEMBLYAI_API_KEY"]

transcript = aai.Transcriber().transcribe("https://assembly.ai/wildfires.mp3")
print(transcript.text)
Then run it — python transcribe.py or node transcribe.js. You’ll see the transcript printed:
Smoke from hundreds of wildfires in Canada is triggering air quality alerts throughout the US...
That’s the whole first call. From here you can add options such as speaker labels, language detection, or a local file. See the complete example to combine them, or use the HTTP API directly if you’re not using an SDK.

Customize your request

The call above works with no extra configuration. Add capabilities by setting options on the same request — combine as many as you need (the complete example sets several at once).

Transcribe a local file

Pass a file path instead of a URL; the SDK uploads it for you.
transcript = aai.Transcriber().transcribe("./example.mp3")

Identify speakers

Enable Speaker Diarization to split the transcript by speaker. Each labeled segment (an utterance) has a speaker ID and its text.
config = aai.TranscriptionConfig(speaker_labels=True)
transcript = aai.Transcriber().transcribe("https://assembly.ai/wildfires.mp3", config=config)

for utterance in transcript.utterances:
    print(f"Speaker {utterance.speaker}: {utterance.text}")
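Once you have utterances, you can aggregate them. The sketch below totals speaking time per speaker. It assumes each utterance also exposes start and end in milliseconds (as in the response shown under What you get back), and it uses stand-in objects with made-up timings so it runs without an API call:

```python
from collections import defaultdict
from types import SimpleNamespace


def speaking_time_ms(utterances):
    """Sum (end - start) per speaker; start and end are in milliseconds."""
    totals = defaultdict(int)
    for u in utterances:
        totals[u.speaker] += u.end - u.start
    return dict(totals)


# Stand-in utterances (illustrative values), shaped like transcript.utterances.
sample = [
    SimpleNamespace(speaker="A", start=100, end=26560, text="..."),
    SimpleNamespace(speaker="B", start=27000, end=30000, text="..."),
    SimpleNamespace(speaker="A", start=30500, end=31500, text="..."),
]

for speaker, ms in sorted(speaking_time_ms(sample).items()):
    print(f"Speaker {speaker}: {ms / 1000:.1f} s")
```

With a real transcript, pass transcript.utterances in place of sample.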

Detect the language automatically

Use Automatic Language Detection to detect the dominant spoken language.
config = aai.TranscriptionConfig(language_detection=True)
transcript = aai.Transcriber().transcribe("https://assembly.ai/wildfires.mp3", config=config)

Complete example

Here’s the complete, runnable script — the call above plus options and error handling:
import os
import assemblyai as aai

aai.settings.base_url = "https://api.assemblyai.com"
aai.settings.api_key = os.environ["ASSEMBLYAI_API_KEY"]

# Use a publicly-accessible URL
audio_file = "https://assembly.ai/wildfires.mp3"

# Or use a local file:
# audio_file = "./example.mp3"

config = aai.TranscriptionConfig(
    speech_models=["universal-3-pro", "universal-2"],
    language_detection=True,
    speaker_labels=True,
)

transcript = aai.Transcriber().transcribe(audio_file, config=config)

if transcript.status == aai.TranscriptStatus.error:
    raise RuntimeError(f"Transcription failed: {transcript.error}")

# Log transcript.id for every request (not just errors), with a timestamp and API region.
# It's required to fetch results, retry, or delete the transcript later, and it's the first
# thing support@assemblyai.com asks for. Delete: /pre-recorded-audio/delete-transcripts
# Troubleshooting: /pre-recorded-audio/guides/common_errors_and_solutions

print(f"\nFull Transcript:\n\n{transcript.text}")

# Optionally print speaker diarization results
# for utterance in transcript.utterances:
#     print(f"Speaker {utterance.speaker}: {utterance.text}")
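The comment in the script recommends logging transcript.id with a timestamp and API region for every request. One way to do that is a JSON line per request, sketched below. The function name log_transcript, the record field names, and the region label "us" are assumptions for illustration, not an AssemblyAI convention:

```python
import json
import sys
from datetime import datetime, timezone


def log_transcript(transcript_id, status, region="us", stream=sys.stderr):
    """Write one JSON log line per request and return the record.

    The field names and the default region label are hypothetical.
    """
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "transcript_id": transcript_id,
        "status": status,
        "region": region,
    }
    print(json.dumps(record), file=stream)
    return record


# Demo using the sample id from this guide.
log_transcript("106993b6-ac12-45d0-b74a-1bbd923e755d", "completed")
```

In the complete example you would call it right after transcribe() returns, before checking the status, so failed requests are logged too.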

What you get back

A completed transcript includes the full text plus metadata, and per-speaker utterances when you enable speaker_labels. The SDK exposes these as attributes (transcript.text, transcript.utterances[0].speaker); the raw API returns the same fields as JSON:
{
  "id": "106993b6-ac12-45d0-b74a-1bbd923e755d",
  "status": "completed",
  "text": "Smoke from hundreds of wildfires in Canada is triggering air quality alerts...",
  "language_code": "en",
  "audio_duration": 282,
  "confidence": 0.95,
  "utterances": [
    {
      "speaker": "A",
      "text": "Smoke from hundreds of wildfires in Canada is triggering air quality alerts...",
      "confidence": 0.97,
      "start": 100,
      "end": 26560,
      "words": [
        { "text": "Smoke", "start": 100, "end": 640, "confidence": 0.9, "speaker": "A" }
      ]
    }
  ]
}
start and end are in milliseconds. Persist id to fetch, retry, or delete the transcript later. See the transcript API reference for the complete field list.
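As a rough illustration of working with the raw JSON, the sketch below parses a trimmed copy of the response above and prints each utterance with readable timestamps. The helpers format_ms and utterance_lines are hypothetical names introduced here:

```python
import json

# Trimmed copy of the sample response shown above.
RAW = """
{
  "id": "106993b6-ac12-45d0-b74a-1bbd923e755d",
  "status": "completed",
  "language_code": "en",
  "utterances": [
    {
      "speaker": "A",
      "text": "Smoke from hundreds of wildfires in Canada is triggering air quality alerts...",
      "start": 100,
      "end": 26560
    }
  ]
}
"""


def format_ms(ms):
    """Convert milliseconds to an m:ss.mmm string."""
    minutes, rest = divmod(ms, 60_000)
    seconds, millis = divmod(rest, 1000)
    return f"{minutes}:{seconds:02d}.{millis:03d}"


def utterance_lines(response):
    """Build one printable line per utterance; tolerate a missing or null list."""
    return [
        f"[{format_ms(u['start'])} - {format_ms(u['end'])}] "
        f"Speaker {u['speaker']}: {u['text']}"
        for u in response.get("utterances") or []
    ]


response = json.loads(RAW)
print(f"{response['id']} ({response['language_code']}, {response['status']})")
for line in utterance_lines(response):
    print(line)
```

The "or []" guard is there on the assumption that utterances can be absent or null when speaker_labels is not enabled.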

Using the HTTP API directly

Not using an SDK? The same flow works over plain HTTP: authenticate with your key in the authorization header (no Bearer prefix), submit to POST /v2/transcript, then poll by repeatedly calling GET /v2/transcript/{id} until the status is completed or error. The SDKs above do all of this for you, including uploading local files and polling.
All three examples read your key from the same ASSEMBLYAI_API_KEY environment variable you set in Before you begin. The cURL example also needs jq (brew install jq); the Python example needs the requests library (pip install requests); the JavaScript example needs Node.js 18+ (built-in fetch).
Submit the file, poll until the status is completed, then print the text. (The variable is named state because zsh reserves status.)
id=$(curl -s -X POST https://api.assemblyai.com/v2/transcript \
  -H "authorization: $ASSEMBLYAI_API_KEY" \
  -H "content-type: application/json" \
  -d '{
    "audio_url": "https://assembly.ai/wildfires.mp3",
    "speech_models": ["universal-3-pro", "universal-2"],
    "language_detection": true,
    "speaker_labels": true
  }' | jq -r .id)

while true; do
  state=$(curl -s https://api.assemblyai.com/v2/transcript/$id \
    -H "authorization: $ASSEMBLYAI_API_KEY" | jq -r .status)
  [ "$state" = "completed" ] && break
  [ "$state" = "error" ] && { echo "Transcription failed"; break; }
  sleep 3
done

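# Print the text only if the job completed; on error the loop above exits with state=error.
[ "$state" = "completed" ] && curl -s https://api.assemblyai.com/v2/transcript/$id \
  -H "authorization: $ASSEMBLYAI_API_KEY" | jq -r .text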
To transcribe a local file, upload it first and use the returned upload_url as the audio_url:
curl -s -X POST https://api.assemblyai.com/v2/upload \
  -H "authorization: $ASSEMBLYAI_API_KEY" \
  --data-binary @./example.mp3 | jq -r .upload_url
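The same submit-and-poll flow can be sketched in Python. This version uses only the standard library (urllib) rather than requests, so it has no dependencies. The helper names api_request, wait_for_transcript, and transcribe are hypothetical, and the sketch assumes the transcript JSON carries an error field when the status is error:

```python
import json
import os
import time
import urllib.request

BASE_URL = "https://api.assemblyai.com"


def api_request(method, path, api_key, payload=None):
    """Send one JSON request to the API and return the decoded JSON body."""
    data = json.dumps(payload).encode() if payload is not None else None
    req = urllib.request.Request(
        f"{BASE_URL}{path}",
        data=data,
        method=method,
        # The key goes in the authorization header as-is (no Bearer prefix).
        headers={"authorization": api_key, "content-type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def wait_for_transcript(fetch, transcript_id, interval=3.0, sleep=time.sleep):
    """Call fetch(transcript_id) until the status is completed; raise on error."""
    while True:
        result = fetch(transcript_id)
        if result["status"] == "completed":
            return result
        if result["status"] == "error":
            raise RuntimeError(f"Transcription failed: {result.get('error')}")
        sleep(interval)


def transcribe(audio_url, api_key):
    """Submit audio_url to POST /v2/transcript, then poll GET /v2/transcript/{id}."""
    job = api_request("POST", "/v2/transcript", api_key, {"audio_url": audio_url})

    def fetch(transcript_id):
        return api_request("GET", f"/v2/transcript/{transcript_id}", api_key)

    return wait_for_transcript(fetch, job["id"])


if __name__ == "__main__":
    key = os.environ.get("ASSEMBLYAI_API_KEY")
    if key:
        print(transcribe("https://assembly.ai/wildfires.mp3", key)["text"])
    else:
        print("Set ASSEMBLYAI_API_KEY to run the live request.")
```

Passing fetch and sleep into wait_for_transcript keeps the polling loop separate from the network code, so it can be exercised with canned responses.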

Limits

  • File size: up to 5 GB per request (/v2/transcript); local files uploaded via /v2/upload up to 2.2 GB.
  • Duration: 160 ms to 10 hours per file.
  • Formats: most common audio and video formats — submit your file as-is, no transcoding needed.
  • Concurrency: default 5 parallel jobs on free accounts, 200 on paid. Check yours on the rate limits page.
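A pre-flight check against these limits can catch oversized or out-of-range files before you submit them. The sketch below is an assumption-laden illustration: limit_problems is a hypothetical helper, and the byte constants assume decimal gigabytes (1 GB = 1,000,000,000 bytes), which the limits above do not specify:

```python
# Limits from the list above; byte values assume decimal gigabytes.
MAX_UPLOAD_BYTES = 2_200_000_000   # local files sent via /v2/upload
MAX_REQUEST_BYTES = 5_000_000_000  # files submitted to /v2/transcript
MIN_DURATION_MS = 160
MAX_DURATION_MS = 10 * 60 * 60 * 1000  # 10 hours


def limit_problems(size_bytes, duration_ms=None, local_upload=True):
    """Return a list of human-readable limit violations (empty if none)."""
    problems = []
    max_bytes = MAX_UPLOAD_BYTES if local_upload else MAX_REQUEST_BYTES
    if size_bytes > max_bytes:
        problems.append(f"file is {size_bytes} bytes; limit is {max_bytes}")
    if duration_ms is not None and not (MIN_DURATION_MS <= duration_ms <= MAX_DURATION_MS):
        problems.append(
            f"duration is {duration_ms} ms; allowed range is "
            f"{MIN_DURATION_MS} to {MAX_DURATION_MS} ms"
        )
    return problems


# Demo with made-up values: a 3 GB local file that is 90 seconds long.
for problem in limit_problems(3_000_000_000, duration_ms=90_000):
    print(problem)
```

For a real local file, os.path.getsize(path) gives the size in bytes; the duration has to come from your own metadata or an audio library.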

Next steps

Now that you have transcribed your first audio file, check out the full API reference documentation for more information.

Need some help?

If you get stuck, or have any other questions, we’d love to help you out. Contact our support team at support@assemblyai.com or create a support ticket.