Transcribe an audio file
Learn how to transcribe and analyze an audio file.
Overview
By the end of this tutorial, you'll be able to:
- Transcribe an audio file.
- Enable Speaker diarization to detect speakers in an audio file.
Here's the full sample code for what you'll build in this tutorial:
Step 1: Install the SDK
Step 2: Configure the SDK
In this step, you'll create an SDK client and configure it to use your API key.
- 1
Browse to , and then click the text under Your API key to copy it.
- 2
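As a minimal sketch of this step, assuming the Python SDK (the `assemblyai` package): the helper below reads the key from an environment variable instead of hard-coding it. The variable name `ASSEMBLYAI_API_KEY` and both function names are illustrative choices, not something the SDK requires.

```python
import os


def load_api_key(env=None, var_name="ASSEMBLYAI_API_KEY"):
    """Return the API key stored in an environment variable, or raise if unset.

    The variable name is an assumption made for this sketch.
    """
    env = os.environ if env is None else env
    key = (env.get(var_name) or "").strip()
    if not key:
        raise RuntimeError(f"Set {var_name} to the API key you copied.")
    return key


def create_transcriber():
    """Configure the SDK and return a client (requires `pip install assemblyai`)."""
    import assemblyai as aai  # imported lazily so load_api_key works without the SDK

    aai.settings.api_key = load_api_key()
    return aai.Transcriber()
```

Keeping the key out of the source file makes it harder to commit it to version control by accident.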
Step 3: Submit audio for transcription
In this step, you'll submit the audio file for transcription and wait until it completes. Transcribing an audio file typically takes 15–30% of the audio duration.
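To make the 15–30% figure concrete, here is a small arithmetic sketch. The function name `estimate_processing_seconds` is hypothetical and not part of any SDK; it only applies the percentages stated above.

```python
def estimate_processing_seconds(audio_seconds):
    """Return (low, high) processing-time estimates in seconds.

    Applies the 15-30% range quoted in this tutorial; actual times may vary.
    """
    if audio_seconds < 0:
        raise ValueError("audio_seconds must be non-negative")
    low = audio_seconds * 15 / 100
    high = audio_seconds * 30 / 100
    return low, high


low, high = estimate_processing_seconds(10 * 60)  # a 10-minute recording
print(f"Expect roughly {low:.0f}-{high:.0f} seconds of processing")
```

For a 10-minute file, this works out to between 90 and 180 seconds.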
- 1
Specify a URL to the audio you want to transcribe. The URL needs to be accessible from AssemblyAI's servers. For a list of supported formats, refer to the FAQ.
Local audio files: If you already have an audio file on your computer, you can also specify a local path, for example ./5_common_sports_injuries.mp3.
YouTube: YouTube URLs are not supported. If you want to transcribe a YouTube video, you need to download the audio first.
- 2
To generate the transcript, pass the audio URL to transcribe(). This may take a minute while we're processing the audio.
- 3
If the transcription failed, the status of the transcription will be set to error. To see why it failed, you can print the value of error.
- 4
Print the complete transcript.
- 5
Run the application and wait for it to finish.
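The steps above can be sketched as follows, assuming the Python SDK. `check_audio_source` and `transcribe_or_raise` are hypothetical helpers written for this example. The YouTube host list is an illustrative assumption and may not be exhaustive. The status check accepts either a plain string or an enum whose value is `"error"`, since the exact type is not specified here.

```python
from urllib.parse import urlparse

# Illustrative list of YouTube hosts; an assumption, not an official list.
YOUTUBE_HOSTS = {"youtube.com", "www.youtube.com", "m.youtube.com", "youtu.be"}


def check_audio_source(source):
    """Classify a source as 'url' or 'local'; reject YouTube links."""
    parsed = urlparse(source)
    if parsed.scheme in ("http", "https"):
        if (parsed.hostname or "").lower() in YOUTUBE_HOSTS:
            raise ValueError("YouTube URLs are not supported; download the audio first.")
        return "url"
    return "local"  # anything else is treated as a path on disk


def transcribe_or_raise(transcriber, source):
    """Transcribe `source` and return the text, raising if the job failed."""
    check_audio_source(source)
    transcript = transcriber.transcribe(source)  # waits until the job finishes
    status = getattr(transcript.status, "value", transcript.status)
    if status == "error":
        raise RuntimeError(f"Transcription failed: {transcript.error}")
    return transcript.text


def main():
    """Run against the real service (requires `pip install assemblyai`)."""
    import assemblyai as aai  # imported lazily; not needed by the helpers above

    aai.settings.api_key = "YOUR_API_KEY"  # placeholder; use your own key
    print(transcribe_or_raise(aai.Transcriber(), "./5_common_sports_injuries.mp3"))
```

Call `main()` once the SDK is installed and the placeholder key is replaced. Raising an exception on failure, rather than only printing the error, lets the calling code decide whether to retry or stop.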
You've successfully transcribed your first audio file. You can see all submitted transcription jobs in the .
Step 4: Enable additional AI models
You can extract even more insights from the audio by enabling any of our AI models using transcription options. In this step, you'll enable the Speaker diarization model to detect who said what.
- 1
- 2
In addition to the full transcript, you now have access to utterances from each speaker.
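A minimal sketch of reading those utterances, assuming the Python SDK, where Speaker diarization is enabled through `TranscriptionConfig(speaker_labels=True)` and each utterance exposes `speaker` and `text`. The helper names `format_utterances` and `transcribe_with_speakers` are hypothetical.

```python
def format_utterances(utterances):
    """Return one 'Speaker X: text' line per utterance."""
    return [f"Speaker {u.speaker}: {u.text}" for u in (utterances or [])]


def transcribe_with_speakers(audio_source):
    """Transcribe with Speaker diarization enabled.

    Requires `pip install assemblyai` and a configured API key.
    """
    import assemblyai as aai  # imported lazily; format_utterances works without it

    config = aai.TranscriptionConfig(speaker_labels=True)
    transcript = aai.Transcriber().transcribe(audio_source, config=config)
    return format_utterances(transcript.utterances)
```

Printing the result of `transcribe_with_speakers(...)` line by line gives a simple who-said-what view of the recording.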
Many of the properties in the transcript object only become available after you enable the corresponding model. For more information, refer to the models under Speech-to-Text and Audio Intelligence.
Next steps
In this tutorial, you've learned how to generate a transcript for an audio file and how to extract speaker information by enabling the Speaker diarization model.
Want to learn more?
- For more ways to analyze your audio data, explore our Audio Intelligence models.
- If you want to transcribe audio in real time, see Using real-time streaming.
- To search, summarize, and ask questions about your transcripts, see Processing audio with LLMs using LeMUR.
Need some help?
If you get stuck, or have any other questions, we'd love to help you out. Ask our support team in our Discord server.