Python Speech-to-Text with Punctuation, Casing, and Formatting
Learn how to transcribe audio and video files into text that contains punctuation, casing and formatting using the AssemblyAI Python SDK.



When you need to convert speech to text in Python, you'll find a lot of options. Many tutorials point to basic open-source libraries that are great for simple projects, but as industry analysis shows, they often fall short in production environments where accuracy, speed, and reliability are critical. Getting clean, formatted text with proper punctuation and casing is often a significant challenge that requires extra work.
This tutorial cuts through the noise. We'll show you how to get production-ready transcripts using the AssemblyAI Python SDK. You'll learn how to transcribe audio files and get back text that's already formatted with punctuation and casing, saving you significant post-processing time. We'll cover everything from setting up your environment to handling different audio sources and preparing your code for real-world applications.
Project Prerequisites
Python speech-to-text requires three core components: Python 3.8+, the AssemblyAI SDK, and an API key.
Here's what you'll need:
- Python 3.8 or newer
- The AssemblyAI Python SDK
- An AssemblyAI API key, which can be copied from the AssemblyAI dashboard
All code in this blog post is available open source under the MIT license on GitHub under the transcribe-english-punctuation directory of the assemblyai-examples Git repository. You may therefore use the source code as you desire for your own projects.
Python Development Environment Configuration
Change into the directory where you keep your Python virtual environments. Create a new virtualenv for this project using the following command:
python -m venv transcriber
Activate the virtual environment with the activation script on macOS or Linux:
source ./transcriber/bin/activate
On Windows, the virtual environment can be activated with the activate.bat script:
.\transcriber\Scripts\activate.bat
Install the AssemblyAI Python SDK package:
pip install assemblyai
Log into your AssemblyAI account, or sign up for a new account if you don't already have one. Find and click the "Copy token" button on the dashboard.
Switch back to your console and set the ASSEMBLYAI_API_KEY environment variable to the token you just copied. On Windows, you can set an environment variable through the Windows settings. On macOS or Linux, run this command, pasting the token between the quotes:
export ASSEMBLYAI_API_KEY='' # paste this value from assemblyai.com/dashboard
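Before going further, it can help to confirm that the variable is actually visible to Python, since a missing key otherwise only shows up later as a failed API call. The following is a minimal sketch; `require_api_key` is a hypothetical helper written for this tutorial and is not part of the AssemblyAI SDK.

```python
import os


def require_api_key(env=None):
    """Return the API key from the environment, or raise a clear error.

    Hypothetical helper for this tutorial; not an AssemblyAI SDK function.
    """
    # Fall back to the real process environment when no mapping is passed in.
    env = os.environ if env is None else env
    key = env.get("ASSEMBLYAI_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "ASSEMBLYAI_API_KEY is not set; export it before running the script."
        )
    return key
```

Calling `require_api_key()` at the top of a script fails fast with a readable message instead of an authentication error partway through a run.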
Audio Input Options for Python Speech-to-Text
Before we write our transcription script, it's important to know what kind of audio sources you can work with. The AssemblyAI Python SDK is flexible and supports several common scenarios that developers face.
- Local audio files: You can directly upload and transcribe audio files (like MP3, WAV, M4A, and more) from your local machine.
- Remote audio files: If your file is hosted online, you can provide a public URL for transcription, which is the method we'll use in our main example.
- Real-time audio streams: For applications that need immediate transcription, like live captioning or voice command systems, you can use our streaming transcription model. While this tutorial focuses on file-based transcription, our documentation provides clear guidance for real-time use cases.
This flexibility means you can use the same core logic whether you're building an asynchronous file processing pipeline or an interactive voice application.
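Because a remote URL and a local path are both passed as plain strings, a small pre-flight check can catch typos before any upload starts. This is a sketch under stated assumptions: `classify_source` is a hypothetical helper, not an SDK function, and it only treats http and https addresses as remote sources.

```python
from pathlib import Path
from urllib.parse import urlparse


def classify_source(source):
    """Return "url" for http(s) sources and "file" for existing local files.

    Hypothetical helper for this tutorial; not an AssemblyAI SDK function.
    """
    scheme = urlparse(source).scheme.lower()
    if scheme in ("http", "https"):
        return "url"
    # Anything else is treated as a local path and must exist on disk.
    if Path(source).is_file():
        return "file"
    raise FileNotFoundError(f"Not a URL or an existing local file: {source}")
```

Either kind of source can then be handed to the same transcription call, so the check stays separate from the transcription logic itself.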
Transcription Python Code
Our dependencies are installed and the environment variable is set, so next we will write the Python code to handle the transcription.
Our transcription script will be compact because the AssemblyAI Python SDK abstracts the calls to the AssemblyAI API and reduces the amount of boilerplate code required. Create a new file named transcriber.py and copy in the following code along with the comments for each line.
import assemblyai as aai
# The URL of the file we want to transcribe
URL = "https://talkpython.fm/episodes/download/356/tips-for-ml-ai-startups.mp3"
# The file to write the results to, or leave empty to print to console
OUTPUT_FILENAME = "podcast.txt"
# Instantiate the AssemblyAI Python SDK.
# It will automatically use the ASSEMBLYAI_API_KEY environment variable.
# You can also set it manually with: aai.settings.api_key = "YOUR_API_KEY"
transcriber = aai.Transcriber()
# Call the API. This will block until the API call finishes.
# Punctuation and formatting are enabled by default.
transcript = transcriber.transcribe(URL)
# Write to a file or print to console if no OUTPUT_FILENAME is set
if OUTPUT_FILENAME:
    with open(OUTPUT_FILENAME, "w") as file:
        file.write(transcript.text)
else:
    print(transcript.text)
The code imports assemblyai as aai and sets two key variables:
- URL: The audio file location (remote URL or local path)
- OUTPUT_FILENAME: Where to save results (optional; prints to console if empty)
We're transcribing a Talk Python episode as our example audio source.
The code is simple because AssemblyAI enables key formatting features like automatic punctuation and casing by default. You don't need to pass any special configuration to get a production-ready transcript.
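If you prefer to state the formatting behavior explicitly, or want to switch it off for comparison, the options can be passed through a configuration object. The sketch below assumes that the SDK's `aai.TranscriptionConfig` accepts `punctuate` and `format_text` keyword arguments, mirroring the API parameters of the same names; check the SDK reference for your installed version. The function names `build_formatting_options` and `transcribe_with_formatting` are hypothetical.

```python
def build_formatting_options(punctuate=True, format_text=True):
    """Collect the two formatting switches as keyword arguments.

    The defaults mirror the tutorial's statement that both are on by default.
    """
    return {"punctuate": punctuate, "format_text": format_text}


def transcribe_with_formatting(url, **options):
    """Transcribe `url` with explicit formatting options.

    Needs network access, the assemblyai package, and a valid API key.
    """
    # Imported lazily so build_formatting_options works without the SDK installed.
    import assemblyai as aai

    # Assumption: TranscriptionConfig takes punctuate/format_text keywords.
    config = aai.TranscriptionConfig(**build_formatting_options(**options))
    return aai.Transcriber().transcribe(url, config=config)
```

For example, `transcribe_with_formatting(URL, format_text=False)` would request a transcript without text formatting, which is useful mainly for seeing what the defaults do for you.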
The Transcriber object handles API authentication and requests.
API key priority:
- Automatic: Reads from the ASSEMBLYAI_API_KEY environment variable
- Manual: Set via aai.settings.api_key = "your_key"
The transcribe() method processes your audio synchronously, blocking until the transcription is complete.
For asynchronous processing, use webhooks instead.
Output handling: saves to file if OUTPUT_FILENAME is set, otherwise prints to console.
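In a real application, the output step is also a natural place to check whether the transcription succeeded before writing anything. The following sketch assumes the transcript object exposes an `error` attribute that is empty on success and a `text` attribute holding the result; `save_transcript` is a hypothetical helper, and it writes UTF-8 explicitly so the output does not depend on the platform's default encoding.

```python
def save_transcript(transcript, output_filename=""):
    """Write transcript text to a file, or print it when no filename is given.

    Hypothetical helper; assumes `transcript` has `text` and `error` attributes.
    """
    error = getattr(transcript, "error", None)
    if error:
        # Stop here: on failure there is no usable text to write.
        raise RuntimeError(f"Transcription failed: {error}")
    if output_filename:
        with open(output_filename, "w", encoding="utf-8") as file:
            file.write(transcript.text)
    else:
        print(transcript.text)
    return transcript.text
```

Replacing the final if/else of the script with `save_transcript(transcript, OUTPUT_FILENAME)` keeps the same behavior on success while surfacing failures as an exception.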
Running the Transcription Code
Ensure the transcriber.py file is saved and that your virtual environment is still activated. Navigate to the project directory in a terminal and run the transcriber.py file with the following command:
python transcriber.py
Once the script has finished executing, the transcription is either printed to your console or written to a file (by default, podcast.txt), with the following contents:
Have you been considering launching a product or even a business based on Python's AI ML Stack? We have a great guest on this episode, Dylan Fox, who is the founder of AssemblyAI and has been building his startup successfully over the past few years. He has interesting stories of hundreds of GPUs in the cloud, evolving ML models, and much more that I know you can enjoy hearing....






