Python Speech-to-Text with Punctuation, Casing, and Formatting

Learn how to transcribe audio and video files into text that contains punctuation, casing and formatting using the AssemblyAI Python SDK.

Automatically generated transcripts of audio and video files are often more useful when the transcription result includes punctuation, casing, and formatting. This tutorial shows which parameters to set in the AssemblyAI Python SDK to obtain transcripts with all three.

Project Prerequisites

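We will use the following dependencies to complete this tutorial: Python 3, the AssemblyAI Python SDK (version 0.3.2 or later), and an AssemblyAI account, which provides the API token.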

All code in this blog post is available as open source under the MIT license on GitHub, in the transcribe-english-punctuation directory of the assemblyai-examples Git repository, so you may use it as you like in your own projects.

Python Development Environment Configuration

Change into the directory where you keep your Python virtual environments. Create a new virtualenv for this project using the following command:

python -m venv transcriber

Activate the virtual environment with the activation script on macOS or Linux:

source ./transcriber/bin/activate

On Windows, the virtual environment can be activated with the activate.bat script:

.\transcriber\Scripts\activate.bat

Install the AssemblyAI Python SDK package:

pip install -U "assemblyai>=0.3.2"
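
Before moving on, it can help to confirm that the virtual environment is active and that the package was installed into it. The following is a minimal sketch that uses only the Python standard library; the helper names in_virtualenv and installed_version are our own for illustration and are not part of the AssemblyAI SDK.

```python
import sys
from importlib import metadata


def in_virtualenv() -> bool:
    """Return True when the interpreter is running inside a virtual environment."""
    # Inside a venv, sys.prefix points at the environment while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


def installed_version(package: str):
    """Return the installed version string of a package, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    print("virtual environment active:", in_virtualenv())
    print("assemblyai version:", installed_version("assemblyai"))
```

If the second line prints None, the package was most likely installed into a different Python environment than the one running the check.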

Log in to your AssemblyAI account, or sign up for a new account if you don't already have one. Find and click the "Copy token" button on the dashboard.

Switch back to your console and paste the API token in as the value of the ASSEMBLYAI_API_KEY environment variable. If you are on Windows, you can set an environment variable through the Windows settings. If you are on macOS or Linux, run this command to set the ASSEMBLYAI_API_KEY environment variable:

export ASSEMBLYAI_API_KEY='' # paste this value from assemblyai.com/dashboard
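
A missing or empty environment variable is an easy mistake to make at this step, so a quick check from Python can save a confusing authentication error later. This is a minimal sketch, not part of the original script; read_api_key is a hypothetical helper name, and it only inspects the environment without calling the AssemblyAI API.

```python
import os


def read_api_key(environ=os.environ) -> str:
    """Return the AssemblyAI API key from the environment, or raise a clear error."""
    # Treat a missing variable and a blank value the same way.
    key = environ.get("ASSEMBLYAI_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "ASSEMBLYAI_API_KEY is not set; export it before running the script"
        )
    return key
```

Calling read_api_key() in the same terminal session where you ran the export command should return the token you pasted. A RuntimeError means the variable is not visible to Python in that session.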

Transcription Python Code

Our dependencies are installed and the environment variable is set, so next we will write the Python code to handle the transcription.

Our transcription script will be compact because we can use the AssemblyAI Python SDK, which abstracts the calls to the AssemblyAI API and reduces the amount of boilerplate code required. Create a new file named transcriber.py and copy in the following code, including the comments that explain each step.

import assemblyai as aai


# the URL of the file we want to transcribe
URL = "https://talkpython.fm/episodes/download/356/tips-for-ml-ai-startups.mp3"

# the file to write the results to, or leave empty to print to console
OUTPUT_FILENAME = "podcast.txt"

# configuration settings when you want punctuation, formatting, etc.
config = aai.TranscriptionConfig(
    punctuate=True,
    format_text=True,
)

# instantiate the AssemblyAI Python SDK, will use ASSEMBLYAI_API_KEY
# environment variable if it is set, otherwise can be passed in with
# aai.settings.api_key = f"{ASSEMBLYAI_API_KEY}"
transcriber = aai.Transcriber()

# call the API, will block until the API call finishes
transcript = transcriber.transcribe(URL, config)

# write to a file or print to console if no OUTPUT_FILENAME is set
if OUTPUT_FILENAME:
    with open(OUTPUT_FILENAME, "w") as file:
        file.write(transcript.text)
else:
    print(transcript.text)

The above code imports the assemblyai Python package into the script, using aai as a shorthand reference. Then the script sets two variables: URL and OUTPUT_FILENAME. URL is the publicly accessible URL of the file to transcribe. In this case, we're transcribing the Talk Python to Me episode where AssemblyAI Founder/CEO Dylan Fox gives tips for machine learning and AI startups. OUTPUT_FILENAME is the path of the local file where the transcript should be saved. If OUTPUT_FILENAME is left empty, the output is printed to the console instead.

The fourth line of code sets the config variable with two important settings. The first setting is punctuate, which enables punctuation in the returned transcription, and the second is format_text, which adds casing and formatting to the transcription.

The fifth line of code instantiates the Transcriber object, which is the main class for calling AssemblyAI's transcription service and also stores your API key to authenticate you during API calls. Note that the Transcriber automatically looks for a value in an environment variable named ASSEMBLYAI_API_KEY and will use that as the API key if one is set. This can be overridden by explicitly setting the API key with aai.settings.api_key = f"{ASSEMBLYAI_API_KEY}" instead.

In the sixth line of code, we use the URL and the config variable to execute the transcription API call with our desired settings. That line will block program execution until the transcription is finished, but you can also set a webhook to obtain the result when it is ready. Finally, the remainder of the script outputs the transcription result to a file or prints it to the console.
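
The output step in the script assumes the transcription succeeded. As a hedged sketch of a more defensive version, the helper below checks for an error before writing. It assumes the transcript object exposes error and text attributes; save_transcript is a hypothetical name of our own, and the SimpleNamespace objects are stand-ins so the example runs without calling the API.

```python
from types import SimpleNamespace


def save_transcript(transcript, output_filename=""):
    """Write transcript.text to a file, or print it when no filename is given."""
    # Assumption: the transcript object has `error` and `text` attributes.
    error = getattr(transcript, "error", None)
    if error:
        raise RuntimeError(f"transcription failed: {error}")
    text = transcript.text or ""
    if output_filename:
        # An explicit encoding avoids platform-dependent defaults.
        with open(output_filename, "w", encoding="utf-8") as file:
            file.write(text)
    else:
        print(text)
    return text


# Stand-in objects for illustration only; a real run would pass the
# result of transcriber.transcribe(URL, config) instead.
finished = SimpleNamespace(text="Hello, world.", error=None)
save_transcript(finished)  # no filename, so the text is printed
```

In transcriber.py, a call such as save_transcript(transcript, OUTPUT_FILENAME) could replace the final if/else block.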

Running the Transcription Code

Ensure the transcriber.py file is saved and that your virtual environment is still activated. Navigate to the project directory in a terminal and run the transcriber.py file with the following command:

python transcriber.py

Once the script has finished executing, the transcription is either printed to your console or written to a file (by default, podcast.txt), and it begins as follows:

Have you been considering launching a product or even a business based on Python's AI ML Stack? We have a great guest on this episode, Dylan Fox, who is the founder of AssemblyAI and has been building his startup successfully over the past few years. He has interesting stories of hundreds of GPUs in the cloud, evolving ML models, and much more that I know you can enjoy hearing....
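
To confirm that the output really contains punctuation and casing, a rough check can count a few simple signals in the text. This is only an illustrative sketch using the standard library; formatting_summary is a hypothetical helper, and the counts are a heuristic rather than a measure of transcript quality.

```python
import re


def formatting_summary(text: str) -> dict:
    """Count simple signals of punctuation and casing in a transcript."""
    return {
        # Characters that typically end a sentence.
        "sentence_endings": len(re.findall(r"[.!?]", text)),
        "commas": text.count(","),
        # Words that start with an uppercase ASCII letter.
        "capitalized_words": len(re.findall(r"\b[A-Z][A-Za-z]*", text)),
    }


# The first sentence of the transcript shown above.
sample = (
    "Have you been considering launching a product or even a business "
    "based on Python's AI ML Stack?"
)
print(formatting_summary(sample))
```

A transcript produced with punctuate and format_text disabled would be expected to report zero, or close to zero, for all three counts.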

What's Next?

We just used the AssemblyAI Python SDK to transcribe an hour-long podcast and output the result with punctuation, casing and formatting.

Here are a few other handy resources to learn more about what you can do with the transcript: