Seriously. Don't believe me? Below is the audio recording of my sweet, sultry voice creating that paragraph. I spoke it off the cuff, and it only took me two takes to get through it without stumbling over my own words 😅.
I would have spoken this entire post by voice, but because of my limited human memory, I would have to script it in writing, and that defeats the proof of concept, now doesn't it? Here's how you can transcribe your audio files with Python too.
Sign up for an AssemblyAI API Token
You can sign up for a free AssemblyAI account in seconds by just entering your email address. After you verify your account from your email address, you're taken right back to your new account where you can see your API token in your dashboard.
Find an audio file to try it on
AssemblyAI transcription supports loads of file types, and gives you 5 hours of audio transcription per month absolutely free, with no billing setup or commitment. So find that perfect mic check audio file you want to try it out on. You'll have plenty of chances to test it with our API.
Upload your file for transcription
If you already have a publicly accessible URL for your audio file, you can skip this step. If not, the first step is to upload your audio file to AssemblyAI with this code:
A successful upload will yield a JSON response containing an upload_url key.
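Here's a minimal sketch of that upload step. It assumes the v2 upload endpoint (https://api.assemblyai.com/v2/upload) and the third-party requests library, so double-check both against the current AssemblyAI docs. The names read_chunks, upload_audio, and the http parameter are my own for illustration. Passing http in lets you hand over the requests module, a requests.Session(), or a stand-in for testing.

```python
# Assumed endpoint; verify it against the AssemblyAI docs.
UPLOAD_ENDPOINT = "https://api.assemblyai.com/v2/upload"


def read_chunks(path, chunk_size=5_242_880):
    """Yield the file piece by piece so big recordings never sit fully in memory."""
    with open(path, "rb") as audio:
        while True:
            chunk = audio.read(chunk_size)
            if not chunk:
                break
            yield chunk


def upload_audio(path, api_token, http):
    """POST the raw audio bytes and return the response object.

    `http` is hypothetical glue: anything with a requests-style `post`
    method, such as the `requests` module itself.
    """
    response = http.post(
        UPLOAD_ENDPOINT,
        headers={"authorization": api_token},
        data=read_chunks(path),  # a generator body is sent as a streamed upload
    )
    response.raise_for_status()  # fail loudly on a bad token or a server error
    return response
```

To use it for real: import requests, then response = upload_audio("mic_check.mp3", "YOUR_API_TOKEN", requests). Both the file name and the token are placeholders.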
Submit the audio file for transcription
The code below uses the upload_url key value from the JSON response in the previous code, so it runs sequentially after it. If you already have a URL for your audio file, use it in place of response.json()['upload_url'].
A successful response will display "status": "queued".
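A minimal sketch of the submission step, assuming the v2 transcript endpoint (https://api.assemblyai.com/v2/transcript) accepts a JSON body with an audio_url field. The function name submit_transcription and the http parameter are illustrative names of mine, not part of any AssemblyAI library.

```python
# Assumed endpoint; verify it against the AssemblyAI docs.
TRANSCRIPT_ENDPOINT = "https://api.assemblyai.com/v2/transcript"


def submit_transcription(audio_url, api_token, http):
    """Queue a transcription job for `audio_url` and return the response object.

    `http` is anything with a requests-style `post` method (for example
    the `requests` module).
    """
    resp = http.post(
        TRANSCRIPT_ENDPOINT,
        headers={"authorization": api_token},
        json={"audio_url": audio_url},  # requests serializes this dict to JSON
    )
    resp.raise_for_status()
    return resp
```

Running sequentially after the upload, that would be resp = submit_transcription(response.json()['upload_url'], "YOUR_API_TOKEN", requests), where the token is a placeholder.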
Get that transcription you've been longing for!!
The code below checks the status of the transcription. It concatenates the transcription id, resp.json()['id'], into the URL for you, so once again it runs sequentially after the previous code.
You'll see that the code above prints 3 things. The first print statement, print(status_check.json()['status']), prints the status of the transcription, which goes from "queued" to "processing" to "completed". Audio files take around 25% of their audio duration to complete, so a 10 minute audio file should complete in about 2.5 minutes. You'll want to run the code above in a loop until the status key shows "completed".
Once the status is "completed", the second print statement, print(status_check.json()['text']), will print out the actual transcription text. And the third print statement prints out the entire API response, with a bunch more metadata like the timings for when each word was spoken, the confidence for each word, and a bunch of other data! You can see a complete example of the API's response JSON on the AssemblyAI Docs here.
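A minimal sketch of the status check, assuming a GET on the v2 transcript endpoint with the transcription id appended. check_status, print_status, and the http parameter are illustrative names of mine.

```python
# Assumed endpoint; verify it against the AssemblyAI docs.
TRANSCRIPT_ENDPOINT = "https://api.assemblyai.com/v2/transcript"


def check_status(transcript_id, api_token, http):
    """Fetch the current state of one transcription job.

    `http` is anything with a requests-style `get` method (for example
    the `requests` module).
    """
    status_check = http.get(
        f"{TRANSCRIPT_ENDPOINT}/{transcript_id}",  # id concatenated into the URL
        headers={"authorization": api_token},
    )
    status_check.raise_for_status()
    return status_check


def print_status(status_check):
    """Print the status, the transcription text, and the full response."""
    print(status_check.json()['status'])
    print(status_check.json()['text'])
    print(status_check.json())
```

Sequentially after the submission step: status_check = check_status(resp.json()['id'], "YOUR_API_TOKEN", requests), followed by print_status(status_check).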
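One way to write that loop is sketched below. wait_for_completion is a hypothetical helper of mine: it takes any zero-argument fetch function that returns the response JSON as a dict, so it stays independent of how you make the request. The "error" status branch and the timeout are my assumptions, added so the loop cannot spin forever.

```python
import time


def wait_for_completion(fetch, poll_seconds=5.0, timeout_seconds=600.0,
                        sleep=time.sleep):
    """Call fetch() until the job reports "completed", then return its JSON dict.

    `fetch` returns the transcript JSON as a dict on every call.
    `sleep` is injectable so the loop can be exercised without real waiting.
    """
    deadline = time.monotonic() + timeout_seconds
    while True:
        body = fetch()
        status = body["status"]
        if status == "completed":
            return body
        if status == "error":  # assumed failure status, not taken from this post
            raise RuntimeError(body.get("error", "transcription failed"))
        if time.monotonic() >= deadline:
            raise TimeoutError(f"still {status!r} after {timeout_seconds} seconds")
        sleep(poll_seconds)  # "queued" or "processing": wait, then ask again
```

With the rule of thumb above, a 10 minute file needs roughly 0.25 × 600 = 150 seconds, so a poll every few seconds and a timeout comfortably above that estimate is a reasonable starting point.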
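To give a feel for that extra metadata, here is a small sketch that pulls the per-word details out of a completed response. It assumes the response has a words list whose items carry text, start, end, and confidence keys, with the timings in milliseconds. Treat those field names as my assumption and confirm them against the example response in the docs. summarize_words is a hypothetical helper.

```python
def summarize_words(body):
    """Return (word, start_seconds, end_seconds, confidence) tuples.

    `body` is the parsed JSON dict of a completed transcription. The
    "words" key and its field names are assumptions about the schema.
    """
    rows = []
    for word in body.get("words") or []:  # tolerate a missing or null list
        rows.append((
            word["text"],
            word["start"] / 1000,  # assumed milliseconds, converted to seconds
            word["end"] / 1000,
            word["confidence"],
        ))
    return rows
```

For example, for text, start, end, confidence in summarize_words(status_check.json()) lets you print each word next to when it was spoken.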
My mom did manual transcription work in the '90s, and she hated it. REJOICE at this beautiful API that AssemblyAI has created for you, and all the headaches it can save you.
The full sequential code can be found in this colab notebook here! Happy transcribing!