Transcribing an audio file
In this guide, we'll show you how to use the API to transcribe your audio files.
If you're using Python or TypeScript, we recommend that you instead check out our new tutorial on how to Transcribe an audio file.
Create a new file and import the necessary libraries for making an HTTP request.
Set up the API endpoint and headers. The headers should include your API token.
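As a minimal sketch of this setup step, the snippet below defines a base URL and a headers dictionary. It assumes the v2 REST API base URL and that the token is sent in an authorization header; the environment variable name ASSEMBLYAI_API_KEY is a choice made for this example, not something the API requires.

```python
import os

# Base URL of the AssemblyAI REST API (assumed here: the v2 API).
BASE_URL = "https://api.assemblyai.com/v2"

# Read the API token from the environment rather than hard-coding it.
# ASSEMBLYAI_API_KEY is a name picked for this example.
API_TOKEN = os.environ.get("ASSEMBLYAI_API_KEY", "<YOUR_API_TOKEN>")

# The token travels in the "authorization" header of every request.
HEADERS = {"authorization": API_TOKEN}
```

Keeping the token out of the source file makes it harder to commit it to version control by accident.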
Upload your local file to the AssemblyAI API.
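The upload step might look like the sketch below. It uses only Python's standard library and assumes the upload endpoint is /v2/upload, accepts the raw file bytes as the request body, and returns a JSON object with an upload_url key. The helper names build_upload_request and upload_file are hypothetical.

```python
import json
import urllib.request

BASE_URL = "https://api.assemblyai.com/v2"  # assumed v2 base URL


def build_upload_request(data, api_token):
    """Build a POST request whose body is the raw bytes of the audio file."""
    return urllib.request.Request(
        f"{BASE_URL}/upload",
        data=data,
        headers={
            "authorization": api_token,
            "content-type": "application/octet-stream",
        },
        method="POST",
    )


def upload_file(path, api_token):
    """Upload a local file and return the upload_url from the response."""
    with open(path, "rb") as audio_file:
        data = audio_file.read()  # reads the whole file into memory
    request = build_upload_request(data, api_token)
    with urllib.request.urlopen(request) as response:
        return json.load(response)["upload_url"]
```

A call such as upload_file("./my-audio.mp3", API_TOKEN) would return the URL to use in the next step. For very large files, you may prefer to stream the body instead of reading it all at once.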
Use the upload_url returned by the AssemblyAI API to create a JSON payload containing the audio_url parameter.
We delete uploaded files from our servers either after the transcription has completed, or 24 hours after you uploaded the file. After the file has been deleted, the corresponding upload_url is no longer valid.
Make a POST request to the AssemblyAI API endpoint with the payload and headers.
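The payload and POST request can be sketched as follows. This assumes the transcript endpoint is /v2/transcript, that the payload key for the uploaded file is audio_url, and that the response JSON carries the transcript ID under id. The helper names are hypothetical.

```python
import json
import urllib.request

BASE_URL = "https://api.assemblyai.com/v2"  # assumed v2 base URL


def build_transcript_request(upload_url, api_token):
    """Build the POST request that starts a transcription job."""
    payload = json.dumps({"audio_url": upload_url}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/transcript",
        data=payload,
        headers={
            "authorization": api_token,
            "content-type": "application/json",
        },
        method="POST",
    )


def create_transcript(upload_url, api_token):
    """Submit the job and return the transcript ID from the response."""
    request = build_transcript_request(upload_url, api_token)
    with urllib.request.urlopen(request) as response:
        return json.load(response)["id"]
```

Separating request construction from sending makes it easy to inspect the payload before any network call happens.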
After making the request, you'll receive an ID for the transcription. Use it to poll the API every few seconds to check the status of the transcript job. Once the status is completed, you can retrieve the transcript from the API response.
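A polling loop along these lines is one way to wait for the result. It assumes the transcript can be fetched with a GET request to /v2/transcript/{id} and that a failed job reports a status of error with a message under the error key. The function names, the 3-second default interval, and the injectable fetch and sleep parameters are choices made for this sketch.

```python
import json
import time
import urllib.request

BASE_URL = "https://api.assemblyai.com/v2"  # assumed v2 base URL


def fetch_transcript(transcript_id, api_token):
    """GET the current state of a transcript job as a dictionary."""
    request = urllib.request.Request(
        f"{BASE_URL}/transcript/{transcript_id}",
        headers={"authorization": api_token},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def wait_for_transcript(transcript_id, api_token, interval=3.0,
                        fetch=fetch_transcript, sleep=time.sleep):
    """Poll until the job completes, then return the transcript JSON."""
    while True:
        transcript = fetch(transcript_id, api_token)
        status = transcript["status"]
        if status == "completed":
            return transcript
        if status == "error":
            # Assumed: failed jobs describe the problem under "error".
            raise RuntimeError(f"Transcription failed: {transcript.get('error')}")
        sleep(interval)  # wait before asking again
```

Passing fetch and sleep as parameters lets you exercise the loop in tests without network access or real delays. In production code you may also want an overall timeout so the loop cannot run forever.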
Understanding the response
The AssemblyAI API returns JSON-formatted output. Your transcription will be located in the text key. You'll also find a timestamp and a confidence score for each word inside of the words key, as well as other parameters assigned by the API.
Refer to the API reference for a breakdown of every element in your transcript output.
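To illustrate reading these keys, the sketch below walks over a completed transcript. It assumes each entry in words has text, start, end, and confidence fields, with start and end given in milliseconds. The sample dictionary is hand-written for illustration and is not real API output; summarize_words is a hypothetical helper.

```python
def summarize_words(transcript):
    """Return one formatted line per word: time range, text, confidence."""
    lines = []
    for word in transcript.get("words") or []:
        start_s = word["start"] / 1000  # assumed: timestamps in milliseconds
        end_s = word["end"] / 1000
        lines.append(
            f"{start_s:.2f}-{end_s:.2f}s {word['text']} ({word['confidence']:.2f})"
        )
    return lines


# Hand-written sample shaped like the response described above.
sample = {
    "text": "Hello world",
    "words": [
        {"text": "Hello", "start": 0, "end": 500, "confidence": 0.98},
        {"text": "world", "start": 500, "end": 1000, "confidence": 0.91},
    ],
}

print(sample["text"])
for line in summarize_words(sample):
    print(line)
```

The same function can be applied to the JSON returned once the transcript status is completed.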
When using the AssemblyAI API to transcribe audio files, we recommend polling to check the status of the transcription. This means making a request every few seconds to check whether the transcription is complete, as described above.
Alternatively, you can also set up webhooks to receive notifications when the transcription is complete. This can help reduce the overhead of polling and make your application more efficient.
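As a rough sketch of the webhook approach, the helpers below build a transcript request body that includes a callback URL and parse the notification your server would receive. This assumes the request parameter is named webhook_url and that the notification body is JSON with transcript_id and status keys; check both against the API reference. The function names and the example URLs are hypothetical.

```python
import json


def build_webhook_payload(audio_url, webhook_url):
    """Return a transcript request body that asks for a webhook call."""
    return {"audio_url": audio_url, "webhook_url": webhook_url}


def parse_webhook_body(body):
    """Extract (transcript_id, status) from a webhook notification body.

    The body shape is an assumption made for this example.
    """
    event = json.loads(body)
    return event["transcript_id"], event["status"]


payload = build_webhook_payload(
    "https://example.com/audio.mp3",         # placeholder audio URL
    "https://example.com/hooks/transcript",  # placeholder endpoint you host
)
```

With this approach, your server fetches the finished transcript by ID after the notification arrives, instead of polling in a loop.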
Transcription is our core API use case, and nearly all other AssemblyAI features leverage our transcription functionality. We're constantly improving and updating the language models used by our transcription engine. Of course, higher quality audio generally produces better results.