Migration guide: OpenAI to AssemblyAI
This guide walks through the process of migrating from OpenAI to AssemblyAI for transcribing pre-recorded audio.
Get Started
Before we begin, make sure you have an AssemblyAI account and an API key. You can sign up for a free account and get your API key from your dashboard.
Side-By-Side Code Comparison
Below is a side-by-side comparison of a basic snippet to transcribe a local file with OpenAI and with AssemblyAI:
OpenAI
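A minimal version with the OpenAI Python SDK (the file path is a placeholder, and the client reads `OPENAI_API_KEY` from your environment):

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# The Whisper API expects an open file handle to a local file
with open("./audio.mp3", "rb") as audio_file:
    transcription = client.audio.transcriptions.create(
        model="whisper-1",
        file=audio_file,
    )

print(transcription.text)
```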
AssemblyAI
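The equivalent with AssemblyAI's Python SDK (replace `YOUR_API_KEY` with your own key):

```python
import assemblyai as aai

aai.settings.api_key = "YOUR_API_KEY"

transcriber = aai.Transcriber()

# transcribe() submits the file and polls until the transcript is ready
transcript = transcriber.transcribe("./audio.mp3")

print(transcript.text)
```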
Here are helpful things to know about our `transcribe` method:
- The SDK handles polling under the hood
- The transcript is directly accessible via `transcript.text`
- English is the default language and Best is the default speech model if none is specified
- We have a cookbook for handling common errors when using our API
Installation
OpenAI
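The OpenAI Python SDK is installed with pip:

```bash
pip install openai
```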
AssemblyAI
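As is AssemblyAI's:

```bash
pip install assemblyai
```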
When migrating from OpenAI to AssemblyAI, you’ll first need to handle authentication and SDK setup:
1. Get your API key from your AssemblyAI dashboard
2. To follow this guide, install AssemblyAI's Python SDK by running this command in your terminal:

```bash
pip install assemblyai
```
Things to know:
- Store your API key securely in an environment variable (see the sketch after this list)
- API key authentication works the same across all AssemblyAI SDKs
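Building on the first point above, a minimal sketch of reading the key from an environment variable (the variable name `ASSEMBLYAI_API_KEY` is our choice for illustration):

```python
import os

import assemblyai as aai

# Keep the key out of source control by reading it from the environment
aai.settings.api_key = os.environ["ASSEMBLYAI_API_KEY"]
```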
Audio File Sources
OpenAI
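Because the Whisper API only accepts local files, transcribing a remote file means downloading it first. A sketch using the `requests` library (the URL is a placeholder):

```python
import requests
from openai import OpenAI

client = OpenAI()

# Whisper only accepts local files, so fetch the remote audio first
url = "https://example.com/audio.mp3"  # placeholder URL
with open("audio.mp3", "wb") as f:
    f.write(requests.get(url).content)

with open("audio.mp3", "rb") as audio_file:
    transcription = client.audio.transcriptions.create(
        model="whisper-1",
        file=audio_file,
    )

print(transcription.text)
```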
AssemblyAI
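With AssemblyAI, the same `transcribe()` call accepts either a local path or a publicly accessible URL:

```python
import assemblyai as aai

aai.settings.api_key = "YOUR_API_KEY"

transcriber = aai.Transcriber()

# A publicly accessible URL works directly; no download step needed
transcript = transcriber.transcribe("https://example.com/audio.mp3")

# A local file path works the same way:
# transcript = transcriber.transcribe("./audio.mp3")

print(transcript.text)
```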
Here are helpful things to know when migrating your audio input handling:
- AssemblyAI natively supports transcribing publicly accessible audio URLs (for example, S3 URLs), while the Whisper API only natively supports transcribing local files
- There's no need to specify the audio format to AssemblyAI, as it's auto-detected. AssemblyAI accepts almost every audio/video file type: here is a full list of all our supported file types
- The Whisper API only supports file sizes up to 25MB, while AssemblyAI supports file sizes up to 5GB
Adding Features
OpenAI
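Word-level timestamps are the main extra the Whisper API offers; a sketch of requesting them (the file path is a placeholder):

```python
from openai import OpenAI

client = OpenAI()

with open("./audio.mp3", "rb") as audio_file:
    transcription = client.audio.transcriptions.create(
        model="whisper-1",
        file=audio_file,
        response_format="verbose_json",    # required for timestamp output
        timestamp_granularities=["word"],  # opt in to word-level timestamps
    )

print(transcription.words)
```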
AssemblyAI
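With AssemblyAI, features are enabled through `aai.TranscriptionConfig`; for example, Speaker Diarization:

```python
import assemblyai as aai

aai.settings.api_key = "YOUR_API_KEY"

# Enable extra features through the config object
config = aai.TranscriptionConfig(
    speaker_labels=True,  # Speaker Diarization
)

transcriber = aai.Transcriber(config=config)
transcript = transcriber.transcribe("./audio.mp3")

# Diarization results live on transcript.utterances
for utterance in transcript.utterances:
    print(f"Speaker {utterance.speaker}: {utterance.text}")
```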
Key differences:
- OpenAI does not offer audio intelligence features for their speech-to-text API
- Use `aai.TranscriptionConfig` to specify any extra features that you wish to use
- With AssemblyAI, timestamp granularity is word-level by default
- The results for Speaker Diarization are stored in `transcript.utterances`. To see the full transcript response object, refer to our API Reference
- Check our documentation for our full list of available features and their parameters
- If you want to send a custom prompt to the LLM, you can use LeMUR Task and apply the model to your transcribed audio files.
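A brief sketch of a LeMUR Task with the Python SDK (the prompt is only an example):

```python
# Assumes `transcript` is a completed transcript from the snippets above
result = transcript.lemur.task(
    prompt="Summarize this audio in three bullet points."
)

print(result.response)
```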