Migration guide: OpenAI to AssemblyAI

This guide walks through the process of migrating from OpenAI to AssemblyAI for transcribing pre-recorded audio.

Get Started

Before we begin, make sure you have an AssemblyAI account and an API key. You can sign up for a free account and get your API key from your dashboard.

Side-By-Side Code Comparison

Below is a side-by-side comparison of a basic snippet to transcribe a local file with OpenAI and with AssemblyAI:

from openai import OpenAI

api_key = "YOUR_OPENAI_API_KEY"
client = OpenAI(api_key=api_key)

audio_file = open("./example.wav", "rb")

transcript = client.audio.transcriptions.create(
    model="whisper-1",
    file=audio_file
)

print(transcript.text)
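
Here is the equivalent with AssemblyAI's Python SDK (a minimal version of the same flow):

import assemblyai as aai

aai.settings.api_key = "YOUR_ASSEMBLYAI_API_KEY"

# transcribe() uploads the file, starts the job, and polls until it finishes
transcriber = aai.Transcriber()
transcript = transcriber.transcribe("./example.wav")

print(transcript.text)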

Here are helpful things to know about our transcribe method:

  • The SDK handles polling under the hood
  • The transcript text is directly accessible via transcript.text
  • English is the default language and Best is the default speech model if none is specified
  • We have a cookbook for handling common errors when using our API; a minimal status check is sketched below
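
For example, a minimal sketch of that status check (the file path is a placeholder):

import assemblyai as aai

aai.settings.api_key = "YOUR_ASSEMBLYAI_API_KEY"

transcript = aai.Transcriber().transcribe("./example.wav")

# The SDK surfaces failed jobs on the transcript object rather than raising
if transcript.status == aai.TranscriptStatus.error:
    print(transcript.error)
else:
    print(transcript.text)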

Installation

from openai import OpenAI

api_key = "YOUR_OPENAI_API_KEY"
client = OpenAI(api_key=api_key)

When migrating from OpenAI to AssemblyAI, you’ll first need to handle authentication and SDK setup:

Get your API key from your AssemblyAI dashboard.
To follow this guide, install AssemblyAI's Python SDK by running this command in your terminal:

pip install assemblyai
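
Then import the SDK and set your API key on the module-level settings (the key below is a placeholder):

import assemblyai as aai

aai.settings.api_key = "YOUR_ASSEMBLYAI_API_KEY"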

Things to know:

  • Store your API key securely in an environment variable rather than hardcoding it (see the sketch after this list)
  • API key authentication works the same across all AssemblyAI SDKs
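
For example, a minimal sketch that reads the key from an environment variable (the variable name ASSEMBLYAI_API_KEY is our convention here, not something the SDK requires):

import os

import assemblyai as aai

# Avoid hardcoding the key in source control
aai.settings.api_key = os.environ["ASSEMBLYAI_API_KEY"]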

Audio File Sources

client = OpenAI()

# Local files
audio_file = open("./example.wav", "rb")
transcript = client.audio.transcriptions.create(
    model="whisper-1",
    file=audio_file
)
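
With AssemblyAI, the same transcribe call accepts either a local path or a publicly accessible URL (the URL below is a placeholder):

import assemblyai as aai

aai.settings.api_key = "YOUR_ASSEMBLYAI_API_KEY"
transcriber = aai.Transcriber()

# Local file: the SDK uploads it to AssemblyAI for you
transcript = transcriber.transcribe("./example.wav")

# Publicly accessible URL: no upload step needed
transcript = transcriber.transcribe("https://example.com/audio.mp3")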

Here are helpful things to know when migrating your audio input handling:

  • AssemblyAI natively supports transcribing publicly accessible audio URLs (for example, S3 URLs), while the Whisper API only natively supports transcribing local files.
  • There's no need to specify the audio format to AssemblyAI - it's auto-detected. AssemblyAI accepts almost every audio/video file type; see our documentation for a full list of supported file types.
  • The Whisper API only supports file sizes up to 25 MB, while AssemblyAI supports file sizes up to 5 GB.

Adding Features

transcript = client.audio.transcriptions.create(
    file=audio_file,
    model="whisper-1",
    prompt="INSERT_PROMPT",  # Optional text to guide the model's style
    language="en",  # Set language code
    response_format="verbose_json",
    timestamp_granularities=["word"]
)

# Access word-level timestamps
print(transcript.words)
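
For comparison, here is a sketch of an equivalent AssemblyAI request that enables extra features through aai.TranscriptionConfig (Speaker Diarization is used as the example feature here):

import assemblyai as aai

aai.settings.api_key = "YOUR_ASSEMBLYAI_API_KEY"

# Enable extra features via TranscriptionConfig
config = aai.TranscriptionConfig(
    speaker_labels=True,  # Speaker Diarization
    language_code="en"
)

transcript = aai.Transcriber().transcribe("./example.wav", config=config)

# Word-level timestamps are included by default
for word in transcript.words:
    print(word.text, word.start, word.end)

# Speaker Diarization results live in transcript.utterances
for utterance in transcript.utterances:
    print(f"Speaker {utterance.speaker}: {utterance.text}")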

Key differences:

  • OpenAI does not offer audio intelligence features for its speech-to-text API
  • Use aai.TranscriptionConfig to specify any extra features you wish to use, as in the example above
  • With AssemblyAI, timestamp granularity is word-level by default
  • The results for Speaker Diarization are stored in transcript.utterances. To see the full transcript response object, refer to our API Reference.
  • Check our documentation for the full list of available features and their parameters
  • If you want to send a custom prompt to the LLM, you can use LeMUR Task to apply the model to your transcribed audio files; a minimal sketch follows this list
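
For example, a minimal sketch of a LeMUR Task request against a finished transcript (the prompt is illustrative):

import assemblyai as aai

aai.settings.api_key = "YOUR_ASSEMBLYAI_API_KEY"

transcript = aai.Transcriber().transcribe("./example.wav")

# Apply a custom LLM prompt to the transcript via LeMUR
result = transcript.lemur.task(
    prompt="Summarize the key points of this recording in three bullet points."
)

print(result.response)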