Migration guide: Speechmatics to AssemblyAI
This guide walks through the process of migrating from Speechmatics to AssemblyAI for Streaming Speech-to-Text.
Get started
Before we begin, make sure you have an AssemblyAI account and an API key. You can sign up for a free account and get your API key from your dashboard.
Side-by-side code comparison
Below is a side-by-side comparison of a basic Python code snippet that transcribes streaming audio with Speechmatics and with AssemblyAI.
Speechmatics
AssemblyAI
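For reference, a condensed version of the AssemblyAI side might look like the following minimal sketch (assuming the v3 streaming WebSocket endpoint and the websocket-client and pyaudio packages); each part is broken down step by step in the sections below.

```python
import json
import threading
import pyaudio
import websocket

YOUR_API_KEY = "<YOUR_API_KEY>"
SAMPLE_RATE = 16000
FRAMES_PER_BUFFER = 800  # 50 ms of audio per chunk at 16 kHz

audio = pyaudio.PyAudio()
stream = audio.open(
    format=pyaudio.paInt16,  # single-channel PCM16
    channels=1,
    rate=SAMPLE_RATE,
    input=True,
    frames_per_buffer=FRAMES_PER_BUFFER,
)

def on_open(ws):
    # Stream microphone audio on a background thread.
    def send_audio():
        while True:
            data = stream.read(FRAMES_PER_BUFFER, exception_on_overflow=False)
            ws.send(data, websocket.ABNF.OPCODE_BINARY)
    threading.Thread(target=send_audio, daemon=True).start()

def on_message(ws, message):
    msg = json.loads(message)
    if msg.get("type") == "Turn":
        print(msg["transcript"])

ws = websocket.WebSocketApp(
    f"wss://streaming.assemblyai.com/v3/ws?sample_rate={SAMPLE_RATE}",
    header={"Authorization": YOUR_API_KEY},
    on_open=on_open,
    on_message=on_message,
)
ws.run_forever()
```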
Step 1: Install dependencies
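Both services can be driven with a plain WebSocket client. Assuming the websocket-client and pyaudio packages used in the sketches throughout this guide (plus requests for the temporary-token step), a typical install is:

```bash
pip install websocket-client pyaudio requests
```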
Step 2: Configure the API key
In this step, you’ll configure your API key to authenticate your requests.
Speechmatics
AssemblyAI
Navigate to API Keys in your account settings and copy your API key.
Speechmatics
AssemblyAI
Store your API key in a variable. Replace <YOUR_API_KEY> with your copied API key.
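For example:

```python
YOUR_API_KEY = "<YOUR_API_KEY>"
```

The same pattern works for both services; only the key itself differs.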
Authenticate with a temporary token
Speechmatics
AssemblyAI
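For AssemblyAI, a temporary token can be fetched from the streaming token endpoint. A minimal sketch, assuming the v3 token endpoint and its expires_in_seconds parameter (check the current API reference for the exact names):

```python
import requests

def generate_temp_token(api_key: str, expires_in_seconds: int = 60) -> str:
    # Exchange the permanent API key for a short-lived streaming token.
    response = requests.get(
        "https://streaming.assemblyai.com/v3/token",
        params={"expires_in_seconds": expires_in_seconds},
        headers={"Authorization": api_key},
    )
    response.raise_for_status()
    return response.json()["token"]
```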
Token usage
Instead of authorizing your request with YOUR_API_KEY (via request header), use the temporary token generated by this function when establishing the WebSocket connection.
Step 3: Set up audio configuration
Configure the audio settings for your microphone stream.
Speechmatics
AssemblyAI
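A minimal PyAudio microphone setup that matches the format notes below (16 kHz, single-channel PCM16, 50 ms chunks) might look like this sketch:

```python
import pyaudio

SAMPLE_RATE = 16000          # 16 kHz, as recommended below
FRAMES_PER_BUFFER = 800      # 50 ms of audio per chunk at 16 kHz

audio = pyaudio.PyAudio()
stream = audio.open(
    format=pyaudio.paInt16,  # single-channel PCM16
    channels=1,
    rate=SAMPLE_RATE,
    input=True,
    frames_per_buffer=FRAMES_PER_BUFFER,
)
```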
Sample rate
Speechmatics recommends using a 16 kHz sample rate for speech audio. Anything higher will be downsampled server-side.
Audio data format
If you want to stream data from elsewhere, make sure that your audio data is in the following format:
- Single-channel
- PCM16 (default) or Mu-law encoding (see Specifying the encoding)
- A sample rate that matches the value of the sample_rate parameter (16 kHz is recommended)
- 50 milliseconds of audio per message (larger chunk sizes are workable, but may result in latency fluctuations)
Step 4: Create event handlers
In this step, you’ll set up callback functions that handle the different events.
Create functions to handle the events from the real-time service.
Speechmatics
AssemblyAI
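A minimal sketch of the AssemblyAI open, error, and close handlers, assuming the websocket-client callback signatures and the stream, audio, and FRAMES_PER_BUFFER objects from Step 3:

```python
import threading
import websocket

def on_open(ws):
    # AssemblyAI needs no handshake message: the session is configured
    # via URL query parameters, so audio can be streamed immediately.
    def send_audio():
        while True:
            data = stream.read(FRAMES_PER_BUFFER, exception_on_overflow=False)
            ws.send(data, websocket.ABNF.OPCODE_BINARY)
    threading.Thread(target=send_audio, daemon=True).start()

def on_error(ws, error):
    print("Error:", error)

def on_close(ws, close_status_code, close_msg):
    # Release the microphone when the session ends.
    stream.stop_stream()
    stream.close()
    audio.terminate()
```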
Connection configuration
Speechmatics requires a handshake where the connection configuration is specified before audio is streamed. AssemblyAI allows you to configure the connection via query parameters in the URL and start streaming audio immediately.
The Speechmatics handshake begins when on_open sends a StartRecognition message to configure the session. Audio streaming only starts after the RecognitionStarted message type is parsed and confirmed in the on_message callback.
Create another function to handle transcripts.
Speechmatics has separate partial (AddPartialTranscript) and final (AddTranscript) transcripts. The terminate session message is EndOfTranscript.
AssemblyAI instead uses a Turn object with a turn_is_formatted boolean flag to indicate finality. The terminate session message is Termination.
For more on the Turn object, see the Core concepts section of the Streaming docs.
Speechmatics
AssemblyAI
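A minimal sketch of the AssemblyAI transcript handler (field names assumed from the v3 Turn message):

```python
import json

def on_message(ws, message):
    msg = json.loads(message)
    msg_type = msg.get("type")

    if msg_type == "Turn":
        # turn_is_formatted flips to True once the turn is final and
        # punctuation and casing have been applied.
        if msg.get("turn_is_formatted"):
            print("Final:", msg["transcript"])
        else:
            print("Partial:", msg["transcript"])
    elif msg_type == "Termination":
        print("Session terminated")
```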
Transcript message structure
Please note the difference in transcript message structure below:
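Abbreviated, illustrative payloads (field lists are trimmed here; exact shapes may differ slightly from the current API references). A Speechmatics final transcript:

```json
{
  "message": "AddTranscript",
  "metadata": { "transcript": "Hello world.", "start_time": 0.0, "end_time": 1.2 },
  "results": [ { "type": "word", "alternatives": [ { "content": "Hello" } ] } ]
}
```

An AssemblyAI turn:

```json
{
  "type": "Turn",
  "turn_order": 0,
  "turn_is_formatted": true,
  "end_of_turn": true,
  "transcript": "Hello world.",
  "words": [ { "text": "Hello", "start": 0, "end": 400 } ]
}
```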
Step 5: Connect and start transcription
To stream audio, establish a connection to the API via WebSockets.
Speechmatics
AssemblyAI
Create a WebSocket connection to the Realtime service.
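A sketch of both connections, assuming the objects defined in the previous steps (the Speechmatics endpoint varies by region; eu2 is shown as a placeholder):

```python
# AssemblyAI: configuration goes in the query string, and the raw API
# key (or a temporary token) authenticates the connection.
ws = websocket.WebSocketApp(
    f"wss://streaming.assemblyai.com/v3/ws?sample_rate={SAMPLE_RATE}",
    header={"Authorization": YOUR_API_KEY},  # no Bearer prefix
    on_open=on_open,
    on_message=on_message,
    on_error=on_error,
    on_close=on_close,
)

# Speechmatics, for comparison: the header uses a Bearer prefix, and
# the session is configured by the StartRecognition message instead.
# ws = websocket.WebSocketApp(
#     "wss://eu2.rt.speechmatics.com/v2",
#     header={"Authorization": f"Bearer {YOUR_API_KEY}"},
#     on_open=on_open, on_message=on_message,
#     on_error=on_error, on_close=on_close,
# )
```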
Authorization
Note that while both services use an Authorization header to authenticate the WebSocket connection, Speechmatics uses a Bearer prefix, while AssemblyAI does not.
Step 6: Close the connection
Keep the main thread alive until interrupted, handle keyboard interrupts and thrown exceptions, and clean up upon closing of the WebSocket connection.
Speechmatics
AssemblyAI
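A minimal sketch for the AssemblyAI side; a Speechmatics session would send an EndOfStream message instead of Terminate:

```python
import json

try:
    # Blocks until the WebSocket connection closes.
    ws.run_forever()
except KeyboardInterrupt:
    # Ask the service to finalize the session, then close.
    ws.send(json.dumps({"type": "Terminate"}))
    ws.close()
```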
The connection will close automatically when you press Ctrl+C. In both cases, the on_close handler will clean up the audio resources.
Step 7: Execute the main function
Finally, run the main function to start transcribing.
Speechmatics
AssemblyAI
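For example, if the connection logic from Steps 5 and 6 lives in a main function:

```python
def main():
    # Steps 5 and 6: open the connection and stream until interrupted.
    try:
        ws.run_forever()
    except KeyboardInterrupt:
        ws.send(json.dumps({"type": "Terminate"}))
        ws.close()

if __name__ == "__main__":
    main()
```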
Next steps
To learn more about both Streaming APIs, their key differences, and how to best migrate, see the following resources:
AssemblyAI
Speechmatics
Need some help?
If you get stuck or have any other questions, contact our support team at support@assemblyai.com or create a support ticket.