Choosing the best Speech-to-Text API for your product can be challenging. You’ll need to compare accuracy, features, support options, documentation, security, and more.
But what if you have a small project to complete? Or you simply want to experiment with an API, or test one before committing to build with it?
This post compares the best free Speech-to-Text APIs on the market today, including APIs that have a free tier. We’ll also look at several free open source Speech-to-Text engines and explore why you might choose an API versus an open source library, or vice versa.
Free Speech-to-Text APIs
APIs are more accurate, easier to integrate, and come with more out-of-the-box features than open source options. However, large-scale use of an API typically comes with a cost.
But if you’re looking to use an API for a small project or for a trial run, many of today’s Speech-to-Text APIs have a free tier. This means that the API is free for anyone to use up to a certain volume per month or per year.
Let’s look at three of the most popular Speech-to-Text APIs with a free tier: Google, AssemblyAI, and AWS Transcribe.
Google Speech-to-Text is a well-known speech transcription API. Google gives users 60 minutes of free transcription per month, plus $300 in free credits for Google Cloud hosting.
However, since Google requires audio longer than about a minute to be stored in a Google Cloud Storage bucket before it can be transcribed, the free credits won’t get you very far. Google can also be difficult to get started with: even to use the free tier, you need to sign up for a GCP account and create a project, which is surprisingly complicated.
Still, with good accuracy and 63+ languages supported, Google is a good choice if you’re willing to put in some initial work.
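To give a sense of the setup involved, here is a minimal Python sketch that builds a request body for Google’s v1 REST API pointing at a file already uploaded to a Cloud Storage bucket. The helper name `build_recognize_request`, the bucket path, and the LINEAR16/16 kHz audio settings are illustrative assumptions; actually sending the request also requires an OAuth access token for your GCP project.

```python
import json

# Base URL of the Speech-to-Text v1 REST API.
SPEECH_API = "https://speech.googleapis.com/v1"


def build_recognize_request(gcs_uri: str, language_code: str = "en-US",
                            sample_rate_hz: int = 16000,
                            long_running: bool = False) -> tuple[str, str]:
    """Return (endpoint URL, JSON body) for transcribing a file stored in Cloud Storage.

    Hypothetical helper; assumes 16-bit PCM WAV input at the given sample rate.
    """
    if not gcs_uri.startswith("gs://"):
        raise ValueError("expected a Cloud Storage URI of the form gs://bucket/object")
    body = {
        "config": {
            "encoding": "LINEAR16",
            "sampleRateHertz": sample_rate_hz,
            "languageCode": language_code,
        },
        # The audio is referenced by URI rather than uploaded in the request.
        "audio": {"uri": gcs_uri},
    }
    # Short clips can use speech:recognize; longer audio goes through the
    # asynchronous speech:longrunningrecognize method.
    method = "speech:longrunningrecognize" if long_running else "speech:recognize"
    return f"{SPEECH_API}/{method}", json.dumps(body)
```

In practice you would POST the body to the returned URL with an `Authorization: Bearer <token>` header; for local testing, `gcloud auth print-access-token` prints one. The two methods mirror the API’s own split between synchronous recognition for short clips and long-running recognition for longer audio.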
AssemblyAI is a newer name in the Speech-to-Text API market that is growing quickly thanks to industry-best accuracy, an easy-to-use interface, and features such as Speaker Diarization, Topic Detection, Automated Punctuation and Casing, Paragraph Detection, and more.
The company offers three free transcription hours for audio files or video streams per month before transitioning to an affordable paid tier.
Its high accuracy and extensive feature list, including Speaker Diarization and Sentiment Analysis, make AssemblyAI a sound option for developers looking for a free Speech-to-Text API. The API also supports virtually every audio and video file format out of the box, which makes transcription easier.
As of today, AssemblyAI only supports English transcription, which limits its use to English-speaking countries, and it has limited SDKs available. However, its easy-to-use API allows for quick set-up and transcription in any programming language. You can even copy and paste code examples in your preferred language directly from the AssemblyAI API Docs.
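As an illustration of that quick set-up, here is a minimal standard-library Python sketch of the submit-then-poll flow against AssemblyAI’s v2 REST API as we understand it: POST an `audio_url` to `/v2/transcript`, then poll the returned transcript ID until its `status` is `completed` or `error`. The helper names are hypothetical, `speaker_labels` is the request flag we assume enables Speaker Diarization, and all parameters should be checked against the current API Docs.

```python
import json
import time
import urllib.request

API_BASE = "https://api.assemblyai.com/v2"


def submit_request(api_key: str, audio_url: str,
                   speaker_labels: bool = True) -> urllib.request.Request:
    """Build (but do not send) the POST that queues a transcription job."""
    payload = {"audio_url": audio_url, "speaker_labels": speaker_labels}
    return urllib.request.Request(
        f"{API_BASE}/transcript",
        data=json.dumps(payload).encode("utf-8"),
        headers={"authorization": api_key, "content-type": "application/json"},
        method="POST",
    )


def wait_for_transcript(fetch_status, transcript_id: str,
                        poll_seconds: float = 3.0) -> dict:
    """Poll until the job leaves the queued/processing states.

    `fetch_status` is any callable that GETs /v2/transcript/{id} and returns the
    parsed JSON; it is injected so the polling logic can be tested offline.
    """
    while True:
        result = fetch_status(transcript_id)
        if result["status"] == "completed":
            return result
        if result["status"] == "error":
            raise RuntimeError(result.get("error", "transcription failed"))
        time.sleep(poll_seconds)
```

Sending the request is a single `urllib.request.urlopen(req)` call, and the JSON response carries the `id` to pass to `wait_for_transcript`. Injecting `fetch_status` keeps the polling loop independent of any particular HTTP library.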
AWS Transcribe offers one hour free per month for the first 12 months of use.
Like Google, AWS requires you to create an account first if you don’t already have one, which is a complex process. AWS also has lower accuracy than alternative APIs and only supports transcribing files already in an Amazon S3 bucket.
However, if you’re looking for a specific feature, like medical transcription, AWS has some intriguing options. Its Transcribe Medical API is a medical-focused ASR option that is available today.
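Because both standard and medical jobs read their input from S3, a boto3 integration mostly comes down to building the right keyword arguments. The sketch below assumes the `start_transcription_job` and `start_medical_transcription_job` methods of a boto3 `transcribe` client; the helper names, the job-name scheme, and the choice of `PRIMARYCARE` as the medical specialty are illustrative, so check the values against the current AWS documentation.

```python
import uuid


def _require_s3(uri: str) -> None:
    # Both job types read media from S3 rather than accepting an upload in the request.
    if not uri.startswith("s3://"):
        raise ValueError("expected an S3 URI of the form s3://bucket/key")


def transcription_job_params(s3_uri: str, media_format: str = "wav",
                             language_code: str = "en-US") -> dict:
    """Keyword arguments for client.start_transcription_job(**params)."""
    _require_s3(s3_uri)
    return {
        # Job names must be unique, so append a random suffix.
        "TranscriptionJobName": f"transcribe-{uuid.uuid4().hex[:12]}",
        "Media": {"MediaFileUri": s3_uri},
        "MediaFormat": media_format,
        "LanguageCode": language_code,
    }


def medical_job_params(job_name: str, s3_uri: str, output_bucket: str,
                       dictation: bool = False) -> dict:
    """Keyword arguments for client.start_medical_transcription_job(**params)."""
    _require_s3(s3_uri)
    return {
        "MedicalTranscriptionJobName": job_name,
        "LanguageCode": "en-US",              # Transcribe Medical targets US English
        "Media": {"MediaFileUri": s3_uri},
        "OutputBucketName": output_bucket,    # results are written to your own bucket
        "Specialty": "PRIMARYCARE",
        "Type": "DICTATION" if dictation else "CONVERSATION",
    }


def start_job(client, s3_uri: str) -> str:
    """Start a standard job with a boto3 'transcribe' client and return its name."""
    params = transcription_job_params(s3_uri)
    client.start_transcription_job(**params)
    return params["TranscriptionJobName"]
```

With boto3 installed and credentials configured, `start_job(boto3.client("transcribe"), "s3://my-bucket/call.wav")` would start a job (the bucket name is hypothetical) whose status you can then poll with `get_transcription_job(TranscriptionJobName=...)`. Passing the client in, rather than creating it inside the helper, keeps the logic testable without AWS access.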
Open Source Speech-to-Text Transcription Engines
As an alternative to APIs, open source Speech-to-Text libraries are completely free, with no limits on use. Some developers also see data security as a plus, since your data doesn’t have to be sent to a third party or to the cloud.
Be warned: there is a high lift involved with open source engines, so you must be comfortable putting in a lot of work to get the results you want, especially if you are trying to use these libraries at scale. Open source Speech-to-Text engines are also much less accurate than the APIs discussed above.
If you want to go the open source route, however, here are some options worth exploring:
DeepSpeech is an open source embedded Speech-to-Text engine designed to run in real time on a range of devices, from high-powered GPUs to a Raspberry Pi 4. The DeepSpeech library uses an end-to-end model architecture pioneered by Baidu.
DeepSpeech also has decent out-of-the-box accuracy for an open source option, and it is easy to fine-tune and train on your own data.
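Here is a minimal sketch of offline inference with DeepSpeech’s Python package, using the 0.9-era API (`deepspeech.Model`, `enableExternalScorer`, `sampleRate`, `stt`) and assuming you have downloaded a released acoustic model and scorer. Much of the practical work is audio preparation: the released English models expect 16 kHz, 16-bit mono PCM, so the sketch validates that before inference. The helper names and file paths are hypothetical.

```python
import io
import wave

# Assumption: the released English models expect 16 kHz, 16-bit, mono audio.
EXPECTED_RATE = 16000


def read_pcm16(wav_bytes: bytes, expected_rate: int = EXPECTED_RATE) -> bytes:
    """Return raw 16-bit mono PCM frames, rejecting audio the model can't consume."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as wav:
        if wav.getnchannels() != 1 or wav.getsampwidth() != 2:
            raise ValueError("expected 16-bit mono audio")
        if wav.getframerate() != expected_rate:
            raise ValueError(f"expected {expected_rate} Hz, got {wav.getframerate()} Hz")
        return wav.readframes(wav.getnframes())


def transcribe(model_path: str, scorer_path: str, wav_bytes: bytes) -> str:
    """Run DeepSpeech inference on one WAV file (needs `pip install deepspeech numpy`)."""
    import numpy as np
    import deepspeech

    model = deepspeech.Model(model_path)        # acoustic model, e.g. a .pbmm file
    model.enableExternalScorer(scorer_path)     # optional language-model scorer
    pcm = read_pcm16(wav_bytes, model.sampleRate())
    return model.stt(np.frombuffer(pcm, dtype=np.int16))
```

Keeping the validation in a separate, dependency-free function means malformed audio fails fast with a clear message instead of silently producing a poor transcript.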
Kaldi is a speech recognition toolkit that has been widely popular in the research community for many years.
Like DeepSpeech, Kaldi has good out-of-the-box accuracy and supports training your own models. It has also been thoroughly tested: many companies use Kaldi in production and have done so for a while, which makes developers more confident in applying it.
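Training your own Kaldi model starts from a data directory in Kaldi’s plain-text format. The sketch below writes the four core files (`wav.scp`, `text`, `utt2spk`, `spk2utt`), where each line is a key followed by its value and each file is sorted by key. It is a minimal illustration rather than a full recipe: the function name and example IDs are hypothetical, and real recipes add files such as `segments` when needed.

```python
from collections import defaultdict
from pathlib import Path


def write_kaldi_data_dir(out_dir: str,
                         utterances: list[tuple[str, str, str, str]]) -> None:
    """Write wav.scp, text, utt2spk and spk2utt from (utt_id, speaker_id, wav_path, transcript).

    Kaldi expects each file sorted by key in C-locale order; Python's default
    string sort matches that for ASCII IDs. Prefixing utt_id with the speaker ID
    keeps utt2spk and spk2utt consistently sorted.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    rows = sorted(utterances)

    # Group utterances by speaker for the spk2utt mapping.
    spk2utt = defaultdict(list)
    for utt, spk, _, _ in rows:
        spk2utt[spk].append(utt)

    (out / "wav.scp").write_text("".join(f"{u} {p}\n" for u, _, p, _ in rows), encoding="utf-8")
    (out / "text").write_text("".join(f"{u} {t}\n" for u, _, _, t in rows), encoding="utf-8")
    (out / "utt2spk").write_text("".join(f"{u} {s}\n" for u, s, _, _ in rows), encoding="utf-8")
    (out / "spk2utt").write_text(
        "".join(f"{s} {' '.join(us)}\n" for s, us in sorted(spk2utt.items())),
        encoding="utf-8",
    )
```

Afterwards, Kaldi’s `utils/validate_data_dir.sh --no-feats data/train` can check the directory before feature extraction.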
Wav2Letter is Facebook AI Research’s Automatic Speech Recognition (ASR) toolkit. Like Kaldi, it is written in C++, and it uses the ArrayFire tensor library.
Like DeepSpeech, Wav2Letter is decently accurate for an open source library and is easy to work with on a small project.
Overall, the platform is well-defined and constantly updated, making it a straightforward tool for training and fine-tuning.
Coqui is the final deep learning toolkit for Speech-to-Text transcription on our list. It has been used for projects in over twenty languages and also offers a variety of essential inference and productionization features.
The platform also releases custom-trained models and provides bindings for various programming languages to ease deployment.
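Coqui STT was forked from the DeepSpeech codebase, so its Python binding (the `stt` package) keeps a similar `Model` API, including a streaming interface (`createStream`, `feedAudioContent`, `finishStream`) suited to production use. The sketch below is written against that interface as we understand it and feeds audio in fixed-size chunks the way a live microphone would; the helper names and the 320 ms chunk size are illustrative assumptions.

```python
from typing import Iterator


def pcm_chunks(pcm: bytes, sample_rate: int = 16000, chunk_ms: int = 320) -> Iterator[bytes]:
    """Split 16-bit mono PCM into fixed-duration chunks, as a microphone would deliver them."""
    step = sample_rate * chunk_ms // 1000 * 2   # 2 bytes per 16-bit sample
    for start in range(0, len(pcm), step):
        yield pcm[start:start + step]


def stream_transcribe(model, pcm: bytes, to_samples=lambda b: b) -> str:
    """Feed audio incrementally through a streaming session and return the final text.

    `model` is assumed to be an stt.Model; with the real package, `to_samples`
    should convert bytes to an int16 array, e.g.
    lambda b: numpy.frombuffer(b, numpy.int16).
    """
    stream = model.createStream()
    for chunk in pcm_chunks(pcm, model.sampleRate()):
        stream.feedAudioContent(to_samples(chunk))
        # stream.intermediateDecode() could be called here for partial results.
    return stream.finishStream()
```

With the real package you would pass something like `stt.Model("model.tflite")` (a hypothetical file name) together with a numpy-based `to_samples`. Taking the model as a parameter lets the chunking and streaming logic be exercised with a stand-in object.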
Which Free Speech-to-Text API or Open Source Engine is Right for Your Project?
The best free Speech-to-Text API or open source engine will depend on your project. Do you have a small project and want something that is easy to use, highly accurate, and equipped with additional out-of-the-box features? If so, one of these APIs might be right for you: Google Speech-to-Text, AssemblyAI, or AWS Transcribe.
Alternatively, you might want a completely free option with no data limits and not mind the extra work it takes to tailor a toolkit to your needs. If so, you might choose one of these open source libraries: DeepSpeech, Kaldi, Wav2Letter, or Coqui.
Whichever you choose, make sure it can keep meeting your project’s needs, both now and as the project evolves.