The end of 2021 is almost upon us, and it’s been a good one here at AssemblyAI! We’ve worked hard and delivered some of our best updates and new features yet, including:
- Our most accurate transcription model yet (v8), backed by our research into Transformer Transducers
- 9 major new features, like Entity Detection and Auto Chapters
- A fully remote team that quadrupled in size (and we're still hiring!)
- Our new YouTube channel with weekly Machine Learning tutorials
- Sponsorships of dozens of hackathons, like HackDuke and Hack the Valley
- 81 published blog posts, including our popular tutorials on PyTorch Lightning, DeepSpeech, and Python Speech Recognition, plus our comparison of PyTorch vs TensorFlow in 2022
- A brand new developer dashboard
- An updated public changelog
And much more! Read on for a summary of all that’s happened in 2021 at AssemblyAI.
v8 Transcription Model with Nearly 20% Higher Accuracy
In October, we released v8, our most accurate Speech Recognition model to date. v8 delivers up to 18.72% better accuracy across all types of audio and video data, and proper noun accuracy improved by an amazing 24.47%. Read more about v8 here.
Our 2021 Accuracy Benchmark Report showcases these gains by comparing our transcription accuracy to Google Cloud Speech-to-Text and AWS Transcribe, and it also demonstrates our features in action. Check out the full Benchmark Report here.
We released 9 major new features in 2021, as well as countless updates to others.
1. Real-time Transcription
If you're working with live audio, we can stream transcripts back to you within a few hundred milliseconds, then revise them for higher accuracy as more context arrives. Learn more about our real-time transcription API here.
2. Entity Detection
Our Entity Detection feature automatically detects a wide range of entities found in your transcription text such as names, addresses, phone numbers, social security numbers, locations, and more. Learn more about our Entity Detection feature here.
3. Auto Chapters (Summarization)
Our Auto Chapters feature provides a "summary over time" for audio content by first breaking audio/video files into logical "chapters" as the topic of conversation changes, and then providing an automatically generated summary for each "chapter" of content. Read more about Auto Chapters here.
4. Sentiment Analysis
Our Sentiment Analysis feature classifies the sentiment of each sentence spoken in your audio files as “positive,” “negative,” or “neutral.” Learn more about Sentiment Analysis here.
5. Filler Words
Filler words like "um" and "uh" can now be included in your transcription text with high accuracy. Read more about how to use our Disfluencies feature here.
6. Severity Scores for Content Safety
Severity Scores work alongside our Content Safety feature, rating how severe each detected Content Safety label is on a scale from 0 to 1. Read more about both Content Safety and Severity Scores here.
7. Word Search
Search completed transcripts for a set of specific keywords. Learn how to use the Word Search feature here.
8. Paragraph Detection
Break your transcription into automatically generated paragraphs for easier reading and comprehension. Learn how to use Paragraph Detection here.
9. Usage Alerts
Our Usage Alerts feature lets customers set a monthly usage threshold on their account, along with a list of email addresses to notify when that threshold is exceeded. To enable it, click “Set up alerts” on the “Developers” tab in the Dashboard.
Subscribe to our weekly updates via our Changelog to stay up to date, including on soon-to-be-released features like Emotion Detection and Translation.
2021 also saw additional product updates, such as a much-improved public changelog.
We also launched a new developer dashboard that offers real-time usage and spend data, along with a web interface for transcribing a video or audio file right from your browser, so you can try AssemblyAI’s models without writing any code.
Our social media presence grew too! We launched a new YouTube channel featuring original Deep Learning content and tutorials, and it passed 300 subscribers in under 30 days!
We kicked off the year by publishing an overview of End-to-End Architectures for Speech Recognition (2021). In this review, we outlined the major differences between popular, modern architectures for Automatic Speech Recognition, including LAS, CTC, and RNN-T.
Later in the year, we published a deep dive into how Transducer models (2021) can be used for Automatic Speech Recognition with high accuracy, and went into more detail about how they compare to the more common CTC model architecture.
As a startup building large-scale Transformer models, we shared some of the lessons we’ve learned, along with tips for other startups in the AI space. We also started our popular Weekly Deep Learning Paper Review series, in which our research team comments on exciting new work from the broader AI research community. For example, we looked at:
- VICReg: Variance-Invariance-Covariance Regularization for Self-Supervised Learning
- SimCLS and RefSum - Summarization Techniques
- Speech processing Universal Performance Benchmark Review
- Pretraining Representations for Data-Efficient Reinforcement Learning
- ...and many others!
AssemblyAI was recognized as both a Fall and Winter 2021 High Performer and Momentum Leader on G2!
The High Performer award recognizes products with high customer satisfaction scores. The Momentum Leader award recognizes products in the top 25% of their category. AssemblyAI holds an average rating of 4.8 out of 5 stars on G2.
Top Blog Posts
We published 81 blog posts in 2021!
Here are our top 10:
- How to Train Large Deep Learning Models as a Startup
- PyTorch vs TensorFlow in 2022
- Real Time Speech Recognition with Python
- The State of Python Speech Recognition in 2021
- DeepSpeech for Dummies - A Tutorial and Overview
- PyTorch Lightning for Dummies - A Tutorial and Overview
- Getting started with HttpClientFactory in C# and .NET 5
- Python Speech Recognition in Under 25 Lines of Code
- Fine-Tuning Transformers for NLP
- Building an End-to-End Speech Recognition Model in PyTorch
A Year Wrapped
We can’t wait to hit the ground running in 2022! Thank you for coming along with us on this exciting journey!