Welcome to the AssemblyAI Blog

Learn about our latest research and product updates
How to Convert an MP3 File to Text with an API
Tutorials

Recent advances in deep learning technology have made speech recognition available to developers at large. Here we go over how to convert an MP3 file to text with AssemblyAI's speech to text API.

Is Word Error Rate a Good Measure of Speech Recognition Systems?
Deep Learning

What is Word Error Rate? Word Error Rate is a measure of how accurately an Automatic Speech Recognition (ASR) system performs. Quite literally, it calculates how many “errors” are in the transcription text produced by an ASR system, compared to a human transcription.
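
To make the definition concrete, here is a minimal sketch of a Word Error Rate calculation in Python. It counts the word-level substitutions, deletions, and insertions needed to turn the human transcription into the ASR output, then divides by the number of words in the human transcription. The function name and the choice to lowercase and split on whitespace are assumptions made for illustration; real evaluations usually apply more careful text normalization.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    # Assumption: lowercase + whitespace split is enough normalization here.
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] is the edit distance between the reference words seen so far
    # (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion: reference word missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra word in hypothesis
                prev[j - 1] + cost,  # substitution, or a match when cost is 0
            ))
        prev = curr
    return prev[-1] / len(ref)


print(word_error_rate("the quick brown fox", "the quick brown dog"))
```

Because insertions count as errors, the result can exceed 1.0 when the ASR output is much longer than the human transcription.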

The State of Python Speech Recognition in 2021
Tutorials

We cover the state-of-the-art Python speech recognition technologies, including three open source libraries: wav2letter, SpeechRecognition, and DeepSpeech. Then we cover AssemblyAI's speech to text API as a super simple cloud solution that also offers custom vocabulary, speaker diarization, and paragraph extraction.

8/31/2021 AWS Outage Post-Mortem
Engineering

On Tuesday, August 31st, AWS had an outage in their us-west-2 region. At 18:00 UTC that day, we experienced an increase in 5xx error codes returned by our API, as well as a slowdown in transcription turnaround time. The AWS outage impacted a single AWS availability zone, usw2-az2. We would like to take this opportunity to share our post-mortem of this event.

Can Podcasts Predict the Stock Market with AssemblyAI, The Daily, and Up First
Tutorials
Case Studies
Can Podcasts Predict the Stock Market?

We explore whether the frequency of negative news in the popular podcasts The Daily and Up First can predict the stock market. We use Python Selenium with Chromedriver to crawl Listen Notes and send the links to be transcribed with AssemblyAI's speech to text API.

How to Add Subtitles to Your Mux Videos with Python
Tutorials

Learn how to use AssemblyAI's mp4 to text API to generate a subtitle file to add to your Mux video programmatically in Python. This speech to text example shows how to get the .srt and .vtt subtitle files from AssemblyAI, upload them to S3, and then use Mux's Python SDK to upload a video with subtitles.

Real Time Speech Recognition with Python
Tutorials

This Python speech recognition tutorial shows how to do real time speech recognition with Python using AssemblyAI's real time transcription websocket and Python PyAudio for streaming sound from the microphone.

How to Programmatically Set Up Twilio Voicemail
Tutorials

Learn how to use Python to set up Twilio Voicemail. This tutorial shows how to use a set of Python scripts to programmatically access Twilio's API and set up a voicemail system that records your calls and lets you download the recordings as .mp3 files.

How to Build a Burner Phone with Voicemail in Python
Tutorials

What is a burner phone number? It's a number you can give out when you don't want to share your real one, and that you can easily get rid of. This tutorial shows you how to build a Python burner phone with Twilio and AssemblyAI that will transcribe your voicemails.

The Definitive Guide to Python Click
Tutorials

A tutorial on how to use the components of the Python Click library to intuitively and easily build simple to complex command line interface (CLI) applications. This tutorial covers styling, passing context, creating your own pass decorators, nested commands, and how to use multiple command groups.

Python Speech Recognition in Under 25 Lines of Code
Tutorials

An easy-to-follow Python tutorial on how to do speech recognition in under 25 lines of code. We extend our short example using AssemblyAI's API to a longer script and build a Python project that can automatically do speech recognition and transcribe an mp3 into a txt file.

Improved Real-Time Transcription Speed and Accuracy
Changelog

This week, we're excited to launch an update to our Real-Time WebSocket Transcription API! This update improves both accuracy and latency for results streamed back from the API.

How to build a YouTube Downloader in Python
Tutorials
How to build a YouTube downloader in Python

A tutorial on how to make a YouTube Downloader. We'll show you how to make your own command line interface to save YouTube video and audio using youtube_dl and ffmpeg in Python.

How to get the transcript of a YouTube video
Tutorials

Learn how to get the transcript of a YouTube video with this simple Python tutorial. We'll use youtube_dl and ffmpeg to download the audio from YouTube videos, and automatically transcribe it with AssemblyAI!

Fine-Tuning Transformers for NLP
Deep Learning
Tutorials

Since being introduced in the "Attention Is All You Need" paper, Transformers have completely redefined the field of Natural Language Processing. In this blog, we show you how to quickly fine-tune Transformers for numerous downstream tasks, where they often perform really well out of the box.

Transcribing Zoom Recordings Using the Zoom API and AssemblyAI
Tutorials

In this post, we’re going to show you how to transcribe your Zoom recordings by connecting Zoom’s API with AssemblyAI’s automatic speech recognition API. In just a few lines of code, you'll see how you can accurately transcribe your Zoom recordings!

Open Sourcing Drone Deploy ECS
Engineering

We’re excited to announce that we’ve open sourced another project! `drone-deploy-ecs` is a Drone plugin that enables you to deploy updates to ECS. Our engineering team recently decided to migrate from Docker on EC2 to AWS ECS. We knew that moving to ECS would require us to refactor our deployment processes, so we figured we’d wrap our deployment process into a single tool that fits into our CI/CD solution.

A New API Endpoint to Paginate Through Historical Transcripts
Changelog

Today we are excited to make available our new List Endpoint, which gives developers the ability to query for, and paginate through, all of their historical transcriptions.
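
As a rough illustration of what paginating through results looks like on the client side, here is a minimal Python sketch. It does not call the real List Endpoint: `fetch_page`, the cursor values, and the `id` field are hypothetical stand-ins, and the actual request parameters and response shape should be taken from the API documentation.

```python
from typing import Callable, Iterator, List, Optional, Tuple

# Hypothetical page shape: (items on this page, cursor for the next page or None).
Page = Tuple[List[dict], Optional[str]]


def iter_transcripts(fetch_page: Callable[[Optional[str]], Page]) -> Iterator[dict]:
    """Yield every item by following next-page cursors until none is left."""
    cursor: Optional[str] = None
    while True:
        items, cursor = fetch_page(cursor)
        yield from items
        if cursor is None:  # no further pages
            break


# In-memory stand-in for an HTTP call, so the sketch runs without a network.
FAKE_PAGES = {
    None: ([{"id": "t1"}, {"id": "t2"}], "page2"),
    "page2": ([{"id": "t3"}], None),
}


def fake_fetch(cursor: Optional[str]) -> Page:
    return FAKE_PAGES[cursor]


ids = [transcript["id"] for transcript in iter_transcripts(fake_fetch)]
print(ids)
```

Wrapping the loop in a generator keeps the paging logic in one place, so calling code can simply iterate over transcripts.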

Redacting Sensitive Medical Information from Transcriptions
Changelog

AssemblyAI's Speech-to-Text API now supports automatically detecting and redacting medical information, like drug names, injuries, medical conditions, and medical procedures, from transcription text!

How to Transcribe an Audio File in C# using the AssemblyAI API
Tutorials

Over the course of this post, we'll be walking you through the process of uploading an audio file from your local machine to AssemblyAI, and submitting that audio file for transcription using C#.

How to transcribe an audio file with Python and AssemblyAI
Tutorials
How to Transcribe an Audio File with Python and AssemblyAI

In this tutorial, we show you how to easily transcribe an audio or video file using Python and the AssemblyAI API in just a few lines of code!

2 New Endpoints to Return Transcripts as Paragraphs and Sentences
Changelog

We now have two new endpoints that allow you to pull a completed transcript broken into paragraphs, or should you desire to be more specific, sentences!

Open Sourcing our Drone CI/CD CloudWatch Auto Scaler
Engineering

At AssemblyAI, we use Drone as our primary CI/CD tool. It's dead simple to set up and operate, which frees us up to build out our product. Recently we decided to figure out how to build a cost-effective, easily-scalable Drone worker fleet for our GPU instances.

Getting started with HttpClientFactory in C# and .NET 5
Engineering

HttpClientFactory has been around the .NET ecosystem for a few years now. In this post we will look at 3 basic implementations of HttpClientFactory: basic, named, and typed.

Feature Announcement: Content Safety Detection
Announcements
Changelog
Content Safety Detection is now GA!

Automatically transcribe audio and video files, and surface sensitive content, such as "Hate Speech" or "NSFW" content, found within the audio.

Changelog: New Speaker Diarization model released
Changelog
Improved Speaker Diarization Accuracy Released

We have released a new Diarization model. Speaker diarization is the process of partitioning an input audio stream into homogeneous segments according to the speaker identity.

Speedy, code-free speech-to-text with AssemblyAI and Postman
Tutorials

This article takes the reader through the process of setting up Postman, an API testing tool, to upload an audio file, kick off a speech-to-text transcription, and download the completed transcription.

Uploading files to AssemblyAI using Node.js and JavaScript
Tutorials
Using AssemblyAI and Node.js to Transcribe Local Audio Files

In this blog, we look at how to upload a file from the local disk to AssemblyAI, ready for transcription, using Node.js and JavaScript.

Getting started with speech-to-text transcriptions with AssemblyAI, JavaScript and Node.js
Tutorials

Build a command-line speech-to-text app with Node.js, JavaScript and AssemblyAI.

New Punctuation and casing model
Changelog
New Punctuation and Casing model released 🎉

We’ve been hard at work at AssemblyAI dramatically improving our speech-to-text features, and this week sees an update to our punctuation and casing features with a brand new model.

A Comparison of End-to-End Speech Recognition Architectures in 2021
Deep Learning
A Survey on End-To-End Speech Recognition Architectures in 2021

As part of our core research and development efforts to continue pushing the state of the art of speech recognition accuracy, in this post, we explore speech recognition architectures that are gaining new popularity in both academia and industry settings.

Building an end-to-end Speech Recognition model in PyTorch
Deep Learning

The complete guide on how to build an end-to-end Speech Recognition model in PyTorch. Train your own CTC Deep Speech model using this tutorial.

PII Redaction and Speech-to-Text Accuracy Improvements
Changelog
PII Redaction and Accuracy Improvements

Last month, we introduced PII Redaction Policies as part of a big overhaul to our PII Redaction feature, making it more flexible and powerful for you to specify exactly what you want redacted from your transcript.

WhatConverts Call Tracking | AssemblyAI Speech-to-Text API Case Study
Case Studies
WhatConverts | AssemblyAI Case Study

The WhatConverts call tracking platform partners with the AssemblyAI Speech-to-Text API to power state-of-the-art transcription accuracy.

AssemblyAI Awarded Best Speech-to-Text API of 2020
Announcements
AssemblyAI Wins Best Public API 2020 🏆

Nordic APIs announces AssemblyAI Speech-to-Text API wins 2020 API of the year. Offering a high accuracy speech-to-text API built for developers.

PII Redaction for Speech-to-Text
Announcements
PII Redaction Policies for Speech-to-Text

Advanced PII Redaction from AssemblyAI Speech-to-Text API with customizable redaction rules and over 15 types of PII detected and redacted.

AssemblyAI Speech-to-Text API - The State of Speaker Diarization
Deep Learning
Speaker Diarization: Speaker Labels for Mono Channel Files

AssemblyAI Speech-to-Text API's speaker diarization (diarisation) is the process of splitting audio or video inputs automatically based on the speaker's identity. It helps you answer the question "who spoke when?".
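
One way to picture the "who spoke when?" output is as a list of speaker turns. The Python sketch below merges consecutive words that share a speaker label into turns. The input shape (dicts with `speaker`, `text`, `start`, and `end` keys, with times in milliseconds) is a hypothetical format chosen for illustration, not a statement of what the API returns.

```python
from typing import Dict, List


def group_into_turns(words: List[Dict]) -> List[Dict]:
    """Merge consecutive words with the same speaker label into turns."""
    turns: List[Dict] = []
    for word in words:
        if turns and turns[-1]["speaker"] == word["speaker"]:
            # Same speaker keeps talking: extend the current turn.
            turns[-1]["text"] += " " + word["text"]
            turns[-1]["end"] = word["end"]
        else:
            # Speaker changed (or first word): start a new turn.
            turns.append({
                "speaker": word["speaker"],
                "start": word["start"],
                "end": word["end"],
                "text": word["text"],
            })
    return turns


# Hypothetical word-level labels, times in milliseconds.
sample_words = [
    {"speaker": "A", "text": "Hi", "start": 0, "end": 300},
    {"speaker": "A", "text": "there", "start": 300, "end": 600},
    {"speaker": "B", "text": "Hello", "start": 700, "end": 1000},
    {"speaker": "A", "text": "Bye", "start": 1100, "end": 1300},
]

for turn in group_into_turns(sample_words):
    print(f'{turn["speaker"]} [{turn["start"]}-{turn["end"]} ms]: {turn["text"]}')
```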

Conversation Intelligence with CallRail and AssemblyAI
Announcements
[Webinar] Conversation Intelligence with CallRail

CallRail and AssemblyAI Speech-to-Text API discuss how conversation intelligence transforms call data into insights that boost sales performance and conversions.

Voice Search Partnership with Algolia
Announcements

AssemblyAI Speech-to-Text API update includes the launch of Voice Search with Algolia, improved accuracy, and enhanced speed.

AssemblyAI Speech-to-Text API Improved Accuracy WER (Word Error Rate)
Announcements
Improved WER - Speech-to-Text Accuracy vs. Google, AWS (May 2020 Update)

AssemblyAI launches significant improvements to transcription accuracy, consistently out-benchmarking other providers like Google Cloud Speech-to-Text, AWS Transcribe, and Rev.ai.

Algolia Voice Search with AssemblyAI Speech-to-Text API
Announcements
Teaming up with Algolia for effortless Voice Search

AssemblyAI and Algolia launch a plugin for their popular InstantSearch.js library. With this library, developers can add accurate Voice Search to their websites in just a few lines of code.

AssemblyAI Speech-to-Text API | Automated SRT and VTT Video Captions
Announcements
Automated SRT and VTT Video Captions (April 2020 Update)

Automated captions for videos in SRT and VTT format using AssemblyAI Speech-to-Text API.
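
For readers unfamiliar with the SRT format, here is a minimal Python sketch that builds an SRT document from timed captions. The function names and the `(start_ms, end_ms, text)` cue tuples are assumptions made for illustration; the API produces caption files for you, so this only shows what such a file contains.

```python
from typing import Iterable, Tuple


def srt_timestamp(ms: int) -> str:
    """Format milliseconds as an SRT timestamp, HH:MM:SS,mmm."""
    hours, rem = divmod(ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    seconds, millis = divmod(rem, 1000)
    return f"{hours:02}:{minutes:02}:{seconds:02},{millis:03}"


def to_srt(cues: Iterable[Tuple[int, int, str]]) -> str:
    """Build an SRT document: numbered cues separated by blank lines."""
    blocks = []
    for index, (start_ms, end_ms, text) in enumerate(cues, start=1):
        blocks.append(
            f"{index}\n"
            f"{srt_timestamp(start_ms)} --> {srt_timestamp(end_ms)}\n"
            f"{text}\n"
        )
    return "\n".join(blocks)


print(to_srt([(0, 1500, "Hello"), (1500, 3000, "World")]))
```

VTT files are similar, but they begin with a `WEBVTT` header line and use a period instead of a comma before the milliseconds.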

PII Redaction for Speech-to-Text Transcriptions
Announcements
PII Redaction for Speech-to-Text Transcriptions (March 2020 Update)

AssemblyAI Speech-to-Text API launches PII Redaction for Transcriptions on audio, video, and podcasts.

ADVANCED TRANSCRIPTION FEATURES

Unlock your media with our advanced features like PII Redaction, Keyword Boosts, Automatic Transcript Highlights, and more.