New Punctuation and Casing model released

2023 Update: See here for AssemblyAI's most recent Punctuation and Casing model update (2023).

We’ve been hard at work at AssemblyAI dramatically improving our speech-to-text features, and this week brings an update to our punctuation and casing features with a brand new model.

A model in AI or machine learning is a mathematical algorithm that replicates a decision process to enable automation and understanding.

At AssemblyAI, our Speech-to-Text models output transcripts in their raw form, for example: “my name is bob and i am seventy eight years old”. This raw text is then sent through a second model, called a Punctuation Restoration model, which adds punctuation and casing to that raw text.

Under the hood, our Punctuation Restoration model is a multi-class classifier. For each word, it predicts a class that tells us what to do with that word: do nothing (i.e. leave the word as-is), change its casing, add punctuation to it, or both. For example, the model might predict that we should uppercase a word and add a comma to the end of it. Or, the model might predict that the word should stay lowercase, but a period should be added to the end of it.

After getting the predictions for each word, we end up with a well-punctuated transcription with proper casing!
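To make the per-word idea concrete, here is a minimal sketch of how predicted classes could be turned back into text. It is an illustration only, not our production code: the `Label` class, its `casing` and `punctuation` fields, and the `apply_label` and `restore` helpers are all hypothetical names, and the labels are written by hand instead of coming from a trained model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Label:
    """Hypothetical class label: one casing action plus one punctuation action."""
    casing: str       # "keep", "capitalize", or "upper" (assumed label set)
    punctuation: str  # "" for none, or a mark such as ",", ".", "?"


def apply_label(word: str, label: Label) -> str:
    """Apply the predicted casing and punctuation to a single raw word."""
    if label.casing == "capitalize":
        word = word[:1].upper() + word[1:]
    elif label.casing == "upper":
        word = word.upper()
    # "keep" leaves the word exactly as the speech-to-text model produced it.
    return word + label.punctuation


def restore(words: list[str], labels: list[Label]) -> str:
    """Combine per-word predictions into a punctuated, cased transcript."""
    if len(words) != len(labels):
        raise ValueError("expected exactly one label per word")
    return " ".join(apply_label(w, l) for w, l in zip(words, labels))


# Hand-written labels standing in for model predictions.
words = "hi how are you i live in sf".split()
labels = [
    Label("capitalize", "."),  # hi   -> Hi.
    Label("capitalize", ""),   # how  -> How
    Label("keep", ""),         # are
    Label("keep", "?"),        # you  -> you?
    Label("upper", ""),        # i    -> I
    Label("keep", ""),         # live
    Label("keep", ""),         # in
    Label("upper", "."),       # sf   -> SF.
]

print(restore(words, labels))  # Hi. How are you? I live in SF.
```

This simplified label set cannot express mixed-case words such as AssemblyAI; handling those would need additional classes or some other mechanism, which this sketch does not attempt to show.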

Our current best model leverages a transformer-based model architecture, which produces superior results compared to RNN-based models. This model was trained on over 1 billion tokens, and yields an accuracy of over 92% for punctuation and casing restoration!

For more information on the above transformer-based model architectures, check out this whitepaper.

Examples

Take the following text example. This text block lacks any punctuation or casing.

hi how are you good how are you i'm good i am just enjoying the weather oh nice where do you live i live in sf i work for a company called assemblyai oh nice is your whole team in sf with you no actually we are a remote team so we have people everywhere in the us and in europe as well wow that's cool i used to work at aws and we were also a remote team nice what did you have for lunch today i had a hot dog was it good yes

Once this is passed through the new model, we get the following output:

Hi. How are you? Good. How are you? I'm good. I am just enjoying the weather. Oh, nice. Where do you live? I live in SF. I work for a company called AssemblyAI. Oh, nice. Is your whole team in SF with you? No, actually, we are a remote team. So we have people everywhere in the us and in Europe as well. Wow, that's cool. I used to work at AWS, and we were also a remote team. Nice. What did you have for lunch today? I had a hot dog. Was it good? Yes.

As you can see, the model correctly interprets the different cadences within the speech, adding correct punctuation and casing. It will even correctly case rare words or business names like AssemblyAI and AWS.

Industry Specific Language

Even with very industry-specific language, the model performs exceptionally well, as seen in the example below.

all our statements are made as of today february 24 2021 based on information currently available to us except as required by law we assume no obligation to update any such statements during this call we will discuss non gaap financial measures you can find a reconciliation of these non gaap financial measures to gaap financial measures in our cfo commentary which is posted on our website during this call we may make forward looking statements based on current expectations

Once punctuation and casing are applied:

All our statements are made as of today February 24, 2021 based on information currently available to us except as required by law. We assume no obligation to update any such statements during this call. We will discuss non GAAP financial measures. You can find a reconciliation of these non GAAP financial measures to GAAP financial measures in our CFO commentary, which is posted on our website. During this call, we may make forward looking statements based on current expectations.

Punctuation and casing are applied by default to all API requests, so you don’t even need to do anything to utilize this fantastic new model!

Get started with AssemblyAI speech-to-text transcriptions by signing up for a free account here.
