PII Redaction for Speech-to-Text Transcriptions (March 2020 Update)

AssemblyAI's research team has launched a new neural network update, with big improvements in accuracy and speed. Below are some of the highlights that we're excited to share with you!

Further Accuracy Improvements

With our most recent update, we are consistently out-benchmarking all other APIs in accuracy on our asynchronous English model. Below are our average accuracy percentages (measured using Word Error Rate, or WER), based on benchmark reports run this month against Google Cloud's video model, AWS Transcribe, and Microsoft Azure:

[Figure: WER results, March]
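
For readers unfamiliar with the metric, here is a minimal sketch of how WER is commonly computed: the word-level edit distance (substitutions, deletions, and insertions) between a reference transcript and the API's output, divided by the number of words in the reference. The function name, the lowercasing step, and the sample sentences are illustrative assumptions, and treating accuracy as 1 - WER is a common convention rather than a description of our exact benchmark pipeline.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1] / len(ref)


# Made-up example: one substitution ("me" -> "be") and one deletion ("five").
wer = word_error_rate("please call me back after five",
                      "please call be back after")
print(f"WER: {wer:.2%}, accuracy: {1 - wer:.2%}")
# prints "WER: 33.33%, accuracy: 66.67%"
```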

Interested in benchmarking? Compare our accuracy and price side-by-side with your current provider by submitting a few of your files here.

Improved Security: PII Redaction

Dealing with sensitive personally identifiable information (PII)?

PII redaction automatically detects and removes sensitive numbers, like credit card and social security numbers, from the transcription text. These sensitive numbers are replaced with pound signs "#" in the transcript.
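
To illustrate what a redacted transcript looks like, here is a small local sketch that masks digits with "#". It only mimics the output format described above: the regular expression, the function name, and the sample sentence are hypothetical, and the real feature detects sensitive numbers on our side rather than with a pattern like this.

```python
import re

# Hypothetical pattern: an SSN-like number (3-2-4 digits) or a
# card-like number (13 to 16 digits, optionally separated by
# spaces or dashes).
SENSITIVE_NUMBER = re.compile(
    r"\b(?:\d{3}-\d{2}-\d{4}|\d(?:[ -]?\d){12,15})\b"
)


def mask_sensitive_numbers(text: str) -> str:
    """Replace every digit of a matched number with '#'."""
    return SENSITIVE_NUMBER.sub(
        lambda match: re.sub(r"\d", "#", match.group()), text
    )


sample = "my card is 4111 1111 1111 1111 and my social is 123-45-6789"
print(mask_sensitive_numbers(sample))
# prints "my card is #### #### #### #### and my social is ###-##-####"
```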

For complete examples of how to turn on PII Redaction, take a look at the full API docs here.

Faster Transcription with Speed Boost

We power visual voicemail for a large number of telephony companies. In visual voicemail applications, turnaround time for transcriptions is key, so customers can see their visual voicemail shortly after their call ends.

To improve turnaround time, we've added the speed boost feature, which is built to transcribe audio files of 1 minute or less in seconds. With this feature turned on, transcriptions complete 25-50% faster than normal.

Below is a Python example of how to turn on speed boost. For snippets of code in other languages, take a look at the full API docs here.

import requests

# Endpoint for submitting a transcription job
endpoint = "https://api.assemblyai.com/v2/transcript"

# Request body: the audio file to transcribe, with speed boost turned on
payload = {
    "audio_url": "https://s3-us-west-2.amazonaws.com/blog.assemblyai.com/audio/8-7-2018-post/7510.mp3",
    "speed_boost": True
}

headers = {
    "authorization": "YOUR-API-TOKEN",
    "content-type": "application/json"
}

response = requests.post(endpoint, json=payload, headers=headers)

print(response.json())
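
Since turnaround time is the point of speed boost, you may also want to wait for the finished transcript after submitting the request above. Below is a minimal polling sketch. The helper name `wait_until_done`, the `fetch` callback, and the status values ("queued", "processing", "completed", "error") are assumptions made for illustration, so check the full API docs for the exact response fields.

```python
import time
from typing import Callable


def wait_until_done(fetch: Callable[[], dict],
                    poll_interval: float = 1.0,
                    timeout: float = 60.0) -> dict:
    """Call fetch() until it reports a finished transcript.

    fetch is any function returning the latest transcript as a dict
    with a "status" key (an assumed response shape).
    """
    deadline = time.monotonic() + timeout
    while True:
        transcript = fetch()
        if transcript.get("status") in ("completed", "error"):
            return transcript
        if time.monotonic() >= deadline:
            raise TimeoutError("transcript did not finish in time")
        time.sleep(poll_interval)


# Offline demo with canned responses standing in for real API calls.
canned = iter([
    {"status": "queued"},
    {"status": "processing"},
    {"status": "completed", "text": "hello world"},
])
result = wait_until_done(lambda: next(canned), poll_interval=0.0)
print(result["status"])
# prints "completed"
```

In a real application, `fetch` would wrap an HTTP GET for the transcript created by the request above, and the poll interval would be tuned to the turnaround time you expect for short voicemail files.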


100% Uptime

We had another month of 100% uptime across all our models. Subscribe to our status page to stay up to date!
