
PII Redaction and Accuracy Improvements

Changelog

Last month, we introduced PII Redaction Policies as part of a big overhaul of our PII Redaction feature, making it more flexible and powerful so you can specify exactly what you want redacted from your transcriptions.

Enhanced PII Redaction: More Policies and Customization

We’ve now expanded the list of available Redaction Policies to 20, with more on the way before the end of the year. The new policies include credit_card_cvv, credit_card_expiration, organization, nationality, event, and location. The full list is shown below:

[Image: list of PII Redaction Policies, from the AssemblyAI docs page "Redact Personally Identifiable Information (PII) from transcript text"]

Using these new policies, you can take even greater control over how you safely redact the transcriptions produced by our API, so that they comply with your security standards and those of your customers.

Customize How Redacted PII is Replaced

By default, any detected PII is replaced with hash characters (#). For example, the credit card number 1111-2222-3333-4444 is replaced with ####-####-####-####. To make the redaction more user-friendly and readable, the redacted text can now be replaced with the policy name instead. For example, the credit card number 1111-2222-3333-4444 is replaced with [CREDIT_CARD_NUMBER], and the social security number 111-11-1111 is replaced with [US_SOCIAL_SECURITY_NUMBER].
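
To make the two replacement styles concrete, here is a small local sketch that reproduces the examples above. It is only an illustration, not how the API performs redaction internally; the function name substitute_pii and the style labels "hash" and "entity_name" are hypothetical names chosen for this example.

```python
import re


def substitute_pii(text: str, policy: str, style: str = "hash") -> str:
    """Return the replacement for a single piece of detected PII.

    Illustration only: `style` and its two values are hypothetical labels.
    """
    if style == "hash":
        # Replace every letter and digit with '#', keeping separators such as '-'
        return re.sub(r"[A-Za-z0-9]", "#", text)
    if style == "entity_name":
        # Replace the whole match with the upper-cased policy name in brackets
        return f"[{policy.upper()}]"
    raise ValueError(f"unknown substitution style: {style!r}")


print(substitute_pii("1111-2222-3333-4444", "credit_card_number"))
print(substitute_pii("111-11-1111", "us_social_security_number", "entity_name"))
```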

When you have a lot of redaction policies enabled, this keeps your transcriptions more readable for your end users than replacing all sensitive information with hash characters. To enable this new feature, just include a new parameter, redact_pii_sub, in your POST request.
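
As a rough sketch, a request body that selects specific policies and turns on policy-name replacement might be built like this. The helper build_redaction_payload is hypothetical, and the value "entity_name" for redact_pii_sub is an assumption that this post does not confirm, so check the API Docs for the accepted values.

```python
from typing import Iterable, Optional


def build_redaction_payload(
    audio_url: str,
    policies: Iterable[str],
    substitution: Optional[str] = None,
) -> dict:
    """Build the JSON body for a transcript request with PII Redaction enabled."""
    payload = {
        "audio_url": audio_url,
        "redact_pii": True,
        "redact_pii_policies": list(policies),
    }
    if substitution is not None:
        # Assumption: redact_pii_sub takes a string naming the replacement style
        payload["redact_pii_sub"] = substitution
    return payload


payload = build_redaction_payload(
    "https://app.assemblyai.com/static/media/phone_demo_clip_1.wav",
    ["credit_card_cvv", "credit_card_expiration"],
    substitution="entity_name",  # assumed value, not stated in this post
)
print(payload)
```

The resulting dictionary can be sent as the JSON body of the POST request to the transcript endpoint.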

How PII Redaction Works in AssemblyAI

Testing these policies with AssemblyAI’s API only takes a couple of minutes to set up. Using the code sample below, you can enable PII Redaction on your own audio or video files (for more code samples, check out our API Docs).

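import requests

# Transcript endpoint of the AssemblyAI API
endpoint = "https://api.assemblyai.com/v2/transcript"

# Request body: enable PII Redaction and apply every available policy
payload = {
    "audio_url": "https://app.assemblyai.com/static/media/phone_demo_clip_1.wav",
    "redact_pii": True,
    "redact_pii_policies": ["all"],
}

# Replace YOUR-API-TOKEN with your own API token
headers = {
    "authorization": "YOUR-API-TOKEN",
    "content-type": "application/json",
}

# Submit the transcription request and print the API's JSON response
response = requests.post(endpoint, json=payload, headers=headers)

print(response.json())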
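
Transcription runs asynchronously, so the response to the POST request is not yet the finished transcript. Below is a hedged sketch of a polling helper. It assumes the transcript JSON carries a "status" field that eventually becomes "completed" or "error", which this post does not spell out; wait_for_transcript and its parameters are hypothetical names.

```python
import time
from typing import Callable


def wait_for_transcript(
    fetch: Callable[[], dict],
    poll_seconds: float = 3.0,
    max_polls: int = 100,
) -> dict:
    """Call `fetch` repeatedly until the transcript is finished or has failed."""
    for _ in range(max_polls):
        transcript = fetch()
        status = transcript.get("status")  # assumed field name
        if status == "completed":
            return transcript
        if status == "error":
            raise RuntimeError(transcript.get("error", "transcription failed"))
        time.sleep(poll_seconds)
    raise TimeoutError("transcript was not ready after the maximum number of polls")


# Possible usage with the sample above (assumes the POST response contains an "id"):
#   transcript_id = response.json()["id"]
#   result = wait_for_transcript(
#       lambda: requests.get(f"{endpoint}/{transcript_id}", headers=headers).json()
#   )
#   print(result["text"])
```

Passing the fetch step in as a callable keeps the helper independent of any particular HTTP client and makes it easy to try out with canned responses.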

PII Redaction for Audio Files

These same PII Redaction Policies also apply to audio redaction. We will mute the parts of your audio where PII is spoken, and make a downloadable URL available for the redacted audio file. To test audio redaction on your files, follow our guide here.
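
Conceptually, muting means silencing the audio samples that fall inside each time span where PII was detected. The sketch below illustrates that idea on a plain list of samples. It is not AssemblyAI's implementation, and mute_spans along with its millisecond-based spans is a hypothetical helper.

```python
from typing import List, Sequence, Tuple


def mute_spans(
    samples: Sequence[int],
    sample_rate: int,
    spans_ms: Sequence[Tuple[int, int]],
) -> List[int]:
    """Return a copy of `samples` with every (start_ms, end_ms) span set to silence."""
    muted = list(samples)
    for start_ms, end_ms in spans_ms:
        # Convert milliseconds to sample indices and clamp them to the audio length
        start = max(0, start_ms * sample_rate // 1000)
        end = min(len(muted), end_ms * sample_rate // 1000)
        for i in range(start, end):
            muted[i] = 0
    return muted


# Ten samples at 1000 Hz (one sample per millisecond), muting 2 ms up to 5 ms
print(mute_spans([1] * 10, 1000, [(2, 5)]))
```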

Improved Accuracy

We released another set of accuracy updates to our neural network, including significant improvements on call, video, and podcast content. To help benchmark our current model’s accuracy, we’ve included a comparison against common providers like Google Cloud’s Speech-to-Text (Premium Video Model) and AWS Transcribe below.

These sample video podcast transcripts (from Joe Rogan’s Podcast) are shown alongside their Word Error Rate (WER), which measures the accuracy of automated speech recognition against a human transcription. As you can see in the table, we still consistently outperform big tech providers like Google and AWS. Other providers like Microsoft and IBM were included in our analysis, but recorded the lowest accuracy.
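
For readers who want to reproduce this kind of comparison, here is a minimal sketch of the standard Word Error Rate calculation: the word-level edit distance (substitutions, deletions, and insertions) divided by the number of words in the human reference. The text normalization used for the benchmark above is not described in this post, so the lowercase and whitespace split here is a simplifying assumption.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words processed
    # so far and the first j hypothesis words
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(
                min(
                    prev[j] + 1,         # deletion: reference word is missing
                    curr[j - 1] + 1,     # insertion: extra hypothesis word
                    prev[j - 1] + cost,  # match or substitution
                )
            )
        prev = curr
    return prev[-1] / len(ref)


# One substitution ("sat" -> "sit") and one deletion ("the") over six reference words
print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
```

A lower value is better, and a WER of 0 means the automated transcript matches the human reference word for word.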

AssemblyAI | WhatConverts Case Study: Call Tracking Transcription

WhatConverts is call tracking software (SaaS) that helps customers answer the question, “What marketing works?”


Their platform integrates with all marketing channels (e.g. Google Ads, Facebook Ads, Intercom) and makes it simple to understand which campaigns are working and which aren’t delivering leads. Leads are then prioritized, ranked, and managed within the WhatConverts software. Read the full case study here.

With the switch to AssemblyAI, they gained a significant accuracy improvement, improved security through PII (PCI) Redaction, and more affordable pricing. WhatConverts also covered their switch to AssemblyAI; you can read their update on the WhatConverts Blog.
