November 20, 2020

PII Redaction and Accuracy Improvements

Joe Zaghloul

Last month, we introduced PII Redaction Policies as part of a major overhaul of our PII Redaction feature, giving you more flexibility and control over exactly what is redacted from your transcriptions.

Enhanced PII Redaction: More Policies and Customization

We’ve now expanded the list of available Redaction Policies to 20, with more on the way before the end of the year. The new policies include credit_card_cvv, credit_card_expiration, organization, nationality, event, and location. The full list is shown below:

Using these new policies, you can take even greater control over how you redact the transcriptions produced by our API, so that they comply with both your own and your customers’ security standards.

Customize How Redacted PII is Replaced

By default, any PII that is detected is replaced with hash characters (#). For example, the credit card number 1111-2222-3333-4444 is replaced with ####-####-####-####. To make the redacted text more readable, detected PII can now be replaced with the policy name instead. For example, the credit card number 1111-2222-3333-4444 is replaced with [CREDIT_CARD_NUMBER], and the social security number 111-11-1111 is replaced with [US_SOCIAL_SECURITY_NUMBER].
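To make the two replacement styles concrete, here is a minimal local sketch in Python. It does not call the API and says nothing about how AssemblyAI detects PII: it assumes the sensitive substring has already been identified, and the function names redact_with_hash and redact_with_policy_name are hypothetical.

```python
def redact_with_hash(text: str, pii: str) -> str:
    """Replace each letter or digit of `pii` inside `text` with '#', keeping separators."""
    masked = "".join("#" if ch.isalnum() else ch for ch in pii)
    return text.replace(pii, masked)


def redact_with_policy_name(text: str, pii: str, policy: str) -> str:
    """Replace `pii` inside `text` with the upper-cased policy name in brackets."""
    return text.replace(pii, f"[{policy.upper()}]")


sentence = "My card number is 1111-2222-3333-4444."
hashed = redact_with_hash(sentence, "1111-2222-3333-4444")
named = redact_with_policy_name(sentence, "1111-2222-3333-4444", "credit_card_number")
print(hashed)  # My card number is ####-####-####-####.
print(named)   # My card number is [CREDIT_CARD_NUMBER].
```

The hash style preserves the shape of the original value, while the policy-name style tells the reader what kind of information was removed.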

When you have a lot of redaction policies enabled, this new feature keeps your transcriptions readable for your end users, compared to replacing all sensitive information with hash characters. To enable it, include the new redact_pii_sub parameter in your POST request.
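As a rough sketch of what such a request body could look like, the helper below builds the JSON payload without sending it. The helper name build_redaction_request is hypothetical, and the value "entity_name" for redact_pii_sub is an assumption about how the policy-name replacement is selected; this post does not state the accepted values, so check the API Docs before relying on it. The policy names are taken from the replacement examples above.

```python
from typing import Any, Dict, List


def build_redaction_request(
    audio_url: str,
    policies: List[str],
    substitution: str = "entity_name",  # assumed value, not confirmed by this post
) -> Dict[str, Any]:
    """Build the JSON body for a transcript request with PII redaction enabled."""
    return {
        "audio_url": audio_url,
        "redact_pii": True,
        "redact_pii_policies": policies,
        "redact_pii_sub": substitution,
    }


payload = build_redaction_request(
    "https://app.assemblyai.com/static/media/phone_demo_clip_1.wav",
    ["credit_card_number", "us_social_security_number"],
)
print(payload)
```

The resulting dictionary can be passed as the json argument of requests.post, together with your authorization header.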

How PII Redaction Works in AssemblyAI

Testing these policies with AssemblyAI’s API only takes a couple of minutes to set up. Using the code sample below, you can enable PII Redaction on your own audio or video files (for more code samples, check out our API Docs).

    import requests

    endpoint = "https://api.assemblyai.com/v2/transcript"

    # Request body: enable PII redaction and apply every available policy.
    json = {
        "audio_url": "https://app.assemblyai.com/static/media/phone_demo_clip_1.wav",
        "redact_pii": True,
        "redact_pii_policies": ["all"],
    }

    # Replace YOUR-API-TOKEN with your own API token.
    headers = {
        "authorization": "YOUR-API-TOKEN",
        "content-type": "application/json",
    }

    response = requests.post(endpoint, json=json, headers=headers)
    print(response.json())

PII Redaction for Audio Files

These same PII Redaction Policies also apply to audio redaction. We will mute the parts of your audio where PII is spoken, and make a downloadable URL available for the redacted audio file. To test audio redaction on your files, follow our guide here.

Improved Accuracy

We released another set of accuracy updates to our neural network, including significant improvements on call, video, and podcast content. To help benchmark our current model’s accuracy, we’ve included a comparison against common providers like Google Cloud’s Speech-to-Text (Premium Video Model) and AWS Transcribe below.

These sample video podcast transcripts (from Joe Rogan’s Podcast) are shown alongside their Word Error Rate (WER), which measures how far an automated transcription deviates from a human transcription; a lower WER means higher accuracy. As you can see in the table, we still consistently outperform big tech providers like Google and AWS. Other providers, such as Microsoft and IBM, were included in our analysis but recorded the lowest accuracy.
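For readers unfamiliar with the metric, Word Error Rate is conventionally computed as the number of substituted, deleted, and inserted words divided by the number of words in the human reference transcript. The sketch below implements that standard definition with a word-level edit distance. It is illustrative only and is not the benchmarking code behind the comparison above; the function name word_error_rate and the simple lowercase, whitespace-based tokenization are assumptions.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # previous[j] is the edit distance between the reference words processed
    # so far and the first j hypothesis words.
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = previous[j - 1] + (ref_word != hyp_word)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1] / len(ref)


human = "the quick brown fox jumps over the lazy dog"
machine = "the quick brown fox jumped over lazy dog"
print(word_error_rate(human, machine))
```

In this toy example the automated transcript contains one substitution ("jumped" for "jumps") and one deletion (the second "the") against nine reference words, so the WER is 2/9, or roughly 0.22.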

AssemblyAI | WhatConverts Case Study

WhatConverts is a call tracking SaaS platform that helps customers answer the question, “What marketing works?”

Their platform integrates with all marketing channels (such as Google Ads, Facebook Ads, and Intercom) and makes it simple to understand which campaigns are working and which aren’t delivering leads. Leads are then prioritized, ranked, and managed within the WhatConverts software. Read the full case study here.

With the switch to AssemblyAI, they gained a significant accuracy improvement, improved security through PII (PCI) Redaction, and more affordable pricing. WhatConverts also covered their switch to AssemblyAI; you can read their update on the WhatConverts Blog.
