May 10, 2021

Redacting Sensitive Medical Information from Transcriptions

AssemblyAI's Speech-to-Text API now supports automatically detecting and redacting medical information, like drug names, injuries, and medical conditions, from transcription text!

Andrew Galyan-Mann

Senior API Support Engineer

Many of our customers use our advanced PII Detection & Redaction Feature, which can detect over 20 types of PII, such as credit card numbers, names, and dates of birth. Over the past few months, we've heard from many customers who operate in industries where private medical information must remain confidential as well.

That's why today we are excited to add 5 new medical redaction policies to our PII Detection & Redaction Feature. These new policies are:

medical_process - Medical process, including treatments, procedures, and tests. E.g., "heart surgery", "CT scan".
medical_condition - A medical condition. Includes diseases, syndromes, deficits, disorders. E.g., chronic fatigue syndrome, arrhythmia, depression.
blood_type - A person's blood type.
drug - Medical drug, including vitamins and minerals. E.g., Advil, Acetaminophen, Panadol.
injury - Human injury. E.g., "I broke my arm", "I have a sprained wrist". Includes mutations, miscarriages, and dislocations.

For example, this is the redacted transcription of a sample audio file:

"Hi. I'd like to schedule an appointment for an [MEDICAL_PROCESS]. The doctor wanted to take a look at the [INJURY] while working had fallen off of a roof that I was actually working on. He mentioned to go ahead and give you my blood type, which is [BLOOD_TYPE] for coming in and also request for a stronger prescription of [DRUG]. Currently I've just been taking some [DRUG] and [DRUG] [DRUG] and whenever laying around. So when you get a second, please call me back. We'll love to get that set up. Thank you. Bye."

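To check which categories were redacted in a transcript like the one above, you could count the bracketed tags. This is a minimal sketch: the helper name count_redactions is hypothetical, and the tag format (an uppercase policy name in square brackets) is assumed from the sample output rather than taken from the API documentation.

```python
import re
from collections import Counter


def count_redactions(text):
    """Count bracketed entity tags such as [DRUG] in a redacted transcript.

    Assumes tags look like [UPPERCASE_NAME], as in the sample above.
    """
    return Counter(re.findall(r'\[([A-Z_]+)\]', text))


# A shortened excerpt of the redacted transcript shown above.
sample = (
    "He mentioned to go ahead and give you my blood type, which is "
    "[BLOOD_TYPE] for coming in and also request for a stronger "
    "prescription of [DRUG]. Currently I've just been taking some "
    "[DRUG] and [DRUG] [DRUG] and whenever laying around."
)

counts = count_redactions(sample)
for tag, n in counts.items():
    print(tag, n)
```

A count like this can serve as a quick sanity check that the policies you enabled are actually matching content in your audio.
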
Using these new medical redaction policies is very simple! Here's how you'd transcribe an audio file with these new redaction policies enabled in Python:

import requests

endpoint = 'https://api.assemblyai.com/v2/transcript'

headers = {
    'authorization': 'YOUR-API-TOKEN',
    'content-type': 'application/json',
}

payload = {
    # The path to our audio file in an S3 or GCP bucket, for example.
    'audio_url': 'https://s3-us-west-2.amazonaws.com/blog.assemblyai.com/audio/8-7-2018-post/7510.mp3',
    # This will tell the API we want a redacted transcript.
    'redact_pii': True,
    # Setting this parameter tells the API to replace the redacted text with the category
    # of what is being redacted. This will make the transcript more readable!
    'redact_pii_sub': 'entity_name',
    # Here we specify what we want to redact; in this case we are specifying the
    # new medical policies.
    'redact_pii_policies': ['medical_process', 'medical_condition', 'blood_type', 'drug', 'injury'],
}

response = requests.post(endpoint, json=payload, headers=headers)
print(response.json())
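The request above only submits the job, so the redacted text is not in that first response. Here is a rough sketch of how you might wait for the result. It assumes the transcript can be fetched at the same endpoint followed by its id, and that the response carries a status field that ends as 'completed' or 'error'; please confirm both against the documentation. The helper wait_for_transcript and its get_json parameter are hypothetical names used for illustration.

```python
import time

ENDPOINT = 'https://api.assemblyai.com/v2/transcript'


def wait_for_transcript(transcript_id, get_json, poll_seconds=3.0, max_polls=200):
    """Poll a transcript until it finishes and return the final JSON body.

    get_json is any callable that takes a URL and returns the parsed JSON
    response. The status values checked here are assumptions.
    """
    url = f'{ENDPOINT}/{transcript_id}'
    for _ in range(max_polls):
        body = get_json(url)
        status = body.get('status')
        if status == 'completed':
            return body
        if status == 'error':
            raise RuntimeError(body.get('error', 'transcription failed'))
        # Still queued or processing: wait before asking again.
        time.sleep(poll_seconds)
    raise TimeoutError(f'transcript {transcript_id} did not finish in time')
```

With requests, get_json could be a small lambda such as lambda url: requests.get(url, headers=headers).json(), and the id would come from the 'id' field of the response to the POST request. Passing the fetch function in keeps the polling logic easy to test without network access.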

For more information, check out the PII Detection & Redaction Documentation, or send us an email at support@assemblyai.com!
