Redacting Sensitive Medical Information from Transcriptions

Many of our customers use our PII Detection & Redaction feature, which can detect over 20 types of PII, such as credit card numbers, names, and dates of birth. Over the past few months, we've heard from many customers in industries where private medical information must remain confidential as well.

That's why today we are excited to release 5 new medical redaction policies for our PII Detection & Redaction feature. The new policies are:

  • "medical_process" - Medical process, including treatments, procedures, and tests. E.g., "heart surgery", "CT scan".
  • "medical_condition" - A medical condition. Includes diseases, syndromes, deficits, disorders. E.g., chronic fatigue syndrome, arrhythmia, depression.
  • "blood_type" - A person's blood type.
  • "drug" - Medical drug, including vitamins and minerals. E.g., Advil, Acetaminophen, Panadol.
  • "injury" - Human injury, e.g., "I broke my arm", "I have a sprained wrist". Includes mutations, miscarriages, and dislocations.
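
Because the policy names are plain strings, a typo is easy to miss. As a minimal sketch, a client-side check could catch misspelled names before a request is sent. The helper `build_redaction_options` is hypothetical and not part of the AssemblyAI API; it only compares names against the five policies listed above and returns the `redact_pii` and `redact_pii_policies` request fields.

```python
# The five medical policy names announced in this post.
MEDICAL_POLICIES = frozenset({
    "medical_process",
    "medical_condition",
    "blood_type",
    "drug",
    "injury",
})


def build_redaction_options(policies):
    """Return redaction request fields for the given policies (hypothetical helper).

    Raises ValueError if a name is not one of the medical policies listed above.
    """
    unknown = sorted(set(policies) - MEDICAL_POLICIES)
    if unknown:
        raise ValueError(f"Unknown medical redaction policies: {unknown}")
    return {
        "redact_pii": True,
        "redact_pii_policies": list(policies),
    }
```

The returned dictionary can be merged into the JSON body of a transcription request. To allow the other PII policies too, the set would need to be extended with their names.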

For example, this is the redacted transcription of a sample audio file:

"Hi. I'd like to schedule an appointment for an [MEDICAL_PROCESS]. The
doctor wanted to take a look at the [INJURY] while working had fallen off
of a roof that I was actually working on. He mentioned to go ahead and
give you my blood type, which is [BLOOD_TYPE] for coming in and also
request for a stronger prescription of [DRUG]. Currently I've just been
taking some [DRUG] and [DRUG] [DRUG] and whenever laying around. So when
you get a second, please call me back. We'll love to get that set up.
Thank you. Bye."
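
To get a quick summary of what was redacted, the tags in a transcript like this one can be counted. The sketch below assumes tags always appear as an upper-case entity name in square brackets, as in the sample above; `count_redactions` is an illustrative helper, not an API function.

```python
import re
from collections import Counter

# Assumed tag format: an upper-case entity name in square brackets, e.g. [DRUG].
TAG_PATTERN = re.compile(r"\[([A-Z_]+)\]")


def count_redactions(text):
    """Count how many times each redaction tag appears in a transcript."""
    return Counter(TAG_PATTERN.findall(text))


# Excerpt from the redacted transcript above.
excerpt = (
    "give you my blood type, which is [BLOOD_TYPE] for coming in and also "
    "request for a stronger prescription of [DRUG]. Currently I've just been "
    "taking some [DRUG] and [DRUG] [DRUG] and whenever laying around."
)

# Print each tag with its count, most frequent first.
for tag, count in count_redactions(excerpt).most_common():
    print(f"{tag}: {count}")
```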

Using these new medical redaction policies is very simple! Here's how you'd transcribe an audio file with these new redaction policies enabled in Python:

import requests

endpoint = 'https://api.assemblyai.com/v2/transcript'

headers = {
    'authorization': 'YOUR-API-TOKEN',
    'content-type': 'application/json',
}

json = {
    # The path to our audio file in an S3 or GCP bucket, for example.
    'audio_url': 'https://s3-us-west-2.amazonaws.com/blog.assemblyai.com/audio/8-7-2018-post/7510.mp3',

    # This will tell the API we want a redacted transcript.
    'redact_pii': True,

    # Setting this parameter tells the API to replace the redacted text with the category
    # of what is being redacted. This will make the transcript more readable! 
    'redact_pii_sub': 'entity_name',

    # Here we specify what we want to redact; in this case we are specifying the
    # new medical policies.
    'redact_pii_policies': ['medical_process', 'medical_condition', 'blood_type', 'drug', 'injury'],
}

response = requests.post(endpoint, json=json, headers=headers)

print(response.json())
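
Transcription runs asynchronously, so the POST response describes a queued job rather than the finished transcript. The sketch below shows one way to wait for the result. It assumes the job response carries `id` and `status` fields, that the job can be fetched with a GET request to the same endpoint followed by the id, and that `status` ends up as `completed` or `error`. The function names and the polling interval are illustrative choices. It uses only the standard library so it runs without extra dependencies; `requests.get` would work equally well.

```python
import json
import time
import urllib.request

ENDPOINT = "https://api.assemblyai.com/v2/transcript"


def fetch_transcript(transcript_id, headers):
    """GET the current state of a transcript job and return it as a dict."""
    request = urllib.request.Request(f"{ENDPOINT}/{transcript_id}", headers=headers)
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def wait_for_transcript(transcript_id, headers, fetch=fetch_transcript,
                        poll_seconds=3.0, max_polls=100):
    """Poll until the job reports 'completed' or 'error' (assumed status values)."""
    for _ in range(max_polls):
        result = fetch(transcript_id, headers)
        if result["status"] == "completed":
            return result
        if result["status"] == "error":
            raise RuntimeError(result.get("error", "transcription failed"))
        time.sleep(poll_seconds)
    raise TimeoutError(
        f"Transcript {transcript_id} not finished after {max_polls} polls"
    )
```

With the request above, `wait_for_transcript(response.json()['id'], headers)` would return the finished job, whose `text` field holds the redacted transcript. The `fetch` parameter exists so the polling loop can be exercised without network access.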

For more information, check out the PII Detection & Redaction Documentation, or send us an email at support@assemblyai.com! 


