Redacting Sensitive Medical Information from Transcriptions

Many of our customers have been making use of our advanced PII Detection & Redaction Feature, which can detect over 20 types of PII like credit card numbers, names, and dates of birth. Over the past few months, we've heard from many customers that operate products in industries where they need to ensure that private medical information remains confidential as well.

That's why today we are excited to add 5 new medical redaction policies to our PII Detection & Redaction Feature. These new policies are:

  • "medical_process" - Medical process, including treatments, procedures and tests. E.g., "heart surgery", "CT scan".
  • "medical_condition" - A medical condition. Includes diseases, syndromes, deficits, disorders. E.g., chronic fatigue syndrome, arrhythmia, depression.
  • "blood_type" - A person's blood type.
  • "drug" - Medical drug, including vitamins and minerals. E.g., Advil, Acetaminophen, Panadol.
  • "injury" - Human injury, e.g., I broke my arm, I have a sprained wrist. Includes mutations, miscarriages and dislocations.
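
Since the policy names are plain strings, a misspelled one is easy to miss. As a minimal sketch (the helper `build_redaction_options` is hypothetical and not part of the API), you could check names against the five policies listed above before sending a request. It returns the same `redact_pii`, `redact_pii_sub`, and `redact_pii_policies` fields used in the request later in this post, and it deliberately knows only about the medical policies, so extend the set if you also use other PII policies:

```python
# The five medical policy names announced in this post.
MEDICAL_POLICIES = frozenset({
    'medical_process',
    'medical_condition',
    'blood_type',
    'drug',
    'injury',
})


def build_redaction_options(policies):
    """Return the redaction fields of a transcript request (hypothetical helper).

    Raises ValueError if a name is not one of the five medical policies.
    """
    unknown = sorted(set(policies) - MEDICAL_POLICIES)
    if unknown:
        raise ValueError(f'unknown medical redaction policies: {unknown}')
    return {
        'redact_pii': True,
        'redact_pii_sub': 'entity_name',
        # Drop duplicates while keeping the caller's order.
        'redact_pii_policies': list(dict.fromkeys(policies)),
    }
```

The returned dictionary can be merged into the request body alongside `audio_url`.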

For example, here is the redacted transcription of a sample audio file:

"Hi. I'd like to schedule an appointment for an [MEDICAL_PROCESS]. The
doctor wanted to take a look at the [INJURY] while working had fallen off
of a roof that I was actually working on. He mentioned to go ahead and
give you my blood type, which is [BLOOD_TYPE] for coming in and also
request for a stronger prescription of [DRUG]. Currently I've just been
taking some [DRUG] and [DRUG] [DRUG] and whenever laying around. So when
you get a second, please call me back. We'll love to get that set up.
Thank you. Bye."
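
If you want a quick summary of what was redacted, the bracketed placeholders can be tallied with a regular expression. This is only a sketch: it assumes placeholders always look like the ones in the transcript above (an upper-case policy name in square brackets), and the function name `count_redactions` is our own:

```python
import re
from collections import Counter

# Matches placeholders such as [DRUG] or [BLOOD_TYPE] (assumed format).
PLACEHOLDER = re.compile(r'\[([A-Z_]+)\]')


def count_redactions(text):
    """Count how many times each redaction placeholder appears in text."""
    return Counter(PLACEHOLDER.findall(text))


# An excerpt of the redacted transcript shown above.
sample = (
    "He mentioned to go ahead and give you my blood type, which is "
    "[BLOOD_TYPE] for coming in and also request for a stronger "
    "prescription of [DRUG]. Currently I've just been taking some "
    "[DRUG] and [DRUG] [DRUG]"
)

print(count_redactions(sample))
```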

Using these new medical redaction policies is very simple! Here's how you'd transcribe an audio file with these new redaction policies enabled in Python:

import requests

endpoint = 'https://api.assemblyai.com/v2/transcript'

headers = {
    'authorization': 'YOUR-API-TOKEN',
    'content-type': 'application/json',
}

json = {
    # The path to our audio file in an S3 or GCP bucket, for example.
    'audio_url': 'https://s3-us-west-2.amazonaws.com/blog.assemblyai.com/audio/8-7-2018-post/7510.mp3',

    # This will tell the API we want a redacted transcript.
    'redact_pii': True,

    # Setting this parameter tells the API to replace the redacted text with the category
    # of what is being redacted. This will make the transcript more readable! 
    'redact_pii_sub': 'entity_name',

    # Here we specify what we want to redact; in this case we are specifying the
    # new medical policies.
    'redact_pii_policies': ['medical_process', 'medical_condition', 'blood_type', 'drug', 'injury'],
}

response = requests.post(endpoint, json=json, headers=headers)

print(response.json())
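
Transcription runs asynchronously, so the response printed above describes a job that is usually still in progress rather than the finished transcript. Here is a minimal polling sketch. It assumes the job reports a `status` field that ends up as either `completed` or `error`, and that an `error` field carries the failure message; the helper `wait_for_transcript` is our own, and it takes a `fetch` callable so the waiting logic stays separate from the HTTP call:

```python
import time


def wait_for_transcript(fetch, poll_seconds=3.0, max_polls=200):
    """Call fetch() until the job reports 'completed' or 'error'.

    fetch is any zero-argument callable returning the transcript JSON as a
    dict. The status values checked here are assumptions about the API.
    """
    for _ in range(max_polls):
        result = fetch()
        status = result.get('status')
        if status == 'completed':
            return result
        if status == 'error':
            raise RuntimeError(result.get('error', 'transcription failed'))
        time.sleep(poll_seconds)
    raise TimeoutError('transcript did not finish in time')


# Example wiring, reusing endpoint, headers and response from the request
# above. It assumes the job's id is returned under 'id', that it can be
# fetched with a GET on <endpoint>/<id>, and that the transcript is in 'text':
#
# transcript_id = response.json()['id']
# result = wait_for_transcript(
#     lambda: requests.get(f'{endpoint}/{transcript_id}', headers=headers).json()
# )
# print(result['text'])
```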

For more information, check out the PII Detection & Redaction Documentation, or send us an email at support@assemblyai.com! 