PII Redaction
The PII Redaction model lets you minimize sensitive information about individuals by automatically identifying and removing it from your transcript.
Personally Identifiable Information (PII) is any information that can be used to identify a person, such as a name, email address, or phone number.
When you enable the PII Redaction model, your transcript will look like this:
- With hash substitution: Hi, my name is ####!
- With entity_name substitution: Hi, my name is [PERSON_NAME]!
You can also create redacted audio files to replace sensitive information with a beeping sound.
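To make the two substitution styles concrete, here is a minimal local sketch that reproduces the example strings above. It only illustrates the output format and is not how the PII Redaction model finds or replaces entities. The function name redact_span, the sample name "John", and the choice to mask one # per character are assumptions for illustration.

```python
def redact_span(text: str, start: int, end: int, entity: str, mode: str = "hash") -> str:
    """Replace text[start:end] using one of the two substitution styles."""
    span = text[start:end]
    if mode == "hash":
        # Assumption: one '#' per non-space character, so word lengths stay visible.
        replacement = "".join(" " if ch.isspace() else "#" for ch in span)
    elif mode == "entity_name":
        # Entity label in upper case, wrapped in square brackets.
        replacement = f"[{entity.upper()}]"
    else:
        raise ValueError(f"unknown substitution mode: {mode!r}")
    return text[:start] + replacement + text[end:]


# Hypothetical input sentence; in practice the model locates the span for you.
sentence = "Hi, my name is John!"
start = sentence.index("John")
end = start + len("John")

print(redact_span(sentence, start, end, "person_name", "hash"))
print(redact_span(sentence, start, end, "person_name", "entity_name"))
```

The two print calls produce the hash and entity_name variants of the same sentence.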
Quickstart
Enable PII Redaction by setting redact_pii to true in the transcription config.
Use redact_pii_policies to specify the information you want to redact. For the full list of policies, see PII policies.
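As a rough sketch of the quickstart, the snippet below builds the HTTP request with Python's standard library instead of an SDK. The endpoint, header names, and parameter names are taken from the curl example in the API reference section. The helper names build_redaction_request and submit are hypothetical, and the API key and audio URL are placeholders you would supply yourself.

```python
import json
import urllib.request

# Endpoint as shown in the curl example in the API reference section.
API_URL = "https://api.assemblyai.com/v2/transcript"


def build_redaction_request(api_key, audio_url, policies, substitution="hash"):
    """Build (but do not send) a transcription request with PII Redaction enabled."""
    payload = {
        "audio_url": audio_url,
        "redact_pii": True,
        "redact_pii_policies": list(policies),
        "redact_pii_sub": substitution,
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Authorization": api_key, "Content-Type": "application/json"},
        method="POST",
    )


def submit(request):
    """Send the request and return the decoded JSON response (performs a network call)."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Building the request separately from sending it makes it easy to inspect the payload before any audio is submitted, for example with json.loads(request.data).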
Example output
Smoke from hundreds of wildfires in Canada is triggering air quality alerts
throughout the US. Skylines from Maine to Maryland to Minnesota are gray and
smoggy. And in some places, the air quality warnings include the warning to stay
inside. We wanted to better understand what's happening here and why, so we
called ##### #######, an ######### ######### in the ########## ## #############
###### ### ########### at ##### ####### ##########. Good morning, #########.
Good morning. So what is it about the conditions right now that have caused this
round of wildfires to affect so many people so far away? Well, there's a couple
of things. The season has been pretty dry already, and then the fact that we're
getting hit in the US. Is because there's a couple of weather systems that ...
Create redacted audio files
In addition to redacting sensitive information from the transcription text, you can also generate a version of the original audio file with the PII "beeped" out.
Example output
https://s3.us-west-2.amazonaws.com/api.assembly.ai.usw2/redacted-audio/ac06721c-d1ea-41a7-95f7-a9463421e6b1.mp3?AWSAccessKeyId=...
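A minimal sketch of how the audio options might be added to an existing request body, assuming the redact_pii_audio and redact_pii_audio_quality parameters listed in the API reference. The helper name with_redacted_audio is hypothetical, and the local check on the quality value is a convenience of this sketch, not documented API behavior.

```python
# Values listed for redact_pii_audio_quality in the API reference table.
ALLOWED_AUDIO_QUALITIES = {"mp3", "wav"}


def with_redacted_audio(payload: dict, quality: str = "mp3") -> dict:
    """Return a copy of the request body that also asks for a redacted audio file."""
    if quality not in ALLOWED_AUDIO_QUALITIES:
        raise ValueError(
            f"redact_pii_audio_quality must be one of {sorted(ALLOWED_AUDIO_QUALITIES)}, "
            f"got {quality!r}"
        )
    # Copy instead of mutating so the caller's original payload is left untouched.
    return {**payload, "redact_pii_audio": True, "redact_pii_audio_quality": quality}


# Placeholder request body; YOUR_AUDIO_URL stands in for a real audio URL.
base = {"audio_url": "YOUR_AUDIO_URL", "redact_pii": True}
print(with_redacted_audio(base, "wav"))
```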
API reference
Request
curl https://api.assemblyai.com/v2/transcript \
--header "Authorization: <YOUR_API_KEY>" \
--header "Content-Type: application/json" \
--data '{
"audio_url": "YOUR_AUDIO_URL",
"redact_pii": true,
"redact_pii_policies": ["us_social_security_number", "credit_card_number"],
"redact_pii_sub": "hash",
"redact_pii_audio": true,
"redact_pii_audio_quality": "mp3"
}'
Key | Type | Description |
---|---|---|
redact_pii | boolean | Enable PII Redaction. |
redact_pii_policies | array | PII policies for what information to redact. |
redact_pii_sub | string | Method used to substitute PII in the transcript. Can be entity_name or hash. |
redact_pii_audio | boolean | Create a redacted version of the audio file. |
redact_pii_audio_quality | string | Quality of the redacted PII audio file. Can be mp3 or wav. |
Response
Key | Type | Description |
---|---|---|
text | string | Transcript with redacted PII. |
The response also includes the request parameters used to generate the transcript.
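Because the response includes the request parameters, one possible sanity check is to confirm that the redaction settings you sent are the ones reported back before using the text. The sketch below assumes the parameters appear in the response under the same keys as in the request; the function name and the sample response dictionary are hypothetical.

```python
def redacted_text(request_payload: dict, response: dict) -> str:
    """Return the redacted transcript after checking the redaction settings in the response."""
    # Assumption: redaction parameters are returned under the same keys as in the request.
    mismatched = {
        key: (value, response.get(key))
        for key, value in request_payload.items()
        if key.startswith("redact_pii") and response.get(key) != value
    }
    if mismatched:
        raise ValueError(f"response does not match the request settings: {mismatched}")
    return response["text"]


# Hypothetical request and response bodies, shortened for illustration.
sent = {"audio_url": "YOUR_AUDIO_URL", "redact_pii": True, "redact_pii_sub": "hash"}
received = {"text": "Good morning, #########.", "redact_pii": True, "redact_pii_sub": "hash"}
print(redacted_text(sent, received))
```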
PII policies
Policy name | Description | Example |
---|---|---|
medical_process | Medical process, including treatments, procedures, and tests. | heart surgery, CT scan |
medical_condition | Name of a medical condition, disease, syndrome, deficit, or disorder. | chronic fatigue syndrome, arrhythmia, depression |
blood_type | Blood type. | O-, AB positive |
drug | Medications, vitamins, or supplements. | Advil, Acetaminophen, Panadol |
injury | Bodily injury. | I broke my arm, I have a sprained wrist |
number_sequence | A "lazy" rule that redacts any sequence of two or more numbers. | |
email_address | Email address. | support@assemblyai.com |
date_of_birth | Date of birth. | Date of Birth: March 7, 1961 |
phone_number | Telephone or fax number. | |
us_social_security_number | Social Security Number or equivalent. | |
credit_card_number | Credit card number. | |
credit_card_expiration | Expiration date of a credit card. | |
credit_card_cvv | Credit card verification code. | CVV: 080 |
date | Specific calendar date. | December 18 |
nationality | Terms indicating nationality, ethnicity, or race. | American, Asian, Caucasian |
event | Name of an event or holiday. | Olympics, Yom Kippur |
language | Name of a natural language. | Spanish, French |
location | Any location reference, including mailing address, postal code, city, state, province, or country. | |
money_amount | Name and/or amount of currency. | 15 pesos, $94.50 |
person_name | Name of a person. | Bob, Doug Jones |
person_age | Number associated with an age. | 27, 75 |
organization | Name of an organization. | CNN, McDonalds, University of Alaska |
political_affiliation | Terms referring to a political party, movement, or ideology. | Republican, Liberal |
occupation | Job title or profession. | professor, actors, engineer, CPA |
religion | Terms indicating religious affiliation. | Hindu, Catholic |
drivers_license | Driver’s license number. | DL# 356933-540 |
banking_information | Banking information, including account and routing numbers. | |
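Since policy names are plain strings, a typo is easy to miss. One way to catch it early is to compare the requested names against the table above before sending a request. This is a client-side sketch only: the constant and function names are hypothetical, and the set mirrors this page, so it may lag behind the API if policies are added.

```python
# Policy names copied from the PII policies table on this page.
KNOWN_POLICIES = frozenset({
    "medical_process", "medical_condition", "blood_type", "drug", "injury",
    "number_sequence", "email_address", "date_of_birth", "phone_number",
    "us_social_security_number", "credit_card_number", "credit_card_expiration",
    "credit_card_cvv", "date", "nationality", "event", "language", "location",
    "money_amount", "person_name", "person_age", "organization",
    "political_affiliation", "occupation", "religion", "drivers_license",
    "banking_information",
})


def unknown_policies(policies):
    """Return the requested policy names that are not in the table, sorted."""
    return sorted(set(policies) - KNOWN_POLICIES)


# "ssn" is not a listed policy; the documented name is us_social_security_number.
print(unknown_policies(["person_name", "ssn", "credit_card_number"]))
```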
Troubleshooting
Make sure the redact_pii_policies parameter is included in your request with the desired policy names. If you're still experiencing issues, please reach out to our support team for assistance.

Make sure the webhook_url parameter is included with a valid URL that can be reached by AssemblyAI's API. If you're using custom authentication headers, ensure that the webhook_auth_header_name and webhook_auth_header_value parameters are included and are correct. If you're still having issues, please contact our support team for assistance.
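The checks described in this section can be partly automated before a request is sent. The sketch below inspects a request body for the issues mentioned here. It is a local pre-flight check with a hypothetical function name, and it cannot verify that the webhook URL is actually reachable by AssemblyAI's API, only that it is well formed.

```python
from typing import List
from urllib.parse import urlparse


def find_request_problems(payload: dict) -> List[str]:
    """Return human-readable problems found in a transcription request body."""
    problems = []

    # Redaction is enabled, but no policies say what to redact.
    if payload.get("redact_pii") and not payload.get("redact_pii_policies"):
        problems.append("redact_pii is enabled but redact_pii_policies is missing or empty")

    # The webhook URL, if present, should at least be a well-formed http(s) URL.
    webhook_url = payload.get("webhook_url")
    if webhook_url is not None:
        parsed = urlparse(webhook_url)
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            problems.append("webhook_url is not a valid http(s) URL")

    # Custom authentication needs both the header name and the header value.
    has_name = bool(payload.get("webhook_auth_header_name"))
    has_value = bool(payload.get("webhook_auth_header_value"))
    if has_name != has_value:
        problems.append(
            "webhook_auth_header_name and webhook_auth_header_value must be set together"
        )

    return problems


# Placeholder request body with two deliberate mistakes.
print(find_request_problems({"redact_pii": True, "webhook_url": "example.com/hook"}))
```

An empty list means none of these particular problems were found; it does not guarantee the request will succeed.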