PII Redaction
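Supported languages
en, en_au, en_uk, en_us, es, fr, de, it, pt, nl, hi, ja, zh, fi, ko, pl, ru, tr, uk, vi, af, ar, be, bg, my, ca, hr, cs, da, et, ka, el, he, hu, is, id, km, lv, lt, lb, ms, no, fa, ro, sk, sl, sw, sv, tl, ta
Supported models
slam-1, universal
Supported regions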
The PII Redaction model lets you minimize sensitive information about individuals by automatically identifying and removing it from your transcript.
Personally Identifiable Information (PII) is any information that can be used to identify a person, such as a name, email address, or phone number.
When you enable the PII Redaction model, your transcript will look like this:
- With hash substitution: Hi, my name is ####!
- With entity_name substitution: Hi, my name is [PERSON_NAME]!
You can also create redacted audio files that replace sensitive information with a beeping sound.
Redacted Properties
PII Redaction only redacts words in the text property. Properties from other features may still include PII, such as entities from Entity Detection or summary from Summarization.
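Because redaction covers only the text property, it can help to check a request for enabled features whose output is not redacted. The sketch below is a hypothetical local helper, not part of any AssemblyAI SDK. It assumes the REST request fields entity_detection and summarization and covers only the two features named in this note, so treat the mapping as illustrative rather than exhaustive.

```python
# Hypothetical helper (not part of any AssemblyAI SDK). The mapping lists only
# the two features named in the "Redacted Properties" note; it is not exhaustive.
UNREDACTED_PROPERTIES = {
    "entity_detection": "entities",
    "summarization": "summary",
}


def properties_that_may_contain_pii(request_body: dict) -> list[str]:
    """Return response properties that PII Redaction does not scrub for this request."""
    if not request_body.get("redact_pii"):
        # Redaction is off, so there is no redaction gap to report.
        return []
    return [
        prop
        for feature, prop in UNREDACTED_PROPERTIES.items()
        if request_body.get(feature)
    ]


body = {"redact_pii": True, "entity_detection": True, "summarization": False}
print(properties_that_may_contain_pii(body))  # ['entities']
```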
Quickstart
Enable PII Redaction on the TranscriptionConfig using the set_redact_pii() method.
Set policies to specify the information you want to redact. For the full list of policies, see PII policies.
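If you call the REST API directly instead of using the SDK, the same configuration goes into the transcript request body. The following minimal sketch builds that request with the Python standard library. It assumes the request fields redact_pii, redact_pii_policies and redact_pii_sub, and the policy names person_name, email_address and phone_number. build_redaction_request is a hypothetical helper name and the audio URL is a placeholder. The sketch only constructs the request; it does not send it.

```python
import json
import urllib.request

TRANSCRIPT_ENDPOINT = "https://api.assemblyai.com/v2/transcript"


def build_redaction_request(api_key: str, audio_url: str, policies: list[str],
                            substitution: str = "hash") -> urllib.request.Request:
    """Build (but do not send) a POST request that enables PII Redaction."""
    if substitution not in ("hash", "entity_name"):
        raise ValueError("substitution must be 'hash' or 'entity_name'")
    body = {
        "audio_url": audio_url,
        "redact_pii": True,
        "redact_pii_policies": policies,  # which kinds of PII to redact
        "redact_pii_sub": substitution,   # "####" style vs "[PERSON_NAME]" style
    }
    return urllib.request.Request(
        TRANSCRIPT_ENDPOINT,
        data=json.dumps(body).encode("utf-8"),
        headers={"authorization": api_key, "content-type": "application/json"},
        method="POST",
    )


req = build_redaction_request(
    "<YOUR_API_KEY>",
    "https://example.com/call.mp3",  # placeholder audio URL
    ["person_name", "email_address", "phone_number"],
)
# Sending it requires a valid key and network access: urllib.request.urlopen(req)
```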
PII Redaction Using LeMUR
To use LeMUR for custom PII redaction, see the guide Redact PII from Text Using LeMUR.
Create redacted audio files
In addition to redacting sensitive information from the transcription text, you can also generate a version of the original audio file with the PII “beeped” out.
To create a redacted version of the audio file, use the set_redact_pii() method on the TranscriptionConfig with redact_audio set to True.
Use get_redacted_audio_url() on the transcript to get the URL to the redacted audio file.
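When calling the REST API directly, the SDK's redact_audio option is assumed to correspond to the redact_pii_audio request field, the field named in the Request for Redacted Audio section. This hypothetical helper adds that field to an existing request body. It also assumes that audio redaction is only meaningful when redact_pii is enabled, because the beeps cover the PII that text redaction finds.

```python
def enable_redacted_audio(body: dict) -> dict:
    """Return a copy of a transcript request body that also asks for redacted audio."""
    if not body.get("redact_pii"):
        # Assumption: audio redaction depends on text redaction being enabled.
        raise ValueError("enable redact_pii before requesting redacted audio")
    # Copy instead of mutating, so the caller's dict is left unchanged.
    return {**body, "redact_pii_audio": True}


print(enable_redacted_audio({"redact_pii": True, "redact_pii_policies": ["person_name"]}))
```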
Maximum Audio File Size
You can only create redacted versions of audio files if the original file is smaller than 1 GB.
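To avoid submitting a job that cannot produce redacted audio, you can check a local file's size before uploading it. This is a hypothetical pre-check. The documentation does not say whether 1 GB means 10**9 or 2**30 bytes, so the sketch uses the stricter 10**9. A file that passes is therefore under the limit under either reading.

```python
import os

# Assumption: "1 GB" is interpreted conservatively as 10**9 bytes.
MAX_REDACTED_AUDIO_BYTES = 10 ** 9


def is_under_redacted_audio_limit(size_bytes: int) -> bool:
    """True if a file of this size is strictly smaller than the limit."""
    return size_bytes < MAX_REDACTED_AUDIO_BYTES


def can_request_redacted_audio(path: str) -> bool:
    """Check a local audio file before asking for a redacted version of it."""
    return is_under_redacted_audio_limit(os.path.getsize(path))
```

This only works for local files. For audio already hosted at a URL, you would need the size from another source, such as the server's Content-Length header.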
Redacted Audio for Silent Files
By default, audio redaction returns a redacted audio URL only when speech is detected. If your use case requires redacted audio files even for silent files with no dialogue, you can opt in to receive these URLs: in your POST request body, set the optional parameter "return_redacted_no_speech_audio": true inside redact_pii_audio_options.
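A request body using this option might look like the following minimal sketch. In addition to redact_pii_audio_options and return_redacted_no_speech_audio, it assumes that audio_url, redact_pii, redact_pii_policies and redact_pii_audio are the other fields involved. silent_audio_body is a hypothetical helper and the URL is a placeholder.

```python
import json


def silent_audio_body(audio_url: str, policies: list[str]) -> dict:
    """Request body that asks for a redacted audio URL even if no speech is detected."""
    return {
        "audio_url": audio_url,
        "redact_pii": True,
        "redact_pii_policies": policies,
        "redact_pii_audio": True,
        # Optional: without this, silent files get no redacted audio URL.
        "redact_pii_audio_options": {"return_redacted_no_speech_audio": True},
    }


print(json.dumps(silent_audio_body("https://example.com/silence.wav", ["person_name"]), indent=2))
```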
API reference
Request
Response
The response also includes the request parameters used to generate the transcript.
Request for Redacted Audio
In the request URL, replace transcript_id with the ID of the transcript where redact_pii_audio is set to true.
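The sketch below builds the redacted-audio request with the Python standard library and reads the result. It makes several assumptions: the path /v2/transcript/{transcript_id}/redacted-audio under https://api.assemblyai.com, and a JSON response with status and redacted_audio_url fields, where status is redacted_audio_ready once the file is available. Verify these against the Response for Redacted Audio schema. Both function names are hypothetical.

```python
import json
import urllib.parse
import urllib.request
from typing import Optional


def redacted_audio_request(api_key: str, transcript_id: str) -> urllib.request.Request:
    """Build (but do not send) the GET request for a transcript's redacted audio."""
    # Quote the ID so it is safe to place in the URL path.
    tid = urllib.parse.quote(transcript_id, safe="")
    url = f"https://api.assemblyai.com/v2/transcript/{tid}/redacted-audio"
    return urllib.request.Request(url, headers={"authorization": api_key})


def redacted_audio_url(response_body: str) -> Optional[str]:
    """Return the redacted audio URL, or None if the file is not ready yet."""
    data = json.loads(response_body)
    if data.get("status") == "redacted_audio_ready":
        return data.get("redacted_audio_url")
    return None
```

In practice, you would send the request with urllib.request.urlopen, pass the decoded body to redacted_audio_url, and retry later if it returns None.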
Response for Redacted Audio
PII policies
Troubleshooting
Why is the PII not redacted in my transcription?
Make sure your request specifies at least one PII policy in the redact_pii_policies parameter. If you’re still experiencing issues, please contact our support team for assistance.
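A quick local check before submitting can catch this case. check_redaction_config is a hypothetical helper that works on a REST request body using the redact_pii and redact_pii_policies fields.

```python
def check_redaction_config(body: dict) -> None:
    """Raise ValueError if redaction is enabled but no policy is listed."""
    if body.get("redact_pii") and not body.get("redact_pii_policies"):
        raise ValueError(
            "redact_pii is enabled but redact_pii_policies is empty; "
            "specify at least one policy, for example ['person_name']"
        )


check_redaction_config({"redact_pii": True, "redact_pii_policies": ["person_name"]})  # passes
```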
Why is my webhook not being sent?
There are several possible reasons your webhook isn’t being sent, such as a misconfigured URL, an unreachable endpoint, or a problem with the authentication headers. Double-check your request and make sure the webhook_url parameter is included and contains a valid URL that AssemblyAI’s API can reach. If you’re using custom authentication headers, make sure the webhook_auth_header_name and webhook_auth_header_value parameters are both included and correct. If you’re still having issues, please contact our support team for assistance.
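You can catch some of these problems before sending the request. This hypothetical helper adds the webhook fields to a request body. It rejects a URL that is not an absolute http(s) URL, and it rejects a header name without a value (or a value without a name). It cannot tell whether AssemblyAI’s servers can reach the endpoint. For example, a localhost URL passes this check but will never receive webhooks.

```python
from typing import Optional
from urllib.parse import urlparse


def add_webhook(body: dict, webhook_url: str,
                header_name: Optional[str] = None,
                header_value: Optional[str] = None) -> dict:
    """Return a copy of body with webhook fields, after basic local checks."""
    parsed = urlparse(webhook_url)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"webhook_url is not an absolute http(s) URL: {webhook_url!r}")
    # The auth header name and value only make sense together.
    if (header_name is None) != (header_value is None):
        raise ValueError("set both webhook_auth_header_name and webhook_auth_header_value, or neither")
    updated = {**body, "webhook_url": webhook_url}
    if header_name is not None:
        updated["webhook_auth_header_name"] = header_name
        updated["webhook_auth_header_value"] = header_value
    return updated
```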
Why does my redacted audio file sound worse than the original?
By default, the API returns redacted audio files in MP3 format, which is lossy. Lossy formats discard audio information to reduce file size, which can lower quality. The difference may be especially noticeable if the submitted audio is in a lossless format. To retain as much quality as possible, return your redacted audio files in a lossless format instead by setting redact_pii_audio_quality to wav.
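In a REST request body, this is one extra field. The sketch assumes mp3 and wav are the only accepted values, with mp3 as the default described above. It also sets redact_pii_audio, since the quality setting applies only to redacted audio. set_redacted_audio_quality is a hypothetical name.

```python
def set_redacted_audio_quality(body: dict, quality: str = "wav") -> dict:
    """Return a copy of body that requests redacted audio in the given format."""
    if quality not in ("mp3", "wav"):
        raise ValueError("redact_pii_audio_quality must be 'mp3' or 'wav'")
    return {**body, "redact_pii_audio": True, "redact_pii_audio_quality": quality}


print(set_redacted_audio_quality({"redact_pii": True, "redact_pii_policies": ["person_name"]}))
```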