PII Redaction | AssemblyAI

Overview

Streaming PII Redaction automatically detects and removes personally identifiable information (PII) from your streaming transcripts in real time. When enabled, the API redacts PII in final turns before sending them to the client. Partial (non-final) turns are not redacted.

PII redaction supports all streaming models: u3-rt-pro, universal-streaming-english, and universal-streaming-multilingual.

Final turns only

PII redaction only applies to final turns. When redact_pii is true, include_partial_turns defaults to false automatically so no unredacted text reaches the client. Only set include_partial_turns to true if you explicitly want partial (non-final) turns, which will contain unredacted PII alongside the redacted final turns.

When you enable PII redaction, your final turns will look like this:

With hash substitution: Hi, my name is ####!
With entity_name substitution: Hi, my name is [PERSON_NAME]!
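
Because only final turns are redacted, a client that wants to be defensive can ignore everything except final `Turn` messages. The sketch below is one way to do that, assuming the message fields used in the quickstart on this page (`type`, `end_of_turn`, `transcript`). The helper name `final_transcript` is hypothetical and not part of any SDK.

```python
import json
from typing import Optional


def final_transcript(raw_message: str) -> Optional[str]:
    """Return the transcript of a final turn, or None for any other message.

    Hypothetical helper: it only inspects the fields shown in the quickstart.
    """
    data = json.loads(raw_message)
    if data.get("type") != "Turn":
        return None  # Begin, Termination, and other message types
    if not data.get("end_of_turn", False):
        return None  # partial turn: may contain unredacted PII
    return data.get("transcript", "")


if __name__ == "__main__":
    final = json.dumps({"type": "Turn", "end_of_turn": True,
                        "transcript": "Hi, my name is [PERSON_NAME]!"})
    partial = json.dumps({"type": "Turn", "end_of_turn": False,
                          "transcript": "Hi, my name is"})
    print(final_transcript(final))
    print(final_transcript(partial))
```

Dropping partial turns on the client is a second line of defense. The server-side default (`include_partial_turns` set to false when `redact_pii` is true) already keeps them off the wire.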

Pre-recorded PII redaction

For PII redaction on pre-recorded audio, including redacted audio file generation, see PII Redaction.

Connection parameters

| Parameter | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| `redact_pii` | boolean | Yes | `false` | Enable PII text redaction. Only applies to final turns. |
| `redact_pii_policies` | string \| array | No | All | PII entity types to redact. Pass a comma-separated string over the raw WebSocket (e.g. `person_name,phone_number`) or an array via the SDK. If omitted and `redact_pii` is `true`, all detected PII is redacted. See PII policies for the full list. |
| `redact_pii_sub` | string | No | `hash` | Replacement scheme. `hash` replaces PII with `#` characters, `entity_name` replaces with `[ENTITY_TYPE]`. |
| `include_partial_turns` | boolean | No | `false` when `redact_pii` is `true`, otherwise `true` | Whether to include partial (non-final) turns. Defaults to `false` automatically when PII redaction is enabled, so no unredacted text reaches the client. Set to `true` only if you explicitly want to receive partial turns, which will contain unredacted PII. |
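
As a rough illustration of how these parameters combine on the raw WebSocket, the sketch below builds a connection URL from a Python list of policies. The endpoint and parameter names come from this page. The function name `build_streaming_url` and its argument defaults are assumptions made for the example, not part of any SDK.

```python
from urllib.parse import urlencode

# Endpoint as used in the quickstart on this page.
BASE_URL = "wss://streaming.assemblyai.com/v3/ws"


def build_streaming_url(policies=None, substitution="hash",
                        include_partial_turns=None,
                        sample_rate=16000, speech_model="u3-rt-pro"):
    """Build a streaming URL with PII redaction enabled (hypothetical helper)."""
    if substitution not in ("hash", "entity_name"):
        raise ValueError("redact_pii_sub must be 'hash' or 'entity_name'")
    params = {
        "sample_rate": sample_rate,
        "speech_model": speech_model,
        "redact_pii": "true",
        "redact_pii_sub": substitution,
    }
    if policies:
        # The raw WebSocket expects a comma-separated string, not an array.
        params["redact_pii_policies"] = ",".join(policies)
    if include_partial_turns is not None:
        # Only send an explicit override; otherwise the server default applies.
        params["include_partial_turns"] = "true" if include_partial_turns else "false"
    return f"{BASE_URL}?{urlencode(params)}"


if __name__ == "__main__":
    print(build_streaming_url(["person_name", "phone_number"], "entity_name"))
```

Leaving `include_partial_turns` out of the query string unless the caller sets it keeps the documented default in effect, so partial turns stay disabled while redaction is on.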

Quickstart

Get started with streaming PII redaction using the code below. This example streams audio from your microphone and prints each turn with PII redacted.

Python

Install the required libraries

$ pip install websocket-client pyaudio

Create a new file main.py and paste the code below. Replace <YOUR_API_KEY> with your API key.

Run with python main.py and speak into your microphone.

import pyaudio
import websocket
import json
import threading
import time
from urllib.parse import urlencode

YOUR_API_KEY = "<YOUR_API_KEY>"
CONNECTION_PARAMS = {
    "sample_rate": 16000,
    "speech_model": "u3-rt-pro",
    "format_turns": "true",
    "redact_pii": "true",
    "redact_pii_policies": "person_name,phone_number,email_address",
    "redact_pii_sub": "entity_name",
}
API_ENDPOINT_BASE_URL = "wss://streaming.assemblyai.com/v3/ws"
API_ENDPOINT = f"{API_ENDPOINT_BASE_URL}?{urlencode(CONNECTION_PARAMS)}"

# Microphone capture settings: 800 frames at 16 kHz is 50 ms of audio per chunk.
FRAMES_PER_BUFFER = 800
SAMPLE_RATE = CONNECTION_PARAMS["sample_rate"]
CHANNELS = 1
FORMAT = pyaudio.paInt16

audio = None
stream = None
ws_app = None
audio_thread = None
stop_event = threading.Event()

def on_open(ws):
    print("WebSocket connection opened.")

    # Read microphone chunks and send them as binary frames until stopped.
    def stream_audio():
        global stream
        while not stop_event.is_set():
            try:
                audio_data = stream.read(FRAMES_PER_BUFFER, exception_on_overflow=False)
                ws.send(audio_data, websocket.ABNF.OPCODE_BINARY)
            except Exception as e:
                print(f"Error streaming audio: {e}")
                break

    global audio_thread
    audio_thread = threading.Thread(target=stream_audio)
    audio_thread.daemon = True
    audio_thread.start()

def on_message(ws, message):
    try:
        data = json.loads(message)
        msg_type = data.get("type")
        if msg_type == "Begin":
            print(f"Session began: ID={data.get('id')}")
        elif msg_type == "Turn":
            transcript = data.get("transcript", "")
            end_of_turn = data.get("end_of_turn", False)
            # Print final turns only; these are the turns with PII redacted.
            if end_of_turn:
                print(f"\r{' ' * 80}\r{transcript}")
        elif msg_type == "Termination":
            print(f"\nSession terminated: {data.get('audio_duration_seconds', 0)}s of audio")
    except Exception as e:
        print(f"Error handling message: {e}")

def on_error(ws, error):
    print(f"\nWebSocket Error: {error}")
    stop_event.set()

def on_close(ws, close_status_code, close_msg):
    print(f"\nWebSocket Disconnected: Status={close_status_code}")
    global stream, audio
    stop_event.set()
    # Release the microphone and PyAudio resources.
    if stream:
        if stream.is_active():
            stream.stop_stream()
        stream.close()
    if audio:
        audio.terminate()

def run():
    global audio, stream, ws_app
    audio = pyaudio.PyAudio()
    stream = audio.open(
        input=True,
        frames_per_buffer=FRAMES_PER_BUFFER,
        channels=CHANNELS,
        format=FORMAT,
        rate=SAMPLE_RATE,
    )
    print("Speak into your microphone. Press Ctrl+C to stop.")
    ws_app = websocket.WebSocketApp(
        API_ENDPOINT,
        header={"Authorization": YOUR_API_KEY},
        on_open=on_open,
        on_message=on_message,
        on_error=on_error,
        on_close=on_close,
    )
    # Run the WebSocket in a background thread so Ctrl+C is handled here.
    ws_thread = threading.Thread(target=ws_app.run_forever)
    ws_thread.daemon = True
    ws_thread.start()
    try:
        while ws_thread.is_alive():
            time.sleep(0.1)
    except KeyboardInterrupt:
        print("\nStopping...")
        stop_event.set()
        # Ask the server to end the session, then allow time for the final messages.
        if ws_app and ws_app.sock and ws_app.sock.connected:
            ws_app.send(json.dumps({"type": "Terminate"}))
            time.sleep(2)
        if ws_app:
            ws_app.close()
        ws_thread.join(timeout=2.0)

if __name__ == "__main__":
    run()

Example output

With entity_name substitution:

Hi, my name is [PERSON_NAME] and you can reach me at [PHONE_NUMBER] or [EMAIL_ADDRESS].

With hash substitution:

Hi, my name is #### and you can reach me at ###-###-#### or ####@#####.###.
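
With `entity_name` substitution, the placeholders are easy to pick out in downstream code, for example to log how many entities of each type were redacted in a turn. The sketch below assumes placeholders always appear as an upper-case entity type in square brackets, as in the example output above. The helper name `count_redactions` is hypothetical.

```python
import re
from collections import Counter

# Assumed placeholder shape, based on the example output: [PERSON_NAME], [PHONE_NUMBER], ...
PLACEHOLDER = re.compile(r"\[([A-Z][A-Z0-9_]*)\]")


def count_redactions(transcript: str) -> Counter:
    """Count entity_name placeholders per entity type in a redacted transcript."""
    return Counter(PLACEHOLDER.findall(transcript))


if __name__ == "__main__":
    text = ("Hi, my name is [PERSON_NAME] and you can reach me at "
            "[PHONE_NUMBER] or [EMAIL_ADDRESS].")
    for entity, count in count_redactions(text).items():
        print(entity, count)
```

This kind of per-type accounting is not possible with `hash` substitution, since the `#` characters do not carry the entity type.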

Supported PII policies

Streaming PII redaction supports the same policies as pre-recorded PII redaction, including person_name, phone_number, email_address, credit_card_number, us_social_security_number, date_of_birth, and more.

For the full list of available policies, see PII policies.

Troubleshooting

Why am I still seeing PII in the transcript?

PII redaction only applies to final turns. If you’re seeing PII, you likely set include_partial_turns to true, which returns unredacted partial turns alongside redacted finals. Remove that override (or set it to false) to receive only redacted final turns, which is the default when redact_pii is enabled.
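
One way to catch this misconfiguration early is to check the connection parameters before opening the WebSocket. The sketch below encodes the defaults described on this page. The function name `unredacted_text_possible` is hypothetical, and it only reasons about these two parameters, not about which policies are selected.

```python
def unredacted_text_possible(params: dict) -> bool:
    """Return True if this configuration can deliver unredacted turns.

    Hypothetical helper. Accepts either booleans or the "true"/"false"
    strings used in raw WebSocket query parameters.
    """
    def as_bool(value) -> bool:
        return str(value).strip().lower() == "true"

    if not as_bool(params.get("redact_pii", "false")):
        return True  # redaction is off, so every turn is unredacted
    # With redact_pii enabled, include_partial_turns defaults to false.
    return as_bool(params.get("include_partial_turns", "false"))


if __name__ == "__main__":
    risky = {"redact_pii": "true", "include_partial_turns": "true"}
    if unredacted_text_possible(risky):
        print("Warning: partial turns are enabled and will contain unredacted PII.")
```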

Can I redact PII from the audio itself?

Audio redaction is not available for streaming. To generate a redacted audio file, use pre-recorded PII redaction with the redact_pii_audio parameter.