Does AssemblyAI offer Zero Data Retention?

We do not have a formal Zero Data Retention (ZDR) policy. If you are interested in achieving ZDR, please contact our Sales team.

AssemblyAI maintains customer data according to applicable law, customer contract, and data minimization principles. 

Files are maintained in our Production Environment according to the protocol below:

Audio & transcripts in production environment

Audio File Location | Transcript Deletion Request Submitted? | BAA Signed with TTL Enabled? | Audio Data Retention | Transcript Text Data Retention
--- | --- | --- | --- | ---
Presigned URL | Yes | Yes or No | Deleted as soon as the deletion request occurs | Deleted as soon as the deletion request occurs
Presigned URL | No | Yes | 3 days (due to the BAA 3-day TTL schedule) | 3 days (due to the BAA 3-day TTL schedule)
Presigned URL | No | No | 3 days (due to the intermediate transcription artifacts deletion schedule) | Indefinite
Upload via /upload endpoint | Yes | Yes or No | 2 days (due to the upload endpoint deletion schedule) | Deleted as soon as the deletion request occurs
Upload via /upload endpoint | No | Yes | 3 days (due to the BAA 3-day TTL schedule) | 3 days (due to the BAA 3-day TTL schedule)
Upload via /upload endpoint | No | No | 3 days (due to the intermediate transcription artifacts deletion schedule) | Indefinite
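One way to apply the table in your own tooling is as a lookup keyed on its three conditions. The Python sketch below is illustrative only: the function name, the location labels (`presigned_url`, `upload_endpoint`), and the shortened retention strings are hypothetical, and the authoritative values are the ones in the table.

```python
from typing import NamedTuple, Optional


class Retention(NamedTuple):
    audio: str
    transcript_text: str


# Hypothetical encoding of the table: key is
# (audio_location, deletion_requested, baa_ttl_enabled).
# baa_ttl_enabled is None where the table says "Yes or No".
_TABLE = {
    ("presigned_url", True, None): Retention("deleted on request", "deleted on request"),
    ("presigned_url", False, True): Retention("3 days (BAA TTL)", "3 days (BAA TTL)"),
    ("presigned_url", False, False): Retention("3 days (artifact schedule)", "indefinite"),
    ("upload_endpoint", True, None): Retention("2 days (upload schedule)", "deleted on request"),
    ("upload_endpoint", False, True): Retention("3 days (BAA TTL)", "3 days (BAA TTL)"),
    ("upload_endpoint", False, False): Retention("3 days (artifact schedule)", "indefinite"),
}


def lookup_retention(audio_location: str, deletion_requested: bool,
                     baa_ttl_enabled: bool) -> Retention:
    """Return the table row that applies to the given conditions."""
    # Once a deletion request is submitted, the BAA/TTL setting no longer matters.
    ttl_key: Optional[bool] = None if deletion_requested else baa_ttl_enabled
    key = (audio_location, deletion_requested, ttl_key)
    if key not in _TABLE:
        raise ValueError(f"no table row for {key!r}")
    return _TABLE[key]
```

For example, `lookup_retention("upload_endpoint", True, False).audio` returns the 2-day upload schedule row, reflecting that a deletion request does not shorten retention of files sent through the upload endpoint.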

Pre-recorded audio in production environment

This section applies only to the pre-recorded audio transcription endpoint: https://api.assemblyai.com/v2/transcript.

For pre-recorded audio, AssemblyAI can delete data at the customer’s request after a file has been transcribed by the API. Each file is assigned a unique identifier, the transcript_id, which the customer can store and/or retrieve later via a GET request.
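If you keep your own record of transcript IDs, a small local table is enough to track which transcripts still need a deletion request. This is a minimal sketch using Python's built-in `sqlite3` module; the schema and all function names are hypothetical and not part of the AssemblyAI API.

```python
import sqlite3
from typing import List


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) a local table of transcript IDs and their last known status."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS transcripts ("
        " transcript_id TEXT PRIMARY KEY,"
        " status TEXT NOT NULL,"
        " deleted INTEGER NOT NULL DEFAULT 0)"
    )
    return conn


def record_transcript(conn: sqlite3.Connection, transcript_id: str, status: str) -> None:
    """Insert a new ID, or update the stored status of an existing one."""
    conn.execute(
        "INSERT INTO transcripts (transcript_id, status) VALUES (?, ?) "
        "ON CONFLICT(transcript_id) DO UPDATE SET status = excluded.status",
        (transcript_id, status),
    )
    conn.commit()


def mark_deleted(conn: sqlite3.Connection, transcript_id: str) -> None:
    """Record locally that a deletion request has been sent for this ID."""
    conn.execute("UPDATE transcripts SET deleted = 1 WHERE transcript_id = ?", (transcript_id,))
    conn.commit()


def ids_pending_deletion(conn: sqlite3.Connection) -> List[str]:
    """IDs that have finished (completed or error) but have not yet been deleted."""
    rows = conn.execute(
        "SELECT transcript_id FROM transcripts "
        "WHERE deleted = 0 AND status IN ('completed', 'error') "
        "ORDER BY transcript_id"
    )
    return [row[0] for row in rows]
```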

After a successful POST request for transcription to https://api.assemblyai.com/v2/transcript, the JSON response includes an id key. This id is the transcript's unique identifier, which you use to retrieve the final transcript once AssemblyAI has finished processing the job. Do not run any deletion until you have retrieved the final transcript with a GET request; otherwise you will not yet have been able to retrieve the result from the API.
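A minimal sketch of the POST step using only the Python standard library. It assumes the API key is passed in an `authorization` header and the audio location in an `audio_url` field of the JSON body; the helper names are hypothetical. The request is built separately from sending it so it can be inspected without network access.

```python
import json
import urllib.request

TRANSCRIPT_URL = "https://api.assemblyai.com/v2/transcript"


def build_transcript_request(api_key: str, audio_url: str) -> urllib.request.Request:
    """Build (but do not send) the POST that submits audio for transcription."""
    body = json.dumps({"audio_url": audio_url}).encode("utf-8")
    return urllib.request.Request(
        TRANSCRIPT_URL,
        data=body,
        headers={"authorization": api_key, "content-type": "application/json"},
        method="POST",
    )


def transcript_id_from_response(raw: bytes) -> str:
    """Extract the unique transcript identifier (the "id" key) from the response body."""
    return json.loads(raw)["id"]


def submit_transcript(api_key: str, audio_url: str) -> str:
    """Send the POST and return the transcript ID (needs network access and a valid key)."""
    with urllib.request.urlopen(build_transcript_request(api_key, audio_url)) as resp:
        return transcript_id_from_response(resp.read())
```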

After each successful GET request to https://api.assemblyai.com/v2/transcript/:transcript_id, check the status key of the response. If the status is neither completed nor error, continue polling until it reaches one of them. Once the status is completed, store all outputs in your own database for record-keeping. If the status is error, check the error key of the response to diagnose what went wrong, then re-run the file with the recommended changes.
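The polling logic can be sketched as a loop that stops at either terminal status. To keep the example self-contained, the HTTP call is injected as a `fetch` callable (hypothetical; in practice it would perform the GET request above and return the parsed JSON), and the 3-second interval and attempt limit are arbitrary assumptions rather than documented values.

```python
import time
from typing import Any, Callable, Dict

TERMINAL = ("completed", "error")


def poll_transcript(
    fetch: Callable[[str], Dict[str, Any]],
    transcript_id: str,
    interval_s: float = 3.0,
    max_attempts: int = 200,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Poll until the transcript's status is "completed" or "error", then return it."""
    for _ in range(max_attempts):
        transcript = fetch(transcript_id)  # parsed JSON of GET /v2/transcript/<id>
        if transcript["status"] in TERMINAL:
            return transcript
        sleep(interval_s)  # still queued or processing: wait before polling again
    raise TimeoutError(f"{transcript_id} not finished after {max_attempts} polls")
```

Returning the transcript in both terminal cases leaves it to the caller to store a completed result, or to inspect the `error` key, before requesting deletion.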

Once a transcript has reached completed status and its results have been retrieved, or it has returned an error status, delete it by sending a DELETE request to https://api.assemblyai.com/v2/transcript/:transcript_id. This triggers a deletion job on AssemblyAI’s end.
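A sketch of the deletion step, again with the Python standard library and hypothetical helper names. The `should_delete` guard reflects the advice above to delete only after a transcript has completed and been retrieved, or has errored; the `authorization` header is assumed as in the submission step.

```python
import urllib.parse
import urllib.request
from typing import Any, Dict

TRANSCRIPT_URL = "https://api.assemblyai.com/v2/transcript"


def should_delete(transcript: Dict[str, Any]) -> bool:
    """Delete only after the job has completed (and been retrieved) or errored."""
    return transcript.get("status") in ("completed", "error")


def build_delete_request(api_key: str, transcript_id: str) -> urllib.request.Request:
    """Build (but do not send) the DELETE request for one transcript."""
    url = f"{TRANSCRIPT_URL}/{urllib.parse.quote(transcript_id, safe='')}"
    return urllib.request.Request(url, headers={"authorization": api_key}, method="DELETE")


def delete_transcript(api_key: str, transcript_id: str) -> int:
    """Send the DELETE and return the HTTP status code (needs network access)."""
    with urllib.request.urlopen(build_delete_request(api_key, transcript_id)) as resp:
        return resp.status
```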

Audio data is deleted as follows. If you submitted a presigned URL for an audio file hosted in your own cloud environment, all audio data associated with the transcript is also deleted when you submit the deletion request for its transcript_id. If you sent your audio file through the upload endpoint (https://www.assemblyai.com/docs/api-reference/files/upload), the uploaded audio data is deleted on a schedule after 2 days. All intermediate transcription artifacts used to process the file (transcoded audio, original audio files, etc.) are deleted on a schedule after 3 days, unless a deletion request is submitted sooner.
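For files sent through the upload endpoint, the sketch below builds the upload request and reads the returned `upload_url`, which is then passed as the `audio_url` when requesting a transcript. It assumes raw audio bytes are posted with an `authorization` header; the helper names are hypothetical, and the deletion estimate assumes the 2-day schedule is counted from the upload time, which the text above does not state.

```python
import json
import urllib.request
from datetime import datetime, timedelta

UPLOAD_URL = "https://api.assemblyai.com/v2/upload"
UPLOAD_RETENTION = timedelta(days=2)  # upload endpoint deletion schedule described above


def build_upload_request(api_key: str, audio: bytes) -> urllib.request.Request:
    """Build (but do not send) a POST of raw audio bytes to the upload endpoint."""
    return urllib.request.Request(
        UPLOAD_URL,
        data=audio,
        headers={"authorization": api_key, "content-type": "application/octet-stream"},
        method="POST",
    )


def upload_url_from_response(raw: bytes) -> str:
    """Read the uploaded file's URL; pass it as audio_url when requesting a transcript."""
    return json.loads(raw)["upload_url"]


def estimated_upload_deletion(uploaded_at: datetime) -> datetime:
    """Estimated deletion time, ASSUMING the 2-day clock starts at upload."""
    return uploaded_at + UPLOAD_RETENTION
```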

You can also refer to the AssemblyAI API reference documentation for each of the requests above.

Confirming Data Deletion: Pre-recorded audio

To confirm that a file has been deleted, or if you did not store the transcript_id when you made the transcription request, you can list your transcripts. A GET request to https://api.assemblyai.com/v2/transcript returns a list of all transcripts associated with your account along with their current status. To check a single transcript instead, make a GET request to https://api.assemblyai.com/v2/transcript/:transcript_id.
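To check the status of many transcripts at once, you can index a list response by ID. This sketch assumes the parsed response has a `transcripts` array whose items carry `id` and `status` fields, and it handles a single response; if results come back in pages, you would repeat it for each page. The function names are hypothetical.

```python
import urllib.request
from typing import Any, Dict, Optional

LIST_URL = "https://api.assemblyai.com/v2/transcript"


def build_list_request(api_key: str) -> urllib.request.Request:
    """Build (but do not send) the GET that lists transcripts on the account."""
    return urllib.request.Request(LIST_URL, headers={"authorization": api_key}, method="GET")


def statuses_by_id(listing: Dict[str, Any]) -> Dict[str, str]:
    """Map transcript_id -> status for one parsed list response (assumed shape)."""
    return {item["id"]: item["status"] for item in listing.get("transcripts", [])}


def find_status(listing: Dict[str, Any], transcript_id: str) -> Optional[str]:
    """Status of one transcript, or None if it does not appear in this response."""
    return statuses_by_id(listing).get(transcript_id)
```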

Streaming audio in production environment

This section applies only to the streaming audio transcription endpoint: wss://streaming.assemblyai.com/v3/ws.

If you have opted out of model training, we do not store or maintain any information about the audio streamed to our Streaming product; it is processed for transcription and then discarded. Certain metadata about the transcript is stored and maintained for logging and billing purposes, but none of the original audio is stored.

Model training

The model training environment differs from the production environment. You can find more information on model training in our Model Training FAQ. If you would like to opt out of model training, please see our Opt-Out FAQ.