How long does AssemblyAI retain data in the Production Environment?
Streaming production environment
If you are opted out of model training, we offer zero data retention of audio and transcripts for our Streaming product. Certain metadata about the transcript is stored and maintained for logging and billing purposes.
The model training environment differs from the production environment. You can find more information on model training in our Model Training FAQ. If you would like to opt out of model training, please see our Opt-Out FAQ.
Asynchronous production environment
Certain metadata is stored for logging and billing purposes.
The minimum TTL that AssemblyAI may set for Final Transcription Artifacts in the asynchronous production environment is one (1) hour. AssemblyAI implements TTL through Amazon Web Services's ("AWS") DynamoDB TTL mechanism (the "AWS TTL"). The deletion process begins in AWS at TTL expiration but is subject to AWS TTL processing times: in practice, deletion typically completes within a few minutes to a few hours after the process begins, depending on circumstances including server location, though we have seen lag times ranging from 2-3 hours to a few days. Once the artifact is deleted in AWS, AssemblyAI processes this deletion almost immediately. See AWS's DynamoDB documentation for more information about the AWS TTL mechanism.
Confirming deletion
Should you wish to confirm that a file has been deleted, or if you did not store the transcript_id when the transcription request was made, you can retrieve a list of all transcripts. Make a GET request to https://api.assemblyai.com/v2/transcript to return a list of all transcripts you have created, or append a transcript_id to the URL to review a single transcript.
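As a minimal sketch, the check above can be scripted with Python's standard library. Only the endpoint URL and the GET method come from this page; the helper names `transcript_url` and `fetch_transcripts` are hypothetical, and the API key is assumed to be passed in the `authorization` header:

```python
import json
import urllib.request

API_BASE = "https://api.assemblyai.com/v2/transcript"

def transcript_url(transcript_id=None):
    # List all transcripts, or target a single transcript by ID.
    return API_BASE if transcript_id is None else f"{API_BASE}/{transcript_id}"

def fetch_transcripts(api_key, transcript_id=None):
    # GET the transcript list (or one transcript) so you can inspect
    # whether a given transcript's data is still present.
    req = urllib.request.Request(
        transcript_url(transcript_id),
        headers={"authorization": api_key},  # your AssemblyAI API key
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())
```

With a valid key, `fetch_transcripts(api_key)` returns the transcript list, and `fetch_transcripts(api_key, "some-transcript-id")` returns a single record you can inspect to confirm deletion.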
LLM Gateway production environment
- If you have an executed BAA and use either Anthropic or Google inference models, we offer zero data retention for LLM Gateway inputs and outputs. Certain metadata is stored for logging and billing purposes.
- If you have a designated TTL on your LLM Gateway account, we delete inputs and outputs on an hourly basis. Certain metadata is stored for logging and billing purposes.
- If a customer initiates a deletion request, inputs and outputs are deleted at the time of the request. Certain metadata is stored for logging and billing purposes. For deletion of speech understanding requests, please see below.
- For speech understanding requests, such as translation, speaker ID, or custom formatting, retention is linked to the life of an asynchronous Final Transcription Artifact, as noted above.
For more information about how Anthropic, Google, and OpenAI retain data, please refer to the Available Models charts in our LLM Gateway Overview.
AssemblyAI has opted out of model training with all LLM Gateway providers. Please note this is separate from whether AssemblyAI may train our models with your data, and the model training environment differs from the production environment. You can find more information on model training in our Model Training FAQ. If you would like to opt out of model training, please see our Opt-Out FAQ.