How long does AssemblyAI retain data in the Production Environment?

Streaming production environment

If you have opted out of model training, we offer zero data retention of audio and transcripts for our Streaming product. Certain metadata about the transcript is stored and maintained for logging and billing purposes.

The model training environment differs from the production environment. You can find more information on model training in our Model Training FAQ. If you would like to opt out of model training, please see our Opt-Out FAQ.

Asynchronous production environment

| Artifact Type* | Time-To-Live Configured | BAA Executed | No TTL or BAA | Customer-Initiated Deletion Request |
| --- | --- | --- | --- | --- |
| Customer-Uploaded Audio Files | Deletion process begins at 24 hours and is at most 48 hours. | Deletion process begins at 24 hours and is at most 48 hours. | Deletion process begins at 24 hours and is at most 48 hours. | Deletion process begins at 24 hours and is at most 48 hours. |
| Final Transcription Artifact | Deletion process begins in AWS at TTL expiration, which can be as low as one (1) hour, subject to AWS TTL processing times** | Default deletion process begins at 72 hours (or can be set to as low as 1-hour), subject to processing times due to the AWS TTL** | Indefinite | Deleted when customer initiates deletion request. |
| Customer-Provided URL Audio Reference | Linked to the lifecycle of a Final Transcription Artifact: Deletion process begins in AWS at TTL expiration, which can be as low as one (1) hour, subject to AWS TTL processing times** | Linked to the lifecycle of a Final Transcription Artifact: Default deletion process begins at 72 hours (or can be set to as low as 1-hour), subject to processing times due to the AWS TTL** | Linked to the lifecycle of a Final Transcription Artifact: Indefinite | Deleted when customer initiates deletion request. |
| Intermediate Artifact | Linked to the lifecycle of a Final Transcription Artifact: Deletion process begins in AWS at TTL expiration, which can be as low as one (1) hour, subject to AWS TTL processing times** | Linked to the lifecycle of a Final Transcription Artifact: Default deletion process begins at 72 hours (or can be set to as low as 1-hour), subject to processing times due to the AWS TTL** | Deletion process begins at 48 hours and is at most 72 hours. | Deleted when customer initiates deletion request. |
| Transcription Request Text Inputs (this includes content inputs such as key terms prompts, word boost list, etc.) | Linked to the lifecycle of a Final Transcription Artifact: Deletion process begins in AWS at TTL expiration, which can be as low as one (1) hour, subject to AWS TTL processing times** | Linked to the lifecycle of a Final Transcription Artifact: Default deletion process begins at 72 hours (or can be set to as low as 1-hour), subject to processing times due to the AWS TTL** | Linked to the lifecycle of a Final Transcription Artifact: Indefinite | Deleted when customer initiates deletion request. |

*Certain metadata is stored for logging and billing purposes.

**The minimum TTL that AssemblyAI may set for Final Transcription Artifacts in the asynchronous production environment is one (1) hour. AssemblyAI implements TTL through the Amazon Web Services (“AWS”) DynamoDB TTL mechanism (the “AWS TTL”). The deletion process begins in AWS at TTL expiration but is subject to AWS TTL processing times. In practice, deletion typically occurs anywhere from a few minutes to a few hours after the deletion process begins in AWS, depending on circumstances, including server location. However, we have seen lag times ranging from two to three hours up to a few days. Once the artifact is deleted in AWS, AssemblyAI processes the deletion almost immediately. See the AWS DynamoDB TTL documentation for more information about AWS’s TTL mechanism.
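
As a rough illustration of the TTL arithmetic described above, the following Python sketch computes the earliest time at which the deletion process could begin for a given TTL. It is not AssemblyAI's implementation. The function name and the `ttl_clock_start` parameter are hypothetical, and this article does not state which event starts the TTL clock, so the caller must supply that timestamp. The result is only when deletion begins; actual removal is subject to the AWS TTL processing times noted above.

```python
from datetime import datetime, timedelta, timezone

# Values taken from this article: the minimum configurable TTL is 1 hour,
# and the default with an executed BAA is 72 hours.
MIN_TTL_HOURS = 1
DEFAULT_BAA_TTL_HOURS = 72


def earliest_deletion_start(ttl_clock_start: datetime,
                            ttl_hours: float = DEFAULT_BAA_TTL_HOURS) -> datetime:
    """Return the earliest moment the deletion process could begin.

    ttl_clock_start is an assumption: the article does not say which
    event starts the TTL clock, so the caller decides what to pass in.
    """
    if ttl_hours < MIN_TTL_HOURS:
        raise ValueError(f"TTL must be at least {MIN_TTL_HOURS} hour(s)")
    return ttl_clock_start + timedelta(hours=ttl_hours)


if __name__ == "__main__":
    start = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
    # With the minimum 1-hour TTL, deletion can begin one hour after the start.
    print(earliest_deletion_start(start, ttl_hours=1).isoformat())
    # With the 72-hour default, deletion can begin three days after the start.
    print(earliest_deletion_start(start).isoformat())
```

Because AWS TTL processing can lag by anywhere from minutes to a few days, treat the returned timestamp as a lower bound rather than a guaranteed deletion time.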

Confirming deletion

To confirm that a file has been deleted, or if you did not store the transcript_id when you made the transcription request, you can retrieve a list of all transcripts. Make a GET request to https://api.assemblyai.com/v2/transcript to return a list of all transcripts created, or specify a transcript_id to review a single transcript.
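
The GET request described above can be sketched in Python using only the standard library. This is a minimal example under stated assumptions, not an official client. The helper names are hypothetical, and so is the `ASSEMBLYAI_API_KEY` environment variable. Passing the API key in the `Authorization` header and appending the transcript_id to the URL path are also assumptions not stated in this article, so check the API reference before relying on them.

```python
import json
import os
import urllib.request
from typing import Optional

# Endpoint given in this article.
TRANSCRIPT_URL = "https://api.assemblyai.com/v2/transcript"


def build_transcript_request(api_key: str,
                             transcript_id: Optional[str] = None) -> urllib.request.Request:
    """Build a GET request for all transcripts, or for one transcript.

    Assumptions: the transcript_id is appended to the URL path, and the
    API key is sent as the Authorization header value.
    """
    url = TRANSCRIPT_URL if transcript_id is None else f"{TRANSCRIPT_URL}/{transcript_id}"
    return urllib.request.Request(url, headers={"Authorization": api_key}, method="GET")


def fetch_json(request: urllib.request.Request) -> dict:
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


if __name__ == "__main__":
    # Hypothetical environment variable holding your API key.
    key = os.environ["ASSEMBLYAI_API_KEY"]
    listing = fetch_json(build_transcript_request(key))
    print(json.dumps(listing, indent=2))
```

Building the request separately from sending it makes it easy to inspect the URL and headers before any network call is made.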

The model training environment differs from the production environment. You can find more information on model training in our Model Training FAQ. If you would like to opt out of model training, please see our Opt-Out FAQ.

LLM Gateway production environment

  • If you have an executed BAA and use either Anthropic or Google inference models, we offer zero data retention for LLM Gateway inputs and outputs. Certain metadata is stored for logging and billing purposes.

  • If you have a designated TTL on your LLM Gateway account, we delete inputs and outputs on an hourly basis. Certain metadata is stored for logging and billing purposes.

  • If a customer initiates a deletion request, inputs and outputs are deleted at the time of the request. Certain metadata is stored for logging and billing purposes. For deletion of speech understanding requests, please see below.

  • For speech understanding requests, such as translation, speaker ID, or custom formatting, retention is linked to the lifecycle of the asynchronous Final Transcription Artifact, as described above.

For more information about how Anthropic, Google, and OpenAI retain data, please refer to the Available Models charts in our LLM Gateway Overview.

AssemblyAI has opted out of model training with all LLM Gateway providers. Please note this is separate from whether AssemblyAI may train our models with your data, and the model training environment differs from the production environment. You can find more information on model training in our Model Training FAQ. If you would like to opt out of model training, please see our Opt-Out FAQ.