Data retention and model training
Model training
We consider model training critical to providing you with the most accurate models and services we can. Only certain files submitted to the API, as permitted by the applicable contract, are used for model training. These files undergo a redaction process designed to remove personally identifiable information before any remaining data is used for model training. We will not use files you submit for model training if you are subject to a Business Associate Addendum, are using our European servers, or have opted out of model training. You can find more information on whether and how to opt out here.
LLM Gateway model training
AssemblyAI has opted out of model training with all LLM Gateway providers.
Please note this is separate from whether AssemblyAI may train our models with your data. You can find more information on whether and how to opt out of data sharing for our model improvement program here.
Encryption
Data at rest is encrypted with AES-128 or AES-256, and data in transit uses TLS 1.2+. AssemblyAI posts quarterly SSL scans to its Trust Center to verify that its service uses TLS with modern cipher suites.
Async
For transcription of pre-recorded audio, AssemblyAI supports the following TLS versions and cipher suites:
Supported TLS versions:
- TLS 1.3
- TLS 1.2
Supported cipher suites:
- TLS_AES_128_GCM_SHA256
- TLS_AES_256_GCM_SHA384
- TLS_CHACHA20_POLY1305_SHA256
- ECDHE-ECDSA-AES128-GCM-SHA256
- ECDHE-RSA-AES128-GCM-SHA256
- ECDHE-ECDSA-AES256-GCM-SHA384
- ECDHE-RSA-AES256-GCM-SHA384
Streaming
For transcription of streaming audio, AssemblyAI supports the following TLS version and cipher suites:
Supported TLS versions:
- TLS 1.3
Supported cipher suites:
- TLS_AES_128_GCM_SHA256
- TLS_AES_256_GCM_SHA384
- TLS_CHACHA20_POLY1305_SHA256
Ensure your client or application is configured to use one of the supported TLS versions and cipher suites when connecting to AssemblyAI services.
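As a minimal sketch of that check, a Python client can pin its minimum TLS version with the standard library and inspect what a connection actually negotiates. The hostname below assumes AssemblyAI's public API endpoint, and the helper name `negotiated_tls` is ours for illustration:

```python
import socket
import ssl

# Build a client context that refuses anything older than TLS 1.2,
# the minimum version the async endpoints accept.
ctx = ssl.create_default_context()
ctx.minimum_version = ssl.TLSVersion.TLSv1_2


def negotiated_tls(host="api.assemblyai.com", port=443):
    """Open a TLS connection and return (protocol_version, cipher_suite)."""
    with socket.create_connection((host, port), timeout=10) as sock:
        with ctx.wrap_socket(sock, server_hostname=host) as tls:
            # e.g. ("TLSv1.3", "TLS_AES_256_GCM_SHA384")
            return tls.version(), tls.cipher()[0]
```

Calling `negotiated_tls()` lets you confirm that the negotiated protocol and cipher suite appear in the supported lists above; for streaming connections, raise `ctx.minimum_version` to `ssl.TLSVersion.TLSv1_3`.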
GDPR compliance
We have designed our products with GDPR principles top of mind, but we also understand that privacy compliance is a moving target. As privacy requirements continue to evolve rapidly, we are constantly working to assess and improve our practices. You can read more about our privacy practices in our Privacy Policy here and our Data Processing Addendum here.
SOC 2 certification
We have both SOC 2 Type 1 and Type 2 certifications. You can find more information on this on our Trust Center. We also have a blog post on the subject, which you can find here.
Data retention
Streaming production environment
If you are opted out of model training, we offer zero data retention of audio and transcripts for our Streaming product. Certain metadata about the transcript is stored and maintained for logging and billing purposes.
The model training environment differs from the production environment. You can find more information on model training in our Model Training section. If you would like to opt out of model training, please see our Opt-Out FAQ.
Asynchronous production environment
*Certain metadata is stored for logging and billing purposes.
**The minimum TTL that AssemblyAI may set for Final Transcription Artifacts in the asynchronous production environment is 1 (one) hour. AssemblyAI implements TTL through Amazon Web Services' (“AWS”) DynamoDB TTL mechanism (the “AWS TTL”). The deletion process begins in AWS at TTL expiration but is subject to AWS TTL processing times. In practice, deletion typically takes place anywhere from a few minutes to a few hours after the process begins in AWS, depending on circumstances, including server location; however, we have seen lag times ranging from 2–3 hours to a few days. Once the artifact is deleted in AWS, AssemblyAI processes the deletion almost immediately. See here for more information about AWS's TTL mechanism.
Confirming deletion
Should you wish to confirm that a file has been deleted, or if you did not store the transcript_id when the transcription request was made, you can retrieve a list of all transcripts. Make a GET request to https://api.assemblyai.com/v2/transcript to return a list of all transcripts created, or append a transcript_id to review a single transcript.
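The lookup above can be sketched with Python's standard library. This is a minimal illustration, not an SDK method: the helper name `get_transcript` is ours, and `api_key` stands in for your own AssemblyAI API key:

```python
import json
import urllib.request

BASE_URL = "https://api.assemblyai.com/v2/transcript"


def get_transcript(api_key, transcript_id=None):
    """Fetch the list of transcripts, or one transcript when an ID is given.

    Pass your AssemblyAI API key in the `authorization` header; inspect the
    response to confirm the transcript's data is no longer present.
    """
    url = BASE_URL if transcript_id is None else f"{BASE_URL}/{transcript_id}"
    req = urllib.request.Request(url, headers={"authorization": api_key})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

For example, `get_transcript("YOUR_API_KEY")` returns the paginated transcript list, and `get_transcript("YOUR_API_KEY", "some-transcript-id")` returns a single transcript record.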
The model training environment differs from the production environment. You can find more information on model training in our Model Training section. If you would like to opt out of model training, please see our Opt-Out FAQ.
LLM Gateway production environment
- If you have an executed BAA and use either Anthropic or Google inference models, we offer zero data retention for LLM Gateway inputs and outputs. Certain metadata is stored for logging and billing purposes.
- If you have a designated TTL on your LLM Gateway account, we delete inputs and outputs on an hourly basis. Certain metadata is stored for logging and billing purposes.
- If a customer initiates a deletion request, inputs and outputs are deleted at the time of the request. Certain metadata is stored for logging and billing purposes. For deletion of speech understanding requests, please see below.
- For speech understanding requests, such as translation, speaker ID, or custom formatting, retention is linked to the life of an asynchronous Final Transcription Artifact, noted above.
For more information about how Anthropic, Google, and OpenAI retain data, please refer to the Available Models charts in our LLM Gateway Overview.
AssemblyAI has opted out of model training with all LLM Gateway providers. Please note this is separate from whether AssemblyAI may train our models with your data, and the model training environment differs from the production environment. You can find more information on model training in our Model Training section. If you would like to opt out of model training, please see our Opt-Out FAQ.