What are my concurrency limits? | AssemblyAI

A concurrency limit is the maximum number of requests that can be processed for an account at any given time.

Each account has an assigned concurrency limit (also referred to as a throttle limit). For free accounts, the default is 5 for Asynchronous and Streaming transcriptions. For upgraded accounts, the default is 200 for Asynchronous transcriptions and 100 for Streaming transcriptions.

You can check your concurrency limits on your dashboard.

Asynchronous speech-to-text limits

Below are the default limits for how many requests you can have processing in parallel when submitting jobs to our /v2/transcript endpoint.

Account type	Concurrency limit
Free	5
Paid	200+
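If you would rather stay under the limit on the client side, one common approach is to gate your jobs behind a semaphore. The following is a minimal sketch, not an official AssemblyAI client: `run_with_concurrency_limit`, `InFlightTracker`, and `fake_job` are hypothetical names, and `fake_job` merely stands in for "submit a transcript and wait for it to finish".

```python
import asyncio


async def run_with_concurrency_limit(job_factories, limit=5):
    """Run coroutine factories with at most `limit` jobs in flight at once."""
    semaphore = asyncio.Semaphore(limit)

    async def guarded(factory):
        # Each job waits here until one of the `limit` slots is free.
        async with semaphore:
            return await factory()

    return await asyncio.gather(*(guarded(f) for f in job_factories))


class InFlightTracker:
    """Hypothetical stand-in for real requests; records peak parallelism."""

    def __init__(self):
        self.current = 0
        self.peak = 0

    async def fake_job(self):
        self.current += 1
        self.peak = max(self.peak, self.current)
        await asyncio.sleep(0.01)  # pretend the job takes some time
        self.current -= 1
        return "done"


def demo(total_jobs=20, limit=5):
    """Run `total_jobs` fake jobs and report (peak parallelism, jobs finished)."""
    tracker = InFlightTracker()
    results = asyncio.run(
        run_with_concurrency_limit([tracker.fake_job] * total_jobs, limit)
    )
    return tracker.peak, len(results)
```

Calling `demo()` runs 20 fake jobs while never letting more than 5 run at the same time. In real use you would set `limit` to the value shown on your dashboard and replace `fake_job` with your own submit-and-wait coroutine.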

Streaming speech-to-text limits

AssemblyAI’s Universal-Streaming API offers unlimited concurrency for paid accounts, with limits that scale automatically based on usage. We do not limit the total number of concurrent streaming sessions. Instead, the only limit is on the number of new streaming sessions that can be created per minute. These limits start at:

Account type	Starting rate limit (new sessions/min)
Free	5
Paid	100+

Whenever you use 70% or more of your current limit, the number of new streams you can open over the next minute automatically increases by 10%. For example, if you max out your new-session rate limit every minute for 5 minutes (opening 100, then 110, then 121, then 133, then 146 new streams), you’d have 610 total concurrent streams, assuming none of them have been closed. Over the next 60 seconds, a maximum of 161 new streams could be opened.
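The growth example above can be reproduced with a few lines of arithmetic. This is only an illustrative sketch: `simulate_session_limits` is a hypothetical helper, and it assumes the limit compounds by 10% every minute, every minute's limit is fully used, no sessions close, and fractional limits are rounded to the nearest whole session (the rounding rule is an assumption, chosen because it matches the figures quoted above).

```python
from fractions import Fraction


def simulate_session_limits(start_limit=100, minutes=5, growth=Fraction(11, 10)):
    """Return (sessions opened each minute, total opened, next minute's limit).

    Assumes the limit is maxed out every minute, so it grows by `growth`
    each time, and that no sessions are closed in the meantime.
    """
    limit = Fraction(start_limit)
    opened_per_minute = []
    for _ in range(minutes):
        opened_per_minute.append(round(limit))  # whole sessions only (assumed)
        limit *= growth                          # +10% for the next minute
    return opened_per_minute, sum(opened_per_minute), round(limit)


per_minute, total, next_limit = simulate_session_limits()
print(per_minute)   # [100, 110, 121, 133, 146]
print(total)        # 610
print(next_limit)   # 161
```

Passing `start_limit=5` gives a rough idea of how the same rule would play out from the free-tier starting point, under the same assumptions.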

You can find more information on concurrency limits in our documentation.

Need a higher concurrency limit?

We offer custom concurrency limits that scale to support any workload at no additional cost. If you need a higher concurrency limit, please contact our Sales team or reach out to us at support@assemblyai.com.

Rate Limits

In addition to the concurrency limit, there’s a rate limit for the API, which restricts users to a maximum of 20,000 requests per five minutes.
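For reference, 20,000 requests per five minutes averages out to roughly 66 requests per second. If you want to guard against exceeding this on the client side, a sliding-window counter is one option. The sketch below is an assumption-laden illustration rather than a description of how the API enforces the limit: `SlidingWindowLimiter` is a hypothetical class, and whether the server uses a sliding or a fixed window is not stated here.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Client-side guard: allow at most `max_requests` per `window_seconds`."""

    def __init__(self, max_requests=20_000, window_seconds=300.0,
                 clock=time.monotonic):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._clock = clock          # injectable so the limiter can be tested
        self._timestamps = deque()   # send times of requests still in the window

    def try_acquire(self):
        """Return True and record the request if it fits in the window."""
        now = self._clock()
        # Forget requests that have aged out of the window.
        while self._timestamps and now - self._timestamps[0] >= self.window_seconds:
            self._timestamps.popleft()
        if len(self._timestamps) < self.max_requests:
            self._timestamps.append(now)
            return True
        return False


limiter = SlidingWindowLimiter()
if limiter.try_acquire():
    pass  # safe to send the request here
```

When `try_acquire()` returns `False`, the caller can wait briefly and retry instead of sending a request that may be rejected.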