Changelog
Follow along to see weekly accuracy and product improvements.
Multichannel Speaker Diarization
We've added support for multichannel speaker diarization with pre-recorded transcription, allowing you to identify individual speakers across multiple audio channels in a single API request.
This unlocks accurate transcription for complex audio scenarios like hybrid meetings, call center recordings with supervisor monitoring, or podcast recordings with multiple mics. Speaker labels are formatted as 1A, 1B, 2A, 2B, where the first digit indicates the channel and the letter identifies unique speakers within that channel. For example, in a meeting where Channel 1 captures an in-room conversation between two people and Channel 2 captures a remote participant, you'll get clear attribution for all three speakers even though Channel 1 contains multiple talkers.
How to use it:
- Set both multichannel=true and speaker_labels=true in your transcription request—no other changes needed (see the sketch below)
- Available now for all Universal customers across all plan tiers
- View documentation
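As a minimal sketch, assuming the standard v2 transcript endpoint and an audio file you already host (the URL and API key below are placeholders), the request looks like this:
import requests

headers = {"authorization": "<YOUR_API_KEY>"}

response = requests.post(
    "https://api.assemblyai.com/v2/transcript",
    headers=headers,
    json={
        "audio_url": "https://example.com/hybrid-meeting.mp3",  # placeholder multichannel recording
        "multichannel": True,      # transcribe each audio channel separately
        "speaker_labels": True     # diarize speakers within each channel (labels like 1A, 1B, 2A)
    }
)
print(response.json()["id"])  # poll this transcript ID until processing completes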
Universal delivers industry-leading accuracy with advanced features like multichannel support and speaker diarization, giving you the precision and flexibility needed to build production-grade voice AI applications.
Gemini 3 Flash Preview now supported on LLM Gateway
Google's newest Gemini 3 Flash Preview model is live in the LLM Gateway.
This model delivers faster inference speeds with improved reasoning capabilities compared to previous Flash versions. Gemini 3 Flash Preview excels at high-throughput applications requiring quick response times—like real-time customer support agents, content moderation, and rapid document processing—while maintaining strong accuracy on complex queries that would have required slower, more expensive models.
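If you already call the LLM Gateway, switching to this model should only require changing the model field in your request body. Note that the exact identifier below is an assumption modeled on the gemini-3-pro-preview naming shown later on this page, so confirm it in the docs:
# Assumed model identifier -- verify the exact name in the LLM Gateway docs
payload = {
    "model": "gemini-3-flash-preview",
    "messages": [{"role": "user", "content": "Classify this support ticket as billing, technical, or other."}],
    "max_tokens": 500
}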
For more information, check out our docs here.
Improved File Deletion for Enhanced Data Privacy
We've updated how uploaded audio files are deleted when you delete a transcript, giving you immediate control over your data.
Previously, when you made a DELETE request to remove a transcript, the associated uploaded file would remain in storage for up to 24 hours before automatic deletion. Now, uploaded files are immediately deleted alongside the transcript when you make a DELETE request, ensuring your data is removed from our systems right away.
This change applies specifically to files uploaded via the /upload endpoint. If you're reusing upload URLs across multiple transcription requests, note that deleting one transcript will now immediately invalidate that upload URL for any subsequent requests.
How it works:
- When you send a DELETE request to remove a transcript, any file uploaded via /upload and associated with that transcript is now deleted immediately (see the sketch below)
- This applies to all customers using the /upload endpoint across all plans
- If you need to transcribe the same file multiple times, upload it separately for each request or retain the original file on your end
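As a rough sketch, assuming the v2 transcript endpoint and a placeholder transcript ID, the deletion call looks like this:
import requests

headers = {"authorization": "<YOUR_API_KEY>"}
transcript_id = "<TRANSCRIPT_ID>"  # placeholder ID of the transcript to delete

# Deleting the transcript now also removes its uploaded source file immediately
response = requests.delete(
    f"https://api.assemblyai.com/v2/transcript/{transcript_id}",
    headers=headers
)
print(response.status_code)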
AssemblyAI's APIs are built with security and data privacy as core principles. Our speech-to-text and audio intelligence models process your data with enterprise-grade security, and now with even more granular control over data retention.
Learn more about our data security practices
Transcribe public audio URLs directly in the Playground

Our Playground just got a little more powerful: you can now transcribe audio directly from public URLs.
No more downloading files just to upload them again. Paste a public audio URL, and you're good to go.
GPT-5.1 & 5.2 now supported on LLM Gateway
OpenAI’s newest GPT-5.1 and GPT-5.2 models are live in the LLM Gateway.
These models bring sharper reasoning and improved instruction-following abilities. GPT-5.2 in particular excels at multi-step legal, finance, and medical tasks where earlier models stalled, letting you ship production features that previously needed heavy post-processing or human review.
For more information, check out our docs here.
Keyterm Prompting Now Available for Universal-Streaming Multilingual
Keyterm prompting is now in production for multilingual streaming, giving developers the ability to improve accuracy for target words in real-time transcription. This enhancement is live for all users across the Universal-Streaming platform.
Keyterm prompting enables developers to prioritize specific terminology in transcription results, which is particularly valuable for conversational AI and voice agent use cases where domain-specific accuracy matters. By specifying keywords relevant to your application, you'll see improved recognition of critical terms that might otherwise be misheard or misinterpreted.
To use Keyterm prompting with Universal-Streaming Multilingual, include a list of keyterms in your connection parameters:
import json

CONNECTION_PARAMS = {
    "sample_rate": 16000,
    "speech_model": "universal-streaming-multilingual",
    "keyterms_prompt": json.dumps(["Keanu Reeves", "AssemblyAI", "Universal-2"])
}
Expanding Keyterm prompting to Universal-Multilingual Streaming reinforces our commitment to giving developers precise control over recognition results for specialized vocabularies.
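Those connection parameters are typically serialized into the query string of the streaming WebSocket URL. A minimal sketch, assuming the v3 streaming endpoint (check the Streaming docs for the exact URL and authentication details for your setup):
from urllib.parse import urlencode

# CONNECTION_PARAMS is the dict defined above; the JSON-encoded keyterms list
# travels as a query parameter alongside sample_rate and speech_model
ws_url = "wss://streaming.assemblyai.com/v3/ws?" + urlencode(CONNECTION_PARAMS)
print(ws_url)  # open this URL with your WebSocket client, authenticating with your API key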
Learn more in our docs.
Hallucination Rate Reduced for Multilingual Streaming
We've improved hallucination detection and reduction across Universal-Multilingual Streaming transcription, resulting in fewer false outputs while maintaining minimal latency impact. This improvement is live for all users.
Lower hallucination rates mean more reliable transcription results out of the box, especially in edge cases where model confidence is uncertain. You'll see more accurate, trustworthy outputs without needing to modify existing implementations.
This improvement is automatic and applies to all new Streaming sessions.
Transcription Access Now Scoped to Project Level for Uploaded Files
We've tightened security controls on pre-recorded file transcription by scoping access to uploaded files within the same project that uploaded them.
Previously, API tokens could transcribe files across projects. Now, tokens must belong to the same project that originally uploaded the file to transcribe it. This strengthens your security posture and prevents unintended cross-project access to sensitive audio files.
This security enhancement reflects our commitment to protecting your data and giving you granular control over who can access transcriptions within your organization.
Gemini 3 Pro now available in LLM Gateway
Google's latest Gemini 3 Pro model is now available through AssemblyAI's LLM Gateway, giving you access to one of the most advanced multimodal models with the same unified API you use for all your other providers.
With AssemblyAI's LLM Gateway, you can now test Gemini 3 Pro against models from OpenAI, Anthropic, Google, and others without changing your integration—just swap the model parameter and compare responses, latency, and cost across providers in real-time.
Available now for all LLM Gateway users.
To get started, simply update the "model" parameter in your LLM Gateway request to "gemini-3-pro-preview":
import requests

headers = {
    "authorization": "<YOUR_API_KEY>"
}

response = requests.post(
    "https://llm-gateway.assemblyai.com/v1/chat/completions",
    headers=headers,
    json={
        "model": "gemini-3-pro-preview",
        "messages": [
            {"role": "user", "content": "What is the capital of France?"}
        ],
        "max_tokens": 1000
    }
)

result = response.json()
print(result["choices"][0]["message"]["content"])
AssemblyAI's LLM Gateway gives you a single API to access 15+ LLMs from every major provider, with built-in fallbacks, load balancing, and cost tracking. Compare models, optimize for performance or price, and switch providers instantly, all without rewriting code.
View our docs and try Gemini 3 Pro in LLM Gateway.
AssemblyAI Streaming Updates: Multi-Region Infrastructure, Session Controls, and Self-Hosted License Management
Self-Hosted Streaming v0.20: License Management Now Available
Self-Hosted Streaming v0.20 now includes built-in license generation and validation, giving enterprises complete control over deployment security and usage tracking. Organizations can manage their speech AI infrastructure with the same compliance controls they expect from enterprise software.
The new licensing system enables IT teams to track deployment usage, enforce security policies, and maintain audit trails—critical for regulated industries like healthcare and financial services. License validation happens at startup and can be configured for periodic checks to ensure continuous compliance.
Available now for all AssemblyAI Self-Hosted Streaming customers.
Contact your account team to generate licenses for your deployments.
Multi-Region Streaming: US-East-1 Now Live
AssemblyAI's Streaming API is now available in us-east-1, providing regional redundancy and expanded compute capacity for production workloads. The infrastructure update reduces single-region dependency and prepares the platform for upcoming EU deployment.
Multi-region availability means contact centers and live captioning applications can maintain service continuity during regional incidents while accessing additional compute capacity for peak usage periods. The architecture changes also enable faster rollout of new regions based on customer demand.
Available immediately across all AssemblyAI Streaming API plans. Traffic is automatically routed to the optimal region based on latency and capacity.
Try AssemblyAI’s Streaming API now or view regional availability.
Inactivity Timeout Controls for Streaming Sessions
AssemblyAI's Streaming API now supports a configurable inactivity_timeout parameter, giving developers precise control over session duration management. Applications can extend timeout periods for long-running sessions or reduce them to optimize connection costs.
The feature enables voice agents and live transcription systems to automatically close idle connections without manual intervention. Contact centers can reduce costs on silent periods while ensuring active calls stay connected. Voice agent developers can keep sessions open longer during natural conversation pauses without manual keep-alive logic.
Available now for all AssemblyAI Streaming customers. Set the inactivity_timeout parameter (in seconds) when initializing your connection.
Implementation:
- Set inactivity_timeout in your connection parameters (see the sketch below)
- Values range from 5 to 3600 seconds
- Default timeout remains 30 seconds if not specified
- Available across all pricing tiers
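For example, following the connection-parameter pattern used elsewhere on this page, a session that should close after two minutes of silence could be configured like this (the 120-second value is only an illustration):
CONNECTION_PARAMS = {
    "sample_rate": 16000,
    "inactivity_timeout": 120  # close the session after 120 seconds without audio activity
}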
View our documentation to learn more.
Streaming Model Update: Enhanced Performance & New Capabilities
We've released a new version of our English Streaming model with significant improvements across the board.
Performance gains:
- 88% better accuracy on short utterances and repeating commands/numbers
- 12% lower emission latency
- 7% faster time to complete transcript
- 4% better accuracy on prompted keyterms
- 3% better accuracy on accented speech
New features:
- Language detection for utterances (Multilingual model only) – Get language output for each utterance to feed downstream processes like LLMs
- Dynamic keyterms prompting – Update your keyterms list mid-stream to improve accuracy on context you discover during the conversation
LeMUR Deprecation
LeMUR will be deprecated on March 31, 2026 and will no longer work after this date.
Users will need to migrate to LLM Gateway by that date to retain access to language model capabilities and to benefit from an expanded model selection and better performance.
For more information, check out our documentation on LLM Gateway and our LeMUR to LLM Gateway migration guide.
Universal Multilingual Streaming
We've launched Universal Multilingual Streaming, enabling low-latency transcription across multiple languages without compromising accuracy.
Key Features
- 6 languages: English, Spanish, French, German, Italian, Portuguese
- Industry-leading accuracy: 11.77% average WER across real-world audio
- $0.15/hour flat: No language surcharges
- Production-ready: Punctuation, capitalization, and intelligent endpointing included
Use Cases
- Voice agents serving international customers
- Multi-language contact centers
- Cross-border meeting transcription
- Healthcare systems serving diverse communities
No complex language detection pipelines or multiple vendor integrations required. See our documentation for implementation details.
Deprecation of V2 Streaming
Our legacy streaming endpoint (/v2/realtime/ws) will be deprecated on January 31, 2026, and will no longer work after this date.
Users will need to migrate to Universal-Streaming before the deadline to avoid service interruption.
Why upgrade?
- Higher accuracy
- Lower latency
- Intelligent endpointing
- Multilingual support
- Lower pricing ($0.15/hour)
For more information, check out our documentation on Universal-Streaming and our V2 to V3 migration guide.
Claude 3.5 & 3.7 Sonnet Sunset
As previously announced, we will be sunsetting Claude 3.5 Sonnet and 3.7 Sonnet for LeMUR on October 29th. After this date, requests made using Claude 3.5 and 3.7 Sonnet will return errors.
If you are using these models, we recommend switching to Claude 4 Sonnet, which is more performant than Claude 3.5 and 3.7 Sonnet. You can switch models by setting the final_model parameter to anthropic/claude-sonnet-4-20250514.
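As a minimal sketch, assuming the LeMUR task endpoint and a placeholder transcript ID, updating an existing request only means changing the final_model value:
import requests

headers = {"authorization": "<YOUR_API_KEY>"}

response = requests.post(
    "https://api.assemblyai.com/lemur/v3/generate/task",
    headers=headers,
    json={
        "transcript_ids": ["<TRANSCRIPT_ID>"],                # placeholder transcript ID
        "prompt": "Summarize the key action items from this call.",
        "final_model": "anthropic/claude-sonnet-4-20250514"   # replaces the sunset 3.5/3.7 Sonnet models
    }
)
print(response.json()["response"])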
New Voice AI Tools and Model Updates
Introducing new tools and model updates to help you build, deploy, and scale Voice AI applications:
- Speech Understanding: Advanced speaker identification, custom formatting rules, and translation let you transform raw transcripts into structured data instantly
- LLM Gateway: One API for your entire voice-to-intelligence pipeline with integrated access to GPT, Claude, Gemini, and others
- Voice AI Guardrails: PII redaction in 50+ languages, profanity filtering, and content moderation
Model Enhancements:
- Automatically code-switch between 99 languages, with 64% fewer speaker counting errors
- Up to 57% accuracy improvements on critical terms with 1,000-word context-aware prompting
Read more about these tools in our blog and check out our documentation for more information.
Speaker Diarization Update
We've shipped significant improvements to speaker count accuracy on Universal and Slam-1:
- 36% more accurate speaker detection for pre-recorded files
- Fewer false positives and missed speakers - the model now consistently identifies the correct number of participants
Slam-1 bugfixes
Fix released to address hallucinations occasionally produced in Slam-1 transcriptions.
Slam-1 Timestamps:
- Fixed issue where sentences separated by silence were sometimes incorrectly combined due to inaccurate timestamps.
- Fix released to reduce occasional timestamp inconsistencies.
Universal-Streaming Improvements
We've released updates to our Universal-Streaming model, bringing significant performance improvements across the board.
What's better:
- Overall accuracy: 3% improvement in general transcription accuracy
- Accented speech: 4% better recognition for speakers with accents
- Conversation Intelligence segments: 4% improvement in conversation intelligence use cases
- Proper nouns: 7% better at recognizing names, brands, and places
- Repeated words: 21% improvement when speakers repeat themselves
- Speed: 20ms faster response time for even lower latency
- Keyterms Prompting: Up to 66% better recognition of your custom terms
Keyterms Prompt for Universal (Beta) and PII Redaction Updates; bugfix
The keyterms_prompt parameter can now be used with Universal for pre-recorded audio transcription, ensuring accurate recognition of product names, people, and industry terms. This feature is in Beta and only available for English files. For more information, please refer to our documentation.
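As a minimal sketch, assuming the v2 transcript endpoint (the audio URL and keyterms below are placeholders), a request using keyterms_prompt looks like this:
import requests

headers = {"authorization": "<YOUR_API_KEY>"}

response = requests.post(
    "https://api.assemblyai.com/v2/transcript",
    headers=headers,
    json={
        "audio_url": "https://example.com/product-demo.mp3",        # placeholder English audio file
        "keyterms_prompt": ["AssemblyAI", "Universal-2", "Slam-1"]  # terms to bias recognition toward
    }
)
print(response.json()["id"])  # poll this transcript ID until processing completes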
PII Audio Redaction is now available for files processed via the EU endpoint.
PII Redaction now supports additional languages: Afrikaans, Bengali, and Thai.
Fixed an issue where Slam-1 occasionally inserted incorrect new lines in transcripts.