Changelog
Follow along to see weekly accuracy and product improvements.
Universal-2 Language Improvements: Hebrew & Swedish
Universal-2 transcription accuracy has improved significantly for Hebrew and Swedish, with word error rates reduced by 37% and 47% respectively. No changes to your integration required — the improvements are live automatically for all users.
AssemblyAI's Universal speech model delivers industry-leading accuracy across dozens of languages, with continuous improvements rolling out automatically. See all supported languages →
LLM Gateway: Automatic Model Fallbacks
LLM Gateway now supports automatic model fallbacks, giving your application resilience against model failures without changing your integration. If a model returns a server error, the Gateway will automatically retry with a fallback — or retry the same model after 500ms by default.
This is available now in Public Beta for all LLM Gateway users.
How to use it
Add a fallbacks array and optional fallback_config to your request. All fields from the original request are copied over to the fallback automatically — you only need to specify what you want to override.
Simple fallback — fall back to a different model, inheriting all original parameters:
{
"model": "kimi-k2.5",
"messages": [{ "role": "user", "content": "Summarize this meeting: ..." }],
"temperature": 0.2,
"fallbacks": [{ "model": "claude-sonnet-4-6" }]
}
Advanced fallback — override specific parameters when falling back (e.g., a different prompt or temperature suited to the fallback model's behavior):
{
"model": "kimi-k2.5",
"messages": [{ "role": "user", "content": "Summarize this meeting: ..." }],
"temperature": 0.2,
"fallbacks": [
{
"model": "claude-sonnet-4-6",
"messages": [
{ "role": "user", "content": "Summarize this meeting concisely, key info only: ..." }
],
"temperature": 0.3
}
]
}
Fallback config options:
"fallback_config": {
"depth": 1, // max fallbacks to attempt (default: 1, max: 2)
"retry": true, // auto-retry on failure if no fallbacks set (default: true)
}By default, if no fallbacks are set, the API will automatically retry a failed request after 500ms. For more control, set fallback_config.retry to false and implement your own exponential backoff.
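Putting the pieces together, a fallback-enabled request body can be assembled programmatically. The sketch below builds the same payload shape as the examples above; the helper function and its defaults are illustrative, not part of the API:

```python
# Sketch: assemble a chat payload with automatic model fallbacks.
# The field names mirror the JSON examples above; build_request itself
# is a hypothetical convenience helper, not an SDK function.

def build_request(model: str, messages: list, fallback_models: list,
                  depth: int = 1, retry: bool = True, **params) -> dict:
    """Build an LLM Gateway payload with a fallbacks array and fallback_config."""
    payload = {"model": model, "messages": messages, **params}
    # Each fallback inherits every field from the original request;
    # only "model" is overridden here.
    payload["fallbacks"] = [{"model": m} for m in fallback_models]
    payload["fallback_config"] = {"depth": depth, "retry": retry}
    return payload

payload = build_request(
    model="kimi-k2.5",
    messages=[{"role": "user", "content": "Summarize this meeting: ..."}],
    fallback_models=["claude-sonnet-4-6"],
    retry=False,       # handle retries ourselves with exponential backoff
    temperature=0.2,
)
# POST `payload` as JSON to the LLM Gateway chat endpoint with your API key.
```

Setting `retry=False` here follows the note above: disable the built-in 500ms retry when you want your own backoff logic in control.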
AssemblyAI's LLM Gateway gives you a single API to access leading models from every major provider — with built-in resilience, load balancing, and cost tracking. Check out our docs →
Introducing Medical Mode: Purpose-built accuracy for medical terminology
Medical Mode is a new add-on for AssemblyAI's Streaming Speech-to-Text that improves transcription accuracy for medical terminology — including medication names, procedures, conditions, and dosages. Available now on Universal-3 RT Pro, Universal Streaming English, and Universal Streaming Multilingual.
What it does
Medical Mode applies a correction pass optimized for medical entity recognition, targeting terms that general-purpose ASR frequently gets wrong. It works alongside the base model's noise handling, accent robustness, and latency characteristics — no tradeoffs.
Why it exists
General-purpose ASR can achieve strong overall accuracy on clinical audio while still consistently misrecognizing medical terminology. Because most healthcare AI pipelines feed transcripts directly into LLMs for structured output generation — SOAP notes, discharge summaries, referral letters — transcription errors on medical entities propagate rather than attenuate. Medical Mode intercepts those errors before they enter the pipeline.
How to enable it
Set the domain connection parameter to "medical-v1". No other changes to your existing pipeline are required.
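In the connection-parameter style used elsewhere in these notes, enabling Medical Mode is a one-key change. The sketch below assumes a typical streaming setup; the sample rate and model choice are illustrative:

```python
# Streaming connection parameters with Medical Mode enabled.
# Only the "domain" key is new; the other values are illustrative.
CONNECTION_PARAMS = {
    "sample_rate": 16000,                           # match your audio source
    "speech_model": "universal-streaming-english",  # illustrative model choice
    "domain": "medical-v1",                         # enables the medical correction pass
}
```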
Availability & pricing
- Available now on Universal-3, Universal-3 Pro Streaming, Universal Streaming English, and Universal Streaming Multilingual
- Supports English, Spanish, German, and French
- Billed as a separate add-on — see the pricing page for details
- HIPAA BAA, SOC 2 Type 2, ISO 27001:2022, PCI DSS v4.0 included
New LLM Gateway Models: Qwen3, Qwen3 Next, & Kimi K2.5
Three new models are now live in LLM Gateway for paid accounts: Qwen3 Next 80B A3B and Qwen3 32B from Alibaba Cloud, and Kimi K2.5 from Moonshot AI. These are competitive low-cost options, with Kimi K2.5 in particular offering strong performance at 1.2s latency per 10,000 tokens.
To use any of these models, update the model parameter in your LLM Gateway request:
// Qwen3 Next 80B A3B
"model": "qwen3-next-80b-a3b"
// Qwen3 32B
"model": "qwen3-32B"
// Kimi K2.5
"model": "kimi-k2.5"
All three are available now for paid accounts via LLM Gateway.
AssemblyAI's LLM Gateway gives you a single API to access 20+ models from Claude, GPT, Gemini, and more — swap models with a single parameter change, no integration work required. View all available models →
AssemblyAI Skill for AI Coding Agents
The AssemblyAI Skill is now available for AI coding agents — giving Claude Code, Cursor, Codex, and other vibe-coding tools accurate, up-to-date knowledge of AssemblyAI's APIs, SDKs, and integrations out of the box.
LLM training data goes stale fast. Without the skill, coding agents default to deprecated AssemblyAI patterns: the old LeMUR API instead of the LLM Gateway, wrong auth headers, discontinued SDK usage, and no awareness of newer features like Universal-3 Pro Streaming or the voice agent framework integrations. The AssemblyAI Skill corrects all of that — and covers the full current API surface, from pre-recorded transcription to real-time streaming to LLM Gateway workflows.
In evals, agents using the skill scored 17/17 on correctness across transcription, voice agent, and LLM Gateway scenarios. Without it: 7/17. The biggest gains are in voice agent integrations and LLM Gateway usage, where agents otherwise have no training data for framework-specific patterns.
How to use it
- Install via Claude Code: cp -r assemblyai ~/.claude/skills/ for personal use, or cp -r assemblyai .claude/skills/ at the project level
- For Codex, copy the folder and reference assemblyai/SKILL.md in your AGENTS.md
- Cursor and Windsurf: add the assemblyai/ directory as project-level documentation
- Available now — free, open source, no API key required
AssemblyAI is the leading speech AI platform for developers — built for production with best-in-class accuracy, real-time streaming, and a full suite of audio intelligence features. The AssemblyAI Skill makes sure your coding agent builds with all of it correctly, every time.
PII Audio Redaction: Silence or Beep
You can now control how PII is replaced in redacted audio. By default, AssemblyAI substitutes PII with a beep tone — now you can switch that to silence instead.
To use silence instead of a beep, pass the redact_pii_audio_options parameter in your transcription request:
"redact_pii_audio_options": {
"override_audio_redaction_method": "silence"
}
Omit the parameter entirely to keep the default beep behavior. Available now for all regions and all models on Pre-recorded transcription.
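In context, a fuller request body might look like the following sketch. The audio URL is a placeholder, and the exact combination of redaction flags is illustrative; check the PII redaction docs for required companions to this option:

```python
# Sketch of a pre-recorded transcription request that redacts PII in both
# the transcript and the audio, replacing redacted audio spans with silence.
request_body = {
    "audio_url": "https://example.com/call-recording.mp3",  # placeholder URL
    "redact_pii": True,
    "redact_pii_audio": True,  # produce a redacted copy of the audio
    "redact_pii_audio_options": {
        "override_audio_redaction_method": "silence"  # default is a beep tone
    },
}
```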
AssemblyAI's PII redaction automatically detects and removes sensitive information from both transcripts and audio — giving you compliant, production-ready output without extra processing steps. Learn more →
Universal-3-Pro Now Available for Streaming
Universal-3-Pro is now available for real-time streaming — bringing our most accurate speech model to live transcription for the first time. Developers building voice agents, live captioning tools, and real-time analytics pipelines can now combine Universal-3-Pro's state-of-the-art accuracy with the low latency of AssemblyAI's streaming API.
Universal-3-Pro streaming delivers three key capabilities that set it apart: best-in-class word error rates across streaming ASR benchmarks, real-time speaker labels to identify who is speaking at each turn, and superior entity detection for names, places, organizations, and specialized terminology — all in real time, not just in batch. And with built-in code switching, Universal-3-Pro handles multilingual audio natively, accurately transcribing speakers who move between languages mid-conversation.
Whether you're building voice agents that need to route conversations by speaker, transcription tools that must catch rare entities accurately, or global applications serving multilingual users, Universal-3-Pro for streaming gives you LLM-style accuracy at real-time speeds.
How to use it:
- Set "speech_model": "u3-rt-pro" in your WebSocket connection parameters
- Code switching is enabled automatically — no additional configuration needed
- Available now via the streaming endpoint for all users
- Read the full documentation
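As a minimal sketch, the connection URL can be built with the model parameter appended as a query string. The base endpoint matches the streaming URLs listed in this changelog; the sample rate is illustrative:

```python
from urllib.parse import urlencode

# Build the WebSocket URL for a Universal-3-Pro streaming session.
# Code switching is on automatically, so no extra flag is needed.
params = {"sample_rate": 16000, "speech_model": "u3-rt-pro"}
ws_url = f"wss://streaming.assemblyai.com/v3/ws?{urlencode(params)}"
```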
AssemblyAI's Universal-Streaming API is the fastest way to build real-time voice applications — and with Universal-3-Pro, it's now the most accurate too.
Share Your Playground Transcripts

The AssemblyAI Playground now has a share button. One click generates a shareable link to your transcript output that stays live for 90 days.
Whether you're dropping results into a Slack thread, looping in a teammate for a quick review, or showing a client what the output actually looks like before they integrate — you no longer need to copy-paste text or export anything. Just hit share and send the link.
The AssemblyAI Playground is the fastest way to test our transcription and audio intelligence models without writing a single line of code. Try different models, toggle features, and now share what you see instantly.
Claude Sonnet 4.6 now supported on LLM Gateway
Claude Sonnet 4.6 is now available through LLM Gateway. Sonnet 4.6 is Anthropic's most capable Sonnet model yet, with frontier performance across coding, agents, and professional work at scale. With this model, every line of code, every agent task, and every spreadsheet can be powered by near-Opus intelligence at Sonnet pricing.
To use it, update the model parameter to claude-sonnet-4-6 in your LLM Gateway requests.
For more information, check out our docs here.
Claude Opus 4.5 and 4.6 now supported on LLM Gateway
Claude's most capable models are now available through LLM Gateway. Opus 4.5 and Opus 4.6 bring significant improvements in reasoning, coding, and instruction-following.
To use them, update the model parameter to claude-opus-4-5-20251101 or claude-opus-4-6 in your LLM Gateway requests.
For more information, check out our docs here.
Universal-3-Pro: Our Promptable Speech-to-Text Model
We've released Universal-3-Pro, our most powerful Voice AI model yet—designed to give you LLM-style control over transcription output for the first time.
Unlike traditional ASR models that limit you to basic keyterm prompting or fixed output styles, Universal-3-Pro lets you progressively layer instructions to steer transcription behavior. Need verbatim output with filler words? Medical terminology with accurate dosages? Speaker labels by role? Code-switching between English and Spanish? You can design one robust prompt and apply it consistently across thousands of calls, getting workflow-ready outputs instead of brittle workarounds.
Out of the box, Universal-3-Pro outperforms all ASR models on accuracy, especially for entities and rare words. But the real power is in the prompting: natural language prompts up to 1,500 words for context and style, keyterms prompting for up to 1,000 specialized terms, built-in code switching across 6 languages, verbatim transcription controls for disfluencies and stutters, and audio tags for non-speech events like laughter, music, and beeps.
How to use it:
- Set "speech_models": ["universal-3-pro", "universal"] with "language_detection": true for automatic routing and 99-language coverage
- Use prompt for natural language instructions and keyterms_prompt for boosting rare words (up to 1,000 terms, 6 words each)
- Available now via the /v2/transcript endpoint
- Read the full documentation
Universal-3-Pro represents a fundamental shift in what's possible with speech-to-text: true controllability that rivals human transcription quality, with the consistency and scale of an API.
Improved Speaker Diarization for Short Audio
Speaker diarization is now more accurate for audio files under 2 minutes, with a 19% improvement in speaker count prediction and a 6% reduction in cpWER.
No changes required—this improvement is live for all users automatically.
Global Edge Routing & Data Zone Endpoints for Streaming Speech-to-Text
We've launched new streaming endpoints that give you control over latency optimization and data residency. Choose the endpoint that best fits your application's requirements—whether that's achieving the lowest possible latency or ensuring your audio data stays within a specific geographic region.
Edge Routing (streaming.edge.assemblyai.com) automatically routes requests to the nearest available region, minimizing latency for real-time transcription. With infrastructure in Oregon, Virginia, and Ireland, this endpoint delivers our best-in-class streaming performance regardless of where your users are located.
Data Zone Routing (streaming.us.assemblyai.com and streaming.eu.assemblyai.com) guarantees your data never leaves the specified region. This is designed for organizations with strict data residency and governance requirements—your audio and transcription data will remain entirely within the US or EU, respectively.
How to use it:
Simply update your WebSocket connection URL to your preferred endpoint:
- wss://streaming.assemblyai.com/v3/ws (Global)
- wss://streaming.us.assemblyai.com/v3/ws (US)
- wss://streaming.eu.assemblyai.com/v3/ws (EU)
The default endpoint (streaming.assemblyai.com) remains unchanged.
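One way to keep the routing choice explicit in application code is a small lookup. This assumes the edge endpoint uses the same /v3/ws path as the data-zone endpoints listed above:

```python
# Map a routing choice to its streaming endpoint. "edge" picks the
# nearest-region endpoint for lowest latency; "us"/"eu" pin audio and
# transcription data to that region.
STREAMING_ENDPOINTS = {
    "default": "wss://streaming.assemblyai.com/v3/ws",
    "edge": "wss://streaming.edge.assemblyai.com/v3/ws",
    "us": "wss://streaming.us.assemblyai.com/v3/ws",
    "eu": "wss://streaming.eu.assemblyai.com/v3/ws",
}

def streaming_url(zone: str = "default") -> str:
    """Return the WebSocket endpoint for the given routing zone."""
    return STREAMING_ENDPOINTS[zone]
```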
Multichannel Speaker Diarization
We've added support for multichannel speaker diarization with pre-recorded transcription, allowing you to identify individual speakers across multiple audio channels in a single API request.
This unlocks accurate transcription for complex audio scenarios like hybrid meetings, call center recordings with supervisor monitoring, or podcast recordings with multiple mics. Speaker labels are formatted as 1A, 1B, 2A, 2B, where the first digit indicates the channel and the letter identifies unique speakers within that channel. For example, in a meeting where Channel 1 captures an in-room conversation between two people and Channel 2 captures a remote participant, you'll get clear attribution for all three speakers even though Channel 1 contains multiple talkers.
How to use it:
- Set both multichannel=true and speaker_labels=true in your transcription request — no other changes needed
- Available now for all Universal customers across all plan tiers
- View documentation
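Because each label encodes both channel and speaker, downstream code can split them apart. A small helper, following the 1A/1B/2A scheme described above:

```python
def parse_speaker_label(label: str) -> tuple:
    """Split a multichannel speaker label like "2A" into (channel, speaker).

    The digits before the final letter are the channel number; the
    trailing letter identifies a unique speaker within that channel.
    """
    channel, speaker = label[:-1], label[-1]
    return int(channel), speaker

parse_speaker_label("1A")  # channel 1, speaker A
parse_speaker_label("2B")  # channel 2, speaker B
```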
Universal delivers industry-leading accuracy with advanced features like multichannel support and speaker diarization, giving you the precision and flexibility needed to build production-grade voice AI applications.
Gemini 3 Flash Preview now supported on LLM Gateway
Google's newest Gemini 3 Flash Preview model is live in the LLM Gateway.
This model delivers faster inference speeds with improved reasoning capabilities compared to previous Flash versions. Gemini 3 Flash Preview excels at high-throughput applications requiring quick response times—like real-time customer support agents, content moderation, and rapid document processing—while maintaining strong accuracy on complex queries that would have required slower, more expensive models.
For more information, check out our docs here.
Improved File Deletion for Enhanced Data Privacy
We've updated how uploaded audio files are deleted when you delete a transcript, giving you immediate control over your data.
Previously, when you made a DELETE request to remove a transcript, the associated uploaded file would remain in storage for up to 24 hours before automatic deletion. Now, uploaded files are immediately deleted alongside the transcript when you make a DELETE request, ensuring your data is removed from our systems right away.
This change applies specifically to files uploaded via the /upload endpoint. If you're reusing upload URLs across multiple transcription requests, note that deleting one transcript will now immediately invalidate that upload URL for any subsequent requests.
How it works:
- When you send a DELETE request to remove a transcript, any file uploaded via /upload and associated with that transcript is now deleted immediately
- This applies to all customers using the /upload endpoint across all plans
- If you need to transcribe the same file multiple times, upload it separately for each request or retain the original file on your end
AssemblyAI's APIs are built with security and data privacy as core principles. Our speech-to-text and audio intelligence models process your data with enterprise-grade security, and now with even more granular control over data retention.
Learn more about our data security practices
Transcribe public audio URLs directly in the Playground

Our Playground just got a little more powerful: you can now transcribe audio directly from public URLs.
No more downloading files just to upload them again. Paste a public audio URL, and you're good to go.
GPT-5.1 & 5.2 now supported on LLM Gateway
OpenAI’s newest GPT-5.1 and GPT-5.2 models are live in the LLM Gateway.
These models come with sharp reasoning and instruction-following abilities. GPT-5.2 in particular excels at multi-step legal, finance, and medical tasks where earlier models stalled, letting you ship production features that previously needed heavy post-processing or human review.
For more information, check out our docs here.
Keyterm Prompting Now Available for Universal-Streaming Multilingual
Keyterm prompting is now in production for multilingual streaming, giving developers the ability to improve accuracy for target words in real-time transcription. This enhancement is live for all users across the Universal-Streaming platform.
Keyterm prompting enables developers to prioritize specific terminology in transcription results, which is particularly valuable for conversational AI and voice agent use cases where domain-specific accuracy matters. By specifying keywords relevant to your application, you'll see improved recognition of critical terms that might otherwise be misheard or misinterpreted.
To use Keyterm prompting with Universal-Streaming Multilingual, include a list of keyterms in your connection parameters:
CONNECTION_PARAMS = {
"sample_rate": 16000,
"speech_model":"universal-streaming-multilingual",
"keyterms_prompt": json.dumps(["Keanu Reeves", "AssemblyAI", "Universal-2"])
}
Expanding Keyterm prompting to Universal-Streaming Multilingual reinforces our commitment to giving developers precise control over recognition results for specialized vocabularies.
Learn more in our docs.
Hallucination Rate Reduced for Multilingual Streaming
We've improved hallucination detection and reduction across Universal-Streaming Multilingual transcription, resulting in fewer false outputs with minimal latency impact. This improvement is live for all users.
Lower hallucination rates mean more reliable transcription results out of the box, especially in edge cases where model confidence is uncertain. You'll see more accurate, trustworthy outputs without needing to modify existing implementations.
This improvement is automatic and applies to all new Streaming sessions.