May 1, 2026

AssemblyAI April 2026 recap

It’s been a packed couple of months. From our most accurate streaming model yet to a full voice agent API, here’s a rundown of everything we launched—and why it matters for what you’re building.

Martin Schweiger
Senior Technical Product Marketing Manager

Universal-3 Pro Streaming

This is the big one. Universal-3 Pro Streaming is our most accurate real-time transcription model, purpose-built for voice agents and live audio workflows.

The numbers speak for themselves: 8.14% word error rate across English—the lowest of any streaming provider we’ve benchmarked against, including Deepgram Nova-3 (11.06%), OpenAI GPT-4o Transcribe (9.90%), and Microsoft Azure (9.11%).

But raw accuracy is only part of the story. The real differentiator is entity recognition. Emails, phone numbers, medical terms, URLs—the structured data that voice agents actually act on—are where most models fall apart. Universal-3 Pro Streaming delivers the lowest missed entity rates across every category we tested.

Key capabilities include:

  • Real-time prompting—guide transcription behavior with natural language instructions
  • Dynamic key term prompting—boost up to 1,000 domain-specific terms, updated turn-by-turn mid-conversation
  • Real-time speaker diarization—identify and separate speakers mid-conversation
  • Sub-200ms end-to-end latency with immutable transcripts
  • Six languages at launch: English, Spanish, French, German, Portuguese, and Italian

Native integrations with LiveKit, Pipecat, Twilio, and Daily mean you can go from sign-up to a production voice agent in under 15 minutes. Pricing starts at $0.45/hr.
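
Here's roughly what wiring that up looks like in Python. The endpoint path, the universal-3-pro query parameter, and the key-term update message below are assumptions made for illustration, not copied from the reference docs, so treat this as a sketch of the shape of a session rather than a drop-in integration.

```python
# Sketch of a streaming session. The endpoint, query params, and the key-term
# message shape are assumptions; check the streaming docs for the real protocol.
import asyncio
import json

import websockets  # pip install websockets

API_KEY = "YOUR_API_KEY"
# Hypothetical endpoint and parameters for illustration only.
URL = "wss://streaming.assemblyai.com/v3/ws?model=universal-3-pro&sample_rate=16000"


async def stream(audio_chunks):
    # On websockets < 14 the keyword is extra_headers instead of additional_headers.
    async with websockets.connect(URL, additional_headers={"Authorization": API_KEY}) as ws:
        # Hypothetical turn-by-turn key term prompting: boost domain terms mid-call.
        await ws.send(json.dumps({
            "type": "update_key_terms",
            "key_terms": ["metoprolol", "prior authorization", "telehealth"],
        }))

        async def send_audio():
            for chunk in audio_chunks:  # raw 16 kHz PCM bytes from your capture pipeline
                await ws.send(chunk)

        async def print_transcripts():
            async for message in ws:
                event = json.loads(message)
                if event.get("transcript"):  # assumed field name for finalized text
                    print(event["transcript"])

        await asyncio.gather(send_audio(), print_transcripts())
```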

Medical Mode

If you’re building anything in healthcare—ambient AI scribes, clinical documentation, telehealth tools—Medical Mode is a game changer.

It’s an add-on that reduces missed medical entities by over 20% compared to Universal-3 Pro alone. Drug names, dosages, diagnoses, anatomical terms—the terminology that directly affects patient outcomes gets transcribed correctly the first time.

Medical Mode works across both real-time streaming and pre-recorded (async) workflows, and it’s available on Universal-3 Pro Streaming. It beats every dedicated medical transcription competitor we benchmarked, including Deepgram, Speechmatics Enhanced Medical, AWS Transcribe Medical, and Google Medical Conversation.

Pricing is $0.15/hr as an add-on. AssemblyAI offers a Business Associate Addendum (BAA) for covered entities handling PHI, and your data is excluded from model training by default.
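
If you want a feel for how the add-on slots in, here's a minimal async request. The medical_mode field is a placeholder name for the toggle, and the universal-3-pro model identifier is assumed, so confirm the actual parameter names in the docs before relying on this.

```python
# Illustrative only: `medical_mode` is an assumed field name for the add-on
# toggle, and the `speech_model` value is assumed; verify both against the docs.
import requests  # pip install requests

headers = {"authorization": "YOUR_API_KEY"}
payload = {
    "audio_url": "https://example.com/clinic-visit.mp3",
    "speech_model": "universal-3-pro",  # assumed model identifier
    "medical_mode": True,               # hypothetical add-on flag
}

response = requests.post(
    "https://api.assemblyai.com/v2/transcript", json=payload, headers=headers
)
print(response.json()["id"])  # poll this transcript ID until processing completes
```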

Voice Agent API

Building a voice agent used to mean stitching together three separate providers—STT, LLM, and TTS—across three SDKs, three invoices, and three debugging surfaces. The Voice Agent API replaces all of that with a single WebSocket.

Stream audio in, get audio back. We handle speech understanding, LLM reasoning, voice generation, turn detection, and interruption handling. You write the system prompt and focus on your product.

What makes it different:

  • Speech-aware turn detection—knows when you’re pausing to think vs. done talking. No more getting cut off mid-sentence.
  • Tool calling—register any function with JSON Schema. The agent calls it when appropriate—look up an account, check an order, trigger a workflow.
  • Live configuration updates—change the system prompt, voice, tools, and VAD settings mid-conversation. No reconnection needed.
  • Session resumption—reconnect within 30 seconds if the WebSocket drops. Context preserved.
  • ~1 second end-to-end latency at $4.50/hr flat—covering STT, LLM, and TTS. That’s roughly 4x cheaper than OpenAI’s Realtime API.

It’s a standard JSON API—no SDK required. Most developers get a working agent running the same afternoon.
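
To make that concrete, here's a sketch of a session over the WebSocket. Every message shape below (session.configure, tool.call, tool.result) is an assumption for illustration; the point is simply that configuration, audio, and tool results all flow over one connection.

```python
# Sketch of a Voice Agent session. The endpoint and every message type below
# (session.configure, tool.call, tool.result) are assumptions for illustration.
import asyncio
import json

import websockets  # pip install websockets

API_KEY = "YOUR_API_KEY"
URL = "wss://voice-agent.assemblyai.com/v1/ws"  # hypothetical endpoint

SESSION_CONFIG = {
    "type": "session.configure",  # hypothetical message type
    "system_prompt": "You are a friendly support agent for Acme Internet.",
    "voice": "default",
    "tools": [{
        "name": "lookup_account",
        "description": "Fetch a customer account by phone number",
        "parameters": {  # standard JSON Schema for the function's arguments
            "type": "object",
            "properties": {"phone": {"type": "string"}},
            "required": ["phone"],
        },
    }],
}


def lookup_account(phone: str) -> dict:
    """Your business logic; the agent calls this when it needs account data (stub)."""
    return {"phone": phone, "plan": "gigabit", "status": "active"}


def play(audio_bytes: bytes) -> None:
    """Hand synthesized speech to your playback device (stub)."""


async def run_agent(mic_chunks):
    # On websockets < 14 the keyword is extra_headers instead of additional_headers.
    async with websockets.connect(URL, additional_headers={"Authorization": API_KEY}) as ws:
        await ws.send(json.dumps(SESSION_CONFIG))

        async def send_audio():
            for chunk in mic_chunks:  # raw PCM from your microphone
                await ws.send(chunk)

        async def handle_events():
            async for message in ws:
                if isinstance(message, bytes):  # synthesized speech comes back as binary frames
                    play(message)
                    continue
                event = json.loads(message)
                if event.get("type") == "tool.call":  # hypothetical event name
                    result = lookup_account(**event["arguments"])
                    await ws.send(json.dumps({
                        "type": "tool.result",
                        "id": event.get("id"),
                        "result": result,
                    }))

        await asyncio.gather(send_audio(), handle_events())
```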

Docs, MCP Server, and Claude Code Skill

We shipped a set of developer experience improvements that make it faster to build with AssemblyAI, especially from AI-native workflows:

  • Refreshed documentation—restructured for clarity, with better code examples and integration guides across all products.
  • MCP Server—a Model Context Protocol server that lets AI tools like Claude Code and other MCP-compatible agents interact directly with the AssemblyAI API. Transcribe audio, search transcripts, and run LLM Gateway queries without leaving your coding environment.
  • Claude Code Skill—copy our docs into Claude Code and build working integrations from a natural language description. The Voice Agent API was specifically designed to work well with this workflow.

The goal: if you can describe what you want to build, you should be able to ship it in an afternoon.

New models on the LLM Gateway

LLM Gateway is an OpenAI-compatible API that lets you apply 20+ LLMs to your transcripts through a single endpoint. This month we added four new models:

  • Claude Opus 4.7 (Anthropic)
  • Kimi 2.5 (Moonshot AI)
  • Qwen 3 (Alibaba)
  • GPT 5.5 (OpenAI)

Swap models by changing the model field in your API call—no other code changes required. One API key, unified billing, and no provider prefixes in the model ID.
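
Because the Gateway speaks the OpenAI API shape, the standard openai client works against it. The base URL and exact model IDs below are my best guess at the naming, so take the real values from the Gateway docs.

```python
# The Gateway is OpenAI-compatible, so the standard client works; the base URL
# and exact model IDs here are assumptions -- take the real values from the docs.
from openai import OpenAI  # pip install openai

client = OpenAI(
    base_url="https://llm-gateway.assemblyai.com/v1",  # assumed Gateway URL
    api_key="YOUR_ASSEMBLYAI_API_KEY",
)

transcript_text = open("call_transcript.txt").read()

response = client.chat.completions.create(
    model="claude-opus-4.7",  # swap to "kimi-2.5", "qwen-3", or "gpt-5.5" -- nothing else changes
    messages=[
        {"role": "system", "content": "Summarize this call transcript in three bullet points."},
        {"role": "user", "content": transcript_text},
    ],
)
print(response.choices[0].message.content)
```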

The Gateway already supported Claude, GPT, and Gemini families. With Kimi and Qwen now in the mix, you have even more flexibility to pick the right model for your use case—whether that’s cost optimization, multilingual support, or reasoning quality.

Universal-2 improvements: Hebrew and Swedish

Universal-2 now supports Hebrew and Swedish, expanding language coverage for teams building multilingual products on the async transcription API.

These aren’t beta-quality additions—they’re production-ready, with the same accuracy and feature support (speaker diarization, entity detection, sentiment analysis) available for other Universal-2 languages.
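
Requesting one of the new languages should look the same as any other Universal-2 language in the Python SDK; the "he" and "sv" codes below assume the standard ISO identifiers, so double-check them against the supported-languages table.

```python
# Async transcription in one of the new languages, using the Python SDK.
# The "he" / "sv" codes assume standard ISO identifiers -- confirm in the docs.
import assemblyai as aai  # pip install assemblyai

aai.settings.api_key = "YOUR_API_KEY"

config = aai.TranscriptionConfig(language_code="he")  # "sv" for Swedish
transcript = aai.Transcriber().transcribe(
    "https://example.com/hebrew-interview.mp3", config
)
print(transcript.text)
```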

PII Redaction updates

We made improvements to our PII Redaction capabilities for both text and audio. PII Text Redaction identifies and removes personally identifiable information—phone numbers, social security numbers, names, addresses—from the transcription text before it’s returned to you. PII Audio Redaction does the same directly in the audio file itself.

The updates improve detection coverage and accuracy, especially for edge cases around formatted entities like phone numbers with varied separators and multi-part names. If you’re building in healthcare, finance, or any regulated industry, these improvements mean fewer manual review passes.
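
Enabling both text and audio redaction on an async job looks like this in the Python SDK. The three policies listed are just a sample of the categories mentioned above; the full policy list lives in the docs.

```python
# Text + audio PII redaction on an async job via the Python SDK; the policies
# shown are a sample of the available categories -- see the docs for the full list.
import assemblyai as aai  # pip install assemblyai

aai.settings.api_key = "YOUR_API_KEY"

config = aai.TranscriptionConfig(
    redact_pii=True,
    redact_pii_audio=True,  # also produce a redacted copy of the audio file
    redact_pii_policies=[
        aai.PIIRedactionPolicy.person_name,
        aai.PIIRedactionPolicy.phone_number,
        aai.PIIRedactionPolicy.us_social_security_number,
    ],
)

transcript = aai.Transcriber().transcribe(
    "https://example.com/support-call.mp3", config
)
print(transcript.text)  # names, numbers, and other matched entities replaced in the returned text
```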

Start building

That’s the full list. Whether you’re adding real-time transcription to a voice agent, building clinical documentation tools, or just want to run your call recordings through an LLM—there’s something here for you.

Watch the full recap video to see each feature in action, or sign up for a free API key and start building today. You get $50 in free credits—no credit card required.
