January 6, 2026

Real-time transcription that code-switches for multilingual speakers

Learn how Universal-Streaming enables real-time transcription for multilingual speakers who naturally code-switch mid-sentence, handling six languages in a single forward pass. Built for hybrid-language conversations like bilingual customer calls and international meetings, it delivers low-latency, highly accurate transcripts without added complexity.

Meredith Rauch

Growth

multilingual

Speech-to-Text

If you've ever dictated a text message and had to pause, switch your keyboard language, and start over, you know the frustration. That's especially true if you're bilingual and code-switch naturally: mixing languages mid-sentence the way millions of people actually communicate.

AssemblyAI's Universal-Streaming model now handles this naturally. The model transcribes six languages (English, Spanish, French, German, Italian, and Portuguese) in real time, in a single forward pass, with no manual language selection required.

How code-switching actually works in bilingual conversation

Anyone who's grown up bilingual knows the frustration. You're speaking naturally, blending English and Spanish the way you always have, and your phone transcribes half of it as gibberish.

Linguists call this code-switching: alternating between languages within a single conversation or sentence. It's how multilingual brains naturally work. In Miami, business calls flow between English and Spanish. In Montreal, meetings shift between French and English. A sentence might start in one language, borrow a phrase from another, and switch back, all in the space of a few seconds.

This isn't a quirk. It's natural communication. But until now, voice-to-text systems couldn't handle it.

Real-time code-switching without delays

Universal-Streaming processes all six languages simultaneously in a unified architecture. There's no language detection gateway, no routing delays, and no need to tell the model what language you're about to speak.

You can say: "I am traveling to Arizona and I want to comprar unos tiquets de avion" and the model transcribes it accurately, in real time, as you speak.

The transcription appears instantly. No lag. No gibberish. No need to switch modes or restart.

How to try the model

You can test Universal-Streaming in the AssemblyAI Playground. Select "streaming" mode, choose "multi" for multilingual support, and start speaking. The model will transcribe your code-switching conversation in real time.

For developers building audio applications, the model is available through a simple API. Just set speech_model to universal-streaming-multilingual and open a WebSocket connection:

BASE_URL = "wss://streaming.assemblyai.com/v3/ws"
CONNECTION_PARAMS = {
    # RATE is the sample rate of your audio in Hz, defined elsewhere in your app
    "sample_rate": RATE,
    "format_turns": True,
    "speech_model": "universal-streaming-multilingual",
}
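To make the connection step concrete, here is a minimal sketch of turning those parameters into the URL a WebSocket client would connect to. The RATE value of 16000 and the build_streaming_url helper are assumptions for illustration, not part of the snippet above, and authentication and the WebSocket client itself are left out.

```python
from urllib.parse import urlencode

BASE_URL = "wss://streaming.assemblyai.com/v3/ws"
RATE = 16000  # assumed sample rate in Hz; match it to your audio source

CONNECTION_PARAMS = {
    "sample_rate": RATE,
    "format_turns": True,
    "speech_model": "universal-streaming-multilingual",
}


def build_streaming_url(base_url: str, params: dict) -> str:
    """Hypothetical helper: append the parameters to the endpoint as a query string."""
    return f"{base_url}?{urlencode(params)}"


STREAMING_URL = build_streaming_url(BASE_URL, CONNECTION_PARAMS)
print(STREAMING_URL)
```

The resulting string is what you would pass to the WebSocket library of your choice, along with your API key.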

Built for production Voice AI

Your model should deliver more than just multilingual support. Universal-Streaming transcripts include proper punctuation, capitalization, and intelligent endpointing, so "I'm going to la tienda" appears correctly formatted, not as a wall of lowercase text.

The model maintains industry-leading accuracy, with an average Word Error Rate of 11.77% across all six languages compared to 12.76% for Deepgram Nova-3, at a fraction of competitor pricing.
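For readers unfamiliar with the metric, Word Error Rate is conventionally the number of word substitutions, deletions, and insertions needed to turn the transcript into the reference, divided by the number of words in the reference. The sketch below implements that textbook definition with a word-level edit distance. It is not the benchmark's evaluation code: the post does not describe its text normalization, so the lowercasing and whitespace splitting here are assumptions.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Textbook WER: word-level edit distance divided by reference length."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


reference = "I want to comprar unos tiquets de avion"
hypothesis = "I want to compare unos tickets de avion"
print(word_error_rate(reference, hypothesis))
```

In this example two of the eight reference words are substituted, so the score is 0.25, or 25%.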

All languages are priced equally at $0.15 per hour, with no premium charges for non-English transcription.
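As a quick back-of-the-envelope aid, the flat rate makes cost estimates a one-line calculation. This sketch assumes cost scales linearly with audio duration at $0.15 per hour; actual billing granularity and rounding are not stated here, and the estimate_cost_usd helper is hypothetical.

```python
from decimal import Decimal

PRICE_PER_HOUR_USD = Decimal("0.15")  # flat rate quoted above, all six languages


def estimate_cost_usd(audio_seconds: int) -> Decimal:
    """Hypothetical estimate: duration in hours times the hourly rate, to the cent."""
    hours = Decimal(audio_seconds) / Decimal(3600)
    return (hours * PRICE_PER_HOUR_USD).quantize(Decimal("0.01"))


# 1,000 hours of streamed audio
print(estimate_cost_usd(1000 * 3600))
```

At this rate, 1,000 hours of audio comes to $150.00 regardless of which of the six languages is spoken.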

Use cases beyond personal messaging

This technology powers voice agents serving global customers: imagine a Spanish-speaking customer seamlessly switching to English for technical terms. It also enables real-time agent-assist tools for multilingual support teams, meeting assistants that capture discussions where teams naturally code-switch across international offices, and medical scribes that document patient consultations in multiple languages.

If you're building an application where your users speak multiple languages, or switch between them naturally, Universal-Streaming handles it without added complexity.

Learn more about Universal-Streaming and code-switching in our full documentation, and sign up to get $50 in free API credits.
