Multichannel Transcription

Supported Languages, Regions, and Models

Multichannel transcription is supported for all languages, regions, and models.

If your audio file has multiple channels, for example one channel per speaker, you can transcribe each channel separately.

The response includes an audio_channels property with the number of channels, and an utterances property containing a list of turn-by-turn utterances.

Each utterance includes its channel information, and channels are numbered starting at 1.

Each word in the words array also contains its channel identifier.
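A minimal sketch of consuming these fields once the JSON response is parsed. It assumes that utterances and words both store their channel under a key named channel, as a string ("1", "2", ...); the response dictionary is hypothetical sample data rather than real API output, and group_by_channel is an illustrative helper, not part of the SDK.

```python
from collections import defaultdict


def group_by_channel(items):
    """Group response items (utterances or words) by their channel identifier."""
    grouped = defaultdict(list)
    for item in items:
        grouped[item["channel"]].append(item["text"])
    return dict(grouped)


# Hypothetical two-channel response; only the fields used below are included.
# The "channel" key name and its string type are assumptions.
response = {
    "audio_channels": 2,
    "utterances": [
        {"channel": "1", "text": "Thanks for calling."},
        {"channel": "2", "text": "Hi, I have a question."},
        {"channel": "1", "text": "Sure, go ahead."},
    ],
    "words": [
        {"channel": "1", "text": "Thanks"},
        {"channel": "1", "text": "for"},
        {"channel": "1", "text": "calling."},
        {"channel": "2", "text": "Hi,"},
    ],
}

# Print each channel's utterances in channel order (1, 2, ...).
by_channel = group_by_channel(response["utterances"])
for channel in sorted(by_channel, key=int):
    print(f"Channel {channel}: {' '.join(by_channel[channel])}")

# Sanity check: channels in the words array fall within 1..audio_channels.
word_channels = {int(word["channel"]) for word in response["words"]}
assert word_channels <= set(range(1, response["audio_channels"] + 1))
```

Sorting with key=int keeps channel 10 after channel 9 if the identifiers are strings.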

Quickstart

```python
import assemblyai as aai

aai.settings.api_key = "<YOUR_API_KEY>"

# Use a local file path or a publicly accessible URL.
# audio_file = "./local_file.mp3"
audio_file = "https://assembly.ai/wildfires.mp3"

config = aai.TranscriptionConfig(
    speech_models=["universal-3-pro", "universal-2"],
    language_detection=True,
    multichannel=True,  # transcribe each audio channel separately
)

transcript = aai.Transcriber(config=config).transcribe(audio_file)

if transcript.status == "error":
    raise RuntimeError(f"Transcription failed: {transcript.error}")

# With multichannel enabled, each utterance's speaker field holds its channel number.
for utterance in transcript.utterances:
    print(f"Channel {utterance.speaker}: {utterance.text}")
```

Multichannel transcription increases the processing time by approximately 40%.

Per-channel diarization

If you have a multichannel audio file where individual channels may contain multiple speakers, you can combine multichannel and speaker_labels to perform diarization within each channel.

When you use multichannel with speaker_labels, the speaker_options parameters (min_speakers_expected and max_speakers_expected) are applied per channel, not globally across the entire file. For example, setting min_speakers_expected: 5 and max_speakers_expected: 7 on a 5-channel file means the model looks for 5–7 speakers on each channel, for a total of 25–35 speakers. Adjust your speaker options accordingly when using multichannel transcription.
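The per-channel arithmetic can be checked with a short sketch. The options dictionary mirrors the parameter names described above, nested under speaker_options as an assumed request shape; total_speaker_range is a hypothetical helper, not an SDK function.

```python
# Assumed request options; the speaker bounds apply to EACH channel.
options = {
    "multichannel": True,
    "speaker_labels": True,
    "speaker_options": {"min_speakers_expected": 5, "max_speakers_expected": 7},
}


def total_speaker_range(num_channels: int, speaker_options: dict) -> tuple:
    """Return the (min, max) total speaker count across all channels.

    With multichannel enabled, the bounds apply per channel, so the
    file-wide range is the per-channel bounds multiplied by the channel count.
    """
    low = speaker_options["min_speakers_expected"]
    high = speaker_options["max_speakers_expected"]
    return num_channels * low, num_channels * high


print(total_speaker_range(5, options["speaker_options"]))  # (25, 35)
```

If you only know the expected total for the whole file, divide it across channels yourself before setting these bounds.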

When both parameters are enabled:

  • Channels are labeled numerically (1, 2, 3, etc.)
  • Speakers within each channel are labeled alphabetically (A, B, C, etc.)
  • The combined speaker label format is {channel}{speaker} (e.g., “1A”, “1B”, “2A”)

For example, if channel 1 has two speakers and channel 2 has one speaker, the labels would be:

  • First speaker on channel 1: 1A
  • Second speaker on channel 1: 1B
  • First speaker on channel 2: 2A
```python
import assemblyai as aai

aai.settings.api_key = "<YOUR_API_KEY>"

# Use a local file path or a publicly accessible URL.
# audio_file = "./local_file.mp3"
audio_file = "https://assembly.ai/wildfires.mp3"

config = aai.TranscriptionConfig(
    speech_models=["universal-3-pro", "universal-2"],
    language_detection=True,
    multichannel=True,    # transcribe each channel separately
    speaker_labels=True,  # diarize speakers within each channel
)

transcript = aai.Transcriber(config=config).transcribe(audio_file)

if transcript.status == "error":
    raise RuntimeError(f"Transcription failed: {transcript.error}")

# Speaker labels combine channel and speaker, e.g. "1A", "1B", "2A".
for utterance in transcript.utterances:
    print(f"Speaker {utterance.speaker}: {utterance.text}")
```
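Combined labels such as 1A can be split back into a channel number and a speaker letter, for example to count speakers per channel. This minimal sketch assumes every label follows the {channel}{speaker} format described above (digits, then uppercase letters); multi-letter speaker suffixes are allowed as a precaution, not a documented behavior. parse_label and speakers_per_channel are hypothetical helpers, and the labels list is sample data.

```python
import re
from collections import defaultdict

# Combined label: one or more digits (channel) followed by letters (speaker).
LABEL_PATTERN = re.compile(r"^(\d+)([A-Z]+)$")


def parse_label(label: str) -> tuple:
    """Split a combined label such as "1B" into (channel, speaker)."""
    match = LABEL_PATTERN.match(label)
    if match is None:
        raise ValueError(f"Unexpected speaker label: {label!r}")
    return int(match.group(1)), match.group(2)


def speakers_per_channel(labels):
    """Map each channel number to the sorted speaker letters seen on it."""
    result = defaultdict(set)
    for label in labels:
        channel, speaker = parse_label(label)
        result[channel].add(speaker)
    return {channel: sorted(speakers) for channel, speakers in sorted(result.items())}


# Sample labels: two speakers on channel 1, one speaker on channel 2.
labels = ["1A", "1B", "1A", "2A"]
print(speakers_per_channel(labels))  # {1: ['A', 'B'], 2: ['A']}
```

Parsing the channel as an integer keeps channel 10 sorted after channel 9.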