What is the best speech-to-text API for deposition and legal transcription?

AssemblyAI's Universal-3 Pro Streaming is the leading speech-to-text API for deposition and legal transcription: ~150 ms median (P50) latency, native speaker diarization for separating counsel, witness, and judge, 28% better consecutive number recognition for case numbers and citations, and dynamic keyterm prompting (up to 100 terms per session) for statutes, party names, and matter-specific vocabulary. Inline PII redaction masks SSNs, dates of birth, and addresses before transcripts reach your case management system. Universal-3 Pro Streaming runs at $0.45/hr with unlimited concurrency. For voice-driven legal intake (new-client calls, scheduling, basic Q&A), the Voice Agent API ($4.50/hr) bundles STT, LLM, and TTS for fully managed conversational agents.

How do I build an AI court reporter or deposition transcription tool?

Stream deposition or hearing audio into Universal-3 Pro Streaming via WebSocket. Enable speaker_labels=true to attribute each turn to a speaker (the model returns a `speaker_label` field per Turn event). Add case names, party names, exhibit numbers, and matter-specific terms as keyterms_prompt (up to 100 per session, 50 characters each) to improve recognition of that vocabulary. Enable redact_pii=true with redact_pii_policies for the categories you need to mask (SSNs, dates of birth, account numbers). Pipe finalized turns, with their word-level timestamps, into your case management or discovery workflow.
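A minimal sketch of assembling that connection URL, assuming the parameter names used in this page's streaming example (`speech_model`, `keyterms_prompt`, `speaker_labels`, `redact_pii`, `redact_pii_policies`, `redact_pii_sub`). `build_streaming_url` is a hypothetical helper; it only enforces the 100-term and 50-character keyterm limits stated above before any session is opened.

```python
import json
from urllib.parse import urlencode

STREAMING_ENDPOINT = "wss://streaming.assemblyai.com/v3/ws"
MAX_KEYTERMS = 100       # per-session limit stated on this page
MAX_KEYTERM_CHARS = 50   # per-term limit stated on this page


def build_streaming_url(keyterms, pii_policies, sample_rate=16000):
    """Return a streaming URL with diarization, keyterms, and PII redaction.

    Validates the keyterm limits locally so an oversized list fails
    before a session is opened.
    """
    if len(keyterms) > MAX_KEYTERMS:
        raise ValueError(f"{len(keyterms)} keyterms exceeds the limit of {MAX_KEYTERMS}")
    too_long = [t for t in keyterms if len(t) > MAX_KEYTERM_CHARS]
    if too_long:
        raise ValueError(f"keyterms longer than {MAX_KEYTERM_CHARS} chars: {too_long}")

    params = {
        "sample_rate": sample_rate,
        "speech_model": "u3-rt-pro",
        "keyterms_prompt": json.dumps(keyterms),
        "format_turns": "true",
        "speaker_labels": "true",
        "redact_pii": "true",
        "redact_pii_policies": json.dumps(pii_policies),
        "redact_pii_sub": "entity_name",
    }
    return f"{STREAMING_ENDPOINT}?{urlencode(params)}"


# Usage: validate and build the URL before opening the WebSocket.
url = build_streaming_url(
    ["Williams v. Meridian Corp", "Exhibit A", "subpoena duces tecum"],
    ["us_social_security_number", "date_of_birth"],
)
```

Validating locally keeps a bad keyterm list from surfacing only after audio has started flowing.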

Can speech-to-text identify multiple speakers in a deposition or hearing?

Yes. Universal-3 Pro Streaming includes native speaker diarization — enable speaker_labels=true and each finalized Turn event includes a `speaker_label` field (e.g. "A", "B", "C") attributing the turn to a distinct voice. Speaker accuracy improves over the course of a session as the model accumulates embedding context, which is ideal for long depositions and multi-day hearings. For multichannel captures where each speaker has their own microphone, run separate WebSocket connections per channel for cleaner separation than diarization alone.
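For the multichannel case, each channel has to be split out before it can go to its own connection. A minimal sketch using the standard-library `array` module, assuming interleaved 16-bit PCM; `split_channels` is a hypothetical helper, not part of any AssemblyAI SDK. Each returned byte string would then be streamed on a separate WebSocket.

```python
import array


def split_channels(interleaved: bytes, channels: int) -> list[bytes]:
    """Deinterleave 16-bit PCM frames into one mono byte string per channel.

    Samples are only moved, never interpreted, so the original byte order
    (little-endian for pcm_s16le) is preserved in the output.
    """
    samples = array.array("h")
    samples.frombytes(interleaved)  # raises ValueError on an odd byte count
    if len(samples) % channels:
        raise ValueError("buffer does not contain whole frames")
    # Frame layout is [ch0, ch1, ..., chN-1, ch0, ch1, ...]; every Nth
    # sample starting at offset ch belongs to channel ch.
    return [samples[ch::channels].tobytes() for ch in range(channels)]
```

With one microphone per participant, the channel index then identifies the speaker directly, so no diarization is needed to tell counsel from witness.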

How does PII redaction work for legal audio?

AssemblyAI's streaming PII redaction masks personally identifiable information on finalized turns before they leave the API. Set redact_pii=true on the connection and pass redact_pii_policies to scope categories: SSNs, dates of birth, phone numbers, email addresses, account numbers, credit card numbers, addresses, person names, and more. Use redact_pii_sub="entity_name" to replace each masked span with its category tag, such as [PERSON_NAME] or [DATE_OF_BIRTH], or redact_pii_sub="hash" to replace it with # characters. For workflows that also need redacted audio files, use AssemblyAI's pre-recorded PII redaction with redact_pii_audio=true.
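With entity-name substitution, a finalized turn carries bracketed category tags where the masked spans were. A minimal sketch of summarizing those tags per turn for a redaction audit log; it assumes tags are uppercase words in square brackets such as `[DATE_OF_BIRTH]`, and `redaction_summary` is a hypothetical helper.

```python
import re
from collections import Counter

# Assumed tag shape: uppercase letters/underscores inside square brackets.
TAG_PATTERN = re.compile(r"\[([A-Z][A-Z_]*)\]")


def redaction_summary(transcript: str) -> Counter:
    """Count each redaction category tag in a redacted transcript."""
    return Counter(TAG_PATTERN.findall(transcript))


turn = "I spoke with [PERSON_NAME], who was born [DATE_OF_BIRTH]."
summary = redaction_summary(turn)
```

Storing the counts alongside each turn gives reviewers a quick check that the expected categories were masked for a matter.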

What integrations work with case management software like Clio, MyCase, or LexisNexis?

AssemblyAI's Universal-3 Pro Streaming integrates with any case management or discovery platform that exposes a webhook or API — Clio, MyCase, Practice Panther, LexisNexis, Westlaw, Relativity, Everlaw, and custom builds. Pipe finalized transcripts (with speaker_label and word-level timestamps) into your platform via REST API or webhook. For voice-driven intake and scheduling, the Voice Agent API exposes tool calls — register lookup_client, create_matter, or schedule_consultation as tools and the agent invokes them mid-call to push data directly to your case management backend.
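A minimal sketch of the webhook push using only the standard library. The endpoint URL, the bearer-token header, and the payload field names (`matter_id`, `speaker`, `text`) are hypothetical placeholders for whatever your platform's API expects; `speaker_label`, `transcript`, and `words` come from the finalized Turn event as described above.

```python
import json
import urllib.request


def build_transcript_webhook(endpoint: str, token: str, matter_id: str, turn: dict):
    """Build (but do not send) a POST carrying one finalized turn."""
    payload = {
        "matter_id": matter_id,                 # hypothetical field name
        "speaker": turn.get("speaker_label"),
        "text": turn.get("transcript", ""),
        "words": turn.get("words", []),         # word-level timestamps
    }
    return urllib.request.Request(
        endpoint,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",  # assumed auth scheme
        },
        method="POST",
    )


def push_turn(request: urllib.request.Request) -> int:
    """Send the request and return the HTTP status code."""
    with urllib.request.urlopen(request, timeout=10) as resp:
        return resp.status
```

Building and sending are split so the payload can be inspected or queued for retry if the case management system is briefly unavailable.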

Is AssemblyAI's transcription accurate enough for court admissibility?

AssemblyAI's Universal-3 Pro Streaming is the most accurate real-time speech-to-text model on the market, with 28% better consecutive number recognition than competing providers and 4.58% mean word error rate on multilingual benchmarks. Most jurisdictions still require a certified court reporter to attest to the official transcript — AI transcription typically supplements (not replaces) certified records. The most common pattern: use AssemblyAI for real-time draft transcripts during proceedings, then have a certified court reporter finalize and attest the official version. AssemblyAI is SOC 2 Type 2 certified, with PCI DSS v4.0 and ISO 27001:2022 covering the underlying infrastructure for regulated legal workloads.
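Word error rate is the usual yardstick when comparing an AI draft against the certified transcript: substitutions, deletions, and insertions divided by the number of reference words. A minimal sketch using word-level edit distance; it applies no text normalization (case, punctuation, numerals), which published benchmarks typically do first, so its numbers are not directly comparable to the benchmark figure quoted above.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """(S + D + I) / N via Levenshtein distance over whitespace-split words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript is empty")
    # prev[j]: edit distance between the previous reference prefix and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # deletion: reference word missing
                curr[j - 1] + 1,         # insertion: extra hypothesis word
                prev[j - 1] + (r != h),  # substitution (free if words match)
            )
        prev = curr
    return prev[-1] / len(ref)


certified = "the witness arrived at seven a m"
draft = "the witness arrived at seven am"
print(f"WER: {word_error_rate(certified, draft):.2%}")
```

Running this per proceeding gives a concrete measure of how much the certified reporter had to correct in the draft.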

Solutions

Voice agents for legal transcription and proceedings

Build voice-powered legal tools that transcribe depositions, hearings, and client consultations with speaker labels, timestamps, and court-grade accuracy, and that redact PII automatically.

Deposition transcript

Live

Williams v. Meridian Corp. · Case No. 2026-CV-04471

Counsel (Ms. Reeves)

10:14:32

"Can you describe what happened on the morning of March 12th?"

Witness (Mr. Dalton)

10:14:41

"I arrived at the facility around 7 AM. I spoke with [REDACTED] about the shipment."

Counsel (Ms. Reeves)

10:14:58

"And who authorized the change to the delivery schedule?"

Court reporter

10:15:03

[Speaker 4 identified · diarization active]

The problem

Legal transcription is expensive, slow, and inconsistent

Court reporting runs $4–6 per page with 24–48 hour turnaround, and paralegal review of depositions for privilege and PII is brutally manual. Voice agents could automate intake, scheduling, and transcription, but legal work demands word-level accuracy, speaker attribution, timestamp precision, and reliable handling of sensitive information that consumer ASR can't deliver. AssemblyAI is purpose-built for the audio reality of depositions, hearings, and consultations.

Built for the audio realities of legal work

Latency ~150ms

Median streaming latency for real-time courtroom and deposition transcription.

Entity accuracy +28%

Better consecutive number recognition for case numbers, citations, and exhibit IDs.

Keyterms 100

Domain-specific legal terms boosted per session — statutes, case law, party names.

Compliance SOC 2

Type 2 certified, plus PCI DSS v4.0 and ISO 27001:2022 for regulated legal workloads.

Two ways to build

Pick the API that fits your legal stack

Ship a voice-driven intake agent in an afternoon, or drop court-grade streaming STT into the case management software you already run.

Recommended

Voice Agent API

Our proprietary voice stack via one WebSocket. Run a voice-driven legal intake or scheduling agent that captures caller info, books consultations, and pushes to your case management system — zero infra to manage.

Best for

Legal intake calls, scheduling, and client consultations
Tool calls for Clio, MyCase, Practice Panther write-back
Built-in keyterm prompting for matter vocabulary
Claude Code compatible — paste the docs and build anything

$4.50/hr — speech, LLM, and voice all included

Free tier available · No credit card required
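Tool calls from the Voice Agent API still have to be executed by your code. A minimal dispatch sketch: it assumes you have already pulled the tool name and JSON-encoded arguments out of a `tool.call` event (the exact event fields are not shown on this page, so check the API reference), and `schedule_consultation` here is a hypothetical stub standing in for your case management write-back.

```python
import json


def schedule_consultation(client_name: str, case_type: str, summary: str = "") -> dict:
    """Hypothetical stub: replace with a call to your case management API."""
    return {"status": "booked", "client_name": client_name, "case_type": case_type}


# Map tool names (as registered in session.update) to local handlers.
TOOLS = {"schedule_consultation": schedule_consultation}


def run_tool(name: str, arguments_json: str) -> dict:
    """Execute a registered tool; return an error dict instead of raising."""
    handler = TOOLS.get(name)
    if handler is None:
        return {"error": f"unknown tool: {name}"}
    try:
        args = json.loads(arguments_json or "{}")
        return handler(**args)
    except (json.JSONDecodeError, TypeError) as exc:
        # Malformed JSON, or missing/unexpected arguments from the model
        return {"error": str(exc)}
```

Returning an error dict rather than raising lets the agent tell the caller something went wrong instead of dropping the call.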

Bring Your Own Stack

Universal-3 Pro Streaming STT API

The court-grade transcription layer for your discovery and case management workflow. Speaker labels, word-level timestamps, and inline PII redaction baked in.

Best for

Deposition, hearing, and consultation transcription
Native speaker diarization for counsel, witness, judge
28% better consecutive number recognition for case numbers
Inline PII redaction for SSNs, DOBs, and addresses
SOC 2 Type 2, PCI DSS v4.0, ISO 27001:2022 certified

$0.45/hr — transcription only, unlimited streams

No concurrency caps · Autoscaling included

One pipeline turns proceedings into structured legal records

Capture deposition or hearing audio

Stream audio from court-reporter mics, video-conferencing platforms, or recorded depositions into the API. Multi-party rooms work natively.
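A minimal sketch of preparing recorded deposition audio for the stream with the standard-library `wave` module. It assumes a 16 kHz, mono, 16-bit WAV (matching `sample_rate=16000` in the streaming example) and 50 ms chunks; the chunk size and real-time pacing are reasonable choices, not documented requirements, and resampling other formats is out of scope. `pcm_chunks` and `paced_audio` are hypothetical helpers; the async one fits an `audio_iter` argument like the one in the streaming example.

```python
import asyncio
import io
import wave


def pcm_chunks(wav_file, chunk_ms: int = 50):
    """Yield raw PCM byte chunks of chunk_ms milliseconds from a WAV file."""
    with wave.open(wav_file, "rb") as wav:
        fmt = (wav.getframerate(), wav.getnchannels(), wav.getsampwidth())
        if fmt != (16000, 1, 2):
            raise ValueError("expected 16 kHz, mono, 16-bit PCM WAV")
        frames_per_chunk = wav.getframerate() * chunk_ms // 1000
        while True:
            data = wav.readframes(frames_per_chunk)
            if not data:
                break
            yield data


async def paced_audio(wav_file, chunk_ms: int = 50):
    """Async wrapper that releases chunks at roughly real-time speed."""
    for chunk in pcm_chunks(wav_file, chunk_ms):
        yield chunk
        await asyncio.sleep(chunk_ms / 1000)


# Usage: one second of in-memory silence yields twenty 1,600-byte chunks.
buf = io.BytesIO()
with wave.open(buf, "wb") as out:
    out.setnchannels(1)
    out.setsampwidth(2)
    out.setframerate(16000)
    out.writeframes(b"\x00\x00" * 16000)
buf.seek(0)
chunks = list(pcm_chunks(buf))
```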

Transcribe with speaker labels and timestamps

Universal-3 Pro Streaming attributes each turn to a speaker label (A, B, C) that you can map to counsel, witness, and judge, with word-level timestamps: exactly the structure paralegals need for review.
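Because speaker labels are generic letters, mapping them to roles is up to your application. A minimal sketch that renders one finalized Turn event as a review line like the sample deposition transcript on this page; it assumes each entry in `words` carries a `start` offset in milliseconds from the beginning of the stream, and the `ROLES` map is hypothetical, something a paralegal would confirm once voices are identified.

```python
from datetime import datetime, timedelta

# Hypothetical mapping, confirmed by a human once voices are identified.
ROLES = {"A": "Counsel (Ms. Reeves)", "B": "Witness (Mr. Dalton)"}


def format_turn(turn: dict, session_start: datetime, roles: dict = ROLES) -> str:
    """Render a finalized turn as 'HH:MM:SS  Role: "text"'."""
    label = turn.get("speaker_label") or "?"
    speaker = roles.get(label, f"Speaker {label}")
    words = turn.get("words") or []
    offset_ms = words[0]["start"] if words else 0  # first word's start offset
    stamp = (session_start + timedelta(milliseconds=offset_ms)).strftime("%H:%M:%S")
    return f'{stamp}  {speaker}: "{turn.get("transcript", "")}"'
```

Unmapped labels fall back to "Speaker X", so a newly detected voice still shows up clearly for review.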

Redact privileged info and PII inline

Streaming PII redaction masks SSNs, dates of birth, account numbers, and addresses before finalized turns ever leave the API. Configurable policies per matter.
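Per-matter configuration can be as simple as a lookup table that feeds `redact_pii_policies`. A minimal sketch, assuming the policy names used in this page's streaming example; the matter types and the categories each one adds are hypothetical choices for illustration, not recommendations.

```python
import json

# Baseline categories masked on every matter (policy names as in the streaming example).
BASE_POLICIES = ["us_social_security_number", "date_of_birth", "credit_card_number"]

# Hypothetical per-matter additions.
MATTER_POLICIES = {
    "employment": ["phone_number", "email_address"],
    "personal_injury": ["location", "phone_number"],
}


def redaction_params(matter_type: str) -> dict:
    """Return the PII-related query parameters for one matter type."""
    extra = MATTER_POLICIES.get(matter_type, [])
    # dict.fromkeys removes duplicates while preserving order
    policies = list(dict.fromkeys(BASE_POLICIES + extra))
    return {
        "redact_pii": "true",
        "redact_pii_policies": json.dumps(policies),
        "redact_pii_sub": "entity_name",
    }
```

Unknown matter types fall back to the baseline, so a misconfigured matter still gets the core categories masked.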

Export to case management or discovery

Push structured transcripts (speakers, timestamps, redaction markers) into Clio, MyCase, Practice Panther, LexisNexis, Westlaw, or your custom discovery workflow.
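For discovery workflows that ingest files rather than webhooks, writing one JSON object per finalized turn (JSON Lines) keeps speakers, timestamps, and redaction markers together and appends cleanly during a long session. A minimal sketch; the record field names and both helpers are hypothetical, and it assumes `words` entries carry `start` and `end` in milliseconds.

```python
import json
from typing import Iterable, TextIO


def turn_to_record(turn: dict) -> dict:
    """Flatten a finalized Turn event into an export record."""
    words = turn.get("words") or []
    return {
        "speaker": turn.get("speaker_label"),
        "start_ms": words[0]["start"] if words else None,
        "end_ms": words[-1]["end"] if words else None,
        "text": turn.get("transcript", ""),  # already PII-redacted upstream
    }


def write_jsonl(fp: TextIO, turns: Iterable[dict]) -> int:
    """Write one JSON object per line; return the number of records written."""
    count = 0
    for turn in turns:
        fp.write(json.dumps(turn_to_record(turn), ensure_ascii=False) + "\n")
        count += 1
    return count
```

In practice you would open the file in append mode, e.g. `open("deposition.jsonl", "a", encoding="utf-8")`, and call `write_jsonl` as turns arrive.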

Legal transcription pipeline

Capture deposition or hearing audio

↓

Transcribe with speaker labels + timestamps

↓

Redact privileged info + PII inline

↓

Export to case management or discovery

Quickstart

Build a voice-powered legal tool in minutes

Voice Agent API — legal intake agent with case-management write-back

# Voice Agent API: legal intake voice agent
import asyncio, json, websockets

API_KEY = "YOUR_API_KEY"

async def run_intake_agent():
    async with websockets.connect(
        "wss://agents.assemblyai.com/v1/ws",
        additional_headers={"Authorization": f"Bearer {API_KEY}"},
    ) as ws:
        await ws.send(json.dumps({
            "type": "session.update",
            "session": {
                "system_prompt": (
                    "You are a legal intake assistant for Reeves & Partners. "
                    "Capture caller name, case type, key dates, and a one-line "
                    "summary. Never offer legal advice. Confirm captured fields "
                    "and route to the attorney on call via schedule_consultation."
                ),
                "greeting": "Reeves & Partners, how can I help you today?",
                "input": {"keyterms": ["deposition", "subpoena", "motion to compel", "discovery"]},
                "output": {"voice": "ivy"},
                "tools": [{
                    "type": "function",
                    "name": "schedule_consultation",
                    "description": "Book a consultation with the attorney on call.",
                    "parameters": {
                        "type": "object",
                        "properties": {
                            "client_name": {"type": "string"},
                            "case_type": {"type": "string"},
                            "summary": {"type": "string"},
                        },
                        "required": ["client_name", "case_type"],
                    },
                }],
            },
        }))
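        async for msg in ws:
            event = json.loads(msg)
            # Placeholder dispatch: a real agent would play reply.audio, log
            # transcript.user, and execute tool.call requests such as
            # schedule_consultation against your case management backend.
            print(event.get("type"))


if __name__ == "__main__":
    asyncio.run(run_intake_agent())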

Universal-3 Pro Streaming — live deposition transcript with PII redaction

# Universal-3 Pro Streaming: live deposition transcript
import asyncio, json, websockets
from urllib.parse import urlencode

API_KEY = "YOUR_API_KEY"

params = urlencode({
    "sample_rate": 16000,
    "speech_model": "u3-rt-pro",
    "keyterms_prompt": json.dumps([
        "Williams v. Meridian Corp", "Case No. 2026-CV-04471",
        "Exhibit A", "deposition", "subpoena duces tecum",
        "motion to compel",
    ]),
    "format_turns": "true",
    "speaker_labels": "true",                  # distinct label per voice (A, B, C)
    "redact_pii": "true",                      # mask PII inline
    "redact_pii_policies": json.dumps([
        "us_social_security_number", "date_of_birth",
        "phone_number", "email_address", "person_name",
        "credit_card_number", "location",
    ]),
    "redact_pii_sub": "entity_name",           # e.g. [PERSON_NAME]
})

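async def transcribe_deposition(audio_iter, export_to_case_mgmt):
    url = f"wss://streaming.assemblyai.com/v3/ws?{params}"
    async with websockets.connect(
        url, additional_headers={"Authorization": API_KEY},
    ) as ws:
        async def send_audio():
            # audio_iter is an async iterable of raw 16 kHz PCM byte chunks
            async for chunk in audio_iter:
                await ws.send(chunk)
            # Ask the server to finalize remaining turns and close the session
            await ws.send(json.dumps({"type": "Terminate"}))

        # Keep a reference so the sender task isn't garbage-collected mid-stream
        sender = asyncio.create_task(send_audio())
        try:
            async for raw in ws:
                evt = json.loads(raw)
                # With format_turns=true a turn ends with an unformatted and then
                # a formatted Turn event; export only the formatted one
                if (
                    evt.get("type") == "Turn"
                    and evt.get("end_of_turn")
                    and evt.get("turn_is_formatted")
                ):
                    # finalized turn with speaker_label + PII-redacted transcript
                    export_to_case_mgmt({
                        "speaker": evt.get("speaker_label"),
                        "transcript": evt["transcript"],
                        "words": evt.get("words", []),  # word-level timestamps
                    })
        finally:
            sender.cancel()  # no-op if all audio was already sent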

Speaker diarization for multi-party proceedings

Universal-3 Pro Streaming attributes each turn to a distinct speaker label out of the box. Map those labels to counsel, witness, judge, and court reporter, and depositions and hearings transcribe with the structure paralegals need without manual cleanup.

Inline PII redaction for privileged content

Mask SSNs, dates of birth, account numbers, addresses, and 10+ other PII categories on finalized turns. Redaction policies are configurable per matter, so the categories you select are masked before transcripts reach your case management system.

Entity accuracy for case numbers and citations

Universal-3 Pro Streaming delivers 28% better consecutive number recognition than competing real-time STT — critical for capturing case numbers (2026-CV-04471), exhibit IDs, statute references, and citations cleanly the first time.
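Even with accurate number recognition, downstream systems usually need case numbers as structured fields. A minimal sketch that pulls docket-style identifiers out of a finalized transcript; the pattern is a hypothetical one fitted to the `2026-CV-04471` format on this page (four-digit year, two-letter case-type code, five digits), and real formats vary by court and jurisdiction.

```python
import re

# Hypothetical pattern for identifiers shaped like 2026-CV-04471.
CASE_NO = re.compile(r"\b\d{4}-[A-Z]{2}-\d{5}\b")


def extract_case_numbers(transcript: str) -> list[str]:
    """Return case numbers in order of first appearance, without duplicates."""
    return list(dict.fromkeys(CASE_NO.findall(transcript)))
```

Extracted identifiers can then be matched against the matter record to flag a misheard digit before the transcript is filed.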

Voice AI builders at scale on AssemblyAI

80% increase in customer satisfaction

Calabrio replaced its legacy on-premise transcription solution with AssemblyAI's API, powering its enterprise workforce and conversation intelligence platform across multilingual contact center recordings.

Calabrio

200M calls and texts processed

Aloware's AloAi Voice Analytics platform processes 200 million calls and texts on AssemblyAI's API — converting half its client base to AI-powered packages and lifting lead-to-close rate by 27%.

Aloware

Build voice-powered legal tools today

Free tier, no credit card. From client intake to deposition transcription with PII redaction in an afternoon.
