Build hands-free voice agents that let technicians pull up manuals, log work completed, order parts, and update work orders — all through voice while their hands stay on the job.
Work order
Live: Voice note captured
"Compressor seized — replacing scroll assembly. Need part #3BAK-0601, ordering now via voice."
Most field technicians spend 30–60 minutes per shift tapping through FSM screens — on ladders, in crawl spaces, on rooftops, with greasy or wet hands. Voice would fix it, but consumer ASR breaks on noisy job sites and butchers part numbers. The result: late work orders, missing parts, and revenue stuck in unbilled hours. AssemblyAI's purpose-built voice AI handles the real noise, the real vocabulary, and the real workflows of field service.
Median streaming latency for hands-free voice prompts and confirmations.
28% better consecutive number recognition for part SKUs, model numbers, and asset IDs.
99 total languages supported for multilingual field technician workforces.
Up to 100 domain-specific terms per session to boost recognition of parts, tools, and procedures.
Two ways to build
Ship a working hands-free agent in an afternoon, or drop best-in-class streaming STT into the FSM platform you already run.
Our proprietary voice stack via one WebSocket. Run a hands-free agent that captures voice notes, confirms back via TTS, and writes updates to your FSM platform — zero infra to manage.
Free tier available · No credit card required
The live transcription layer for your FSM platform. Works natively with LiveKit, Pipecat, Vapi, and Twilio — entity-accurate, noise-robust, and multilingual out of the box.
No concurrency caps · Autoscaling included
Capture hands-free voice
Stream audio from a Bluetooth headset, phone speaker, or work-truck mic. No tapping, no swiping — technicians keep both hands on the job.
Transcribe with noise robustness
Universal-3 Pro handles loud HVAC units, generators, road traffic, and wind. Speaker labels separate technician from customer when on-site.
Extract structured work-order data
Finalized turns feed the LLM Gateway (25+ models across Claude, GPT, and Gemini) to extract part numbers, asset IDs, work status, and parts requests as structured fields.
Confirm and write back to FSM
Read captured fields back to the technician for confirmation, then push to ServiceTitan, FieldEdge, HousecallPro, Jobber, or your custom backend via tool calls or webhooks.
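The confirm-then-write step could look roughly like the sketch below. The field names follow the `update_work_order` tool schema in the Voice Agent API example; the `spell_out`, `build_confirmation`, and `commit_if_confirmed` helpers, the set of affirmative replies, and the `push` callback are illustrative assumptions, not part of any AssemblyAI or FSM vendor API.

```python
from typing import Callable

# Hypothetical set of replies treated as a "yes"; tune for your workforce.
AFFIRMATIVE = {"yes", "yep", "yeah", "correct", "confirmed", "that's right"}


def spell_out(code: str) -> str:
    """Separate characters so TTS reads a part number one symbol at a time."""
    return " ".join("dash" if ch == "-" else ch for ch in code)


def build_confirmation(fields: dict) -> str:
    """Build the read-back sentence from captured work-order fields."""
    parts = [f"Work order {fields['wo_id']}, status {fields['status']}"]
    if fields.get("part_number"):
        parts.append(f"part {spell_out(fields['part_number'])}")
    return ", ".join(parts) + ". Is that right?"


def commit_if_confirmed(fields: dict, reply: str, push: Callable[[dict], None]) -> bool:
    """Call push(fields) only when the technician's reply is affirmative."""
    normalized = reply.strip().lower().rstrip(".!")
    if normalized in AFFIRMATIVE:
        push(fields)
        return True
    return False
```

Spelling the part number out symbol by symbol is a design choice aimed at noisy sites, where a misheard digit is easier to catch when each character is read back separately.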
Field service pipeline
Capture hands-free voice notes
Transcribe — noise-robust + multilingual
Extract structured work-order fields
Confirm + push to FSM platform
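For the capture step, audio is typically sent in small fixed-size frames. The following is a minimal sketch, assuming 16-bit mono PCM at the 16 kHz sample rate used in the streaming example and a 50 ms frame size. The frame size, the sample width, and the `pcm_frames` helper are assumptions, not documented requirements.

```python
from typing import Iterator

SAMPLE_RATE = 16_000      # Hz, matches sample_rate in the streaming example
BYTES_PER_SAMPLE = 2      # assumption: 16-bit mono PCM
FRAME_MS = 50             # assumption: 50 ms per frame


def pcm_frames(pcm: bytes, frame_ms: int = FRAME_MS) -> Iterator[bytes]:
    """Yield consecutive frames of raw PCM; the last frame may be shorter."""
    frame_bytes = SAMPLE_RATE * BYTES_PER_SAMPLE * frame_ms // 1000
    for start in range(0, len(pcm), frame_bytes):
        yield pcm[start:start + frame_bytes]
```

Wrapped in an async generator, frames like these could serve as the `audio_iter` argument of a streaming client.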
Voice Agent API — hands-free agent with FSM write-back
# Voice Agent API: hands-free field service voice agent
import asyncio
import json

import websockets

API_KEY = "YOUR_API_KEY"


def handle(event):
    # Placeholder dispatcher: route transcript.user, reply.audio, tool.call, ...
    print(event.get("type"))


async def run_agent():
    async with websockets.connect(
        "wss://agents.assemblyai.com/v1/ws",
        additional_headers={"Authorization": f"Bearer {API_KEY}"},
    ) as ws:
        # Configure the session: prompt, greeting, keyterms, voice, and tools.
        await ws.send(json.dumps({
            "type": "session.update",
            "session": {
                "system_prompt": (
                    "You are a hands-free assistant for an HVAC field technician. "
                    "Capture part numbers, asset IDs, and work status. Always "
                    "confirm captured fields back to the tech before calling "
                    "update_work_order. Keep responses under 2 sentences."
                ),
                "greeting": "Ready when you are. What's the update?",
                "input": {"keyterms": ["Carrier 50XC", "Trane XR", "scroll assembly", "compressor"]},
                "output": {"voice": "ivy"},
                "tools": [{
                    "type": "function",
                    "name": "update_work_order",
                    "description": "Push captured fields to the FSM platform.",
                    "parameters": {
                        "type": "object",
                        "properties": {
                            "wo_id": {"type": "string"},
                            "part_number": {"type": "string"},
                            "status": {"type": "string"},
                        },
                        "required": ["wo_id", "status"],
                    },
                }],
            },
        }))
        # Process server events until the connection closes.
        async for msg in ws:
            handle(json.loads(msg))  # transcript.user, reply.audio, tool.call, ...


if __name__ == "__main__":
    asyncio.run(run_agent())
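Before a tool call reaches the FSM platform, it can help to check its arguments against the declared schema. The sketch below is a hand-rolled check for the `update_work_order` parameters above. The `missing_or_invalid` helper is hypothetical, and how the arguments arrive inside a `tool.call` event is deliberately left out, since the event shape is not specified on this page.

```python
# Copy of the tool's parameter schema from the session configuration.
UPDATE_WORK_ORDER_SCHEMA = {
    "type": "object",
    "properties": {
        "wo_id": {"type": "string"},
        "part_number": {"type": "string"},
        "status": {"type": "string"},
    },
    "required": ["wo_id", "status"],
}


def missing_or_invalid(args: dict, schema: dict = UPDATE_WORK_ORDER_SCHEMA) -> list[str]:
    """Return a list of problems; an empty list means the arguments look usable."""
    problems = []
    # Required fields must be present and non-empty.
    for name in schema["required"]:
        if not args.get(name):
            problems.append(f"missing required field: {name}")
    # Every supplied field must be declared and must be a string.
    for name, value in args.items():
        if name not in schema["properties"]:
            problems.append(f"unexpected field: {name}")
        elif not isinstance(value, str):
            problems.append(f"field {name} must be a string")
    return problems
```

Returning a list of problems rather than raising lets the agent read the issues back to the technician and ask again.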
Universal-3 Pro Streaming — voice notes to structured fields
# Universal-3 Pro Streaming: voice notes → structured work order
import asyncio
import json
from urllib.parse import urlencode

import websockets

API_KEY = "YOUR_API_KEY"

params = urlencode({
    "sample_rate": 16000,
    "speech_model": "u3-rt-pro",
    "language_detection": "true",  # tag each turn with detected language
    "keyterms_prompt": json.dumps([
        "Carrier 50XC", "Trane XR", "scroll assembly",
        "compressor seized", "refrigerant leak",
        "3BAK-0601", "FS-20260518",
    ]),
    "format_turns": "true",
    "speaker_labels": "true",  # tech vs. customer on-site
})


def extract_work_order_fields(transcript):
    # Placeholder: replace with an LLM Gateway call that returns
    # {wo_id, part, status}. Here the raw transcript is passed through.
    return {"transcript": transcript}


async def stream_field_notes(audio_iter, send_to_fsm):
    url = f"wss://streaming.assemblyai.com/v3/ws?{params}"
    async with websockets.connect(
        url, additional_headers={"Authorization": API_KEY},
    ) as ws:
        async def send_audio():
            async for chunk in audio_iter:
                await ws.send(chunk)

        # Keep a reference so the sender task is not garbage-collected mid-stream.
        sender = asyncio.create_task(send_audio())
        try:
            async for raw in ws:
                evt = json.loads(raw)
                if evt.get("type") == "Turn" and evt.get("end_of_turn"):
                    # finalized turn → extract fields, then push to the FSM platform
                    fields = extract_work_order_fields(evt["transcript"])
                    send_to_fsm(fields)
        finally:
            # Stop sending audio once the receive loop ends.
            sender.cancel()
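One possible stand-in for `extract_work_order_fields` during local testing is sketched below. The page describes doing this extraction with an LLM through the LLM Gateway; this is a much simpler regex approach, not that method. The patterns are assumptions inferred from the sample identifiers `3BAK-0601` and `FS-20260518` (treating the latter as a work order ID is itself a guess), and the status phrases are invented for illustration.

```python
import re

# Assumed formats, inferred from the sample keyterms "3BAK-0601" and "FS-20260518".
PART_RE = re.compile(r"\b\d[A-Z]{3}-\d{4}\b")
WORK_ORDER_RE = re.compile(r"\bFS-\d{8}\b")

# Hypothetical phrase-to-status mapping; the first matching entry wins.
STATUS_PHRASES = [
    ("ordering", "awaiting_parts"),
    ("replaced", "completed"),
    ("replacing", "in_progress"),
]


def extract_work_order_fields(transcript: str) -> dict:
    """Pull work order ID, part number, and a coarse status out of one turn."""
    part = PART_RE.search(transcript)
    wo = WORK_ORDER_RE.search(transcript)
    lowered = transcript.lower()
    status = next((s for phrase, s in STATUS_PHRASES if phrase in lowered), None)
    return {
        "wo_id": wo.group(0) if wo else None,
        "part_number": part.group(0) if part else None,
        "status": status,
    }
```

Fields that cannot be found come back as `None`, so the caller can prompt the technician for the missing value instead of writing a partial record.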
Universal-3 Pro Streaming delivers 28% better consecutive number recognition for alphanumeric sequences. Add part catalogs and customer asset terms via keyterm prompting (up to 100 per session) for near-perfect domain accuracy.
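Since keyterm prompting is capped at 100 terms per session, a large parts catalog has to be trimmed before it is sent. The following is a minimal sketch, assuming the catalog is already ordered by priority. The `build_keyterms` helper is hypothetical, while the `keyterms_prompt` query parameter is taken from the streaming example on this page.

```python
import json
from urllib.parse import urlencode

MAX_KEYTERMS = 100  # per-session limit stated above


def build_keyterms(catalog: list[str], limit: int = MAX_KEYTERMS) -> list[str]:
    """Deduplicate case-insensitively, keep original order, and cap at limit."""
    seen = set()
    terms = []
    for term in catalog:
        if len(terms) >= limit:
            break
        cleaned = term.strip()
        key = cleaned.lower()
        if cleaned and key not in seen:
            seen.add(key)
            terms.append(cleaned)
    return terms


# Example catalog with a case-variant duplicate, stray whitespace, and a blank entry.
catalog = ["Carrier 50XC", "carrier 50xc", "  3BAK-0601 ", "", "scroll assembly"]
query = urlencode({"keyterms_prompt": json.dumps(build_keyterms(catalog))})
```

Deduplicating before capping keeps near-identical catalog entries from using up slots that could go to distinct part numbers.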
Universal-3 Pro Streaming is trained on noisy real-world audio — HVAC compressors, generators, road traffic, wind. The model stays accurate where consumer ASR breaks down, so field-truck dictation works the first time.
Universal-3 Pro Streaming handles 6 core languages with native code-switching at the highest accuracy. Automatic model routing extends coverage to 99 languages — field technicians dictate work-order notes in their preferred language and your FSM system receives clean transcripts every time.
Calabrio's enterprise workforce intelligence platform runs on AssemblyAI for real-time transcription accuracy across multilingual call recordings — the same audio fundamentals that power hands-free field workflows.
"It's one microphone picking up a bunch of different voices."
Jake Cronin, Co-founder & CEO, Siro