April 22, 2026

How to create an AI cold-calling agent

Build an AI cold-calling agent that books meetings, handles objections, and sounds human. Architecture, compliance, and the speech-to-text latency that makes or breaks it.

Kelsey Foster

Growth

AI voice agents

An AI cold-calling agent is an outbound Voice AI system that places calls, opens the conversation, pitches, handles objections, and either books a meeting or disqualifies the lead — without a human on the line. Built right, it runs 500 calls in parallel at roughly the cost of a single SDR.

Built wrong, it sounds like a telemarketer with a bad connection and gets hung up on in four seconds.

This guide walks through how to actually create one: the architecture, the speech-to-text accuracy you need for objection handling to work, the compliance traps (TCPA, state-level consent), and the pieces that decide whether your agent books meetings or ends up on a "do not call" list.

We'll anchor the stack on the AssemblyAI Universal-3 Pro Streaming model — 307ms P50 latency, native mulaw, and the alphanumeric accuracy that matters when the prospect rattles off their email.

What is an AI cold-calling agent?

An AI cold-calling agent is an outbound Voice AI system that dials a prospect, delivers a pitch in natural conversation, adapts in real time based on what the prospect says, and books qualified meetings or gathers disposition data. Unlike a robocall (one-way recorded message) or a dialer with a human rep, it conducts a two-way conversation autonomously.

The typical jobs an AI cold-calling agent does:

Outbound SDR prospecting: open with a relevant hook, qualify BANT, book a demo
Appointment setting for field sales, financial advisors, home services
Re-engagement of lapsed leads in a CRM
Survey and research calls at scale
Event follow-up and RSVP confirmation
Renewal and upsell motions for existing customers

The common thread: one script, thousands of conversations, measurable booking rate.

The architecture of an AI cold-calling agent

An AI cold-calling agent is a phone-based voice agent with a few extra components tuned for outbound. Here's the full stack:

<pre>
  CRM / lead list (Salesforce, HubSpot, CSV)
        │
        ▼
  Dialer / orchestrator
  (concurrency, pacing, DNC check, retries)
        │
        ▼
  Twilio / SIP outbound call
        │  WebSocket bridge
        ▼
  Universal-3 Pro Streaming (STT)
        │  transcript
        ▼
  LLM with sales prompt + objection map
        │  text response + tool calls
        ▼
  TTS (ElevenLabs / Cartesia)
        │  audio
        ▼
        Prospect
        │
        └─► Call disposition
            └─► CRM update + calendar booking
</pre>

The five components that matter:

Lead source and dialer — where the list comes from and how you pace calls
Telephony — Twilio, SIP, or a managed voice agent platform
Streaming speech-to-text — the ears; must hear objections the moment they start
LLM with a sales-specific prompt — opener, discovery, objection handling, booking logic
Text-to-speech — the voice; naturalness matters more here than on inbound

Plus two things that are unique to outbound: compliance filtering (TCPA, state consent laws, DNC registries) and post-call disposition sync back to the CRM.

Why speech-to-text accuracy decides whether an AI cold-calling agent works

On an inbound support call, the caller wants help — they'll repeat themselves if you miss something. On an outbound cold call, the prospect is deciding whether to hang up in the first five seconds. If your agent mishears "not interested, take me off the list" as "I'm interested, tell me more," you don't get a second chance.

Three STT capabilities decide the quality of an AI cold-calling agent:

Low, stable latency

Natural turn-taking happens in under 800ms end-to-end. Any longer and the prospect thinks they lost connection — or worse, that they're on a robocall. The Universal-3 Pro Streaming model delivers 307ms median latency with immutable transcripts, which lets your LLM start generating a response before the prospect even finishes their sentence.
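To make the 800ms figure concrete, here is a rough per-turn latency budget. Only the 307ms speech-to-text number comes from this article; the LLM, TTS, and network values are placeholder assumptions, so swap in your own measurements.

```python
# Rough per-turn latency budget. Only the STT figure comes from the article;
# every other number is a placeholder assumption, not a benchmark.
TURN_BUDGET_MS = 800

stage_latency_ms = {
    "stt_transcript": 307,     # Universal-3 Pro Streaming P50 (from the article)
    "llm_first_token": 250,    # assumption: measure your own model
    "tts_first_audio": 150,    # assumption: measure your own voice
    "telephony_network": 60,   # assumption: carrier plus WebSocket hops
}


def remaining_headroom_ms(stages: dict[str, int], budget_ms: int = TURN_BUDGET_MS) -> int:
    """Return how many milliseconds are left after summing every stage."""
    return budget_ms - sum(stages.values())


headroom = remaining_headroom_ms(stage_latency_ms)
print(f"total={sum(stage_latency_ms.values())}ms headroom={headroom}ms")
```

With these placeholder values the budget leaves very little slack, which is why a slow stage anywhere in the chain shows up as an awkward pause.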

Alphanumeric accuracy

Cold calls capture emails, phone numbers, company names, and job titles. "J at acme dot io," "director of rev ops," "five one five, nine eight two, four zero zero zero." Universal-3 Pro Streaming delivers 21% fewer alphanumeric errors and 28% better accuracy on consecutive numbers than the previous generation — the difference between a booked meeting in your calendar and a typo you never catch.
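If a transcript still arrives as spoken words, you may want a small normalizer in front of the CRM write. The sketch below is a hypothetical helper, not part of any SDK, and it only handles the simple "at" and "dot" pattern from the example above.

```python
import re
from typing import Optional

# Hypothetical helper: turn a spoken email such as "j at acme dot io"
# into "j@acme.io" before writing it to the CRM.
_EMAIL_RE = re.compile(r"^[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(\.[A-Za-z0-9-]+)+$")


def normalize_spoken_email(utterance: str) -> Optional[str]:
    """Return a normalized email, or None if the result is not email-shaped."""
    text = utterance.lower().strip()
    text = re.sub(r"\s+at\s+", "@", text, count=1)   # first spoken "at" becomes "@"
    text = re.sub(r"\s+dot\s+", ".", text)           # every spoken "dot" becomes "."
    text = re.sub(r"\s+", "", text)                  # drop spaces between spelled-out parts
    return text if _EMAIL_RE.match(text) else None
```

Returning None instead of a best guess lets the agent ask the prospect to repeat the address rather than booking a meeting against a typo.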

Intelligent endpointing

Prospects pause. "I'm… probably not the right person to talk to about this." If your agent jumps in at the first pause, it interrupts. If it uses a fixed silence timer, it feels robotic. Intelligent endpointing combines acoustic and semantic signals to detect real turn boundaries — the difference between a thoughtful agent and an impatient one.
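To see why a fixed silence timer falls short, compare it with a rule that also looks at the words. This is a toy heuristic for illustration only. It is not how Universal-3 Pro Streaming or any production endpointing model works, and the thresholds and filler list are assumptions.

```python
# Toy illustration only: NOT how any production endpointing model works.
# It contrasts a fixed silence timer with a rule that also reads the transcript.
FILLERS = {"i'm", "um", "uh", "and", "but", "so", "because"}


def fixed_timer_end_of_turn(silence_ms: int, threshold_ms: int = 500) -> bool:
    """End the turn after a fixed amount of silence, whatever was said."""
    return silence_ms >= threshold_ms


def semantic_end_of_turn(partial_text: str, silence_ms: int) -> bool:
    """Wait longer when the partial transcript ends on an unfinished thought."""
    text = partial_text.strip().lower()
    trailing_off = text.endswith(("...", "…"))
    words = text.rstrip(".…").split()
    last_word = words[-1] if words else ""
    unfinished = trailing_off or last_word in FILLERS
    threshold_ms = 1500 if unfinished else 500
    return silence_ms >= threshold_ms
```

After 600ms of silence following "I'm…", the fixed timer cuts in while the second rule keeps waiting. After a complete sentence, both agree the turn is over.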

Building the conversation logic

The LLM prompt is where an AI cold-calling agent earns its meetings or wastes the prospect's time. A good cold-calling prompt has four sections:

1. Identity and opener

Who the agent is, which company it represents, why it's calling. Must include clear AI disclosure in the opener — this is both good practice and legally required in several states (California, Florida, Texas among others).

2. Discovery questions

Two to four questions that qualify or disqualify the prospect. Don't ask five — you'll get hung up on.

3. Objection handling map

A structured map of likely objections and how to respond. The usual suspects:

"How did you get my number?"
"Send me an email instead."
"I'm not the right person."
"We already use [competitor]."
"We're not interested."
"Take me off your list."

That last one is the most important. If the prospect says anything that sounds like a do-not-call request, the agent must immediately:

Acknowledge
Confirm the number will be added to DNC
End the call politely
Flag the number in your CRM and DNC database

No upselling. No "can I just ask one question?" You don't get a second chance on a compliance complaint.
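The four steps above can be backed by a deterministic check that runs alongside the LLM, so an opt-out is honored even if the model misses it. The phrase list, class, and action strings below are hypothetical; extend the patterns from your own transcripts.

```python
import re
from dataclasses import dataclass, field

# Hypothetical phrase list; tune it to favor recall over precision.
OPT_OUT_PATTERNS = [
    r"\btake me off\b",
    r"\bremove (me|my number)\b",
    r"\b(do not|don['’]t|stop) call(ing)?\b",
    r"\bunsubscribe\b",
]
_OPT_OUT_RE = re.compile("|".join(OPT_OUT_PATTERNS), re.IGNORECASE)


@dataclass
class OptOutResult:
    say: str                      # what the agent speaks before hanging up
    end_call: bool = True         # always end the call after an opt-out
    actions: list[str] = field(default_factory=list)


def is_opt_out(utterance: str) -> bool:
    """True if the utterance sounds like a do-not-call request."""
    return bool(_OPT_OUT_RE.search(utterance))


def handle_opt_out(phone: str) -> OptOutResult:
    """Acknowledge, confirm, end the call, and flag the number. Nothing else."""
    return OptOutResult(
        say="Understood. I'm adding this number to our do-not-call list now. Goodbye.",
        actions=[f"crm.flag_dnc:{phone}", f"suppression_list.add:{phone}"],
    )
```

A false positive here costs you one conversation. A false negative can cost you a complaint, so err on the side of matching too much.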

4. Booking logic

If the prospect qualifies and is interested, the agent needs to book — not hand off. That means live calendar access via tool call, a handful of proposed times, and confirmation sent over SMS or email during the call.
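As a sketch of what that tool call might look like: the name book_meeting matches the companion repo, but the schema fields and the slot logic below are assumptions. A real implementation would query your calendar's free/busy data instead of generating fixed times.

```python
from datetime import datetime, timedelta

# Hypothetical tool definition in the JSON-schema style most LLM APIs accept.
# Only the name comes from the article; the fields are assumptions.
BOOK_MEETING_TOOL = {
    "name": "book_meeting",
    "description": "Book a meeting once the prospect has agreed to a proposed time.",
    "parameters": {
        "type": "object",
        "properties": {
            "lead_id": {"type": "string"},
            "start_iso": {"type": "string", "description": "ISO 8601 start time"},
            "attendee_email": {"type": "string"},
        },
        "required": ["lead_id", "start_iso", "attendee_email"],
    },
}


def propose_slots(now: datetime, count: int = 3) -> list[datetime]:
    """Offer the next few weekday slots at 10:00 and 14:00, starting tomorrow."""
    slots: list[datetime] = []
    day = now.replace(hour=0, minute=0, second=0, microsecond=0) + timedelta(days=1)
    while len(slots) < count:
        if day.weekday() < 5:  # Monday=0 ... Friday=4
            for hour in (10, 14):
                if len(slots) < count:
                    slots.append(day.replace(hour=hour))
        day += timedelta(days=1)
    return slots
```

Offering two or three concrete times tends to work better on a call than asking an open-ended "when are you free?", and it keeps the tool call arguments easy to validate.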

Picking the telephony layer

Three options depending on your volume and how much you want to operate yourself:

Option	Best for	Trade-off
Managed voice agent platform (Vapi, Retell, Bland, Synthflow)	Fast pilots, <10K calls/month	Less control over latency and voice choice
Twilio + your own server	Custom flows, moderate volume, tight CRM integration	You own the orchestration, retries, and compliance wiring
Direct SIP trunk (Telnyx, Plivo)	High-volume outbound (50K+ calls/month)	Lower per-minute cost, more ops work

Whatever you pick, the audio path is the same: 8kHz mulaw in and out. Use a speech-to-text model that accepts mulaw natively. Transcoding to 16kHz PCM adds a processing step and latency you can't afford on a cold call.
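For a sense of what arrives on the wire, here is a minimal sketch that pulls raw mulaw bytes out of a Twilio Media Streams message. Twilio sends JSON events whose media payload is base64-encoded 8kHz mulaw. The function names are hypothetical, and forwarding the bytes to your speech-to-text session is left out because it depends on the SDK you use.

```python
import base64
import json
from typing import Optional

SAMPLE_RATE_HZ = 8000   # telephony audio
BYTES_PER_SAMPLE = 1    # mu-law is 8 bits per sample


def mulaw_bytes_from_twilio(message: str) -> Optional[bytes]:
    """Return raw mu-law bytes from a Twilio Media Streams 'media' event."""
    event = json.loads(message)
    if event.get("event") != "media":
        return None  # connected, start, stop, and mark events carry no audio
    return base64.b64decode(event["media"]["payload"])


def duration_ms(audio: bytes) -> float:
    """Length of a mu-law chunk in milliseconds."""
    return len(audio) * 1000 / (SAMPLE_RATE_HZ * BYTES_PER_SAMPLE)
```

Because the bytes are already in the format the phone network uses, you can pass them straight through without decoding, which is the point of picking a model with native mulaw support.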

The outbound-specific components

Dialer and pacing

You can't just fire off 10,000 calls at once. Telco carriers flag high-volume outbound as spam within minutes, and your numbers get blocked. Real dialers pace calls, rotate outbound numbers, and respect time-of-day rules (TCPA restricts calls before 8am and after 9pm in the recipient's local time).
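A time-of-day check is simple to sketch. The 8am to 9pm window is the federal baseline mentioned above; the function name is hypothetical, and the fixed UTC offset is only there to keep the example dependency-free. In production you would likely pass a zoneinfo.ZoneInfo for the recipient so daylight saving is handled, and keep in mind that an area code is only a guess at where a mobile number actually lives.

```python
from datetime import datetime, time, timedelta, timezone, tzinfo

# Federal baseline from the article: 8am to 9pm in the recipient's local time.
# Some states are stricter, so treat these as overridable defaults.
WINDOW_START = time(8, 0)
WINDOW_END = time(21, 0)


def within_calling_window(
    now_utc: datetime,
    recipient_tz: tzinfo,
    start: time = WINDOW_START,
    end: time = WINDOW_END,
) -> bool:
    """True if the recipient's local wall-clock time is inside the window."""
    local = now_utc.astimezone(recipient_tz)
    return start <= local.time() < end


# Fixed offsets for illustration only; they ignore daylight saving changes.
US_EASTERN_DAYLIGHT = timezone(timedelta(hours=-4))
US_PACIFIC_DAYLIGHT = timezone(timedelta(hours=-7))
```

The same instant can be callable on one coast and off-limits on the other, so the check has to run per prospect at dial time, not once per batch.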

If you're using Twilio, you'll want a local presence strategy — matching the outbound caller ID to the area code of the number being dialed. Connection rates go up meaningfully.
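One possible shape for a local-presence picker, assuming US/Canada numbers in E.164 format. The number pool and the fallback are hypothetical, and every number in the pool must be one you actually own.

```python
from typing import Optional

# Hypothetical pool of numbers you own, keyed by area code.
OWNED_NUMBERS = {
    "515": ["+15155550100", "+15155550101"],
    "415": ["+14155550100"],
}
DEFAULT_NUMBER = "+18005550100"  # fallback when no local number is available


def area_code(e164: str) -> Optional[str]:
    """Extract the area code from a US/Canada E.164 number like +15155550123."""
    if e164.startswith("+1") and len(e164) == 12 and e164[2:].isdigit():
        return e164[2:5]
    return None


def pick_local_number(prospect_phone: str, attempt: int = 0) -> str:
    """Prefer an owned number in the prospect's area code; rotate across retries."""
    pool = OWNED_NUMBERS.get(area_code(prospect_phone) or "", [])
    return pool[attempt % len(pool)] if pool else DEFAULT_NUMBER
```

Rotating by attempt number spreads retries across the pool, so a single caller ID is not the one that absorbs every unanswered call.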

Compliance filtering

Before any call goes out:

Scrub against the federal Do Not Call registry
Scrub against state DNC lists (several states maintain their own)
Scrub against your internal suppression list (previous DNC requests, unsubscribes)
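Verify you have the consent the TCPA requires for B2C calls (the FCC ruled in February 2024 that AI-generated voices count as "artificial or prerecorded" voices, which generally means prior express consent), or a documented legitimate business interest for B2B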
For calls into EU numbers, confirm GDPR lawful basis

Build this filtering as a hard gate: no call goes out if any check fails. TCPA statutory damages are $500 per violation, rising to $1,500 when the violation is willful, and each call can count as its own violation.
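A hard gate can be sketched as a function that runs every check and fails closed. The Lead class and the three checks below are hypothetical stand-ins for real registry lookups and consent records.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Lead:
    phone: str
    has_consent_record: bool


# Each check returns True when the lead is CLEAR to call.
Check = Callable[[Lead], bool]


def compliance_gate(lead: Lead, checks: dict[str, Check]) -> list[str]:
    """Run every check and return the names of the ones that failed.

    An empty list means the call may proceed. A check that raises counts as a
    failure, so an unreachable DNC service blocks the call instead of letting
    it through.
    """
    failures = []
    for name, check in checks.items():
        try:
            passed = check(lead)
        except Exception:
            passed = False
        if not passed:
            failures.append(name)
    return failures


# Hypothetical in-memory stand-ins for real lookups.
FEDERAL_DNC = {"+15155550199"}
CHECKS: dict[str, Check] = {
    "federal_dnc": lambda lead: lead.phone not in FEDERAL_DNC,
    "internal_suppression": lambda lead: True,
    "consent_on_file": lambda lead: lead.has_consent_record,
}
```

Returning the list of failed checks, rather than a bare boolean, gives you an audit trail for every skipped call.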

Call recording and PII redaction

Record every call for quality and compliance. Store recordings encrypted. If you're recording in a two-party consent state (California, Florida, Pennsylvania, and others), the agent must get consent at the top of the call.

Use PII redaction on transcripts before they hit your CRM or analytics warehouse. Cold calls pick up personal data you often don't need to retain.
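As a minimal sketch of the idea, the regex pass below masks emails and US-style phone numbers. It is an assumption-laden stand-in: names, addresses, and account numbers need model-based redaction, which a regex will not catch.

```python
import re

EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+")
# US-style numbers: optional +1, then 3-3-4 digits with common separators.
PHONE_RE = re.compile(r"(?:\+?1[\s.-]?)?\(?\d{3}\)?[\s.-]?\d{3}[\s.-]?\d{4}\b")


def redact(transcript: str) -> str:
    """Replace emails and phone numbers with placeholder tokens."""
    redacted = EMAIL_RE.sub("[EMAIL]", transcript)
    return PHONE_RE.sub("[PHONE]", redacted)
```

Run redaction on the copy that goes to analytics, and keep the unredacted booking details only in the system that needs them to send the invite.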

CRM sync and disposition

Every call ends with a disposition: booked, callback, not interested, DNC, voicemail, no answer, wrong number. That disposition has to land in the CRM within seconds, along with the transcript, recording URL, and any tool calls the agent made (calendar event IDs, follow-up email queued, etc.).

This is where most AI cold-calling agent projects leak value. Great calls, terrible data hygiene, nothing tracked, impossible to iterate on.
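One way to stop the leak is to make the disposition a required, typed field. The seven values below come from the list above; the record shape and field names are assumptions, not any CRM's actual schema.

```python
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone
from enum import Enum


class Disposition(str, Enum):
    BOOKED = "booked"
    CALLBACK = "callback"
    NOT_INTERESTED = "not_interested"
    DNC = "dnc"
    VOICEMAIL = "voicemail"
    NO_ANSWER = "no_answer"
    WRONG_NUMBER = "wrong_number"


@dataclass
class CallRecord:
    lead_id: str
    call_sid: str
    disposition: Disposition
    transcript: str
    recording_url: str
    tool_calls: list[dict] = field(default_factory=list)
    ended_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )


def to_crm_payload(record: CallRecord) -> dict:
    """Flatten the record into a JSON-serializable dict for a CRM API call."""
    payload = asdict(record)
    payload["disposition"] = record.disposition.value
    return payload
```

Because the enum has no "unknown" member, a call cannot be written back without someone, or something, deciding how it ended.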

Minimal implementation sketch

Here's the shape of an AI cold-calling agent built on Twilio + AssemblyAI Universal-3 Pro Streaming + your LLM and TTS of choice. This is the outbound-specific piece — it assumes you already have the inbound WebSocket bridge from a standard phone-based voice agent tutorial.

<pre><code class="language-python">from twilio.rest import Client
import os

# Credentials come from the environment; never hard-code them.
twilio = Client(
    os.environ["TWILIO_SID"],
    os.environ["TWILIO_AUTH"],
)

# is_on_dnc, is_suppressed, log_skipped, and pick_local_number are your own
# helpers (DNC lookup, suppression list, audit log, number pool). They are not
# part of the Twilio SDK and are not defined in this sketch.


def place_cold_call(prospect):
    """Dial one prospect and return the Twilio call SID, or None if skipped."""
    # 1. Compliance gate: no call without a clean scrub
    if is_on_dnc(prospect.phone) or is_suppressed(prospect.phone):
        log_skipped(prospect, reason="dnc")
        return None

    # 2. Pick a local-presence outbound number
    from_number = pick_local_number(prospect.phone)

    # 3. Open the call; the TwiML at this URL hands off to our media stream handler
    call = twilio.calls.create(
        to=prospect.phone,
        from_=from_number,
        url=f"https://your-server.app/voice-agent/start?lead_id={prospect.id}",
        record=True,
        recording_status_callback="https://your-server.app/recording-done",
        machine_detection="Enable",  # detect voicemail, don't pitch a robot
        time_limit=600,              # cap at 10 min
    )
    return call.sid
</code></pre>

Two things worth calling out:

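machine_detection="Enable": Twilio reports who answered through the AnsweredBy parameter on the webhook request, so you know when the call hit a voicemail. Your agent should either leave a short pre-recorded voicemail (compliant, clear AI disclosure) or hang up. If you plan to leave a message, use "DetectMessageEnd" instead so Twilio waits for the greeting to finish. Don't pitch an answering machine.
time_limit=600: cap call duration. Runaway LLM loops on a long call are a common failure mode; a hard cap prevents runaway cost and angry prospects.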
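The branching on Twilio's AnsweredBy value can be sketched as a small pure function. The AnsweredBy strings are the ones Twilio documents for answering machine detection; the action names and the choice to treat "unknown" as a possible human are assumptions.

```python
from typing import Optional

# machine_end_* values only appear with machine_detection="DetectMessageEnd",
# which waits for the voicemail greeting to finish.
MACHINE_GREETING_DONE = {"machine_end_beep", "machine_end_silence", "machine_end_other"}


def next_action(answered_by: Optional[str]) -> str:
    """Map Twilio's AnsweredBy parameter to what the agent should do next."""
    if answered_by == "human":
        return "converse"
    if answered_by in MACHINE_GREETING_DONE:
        return "leave_voicemail"
    if answered_by in {"machine_start", "fax"}:
        return "hang_up"
    # "unknown" or missing: assumption, treat as a possible human and proceed
    return "converse"
```

Keeping this as a pure function makes it easy to unit test and to change the policy later, for example hanging up on "unknown" if your complaint rate says you should.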

The inbound audio path (WebSocket → Universal-3 Pro Streaming → LLM → TTS → back to Twilio) is identical to any other phone-based voice agent. The outbound piece is the dialer, the compliance gate, and the disposition logic.

For a full runnable implementation — dialer.py with compliance gating, server.py with the four sales tools (book_meeting, mark_callback, mark_not_interested, honor_dnc), and automatic disposition writing — clone the companion repo:

git clone https://github.com/kelsey-aai/ai-cold-calling-agent

The repo ships with a sample leads.csv, a stubbed compliance layer, and a --dry-run mode so you can verify the pipeline before dialing a real number.

Measuring an AI cold-calling agent

A cold-calling agent lives or dies by four numbers:

Metric	What it measures	Target range
Connect rate	% of dialed calls that reach a human	5–15% (industry baseline)
Conversation rate	% of connected calls that make it past the opener	40–70%
Qualified rate	% of conversations that meet ICP criteria	20–40%
Book rate	% of qualified conversations that book a meeting	30–60%

The end-to-end number — meetings booked per 1,000 dials — is what determines whether the agent is ROI-positive. Track each stage independently so you know where to iterate.
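The end-to-end number is just the product of the four stage rates. Here is the arithmetic using the low and high ends of the target ranges in the table above; the function name is hypothetical.

```python
def meetings_per_1000_dials(
    connect: float, conversation: float, qualified: float, book: float
) -> float:
    """Multiply the four stage rates (as fractions) and scale to 1,000 dials."""
    return 1000 * connect * conversation * qualified * book


# Low and high ends of each target range from the metrics table.
low = meetings_per_1000_dials(0.05, 0.40, 0.20, 0.30)
high = meetings_per_1000_dials(0.15, 0.70, 0.40, 0.60)
print(f"{low:.1f} to {high:.1f} meetings per 1,000 dials")
```

That works out to roughly 1 to 25 meetings per 1,000 dials. Because the stages multiply, doubling any single rate doubles the final number, which is why it pays to find the weakest stage first.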

Two qualitative signals also matter:

Transcript read-through: spend an hour a week reading transcripts. You'll find LLM failures you never catch in aggregate metrics.
Prospect complaints: any complaint is a leading indicator of a future regulatory issue. Take each one seriously, even when there is only one.

Conversation intelligence on your call corpus is the fastest way to spot which prompt changes actually moved book rate vs. which just changed the vibe.

Compliance: the part most teams underweight

The single fastest way to kill an AI cold-calling agent program is a TCPA class action. A few non-negotiables:

Scrub DNC before every call, not just at list ingest
Disclose AI clearly in the opener (several states now require it; California SB 243 and others are tightening)
Honor "take me off the list" immediately and permanently
Respect state-level outbound calling windows — TCPA's federal baseline is 8am–9pm local time, but several states are stricter
Record and retain evidence of consent for any B2C call
Don't spoof caller ID — use owned numbers with a local presence strategy, not fake ones

When in doubt, B2B calls to work phone numbers generally have more latitude than B2C calls to mobiles. Still, assume every call is a compliance event and log accordingly.

AI cold-calling agent vs. AI SDR vs. traditional dialer

	Predictive dialer + human SDR	AI SDR (email/LinkedIn)	AI cold-calling agent
Channel	Phone	Email, LinkedIn	Phone
Conversation style	Human	Text	Natural spoken
Concurrency	1–3 per SDR	1000s	100s simultaneous calls
Cost per conversation	$4–15	$0.10–0.50	$0.50–2.00
Book rate (typical)	1–3% of dials	0.5–2% of emails	0.5–2% of dials, improving
Best for	High-ACV, personal touch	Top of funnel	Mid-market volume, qualify-and-book

AI cold-calling agents don't replace human SDRs at the top of the market. They replace the bottom half of the dial list — the part a human SDR would never get to — and scale qualification in a way email cadences can't.

Closing thoughts

An AI cold-calling agent is a phone-based voice agent with a sales prompt, a dialer, and a compliance layer strapped on. The hard part isn't the LLM or the TTS — it's the speech-to-text layer that decides whether the agent hears objections accurately enough to respond well, and the operational layer that keeps you out of TCPA trouble.

Don't ship one without reading your own transcripts. Don't ship one without DNC scrubbing. Don't ship one with a speech-to-text model that was trained on podcast audio, not phone audio.

The fastest way to find out if an AI cold-calling agent will work for your motion is to build a small one against 500 leads, read every transcript, and measure the book rate. Universal-3 Pro Streaming is the reference streaming speech-to-text layer we'd recommend starting with — low latency, accurate on phone audio, unlimited concurrency, and $0.15/hour. The companion GitHub repo at github.com/kelsey-aai/ai-cold-calling-agent is the full working implementation — fork it, drop in your lead list, and run python dialer.py --dry-run first.

Frequently asked questions

What is an AI cold-calling agent?

An AI cold-calling agent is an outbound Voice AI system that places phone calls, conducts a natural spoken conversation with the prospect using a streaming speech-to-text model and a Large Language Model, handles objections, and either books a qualified meeting or marks the lead as not interested — without a human on the line. It's different from a robocall because it holds a real two-way conversation, and different from an AI SDR email tool because it works over the phone.

How does an AI cold-calling agent work?

An AI cold-calling agent works by dialing a prospect through a telephony provider like Twilio, streaming the prospect's voice into a real-time speech-to-text model, passing transcripts to an LLM that follows a sales prompt with objection-handling logic, and speaking replies back through a text-to-speech model. The full loop runs in under 800ms per turn, which is what makes the conversation feel natural instead of robotic.

What is the best speech-to-text for an AI cold-calling agent?

The best speech-to-text for an AI cold-calling agent is a streaming model with latency around 300ms, native 8kHz mulaw support, and high accuracy on alphanumerics like emails and phone numbers. AssemblyAI's Universal-3 Pro Streaming model is purpose-built for voice agents, with 307ms median latency, immutable transcripts, intelligent endpointing, and 21% fewer alphanumeric errors than the previous streaming generation.

Is it legal to use an AI cold-calling agent?

Using an AI cold-calling agent is legal in most jurisdictions when you follow TCPA requirements in the US, GDPR in the EU, and state-level rules — meaning you scrub the federal and state Do Not Call registries before every call, disclose that the caller is an AI (required in California, Florida, Texas, and a growing list of states), honor opt-out requests immediately, and respect calling-hour windows. B2B calls to work numbers generally have more latitude than B2C calls to mobiles, but compliance filtering should be a hard gate regardless.

How much does it cost to run an AI cold-calling agent?

An AI cold-calling agent typically costs between $0.50 and $2.00 per conversation end-to-end at scale. The components are telephony (Twilio per-minute outbound voice), streaming speech-to-text (AssemblyAI Universal-3 Pro Streaming is $0.15/hour of session time), the LLM (varies by model and tokens), and text-to-speech (per-character or per-minute). At 10,000 calls/month the economics are roughly one-tenth the cost of an equivalent human SDR seat.
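To see how the components add up, here is a small calculator. Only the $0.15/hour speech-to-text rate comes from this article; the telephony, LLM, and TTS rates are arguments you supply from your own vendor pricing, and the function names are hypothetical.

```python
STT_USD_PER_HOUR = 0.15  # Universal-3 Pro Streaming session price quoted in this article


def stt_cost_usd(call_minutes: float) -> float:
    """Speech-to-text cost for one call of the given length."""
    return call_minutes / 60 * STT_USD_PER_HOUR


def cost_per_conversation_usd(
    call_minutes: float,
    telephony_per_min: float,
    llm_per_call: float,
    tts_per_min: float,
) -> float:
    """Sum the four components. Every rate except STT is supplied by the caller."""
    return (
        stt_cost_usd(call_minutes)
        + call_minutes * telephony_per_min
        + llm_per_call
        + call_minutes * tts_per_min
    )
```

A three-minute call comes to $0.0075 of speech-to-text, so plug in your own per-minute telephony and TTS rates to see where the rest of the cost sits.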

How do I build an AI cold-calling agent?

To build an AI cold-calling agent, combine a telephony provider (Twilio Voice, SIP, or a managed platform like Vapi or Retell) with a streaming speech-to-text model like Universal-3 Pro Streaming, an LLM with a cold-calling prompt that includes opener, discovery, objection handling, and booking logic, and a text-to-speech model. Wrap it with a dialer that enforces DNC scrubbing, calling-hour rules, and CRM disposition sync — those operational pieces are what separate a working program from a compliance incident.
