Tools

Tools let your agent take actions and fetch live data. There are two kinds, and most agents use the first:

HTTP tools (server-side)

Defined on your stored agent. You give a URL and a parameter list; AssemblyAI makes the request for you and feeds the result to the model. Your client does nothing.

Function tools (client-side)

Declared inline in session.tools. The agent emits a tool.call; your code runs the logic and sends back a tool.result. Use when the tool needs local state or custom logic.

This page covers what’s common to both: how parameter hints sharpen accuracy, how to get the agent to call your tools, execution modes, progressive reveal, and per-agent patterns.

Parameter hints improve accuracy

parameters is a JSON Schema object: { "type": "object", "properties": { … }, "required": [ … ] }, the same on both tool types. Beyond type and description, each property accepts standard JSON-Schema keywords that describe the shape of the value. They do two jobs:

Tool-calling accuracy. A spoken value that doesn’t fit the shape is rejected before the tool runs, and the agent re-asks for just that value instead of calling the tool with garbage.
Turn-detection accuracy. Knowing what a complete value looks like lets the agent tell whether the user has finished speaking it. It waits for all ten digits of a phone number instead of cutting in after “my number is four one five…”.

These keywords describe the value your tool receives — the value the agent produces from the caller’s speech, not the words the caller says out loud. A caller might say “four one five, five five five…” and the agent produces +14155552671. Your pattern and examples describe that produced value.

The agent’s cleanup of spoken input is best-effort, not guaranteed. It’s reliable for well-known shapes (phone numbers, emails, dates), but for long or free-form digit sequences — card numbers, account numbers, claim IDs — the value can reach your tool as it was spoken: with spaces between digits ("4 2 4 2 4 2 …") when the caller reads them one at a time, or with digits missing if speech recognition dropped some. Your pattern is matched against that value as-is, so write it to tolerate the forms the value can actually arrive in (see spoken digit sequences below). A pattern that only accepts the tidy form (^\\d{16}$) will reject the spaced form and trap the caller in a re-ask loop.

Keyword	What it does	Good values
`enum`	Restrict the value to a fixed set. The model picks one; anything else is rejected.	`["billing", "sales", "support"]`
`examples`	Sample values in the exact form your tool receives them. Give 2–4 realistic ones.	`["+14155552671", "+442071838750"]`
`pattern`	A regex the final value must match (Python `re`, matched against the whole value). Escape backslashes in JSON.	`"\\+[1-9]\\d{1,14}"`
`format`	A named JSON-Schema format the value should conform to.	`"email"`, `"date-time"`, `"date"`

If you omit these, the agent infers the expected shape from the property’s description at runtime. Setting them explicitly is an override for when you want tight, predictable validation. A precise description is still what matters most; the hints sharpen it.

How to choose each keyword

enum: use it whenever the value is one of a known, small set. Removes “the model invented a category” bugs entirely.

{ "department": { "type": "string", "description": "Which team to route to.",
  "enum": ["billing", "sales", "support"] } }

examples: use these for any free-form value. Two to four realistic examples beat a long description. Every example should be a value the tool would actually accept (and match pattern if you set one).

{ "airport_code": { "type": "string", "description": "IATA airport code.",
  "examples": ["SFO", "LHR", "JFK"] } }

pattern: use this when the value has a strict format. It’s a Python regular expression matched against the whole value (so you don’t need ^ or $). In JSON you have to escape backslashes, so write \\d, not \d. The value is matched as the agent produces it — for well-known shapes that’s the cleaned-up form, but for spoken digit sequences it may still carry the spaces or pauses from speech (see below), so the pattern must accept those. If the value doesn’t match, the agent re-asks for it.

Value	`pattern`	`format`	`examples`
US ZIP	`\\d{5}(-\\d{4})?`	-	`["94103", "10001-2201"]`
E.164 phone	`\\+[1-9]\\d{1,14}`	-	`["+14155552671"]`
Order ID	`[A-Z]{2}-\\d{5}`	-	`["AB-12345"]`
ISO date	`\\d{4}-\\d{2}-\\d{2}`	`date`	`["2026-06-09"]`
Email	-	`email`	`["alex@acme.com"]`

\\d is any digit, [A-Z] is one uppercase letter, {2} is “exactly two”, {1,14} is “one to fourteen”, and ? makes the part before it optional. So [A-Z]{2}-\\d{5} matches an order ID like AB-12345.

Keep patterns tight but not brittle. Too loose (.*) gives no benefit; too strict rejects legitimate values and traps the user in a re-ask loop. Make every value in examples match the pattern; if you can’t, loosen the pattern, not the examples.
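The last rule is easy to check locally before you deploy a tool. The sketch below assumes that the whole-value matching described above behaves like Python's `re.fullmatch`; the helper name `mismatched_examples` and the sample properties are made up for illustration.

```python
import re

def mismatched_examples(properties: dict) -> dict:
    """Return {property_name: [examples that fail that property's pattern]}."""
    failures = {}
    for name, schema in properties.items():
        pattern = schema.get("pattern")
        if pattern is None:
            continue  # nothing to check without a pattern
        bad = [ex for ex in schema.get("examples", [])
               if re.fullmatch(pattern, ex) is None]
        if bad:
            failures[name] = bad
    return failures

# Raw Python strings use a single backslash; the same patterns written
# inside JSON need the doubled form (\\d).
properties = {
    "zip": {"type": "string", "pattern": r"\d{5}(-\d{4})?",
            "examples": ["94103", "10001-2201"]},
    "order_id": {"type": "string", "pattern": r"[A-Z]{2}-\d{5}",
                 "examples": ["AB-12345", "ab-12345"]},  # second is deliberately wrong
}

print(mismatched_examples(properties))  # {'order_id': ['ab-12345']}
```

An empty result means every example is a value the pattern accepts. Anything else points at a pattern to loosen.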

format: a standard JSON-Schema named format (email, date-time, uri, …). Use it for well-known value types instead of hand-rolling a pattern.

Spoken digit sequences

Long digit sequences — card numbers, account numbers, claim IDs — are the most common source of re-ask loops, because callers rarely say them cleanly. They read them one digit at a time or in groups, so the value often reaches your tool with spaces in it: "4 2 4 2 4 2 4 2 4 2 4 2 4 2 4 2", "4242 4242 4242 4242", or "4242424242424242" — all for the same card. A pattern written only for the tidy form rejects the spaced ones and the caller gets stuck repeating themselves. Write the pattern to allow interior spaces, bound the digit count (not the character count), and give examples that include the spaced form so both validation and turn-detection expect it:

{
  "card_number": {
    "type": "string",
    "description": "The card number as spoken. May contain spaces between digits when the caller reads them one at a time; 13–19 digits once spaces are removed.",
    "pattern": " *([0-9] *){13,19}",
    "examples": [
      "4242424242424242",
      "4242 4242 4242 4242",
      "4 2 4 2 4 2 4 2 4 2 4 2 4 2 4 2"
    ]
  }
}

Then strip non-digits in your tool handler before you use the value — e.g. re.sub(r"\D", "", card_number) — and run your own length/checksum validation there. Keep the pattern as a shape hint that admits real spoken input, not as your final validation.

This tolerates a complete number that was spoken with spaces. It does not (and should not) accept a genuinely incomplete one — if speech recognition only captured 12 of 16 digits, that value is rejected and the agent re-asks, which is correct. For long numbers, prompt the caller to read them at a steady pace rather than one digit at a time with long pauses.
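The handler-side cleanup described above might look like the sketch below. The function names are hypothetical, and the Luhn check is an assumption about which checksum you want: it applies to payment card numbers, and other ID types would need their own rule.

```python
import re

def luhn_ok(digits: str) -> bool:
    """Luhn checksum: double every second digit from the right, sum, mod 10."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0

def normalize_card_number(spoken: str) -> str:
    """Strip non-digits, then run the real validation the pattern does not do."""
    digits = re.sub(r"\D", "", spoken)
    if not 13 <= len(digits) <= 19:
        raise ValueError(f"expected 13-19 digits, got {len(digits)}")
    if not luhn_ok(digits):
        raise ValueError("checksum failed")
    return digits

print(normalize_card_number("4 2 4 2 4 2 4 2 4 2 4 2 4 2 4 2"))  # 4242424242424242
```

Raising from the handler keeps the schema pattern permissive while the strict checks stay in code you control.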

Worked example

A tool that books a callback, with phone constrained by pattern + examples and time-of-day by enum:

curl -X POST https://agents.assemblyai.com/v1/agents \
  -H "Authorization: $ASSEMBLYAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Callback Assistant",
    "system_prompt": "You schedule callbacks. Collect the phone number and preferred time of day, then call schedule_callback.",
    "voice": { "voice_id": "alba" },
    "tools": [
      {
        "name": "schedule_callback",
        "description": "Schedule a callback to a phone number at a chosen time.",
        "parameters": {
          "type": "object",
          "properties": {
            "phone": {
              "type": "string",
              "description": "The number to call back, including country code.",
              "examples": ["+14155552671", "+442071838750"],
              "pattern": "\\+[1-9]\\d{1,14}"
            },
            "window": {
              "type": "string",
              "description": "Preferred time of day for the callback.",
              "enum": ["morning", "afternoon", "evening"]
            }
          },
          "required": ["phone", "window"]
        },
        "http": { "url": "https://api.example.com/callbacks", "http_method": "POST" }
      }
    ]
  }'

# pip install requests
import os
import requests

resp = requests.post(
    "https://agents.assemblyai.com/v1/agents",
    headers={"Authorization": os.environ["ASSEMBLYAI_API_KEY"]},
    json={
        "name": "Callback Assistant",
        "system_prompt": "You schedule callbacks. Collect the phone number and preferred time of day, then call schedule_callback.",
        "voice": {"voice_id": "alba"},
        "tools": [
            {
                "name": "schedule_callback",
                "description": "Schedule a callback to a phone number at a chosen time.",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "phone": {
                            "type": "string",
                            "description": "The number to call back, including country code.",
                            "examples": ["+14155552671", "+442071838750"],
                            "pattern": "\\+[1-9]\\d{1,14}",
                        },
                        "window": {
                            "type": "string",
                            "description": "Preferred time of day for the callback.",
                            "enum": ["morning", "afternoon", "evening"],
                        },
                    },
                    "required": ["phone", "window"],
                },
                "http": {"url": "https://api.example.com/callbacks", "http_method": "POST"},
            }
        ],
    },
)
resp.raise_for_status()
print(resp.json())

// Node 18+ has fetch built in
const res = await fetch("https://agents.assemblyai.com/v1/agents", {
  method: "POST",
  headers: {
    Authorization: process.env.ASSEMBLYAI_API_KEY,
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    name: "Callback Assistant",
    system_prompt: "You schedule callbacks. Collect the phone number and preferred time of day, then call schedule_callback.",
    voice: { voice_id: "alba" },
    tools: [
      {
        name: "schedule_callback",
        description: "Schedule a callback to a phone number at a chosen time.",
        parameters: {
          type: "object",
          properties: {
            phone: {
              type: "string",
              description: "The number to call back, including country code.",
              examples: ["+14155552671", "+442071838750"],
              pattern: "\\+[1-9]\\d{1,14}",
            },
            window: {
              type: "string",
              description: "Preferred time of day for the callback.",
              enum: ["morning", "afternoon", "evening"],
            },
          },
          required: ["phone", "window"],
        },
        http: { url: "https://api.example.com/callbacks", http_method: "POST" },
      },
    ],
  }),
});
const data = await res.json();
console.log(data);

With these set, if the user says “call me at four one five, five five five…” and trails off, the agent waits (the value doesn’t yet match \+[1-9]\d{1,14}) instead of cutting in. If STT garbles a digit, the agent re-asks for the phone number specifically rather than firing the tool with a bad value.

Getting the agent to call your tools

In rough order of impact:

Strong tool descriptions. Treat description as “when should I reach for this?”, not “what does this do?”. Name the trigger (“Call this when the user asks about X”), name the anti-trigger (“Do not call this for Y”), and mention any precondition. Most “tool never fires” failures trace here.
Strong parameter hints. Vague params produce missing or invented values, which the validator rejects (or worse, your tool runs on garbage). Lead with format, add an example, use enum for fixed sets. See Parameter hints.
Default-to-call wording in system_prompt. “When in doubt, call the tool. A wasted call is fine. Answering wrong from memory is not.” Don’t stack exceptions.
Few-shot examples in system_prompt are the strongest behavioural signal:
User: “Where’s my order?” You: [call search_orders] “Looks like it’s out for delivery today.”
Keep tool sets small (≤10 per phase). Past that, selection accuracy drops. See Progressive tool reveal.
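Two of these points, descriptions that name a trigger and an anti-trigger, and small tool sets, can be linted before a session starts. This is a rough sketch: the helper names, the exact wording template, and the string check for an anti-trigger are illustrative assumptions, not part of the API.

```python
MAX_TOOLS_PER_PHASE = 10  # threshold taken from the guidance above

def build_description(trigger: str, anti_trigger: str, precondition: str = "") -> str:
    """Compose a description that says when to call the tool and when not to."""
    parts = [f"Call this when {trigger}.", f"Do not call this for {anti_trigger}."]
    if precondition:
        parts.append(f"Precondition: {precondition}.")
    return " ".join(parts)

def check_phase(tools: list) -> list:
    """Return human-readable warnings for one phase's tool list."""
    warnings = []
    if len(tools) > MAX_TOOLS_PER_PHASE:
        warnings.append(f"{len(tools)} tools exposed; keep it to {MAX_TOOLS_PER_PHASE} or fewer")
    for tool in tools:
        # Crude heuristic: look for the anti-trigger phrasing used above.
        if "Do not call" not in tool.get("description", ""):
            warnings.append(f"{tool['name']}: description names no anti-trigger")
    return warnings

search_orders = {
    "name": "search_orders",
    "description": build_description(
        "the user asks where an order is",
        "returns or refunds",
        "an order ID has been collected",
    ),
}
print(check_phase([search_orders]))  # []
```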

Execution modes

Set execution_mode per tool to choose how the agent waits. This applies to both HTTP and function tools. The sequence diagrams below show the client-side tool.call/tool.result flow; for HTTP tools that round trip happens server-side, but the conversational behavior (interactive keeps talking, hold goes quiet) is identical.

Use `"interactive"` for…	Use `"hold"` for…
DB lookups, REST calls, short calculations	Phone transfers, escalations
Returns under ~5 seconds	Long-running ops (>10s, async jobs)
Transition phrase (“let me check”) feels natural	Sensitive flows (payment auth, identity verification)

Default to interactive. Two common mis-uses:

❌ Wrapping a slow DB query in hold “to be safe”. Agent goes mute, user thinks the call dropped. Use interactive with a longer timeout_seconds.
❌ Using interactive for a 30-second human transfer. Agent fills with small-talk; user gets suspicious.

Interactive

server                                 client
  │  reply.started                       │
  │ ───────────────────────────────────► │
  │  reply.audio  ("let me check that")  │
  │ ───────────────────────────────────► │
  │  tool.call                           │  client accumulates result
  │ ───────────────────────────────────► │  (does NOT send tool.result yet)
  │  reply.done                          │
  │ ───────────────────────────────────► │  client drains pending results:
  │                                      │  tool.result
  │ ◄─────────────────────────────────── │
  │  reply.started                       │  agent delivers answer
  │ ───────────────────────────────────► │
  │  reply.audio  ("it's 22°C and sunny")│
  │ ───────────────────────────────────► │
  │  reply.done                          │
  │ ───────────────────────────────────► │
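The accumulate-then-flush behavior in the diagram can be sketched as a small buffer on the client. The event types (`tool.call`, `tool.result`, `reply.done`) come from the diagram; the payload field names (`call_id`, `name`, `args`, `result`) are assumptions for illustration, so check them against the API reference before relying on them.

```python
import json

class InteractiveToolBuffer:
    """Run tools as calls arrive, but hold the results until reply.done."""

    def __init__(self, handlers: dict):
        self.handlers = handlers  # tool name -> callable(args dict) -> result
        self.pending = []         # tool.result messages not yet sent

    def on_event(self, event: dict) -> list:
        """Return the JSON messages to send in response to one server event."""
        if event["type"] == "tool.call":
            handler = self.handlers.get(event["name"])
            if handler is None:
                result = {"error": f"unknown tool {event['name']}"}
            else:
                result = handler(event.get("args", {}))
            self.pending.append({
                "type": "tool.result",
                "call_id": event["call_id"],   # assumed field name
                "result": json.dumps(result),  # assumed field name and encoding
            })
            return []  # do NOT send yet; the agent is still speaking
        if event["type"] == "reply.done":
            drained, self.pending = self.pending, []
            return [json.dumps(message) for message in drained]
        return []

buffer = InteractiveToolBuffer({"get_weather": lambda args: {"temp_c": 22}})
print(buffer.on_event({"type": "tool.call", "call_id": "c1", "name": "get_weather"}))  # []
print(len(buffer.on_event({"type": "reply.done"})))  # 1
```

Keeping the buffer free of any socket code makes it easy to unit-test; the caller sends whatever `on_event` returns.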

Hold

While the tool is in flight:

Agent stays silent (no reply.started).
User speech doesn’t trigger replies. Utterances are added to context but the agent doesn’t respond until you send tool.result or reply.create.
tool.result auto-fires the next reply. Don’t also send reply.create after.

{
  "type": "session.update",
  "session": {
    "tools": [
      {
        "type": "function",
        "name": "transfer_call",
        "description": "Transfer the call to a human agent. Takes 15–30 seconds.",
        "parameters": {"type": "object", "properties": {"department": {"type": "string"}}, "required": ["department"]},
        "execution_mode": "hold",
        "timeout_seconds": 60
      }
    ]
  }
}

server                                 client
  │  tool.call (hold)                    │
  │ ───────────────────────────────────► │  kick off long-running op
  │                                      │  (agent silent, no reply.started)
  │                                      │
  │                                      │  reply.create { instructions: ... }
  │ ◄─────────────────────────────────── │  ── optional status update
  │  reply.started → reply.audio → done  │
  │ ───────────────────────────────────► │
  │                                      │  (op completes)
  │                                      │  tool.result
  │ ◄─────────────────────────────────── │
  │  reply.started                       │  auto-fired by tool.result
  │ ───────────────────────────────────► │
  │  reply.audio  ("all set...")         │
  │ ───────────────────────────────────► │
  │  reply.done                          │
  │ ───────────────────────────────────► │

During hold, the server does not emit transcript.user.delta or transcript.user in real time. Transcripts flush once the hold ends (tool.result or reply.create). Live captioning pauses during the hold; nothing is dropped.

Status updates during hold

Send reply.create with optional instructions to make the agent speak mid-hold without ending it:

await ws.send(json.dumps({
    "type": "reply.create",
    "instructions": "Let the customer know you're still working on the transfer."
}))

The hold continues until you send the matching tool.result.
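Putting the hold flow together, a client might run the long operation as a task, send a status update at a fixed interval, and finish with a single `tool.result`. This is a sketch under assumptions: `send` stands in for your WebSocket's send coroutine, and the `call_id` and `result` fields on `tool.result` are assumed names.

```python
import asyncio
import json

async def run_hold_tool(send, call_id, operation, status_every=10.0,
                        status_text="Let the customer know you're still working on it."):
    """Run `operation` during a hold, speaking a status update every `status_every` seconds."""
    task = asyncio.ensure_future(operation())
    while True:
        # asyncio.wait returns on timeout without cancelling the task.
        done, _ = await asyncio.wait({task}, timeout=status_every)
        if done:
            break
        await send(json.dumps({"type": "reply.create", "instructions": status_text}))
    # tool.result ends the hold and auto-fires the next reply,
    # so no reply.create is sent after it.
    await send(json.dumps({
        "type": "tool.result",
        "call_id": call_id,                   # assumed field name
        "result": json.dumps(task.result()),  # assumed field name and encoding
    }))

async def demo():
    sent = []

    async def fake_send(message):
        sent.append(json.loads(message))

    async def slow_transfer():
        await asyncio.sleep(0.05)
        return {"status": "transferred"}

    await run_hold_tool(fake_send, "c1", slow_transfer, status_every=0.02)
    return sent

messages = asyncio.run(demo())
print([m["type"] for m in messages])
```

If `operation` raises, `task.result()` re-raises the exception; a production handler would catch it and send an error result so the hold still ends.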

Progressive tool reveal

For multi-step workflows (lookup → estimate → commit), don’t register all tools upfront. After each successful tool.result, send session.update adding the next phase’s tools, and update system_prompt to match. Why: a tool that isn’t in the current list can’t be called, so the model can’t fabricate a commit before the prerequisite step has run. Smaller per-phase tool sets also raise selection accuracy.

Worked example: taxi booking

session state              tools exposed
─────────────────────────  ──────────────────────────────────────────
session start              [lookup_postcode]
                                  │ user gives pickup postcode
                                  ▼
                           ⚙ lookup_postcode("SW1A 1AA") → ✓
─────────────────────────  ──────────────────────────────────────────
tier 2 unlocked            [lookup_postcode, estimate_fare]
                                  │ user gives dropoff
                                  ▼
                           ⚙ estimate_fare(...) → ✓
─────────────────────────  ──────────────────────────────────────────
tier 3 unlocked            [lookup_postcode, estimate_fare, book_ride,
                            get_booking, track_driver, cancel_ride]
                                  │ user confirms + name
                                  ▼
                           ⚙ book_ride(name="Alex", ...) → ✓

Until lookup_postcode returns a real postcode, the model has no book_ride tool. It can verbally promise a booking; it can’t create one.

Client-side wiring

import json

# Assumed to exist elsewhere: `ws` (the open session WebSocket), the tool
# definition dicts (lookup_postcode, estimate_fare, ...), and the per-tier
# prompts TIER_2_PROMPT and TIER_3_PROMPT.
TIER_1_TOOLS = [lookup_postcode]
TIER_2_TOOLS = [lookup_postcode, estimate_fare]
TIER_3_TOOLS = [lookup_postcode, estimate_fare, book_ride,
                get_booking, track_driver, cancel_ride]

tier_2_unlocked = tier_3_unlocked = False

async def maybe_unlock_next_tier(tool_name, result):
    """Call after each tool.result; widens the tool list when a step succeeds."""
    global tier_2_unlocked, tier_3_unlocked
    if result.get("error"):
        return  # a failed call never unlocks the next tier

    if not tier_2_unlocked and tool_name == "lookup_postcode" and result.get("postcode"):
        tier_2_unlocked = True
        # Send tools and prompt in the same update so they never disagree.
        await ws.send(json.dumps({"type": "session.update",
                                  "session": {"tools": TIER_2_TOOLS, "system_prompt": TIER_2_PROMPT}}))
    elif not tier_3_unlocked and tool_name == "estimate_fare" and result.get("estimated_fare"):
        tier_3_unlocked = True
        await ws.send(json.dumps({"type": "session.update",
                                  "session": {"tools": TIER_3_TOOLS, "system_prompt": TIER_3_PROMPT}}))

Update tools AND system_prompt together. Tool-only gating where the prompt still references a now-hidden tool can underperform not gating at all. The model hunts for a tool the prompt promised and stalls or improvises when it can’t find it. Strip or rewrite every prompt sentence that names a tool whose visibility changed.
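One way to catch a prompt that still names a hidden tool is to scan each phase's prompt before sending the update. The helper below is a hypothetical sketch; it only finds literal tool names, so paraphrased references ("the booking tool") still need a manual read.

```python
import re

def stale_tool_mentions(prompt: str, all_tools: set, visible_tools: set) -> list:
    """Return the hidden tool names that the prompt still mentions, sorted."""
    hidden = all_tools - visible_tools
    return sorted(
        name for name in hidden
        if re.search(rf"\b{re.escape(name)}\b", prompt)
    )

ALL_TOOLS = {"lookup_postcode", "estimate_fare", "book_ride"}
tier_2_prompt = "Call estimate_fare for a quote, then book_ride once the user confirms."

print(stale_tool_mentions(tier_2_prompt, ALL_TOOLS, {"lookup_postcode", "estimate_fare"}))
# ['book_ride']
```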

Per-call state machine

The strongest form: every successful tool call is a state transition; each state owns a narrow prompt + small tool list.

State	System prompt focus	Tools
`s0_greet`	“Get pickup postcode. Nothing else.”	`lookup_postcode`
`s2_quoting`	“Call `estimate_fare`. Filler only; no fare numbers.”	`estimate_fare`
`s4_have_name`	“Call `book_ride` with captured pickup, dropoff, name.”	`book_ride`
`s5_booked`	“Read back confirmation. Offer track/cancel.”	`get_booking`, `track_driver`, `cancel_ride`
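A table like this maps naturally onto a lookup plus a transition map. The sketch below uses only the four states listed; the transitions between them are an assumption made for illustration (the intermediate states the table skips are skipped here too), and `tool_defs` is a hypothetical dict from tool name to its full definition.

```python
STATES = {
    "s0_greet": {"prompt": "Get pickup postcode. Nothing else.",
                 "tools": ["lookup_postcode"]},
    "s2_quoting": {"prompt": "Call estimate_fare. Filler only; no fare numbers.",
                   "tools": ["estimate_fare"]},
    "s4_have_name": {"prompt": "Call book_ride with captured pickup, dropoff, name.",
                     "tools": ["book_ride"]},
    "s5_booked": {"prompt": "Read back confirmation. Offer track/cancel.",
                  "tools": ["get_booking", "track_driver", "cancel_ride"]},
}

# ASSUMPTION: illustrative transitions only; a real flow has more states.
TRANSITIONS = {
    ("s0_greet", "lookup_postcode"): "s2_quoting",
    ("s2_quoting", "estimate_fare"): "s4_have_name",
    ("s4_have_name", "book_ride"): "s5_booked",
}

def advance(state: str, tool_name: str, result: dict, tool_defs: dict):
    """Return (new_state, session_update or None) after a tool result."""
    if result.get("error"):
        return state, None  # failed calls never change state
    new_state = TRANSITIONS.get((state, tool_name))
    if new_state is None:
        return state, None
    spec = STATES[new_state]
    update = {
        "type": "session.update",
        "session": {
            "tools": [tool_defs[name] for name in spec["tools"]],
            "system_prompt": spec["prompt"],
        },
    }
    return new_state, update

tool_defs = {name: {"name": name} for spec in STATES.values() for name in spec["tools"]}
state, update = advance("s0_greet", "lookup_postcode", {"postcode": "SW1A 1AA"}, tool_defs)
print(state)  # s2_quoting
```

Because each update replaces both the tool list and the prompt, the two cannot drift apart.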

Escape hatches

Real users go off-script. Two patterns, used together:

Transition tools: revise_pickup, revise_dropoff, restart, end_call exposed in every state. Model picks the right escape; orchestrator rolls state back.
respond_freely: a no-op tool in every state for tangential questions (“are you a real person?”). Model calls it instead of leaving the state.

Anti-fabrication clause

Gating makes hallucinations harmless (no real booking happens) but doesn’t suppress the spoken claim. Pair with prompt wording:

NEVER quote a fare, distance, time, confirmation number, name, or ETA unless
those exact values came from a tool result in this conversation. If you
haven't seen a tool result, you do NOT have these values. Don't estimate
them. Don't guess. Don't say "around" a number.

Patterns by agent type

Customer support

s0:  [lookup_ticket]                                           ← always
s1:  [lookup_ticket, escalate_to_human (hold), close_ticket]   ← after lookup
any: [respond_freely, end_call]

Prompt focus: “Use lookup_ticket for any ticket question. Only escalate_to_human after checking the ticket. Don’t promise outcomes you can’t verify.”

Booking / reservations

s0:  [check_availability, cancel_reservation]                          ← always
s1a: [check_availability, create_reservation, cancel_reservation]      ← if available
s1b: [check_availability, add_to_waitlist, cancel_reservation]         ← if full

Prompt focus: “Confirm party size, date, time. Call check_availability. If open, offer it and book. If not, offer the next two times or the waitlist.” The next tool depends on the prior result. Don’t expose both create_reservation and add_to_waitlist simultaneously. The model picks the wrong one ~30% of the time.
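The branch between s1a and s1b can be driven directly by the availability result, so the two tools are never visible together. This is a minimal sketch; the `available` and `error` fields on the result are assumed names, not a documented response shape.

```python
def reservation_tools(availability_result: dict) -> list:
    """Pick the tool names to expose after check_availability returns."""
    if availability_result.get("error"):
        # Stay in s0 until a lookup succeeds.
        return ["check_availability", "cancel_reservation"]
    if availability_result.get("available"):  # assumed field name
        return ["check_availability", "create_reservation", "cancel_reservation"]  # s1a
    return ["check_availability", "add_to_waitlist", "cancel_reservation"]  # s1b

print(reservation_tools({"available": False}))
# ['check_availability', 'add_to_waitlist', 'cancel_reservation']
```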

Banking / account

s0:  [verify_identity (hold)]                                  ← gatekeeper
s1:  [get_balance, list_recent_transactions]                   ← after auth
s2:  [start_transfer, dispute_charge (hold), close_account]    ← actions
any: [end_call]

Prompt focus: “Before sharing any account info, call verify_identity. Never quote a balance or transaction you haven’t fetched. Never promise a dispute outcome; only the system can.” The anti-fabrication clause matters most here. A bank agent inventing a balance is a P0.

Debugging

Tool never fires

Description too vague. Name the user phrases that should trigger it.
System prompt missing “default to calling” wording.
Too many tools (>10). Drop or split via progressive reveal.
Add a few-shot example to the system prompt. This is the strongest signal.

Wrong arguments

Parameter description missing a format or example. Add one to the description (e.g. 2026-04-30), or set examples/pattern. See Parameter hints.
Free-text where you want fixed buckets. Use enum.
User says the value multiple ways. Normalise in the description.

Agent keeps saying it “didn’t catch” a number (re-ask loop)

Almost always a pattern that’s too strict for spoken input, on a long digit sequence (card, account, claim ID). The caller reads the digits one at a time, so the value arrives with spaces ("4 2 4 2 …"), your pattern only accepts the tidy form, and every attempt is silently rejected — the caller just hears “I didn’t catch that” on repeat. Widen the pattern to allow interior spaces and add a spaced example. See Spoken digit sequences.
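To confirm the diagnosis, replay the forms a caller might produce against both the strict pattern and a widened one. The sketch assumes whole-value matching behaves like Python's `re.fullmatch`; `spoken_digits_pattern` is a hypothetical helper that builds the space-tolerant pattern shown under Spoken digit sequences.

```python
import re

def spoken_digits_pattern(min_digits: int, max_digits: int) -> str:
    """Build a pattern that bounds the digit count and allows spaces between digits."""
    return f" *([0-9] *){{{min_digits},{max_digits}}}"

def accepted(pattern: str, values: list) -> dict:
    """Map each candidate value to whether the pattern accepts it in full."""
    return {value: re.fullmatch(pattern, value) is not None for value in values}

forms = [
    "4242424242424242",
    "4242 4242 4242 4242",
    "4 2 4 2 4 2 4 2 4 2 4 2 4 2 4 2",
]

strict = r"\d{16}"
widened = spoken_digits_pattern(13, 19)

print(widened)  # " *([0-9] *){13,19}" without the quotes
print(accepted(strict, forms))   # only the first form is True
print(accepted(widened, forms))  # all three are True
```

Both patterns still reject a number with digits missing, which is the case where a re-ask is the right outcome.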

Agent invents a result

Most common cause: the prompt asks the model to act on a tool result when the tool was never actually called, so the model makes the result up. Two fixes, used together:

Progressive reveal: gate the commit tool behind the read tool.
Anti-fabrication clause in the prompt (see above).

Tool fires repeatedly

tool.result sent while last_event is still reply.started, i.e. before the reply has finished. Make sure your handler holds results and flushes them on reply.done.
Tool slower than timeout_seconds. The agent hits its internal timeout and the user asks again. Raise timeout_seconds.

Tool fires when it shouldn’t

Description is too broad. Add explicit anti-triggers:

**Use this for**: weather questions.
**Do not call for**: general chit-chat, scheduling, or any non-weather topic.
