HTTP tools (server-side)
Defined on your stored agent. You give a URL and a parameter list; AssemblyAI makes the request for you and feeds the result to the model. Your client does nothing.
Function tools (client-side)
Declared inline in session.tools. The agent emits a tool.call; your code runs the logic and sends back a tool.result. Use when the tool needs local state or custom logic.
Parameter hints improve accuracy
parameters is a JSON Schema object: { "type": "object", "properties": { … }, "required": [ … ] }, the same on both tool types. Beyond type and description, each property accepts standard JSON-Schema keywords that describe the shape of the value. They do two jobs:
- Tool-calling accuracy. A spoken value that doesn’t fit the shape is rejected before the tool runs, and the agent re-asks for just that value instead of calling the tool with garbage.
- Turn-detection accuracy. Knowing what a complete value looks like lets the agent tell whether the user has finished speaking it. It waits for all ten digits of a phone number instead of cutting in after “my number is four one five…”.
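The agent normalizes what it hears before checking it, so a spoken phone number reaches your tool as a value like +14155552671. Your pattern and examples describe that final value, not the spoken form.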
| Keyword | What it does | Good values |
|---|---|---|
| enum | Restrict the value to a fixed set. The model picks one; anything else is rejected. | ["billing", "sales", "support"] |
| examples | Sample values in the exact form your tool receives them. Give 2–4 realistic ones. | ["+14155552671", "+442071838750"] |
| pattern | A regex the final value must match (Python re, matched against the whole value). Escape backslashes in JSON. | "\\+[1-9]\\d{1,14}" |
| format | A named JSON-Schema format the value should conform to. | "email", "date-time", "date" |
If you omit these, the agent infers the expected shape from the property’s description at runtime. Setting them explicitly is an override for when you want tight, predictable validation. A precise description is still what matters most; the hints sharpen it.
How to choose each keyword
enum: use it whenever the value is one of a known, small set. Removes “the model invented a category” bugs entirely.
examples: use these for any free-form value. Two to four realistic examples beat a long description. Every example should be a value the tool would actually accept (and match pattern if you set one).
pattern: use this when the value has a strict format. It’s a Python regular expression matched against the whole value (so you don’t need ^ or $). In JSON you have to escape backslashes, so write \\d, not \d. The agent normalizes what it hears, then the value has to match or the agent re-asks for it.
| Value | pattern | format | examples |
|---|---|---|---|
| US ZIP | \\d{5}(-\\d{4})? | - | ["94103", "10001-2201"] |
| E.164 phone | \\+[1-9]\\d{1,14} | - | ["+14155552671"] |
| Order ID | [A-Z]{2}-\\d{5} | - | ["AB-12345"] |
| ISO date | \\d{4}-\\d{2}-\\d{2} | date | ["2026-06-09"] |
| Email | - | email | ["alex@acme.com"] |
\\d is any digit, [A-Z] is one uppercase letter, {2} is “exactly two”, {1,14} is “one to fourteen”, and ? makes the part before it optional. So [A-Z]{2}-\\d{5} matches an order ID like AB-12345.
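Before shipping a pattern, it can help to check it against your own examples. The sketch below is a local sanity check only, not the agent's validator: it uses Python's re.fullmatch to approximate "matched against the whole value", and the dictionary keys (us_zip, order_id, and so on) are names made up for this illustration.

```python
import json
import re

# Patterns and examples copied from the table above. Inside Python raw
# strings the backslash is written once; in JSON it has to be doubled.
HINTS = {
    "us_zip": {"pattern": r"\d{5}(-\d{4})?", "examples": ["94103", "10001-2201"]},
    "e164_phone": {"pattern": r"\+[1-9]\d{1,14}", "examples": ["+14155552671"]},
    "order_id": {"pattern": r"[A-Z]{2}-\d{5}", "examples": ["AB-12345"]},
    "iso_date": {"pattern": r"\d{4}-\d{2}-\d{2}", "examples": ["2026-06-09"]},
}


def matches_whole_value(pattern: str, value: str) -> bool:
    """Whole-value match, so the pattern needs no ^ or $ anchors."""
    return re.fullmatch(pattern, value) is not None


def bad_examples(hints: dict) -> list:
    """Return (name, example) pairs whose example fails its own pattern."""
    return [
        (name, example)
        for name, hint in hints.items()
        for example in hint["examples"]
        if not matches_whole_value(hint["pattern"], example)
    ]


# json.dumps produces the escaped form to paste into a JSON tool definition.
escaped = json.dumps({"pattern": HINTS["order_id"]["pattern"]})
```

An empty list from bad_examples(HINTS) means every example is a value its pattern accepts, which is the consistency the examples guidance asks for.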
format: a standard JSON-Schema named format (email, date-time, uri, …). Use it for well-known value types instead of hand-rolling a pattern.
Worked example
A tool that books a callback, with phone constrained by pattern + examples and time-of-day by enum.
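A minimal sketch of what such a definition could look like, written as a Python dict. The tool name book_callback, the property names, the enum values, and the top-level name/description/parameters layout are all assumptions for illustration; only the parameters schema shape and the keywords come from this page. check_arguments is a hypothetical local helper that approximates the hint checks so you can try values against the schema.

```python
import re

# Hypothetical tool definition: names and enum values are illustrative.
book_callback = {
    "name": "book_callback",
    "description": "Call this when the user asks to be called back.",
    "parameters": {
        "type": "object",
        "properties": {
            "phone": {
                "type": "string",
                "description": "Callback number in E.164 format.",
                "pattern": r"\+[1-9]\d{1,14}",
                "examples": ["+14155552671", "+442071838750"],
            },
            "time_of_day": {
                "type": "string",
                "description": "Preferred callback window.",
                "enum": ["morning", "afternoon", "evening"],
            },
        },
        "required": ["phone", "time_of_day"],
    },
}


def check_arguments(tool: dict, arguments: dict) -> list:
    """Return the names of arguments that are missing or that do not fit
    their enum or pattern. A local approximation, not the agent's validator."""
    schema = tool["parameters"]
    problems = [name for name in schema["required"] if name not in arguments]
    for name, value in arguments.items():
        prop = schema["properties"].get(name, {})
        if "enum" in prop and value not in prop["enum"]:
            problems.append(name)
        if "pattern" in prop and not re.fullmatch(prop["pattern"], str(value)):
            problems.append(name)
    return problems
```

Calling check_arguments(book_callback, {"phone": "415555", "time_of_day": "noon"}) flags both fields, which mirrors the case where the agent would re-ask rather than run the tool.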
The agent waits for a complete phone number (one that matches \+[1-9]\d{1,14}) instead of cutting in. If STT garbles a digit, the agent re-asks for the phone number specifically rather than firing the tool with a bad value.
Getting the agent to call your tools
In rough order of impact:
- Strong tool descriptions. Treat description as “when should I reach for this?”, not “what does this do?”. Name the trigger (“Call this when the user asks about X”), name the anti-trigger (“Do not call this for Y”), and mention any precondition. Most “tool never fires” failures trace here.
- Strong parameter hints. Vague params produce missing or invented values, which the validator rejects (or worse, your tool runs on garbage). Lead with format, add an example, use enum for fixed sets. See Parameter hints.
- Default-to-call wording in system_prompt. “When in doubt, call the tool. A wasted call is fine. Answering wrong from memory is not.” Don’t stack exceptions.
- Few-shot examples in system_prompt are the strongest behavioural signal: User: “Where’s my order?” You: [call search_orders] “Looks like it’s out for delivery today.”
- Keep tool sets small (≤10 per phase). Past that, selection accuracy drops. See Progressive tool reveal.
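The points in the list can be combined in one place. This is a sketch under assumptions: the search_orders definition, its wording, the store persona in the prompt, and the build_session helper are hypothetical, and the dict layout is only a guess at how session settings might be assembled on the client.

```python
MAX_TOOLS_PER_PHASE = 10  # the "≤10 per phase" guideline from the list above

# Hypothetical tool: the description names the trigger, the anti-trigger,
# and a precondition instead of restating what the tool does.
search_orders = {
    "name": "search_orders",
    "description": (
        "Call this when the user asks where an order is or when it will arrive. "
        "Do not call this for refund or billing questions. "
        "Needs the order ID; ask for it first if the user has not given it."
    ),
    "parameters": {
        "type": "object",
        "properties": {
            "order_id": {
                "type": "string",
                "description": "Order ID: two capital letters, a dash, five digits.",
                "pattern": r"[A-Z]{2}-\d{5}",
                "examples": ["AB-12345"],
            }
        },
        "required": ["order_id"],
    },
}

# Default-to-call wording plus one few-shot example.
SYSTEM_PROMPT = """\
You are a support agent for an online store.
When in doubt, call the tool. A wasted call is fine. Answering wrong from memory is not.

Example:
User: "Where's my order?"
You: [call search_orders] "Looks like it's out for delivery today."
"""


def build_session(system_prompt: str, tools: list) -> dict:
    """Assemble session settings, refusing oversized tool sets."""
    if len(tools) > MAX_TOOLS_PER_PHASE:
        raise ValueError(
            f"{len(tools)} tools registered; keep it to {MAX_TOOLS_PER_PHASE} per phase"
        )
    return {"system_prompt": system_prompt, "tools": tools}
```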
Execution modes
Set execution_mode per tool to choose how the agent waits. This applies to both HTTP and function tools. The sequence diagrams below show the client-side tool.call/tool.result flow; for HTTP tools that round trip happens server-side, but the conversational behavior (interactive keeps talking, hold goes quiet) is identical.
| Use "interactive" for… | Use "hold" for… |
|---|---|
| DB lookups, REST calls, short calculations | Phone transfers, escalations |
| Returns under ~5 seconds | Long-running ops (>10s, async jobs) |
| Transition phrase (“let me check”) feels natural | Sensitive flows (payment auth, identity verification) |
- ❌ Wrapping a slow DB query in hold “to be safe”. Agent goes mute, user thinks the call dropped. Use interactive with a longer timeout_seconds.
- ❌ Using interactive for a 30-second human transfer. Agent fills with small-talk; user gets suspicious.
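One way to encode the table and the two mistakes above is a small helper that suggests per-tool settings. This is a sketch: pick_execution_mode and its arguments are hypothetical, the 5 and 10 second thresholds are the rough figures from the table, and the 2x timeout margin is an arbitrary choice.

```python
def pick_execution_mode(expected_seconds: float, *, handoff: bool = False,
                        sensitive: bool = False) -> dict:
    """Suggest per-tool settings following the table above.

    handoff: phone transfers and escalations.
    sensitive: payment auth, identity verification.
    """
    if handoff or sensitive or expected_seconds > 10:
        return {"execution_mode": "hold"}
    settings = {"execution_mode": "interactive"}
    if expected_seconds > 5:
        # Slow but still conversational: stay interactive and give the call
        # more headroom instead of going silent.
        settings["timeout_seconds"] = int(expected_seconds * 2)
    return settings
```

A slow database query of around 8 seconds comes back as interactive with a 16-second timeout, while a human transfer comes back as hold regardless of its expected duration.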
Interactive
Hold
While the tool is in flight:
- Agent stays silent (no reply.started).
- User speech doesn’t trigger replies. Utterances are added to context but the agent doesn’t respond until you send tool.result or reply.create.
- tool.result auto-fires the next reply. Don’t also send reply.create after.
During hold, the server does not emit transcript.user.delta or transcript.user in real time. Transcripts flush once the hold ends (tool.result or reply.create). Live captioning pauses during the hold; nothing is dropped.
Status updates during hold
Send reply.create with optional instructions to make the agent speak mid-hold without ending it. The hold stays in effect until you send tool.result.
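A sketch of the two messages involved, serialized as JSON from Python. The event names reply.create and tool.result and the instructions field come from this page; the "type", "call_id", and "result" keys are assumptions about the wire format, so check them against the API reference before relying on them.

```python
import json
from typing import Optional


def status_update(instructions: Optional[str] = None) -> str:
    """Serialize a reply.create message; instructions are optional."""
    message = {"type": "reply.create"}
    if instructions:
        message["instructions"] = instructions
    return json.dumps(message)


def finish_hold(call_id: str, result: dict) -> str:
    """Serialize the tool.result for a held call. It triggers the next reply
    on its own, so no reply.create should follow it."""
    return json.dumps({"type": "tool.result", "call_id": call_id, "result": result})
```

A typical sequence would be one or two status_update(...) messages while a transfer is pending, then a single finish_hold(...) once the work is done.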
Progressive tool reveal
For multi-step workflows (lookup → estimate → commit), don’t register all tools upfront. After each successful tool.result, send session.update adding the next phase’s tools, and update system_prompt to match.
Why: a tool that isn’t in the current list can’t be called, so the model can’t fabricate a commit before the prerequisite step has run. Smaller per-phase tool sets also raise selection accuracy.
Worked example: taxi booking
Until lookup_postcode returns a real postcode, the model has no book_ride tool. It can verbally promise a booking; it can’t create one.
Client-side wiring
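A minimal sketch of the wiring for the taxi example, assuming the client receives tool.call events as dicts and sends dicts back. The event keys ("name", "arguments", "call_id"), the session.update payload layout, the phase prompts, and the convention that a failed tool returns an "error" key are all assumptions for illustration. This version replaces the tool list per phase; append instead if earlier tools should stay callable.

```python
from typing import Callable

# Hypothetical phases: which prompt and tools become active after each
# tool succeeds. Only tool names are listed; definitions live in tool_defs.
NEXT_PHASE = {
    "lookup_postcode": {
        "system_prompt": "You have the pickup postcode. Call estimate_fare.",
        "tools": ["estimate_fare"],
    },
    "estimate_fare": {
        "system_prompt": "You have a fare. Get the rider's name, then call book_ride.",
        "tools": ["book_ride"],
    },
}


def on_tool_call(event: dict, handlers: dict, tool_defs: dict) -> list:
    """Run the local tool and return the messages to send, in order."""
    name = event["name"]
    handler: Callable[..., dict] = handlers[name]
    result = handler(**event["arguments"])
    outgoing = [{"type": "tool.result", "call_id": event["call_id"], "result": result}]
    phase = NEXT_PHASE.get(name)
    if phase and "error" not in result:
        # Reveal the next phase only after a successful result.
        outgoing.append({
            "type": "session.update",
            "session": {
                "system_prompt": phase["system_prompt"],
                "tools": [tool_defs[tool] for tool in phase["tools"]],
            },
        })
    return outgoing
```

Because book_ride only enters the tool list after estimate_fare succeeds, a failed lookup leaves the model with nothing to commit.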
Per-call state machine
The strongest form: every successful tool call is a state transition; each state owns a narrow prompt + small tool list.
| State | System prompt focus | Tools |
|---|---|---|
| s0_greet | “Get pickup postcode. Nothing else.” | lookup_postcode |
| s2_quoting | “Call estimate_fare. Filler only; no fare numbers.” | estimate_fare |
| s4_have_name | “Call book_ride with captured pickup, dropoff, name.” | book_ride |
| s5_booked | “Read back confirmation. Offer track/cancel.” | get_booking, track_driver, cancel_ride |
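The table can be held as plain data with a transition lookup. The states, prompts, and tool lists below are copied from the table; the TRANSITIONS map is an assumption, since the table lists only these four states and this sketch simply chains them in order.

```python
STATES = {
    "s0_greet": {
        "system_prompt": "Get pickup postcode. Nothing else.",
        "tools": ["lookup_postcode"],
    },
    "s2_quoting": {
        "system_prompt": "Call estimate_fare. Filler only; no fare numbers.",
        "tools": ["estimate_fare"],
    },
    "s4_have_name": {
        "system_prompt": "Call book_ride with captured pickup, dropoff, name.",
        "tools": ["book_ride"],
    },
    "s5_booked": {
        "system_prompt": "Read back confirmation. Offer track/cancel.",
        "tools": ["get_booking", "track_driver", "cancel_ride"],
    },
}

# Assumed transitions: (current state, tool that succeeded) -> next state.
TRANSITIONS = {
    ("s0_greet", "lookup_postcode"): "s2_quoting",
    ("s2_quoting", "estimate_fare"): "s4_have_name",
    ("s4_have_name", "book_ride"): "s5_booked",
}


def advance(state: str, tool_name: str, succeeded: bool) -> str:
    """Return the next state; stay put on failure or an unmapped tool."""
    if not succeeded:
        return state
    return TRANSITIONS.get((state, tool_name), state)
```

After each advance, the prompt and tools in STATES[new_state] are what you would push in the next session.update.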
Escape hatches
Real users go off-script. Two patterns, used together:
- Transition tools: revise_pickup, revise_dropoff, restart, end_call exposed in every state. Model picks the right escape; orchestrator rolls state back.
- respond_freely: a no-op tool in every state for tangential questions (“are you a real person?”). Model calls it instead of leaving the state.
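A sketch of how an orchestrator might apply both patterns. The escape tool names come from the list above; the ROLLBACK targets and the state names in them are placeholders, and tools such as revise_dropoff or end_call would need their own targets in a real flow.

```python
ESCAPE_TOOLS = ["revise_pickup", "revise_dropoff", "restart", "end_call", "respond_freely"]

# Assumed rollback targets; substitute the states your flow actually uses.
ROLLBACK = {
    "revise_pickup": "s0_greet",
    "restart": "s0_greet",
}


def tools_for(state_tools: list) -> list:
    """State-specific tools plus the escape hatches exposed in every state."""
    return state_tools + [tool for tool in ESCAPE_TOOLS if tool not in state_tools]


def handle_escape(state: str, tool_name: str) -> str:
    """respond_freely is a no-op, so the state is unchanged; transition
    tools roll back to their mapped state when one is defined."""
    if tool_name == "respond_freely":
        return state
    return ROLLBACK.get(tool_name, state)
```

Even with all five escapes added, a state with one to three tools of its own stays under the ten-tool guideline.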
Anti-fabrication clause
Gating makes hallucinations harmless (no real booking happens) but doesn’t suppress the spoken claim. Pair it with prompt wording that tells the agent not to claim an action it hasn’t completed through a tool.
Patterns by agent type
Customer support
“Call lookup_ticket for any ticket question. Only escalate_to_human after checking the ticket. Don’t promise outcomes you can’t verify.”
Booking / reservations
“Call check_availability. If open, offer it and book. If not, offer the next two times or the waitlist.”
The next tool depends on the prior result. Don’t expose both create_reservation and add_to_waitlist simultaneously. The model picks the wrong one ~30% of the time.
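A sketch of choosing the commit tool from the prior result, so that only one of the two is ever exposed. The "available" key on the check_availability result is an assumption about your own tool's return shape, and keeping check_availability in the list so the user can try another slot is a design choice of this example.

```python
def next_tools(availability: dict) -> list:
    """Pick the tool list to send in the next session.update, exposing
    exactly one commit tool based on the check_availability result."""
    if availability.get("available"):
        return ["check_availability", "create_reservation"]
    return ["check_availability", "add_to_waitlist"]
```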
Banking / account
“Call verify_identity. Never quote a balance or transaction you haven’t fetched. Never promise a dispute outcome; only the system can.”
The anti-fabrication clause matters most here. A bank agent inventing a balance is a P0.
Debugging
Tool never fires
- Description too vague. Name the user phrases that should trigger it.
- System prompt missing “default to calling” wording.
- Too many tools (>10). Drop or split via progressive reveal.
- Add a few-shot example to the system prompt. This is the strongest signal.
Wrong arguments
- Parameter description missing format/example. Add “(e.g. 2026-04-30)” to the description, or set examples/pattern. See Parameter hints.
- Free-text where you want fixed buckets. Use enum.
- User says the value multiple ways. Normalise in the description.
Agent invents a result
Most common cause: the model is being asked to do something after a tool result without having actually called the tool. Two fixes, used together:
- Progressive reveal: gate the commit tool behind the read tool.
- Anti-fabrication clause in the prompt (see above).
Tool fires repeatedly
- tool.result arriving while last_event is reply.started. Make sure your handler flushes on reply.done.
- Tool slower than timeout_seconds. Agent gets internal timeout, user tries again. Bump the timeout.