# Connect to Twilio

> Bridge Twilio phone calls to the Voice Agent API with zero transcoding.

Connect [Twilio Programmable Voice](https://www.twilio.com/docs/voice) to the Voice Agent API so callers can have real-time conversations with your agent over the phone. Twilio handles the phone network, your server bridges audio between Twilio Media Streams and the Voice Agent API, and the agent handles speech-to-speech.

Because Twilio's native G.711 μ-law format is byte-compatible with the Voice Agent API's `audio/pcmu` encoding, the server forwards audio as-is with **zero transcoding**.

```
Caller ↔ Twilio Media Streams ↔ Your server ↔ Voice Agent API
```

## Before you begin

To complete this guide, you need:

* An [AssemblyAI API key](https://www.assemblyai.com/dashboard/home) with Voice Agent access.
* A [Twilio account](https://www.twilio.com/try-twilio) with a phone number. You can [buy one](https://www.twilio.com/console/phone-numbers/search) if you don't have one.
* [Node.js 20+](https://nodejs.org/).
* [ngrok](https://ngrok.com/docs/getting-started/) for exposing your local server to the internet.

## Quickstart

Clone the example repo and get a working Twilio voice agent in minutes.

<Steps>
  <Step title="Clone the repo and install dependencies">
    ```bash theme={null}
    git clone https://github.com/AssemblyAI/voice-agent-api-twilio-example.git
    cd voice-agent-api-twilio-example
    npm install
    ```
  </Step>

  <Step title="Start an ngrok tunnel">
    Twilio needs a public URL to reach your local server. In a separate terminal, start ngrok:

    ```bash theme={null}
    ngrok http 3000
    ```

    Copy the `https://...ngrok.app` URL from the output.
  </Step>

  <Step title="Configure environment variables">
    ```bash theme={null}
    cp .env.example .env
    ```

    Open `.env` and fill in your keys:

    ```bash theme={null}
    ASSEMBLYAI_API_KEY=YOUR_API_KEY
    HOSTNAME=https://your-ngrok-domain.ngrok.app
    ```
  </Step>

  <Step title="Run the server">
    ```bash theme={null}
    npm run dev
    ```

    You should see `Server running on http://localhost:3000`.
  </Step>

  <Step title="Point Twilio at your server">
    In the [Twilio Console](https://console.twilio.com/), open your phone number's **Voice configuration** and set:

    * **A call comes in** → Webhook → `POST` → `https://<your-ngrok-domain>/twiml`
    * **Call status changes** (optional) → Webhook → `POST` → `https://<your-ngrok-domain>/call-status`
  </Step>

  <Step title="Call your number">
    Dial your Twilio number from any phone. You should hear the agent's greeting, then have a real-time conversation. Watch the server logs to see the event stream.
  </Step>
</Steps>

## How it works

When a call comes in, the following sequence happens:

1. A caller dials your Twilio number.
2. Twilio sends a webhook to `POST /twiml` on your server. The server returns TwiML containing a `<Stream>` element pointed at your WebSocket endpoint.
3. Twilio opens a Media Streams WebSocket and starts sending the caller's audio (G.711 μ-law, 8 kHz).
4. Your server opens a parallel WebSocket to the Voice Agent API and sends a `session.update` with the system prompt, voice, greeting, tools, and audio format set to `audio/pcmu`.
5. Once `session.ready` fires, the server forwards audio in both directions:
   * **Caller → Agent**: Each Twilio `media` event becomes an `input.audio` event.
   * **Agent → Caller**: Each `reply.audio` event becomes a Twilio `media` action.
6. When the caller barges in (`input.speech.started`), the server sends a Twilio `clear` action so the agent stops talking immediately.
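Steps 5 and 6 boil down to three small message translations. The following is a minimal sketch, assuming only the fields this guide shows elsewhere (`media.payload` and `media.track` on the Twilio side, `audio` and `data` on the Voice Agent API side). The function and type names are hypothetical and are not taken from the example repo.

```typescript
// Shapes are limited to the fields used in this guide; real events carry more.
type TwilioMedia = { event: "media"; media: { track: string; payload: string } };
type AgentEvent = { type: string; data?: string };
type TwilioAction =
  | { event: "media"; streamSid: string; media: { payload: string } }
  | { event: "clear"; streamSid: string };

// Caller → Agent: wrap the base64 μ-law payload without modifying it.
function toAgentInput(msg: TwilioMedia): { type: "input.audio"; audio: string } | null {
  if (msg.media.track !== "inbound") return null; // ignore non-caller audio
  return { type: "input.audio", audio: msg.media.payload };
}

// Agent → Caller: reply audio becomes a media action, barge-in becomes clear.
function toTwilioAction(event: AgentEvent, streamSid: string): TwilioAction | null {
  if (event.type === "reply.audio" && event.data) {
    return { event: "media", streamSid, media: { payload: event.data } };
  }
  if (event.type === "input.speech.started") {
    return { event: "clear", streamSid };
  }
  return null; // other event types need no Twilio action in this sketch
}
```

Keeping the translation in pure functions like these makes the bridge easy to unit test without opening either WebSocket.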

### Return TwiML with a stream

When Twilio receives a call, it hits your `/twiml` endpoint. The server responds with TwiML that opens a Media Streams WebSocket:

```typescript theme={null}
app.post("/twiml", (req, res) => {
  const callId = newCallId();
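  // HOSTNAME may be unset or include a scheme or trailing slash; keep only the host.
  const hostname = (process.env.HOSTNAME ?? "")
    .replace(/^https?:\/\//, "")
    .replace(/\/+$/, "");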
  const streamUrl = `wss://${hostname}/media-stream/${callId}`;

  res.type("text/xml").status(200).send(
    `<Response>
  <Connect>
    <Stream url="${streamUrl}" />
  </Connect>
</Response>`,
  );
});
```
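Because the stream URL is interpolated into XML, it can help to build the TwiML in one place and escape the attribute value. This is a minimal sketch under that assumption; `buildStreamTwiml` and `escapeXml` are hypothetical helper names, not part of the example repo or of Twilio's SDK.

```typescript
// Escape the five XML special characters so a value is safe inside an attribute.
function escapeXml(value: string): string {
  return value
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;");
}

// Build the TwiML response for a given public hostname and call ID.
function buildStreamTwiml(hostname: string, callId: string): string {
  // Accept "https://host/", "http://host", or a bare "host".
  const host = hostname.replace(/^https?:\/\//, "").replace(/\/+$/, "");
  if (!host) throw new Error("HOSTNAME is empty");
  const streamUrl = `wss://${host}/media-stream/${encodeURIComponent(callId)}`;
  return [
    "<Response>",
    "  <Connect>",
    `    <Stream url="${escapeXml(streamUrl)}" />`,
    "  </Connect>",
    "</Response>",
  ].join("\n");
}
```

Failing fast on an empty hostname surfaces a missing `.env` value at request time instead of producing a `wss:///media-stream/...` URL that Twilio cannot reach.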

### Connect to the Voice Agent API

When Twilio opens the Media Streams WebSocket, the server creates a parallel connection to the Voice Agent API and sends the session configuration:

```typescript theme={null}
// Assumes the `ws` package, whose constructor accepts a `headers` option.
import WebSocket from "ws";

const aaiWs = new WebSocket("wss://agents.assemblyai.com/v1/realtime", {
  headers: { Authorization: `Bearer ${process.env.ASSEMBLYAI_API_KEY}` },
});

// Wait for the socket to open; sending while it is still connecting throws.
aaiWs.on("open", () => {
  aaiWs.send(JSON.stringify({
    type: "session.update",
    session: {
      system_prompt: "You are a helpful voice assistant.",
      greeting: "Hi, thanks for calling. How can I help?",
      input: { type: "audio", format: { encoding: "audio/pcmu" } },
      output: {
        type: "audio",
        voice: "ivy",
        format: { encoding: "audio/pcmu" },
      },
      tools: [/* your tool definitions */],
    },
  }));
});
```

<Note>
  Both `input` and `output` use `audio/pcmu` (G.711 μ-law at 8 kHz) to match Twilio's native codec. This means no transcoding or resampling is needed.
</Note>
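G.711 μ-law stores one 8-bit sample per byte at 8,000 samples per second, so the size of a base64 payload maps directly to its playback duration. The sketch below shows that arithmetic, which can be handy for logging or pacing. It assumes standard padded base64, and the helper names are illustrative.

```typescript
const SAMPLE_RATE_HZ = 8000; // G.711 μ-law: 8,000 samples per second
const BYTES_PER_SAMPLE = 1;  // one 8-bit sample per byte

// Decoded byte count of a padded base64 string, without decoding it.
function base64ByteLength(payload: string): number {
  if (payload.length % 4 !== 0) throw new Error("expected padded base64");
  const padding = payload.endsWith("==") ? 2 : payload.endsWith("=") ? 1 : 0;
  return (payload.length / 4) * 3 - padding;
}

// Playback duration of a μ-law payload in milliseconds.
function payloadDurationMs(payload: string): number {
  const samples = base64ByteLength(payload) / BYTES_PER_SAMPLE;
  return (samples * 1000) / SAMPLE_RATE_HZ;
}

// A 160-byte payload encodes to 216 base64 characters ending in "==".
const examplePayload = "A".repeat(214) + "==";
console.log(payloadDurationMs(examplePayload)); // 20
```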

### Bridge audio between Twilio and the Voice Agent API

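Once `session.ready` fires, forward audio payloads in both directions. In the snippets that follow, `tw` stands for the server's wrapper around the Twilio Media Streams connection: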

```typescript expandable theme={null}
// Twilio → Voice Agent API
tw.on("media", (msg) => {
  if (msg.media.track !== "inbound") return;
  aaiWs.send(JSON.stringify({
    type: "input.audio",
    audio: msg.media.payload,
  }));
});

// Voice Agent API → Twilio
aaiWs.on("message", (data) => {
  const event = JSON.parse(data.toString());

  if (event.type === "reply.audio" && event.data) {
    tw.send({
      event: "media",
      streamSid: tw.streamSid,
      media: { payload: event.data },
    });
  }
});
```
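The snippet above takes for granted that `session.ready` has already fired, but Twilio can start sending `media` events before the Voice Agent API session is ready. One way to handle that window is a small gate that holds a bounded amount of caller audio and flushes it once the session is ready. This is a sketch of that idea only: `ReadyGate` is a hypothetical class, and whether to buffer or drop early audio is a design choice this guide does not prescribe.

```typescript
// Holds caller audio until the agent session is ready, then forwards it in order.
class ReadyGate {
  private ready = false;
  private pending: string[] = [];
  private readonly forward: (payload: string) => void;
  private readonly maxPending: number;

  // maxPending caps memory use; 50 frames is about 1 s if frames are 20 ms (assumed).
  constructor(forward: (payload: string) => void, maxPending = 50) {
    this.forward = forward;
    this.maxPending = maxPending;
  }

  // Call for every inbound Twilio media payload.
  push(payload: string): void {
    if (this.ready) {
      this.forward(payload);
      return;
    }
    this.pending.push(payload);
    if (this.pending.length > this.maxPending) this.pending.shift(); // drop oldest
  }

  // Call once when `session.ready` arrives.
  open(): void {
    this.ready = true;
    for (const payload of this.pending) this.forward(payload);
    this.pending = [];
  }
}
```

In the bridge, `forward` would wrap the `aaiWs.send(...)` call for `input.audio`, `push` would be called from the Twilio `media` handler, and `open` from the `session.ready` branch of the message handler.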

### Handle barge-in

When the caller starts speaking while the agent is talking, clear Twilio's buffered audio so the agent stops immediately. This check belongs in the same Voice Agent API `message` handler shown above:

```typescript theme={null}
if (event.type === "input.speech.started") {
  tw.send({ event: "clear", streamSid: tw.streamSid });
}
```

## Make outbound calls

The example repo also supports outbound calling. Set the Twilio credentials in `.env`:

```bash theme={null}
TWILIO_ACCOUNT_SID=YOUR_TWILIO_ACCOUNT_SID
TWILIO_AUTH_TOKEN=YOUR_TWILIO_AUTH_TOKEN
TWILIO_PHONE_NUMBER=+15551234567
TARGET_PHONE_NUMBER=+15557654321
```

With the server still running, open a new terminal and run:

```bash theme={null}
npm run outbound
```

This places a call from your Twilio number to the target. Twilio fetches `/outbound-twiml`, which connects the call to `/outbound-stream`. The agent speaks first using the configured `greeting`.
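The `npm run outbound` script itself is not reproduced here. As a rough sketch of what placing such a call involves, the following assembles and validates the `to`, `from`, and `url` values from the environment variables above. `buildOutboundCall` is a hypothetical helper, and the E.164 check is an added safeguard, not something the example repo is known to do.

```typescript
// E.164: a plus sign, then up to 15 digits, the first of which is not zero.
const E164 = /^\+[1-9]\d{1,14}$/;

type OutboundEnv = {
  HOSTNAME: string;
  TWILIO_PHONE_NUMBER: string;
  TARGET_PHONE_NUMBER: string;
};

// Hypothetical helper: builds the arguments for a Twilio call-creation request.
function buildOutboundCall(env: OutboundEnv): { to: string; from: string; url: string } {
  for (const key of ["TWILIO_PHONE_NUMBER", "TARGET_PHONE_NUMBER"] as const) {
    if (!E164.test(env[key])) throw new Error(`${key} must be in E.164 format`);
  }
  // Accept HOSTNAME with or without a scheme or trailing slash.
  const host = env.HOSTNAME.replace(/^https?:\/\//, "").replace(/\/+$/, "");
  if (!host) throw new Error("HOSTNAME is empty");
  return {
    to: env.TARGET_PHONE_NUMBER,
    from: env.TWILIO_PHONE_NUMBER,
    url: `https://${host}/outbound-twiml`, // Twilio fetches TwiML from here
  };
}
```

With Twilio's Node helper library, an object of this shape could be passed to `client.calls.create(...)`, where `client` is created from `TWILIO_ACCOUNT_SID` and `TWILIO_AUTH_TOKEN`.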

## Add custom tools

The example includes one tool, `generate_random_number`. To add your own tools:

1. Define the tool in the `TOOLS` array in `src/bot.ts`:

```typescript theme={null}
export const TOOLS = [
  {
    type: "function",
    name: "generate_random_number",
    description: "Generate a random integer between min and max (inclusive).",
    parameters: {
      type: "object",
      properties: {
        min: { type: "number", description: "Minimum value (inclusive)." },
        max: { type: "number", description: "Maximum value (inclusive)." },
      },
      required: ["min", "max"],
    },
  },
];
```

2. Add the handler in `runTool`:

```typescript theme={null}
export async function runTool(
  name: string,
  args: Record<string, any>,
): Promise<string> {
  switch (name) {
    case "generate_random_number": {
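      const min = Math.ceil(args.min);
      const max = Math.floor(args.max);
      // Reject an empty range instead of returning a value outside it.
      if (min > max) return JSON.stringify({ error: "min must be <= max" });
      const result = Math.floor(Math.random() * (max - min + 1)) + min;
      return JSON.stringify({ result, min: args.min, max: args.max });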
    }
    default:
      return JSON.stringify({ error: `Unknown tool: ${name}` });
  }
}
```

3. When the agent calls a tool, the Voice Agent API sends a `tool.call` event. The server runs the tool and sends back a `tool.result` event with the same `call_id`. The agent then continues the conversation naturally.
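The round trip in step 3 can be sketched as a single function that turns a `tool.call` event into a `tool.result` event. Only the event types and `call_id` come from this guide. The `name`, `args`, and `result` field names are assumptions made for illustration, so check them against the tools reference before relying on them.

```typescript
// Only `type` and `call_id` are documented in this guide; other fields are assumed.
type ToolCallEvent = {
  type: "tool.call";
  call_id: string;
  name: string;                   // assumed field name
  args: Record<string, unknown>;  // assumed field name
};
type ToolResultEvent = { type: "tool.result"; call_id: string; result: string };

// Pair a tool's output with the call_id of the request that triggered it.
function buildToolResult(callId: string, result: string): ToolResultEvent {
  return { type: "tool.result", call_id: callId, result };
}

// Run the tool and always produce a result, even if the tool throws.
async function handleToolCall(
  event: ToolCallEvent,
  runTool: (name: string, args: Record<string, unknown>) => Promise<string>,
): Promise<ToolResultEvent> {
  try {
    return buildToolResult(event.call_id, await runTool(event.name, event.args));
  } catch (err) {
    // Report the failure so the agent is not left waiting for a reply.
    const message = err instanceof Error ? err.message : String(err);
    return buildToolResult(event.call_id, JSON.stringify({ error: message }));
  }
}
```

Catching errors here matters because the agent waits for a `tool.result` before it continues speaking.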

For more on tool calling, see [Add tools to your agent](/voice-agents/voice-agent-api/tools/overview).

## Troubleshooting

* **Call connects but no audio**: Check that `HOSTNAME` matches your ngrok domain and that your server is reachable. Watch ngrok's request log for the incoming Media Streams WebSocket.
* **`session.error` with `invalid_value` on the `voice` field**: Voice names are case-sensitive. Use lowercase (`ivy`, `claire`, `dawn`, etc.). See [Choose a voice](/voice-agents/voice-agent-api/voices) for available voices.
* **Greeting plays but later replies don't**: Make sure your tool handler always sends a `tool.result` back. The agent waits for it before continuing.
* **Audio is choppy or echoey**: Twilio handles echo cancellation on the carrier side. If you hear echo during testing, the likely cause is your phone's speakerphone, so switch to a headset.

## Next steps

* [Configure your agent](/voice-agents/voice-agent-api/session-configuration): Customize the system prompt, greeting, and turn detection.
* [Choose a voice](/voice-agents/voice-agent-api/voices): Pick a voice for your agent.
* [Add tools to your agent](/voice-agents/voice-agent-api/tools/overview): Give your agent the ability to call functions.
* [Audio format](/voice-agents/voice-agent-api/audio-format): Learn more about supported encodings.
* [Twilio Media Streams](https://www.twilio.com/docs/voice/media-streams): Twilio's documentation on Media Streams.
