Voice Agent API

Connect to the Voice Agent API to run a real-time voice conversation. The client streams PCM16 audio to the server and receives the agent’s spoken response (also PCM16), along with transcripts, tool calls, and lifecycle events.

See the Voice Agent API overview for the full event flow and a runnable quickstart.

Handshake

WSS
wss://agents.assemblyai.com/v1/voice

Headers

Authorization string Required

Pass your API key as a Bearer token in the `Authorization` header on the WebSocket upgrade request. For browser apps (which can't set custom headers on WebSockets), generate a [temporary token](/docs/api-reference/voice-agent-api/voice-agent-web-socket/generate-voice-agent-token) and pass it via the `token` query parameter instead. See [Browser integration](/docs/voice-agents/voice-agent-api/browser-integration).

Query parameters

token string Required

Temporary authentication token for client-side connections. Generate one with GET /v1/token on your server and pass it here so you don’t expose your permanent API key in the browser. Each token is one-time use.
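
The two authentication paths described above can be sketched as a small helper that returns the URL and headers for the upgrade request. This is only an illustration: `build_handshake` is a hypothetical name, and the sketch assumes exactly one of the two credentials is supplied per connection.

```python
from urllib.parse import urlencode

VOICE_AGENT_URL = "wss://agents.assemblyai.com/v1/voice"


def build_handshake(api_key=None, token=None):
    """Return (url, headers) for the WebSocket upgrade request.

    Hypothetical helper: expects exactly one of api_key or token.
    """
    if (api_key is None) == (token is None):
        raise ValueError("provide exactly one of api_key or token")
    if api_key is not None:
        # Server side: API key as a Bearer token in the Authorization header.
        return VOICE_AGENT_URL, {"Authorization": f"Bearer {api_key}"}
    # Browser-style: temporary token in the `token` query parameter, no header.
    return f"{VOICE_AGENT_URL}?{urlencode({'token': token})}", {}
```

The returned pair can be handed to whichever WebSocket client you use; how that client accepts extra headers varies by library.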

Send

sendSessionUpdate object Required

Configure the session. Send immediately on connect — before session.ready — to set the system prompt, greeting, voice, tools, and turn detection behavior. Can also be sent mid-conversation to update any of these fields.
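
As a rough sketch, a `session.update` message might be assembled as below. Only the event name comes from this page. The `type` discriminator, the nested `session` object, and every key inside it (`system_prompt`, `greeting`, `voice`, `tools`) are assumptions made for illustration, so check the message schema for the real field names.

```python
import json


def session_update(system_prompt, greeting=None, voice=None, tools=None):
    """Serialize a session.update message (field names are ASSUMED)."""
    # ASSUMPTION: "type", "session", and the keys below are illustrative only.
    session = {"system_prompt": system_prompt}
    optional = {"greeting": greeting, "voice": voice, "tools": tools}
    # Leave out anything the caller did not set.
    session.update({k: v for k, v in optional.items() if v is not None})
    return json.dumps({"type": "session.update", "session": session})
```

Omitting unset fields keeps the same helper usable for mid-conversation updates that change only one setting, assuming the server treats missing fields as unchanged.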

OR
sendSessionResume object Required

Resume a previous session using the session_id from a prior session.ready. Preserves conversation context across dropped connections. Sessions are held for 30 seconds after every disconnection.
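
One way to act on the 30-second hold is to decide, at reconnect time, whether to resume or start over. The sketch below is hypothetical: the `session.resume` type string, the `session_id` and `session` keys, and the helper name are assumptions, and the caller is expected to record the disconnect time itself.

```python
import json
import time

RESUME_WINDOW_SECONDS = 30  # from the text: sessions are held 30 s after a disconnect


def first_message_on_reconnect(session_id, disconnected_at, now=None, fresh_config=None):
    """Pick the first message to send after reconnecting (field names ASSUMED)."""
    now = time.monotonic() if now is None else now
    if session_id is not None and now - disconnected_at < RESUME_WINDOW_SECONDS:
        # Still inside the hold window: try to pick up the old conversation.
        return json.dumps({"type": "session.resume", "session_id": session_id})
    # Window has passed (or no session was ever established): configure a new one.
    return json.dumps({"type": "session.update", "session": fresh_config or {}})
```

The local clock check is only a shortcut; the server remains the authority on whether a session can still be resumed.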

OR
sendInputAudio object Required

Stream a chunk of user audio to the agent. Only send input.audio after session.ready. See Audio format for the expected encoding (PCM16 mono 24kHz, base64).
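
To make the encoding concrete: PCM16 mono at 24 kHz is 48,000 bytes per second, so a 50 ms chunk is 2,400 bytes before base64. The sketch below splits raw audio into chunks and wraps each one as an `input.audio` message. The `type` and `audio` keys and the 50 ms chunk size are assumptions chosen for illustration, not values taken from this page.

```python
import base64
import json

SAMPLE_RATE_HZ = 24_000
BYTES_PER_SAMPLE = 2  # PCM16, mono


def input_audio_events(pcm16: bytes, chunk_ms: int = 50):
    """Yield JSON input.audio messages for raw PCM16 bytes (field names ASSUMED)."""
    # 24,000 samples/s * 2 bytes * chunk_ms / 1000 is always a whole number of samples.
    chunk_bytes = SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * chunk_ms // 1000
    for start in range(0, len(pcm16), chunk_bytes):
        chunk = pcm16[start:start + chunk_bytes]
        yield json.dumps({
            "type": "input.audio",
            "audio": base64.b64encode(chunk).decode("ascii"),
        })
```

For example, one second of audio yields 20 messages at the default chunk size. Remember to hold these back until `session.ready` has arrived.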

OR
sendToolResult object Required

Return a tool result to the agent. Send this inside your reply.done handler — not immediately on tool.call. See Tool calling.

Receive

receiveSessionReady object Required

Session is established. Save session_id for reconnection and start streaming audio.

OR
receiveSessionUpdated object Required

Sent after a session.update is applied successfully.

OR
receiveSessionError object Required

A session- or protocol-level error occurred.

OR
receiveInputSpeechStarted object Required

Turn detection determined the user has started speaking.

OR
receiveInputSpeechStopped object Required

Turn detection determined the user has stopped speaking.

OR
receiveTranscriptUserDelta object Required

Partial transcript of the user’s utterance, updating in real-time.

OR
receiveTranscriptUser object Required

Final transcript of the user's utterance.

OR
receiveReplyStarted object Required

Agent has begun generating a response.

OR
receiveReplyAudio object Required

A chunk of the agent’s spoken response (base64 PCM16). Decode and play immediately. See Audio format for playback guidance.
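
Decoding a `reply.audio` chunk could look like the sketch below, which turns the base64 payload into signed 16-bit samples ready for an audio output buffer. Two things are assumed rather than stated on this page: that the payload sits in an `audio` field, and that samples are little-endian.

```python
import array
import base64
import sys


def decode_reply_audio(event: dict) -> array.array:
    """Return the chunk's samples as signed 16-bit integers (field name ASSUMED)."""
    raw = base64.b64decode(event["audio"])  # ASSUMPTION: payload key is "audio"
    samples = array.array("h")              # "h" = signed 16-bit integer
    samples.frombytes(raw)
    if sys.byteorder == "big":
        # ASSUMPTION: PCM16 arrives little-endian; swap on big-endian hosts.
        samples.byteswap()
    return samples
```

Playback itself is left out because it depends on your audio library; the decoded samples are 24 kHz mono if the output matches the input format.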

OR
receiveTranscriptAgent object Required

Full text of the agent's response, delivered after all audio for the reply has been sent.

OR
receiveReplyDone object Required

Agent has finished speaking. Send any accumulated tool.result events here.

OR
receiveToolCall object Required

Agent wants to invoke a registered tool. Execute the tool, then send the result with tool.result after reply.done fires.
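
The ordering rule (run the tool on `tool.call`, but send the result only once `reply.done` fires) can be sketched as a small buffer. Everything beyond the three event names is hypothetical: the `ToolResultBuffer` class, and the `name`, `arguments`, `call_id`, and `result` keys are assumptions made for illustration.

```python
import json


class ToolResultBuffer:
    """Run tools as calls arrive, but hold results until reply.done."""

    def __init__(self, tools, send):
        self.tools = tools  # tool name -> Python callable
        self.send = send    # callable that takes one JSON string
        self.pending = []

    def handle(self, event: dict) -> None:
        if event["type"] == "tool.call":
            # ASSUMPTION: "name", "arguments", and "call_id" are illustrative keys.
            result = self.tools[event["name"]](**event.get("arguments", {}))
            self.pending.append(
                {"type": "tool.result", "call_id": event["call_id"], "result": result}
            )
        elif event["type"] == "reply.done":
            # Flush everything accumulated during this reply, in call order.
            for message in self.pending:
                self.send(json.dumps(message))
            self.pending.clear()
```

Feed every received event to `handle`; with `send` bound to your WebSocket's send function, nothing goes out until `reply.done` arrives.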