Browser integration
Connect browser-based apps to the Voice Agent API using a temporary token.
Connect a browser to the Voice Agent API in two steps:
- Your server calls GET /v1/token with your API key to mint a short-lived temporary token.
- Your browser opens the WebSocket with ?token=<token>, so no API key is exposed.
Your API key never leaves your server. Each token is single-use — it starts exactly one session, and all usage is attributed to the key that generated it.
Browsers provide built-in acoustic echo cancellation through getUserMedia, so browser-based clients work hands-free without headphones. If you’re developing on a laptop, the browser integration is the recommended starting point.
1. Generate a token on your server
Call GET /v1/token with your API key in the Authorization header. Pick an expires_in_seconds short enough to limit replay risk (60–300s is a good default) and an optional max_session_duration_seconds to cap the session length.
expires_in_seconds must be between 1 and 600. max_session_duration_seconds must be between 60 and 10800 (defaults to 10800, the 3-hour maximum session duration).
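As a rough sketch of the server side, the following validates both parameters against the limits above and requests a token. The host `https://api.example.com`, passing the options as query parameters, the bare-key `Authorization` header value, and the `token` field in the response are all assumptions made for illustration; check them against the API reference before relying on them.

```javascript
// Hypothetical base URL; substitute the real Voice Agent API host.
const API_BASE = "https://api.example.com";

// Limits stated in this section.
const LIMITS = {
  expires_in_seconds: [1, 600],
  max_session_duration_seconds: [60, 10800],
};

function checkRange(name, value) {
  const [min, max] = LIMITS[name];
  if (!Number.isFinite(value) || value < min || value > max) {
    throw new RangeError(`${name} must be between ${min} and ${max}`);
  }
}

// Builds the token URL. Sending the options as query parameters is an assumption.
function buildTokenUrl({ expiresInSeconds = 60, maxSessionDurationSeconds } = {}) {
  checkRange("expires_in_seconds", expiresInSeconds);
  const url = new URL("/v1/token", API_BASE);
  url.searchParams.set("expires_in_seconds", String(expiresInSeconds));
  if (maxSessionDurationSeconds !== undefined) {
    checkRange("max_session_duration_seconds", maxSessionDurationSeconds);
    url.searchParams.set(
      "max_session_duration_seconds",
      String(maxSessionDurationSeconds)
    );
  }
  return url.toString();
}

// Server-side only: the API key must never be sent to the browser.
async function mintToken(apiKey, options) {
  const res = await fetch(buildTokenUrl(options), {
    headers: { Authorization: apiKey }, // header value format is an assumption
  });
  if (!res.ok) throw new Error(`Token request failed with HTTP ${res.status}`);
  const body = await res.json();
  return body.token; // response field name is an assumption
}
```

Validating the ranges locally means a misconfigured value fails fast on your server instead of surfacing as a rejected request.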
Token expiry and failure modes
If a token is missing, expired, or invalid, the server rejects the handshake with an UNAUTHORIZED error (close code 1008). In browsers, this may surface as a close event with code 1006 and no body — you won’t receive a session.error event. Always fetch a fresh token immediately before each connection attempt.
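Because a rejected handshake can arrive as either 1008 or a bare 1006, it can help to classify the close event rather than wait for a session.error. The sketch below is one way to do that; the label strings and the `onResult` callback are hypothetical, and a 1006 before the socket opens can also mean an ordinary network failure.

```javascript
// Classifies a WebSocket close event.
// `opened` is true once the socket's "open" event has fired.
function classifyClose(code, opened) {
  if (code === 1008) return "unauthorized";
  // 1006 before "open": the handshake failed, possibly due to a rejected token.
  if (code === 1006 && !opened) return "handshake-failed";
  return opened ? "dropped" : "unknown";
}

// Attaches the classifier to a socket; `onResult` is a hypothetical callback.
function watchSocket(socket, onResult) {
  let opened = false;
  socket.addEventListener("open", () => {
    opened = true;
  });
  socket.addEventListener("close", (event) => {
    onResult(classifyClose(event.code, opened));
  });
}
```

On "unauthorized" or "handshake-failed", fetch a fresh token before trying again.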
If the WebSocket drops mid-session and you need to reconnect with session.resume, you’ll need a new token for the new WebSocket — the original token can’t be reused.
2. Connect from the browser with the token
Fetch the token from your server, then open the WebSocket with ?token=<token>. No Authorization header is needed.
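A minimal browser-side sketch of this step follows. The WebSocket endpoint `wss://api.example.com/v1/voice-agent` and the backend route `/api/voice-token` (assumed to return `{ "token": "..." }`) are placeholders, not documented values.

```javascript
// Hypothetical WebSocket endpoint; substitute the real Voice Agent API URL.
const WS_ENDPOINT = "wss://api.example.com/v1/voice-agent";

// Appends the temporary token as the `token` query parameter (URL-encoded).
function buildSocketUrl(token, endpoint = WS_ENDPOINT) {
  const url = new URL(endpoint);
  url.searchParams.set("token", token);
  return url.toString();
}

// Fetches a token from your own backend, then opens the WebSocket.
// "/api/voice-token" is a hypothetical route on your server.
async function openAgentSocket(tokenRoute = "/api/voice-token") {
  const res = await fetch(tokenRoute);
  if (!res.ok) throw new Error(`Token route failed with HTTP ${res.status}`);
  const { token } = await res.json();
  return new WebSocket(buildSocketUrl(token));
}
```

Using `URL.searchParams` rather than string concatenation keeps the token correctly encoded if it contains characters such as `+` or `/`.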
Fetch a fresh token for every new WebSocket connection. Tokens are single-use — a dropped connection needs a new token to reconnect (including when using session.resume).
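One way to enforce the single-use rule is to make token fetching part of every connection attempt. In this sketch, `fetchToken` and `openSocket` are hypothetical functions you supply, and the attempt limit is an arbitrary choice.

```javascript
// Opens a connection, fetching a fresh token for every attempt.
// `fetchToken` resolves to a token string; `openSocket(token)` resolves to a
// connected socket or rejects. Both are injected and hypothetical.
async function connectWithRetry(fetchToken, openSocket, maxAttempts = 3) {
  let lastError;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      const token = await fetchToken(); // never reuse a token from an earlier attempt
      return await openSocket(token);
    } catch (err) {
      lastError = err; // remember the failure and try again with a new token
    }
  }
  throw lastError;
}
```

The same helper works for a session.resume reconnect, since the resumed session still needs a new WebSocket and therefore a new token. In production you would likely add a delay between attempts.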
3. Browser quickstart
This complete working example captures microphone audio, streams it to the Voice Agent API, and plays back the agent’s response. It requires two files: an HTML page and an AudioWorklet processor.
AudioWorklet processors must be loaded from a URL (audioContext.audioWorklet.addModule(url)), so you need at least two files. This example won’t work in a single-file environment like CodePen or JSFiddle without modifications. Use a local server (npx serve .) or a framework with static file support.
Create pcm-processor.js in the same directory as your HTML file:
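A minimal sketch of what this processor might contain, assuming the API expects 16-bit signed PCM (an assumption; confirm the audio format in the API reference). The processor name `pcm-processor` and the choice to post raw buffers to the main thread are illustrative.

```javascript
// pcm-processor.js (sketch). Assumes 16-bit signed PCM output.

// Converts Float32 samples in [-1, 1] to 16-bit signed integers.
function floatTo16BitPCM(input) {
  const output = new Int16Array(input.length);
  for (let i = 0; i < input.length; i++) {
    const s = Math.max(-1, Math.min(1, input[i])); // clamp out-of-range samples
    output[i] = s < 0 ? s * 0x8000 : s * 0x7fff;
  }
  return output;
}

// AudioWorkletProcessor and registerProcessor exist only inside the
// worklet scope, so the class is defined behind a guard.
if (typeof AudioWorkletProcessor !== "undefined") {
  class PcmProcessor extends AudioWorkletProcessor {
    process(inputs) {
      const channel = inputs[0]?.[0]; // first input, first channel
      if (channel) {
        const pcm = floatTo16BitPCM(channel);
        // Transfer the buffer to the main thread without copying.
        this.port.postMessage(pcm.buffer, [pcm.buffer]);
      }
      return true; // keep the processor alive
    }
  }
  registerProcessor("pcm-processor", PcmProcessor);
}
```

Each `process` call handles one small block of frames, so the main thread may want to batch several messages before sending them over the WebSocket.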
Then create your HTML file:
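As a rough sketch, the script inside that page might look like the following. It assumes the worklet is registered as `pcm-processor` and posts 16-bit PCM buffers, and that agent audio arrives as 16-bit PCM at 24 kHz; the `onChunk` callback and the way audio messages are encoded on the WebSocket are left to you, since they depend on the API's message format.

```javascript
// Standard MediaTrackConstraints; echoCancellation enables the browser's AEC.
const MIC_CONSTRAINTS = { audio: { echoCancellation: true, channelCount: 1 } };

// Converts 16-bit PCM into Float32 samples for playback.
function pcm16ToFloat32(pcm) {
  const out = new Float32Array(pcm.length);
  for (let i = 0; i < pcm.length; i++) out[i] = pcm[i] / 0x8000;
  return out;
}

// Starts microphone capture. `onChunk` (hypothetical) receives each
// ArrayBuffer of PCM posted by the worklet.
async function startCapture(onChunk) {
  const audioContext = new AudioContext({ sampleRate: 24000 });
  await audioContext.audioWorklet.addModule("pcm-processor.js");
  const stream = await navigator.mediaDevices.getUserMedia(MIC_CONSTRAINTS);
  const source = audioContext.createMediaStreamSource(stream);
  const worklet = new AudioWorkletNode(audioContext, "pcm-processor");
  worklet.port.onmessage = (event) => onChunk(event.data);
  // The worklet is not connected to the destination, so the mic is not played back.
  source.connect(worklet);
  return { audioContext, stream };
}

// Schedules one chunk of agent audio (an Int16Array) at `startAt` seconds
// and returns the time at which the next chunk should start.
function playChunk(audioContext, pcm, startAt) {
  const samples = pcm16ToFloat32(pcm);
  const buffer = audioContext.createBuffer(1, samples.length, audioContext.sampleRate);
  buffer.copyToChannel(samples, 0);
  const node = audioContext.createBufferSource();
  node.buffer = buffer;
  node.connect(audioContext.destination);
  node.start(startAt);
  return startAt + buffer.duration;
}
```

When calling `playChunk`, pass `Math.max(audioContext.currentTime, nextStart)` as `startAt` so chunks play back to back without overlapping. Most browsers also require a user gesture, such as a button click, before audio can start.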
The key line is new AudioContext({ sampleRate: 24000 }). Browsers default to the device sample rate (usually 48 kHz), so without this you’d need to manually resample both mic input and playback output. Forcing 24 kHz on the context avoids this entirely.