Audio & volume - AssemblyAI

A voice agent’s audio configuration covers the input encoding (microphone), the output encoding (agent speech), and the playback volume. These fields live on the agent under input.format, output.format, and output.volume. You set them when you create or update the agent, or inline over the WebSocket via session.update. This page covers how to configure the encoding and volume. For how to stream and play the audio bytes, see Stream audio.

Encoding

The encoding determines the sample rate and bit depth. Input and output encodings are independent and can differ. Both default to audio/pcm (24 kHz) if omitted.

Encoding	Sample rate	Best for
`audio/pcm`	24,000 Hz	Default. Highest quality, ideal for browser and app use.
`audio/pcmu`	8,000 Hz	Telephony (G.711 μ-law).
`audio/pcma`	8,000 Hz	Telephony (G.711 A-law).

For telephony, use audio/pcmu or audio/pcma (8 kHz) to match the phone network and avoid resampling. See Connect to Twilio for a full phone integration.

Set format.encoding under both input and output. You can also pass an explicit sample_rate inside format:

curl -X POST https://agents.assemblyai.com/v1/agents \
  -H "Authorization: $ASSEMBLYAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Support Assistant",
    "system_prompt": "You are a friendly support agent. Keep replies under two sentences.",
    "voice": { "voice_id": "ivy" },
    "input":  { "format": { "encoding": "audio/pcmu", "sample_rate": 8000 } },
    "output": { "format": { "encoding": "audio/pcmu", "sample_rate": 8000 } }
  }'

Field	Type	Required	Notes
`input.format.encoding`	string	No	`audio/pcm`, `audio/pcmu`, or `audio/pcma`. Default `audio/pcm`.
`output.format.encoding`	string	No	Same values as input. Default `audio/pcm`.
`format.sample_rate`	integer	No	Sample rate in Hz. Determined by the encoding if omitted.
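
The same request body can be assembled in code before it is sent. The following is a minimal Python sketch, not an official client: the helper names audio_format and agent_audio_fields are hypothetical, and the sample-rate mapping is copied from the encoding table above. It only builds and checks the JSON fields locally and does not call the API.

```python
import json

# Encodings and their native sample rates, copied from the encoding table.
NATIVE_SAMPLE_RATES = {
    "audio/pcm": 24000,
    "audio/pcmu": 8000,
    "audio/pcma": 8000,
}


def audio_format(encoding="audio/pcm", sample_rate=None):
    """Build a `format` object (hypothetical helper, not part of any SDK)."""
    if encoding not in NATIVE_SAMPLE_RATES:
        raise ValueError(f"unsupported encoding: {encoding!r}")
    fmt = {"encoding": encoding}
    # sample_rate is optional; when omitted, the encoding determines it.
    if sample_rate is not None:
        fmt["sample_rate"] = int(sample_rate)
    return fmt


def agent_audio_fields(input_format, output_format):
    """Return the input/output portion of an agent create or update body."""
    return {
        "input": {"format": input_format},
        "output": {"format": output_format},
    }


# Input and output encodings are independent: mu-law in, default PCM out.
payload = agent_audio_fields(
    audio_format("audio/pcmu", 8000),
    audio_format("audio/pcm"),
)
print(json.dumps(payload, indent=2))
```

Merge the resulting dictionary into the rest of the agent body (name, system_prompt, voice) before sending it with the HTTP client of your choice.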

Volume

Adjust the playback volume of the agent’s speech via output.volume, which accepts a number from 0 (silent) to 100 (loudest). If omitted or set to null, the voice plays at its native level.

curl -X PUT https://agents.assemblyai.com/v1/agents/$AGENT_ID \
  -H "Authorization: $ASSEMBLYAI_API_KEY" -H "Content-Type: application/json" \
  -d '{ "output": { "volume": 60 } }'

Field	Type	Required	Notes
`output.volume`	number \| null	No	`0` (silent) to `100` (loudest). `null` plays at native level.
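
A client may want to check the range locally before sending an update. The sketch below is one way to do that in Python; volume_update is a hypothetical helper name, and the only rules it encodes are the ones in the table above (a number from 0 to 100, or null for the native level).

```python
import json


def volume_update(volume):
    """Build an update body for output.volume (hypothetical helper).

    Pass a number from 0 to 100, or None to request the native level.
    """
    if volume is not None:
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(volume, bool) or not isinstance(volume, (int, float)):
            raise TypeError("volume must be a number or None")
        if not 0 <= volume <= 100:
            raise ValueError("volume must be between 0 and 100")
    return {"output": {"volume": volume}}


print(json.dumps(volume_update(60)))    # {"output": {"volume": 60}}
print(json.dumps(volume_update(None)))  # {"output": {"volume": null}}
```

Python's None serializes to JSON null, which matches the value the table describes for native-level playback.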

When configured inline via session.update, output.voice and output.format are immutable after session.ready and must be set on your first update. output.volume is the exception: it can be changed mid-session, and the new value applies to subsequent reply.audio chunks.
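
The ordering rule above can be enforced on the client side. The following Python sketch tracks whether session.ready has been seen and refuses to put output.format into a later update. The message envelope used here ({"type": "session.update", "session": {...}}) is an assumption made for illustration, as are the class and method names; check the WebSocket reference for the exact message shape. The sketch only builds JSON strings and does not open a connection.

```python
import json


class SessionAudio:
    """Client-side guard for inline audio settings (hypothetical class)."""

    def __init__(self):
        self.ready = False

    def mark_ready(self):
        # Call this when the server sends session.ready.
        self.ready = True

    def update(self, *, output_format=None, volume=None):
        """Return a session.update message as a JSON string.

        The envelope keys "type" and "session" are assumed, not confirmed.
        In this sketch, a field left as None is simply omitted.
        """
        if self.ready and output_format is not None:
            raise RuntimeError("output.format is immutable after session.ready")
        output = {}
        if output_format is not None:
            output["format"] = output_format
        if volume is not None:
            output["volume"] = volume
        return json.dumps({"type": "session.update", "session": {"output": output}})


session = SessionAudio()
# First update: the only chance to set the output format.
first = session.update(output_format={"encoding": "audio/pcmu"}, volume=60)
session.mark_ready()
# Later update: volume can still change mid-session.
later = session.update(volume=30)
print(first)
print(later)
```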
