The encoding determines the sample rate and bit depth of the audio going in (microphone) and coming out (agent speech). Set it on the agent under input.format and output.format when you create or update the agent, or inline via session.update. Most agents can leave it alone: the default is the highest-quality option, and you mainly need to change it for telephony. For how to stream and play the audio bytes, see Stream audio.

Input and output encodings are independent and can differ. Both default to audio/pcm (24 kHz) if omitted.
| Encoding | Sample rate | Best for |
|---|---|---|
| audio/pcm | 24,000 Hz | Default. Highest quality; ideal for browser and app use. |
| audio/pcmu | 8,000 Hz | Telephony (G.711 μ-law). |
| audio/pcma | 8,000 Hz | Telephony (G.711 A-law). |
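When sizing the audio chunks you stream, it helps to convert an encoding and a chunk duration into a byte count. The sketch below is a minimal shell helper under stated assumptions: G.711 (audio/pcmu, audio/pcma) carries 8 bits (1 byte) per sample, and we additionally assume mono audio and 16-bit (2-byte) samples for audio/pcm. The table above does not state the audio/pcm bit depth, so confirm it in Stream audio. The function name bytes_per_chunk is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: number of bytes in one chunk of audio.
# Assumptions (not from the API reference):
#   - mono audio
#   - audio/pcm is 16-bit (2 bytes per sample) at 24,000 Hz
#   - audio/pcmu and audio/pcma are G.711: 1 byte per sample at 8,000 Hz
bytes_per_chunk() {
  local encoding="$1" ms="$2" rate bytes_per_sample
  case "$encoding" in
    audio/pcm)             rate=24000; bytes_per_sample=2 ;;
    audio/pcmu|audio/pcma) rate=8000;  bytes_per_sample=1 ;;
    *) echo "unsupported encoding: $encoding" >&2; return 1 ;;
  esac
  # samples in the chunk = rate * ms / 1000, then multiply by the sample size
  echo $(( rate * ms / 1000 * bytes_per_sample ))
}

bytes_per_chunk audio/pcm 20    # prints 960
bytes_per_chunk audio/pcmu 20   # prints 160
```

A 20 ms chunk is a common framing choice for telephony audio; the 8 kHz encodings also need a third of the samples and, under the assumptions above, a sixth of the bytes that audio/pcm does for the same duration.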
For telephony, use audio/pcmu or audio/pcma (8 kHz) to match the phone network and avoid resampling. See Connect to Twilio for a full phone integration.

Set format.encoding under input, output, or both. You can also pass an explicit sample_rate inside format:
```bash
curl -X POST https://agents.assemblyai.com/v1/agents \
  -H "Authorization: $ASSEMBLYAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Support Assistant",
    "system_prompt": "You are a friendly support agent. Keep replies under two sentences.",
    "voice": { "voice_id": "ivy" },
    "input":  { "format": { "encoding": "audio/pcmu", "sample_rate": 8000 } },
    "output": { "format": { "encoding": "audio/pcmu", "sample_rate": 8000 } }
  }'
```
| Field | Type | Required | Notes |
|---|---|---|---|
| input.format.encoding | string | No | audio/pcm, audio/pcmu, or audio/pcma. Default: audio/pcm. |
| output.format.encoding | string | No | Same values as input. Default: audio/pcm. |
| format.sample_rate | integer | No | Sample rate in Hz. Determined by the encoding if omitted. |
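Because input and output are configured independently and sample_rate falls back to the encoding's rate, a small helper can build the format fields of the request body shown above without repeating the rate by hand. This is a sketch: the function names format_object and agent_formats_body are hypothetical, and the fallback rates are copied from the encoding table.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print a "format" object for one direction.
# If no sample rate is passed, use the rate listed for the encoding
# (24,000 Hz for audio/pcm, 8,000 Hz for audio/pcmu and audio/pcma).
format_object() {
  local encoding="$1" rate="${2:-}" default_rate
  case "$encoding" in
    audio/pcm)             default_rate=24000 ;;
    audio/pcmu|audio/pcma) default_rate=8000 ;;
    *) echo "unsupported encoding: $encoding" >&2; return 1 ;;
  esac
  printf '{ "encoding": "%s", "sample_rate": %d }\n' "$encoding" "${rate:-$default_rate}"
}

# Hypothetical helper: print a JSON object holding only the input and output
# formats. The two encodings are separate arguments because they are independent.
agent_formats_body() {
  local in_fmt out_fmt
  in_fmt=$(format_object "$1") || return 1
  out_fmt=$(format_object "$2") || return 1
  printf '{ "input": { "format": %s }, "output": { "format": %s } }\n' "$in_fmt" "$out_fmt"
}

# Telephony: G.711 μ-law in both directions.
agent_formats_body audio/pcmu audio/pcmu
```

The helpers cover only the format fields; to create an agent, merge their output with the other keys from the curl example (name, system_prompt, voice).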
Volume is set separately; see Volume. If you configure the format inline via session.update, set output.format in your first update: it is immutable after session.ready.
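When the format is set inline, ordering matters: output.format has to go in your first session.update. The sketch below only illustrates that ordering. The message envelope (a type field plus a session object) and the function name first_session_update are hypothetical and not taken from the API reference; check the session.update documentation for the real message shape before relying on it.

```bash
#!/usr/bin/env bash
# Sketch only. The envelope { "type": ..., "session": { ... } } is a
# HYPOTHETICAL message shape, not the documented one.
# What the sketch does reflect from the docs: output.format belongs in the
# FIRST session.update, because it cannot change after session.ready.
first_session_update() {
  local out_encoding="${1:-audio/pcm}"   # default encoding is audio/pcm
  case "$out_encoding" in
    audio/pcm|audio/pcmu|audio/pcma) ;;
    *) echo "unsupported encoding: $out_encoding" >&2; return 1 ;;
  esac
  # sample_rate is omitted, so it is determined by the encoding
  printf '{ "type": "session.update", "session": { "output": { "format": { "encoding": "%s" } } } }\n' "$out_encoding"
}

first_session_update audio/pcmu
```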