Connect your own LLM

By default, a voice agent uses AssemblyAI’s managed conversational model — you don’t configure anything. To run the agent on a different model, set the llm field on the agent to your own OpenAI-compatible chat-completions endpoint. AssemblyAI calls that endpoint at runtime to generate every reply.

When to use this. Reach for a custom LLM when you need a specific model, your own fine-tune, or your own provider account and billing. If you just want a different frontier model without managing an endpoint, point llm at the LLM Gateway instead.

Connect a model

Add an llm array to a create or update request. Each entry needs a base_url, a model, and an api_key:

curl -X POST https://agents.assemblyai.com/v1/agents \
  -H "Authorization: $ASSEMBLYAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Support Assistant",
    "system_prompt": "You are a friendly support agent. Keep replies under two sentences.",
    "voice": { "voice_id": "ivy" },
    "llm": [
      {
        "base_url": "https://api.openai.com/v1",
        "model": "gpt-4o-mini",
        "api_key": "sk-..."
      }
    ]
  }'

Field	Type	Required	Notes
`base_url`	string	Yes	HTTPS base URL of the OpenAI-compatible endpoint. Must be `https` and a public host. The agent calls `POST {base_url}/chat/completions`.
`model`	string	Yes	Model name sent in the chat-completions request body.
`api_key`	string	Yes	Key for your endpoint. Write-only — encrypted at rest and never returned in any response.
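
For callers who prefer Python to curl, the same create request can be sketched with the standard library. This is a minimal sketch, not an official client: the function name `build_create_request` and its parameters are hypothetical, and only the URL, headers, and body fields are taken from the curl example above.

```python
import json
import urllib.request

# URL copied from the curl example; everything else here is illustrative.
AGENTS_URL = "https://agents.assemblyai.com/v1/agents"


def build_create_request(assemblyai_key, name, system_prompt, voice_id,
                         base_url, model, llm_api_key):
    """Return an unsent Request that mirrors the curl create call."""
    payload = {
        "name": name,
        "system_prompt": system_prompt,
        "voice": {"voice_id": voice_id},
        # llm is a list holding one entry with the three required fields.
        "llm": [{"base_url": base_url, "model": model, "api_key": llm_api_key}],
    }
    return urllib.request.Request(
        AGENTS_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": assemblyai_key,
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` would send it. Building the request separately from sending it keeps the payload easy to inspect before any key leaves your machine.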

Update or rotate the model

Send a new llm array on PUT /v1/agents/{id}. The whole llm entry is replaced, so to rotate the key, resend the entry with the new api_key:

curl -X PUT https://agents.assemblyai.com/v1/agents/$AGENT_ID \
  -H "Authorization: $ASSEMBLYAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "llm": [
      { "base_url": "https://api.openai.com/v1", "model": "gpt-4.1", "api_key": "sk-new..." }
    ]
  }'

To switch the agent back to the managed model, send "llm": [].
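
The two update cases (replace the entry, or reset to the managed model) differ only in the request body, which a small helper can make explicit. This is a sketch under assumptions: `llm_update_body` is a hypothetical name, and the local check that all three fields are present is inferred from the field table rather than from documented server behavior.

```python
import json

REQUIRED_FIELDS = {"base_url", "model", "api_key"}


def llm_update_body(entry=None):
    """Return the JSON body for PUT /v1/agents/{id}.

    entry=None produces {"llm": []}, which switches back to the managed model.
    Otherwise the single entry replaces the existing llm configuration.
    """
    if entry is None:
        return json.dumps({"llm": []})
    missing = REQUIRED_FIELDS - entry.keys()
    if missing:
        # Assumed client-side guard; the API's own error format is not shown here.
        raise ValueError(f"llm entry is missing: {sorted(missing)}")
    return json.dumps({"llm": [entry]})
```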

Use the LLM Gateway

You don’t need your own provider account to use a frontier model. Point base_url at AssemblyAI’s LLM Gateway and pass your AssemblyAI API key — you get Claude, GPT, Gemini, and more through one endpoint, billed on your AssemblyAI account:

curl -X POST https://agents.assemblyai.com/v1/agents \
  -H "Authorization: $ASSEMBLYAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Gateway Assistant",
    "system_prompt": "You are a friendly support agent. Keep replies under two sentences.",
    "voice": { "voice_id": "ivy" },
    "llm": [
      {
        "base_url": "https://llm-gateway.assemblyai.com/v1",
        "model": "claude-sonnet-4-6",
        "api_key": "'"$ASSEMBLYAI_API_KEY"'"
      }
    ]
  }'

See Available models for the full list. Use the EU host https://llm-gateway.eu.assemblyai.com/v1 for EU workloads.
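
Choosing between the default and EU Gateway hosts can be reduced to one flag when building the llm entry. A minimal sketch: the helper name `gateway_llm_entry` and the `eu` parameter are hypothetical, while both host URLs come from this section.

```python
GATEWAY_DEFAULT = "https://llm-gateway.assemblyai.com/v1"
GATEWAY_EU = "https://llm-gateway.eu.assemblyai.com/v1"


def gateway_llm_entry(model, assemblyai_key, eu=False):
    """Build an llm entry that points at the LLM Gateway.

    The Gateway is authenticated with the AssemblyAI API key itself,
    so the same key is used for api_key.
    """
    return {
        "base_url": GATEWAY_EU if eu else GATEWAY_DEFAULT,
        "model": model,
        "api_key": assemblyai_key,
    }
```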

Requirements & behavior

OpenAI-compatible. The endpoint must accept POST /chat/completions in the OpenAI schema.
Streaming. Realtime voice needs token streaming, so the model must support streamed chat completions.
One config. llm is a list, but only a single entry is accepted today (fallbacks aren’t supported yet).
HTTPS + public host. Non-https URLs and private/loopback hosts are rejected.
Reads mask the key. GET/list responses return only base_url and model — never api_key.
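
The HTTPS and public-host rule can be checked locally before a request is sent. The sketch below only approximates that rule: `preflight_base_url` is a hypothetical helper, the exact server-side validation is not documented here, and DNS names are not resolved, so a hostname that points at a private address would still pass.

```python
import ipaddress
from urllib.parse import urlsplit


def preflight_base_url(base_url):
    """Return a list of problems; an empty list means these local checks passed."""
    problems = []
    parts = urlsplit(base_url)
    if parts.scheme != "https":
        problems.append("scheme must be https")
    host = parts.hostname or ""
    if not host:
        problems.append("missing host")
    elif host == "localhost" or host.endswith(".localhost"):
        problems.append("loopback host")
    else:
        try:
            ip = ipaddress.ip_address(host)
        except ValueError:
            ip = None  # a DNS name; deliberately not resolved in this sketch
        if ip is not None and not ip.is_global:
            problems.append("private or loopback address")
    return problems
```

For example, `preflight_base_url("http://10.0.0.5/v1")` reports both the scheme and the address, while `preflight_base_url("https://api.openai.com/v1")` returns an empty list.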

Latency and reliability now depend on your endpoint. A slow or rate-limited model shows up directly as reply latency in the conversation. See Best practices for tuning.
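
Since reply latency tracks the endpoint, it can be worth confirming that a model really streams before attaching it. The sketch below parses the server-sent-event lines that OpenAI-style streamed chat completions return (`data: {...}` chunks ending with `data: [DONE]`) and yields the text deltas. The function name `iter_text_deltas` is hypothetical, and how the agent itself consumes the stream is not described in this section.

```python
import json


def iter_text_deltas(lines):
    """Yield text fragments from streamed chat-completion lines.

    `lines` is any iterable of str or bytes, for example an HTTP response body
    read line by line from a request sent with "stream": true.
    """
    for raw in lines:
        line = raw.decode("utf-8") if isinstance(raw, bytes) else raw
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and non-data lines
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(data)
        for choice in chunk.get("choices", []):
            text = (choice.get("delta") or {}).get("content")
            if text:
                yield text
```

Timing the gap between sending the request and receiving the first yielded fragment gives a rough view of the delay a caller would experience on each reply.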
