Troubleshooting

Common issues and fixes when using the LLM Gateway.

What to log for support

Every LLM Gateway response includes a request_id — a unique identifier for that specific request. Log this ID for every call, not just when something goes wrong. When you reach out to support@assemblyai.com, including the request_id lets us find the exact request in our logs in seconds.

At minimum, capture the following for every request:

  • request_id from the response body
  • The model parameter used
  • The API region (US: llm-gateway.assemblyai.com, EU: llm-gateway.eu.assemblyai.com)
  • A timestamp for when the request was sent
  • The full HTTP status code and response body when a non-2xx response is returned

A minimal logging example:

import requests
import time

response = requests.post(
    "https://llm-gateway.assemblyai.com/v1/chat/completions",
    headers={"authorization": "<YOUR_API_KEY>"},
    json={
        "model": "claude-sonnet-4-6",
        "messages": [{"role": "user", "content": "What is the capital of France?"}],
        "max_tokens": 1000,
    },
)

result = response.json()
log_entry = {
    "timestamp": time.time(),
    "region": "us",
    "model": "claude-sonnet-4-6",
    "status_code": response.status_code,
    "request_id": result.get("request_id"),
    "error": result.get("error"),
}
print(log_entry)

Authentication errors (401 / 403)

Symptom: The API responds with 401 Unauthorized or 403 Forbidden.

{
  "error": {
    "code": 401,
    "message": "Unauthorized - Invalid or missing API key"
  }
}

Causes:

  • API key is missing, malformed, or expired.
  • API key is from a different account or region.
  • The Authorization header is misspelled (e.g. Authorisation) or missing entirely.

Fixes:

  • Confirm your API key on the API Keys page.
  • Pass the key in the Authorization header — not as a query parameter and not prefixed with Bearer.
  • If you’re using EU data residency, make sure the key was generated for the EU region. See Cloud endpoints and data residency.
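The header rules above can be enforced before a request ever leaves your code. A small hypothetical helper, following the lowercase authorization header used in the logging example (the content-type header is an assumption, not a gateway requirement):

```python
def build_headers(api_key: str) -> dict:
    """Build request headers for the LLM Gateway.

    The raw key goes in the authorization header: no "Bearer " prefix,
    and never in the URL as a query parameter.
    """
    if not api_key or api_key.startswith("Bearer "):
        raise ValueError("pass the raw API key, without a Bearer prefix")
    return {"authorization": api_key, "content-type": "application/json"}
```

Failing fast on a Bearer-prefixed key turns a confusing 401 at request time into an obvious error at startup.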

Bad request (400)

Symptom: The API responds with 400 Bad Request.

{
  "error": {
    "code": 400,
    "message": "Invalid request: missing required field 'model'"
  }
}

Causes:

  • A required field is missing (model, plus either messages or prompt).
  • The model value is not a supported model parameter — see Available models.
  • max_tokens is outside the valid range or exceeds the model’s context window.
  • A field is the wrong type (e.g. messages sent as a string instead of an array).

Fixes:

  • Validate your request payload against the Basic chat completions reference.
  • Echo the full error message — it includes the specific field that failed validation.
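A pre-flight check covering the causes above can catch most 400s before the request is sent. The checks below are illustrative, not the gateway's actual validation rules:

```python
def validate_payload(payload: dict) -> list:
    """Return a list of problems with a chat-completions payload.

    An empty list means the payload passes these (illustrative) checks:
    required fields, messages type, and a sane max_tokens.
    """
    problems = []
    if "model" not in payload:
        problems.append("missing required field 'model'")
    if "messages" not in payload and "prompt" not in payload:
        problems.append("provide either 'messages' or 'prompt'")
    if "messages" in payload and not isinstance(payload["messages"], list):
        problems.append("'messages' must be an array of role/content objects")
    max_tokens = payload.get("max_tokens")
    if max_tokens is not None and (not isinstance(max_tokens, int) or max_tokens < 1):
        problems.append("'max_tokens' must be a positive integer")
    return problems
```

Run it before every request in development; in production, log any problems alongside the 400 response body so the two can be compared.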

Rate limit exceeded (429)

Symptom: The API responds with 429 Too Many Requests.

Cause: You exceeded the per-model rate limit within a 60-second window. Each model has its own limit.

Fixes:

  • Read the rate limit headers on every response (X-RateLimit-Limit, X-RateLimit-Remaining, X-RateLimit-Reset) to back off gracefully. See Rate limits for the full header reference.
  • Implement exponential backoff with jitter when you receive a 429.
  • Consider specifying fallback models so traffic spills over to a different model when the primary is rate-limited.
  • If you need a higher rate limit, contact support.
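The first two fixes can be combined: honor the server's reset hint when it is present, and fall back to full-jitter exponential backoff when it isn't. This sketch assumes X-RateLimit-Reset carries the seconds remaining in the window; confirm the exact semantics against the Rate limits reference:

```python
import random


def wait_after_429(headers: dict, attempt: int, cap: float = 30.0) -> float:
    """Pick how long to sleep after a 429 response.

    Prefers the X-RateLimit-Reset header (assumed here to be seconds until
    the window resets); otherwise uses full-jitter exponential backoff,
    capped so repeated failures never wait unboundedly long.
    """
    reset = headers.get("X-RateLimit-Reset")
    if reset is not None:
        return float(reset)
    return random.uniform(0, min(cap, 2 ** attempt))
```

Full jitter (a uniform draw up to the exponential step) spreads retries from many clients across time instead of having them all retry in lockstep.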

Model not found (404)

Symptom: The API responds with 404 Not Found and an error mentioning the model.

Causes:

  • The model value is misspelled or has been deprecated.
  • The model isn’t available in the region you’re calling. For example, OpenAI models are only available in the US region — see Cloud endpoints and data residency.

Fixes:

  • Double-check the exact model parameter against Available models.
  • If you need EU data residency, switch to an EU-supported model (most Anthropic Claude and Google Gemini models).

Server errors (5xx)

Symptom: The API responds with 500, 502, 503, or 504.

Causes:

  • Transient issues on AssemblyAI’s side or with the upstream model provider.
  • The upstream provider returned a timeout or unavailable response.

Fixes:

  • Retry with exponential backoff and jitter. Most 5xx errors are transient.
  • Check the AssemblyAI Status page for ongoing incidents.
  • If the error persists, contact support with the request_id, the model used, the timestamp, and the full error response body.
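Since most 5xx errors are transient, a bounded retry loop usually resolves them without human intervention. In this sketch, send is a placeholder for your own zero-argument request function (e.g. a lambda wrapping requests.post):

```python
import random
import time


def call_with_retries(send, max_attempts: int = 4, base: float = 0.5, cap: float = 10.0):
    """Retry send() on transient 5xx responses with exponential backoff and jitter.

    send is any zero-argument callable returning an object with a
    .status_code attribute. Non-5xx responses are returned immediately.
    """
    for attempt in range(max_attempts):
        response = send()
        if response.status_code not in (500, 502, 503, 504):
            return response
        # Full jitter: sleep a random amount up to the capped exponential step.
        time.sleep(random.uniform(0, min(cap, base * 2 ** attempt)))
    return response  # still failing after the last attempt: log request_id and escalate
```

Keep max_attempts small; if four attempts fail, the problem is unlikely to be transient and belongs in a support ticket with the request_id.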

Streamed responses don’t appear

Symptom: You set stream: true but receive a single non-streamed response — or no response at all.

Causes:

  • Streaming is currently supported on OpenAI models only. Other providers ignore the stream flag and return a regular response.
  • The HTTP client isn’t reading the response body as a stream of server-sent events (SSE).

Fixes:

  • Switch to an OpenAI model if you need streaming; other providers return a complete, non-streamed response even when stream: true is set.
  • Make sure your HTTP client reads the body incrementally as server-sent events rather than buffering the full response before returning it.

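One way to consume the stream with requests, assuming the gateway emits OpenAI-style "data: {...}" lines terminated by a "data: [DONE]" sentinel (verify both against the streaming reference before relying on them):

```python
import json


def iter_sse_chunks(lines):
    """Yield parsed JSON chunks from an OpenAI-style SSE line stream.

    Assumes each event is a single "data: {...}" line and the stream ends
    with "data: [DONE]". Blank keep-alive lines and comments are skipped.
    """
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            return
        yield json.loads(data)


# Usage sketch (stream=True makes requests read the body incrementally):
# response = requests.post(url, headers=headers,
#                          json={**payload, "stream": True}, stream=True)
# for chunk in iter_sse_chunks(response.iter_lines(decode_unicode=True)):
#     print(chunk["choices"][0]["delta"].get("content", ""), end="")
```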
Unexpected output or quality issues

Symptom: The model returns content you didn’t expect — wrong format, wrong language, hallucinations, or refusals.

Fixes:

  • Capture the full request payload (model, messages, parameters), the full response, and the request_id. Send all three to support@assemblyai.com — quality issues are difficult to diagnose without the exact prompt.
  • For structured output, use Structured outputs with a JSON schema rather than prompting for JSON in free text.
  • For malformed JSON, enable Post-processing to automatically repair responses.
  • Try a different model — quality varies. See the LMArena scores for a comparison.

Contacting support

If you’ve worked through the steps above and still need help, email support@assemblyai.com with:

  • The request_id from the failing response (or several, for intermittent issues)
  • The model parameter used
  • The API region (US or EU)
  • A timestamp for when the request was sent
  • The HTTP status code and full error response body
  • A minimal reproducible example of the request payload (with your API key redacted)
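Before pasting a reproducible example into an email, mask the credentials. A small hypothetical helper; the header names to treat as sensitive are assumptions based on the examples above:

```python
def redact(headers: dict) -> dict:
    """Return a copy of request headers with credential values masked.

    The set of sensitive header names here is an assumption; extend it to
    match whatever credentials your client actually sends.
    """
    sensitive = {"authorization", "x-api-key"}
    return {
        k: ("<REDACTED>" if k.lower() in sensitive else v)
        for k, v in headers.items()
    }
```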