The Claude API returns standard HTTP status codes. Some errors are transient and should be retried with backoff. Others indicate a permanent problem in your request that will fail on every retry. Mixing these up is one of the most common, and costliest, mistakes in Claude API integration.
This reference covers every error code you'll encounter in production, with the cause, the correct handling strategy, and working Python code for production-grade retry logic. See also the Claude API Enterprise Guide and our breakdown of Claude rate limiting and scaling strategies.
Authentication & Authorisation Errors
401 Unauthorized (authentication_error): Missing, invalid, or revoked API key. The x-api-key header is absent or malformed, or the key has been deleted from your Anthropic console.
Verify the API key exists in your Anthropic console. Confirm it's being passed in the correct header. Ensure no whitespace or newline characters are included in the key string. Rotate if compromised.
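These checks can be automated at startup. A minimal sketch, assuming the key lives in the ANTHROPIC_API_KEY environment variable and that Anthropic keys keep their current sk-ant- prefix (adjust the pattern if the key format changes):

```python
import os
import re


def load_api_key(env_var: str = "ANTHROPIC_API_KEY") -> str:
    """Load and sanity-check the API key before constructing a client.

    Catches the most common 401 causes locally: a missing variable,
    stray whitespace or newlines from copy-paste, and values that do
    not look like an Anthropic key at all.
    """
    raw = os.environ.get(env_var)
    if raw is None:
        raise RuntimeError(f"{env_var} is not set")
    key = raw.strip()  # remove accidental whitespace/newline characters
    if key != raw:
        print(f"warning: {env_var} contained leading/trailing whitespace")
    if not re.fullmatch(r"sk-ant-[A-Za-z0-9_-]+", key):
        raise RuntimeError(f"{env_var} does not look like an Anthropic API key")
    return key
```

Failing fast here turns a confusing mid-request 401 into an immediate, obvious startup error.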
403 Forbidden (permission_error): The API key is valid but doesn't have permission to access the requested resource. Common when using a workspace-scoped key that lacks access to a specific model, or when accessing beta features not enabled for your account tier.
Check the permissions assigned to your API key in the Anthropic console. For beta features, ensure you've opted in via the correct beta header and that your account tier supports the feature. Contact Anthropic support if permissions appear correct but the error persists.
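The beta opt-in mentioned above is a per-request header. A sketch of building it (the feature name in the usage comment is illustrative, not a real beta identifier; check Anthropic's documentation for the exact string):

```python
def beta_headers(*features: str) -> dict:
    """Build the opt-in header for beta features.

    Multiple beta features are comma-separated inside a single
    anthropic-beta header rather than sent as separate headers.
    """
    return {"anthropic-beta": ",".join(features)}


# With the official Python SDK this is passed per request, e.g.:
# client.messages.create(..., extra_headers=beta_headers("example-beta-2026-01-01"))
```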
Request & Validation Errors
400 Bad Request (invalid_request_error): The request is malformed. Common causes: invalid JSON, missing required fields (model, messages, max_tokens), incorrect message role sequence, invalid tool definition schema, or a parameter value outside the allowed range.
Read the error message: it almost always tells you exactly which field is wrong. Validate your JSON. Check that messages alternate user/assistant correctly. Confirm max_tokens doesn't exceed the model's limit. Review tool definitions against the tool use schema.
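Many of these 400s can be caught locally before the request leaves your process. A minimal pre-flight check (deliberately stricter than the API in places, e.g. it flags consecutive same-role turns, which some API versions tolerate):

```python
from typing import List


def validate_request(payload: dict) -> List[str]:
    """Catch common 400 causes locally before sending.

    Returns a list of problems; an empty list means the request
    shape looks OK (the API may still reject it for other reasons).
    """
    problems = []
    # Required top-level fields for the Messages API
    for field in ("model", "messages", "max_tokens"):
        if field not in payload:
            problems.append(f"missing required field: {field}")
    messages = payload.get("messages", [])
    if messages and messages[0].get("role") != "user":
        problems.append("messages must start with a user turn")
    for i, msg in enumerate(messages):
        role = msg.get("role")
        if role not in ("user", "assistant"):
            problems.append(f"messages[{i}]: invalid role {role!r}")
        elif i > 0 and messages[i - 1].get("role") == role:
            problems.append(f"messages[{i}]: consecutive {role!r} turns")
    return problems
```

Running this in a unit test against your request-building code is cheaper than discovering schema drift in production.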
404 Not Found (not_found_error): The requested resource doesn't exist. Most often caused by an invalid model identifier string, e.g. passing claude-sonnet instead of claude-sonnet-4-6. Also occurs with incorrect API endpoint paths.
Verify the exact model string from Anthropic's model documentation. Model IDs include the full version string. Store model names as constants and validate them at startup, not at runtime.
# Correct model identifiers (March 2026)
MODELS = {
    "opus": "claude-opus-4-6",
    "sonnet": "claude-sonnet-4-6",
    "haiku": "claude-haiku-4-5-20251001",
}

# Validate at startup: fail fast if the key is missing
import os
import anthropic

client = anthropic.Anthropic(api_key=os.environ["ANTHROPIC_API_KEY"])
Rate Limit Errors
429 Too Many Requests (rate_limit_error): You've exceeded your account's rate limit: requests per minute (RPM), tokens per minute (TPM), or tokens per day (TPD). The response headers include retry-after, indicating when you can try again. Rate limits vary by model and account tier.
Implement exponential backoff with jitter. Respect the retry-after header. For sustained high-volume workloads, use the Batch API: 50% cheaper, processed asynchronously, and not counted against your real-time RPM/TPM limits. Contact Anthropic to increase your rate limit tier if you consistently hit limits.
import random
import time

import anthropic


def call_claude_with_retry(client, max_retries=5, **kwargs):
    """Production retry logic for Claude API calls."""
    for attempt in range(max_retries):
        try:
            return client.messages.create(**kwargs)
        except anthropic.RateLimitError as e:
            if attempt == max_retries - 1:
                raise
            # Respect the retry-after response header if present
            retry_after = e.response.headers.get("retry-after")
            if retry_after:
                wait_time = float(retry_after)
            else:
                # Exponential backoff with jitter
                wait_time = (2 ** attempt) + random.uniform(0, 1)
            print(f"Rate limited. Waiting {wait_time:.1f}s "
                  f"(attempt {attempt + 1}/{max_retries})")
            time.sleep(wait_time)
        except anthropic.APIStatusError as e:
            if e.status_code >= 500:
                # Server errors (500, 529): retry with backoff
                if attempt == max_retries - 1:
                    raise
                wait_time = (2 ** attempt) + random.uniform(0, 1)
                time.sleep(wait_time)
            else:
                # Client errors (400, 401, 403, 404): don't retry
                raise
Server Errors
500 Internal Server Error (api_error): An internal error on Anthropic's side, not caused by your request. Typically transient and often resolves within seconds. Can occur during infrastructure events or rare model processing failures.
Retry with exponential backoff. After 3 retries, log the error and alert if the pattern persists. Check the Anthropic status page (status.anthropic.com) if you see sustained 500s. Never surface raw 500 errors to end users.
529 Overloaded (overloaded_error): Anthropic's API is temporarily overloaded. This is not a rate limit on your account; it's a capacity constraint on the platform side. Occurs during peak demand periods or after a major feature launch when usage spikes.
Use longer backoff intervals than for standard 5xx errors. Consider switching to a less-loaded model tier (e.g., Haiku during Sonnet overload periods) if your use case allows. Queue requests and process them over a longer window rather than hammering the API.
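The two tactics above, longer backoff and model fallback, can be sketched as small helpers. The fallback chain below reuses the model IDs from this guide's constants and is illustrative; choose an order that matches your quality requirements:

```python
import random
from typing import Optional

# Illustrative fallback order: preferred model first, cheaper tier after
FALLBACK_CHAIN = ["claude-sonnet-4-6", "claude-haiku-4-5-20251001"]


def overload_wait(attempt: int, base: float = 5.0, cap: float = 120.0) -> float:
    """Backoff for 529s: a longer base than the ~1s used for ordinary
    5xx errors, exponential growth, full jitter, capped."""
    return min(cap, base * (2 ** attempt)) * random.uniform(0.5, 1.0)


def next_model(current: str) -> Optional[str]:
    """Next model tier to try after a sustained overload, or None."""
    try:
        i = FALLBACK_CHAIN.index(current)
    except ValueError:
        return None
    return FALLBACK_CHAIN[i + 1] if i + 1 < len(FALLBACK_CHAIN) else None
```

In the retry loop shown earlier, a 529 would trigger overload_wait instead of the standard backoff, and switch to next_model(current) after the wait budget is exhausted.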
Content & Context Errors
400 invalid_request_error (context length exceeded): The total tokens in your request (system prompt + messages + tools + max_tokens reserved for output) exceed the model's context window. All current Claude models (Opus, Sonnet, and Haiku) have a 200K-token context window.
Implement context window management: truncate or summarise earlier messages, chunk large documents, use RAG to retrieve only relevant passages rather than passing full documents, or use prompt caching for the static portions of your context.
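A minimal sketch of the truncation strategy, using a rough ~4-characters-per-token heuristic (an assumption; for exact counts use the API's token-counting endpoint):

```python
from typing import List


def estimate_tokens(text: str) -> int:
    """Rough heuristic: about 4 characters per token for English text."""
    return max(1, len(text) // 4)


def trim_history(messages: List[dict], budget_tokens: int) -> List[dict]:
    """Keep the first message plus the most recent messages that fit
    within budget_tokens (the first turn often carries task framing)."""
    if not messages:
        return []
    first = messages[0]
    used = estimate_tokens(first["content"])
    tail = []
    # Walk backwards so the most recent context survives truncation
    for msg in reversed(messages[1:]):
        cost = estimate_tokens(msg["content"])
        if used + cost > budget_tokens:
            break
        tail.append(msg)
        used += cost
    return [first] + list(reversed(tail))
```

Set budget_tokens well below the 200K window so there is always room for the system prompt, tool definitions, and the reserved max_tokens.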
Truncated response (stop_reason: "max_tokens"): Not technically an error. Claude's response was cut off because it reached the max_tokens limit. The response is valid but incomplete. Check response.stop_reason on every API call.
If truncation is unacceptable for your use case, increase max_tokens and/or implement continuation logic: detect stop_reason == "max_tokens", then send a follow-up request asking Claude to continue from where it left off.
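One way to implement that continuation is assistant prefill: resend the conversation with the truncated text as a trailing assistant turn, and the model resumes that turn. A sketch (note the rstrip: the API rejects trailing whitespace on a final assistant message with a 400):

```python
from typing import List


def continuation_request(messages: List[dict], partial_text: str) -> List[dict]:
    """Build the follow-up request after a max_tokens truncation.

    Appends the partial answer as a trailing assistant turn so the
    model continues from exactly where it stopped, stripping trailing
    whitespace to avoid a validation error.
    """
    return messages + [{"role": "assistant", "content": partial_text.rstrip()}]
```

Concatenate the original partial text with each continuation's output, and repeat while stop_reason is still "max_tokens".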
Building Production Claude API Applications?
Our architects have designed Claude API integrations handling millions of requests per month, covering error handling, retry logic, rate limit management, and cost optimisation.
Book a Free Architecture Review
Error Monitoring in Production
Logging individual errors isn't enough. Production Claude API deployments need error rate monitoring with alerting thresholds. A sudden spike in 5xx errors may indicate an Anthropic outage. A rising baseline of 400 errors may indicate a schema change breaking your request format. A 401 spike may indicate API key rotation that didn't propagate correctly.
Track these metrics as time-series: 429 error rate (rate limit pressure), 5xx error rate (infrastructure health), 400 error rate by error type (request quality), and average response latency with p95 and p99 percentiles. Export them to your existing observability stack (Datadog, Grafana, CloudWatch) using the same patterns as your other API integrations.
import logging
import time
from dataclasses import dataclass
from typing import Optional

import anthropic

logger = logging.getLogger(__name__)


@dataclass
class APICallResult:
    success: bool
    response: Optional[anthropic.types.Message]
    error_type: Optional[str]
    error_code: Optional[int]
    attempt_count: int
    total_latency_ms: float


def tracked_claude_call(client, **kwargs) -> APICallResult:
    """Wrapper that records metrics for every Claude API call."""
    start = time.monotonic()
    attempts = 0  # have call_claude_with_retry report this if you need exact counts
    try:
        response = call_claude_with_retry(client, **kwargs)
        latency = (time.monotonic() - start) * 1000
        # Emit to your metrics system here
        return APICallResult(True, response, None, None, attempts, latency)
    except anthropic.APIStatusError as e:
        latency = (time.monotonic() - start) * 1000
        logger.error(f"Claude API error: {e.status_code} {e}")
        # Alert if error_rate > threshold
        return APICallResult(False, None, type(e).__name__, e.status_code, attempts, latency)
Quick Reference: Error Code Cheatsheet
Print this and put it next to your screen during integration work.
Never retry: 400 (bad request), 401 (auth failed), 403 (forbidden), 404 (not found)
Retry with backoff: 429 (rate limited), 500 (server error), 529 (overloaded)
Handle conditionally: Truncated responses (stop_reason: max_tokens), network timeouts
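The cheatsheet can be encoded directly as a small dispatch function, a sketch you can drop into the retry loop shown earlier:

```python
def retry_action(status_code: int) -> str:
    """Map an HTTP status code to the cheatsheet's handling strategy."""
    if status_code in (429, 500, 529):
        return "retry_with_backoff"
    if status_code in (400, 401, 403, 404):
        return "fail_fast"
    # Default: treat other 4xx as permanent, other 5xx as transient
    return "fail_fast" if status_code < 500 else "retry_with_backoff"
```

Keeping the decision in one function means the policy is testable and changes in one place when Anthropic adds new error codes.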