Before you commit to a deployment, know what you're spending. This guide breaks down every Claude pricing variable (model, token volume, caching, batch discounts, and subscription tiers) with real numbers for real scenarios.
Adjust the sliders to estimate your monthly Claude API spend. Results update in real time. All figures based on Anthropic's published pricing as of March 2026.
All prices per million tokens (MTok) unless noted. Batch API pricing is 50% of standard. Prompt caching reduces input costs by up to 90%.
| Model | Type | Input (per MTok) | Output (per MTok) | Cache Write | Cache Read | Context Window |
|---|---|---|---|---|---|---|
| claude-opus-4-6 | Most Capable | $15.00 | $75.00 | $18.75 | $1.50 | 200K tokens |
| claude-sonnet-4-6 | Recommended | $3.00 | $15.00 | $3.75 | $0.30 | 200K tokens |
| claude-haiku-4-5 | Fastest | $0.80 | $4.00 | $1.00 | $0.08 | 200K tokens |
| claude-sonnet-4-6 Batch API | Batch (50% off) | $1.50 | $7.50 | n/a | n/a | 200K tokens |
| claude-haiku-4-5 Batch API | Batch (50% off) | $0.40 | $2.00 | n/a | n/a | 200K tokens |
Prompt Caching Note: When you write to cache, you pay 25% more than standard input. When you read from cache, you pay just 10% of standard input. For applications with large system prompts sent repeatedly, caching typically cuts input costs by 70–90%. See our Claude Prompt Caching guide for implementation details.
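To make the caching arithmetic concrete, here is a minimal sketch in Python using the Sonnet 4.6 rates from the table above. The `writes` parameter (how often the cache entry must be re-written, e.g. once per day if traffic keeps it warm) is our assumption for illustration, not an Anthropic-published figure.

```python
# Sketch: monthly cost of re-sending a fixed system prompt,
# with and without prompt caching. Sonnet 4.6 rates from the table above.
SONNET_INPUT = 3.00        # $ per MTok, standard input
SONNET_CACHE_WRITE = 3.75  # 125% of standard input
SONNET_CACHE_READ = 0.30   # 10% of standard input

def monthly_prompt_cost(prompt_tokens, requests, cached=False, writes=30):
    """Estimate the monthly cost of a prompt sent `requests` times.

    `writes` is an assumed number of cache re-writes per month
    (hypothetical; depends on how warm your traffic keeps the cache).
    """
    mtok = prompt_tokens / 1_000_000
    if not cached:
        return mtok * requests * SONNET_INPUT
    write_cost = mtok * writes * SONNET_CACHE_WRITE
    read_cost = mtok * (requests - writes) * SONNET_CACHE_READ
    return write_cost + read_cost

uncached = monthly_prompt_cost(2_000, 10_000)             # $60.00
cached = monthly_prompt_cost(2_000, 10_000, cached=True)  # ≈ $6.21
```

Even with a daily cache re-write, the cached figure lands at roughly one tenth of the uncached cost, which is where the "up to 90%" claim comes from.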
Estimated monthly costs for common Claude deployment patterns. Assumes Claude Sonnet 4.6 standard pricing with 30% prompt cache hit rate unless noted.
A corporate knowledge base chatbot handling employee HR, IT, and policy questions. Medium traffic, moderate context.
Automated contract analysis: ingest PDFs, extract clauses, flag risks, summarise findings. High-volume overnight batch.
24/7 customer support agent handling product questions, returns, account queries. Requires fast response times, so it uses the Haiku model.
Automated PR review: read diffs, check style, identify bugs, suggest improvements. Engineers submit PRs to a GitHub Action that calls Claude.
Strategic research tool for consultants: deep reading of long documents, synthesis, competitive analysis. Requires Opus for reasoning quality.
Large-scale internal search and Q&A system serving 2,000 employees. Each query retrieves 5–10 document chunks and calls Claude for synthesis.
API costs above apply to programmatic usage. These subscription plans cover claude.ai access for individual users, small teams, and enterprise seats.
Our clients consistently reduce initial API cost estimates by 40–70% before going to production. Here's how.
Cache your system prompt and any large document context. A 2,000-token system prompt sent 10,000 times per month costs $60 uncached. With caching, reads drop to about $6. See our Prompt Caching guide.
The Batch API processes requests asynchronously and costs 50% less. Any job that doesn't need an instant response (document processing, report generation, analysis pipelines) should use batch. Learn more in our Batch API guide.
Don't use Opus for tasks Haiku handles fine. A classification task that needs 50 tokens in and 10 tokens out costs $0.00008 per call on Haiku vs $0.0015 on Opus, roughly 19× more. Model routing alone can cut bills by 60%. See our model comparison guide.
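The per-call figures fall out of a one-line formula. A sketch, using the per-MTok rates from the pricing table above:

```python
# Per-call cost of a tiny classification task (50 tokens in, 10 out),
# using the per-MTok rates from the pricing table above.
RATES = {  # model: (input $/MTok, output $/MTok)
    "claude-haiku-4-5": (0.80, 4.00),
    "claude-opus-4-6": (15.00, 75.00),
}

def call_cost(model, input_tokens, output_tokens):
    rate_in, rate_out = RATES[model]
    return (input_tokens * rate_in + output_tokens * rate_out) / 1_000_000

haiku = call_cost("claude-haiku-4-5", 50, 10)  # $0.00008
opus = call_cost("claude-opus-4-6", 50, 10)    # $0.0015
```

The same function works for any model in the table; add a row to `RATES` and route each request to the cheapest model that meets its quality bar.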
Every unnecessary word in your system prompt costs money, thousands of times per day. Audit your prompts ruthlessly. Remove generic instructions. Use structured formats (XML, JSON), which are more token-efficient than prose instructions.
In RAG systems, most teams over-retrieve. Fetching 10 × 500-token chunks when 3 × 300-token chunks would suffice wastes over 80% of input tokens. Tune your retrieval precision before scaling. Read our RAG architecture guide.
Set up monitoring and observability before you launch. Track token counts per endpoint, per user, per feature. You'll quickly identify which 20% of features consume 80% of tokens, and whether they justify the cost.
Dev and staging environments often account for 30–40% of API spend at early-stage companies. Set hard token budgets per environment. Use Haiku in dev/staging, Sonnet only in production. Treat AI API calls like database writes, not as free.
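One way to make "hard token budgets per environment" concrete is a guard that refuses the call before it is made. A minimal sketch; the class, limits, and environment names are all hypothetical:

```python
# Sketch: hard token budget per environment (names and limits hypothetical).
# Fail the call up front rather than discovering overspend on the invoice.
class TokenBudget:
    def __init__(self, monthly_limit_tokens):
        self.limit = monthly_limit_tokens
        self.used = 0

    def charge(self, tokens):
        """Record usage; raise if the charge would exceed the monthly cap."""
        if self.used + tokens > self.limit:
            raise RuntimeError(
                f"token budget exceeded: {self.used + tokens} > {self.limit}"
            )
        self.used += tokens

BUDGETS = {
    "dev": TokenBudget(5_000_000),      # small cap, Haiku-only traffic
    "staging": TokenBudget(20_000_000),
    "prod": TokenBudget(500_000_000),
}

BUDGETS["dev"].charge(1_200_000)   # within budget
# BUDGETS["dev"].charge(4_000_000) # would raise RuntimeError (5.2M > 5M)
```

In practice you'd persist the counter and call `charge()` with the token counts the API returns per response, but the shape of the guard is the same.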
Claude's API pricing is token-based. You pay for what you send in (input tokens) and what Claude generates back (output tokens). Output tokens cost significantly more than input (typically 4–5×) because generation is more computationally intensive than reading context.
There are currently three Claude models with distinct price/performance tradeoffs. Haiku is the fastest and cheapest, designed for high-volume, low-latency tasks. Sonnet is the workhorse: the best balance of intelligence and cost for most enterprise applications. Opus is the most powerful, reserved for tasks where reasoning quality matters more than speed or price.
Tokens aren't exactly words; they're variable-length byte sequences. A rough heuristic: 750 words ≈ 1,000 tokens. A typical email is 200–400 tokens. A long-form report might be 3,000–8,000 tokens. A full legal contract could hit 20,000–50,000 tokens.
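For back-of-envelope planning, the 750-words heuristic is enough; here is a sketch (an approximation for budgeting only, not a real tokenizer):

```python
# Rough token estimate from a word count, using the
# 750 words ≈ 1,000 tokens heuristic above. For budgeting only;
# use a real tokenizer for exact counts.
def estimate_tokens(words: int) -> int:
    return round(words * 1000 / 750)

email_tokens = estimate_tokens(300)      # ≈ 400, a typical email
contract_tokens = estimate_tokens(30_000)  # ≈ 40,000, a long contract
```

Multiply the estimate by the per-MTok rate of your chosen model to get a first-pass cost figure before you've written any integration code.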
The context window (200K tokens for all current Claude models) is the maximum combined input + output you can use in one API call. Long-context calls are expensive: a 150,000-token input on Opus costs $2.25 in input tokens alone, so model selection matters enormously for long-context applications.
Anthropic's prompt caching lets you cache a portion of your context (minimum 1,024 tokens) and retrieve it at 90% lower cost. If you have a 5,000-token system prompt and send 50,000 requests per month, caching reduces that system prompt's monthly cost from roughly $750 to $76 on Sonnet.
Cache write is more expensive than standard input (125% of standard), but cache reads are just 10% of standard. Break-even occurs after a single read of the cached block. Any application with consistent system prompts or repeated document context should enable caching immediately.
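A quick way to check the break-even point, assuming the 1.25× write and 0.10× read multiples above (costs expressed as multiples of the standard input price for one block):

```python
# Cumulative cost of sending the same prompt block n times,
# expressed as multiples of the standard input price for that block.
def cost_uncached(n):
    return 1.00 * n                  # every send pays full input price

def cost_cached(n):
    return 1.25 + 0.10 * (n - 1)     # one cache write, then n-1 cache reads

# n=1: 1.00 vs 1.25 -> caching costs more
# n=2: 2.00 vs 1.35 -> caching already wins on the first read
```

So the extra 25% paid on the write is recovered as soon as the block is read from cache once; every read after that saves 90% of the block's input cost.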
The Batch API processes requests asynchronously with up to 24-hour turnaround, at 50% of standard API pricing. It's ideal for nightly document processing, report generation, data enrichment, and any workflow that can tolerate delay. A document processing pipeline that costs $5,000/month with synchronous calls costs $2,500/month with Batch.
Claude Enterprise is not priced by the token; it's a seat-based subscription that includes unlimited claude.ai usage plus negotiated API rates. For organisations where employees need daily AI access plus programmatic API use, the per-seat model often beats pure pay-as-you-go at 50+ users.
Enterprise also includes SOC 2 compliance, HIPAA BAA availability, custom data retention, SSO/SCIM, and a dedicated customer success manager. For regulated industries, these aren't optional extras; they're the cost of doing business. Factor them into your total cost of ownership analysis before comparing sticker prices.
When presenting Claude API costs to finance or procurement, don't just model the cost; model the return. If Claude processes 10,000 contract reviews per month at $0.50 each ($5,000/month) and each review previously took a paralegal 45 minutes ($60/hour), the AI cost replaces $450,000/month of labour, a 90× return. Our ROI calculator walks through this framework in detail.
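The ROI arithmetic in that example is worth showing explicitly, since it's the calculation you'll repeat for every use case you pitch:

```python
# ROI arithmetic from the contract-review example above.
reviews_per_month = 10_000
ai_cost_per_review = 0.50       # $ per review via Claude
paralegal_minutes = 45          # prior manual time per review
paralegal_rate = 60.0           # $ per hour

ai_cost = reviews_per_month * ai_cost_per_review  # $5,000/month
labour_cost = (
    reviews_per_month * (paralegal_minutes / 60) * paralegal_rate
)  # $450,000/month
roi_multiple = labour_cost / ai_cost  # 90.0
```

Swap in your own volumes and labour rates; as long as the displaced labour cost dwarfs the per-call API cost, the multiple stays large even if the token estimates are off by 2–3×.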
Our Claude API Integration team has designed architectures for everything from 100-call-per-day internal tools to 10M-call-per-month customer platforms. We'll model your costs before you write a line of code.