Free tool · No signup

LLM Cost Calculator

Compare monthly inference cost across Claude Sonnet/Opus/Haiku, GPT-4o/GPT-4 Turbo, and Gemini 2.5 Pro/Flash. Mid-2026 list prices, with prompt caching savings included. Cost is only half the decision: see our Claude vs GPT-4 for enterprise comparison for the quality and governance trade-offs.

Input tokens / month (millions)

e.g., 50 = 50M input tokens / month

Output tokens / month (millions)

Typically 1/3 to 1/5 of input

Prompt caching

Anthropic + Google only

Monthly cost by model (sorted cheapest first)

Gemini 2.5 Flash · Google · $45/mo · $536/year

Claude Haiku 4.5 · Anthropic · $75/mo · $898/year

Gemini 2.5 Pro · Google · $105/mo · $1,256/year

Claude Sonnet 4.6 · Anthropic · $281/mo · $3,366/year

GPT-4o · OpenAI · $388/mo · $4,650/year

GPT-4 Turbo · OpenAI · caching n/a · $950/mo · $11,400/year

Claude Opus 4.7 · Anthropic · $1,403/mo · $16,830/year

Mid-2026 list prices. Volume discounts and enterprise agreements not reflected. Cached input pricing assumes 70% cache hit rate.
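
The figures above can be approximated with a short calculation. The sketch below is an assumption about how the inputs combine, not the calculator's actual source: uncached input is billed at list price, cached input at list price divided by the provider's caching discount, and output at list price. The 50M input / 15M output volumes and the output prices ($4.00 and $2.50 per million) are assumed values that are not stated on this page; the input prices, the 70% hit rate, and the 10×/4× discounts are taken from the page.

```python
def monthly_cost(input_m, output_m, in_price, out_price,
                 cache_discount=1.0, cache_hit_rate=0.70):
    """Estimated monthly cost in USD.

    input_m / output_m: millions of tokens per month.
    in_price / out_price: list price in USD per million tokens.
    cache_discount: how many times cheaper a cached input token is
                    (1.0 means caching has no effect).
    """
    cached = input_m * cache_hit_rate * in_price / cache_discount
    uncached = input_m * (1.0 - cache_hit_rate) * in_price
    return cached + uncached + output_m * out_price


# Assumed volumes: 50M input and 15M output tokens per month.
# Output prices are assumptions; input prices come from the quick facts.
haiku = monthly_cost(50, 15, in_price=0.80, out_price=4.00, cache_discount=10)
flash = monthly_cost(50, 15, in_price=0.30, out_price=2.50, cache_discount=4)

print(f"Claude Haiku 4.5: ${haiku:.2f}/mo")
print(f"Gemini 2.5 Flash: ${flash:.2f}/mo")
```

Under these assumptions the two results land at roughly $75 and $45 per month, in line with the listed figures. Leaving cache_discount at 1.0 models a provider without prompt caching.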

Get a fixed-price scope →

Quick facts

Prompt caching savings: cached input tokens are billed at about 1/10 of list price on Anthropic and 1/4 on Google (a 70% cache hit rate is typical).
Cheapest per million input tokens (mid-2026): Gemini 2.5 Flash ($0.30), Claude Haiku 4.5 ($0.80).
Best quality/cost ratio for production workflows: Claude Sonnet 4.6 with prompt caching.
Enterprise discounts: most providers offer 20–40% reductions at commitments above $50k/month.
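
A 10× or 4× discount applies only to the cached share of input tokens, so the blended saving on the input bill is smaller than the headline multiple. A minimal sketch of that arithmetic, assuming the multiple means cached tokens cost list price divided by that number, and using the 70% hit rate quoted above (the function name is illustrative):

```python
def blended_input_multiplier(cache_discount, cache_hit_rate=0.70):
    """Fraction of the list input price paid, averaged over all input tokens."""
    return (1.0 - cache_hit_rate) + cache_hit_rate / cache_discount


for provider, discount in [("Anthropic", 10), ("Google", 4)]:
    m = blended_input_multiplier(discount)
    print(f"{provider}: pay {m:.1%} of list input price, save {1 - m:.1%}")
```

At a 70% hit rate this works out to paying about 37% of the list input price on Anthropic and 47.5% on Google. Output tokens are unaffected, so the saving on the total bill depends on the input/output mix.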

We help mid-market teams pick the right LLM stack

Model selection isn't a leaderboard question. It's a trade-off among accuracy, cost, latency, and trust, measured against your own labelled test set.

Talk through your LLM stack →