Free tool · No signup
LLM Cost Calculator
Compare monthly inference cost across Claude Sonnet/Opus/Haiku, GPT-4o/GPT-4 Turbo, and Gemini 2.5 Pro/Flash. Uses mid-2026 list prices, with prompt caching savings included where the provider supports them.
Input tokens per month (millions): e.g., 50 = 50M input tokens / month
Output tokens per month (millions): typically 1/3 to 1/5 of input
Prompt caching: Anthropic + Google only
Monthly cost by model (sorted cheapest first)
- Gemini 2.5 Flash (Google): $45/mo, $536/year
- Claude Haiku 4.5 (Anthropic): $75/mo, $898/year
- Gemini 2.5 Pro (Google): $105/mo, $1,256/year
- Claude Sonnet 4.6 (Anthropic): $281/mo, $3,366/year
- GPT-4o (OpenAI): $388/mo, $4,650/year
- GPT-4 Turbo (OpenAI, caching n/a): $950/mo, $11,400/year
- Claude Opus 4.7 (Anthropic): $1,403/mo, $16,830/year
Mid-2026 list prices. Volume discounts and enterprise agreements not reflected. Cached input pricing assumes 70% cache hit rate.
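The page does not publish its formula, so the following is only a minimal sketch of how such an estimate could be computed. It assumes cached input tokens are billed at the list price divided by a discount factor, and it ignores cache-write surcharges and volume discounts. The function name `monthly_cost`, the 50M/15M token volumes, and the $4.00 output price are illustrative assumptions. Only the $0.80 Haiku input price, the 70% hit rate, and the 10× factor come from this page.

```python
def monthly_cost(input_mtok, output_mtok, input_price, output_price,
                 cache_hit_rate=0.0, cache_discount=1.0):
    """Estimate monthly cost in USD.

    Token volumes are in millions; prices are USD per million tokens.
    cache_discount is the factor by which cached input is cheaper (10 = 1/10 price).
    """
    cached = input_mtok * cache_hit_rate          # input tokens served from cache
    uncached = input_mtok - cached                # input tokens billed at full price
    input_cost = uncached * input_price + cached * input_price / cache_discount
    return input_cost + output_mtok * output_price


# Hypothetical workload: 50M input and 15M output tokens per month.
# The output price of $4.00 per million is an assumption, not a figure from the page.
haiku = monthly_cost(50, 15, input_price=0.80, output_price=4.00,
                     cache_hit_rate=0.70, cache_discount=10)
print(f"${haiku:.2f}/mo, ${haiku * 12:.0f}/year")
```

Under these assumed inputs the sketch gives about $74.80 per month, close to the $75/mo listed for Claude Haiku 4.5. That suggests the model is plausible, but it does not confirm the calculator's actual inputs.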
Quick facts
- Prompt caching savings: cached input tokens cost about 1/10 of the list input price on Anthropic and about 1/4 on Google (figures above assume a typical 70% cache hit rate).
- Cheapest per million input tokens (mid-2026): Gemini 2.5 Flash ($0.30), Claude Haiku ($0.80).
- Best quality-to-cost ratio for production workflows: Claude Sonnet 4.6 with prompt caching.
- Enterprise discounts: Most providers offer 20–40% reductions on commitments above $50k/month.
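A 10× or 4× discount on cached tokens does not cut the whole input bill by that factor, because only the cache hits are discounted. As a rough sketch, the blended input price is the list price multiplied by (1 - hit rate) + hit rate / discount. The helper name `effective_input_multiplier` is hypothetical, and cache-write fees are left out.

```python
def effective_input_multiplier(cache_hit_rate, cache_discount):
    """Fraction of the list input price paid on average per input token."""
    # Misses pay full price; hits pay list price divided by the discount factor.
    return (1 - cache_hit_rate) + cache_hit_rate / cache_discount


anthropic = effective_input_multiplier(0.70, 10)  # about 0.37 of list price
google = effective_input_multiplier(0.70, 4)      # about 0.475 of list price

for name, m in [("Anthropic", anthropic), ("Google", google)]:
    print(f"{name}: pays {m:.1%} of list input price, saves {1 - m:.1%}")
```

At a 70% hit rate, input spend falls by roughly 63% with a 10× discount and by about 52% with a 4× discount. Output tokens are unaffected, so the saving on the total bill is smaller still.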
We help mid-market teams pick the right LLM stack
Model selection isn't a leaderboard question. It's an optimisation across accuracy, cost, latency, and trust, measured against your own labelled test set.