
Defined term

Multi-LLM architecture

Routing different tasks to different models based on cost, quality, latency, and capability tradeoffs.

Multi-LLM architecture uses more than one foundation model in the same product. A classification task might go to a small, fast model, summarization to a mid-tier model, and a high-stakes reasoning step to a frontier model. The router that makes this choice can be rule-based (each task type maps to a fixed model) or learned (a model predicts, per request, which tier is good enough). Multi-LLM lets teams lower cost per call without sacrificing quality on the steps that matter.
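A rule-based router can be as small as a lookup table. The sketch below assumes three tiers; the model names are hypothetical placeholders, not real API model IDs, and `route` is an illustrative helper rather than any library's API.

```python
# Hypothetical tier names; substitute the model IDs your provider actually exposes.
ROUTES: dict[str, str] = {
    "classification": "small-fast-model",
    "summarization": "mid-tier-model",
    "reasoning": "frontier-model",
}

# Unknown task types fall back to the most capable tier: spend more rather than risk quality.
DEFAULT_MODEL = "frontier-model"


def route(task_type: str) -> str:
    """Rule-based router: choose a model from the task type alone."""
    return ROUTES.get(task_type, DEFAULT_MODEL)
```

Here `route("summarization")` returns `"mid-tier-model"`, and an unlisted task such as `"translation"` falls back to `"frontier-model"`. A learned router would replace the dictionary lookup with a classifier that scores each request, while keeping the same fail-safe default.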

When it matters

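When cost per call matters and your workflow has steps of varying complexity. In workflows dominated by simple steps, routing classification to a small, fast model and reasoning to a frontier model can cut total inference cost by 40-70% with no measurable quality drop, provided each cheaper route is validated against the frontier model on that step's own evaluation set.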
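The savings figure follows from a simple blended-cost model. Assume each case passes through steps $s$, each run with probability $p_s$ on its routed model at per-call cost $c_s$, and compare against running every case on a single model at cost $C_{\text{single}}$ (notation introduced here for illustration):

```latex
C_{\text{blend}} = \sum_{s} p_s \, c_s,
\qquad
\text{savings} = 1 - \frac{C_{\text{blend}}}{C_{\text{single}}}
```

A 40% saving therefore requires $C_{\text{blend}} \le 0.6\, C_{\text{single}}$, which holds only when the steps with $p_s \approx 1$ run on cheap models and the frontier model's $p_s$ stays small.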

Real example

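A support workflow routes intent classification to Haiku ($0.003/case) and retrieval plus summarization to Sonnet ($0.02/case), and sends final answer generation to Opus ($0.15/case) only for cases with low retrieval confidence. Average blended cost: $0.04/case vs $0.15/case single-model, about 73% lower. Because every case pays the $0.023 Haiku-plus-Sonnet base, that average implies roughly 11% of cases escalate to Opus.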
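A minimal sketch of the confidence-gated escalation in this example, with per-case cost accounting. The step costs come from the example; the 0.7 confidence threshold and every function name are hypothetical. It assumes every case pays for the Haiku and Sonnet steps, and that high-confidence answers come out of the Sonnet step at no extra cost.

```python
# Per-case step costs from the example (illustrative figures, not list prices).
COST_CLASSIFY = 0.003  # Haiku: intent classification, every case
COST_RETRIEVE = 0.02   # Sonnet: retrieval + summarization, every case
COST_ESCALATE = 0.15   # Opus: final answer, only on low retrieval confidence

# Hypothetical cutoff; in practice tune it on held-out cases.
CONFIDENCE_THRESHOLD = 0.7


def case_cost(retrieval_confidence: float) -> float:
    """Cost of one case: always Haiku + Sonnet, plus Opus if confidence is low."""
    cost = COST_CLASSIFY + COST_RETRIEVE
    if retrieval_confidence < CONFIDENCE_THRESHOLD:
        cost += COST_ESCALATE
    return cost


def blended_cost(confidences: list[float]) -> float:
    """Average cost per case over a batch of retrieval-confidence scores."""
    return sum(case_cost(c) for c in confidences) / len(confidences)


def implied_escalation_rate(target_blended: float) -> float:
    """Fraction of cases that must escalate for the average to equal target_blended."""
    return (target_blended - COST_CLASSIFY - COST_RETRIEVE) / COST_ESCALATE
```

With one low-confidence case in ten, `blended_cost([0.9] * 9 + [0.5])` comes to about $0.038, and `implied_escalation_rate(0.04)` is about 0.113. Raising the threshold sends more cases to Opus, so the threshold is effectively the cost/quality dial for this workflow.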

KPIs to watch

Average cost per case (target: at least 40% below the single-model baseline), routing accuracy (share of steps sent to the right model; target >95%), and P95 latency per route (kept within SLA).
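These three KPIs can be computed from a log of routed calls. This sketch assumes a hypothetical `RoutedCall` record in which `expected_route` comes from a labeled evaluation set (someone decided which model was right for that step); it uses the nearest-rank method for P95.

```python
import math
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class RoutedCall:
    case_id: str
    route: str           # model the router chose
    expected_route: str  # model labeled as right for this step (hypothetical eval label)
    cost_usd: float
    latency_ms: float


def p95(values: list[float]) -> float:
    """Nearest-rank 95th percentile."""
    ordered = sorted(values)
    rank = math.ceil(0.95 * len(ordered))
    return ordered[rank - 1]


def kpis(calls: list[RoutedCall], baseline_cost_per_case: float) -> dict:
    case_costs: dict[str, float] = defaultdict(float)
    latencies: dict[str, list[float]] = defaultdict(list)
    correct = 0
    for call in calls:
        case_costs[call.case_id] += call.cost_usd  # sum all steps of a case
        latencies[call.route].append(call.latency_ms)
        if call.route == call.expected_route:
            correct += 1
    avg_cost = sum(case_costs.values()) / len(case_costs)
    return {
        "avg_cost_per_case": avg_cost,
        # Negative means cheaper than the single-model baseline (target <= -0.40).
        "cost_change_vs_baseline": avg_cost / baseline_cost_per_case - 1,
        "routing_accuracy": correct / len(calls),  # target > 0.95
        "p95_latency_ms_by_route": {r: p95(v) for r, v in latencies.items()},
    }
```

Cost is aggregated per case rather than per call because the target compares whole cases against the single-model baseline, and latency is grouped by route so that each route can be checked against its own SLA.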

