Defined term
Extended thinking
A model mode that performs longer internal reasoning before producing the answer.
Extended thinking lets the model allocate more inference compute to reasoning before responding. Most useful for hard problems where accuracy matters more than latency. Architecture decision: route only the hardest cases to extended thinking, keep routine traffic on fast paths.
When it matters
When the task requires deep reasoning where the model benefits from allocating more compute before responding. Use on the 5-10% of cases where quality matters more than latency.
Real example
A high-stakes underwriting decision routed through Claude with extended-thinking enabled. The model spends 30s reasoning internally (vs 3s standard), reviewing each policy clause, before recommending approve/decline. Accuracy: +18% on adversarial cases.
KPIs to watch
Quality lift on hard subset (target: >10pp), latency P95 on extended-thinking path (typically 20-60s), cost overhead (2-5× standard inference).
Related terms
Chain of thought
Prompting the model to show intermediate reasoning steps before producing a final answer.
Frontier model
The leading-edge foundation models with the highest reasoning, coding, and multimodal capabilities.
Context window
The maximum number of tokens a model can process in a single request.
Foundation model
A large model pre-trained on broad data, then adapted to many downstream tasks.
See it in action
We use this every week
Book a 30-min call and we'll walk you through how Extended thinking shows up in a real engagement we're running.
Book a 30-min call