Extended thinking

A model mode that performs longer internal reasoning before producing the answer.

Extended thinking lets the model allocate more inference compute to reasoning before responding. Most useful for hard problems where accuracy matters more than latency. Architecture decision: route only the hardest cases to extended thinking, keep routine traffic on fast paths.

When it matters

When the task requires deep reasoning where the model benefits from allocating more compute before responding. Use on the 5-10% of cases where quality matters more than latency.

Real example

A high-stakes underwriting decision routed through Claude with extended-thinking enabled. The model spends 30s reasoning internally (vs 3s standard), reviewing each policy clause, before recommending approve/decline. Accuracy: +18% on adversarial cases.

KPIs to watch

Quality lift on hard subset (target: >10pp), latency P95 on extended-thinking path (typically 20-60s), cost overhead (2-5× standard inference).

Related terms

Chain of thought

Prompting the model to show intermediate reasoning steps before producing a final answer.

Frontier model

The leading-edge foundation models with the highest reasoning, coding, and multimodal capabilities.

Context window

The maximum number of tokens a model can process in a single request.

Foundation model

A large model pre-trained on broad data, then adapted to many downstream tasks.

See it in action

We use this every week

Send a short brief and we'll walk you through how Extended thinking shows up in a real engagement we're running. We reply within one business day.

Start a project →