Context window

The maximum number of tokens a model can process in a single request.

The context window is the shared budget for the system prompt, retrieved context, conversation history, user query, and generated output. Frontier models offer windows of 200k to 2M tokens, but practical use is bounded by latency, cost, and attention degradation over very long inputs. Architecture decisions such as RAG versus full-context prompting, compaction, and summarization hinge on the context window size.
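As a rough illustration of the budget idea, the sketch below adds up the token counts of each prompt component plus a reserved output allowance and reports how much of the window is left. The class name `ContextBudget`, the 200k window, and all token counts are hypothetical values chosen for the example, not figures from any specific model.

```python
from dataclasses import dataclass


@dataclass
class ContextBudget:
    """Hypothetical helper: tracks how a context window is spent."""
    window: int           # total tokens the model accepts (assumed value)
    reserved_output: int  # tokens held back for the model's response

    def remaining(self, system: int, retrieved: int, history: int, query: int) -> int:
        # Every component, including the output, draws on the same window.
        used = system + retrieved + history + query + self.reserved_output
        return self.window - used

    def fits(self, system: int, retrieved: int, history: int, query: int) -> bool:
        return self.remaining(system, retrieved, history, query) >= 0


# Illustrative numbers only.
budget = ContextBudget(window=200_000, reserved_output=4_000)
left = budget.remaining(system=1_500, retrieved=8_000, history=12_000, query=500)
ok = budget.fits(system=1_500, retrieved=8_000, history=12_000, query=500)
```

A check like this is typically run before each call, so that history can be compacted or summarized once the remaining budget gets tight.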

When it matters

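It matters when deciding what to put in the prompt. Bigger context windows are not always better: irrelevant context degrades quality, and models use information buried in the middle of a long input less reliably than information near the start or end (Liu et al., 2023, 'Lost in the Middle').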

Real example

A long-document summarization task compared feeding the full 100k-token document to Claude against feeding only the top-20 retrieved chunks (8k tokens). The retrieved version scored 15% better on factual accuracy because the model focused on relevant passages instead of being distracted by noise.
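One way to build the retrieved variant is to pack already-ranked chunks into a fixed token budget. The sketch below is a minimal greedy version under stated assumptions: `pack_chunks` and `rough_token_count` are hypothetical names, the chunks are assumed to arrive sorted by relevance, and the whitespace word count is only a crude stand-in for a real tokenizer.

```python
from typing import Callable


def rough_token_count(text: str) -> int:
    # Crude proxy: whitespace-separated words. Real tokenizers count differently.
    return len(text.split())


def pack_chunks(
    ranked_chunks: list[str],
    max_tokens: int,
    count_tokens: Callable[[str], int] = rough_token_count,
) -> list[str]:
    """Greedily keep the highest-ranked chunks that fit within max_tokens."""
    selected: list[str] = []
    used = 0
    for chunk in ranked_chunks:
        cost = count_tokens(chunk)
        if used + cost > max_tokens:
            continue  # too large for what is left; a smaller later chunk may still fit
        selected.append(chunk)
        used += cost
    return selected


# Toy input: three "chunks" ranked by relevance, with a budget of 4 tokens.
packed = pack_chunks(["a b c", "d e f g", "h"], max_tokens=4)
```

Skipping an oversized chunk instead of stopping keeps the budget well used, at the cost of occasionally including a lower-ranked chunk ahead of a higher-ranked one.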

KPIs to watch

Context utilization rate (the share of the context that materially affects the output); retrieval recall versus full-context quality (retrieval often wins above 50k tokens); and cost per call (roughly linear in context size under per-token pricing).
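The linear cost relationship can be sketched as follows. The per-million-token prices are placeholders invented for the example, not any provider's actual rates, and `cost_per_call` is a hypothetical helper; the token counts reuse the 100k and 8k figures from the example above with an assumed 1,000-token output.

```python
def cost_per_call(
    input_tokens: int,
    output_tokens: int,
    input_price_per_mtok: float,
    output_price_per_mtok: float,
) -> float:
    """Cost of one call under simple per-token pricing (prices per million tokens)."""
    input_cost = input_tokens / 1_000_000 * input_price_per_mtok
    output_cost = output_tokens / 1_000_000 * output_price_per_mtok
    return input_cost + output_cost


# Placeholder prices, assumed for illustration only.
INPUT_PRICE = 3.0
OUTPUT_PRICE = 15.0

full_context = cost_per_call(100_000, 1_000, INPUT_PRICE, OUTPUT_PRICE)
retrieved = cost_per_call(8_000, 1_000, INPUT_PRICE, OUTPUT_PRICE)
ratio = full_context / retrieved
```

Because input cost scales with input size, the full-context call costs several times more than the retrieved one under these assumed prices, before any difference in latency is considered.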
