Defined term
Context window
The maximum number of tokens a model can process in a single request.
The context window is the shared budget for the system prompt, retrieved context, user query, conversation history, and the model's output combined. Frontier models offer windows of 200k to 2M tokens, but practical use is bounded by latency, cost, and attention degradation over very long inputs. Architecture decisions (RAG vs full-context, compaction, summarization) hinge on how large the window is and how much of it each component needs.
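As a rough illustration of that budget, the sketch below subtracts each fixed component from the window to find how many tokens remain for retrieved context. The class and field names and the token counts in the usage line are hypothetical. The sketch also assumes the provider counts the tokens reserved for output against the same window.

```python
from dataclasses import dataclass


@dataclass
class ContextBudget:
    """Token budget for one request (illustrative names and numbers)."""
    window: int           # model's maximum context window, in tokens
    system_prompt: int    # tokens used by the system prompt
    history: int          # tokens used by prior conversation turns
    user_query: int       # tokens in the current user message
    reserved_output: int  # tokens held back for the model's answer

    def retrieval_budget(self) -> int:
        """Tokens left for retrieved context after every other component."""
        used = self.system_prompt + self.history + self.user_query + self.reserved_output
        remaining = self.window - used
        if remaining < 0:
            raise ValueError(f"over budget by {-remaining} tokens")
        return remaining


budget = ContextBudget(window=200_000, system_prompt=2_000, history=30_000,
                       user_query=500, reserved_output=4_000)
print(budget.retrieval_budget())  # 163500
```

Long-running conversations shrink this number turn by turn, which is where compaction or summarization of the history comes in.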
When it matters
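When deciding what to put in the prompt. A bigger context window is not automatically better: irrelevant context can degrade quality, and models use information placed in the middle of a long input less reliably than information at the beginning or end (Liu et al., 2023, 'Lost in the Middle').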
Real example
A long-context summarization task: feeding the full 100k-token document to Claude vs feeding the top-20 retrieved chunks (8k tokens). The retrieved version scored 15% better on factual accuracy because the model focused on relevant passages instead of being distracted by noise.
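The retrieval side of a comparison like this can be sketched as greedy top-k selection under a token cap. This is an illustrative assumption, not the pipeline used in the example: `Chunk` and `select_chunks` are hypothetical names, and the relevance scores and token counts are assumed to come from an upstream retriever and the model's tokenizer.

```python
from typing import NamedTuple


class Chunk(NamedTuple):
    text: str
    score: float  # relevance score from a retriever (assumed precomputed)
    tokens: int   # token count (assumed precomputed with the model's tokenizer)


def select_chunks(chunks: list[Chunk], k: int, token_cap: int) -> list[Chunk]:
    """Take up to k highest-scoring chunks whose combined size fits token_cap."""
    selected: list[Chunk] = []
    used = 0
    for chunk in sorted(chunks, key=lambda c: c.score, reverse=True):
        if len(selected) == k:
            break
        if used + chunk.tokens > token_cap:
            continue  # this chunk would overflow; a smaller, lower-ranked one may still fit
        selected.append(chunk)
        used += chunk.tokens
    return selected


chunks = [Chunk("a", 0.9, 400), Chunk("b", 0.2, 400),
          Chunk("c", 0.7, 5_000), Chunk("d", 0.8, 400)]
picked = select_chunks(chunks, k=20, token_cap=1_200)
print([c.text for c in picked])  # ['a', 'd', 'b']
```

Skipping an oversized chunk rather than stopping lets the cap fill up with smaller relevant passages; a stricter variant would stop at the first chunk that does not fit.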
KPIs to watch
Context utilization rate (% of context that materially affects output); retrieval recall vs a full-context baseline (retrieval often wins above 50k tokens); cost per call (linear in context size).
Related terms
LLM (Large Language Model)
A transformer-based model trained on language data to predict and generate text.
RAG (Retrieval-Augmented Generation)
Generation grounded in retrieved source documents rather than the model's parametric memory alone.
Frontier model
The leading-edge foundation models with the highest reasoning, coding, and multimodal capabilities.
Foundation model
A large model pre-trained on broad data, then adapted to many downstream tasks.
See it in action
We use this every week
Book a 30-min call and we'll walk you through how the context window shows up in a real engagement we're running.