Defined term
Transformer
The neural network architecture that powers modern LLMs, based on self-attention.
The transformer architecture (Vaswani et al., 2017) uses self-attention to process all positions of a sequence in parallel: each token scores its relevance to every other token, so long-range dependencies are captured without recurrence. It underpins nearly every modern LLM. Variants (decoder-only, encoder-decoder, mixture-of-experts) trade off capability, cost, and latency.
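For readers who want to see the mechanism, here is a minimal sketch of single-head scaled dot-product self-attention in plain Python. It is illustrative only: the helper names (matmul, softmax, self_attention) and the toy inputs are assumptions made for this example, the projection matrices would be learned in a real model, and production implementations add multiple heads, masking, and optimized tensor kernels.

```python
import math

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def softmax(row):
    """Numerically stable softmax over one list of scores."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention with no masking.

    x: one row per token (seq_len x d_model).
    w_q, w_k, w_v: projection matrices (d_model x d_head); learned in a real model.
    Returns (outputs, weights).
    """
    q, k, v = matmul(x, w_q), matmul(x, w_k), matmul(x, w_v)
    d_head = len(k[0])
    k_t = [list(col) for col in zip(*k)]
    # Every token scores every other token; scaling keeps the scores in a stable range.
    scores = [[s / math.sqrt(d_head) for s in row] for row in matmul(q, k_t)]
    # Each row of weights sums to 1: how much that token attends to each position.
    weights = [softmax(row) for row in scores]
    # Each output row is a weighted mix of the value vectors.
    return matmul(weights, v), weights

# Toy input (assumption): three 2-dimensional tokens and identity projections.
identity = [[1.0, 0.0], [0.0, 1.0]]
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
outputs, weights = self_attention(tokens, identity, identity, identity)
```

Because every row of weights is computed independently of the others, all positions can be processed at once, which is the parallelism the definition refers to.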
When it matters
Background concept for understanding how modern LLMs work. Rarely actionable for buyers — useful for engineers debugging attention patterns or context-window behavior.
Real example
When debugging why a 50k-token document is being summarized poorly, the cause often traces to how the transformer uses its context. The 'Lost in the Middle' study (Liu et al., 2023) found that accuracy is highest when the relevant information sits near the beginning or end of the context and drops when it sits in the middle.
KPIs to watch
Not directly measurable. Use proxies: context-utilization rate, attention-weighting tests in the eval harness, and position-bias detection on long-context tasks.
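One way to approach position-bias detection is to plant a known fact at different depths of a long context and check whether the model still retrieves it. The sketch below is a hedged illustration, not a reference harness: position_bias_probe, build_context, and the ask_model callable are hypothetical names, and toy_model is a stand-in that fakes a middle-blind model so the example runs without any API. In practice ask_model would wrap a call to the model under test.

```python
from typing import Callable, Dict, List

def build_context(filler: List[str], fact: str, depth: float) -> str:
    """Insert `fact` at a relative depth (0.0 = start, 1.0 = end) of the filler text."""
    index = round(depth * len(filler))
    return " ".join(filler[:index] + [fact] + filler[index:])

def position_bias_probe(
    ask_model: Callable[[str, str], str],  # hypothetical: (context, question) -> answer
    filler: List[str],
    fact: str,
    question: str,
    expected: str,
    depths: List[float],
) -> Dict[float, bool]:
    """Return, for each depth, whether the answer contained the expected string."""
    results = {}
    for depth in depths:
        context = build_context(filler, fact, depth)
        answer = ask_model(context, question)
        results[depth] = expected.lower() in answer.lower()
    return results

def toy_model(context: str, question: str) -> str:
    """Stand-in for a real LLM call (assumption): it only 'reads' the first
    and last 30% of the words, which imitates a middle-of-context blind spot."""
    words = context.split()
    edge = int(len(words) * 0.3)
    visible = " ".join(words[:edge] + words[-edge:])
    return "7421" if "7421" in visible else "unknown"

filler = [f"Filler sentence number {i}." for i in range(100)]
fact = "The access code is 7421."
report = position_bias_probe(
    toy_model, filler, fact, "What is the access code?", "7421",
    depths=[0.0, 0.25, 0.5, 0.75, 1.0],
)
```

With the toy stand-in, the fact is found at the edges and missed at depth 0.5 by construction. Against a real model, the same loop would be repeated over many facts and documents so that hit rate per depth can be tracked as a proxy metric.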
Related terms
LLM (Large Language Model)
A transformer-based model trained on language data to predict and generate text.
Context window
The maximum number of tokens a model can process in a single request.
Frontier model
The leading-edge foundation models with the highest reasoning, coding, and multimodal capabilities.
Foundation model
A large model pre-trained on broad data, then adapted to many downstream tasks.