
Defined term

Transformer

The neural network architecture that powers modern LLMs, based on self-attention.

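The transformer architecture (Vaswani et al., 2017) uses self-attention to let every token in a sequence weigh every other token directly. This captures long-range dependencies and, unlike recurrent networks, lets all positions be processed in parallel during training. It is the foundation of nearly every modern LLM. Variants trade off capability, cost, and latency: decoder-only and encoder-decoder models differ in overall layout, while mixture-of-experts (MoE) layers replace dense feed-forward blocks so that only a subset of parameters runs for each token.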
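To make the self-attention step concrete, here is a minimal single-head sketch of scaled dot-product attention, `softmax(Q K^T / sqrt(d_k)) V`, in plain Python. It assumes Q, K and V are passed in directly; real models compute them with learned projection matrices and run many heads in parallel. The optional `causal` flag applies the mask decoder-only models use, so each position only attends to itself and earlier positions. All function names here are illustrative, not from any library.

```python
import math
from typing import List

Matrix = List[List[float]]

def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Multiply an (n x k) matrix by a (k x m) matrix."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]

def transpose(m: Matrix) -> Matrix:
    return [list(col) for col in zip(*m)]

def softmax(row: List[float]) -> List[float]:
    """Numerically stable softmax; -inf entries receive weight 0."""
    peak = max(row)
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(q: Matrix, k: Matrix, v: Matrix,
                   causal: bool = False) -> Matrix:
    """Single-head scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V.

    If causal is True, position i may only attend to positions j <= i.
    """
    scale = 1.0 / math.sqrt(len(k[0]))
    scores = matmul(q, transpose(k))  # scores[i][j] = q_i . k_j
    weights = []
    for i, row in enumerate(scores):
        # Hide future positions with -inf so softmax gives them zero weight.
        masked = [s * scale if (not causal or j <= i) else -math.inf
                  for j, s in enumerate(row)]
        weights.append(softmax(masked))
    return matmul(weights, v)  # each output row is a weighted average of v rows

if __name__ == "__main__":
    # Toy input: 3 tokens with 2-dimensional embeddings, reused as Q, K and V.
    x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    for row in self_attention(x, x, x, causal=True):
        print([round(val, 3) for val in row])
```

Each row of attention weights sums to 1, so each output row is a weighted average of the value rows. With `causal=True` the first token can only attend to itself, so its output equals its own value row.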

When it matters

Background concept for understanding how modern LLMs work. Rarely actionable for buyers; useful mainly for engineers debugging attention patterns or context-window behavior.

Real example

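When debugging why a 50k-token document is being summarized poorly, a common culprit is positional bias in how the model uses its context. The "Lost in the Middle" study (Liu et al., 2023) found that accuracy on multi-document QA and key-value retrieval is highest when the relevant information sits near the beginning or end of the input and drops markedly when it sits in the middle.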
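One way to check for this bias on your own documents is a needle-in-a-haystack style probe: insert one known fact at different relative depths of otherwise unchanged filler text, ask the same question each time, and compare accuracy by depth. The sketch below only builds the probe documents. `build_probes` is a hypothetical helper, and sending each document to a model and grading the answer is left to whatever client and eval harness you already use.

```python
from typing import List, Tuple

def build_probes(filler_chunks: List[str], needle: str,
                 depths: List[float]) -> List[Tuple[float, str]]:
    """Insert `needle` at each relative depth of the filler text.

    depth 0.0 places the needle before all filler, 1.0 after all of it.
    Returns (depth, document) pairs; chunks are joined with blank lines.
    """
    probes = []
    n = len(filler_chunks)
    for depth in depths:
        if not 0.0 <= depth <= 1.0:
            raise ValueError(f"depth must be in [0, 1], got {depth}")
        index = round(depth * n)  # chunk boundary closest to the requested depth
        chunks = filler_chunks[:index] + [needle] + filler_chunks[index:]
        probes.append((depth, "\n\n".join(chunks)))
    return probes

if __name__ == "__main__":
    # Illustrative filler and needle; substitute real documents in practice.
    filler = [f"Background paragraph {i}." for i in range(20)]
    needle = "The access code for the archive room is 7341."
    for depth, doc in build_probes(filler, needle, [0.0, 0.25, 0.5, 0.75, 1.0]):
        # Send `doc` plus "What is the access code?" to your model here.
        print(depth, doc.split("\n\n").index(needle))
```

Keeping the filler identical across probes means any accuracy difference between depths comes from the needle's position rather than from the surrounding content.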

KPIs to watch

Not directly measurable. Proxies include context-utilization rate, attention-weighting tests in an eval harness, and position-bias detection on long-context tasks.
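For position-bias detection, per-depth accuracy can be collapsed into a single number to track over time. The sketch below is one hypothetical metric, not a standard one: mean accuracy at the edges of the context minus mean accuracy in the middle. The `edge` cutoff of 0.2 is an arbitrary assumption; tune it to the depths you actually probe.

```python
from statistics import mean
from typing import Dict

def middle_dip(acc_by_depth: Dict[float, float], edge: float = 0.2) -> float:
    """Mean accuracy at the edges minus mean accuracy in the middle.

    Depths <= edge or >= 1 - edge count as edges; all other depths count
    as middle. Accuracies are fractions in [0, 1] keyed by relative depth.
    """
    edges = [a for d, a in acc_by_depth.items() if d <= edge or d >= 1 - edge]
    middle = [a for d, a in acc_by_depth.items() if edge < d < 1 - edge]
    if not edges or not middle:
        raise ValueError("need at least one edge depth and one middle depth")
    return mean(edges) - mean(middle)
```

A value near zero suggests the model uses the context evenly; a clearly positive value points to a lost-in-the-middle pattern worth investigating.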

See it in action

We use this every week

Book a 30-min call and we'll walk you through how Transformer shows up in a real engagement we're running.
