Transformer

The neural network architecture that powers modern LLMs, based on self-attention.

The transformer architecture (Vaswani et al., 2017) uses self-attention to process all tokens of a sequence in parallel: each token weighs every other token directly, which lets the model capture long-range dependencies. It is the foundation of nearly all modern LLMs. Variants (decoder-only, encoder-decoder, mixture of experts) trade off capability, cost, and latency.
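For readers who want to see the mechanism, here is a minimal sketch of scaled dot-product self-attention in plain Python. It is an illustration under simplifying assumptions, not a production implementation: it uses a single head, skips the learned query/key/value projections and masking that real transformers apply, and the function names `softmax` and `self_attention` are ours.

```python
import math

def softmax(xs):
    """Turn raw scores into weights that sum to 1 (numerically stable)."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(queries, keys, values):
    """Scaled dot-product attention over a whole sequence.

    Each argument is a list of vectors, one per token. Returns the
    per-token outputs and the attention weights used to build them.
    """
    d_k = len(keys[0])
    outputs, weights = [], []
    for q in queries:
        # Score this token against every token in the sequence.
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k)
            for k in keys
        ]
        w = softmax(scores)
        # The output is a weighted average of the value vectors.
        out = [
            sum(wj * v[i] for wj, v in zip(w, values))
            for i in range(len(values[0]))
        ]
        weights.append(w)
        outputs.append(out)
    return outputs, weights

# Toy sequence of three 2-dimensional token vectors (illustrative values).
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
outputs, weights = self_attention(tokens, tokens, tokens)
```

Each row of `weights` sums to 1 and shows how much one token draws on every other token, regardless of distance. The loop over queries is independent per token, which is why the computation parallelizes well.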

When it matters

Background concept for understanding how modern LLMs work. It is rarely actionable for buyers, but it is useful for engineers debugging attention patterns or context-window behavior.

Real example

When debugging why a 50k-token document is being summarized poorly, the answer often traces to how the model uses its context. The 'Lost in the Middle' study (Liu et al., 2023) found that models use information most reliably when it appears near the beginning or end of a long context, and noticeably less reliably when it sits in the middle.

KPIs to watch

Not directly measurable. Use proxies instead: context-utilization rate, attention-weighting tests in the eval harness, and position-bias detection on long-context tasks.
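Position-bias detection can be sketched as a small "needle" probe: place a known fact at different depths of a long filler document and check whether the model still retrieves it. The code below is a rough sketch under stated assumptions. `ask_model` is a hypothetical callable standing in for whatever model client you use, `fake_model` is a toy stand-in that only "reads" the start and end of the prompt, and the filler text, needle, and depths are illustrative.

```python
from typing import Callable, Dict, List

def build_prompt(filler: List[str], needle: str, depth: float) -> str:
    """Insert the needle at a relative depth (0.0 = start, 1.0 = end)."""
    idx = round(depth * len(filler))
    doc = filler[:idx] + [needle] + filler[idx:]
    return " ".join(doc) + "\n\nQuestion: What is the secret code?"

def position_bias(
    ask_model: Callable[[str], str],  # hypothetical: prompt in, reply out
    filler: List[str],
    needle: str,
    answer: str,
    depths: List[float],
) -> Dict[float, bool]:
    """For each depth, record whether the reply contains the answer."""
    return {
        d: answer in ask_model(build_prompt(filler, needle, d))
        for d in depths
    }

def fake_model(prompt: str) -> str:
    """Toy stand-in, NOT a real model: sees only the first and last
    300 characters, mimicking a strong middle-of-context blind spot."""
    visible = prompt[:300] + prompt[-300:]
    return "7421" if "7421" in visible else "unknown"

filler = [f"Filler sentence number {i}." for i in range(200)]
results = position_bias(
    fake_model, filler, "The secret code is 7421.", "7421", [0.0, 0.5, 1.0]
)
print(results)  # {0.0: True, 0.5: False, 1.0: True}
```

With a real model, swap `fake_model` for your own client call and repeat the probe across several needles and document lengths. A dip in hit rate at middle depths is the signal to watch.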

Related terms

LLM (Large Language Model)

A transformer-based model trained on language data to predict and generate text.

Context window

The maximum number of tokens a model can process in a single request.

Frontier model

The most capable foundation models available at a given time, leading in reasoning, coding, and multimodal capabilities.

Foundation model

A large model pre-trained on broad data, then adapted to many downstream tasks.
