Fine-tuning

Continuing the training of a base model on task-specific data to specialize behavior.

Fine-tuning adjusts a foundation model's weights on a curated dataset to bake in style, tone, or task patterns that prompting cannot reliably reproduce. Modern fine-tuning often uses parameter-efficient methods (LoRA, QLoRA) to stay cheap. Fine-tuning is rarely the right first move; prompting + RAG + evaluation usually wins faster.

When it matters

Rarely the first answer in 2026 — prompting + retrieval handles 80%+ of cases. Consider fine-tuning when you have 1000+ labelled examples and a clear quality gap that prompting cannot close.

Real example

A medical-coding workflow where after 6 months of production and 8000 labelled cases, fine-tuning Haiku on coding-specific examples lifted accuracy from 87% (prompted) to 94% (fine-tuned). ROI: yes, at 2M cases/year volume.

KPIs to watch

Labelled example count (1000+ minimum), accuracy lift vs prompted baseline (need >5pp to justify), inference cost vs base model (typically 1-2× for serving fine-tuned).

Related terms

LoRA

Low-Rank Adaptation: a parameter-efficient fine-tuning method that trains small adapters instead of full weights.

RAG (Retrieval-Augmented Generation)

Generation grounded in retrieved source documents rather than the model's parametric memory alone.

Prompt versioning

Treating prompts as code: stored, diffed, reviewed, and rolled back like any production artifact.

Context window

The maximum number of tokens a model can process in a single request.

See it in action

We use this every week

Send a short brief and we'll walk you through how Fine-tuning shows up in a real engagement we're running. We reply within one business day.

Start a project →