Defined term
Fine-tuning
Continuing the training of a base model on task-specific data to specialize behavior.
Fine-tuning adjusts a foundation model's weights on a curated dataset to bake in style, tone, or task patterns that prompting cannot reliably reproduce. Modern fine-tuning often uses parameter-efficient methods (LoRA, QLoRA) to stay cheap. Fine-tuning is rarely the right first move; prompting + RAG + evaluation usually wins faster.
When it matters
Rarely the first answer in 2026 — prompting + retrieval handles 80%+ of cases. Consider fine-tuning when you have 1000+ labelled examples and a clear quality gap that prompting cannot close.
Real example
A medical-coding workflow where after 6 months of production and 8000 labelled cases, fine-tuning Haiku on coding-specific examples lifted accuracy from 87% (prompted) to 94% (fine-tuned). ROI: yes, at 2M cases/year volume.
KPIs to watch
Labelled example count (1000+ minimum), accuracy lift vs prompted baseline (need >5pp to justify), inference cost vs base model (typically 1-2× for serving fine-tuned).
Related terms
LoRA
Low-Rank Adaptation: a parameter-efficient fine-tuning method that trains small adapters instead of full weights.
RAG (Retrieval-Augmented Generation)
Generation grounded in retrieved source documents rather than the model's parametric memory alone.
Prompt versioning
Treating prompts as code: stored, diffed, reviewed, and rolled back like any production artifact.
Context window
The maximum number of tokens a model can process in a single request.
See it in action
We use this every week
Book a 30-min call and we'll walk you through how Fine-tuning shows up in a real engagement we're running.
Book a 30-min call