Defined term
LoRA
Low-Rank Adaptation: a parameter-efficient fine-tuning method that trains small adapters instead of full weights.
LoRA inserts small trainable low-rank matrices into a model's layers, allowing task adaptation without modifying the base weights. It is cheap to train, fast to swap, and supports serving many adapted variants from one base. The default approach when fine-tuning is genuinely needed.
When it matters
When you need fine-tuning ergonomics (cheap to train, easy to swap) without full-weight retraining. The default fine-tuning method for production teams in 2026.
Real example
A multi-tenant SaaS that ships customer-specific LoRA adapters for tone customization: 12 customers, 12 adapters, all served from the same base model. Training cost per customer: ~$50; serving overhead: <5% latency.
KPIs to watch
LoRA adapter size (typically 10-100MB, vs 200GB+ for full weights), training time (hours, not days), inference latency overhead (<10% target).
Related terms
Fine-tuning
Continuing the training of a base model on task-specific data to specialize behavior.
Context window
The maximum number of tokens a model can process in a single request.
Frontier model
The leading-edge foundation models with the highest reasoning, coding, and multimodal capabilities.
Foundation model
A large model pre-trained on broad data, then adapted to many downstream tasks.
See it in action
We use this every week
Book a 30-min call and we'll walk you through how LoRA shows up in a real engagement we're running.
Book a 30-min call