← Glossary/Models & foundations

Defined term

LoRA

Low-Rank Adaptation: a parameter-efficient fine-tuning method that trains small adapters instead of full weights.

LoRA inserts small trainable low-rank matrices into a model's layers, allowing task adaptation without modifying the base weights. It is cheap to train, fast to swap, and supports serving many adapted variants from one base. The default approach when fine-tuning is genuinely needed.

When it matters

When you need fine-tuning ergonomics (cheap to train, easy to swap) without full-weight retraining. The default fine-tuning method for production teams in 2026.

Real example

A multi-tenant SaaS that ships customer-specific LoRA adapters for tone customization: 12 customers, 12 adapters, all served from the same base model. Training cost per customer: ~$50; serving overhead: <5% latency.

KPIs to watch

LoRA adapter size (typically 10-100MB, vs 200GB+ for full weights), training time (hours, not days), inference latency overhead (<10% target).

Related terms

See it in action

We use this every week

Book a 30-min call and we'll walk you through how LoRA shows up in a real engagement we're running.

Book a 30-min call