LoRA

Low-Rank Adaptation: a parameter-efficient fine-tuning method that trains small adapters instead of full weights.

LoRA inserts small trainable low-rank matrices into a model's layers, allowing task adaptation without modifying the base weights. It is cheap to train, fast to swap, and supports serving many adapted variants from one base. The default approach when fine-tuning is genuinely needed.

When it matters

When you need fine-tuning ergonomics (cheap to train, easy to swap) without full-weight retraining. The default fine-tuning method for production teams in 2026.

Real example

A multi-tenant SaaS that ships customer-specific LoRA adapters for tone customization: 12 customers, 12 adapters, all served from the same base model. Training cost per customer: ~$50; serving overhead: <5% latency.

KPIs to watch

LoRA adapter size (typically 10-100MB, vs 200GB+ for full weights), training time (hours, not days), inference latency overhead (<10% target).

Related terms

Fine-tuning

Continuing the training of a base model on task-specific data to specialize behavior.

Context window

The maximum number of tokens a model can process in a single request.

Frontier model

The leading-edge foundation models with the highest reasoning, coding, and multimodal capabilities.

Foundation model

A large model pre-trained on broad data, then adapted to many downstream tasks.

See it in action

We use this every week

Send a short brief and we'll walk you through how LoRA shows up in a real engagement we're running. We reply within one business day.

Start a project →