

RAG (Retrieval-Augmented Generation)

Generation grounded in retrieved source documents rather than the model's parametric memory alone.

Retrieval-Augmented Generation is a pattern in which a query is first used to retrieve relevant passages from a curated source (a vector store, search index, or database), and those passages are then passed to the model as context for the answer. RAG reduces hallucination on factual queries, allows answers to cite their sources, and lets the system stay current without retraining. Production RAG requires source curation, a chunking strategy, embeddings, retrieval evaluation, and answer evaluation.
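The retrieve-then-generate flow described above can be sketched in a few lines. This is a minimal illustration, not a production design: the bag-of-words `embed` function is a toy stand-in for a real embedding model, the `PASSAGES` dictionary stands in for a curated source, and all function names are hypothetical. The model call itself is left out; the sketch stops at the prompt that would be sent.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy stand-in for an embedding model: a bag-of-words term count."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(query, passages, k=2):
    """Return the k (id, text) pairs most similar to the query."""
    q = embed(query)
    ranked = sorted(
        passages.items(),
        key=lambda item: cosine(q, embed(item[1])),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(query, retrieved):
    """Assemble the context-grounded prompt that would be sent to the model."""
    context = "\n".join(f"[{pid}] {text}" for pid, text in retrieved)
    return (
        "Answer using only the sources below and cite them by id.\n"
        f"Sources:\n{context}\n"
        f"Question: {query}"
    )


# Hypothetical curated source, keyed by passage id.
PASSAGES = {
    "policy-1": "Refunds are available within 30 days of purchase.",
    "policy-2": "Shipping takes five business days.",
    "policy-3": "Refunds are issued to the original payment method.",
}

top = retrieve("How do refunds work?", PASSAGES, k=2)
prompt = build_prompt("How do refunds work?", top)
print(prompt)
```

Keeping passage ids in the prompt is what makes inline citations possible: the model can refer to `[policy-1]` and a reader can trace the claim back to its source.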

When it matters

Use RAG when factual accuracy depends on citing specific source material (policies, contracts, customer history). Skip RAG when the model's parametric knowledge is sufficient, or when the latency budget cannot absorb an extra retrieval step.

Real example

A support agent that retrieves the 5 most relevant past tickets, the customer's product configuration, and the relevant policy passages, then generates a grounded answer with inline citations the agent can verify in under 10 seconds.

KPIs to watch

Retrieval precision@5 (>0.75 target), answer groundedness rate (>90%), source citation completeness (100% on factual claims).
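As a rough sketch of how these KPIs might be computed from labeled evaluation data: the definitions below are assumptions (precision@k as the share of the top k results labeled relevant, groundedness and citation completeness as simple pass rates over reviewer judgments), and the sample ids and labels are made up for illustration.

```python
def precision_at_k(retrieved_ids, relevant_ids, k=5):
    """Fraction of the top-k retrieved items that are labeled relevant.

    Divides by k, so returning fewer than k results lowers the score.
    """
    top_k = retrieved_ids[:k]
    return sum(1 for doc_id in top_k if doc_id in relevant_ids) / k


def rate(flags):
    """Share of True values in a list of booleans (0.0 for an empty list)."""
    return sum(flags) / len(flags) if flags else 0.0


# Hypothetical evaluation data for one query and a small batch of answers.
retrieved = ["t1", "t7", "t3", "t9", "t4"]   # ranked retrieval output
relevant = {"t1", "t3", "t4", "t8"}          # reviewer-labeled relevant items

p5 = precision_at_k(retrieved, relevant, k=5)      # 3 of 5 are relevant
groundedness = rate([True, True, False, True])     # answers judged grounded
completeness = rate([True, True, True])            # factual claims with a citation

# Compare against the targets listed above.
meets_targets = p5 > 0.75 and groundedness > 0.90 and completeness == 1.0
print(p5, groundedness, completeness, meets_targets)
```

In practice each metric would be averaged over a full evaluation set rather than a single query, and the relevance and groundedness labels would come from human review or a separate grading step.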
