Defined term
RAG (Retrieval-Augmented Generation)
Generation grounded in retrieved source documents rather than the model's parametric memory alone.
Retrieval-Augmented Generation is the pattern in which a query is first used to retrieve relevant passages from a curated source (vector store, search index, database), and those passages are then passed to the model as context for its answer. RAG reduces (though does not eliminate) hallucination on factual queries, lets answers cite their sources, and keeps the system current by updating the source rather than retraining the model. A production RAG system needs source curation, a chunking strategy, an embedding model, retrieval evaluation, and answer evaluation.
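A minimal sketch of the retrieve-then-generate loop described above. It assumes a toy bag-of-words "embedding" as a stand-in for a real embedding model, ranks passages by cosine similarity, and stops at building the prompt rather than calling any particular model API. `embed`, `retrieve`, and `build_prompt` are hypothetical names for illustration.

```python
import math
import re
from collections import Counter

def embed(text: str) -> Counter:
    # Toy stand-in for a real embedding model (assumption): word-count vector.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse count vectors.
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(query: str, corpus: dict[str, str], k: int = 3) -> list[tuple[str, str]]:
    """Return the k (id, text) passages most similar to the query."""
    q = embed(query)
    ranked = sorted(corpus.items(), key=lambda item: cosine(q, embed(item[1])), reverse=True)
    return ranked[:k]

def build_prompt(query: str, passages: list[tuple[str, str]]) -> str:
    """Pass retrieved passages to the model as context, tagged with citable IDs."""
    context = "\n".join(f"[{pid}] {text}" for pid, text in passages)
    return (
        "Answer using only the sources below. Cite source IDs in brackets.\n"
        f"{context}\n\nQuestion: {query}"
    )

corpus = {
    "policy-7": "Refunds are available within 30 days of purchase.",
    "policy-9": "Shipping to Canada takes 5 to 7 business days.",
    "faq-2": "Passwords can be reset from the account settings page.",
}
top = retrieve("Can I get refunds after 30 days?", corpus, k=1)
print(top[0][0])  # → policy-7
print(build_prompt("Can I get refunds after 30 days?", top))
```

In a real deployment the toy `embed` would be replaced by an embedding model and the linear scan by a vector store query; the shape of the loop (embed, rank, assemble cited context) stays the same.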
When it matters
Use RAG when factual accuracy depends on citing specific source material (policies, contracts, customer history). Skip it when the model's parametric knowledge is sufficient, or when the latency budget cannot absorb the extra retrieval step.
Real example
A support assistant that retrieves the 5 most relevant past tickets, the customer's product configuration, and the relevant policy passages, then generates a grounded answer with inline citations that a human support agent can verify in under 10 seconds.
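One way to sketch the support example: combine the three source types into a single cited context, then check that every inline citation in the generated answer resolves to a source that was actually supplied, which is what makes quick verification possible. The `Source` record, the ID scheme, and the `[id]` citation format are assumptions for illustration, and the tickets are assumed to arrive already ranked by relevance.

```python
import re
from dataclasses import dataclass

@dataclass
class Source:
    source_id: str  # hypothetical ID scheme, e.g. "ticket-1042"
    kind: str       # "ticket", "config", or "policy"
    text: str

def assemble_context(tickets: list[Source], config: Source,
                     policies: list[Source], max_tickets: int = 5) -> tuple[list[Source], str]:
    """Combine the top-ranked past tickets, the customer's config, and policy passages."""
    sources = list(tickets[:max_tickets]) + [config] + list(policies)
    block = "\n".join(f"[{s.source_id}] ({s.kind}) {s.text}" for s in sources)
    return sources, block

# Assumed citation format: a source ID in square brackets.
CITATION = re.compile(r"\[([\w.-]+)\]")

def check_citations(answer: str, sources: list[Source]) -> dict[str, list[str]]:
    """Split the answer's citations into ones that resolve to a supplied source and ones that don't."""
    known = {s.source_id for s in sources}
    cited = CITATION.findall(answer)
    return {
        "resolved": [c for c in cited if c in known],
        "unresolved": [c for c in cited if c not in known],
    }

tickets = [Source("ticket-1042", "ticket", "Sync failed after upgrading to v3; fixed by clearing cache.")]
config = Source("config-acme", "config", "Plan: Pro; client version 3.1")
policies = [Source("policy-12", "policy", "Pro plans include priority support.")]
sources, block = assemble_context(tickets, config, policies)

answer = "Clear the local cache [ticket-1042]; as a Pro customer you get priority support [policy-12] [policy-99]."
print(check_citations(answer, sources))
# → {'resolved': ['ticket-1042', 'policy-12'], 'unresolved': ['policy-99']}
```

An unresolved citation (here `policy-99`) is a cheap, automatic signal that the answer references something the model was never shown and should be flagged before a human spends time on it.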
KPIs to watch
Retrieval precision@5 (share of the top 5 retrieved passages that are relevant; target >0.75), answer groundedness rate (target >90%), source citation completeness (target: 100% of factual claims cited).
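A sketch of how these three KPIs could be computed from a labeled evaluation set. It assumes relevance labels per query, a grounded/not-grounded judgment per answer (from a human reviewer or a judge model), and a list of cited source IDs per factual claim; those input shapes and the function names are hypothetical.

```python
def precision_at_k(retrieved: list[str], relevant: set[str], k: int = 5) -> float:
    """Fraction of the top-k retrieved IDs labeled relevant (missing slots count as misses)."""
    if k <= 0:
        return 0.0
    return sum(1 for pid in retrieved[:k] if pid in relevant) / k

def rate(flags: list[bool]) -> float:
    """Share of True values, e.g. answers judged grounded."""
    return sum(flags) / len(flags) if flags else 0.0

def citation_completeness(claims: list[list[str]]) -> float:
    """Share of factual claims with at least one citation; each inner list holds a claim's source IDs."""
    return rate([len(c) > 0 for c in claims])

print(precision_at_k(["a", "b", "c", "d", "e"], {"a", "c", "d", "x"}))  # → 0.6 (below 0.75)
print(rate([True] * 19 + [False]))                                      # → 0.95 groundedness
print(citation_completeness([["ticket-1042"], ["policy-12"], []]))     # → 0.666... (below 100%)
```

Dividing by `k` rather than by the number of results returned means a retriever that returns fewer than five passages is penalized, which keeps the metric comparable across queries.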
Related terms
Embeddings
Numerical vectors that represent the meaning of a text, image, or other piece of content.
Vector store
A database optimized for similarity search over embeddings.
Grounding
Anchoring model output to verifiable source material to reduce hallucination.
Agentic AI
AI systems that can plan, take multi-step actions, and use tools to complete tasks autonomously.
See it in action
We use this every week
Book a 30-min call and we'll walk you through how RAG (Retrieval-Augmented Generation) shows up in a real engagement we're running.