Defined term
Guardrails
Pre- and post-generation checks that filter unsafe, off-topic, or non-compliant model inputs and outputs.
Guardrails wrap the model with deterministic validators: input filters (block prompt injection, PII leakage), output filters (block sensitive content, enforce JSON schema, check citation presence), and policy enforcers (refuse out-of-scope queries). Production-grade guardrails include logging and an escape valve to human review.
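The wrapping described above can be sketched as a small Python function. This is a minimal illustration under stated assumptions, not a production filter: the regex patterns, the `answer`/`citations` output schema, and the names `check_input`, `check_output`, and `guarded_call` are all hypothetical, and the model is passed in as a plain callable.

```python
import json
import re
from dataclasses import dataclass, field
from typing import Callable, List

# Hypothetical patterns for illustration; real filters would be far broader.
INJECTION_PATTERN = re.compile(
    r"ignore (all |any )?(previous|prior) instructions", re.IGNORECASE
)
EMAIL_PATTERN = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
REQUIRED_KEYS = {"answer", "citations"}  # assumed output schema


@dataclass
class GuardrailResult:
    allowed: bool
    reasons: List[str] = field(default_factory=list)


def check_input(prompt: str) -> GuardrailResult:
    """Input filter: flag likely prompt injection and PII (email addresses)."""
    reasons = []
    if INJECTION_PATTERN.search(prompt):
        reasons.append("possible prompt injection")
    if EMAIL_PATTERN.search(prompt):
        reasons.append("PII (email address) in input")
    return GuardrailResult(allowed=not reasons, reasons=reasons)


def check_output(raw: str) -> GuardrailResult:
    """Output filter: require valid JSON, the assumed schema, and citations."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return GuardrailResult(False, ["output is not valid JSON"])
    if not isinstance(data, dict) or not REQUIRED_KEYS.issubset(data):
        return GuardrailResult(False, ["output does not match schema"])
    if not data["citations"]:
        return GuardrailResult(False, ["no citations present"])
    return GuardrailResult(True)


def guarded_call(
    prompt: str, model: Callable[[str], str], audit_log: List[dict]
) -> dict:
    """Run the model between the two filters, logging every trigger."""
    verdict = check_input(prompt)
    if not verdict.allowed:
        audit_log.append({"stage": "input", "reasons": verdict.reasons})
        return {"status": "blocked", "reasons": verdict.reasons}
    raw = model(prompt)
    verdict = check_output(raw)
    if not verdict.allowed:
        # Escape valve: failed outputs go to a human instead of the user.
        audit_log.append({"stage": "output", "reasons": verdict.reasons})
        return {"status": "needs_human_review", "reasons": verdict.reasons}
    return {"status": "ok", "output": json.loads(raw)}
```

Because both validators are deterministic, the same input always yields the same verdict, which keeps the audit log reproducible. Passing the model as a callable also lets the filters be tested without calling a real LLM.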
When it matters
Required for any production AI workflow. Guardrails typically sit in three layers: input filters (block malicious or out-of-scope queries), output validators (schema enforcement, fact-checking), and action approval queues (for anything that writes to a system of record).
Real example
An outbound email agent with three guardrails: (1) an input filter rejects messages over 500 words or missing required fields, (2) an output validator requires every claim to map to a CRM field, (3) an approval queue holds the first 100 sends to each new prospect segment for human review.
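The three guardrails in this example could look roughly like the sketch below. Only the 500-word limit and the 100-send review window come from the example itself. The field names (`prospect_id`, `segment`, `body`, `crm_field`), the claim structure, and the `ApprovalQueue` class are assumptions made for illustration.

```python
from typing import Dict, List

MAX_WORDS = 500
REVIEW_SENDS_PER_SEGMENT = 100
# Hypothetical required fields; the real set depends on the CRM in use.
REQUIRED_FIELDS = ("prospect_id", "segment", "body")


def input_ok(message: Dict[str, str]) -> bool:
    """Guardrail 1: reject messages with missing fields or over 500 words."""
    if any(not message.get(name) for name in REQUIRED_FIELDS):
        return False
    return len(message["body"].split()) <= MAX_WORDS


def claims_ok(claims: List[Dict[str, str]], crm_record: Dict[str, str]) -> bool:
    """Guardrail 2: every claim must name a field present in the CRM record.

    Assumes each claim is a dict such as
    {"text": "You raised a Series B", "crm_field": "last_funding_round"}.
    """
    return all(claim.get("crm_field") in crm_record for claim in claims)


class ApprovalQueue:
    """Guardrail 3: hold the first N sends per segment for human review."""

    def __init__(self, limit: int = REVIEW_SENDS_PER_SEGMENT) -> None:
        self.limit = limit
        self.sends_seen: Dict[str, int] = {}
        self.pending: List[dict] = []

    def route(self, segment: str, email: dict) -> str:
        count = self.sends_seen.get(segment, 0)
        self.sends_seen[segment] = count + 1
        if count < self.limit:
            self.pending.append(email)
            return "held_for_review"
        return "sent"
```

Counting sends per segment, rather than globally, means each new segment gets its own review window even after earlier segments have cleared theirs.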
KPIs to watch
Guardrail trigger rate (1-5% is healthy), false-positive rate on filters (under 2%), and average wait in the action approval queue (under 1 day).
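As a rough sketch, these three KPIs can be computed from basic counters. The thresholds are the ones listed above; the `GuardrailStats` fields and the `kpi_report` function are hypothetical. The false-positive rate is assumed here to mean blocked benign requests divided by all benign requests, which requires a labeled sample.

```python
from dataclasses import dataclass
from datetime import timedelta
from typing import List


@dataclass
class GuardrailStats:
    total_requests: int
    triggered: int          # requests on which any guardrail fired
    benign_requests: int    # assumed known from a labeled sample
    false_positives: int    # benign requests that were blocked anyway
    approval_waits: List[timedelta]


def kpi_report(stats: GuardrailStats) -> dict:
    """Compute the three KPIs and compare each to its target."""
    trigger_rate = (
        stats.triggered / stats.total_requests if stats.total_requests else 0.0
    )
    fp_rate = (
        stats.false_positives / stats.benign_requests
        if stats.benign_requests
        else 0.0
    )
    avg_wait = (
        sum(stats.approval_waits, timedelta()) / len(stats.approval_waits)
        if stats.approval_waits
        else timedelta()
    )
    return {
        "trigger_rate": trigger_rate,
        "trigger_rate_healthy": 0.01 <= trigger_rate <= 0.05,
        "false_positive_rate": fp_rate,
        "false_positive_rate_ok": fp_rate < 0.02,
        "avg_approval_wait": avg_wait,
        "approval_wait_ok": avg_wait < timedelta(days=1),
    }
```

A trigger rate below the healthy band is flagged as well as one above it, since guardrails that almost never fire may simply be too permissive.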
Related terms
Prompt injection
An attack where user input manipulates the model into ignoring its system prompt or executing unintended actions.
AI governance
Policies, processes, and controls that make an AI system auditable and accountable.
Grounding
Anchoring model output to verifiable source material to reduce hallucination.
Hallucination
Plausible but factually incorrect output that an LLM generates without grounding in source material.