Defined term

Guardrails

Pre- and post-generation checks that filter unsafe, off-topic, or non-compliant model inputs and outputs.

Guardrails wrap the model with deterministic validators: input filters (block prompt injection, PII leakage), output filters (block sensitive content, enforce JSON schema, check citation presence), and policy enforcers (refuse out-of-scope queries). Production-grade guardrails include logging and an escape valve to human review.
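The wrapping described above can be sketched roughly as follows. This is a minimal illustration, not a production implementation: the regex patterns, the required JSON keys (`answer`, `citations`), the `Verdict` type, and the `guarded_call` function are all hypothetical names chosen for the example, and `model` stands in for any callable that takes a prompt and returns a string.

```python
import json
import logging
import re
from dataclasses import dataclass, field

logger = logging.getLogger("guardrails")

# Hypothetical patterns for illustration; real filters need broader, tuned rules.
INJECTION_PATTERNS = [re.compile(r"ignore (all )?previous instructions", re.I)]
EMAIL_PATTERN = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
REQUIRED_KEYS = {"answer", "citations"}  # assumed output schema


@dataclass
class Verdict:
    allowed: bool
    reasons: list = field(default_factory=list)
    needs_human_review: bool = False


def check_input(prompt: str) -> Verdict:
    """Input filter: flag likely prompt injection and email-style PII."""
    reasons = []
    if any(p.search(prompt) for p in INJECTION_PATTERNS):
        reasons.append("possible prompt injection")
    if EMAIL_PATTERN.search(prompt):
        reasons.append("PII (email address) in input")
    return Verdict(allowed=not reasons, reasons=reasons)


def check_output(raw: str) -> Verdict:
    """Output filter: enforce a JSON shape and require at least one citation."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return Verdict(False, ["output is not valid JSON"])
    if not isinstance(data, dict) or not REQUIRED_KEYS.issubset(data):
        return Verdict(False, ["output missing required keys"])
    if not data["citations"]:
        # Escape valve: uncited answers go to a person instead of being dropped.
        return Verdict(False, ["no citations present"], needs_human_review=True)
    return Verdict(True)


def guarded_call(prompt: str, model) -> dict:
    """Run the model only if the input passes, then validate its output."""
    verdict = check_input(prompt)
    if not verdict.allowed:
        logger.warning("input blocked: %s", verdict.reasons)
        return {"status": "blocked", "reasons": verdict.reasons}
    raw = model(prompt)
    verdict = check_output(raw)
    if not verdict.allowed:
        logger.warning("output blocked: %s", verdict.reasons)
        status = "human_review" if verdict.needs_human_review else "blocked"
        return {"status": status, "reasons": verdict.reasons}
    return {"status": "ok", "output": json.loads(raw)}
```

Every blocked request is logged with its reasons, and the `human_review` status is one possible form of the escape valve: the caller would route those cases to a review queue rather than returning them to the user.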

When it matters

Always required for any production AI workflow. Three layers: input filters (block malicious or out-of-scope queries), output validators (schema enforcement, fact-checking), and action approval queues (anything that writes to a system of record).

Real example

An outbound email agent with three guardrails: (1) an input filter rejects messages that exceed 500 words or are missing required fields, (2) an output validator requires every claim to map to a CRM field, and (3) an approval queue holds the first 100 sends for each new prospect segment for human review.
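As a rough sketch of how those three guardrails might fit together, under stated assumptions: the field names in `REQUIRED_FIELDS`, the claim format (`{"text": ..., "crm_field": ...}`), and the `ApprovalQueue` class are hypothetical and exist only for this illustration. Only the 500-word limit and the 100-send review window come from the example above.

```python
from collections import Counter

MAX_WORDS = 500          # from the example above
REVIEW_FIRST_N = 100     # from the example above
REQUIRED_FIELDS = ("prospect_id", "segment", "brief")  # hypothetical field names


def input_ok(request: dict) -> bool:
    """Guardrail 1: reject requests with missing fields or an over-long message."""
    if any(not request.get(name) for name in REQUIRED_FIELDS):
        return False
    return len(request["brief"].split()) <= MAX_WORDS


def claims_grounded(claims: list[dict], crm_record: dict) -> bool:
    """Guardrail 2: every claim must name a CRM field that holds a value."""
    return all(
        claim.get("crm_field") in crm_record
        and crm_record[claim["crm_field"]] not in (None, "")
        for claim in claims
    )


class ApprovalQueue:
    """Guardrail 3: hold the first N sends per segment for human review."""

    def __init__(self, review_first_n: int = REVIEW_FIRST_N):
        self.review_first_n = review_first_n
        self.seen = Counter()   # sends observed per segment
        self.pending = []       # (segment, email) pairs awaiting a reviewer

    def route(self, segment: str, email: str) -> str:
        self.seen[segment] += 1
        if self.seen[segment] <= self.review_first_n:
            self.pending.append((segment, email))
            return "held_for_review"
        return "send"
```

This sketch only checks that a cited CRM field exists and is non-empty. Verifying that the claim text actually matches the field's value would need a further comparison step that is not shown here.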

KPIs to watch

Guardrail trigger rate (1-5% healthy), false-positive rate on filters (<2%), action approval queue throughput (<1 day average wait).
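One way these KPIs could be computed from raw counts is sketched below. The function name, its parameters, and the exact definitions are assumptions made for illustration. In particular, the false-positive rate is taken here as legitimate requests wrongly blocked divided by all legitimate requests, which may differ from how a given team measures it.

```python
def guardrail_kpis(total_requests: int, triggered: int,
                   false_positives: int, wait_hours: list[float]) -> dict:
    """Compute the three KPIs and compare them to the thresholds listed above.

    Assumed definitions (not from the source):
      trigger_rate        = triggered / total_requests
      false_positive_rate = false_positives / legitimate requests, where
                            legitimate = total_requests - correctly blocked
      avg_wait_hours      = mean wait time in the approval queue
    """
    if total_requests <= 0:
        raise ValueError("total_requests must be positive")
    trigger_rate = triggered / total_requests
    legitimate = total_requests - (triggered - false_positives)
    fp_rate = false_positives / legitimate if legitimate else 0.0
    avg_wait = sum(wait_hours) / len(wait_hours) if wait_hours else 0.0
    return {
        "trigger_rate": trigger_rate,
        "false_positive_rate": fp_rate,
        "avg_wait_hours": avg_wait,
        "healthy": {
            "trigger_rate": 0.01 <= trigger_rate <= 0.05,   # 1-5%
            "false_positive_rate": fp_rate < 0.02,          # <2%
            "avg_wait_hours": avg_wait < 24,                # <1 day
        },
    }


report = guardrail_kpis(total_requests=1000, triggered=30,
                        false_positives=10, wait_hours=[4, 20, 12])
```

A trigger rate well below the 1% floor is flagged as unhealthy in this sketch, on the assumption that filters which almost never fire may be too permissive.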
