Guardrails

Pre- and post-generation checks that filter unsafe, off-topic, or non-compliant model inputs and outputs.

Guardrails wrap the model with deterministic validators: input filters (block prompt injection, PII leakage), output filters (block sensitive content, enforce JSON schema, check citation presence), and policy enforcers (refuse out-of-scope queries). Production-grade guardrails include logging and an escape valve to human review.
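A minimal sketch of such a wrapper, assuming the model is any callable that takes a prompt and returns a string. The regex patterns, the required keys ("answer", "citations"), and the names check_input, check_output, and guarded_call are hypothetical and chosen only for illustration; real injection and PII detection needs far broader coverage than two patterns.

```python
import json
import re
from dataclasses import dataclass
from typing import Callable

# Hypothetical patterns for illustration only.
INJECTION_PATTERN = re.compile(
    r"ignore (all |the )?(previous|prior) instructions", re.IGNORECASE
)
EMAIL_PATTERN = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")


@dataclass
class GuardrailResult:
    allowed: bool
    reason: str = ""
    output: str = ""


def check_input(prompt: str) -> GuardrailResult:
    """Input filter: block likely prompt injection and obvious PII."""
    if INJECTION_PATTERN.search(prompt):
        return GuardrailResult(False, "possible prompt injection")
    if EMAIL_PATTERN.search(prompt):
        return GuardrailResult(False, "PII in input")
    return GuardrailResult(True)


def check_output(raw: str, required_keys: set[str]) -> GuardrailResult:
    """Output filter: require a JSON object containing the expected keys."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return GuardrailResult(False, "output is not valid JSON")
    if not isinstance(data, dict):
        return GuardrailResult(False, "output is not a JSON object")
    missing = required_keys - set(data)
    if missing:
        return GuardrailResult(False, f"missing keys: {sorted(missing)}")
    return GuardrailResult(True, output=raw)


def guarded_call(
    prompt: str, model: Callable[[str], str], audit_log: list
) -> GuardrailResult:
    """Run the input filter, the model, then the output filter, and log the verdict."""
    verdict = check_input(prompt)
    if verdict.allowed:
        verdict = check_output(model(prompt), {"answer", "citations"})
    # Every decision is logged so blocked requests can be routed to human review.
    audit_log.append({"allowed": verdict.allowed, "reason": verdict.reason})
    return verdict
```

Because each validator is a plain deterministic function, it can be unit-tested without calling the model, and the audit log gives reviewers a record of why a request was blocked.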

When it matters

Required for any production AI workflow. Guardrails typically come in three layers: input filters (block malicious or out-of-scope queries), output validators (schema enforcement, fact-checking), and action approval queues (for anything that writes to a system of record).

Real example

An outbound email agent with three guardrails: (1) an input filter rejects messages over 500 words or missing required fields, (2) an output validator requires every claim to map to a CRM field, and (3) an approval queue holds the first 100 sends in each new prospect segment for human review.
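The three guardrails in this example could be sketched as follows. This is an illustration under assumptions, not the agent's actual implementation: the field names in REQUIRED_FIELDS, the shape of the request and CRM record dictionaries, and the ApprovalQueue class are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass, field

MAX_WORDS = 500      # from the example: reject messages over 500 words
REVIEW_QUOTA = 100   # from the example: hold the first 100 sends per segment
REQUIRED_FIELDS = ("prospect_id", "segment", "brief")  # hypothetical field names


def input_ok(request: dict) -> bool:
    """Guardrail 1: reject requests with missing fields or an overlong brief."""
    if any(not request.get(name) for name in REQUIRED_FIELDS):
        return False
    return len(request["brief"].split()) <= MAX_WORDS


def claims_ok(claims: dict[str, str], crm_record: dict) -> bool:
    """Guardrail 2: every claim must point at a populated CRM field.

    `claims` maps each claim's text to the CRM field name it relies on.
    """
    return all(
        crm_record.get(field_name) not in (None, "")
        for field_name in claims.values()
    )


@dataclass
class ApprovalQueue:
    """Guardrail 3: hold the first REVIEW_QUOTA sends of each segment for review."""

    sends_per_segment: dict = field(default_factory=lambda: defaultdict(int))
    pending: list = field(default_factory=list)

    def route(self, segment: str, email: str) -> str:
        self.sends_per_segment[segment] += 1
        if self.sends_per_segment[segment] <= REVIEW_QUOTA:
            self.pending.append((segment, email))
            return "held_for_review"
        return "auto_send"
```

A send would pass through input_ok before generation, claims_ok after generation, and ApprovalQueue.route before delivery, so a failure at any layer stops the email from leaving without review.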

KPIs to watch

Guardrail trigger rate (1-5% is healthy), false-positive rate on filters (under 2%), and action approval queue wait time (under 1 day on average).
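One way these KPIs might be computed from logged counts is sketched below. The GuardrailStats fields and the kpi_report function are hypothetical, and the false-positive rate is assumed here to mean wrongly blocked requests divided by all legitimate requests; a team could reasonably define it as a share of triggers instead.

```python
from dataclasses import dataclass


@dataclass
class GuardrailStats:
    total_requests: int
    triggers: int                  # requests blocked by any guardrail
    false_positives: int           # triggers a reviewer later judged to be wrong
    queue_wait_hours: list[float]  # wait time of each approved action


def kpi_report(stats: GuardrailStats) -> dict:
    """Compute the three KPIs and compare them with the thresholds above."""
    trigger_rate = (
        stats.triggers / stats.total_requests if stats.total_requests else 0.0
    )
    # Assumption: legitimate requests = all requests minus correctly blocked ones.
    legitimate = stats.total_requests - (stats.triggers - stats.false_positives)
    fp_rate = stats.false_positives / legitimate if legitimate else 0.0
    waits = stats.queue_wait_hours
    avg_wait = sum(waits) / len(waits) if waits else 0.0
    return {
        "trigger_rate": trigger_rate,
        "false_positive_rate": fp_rate,
        "avg_wait_hours": avg_wait,
        "healthy": {
            "trigger_rate": 0.01 <= trigger_rate <= 0.05,
            "false_positive_rate": fp_rate < 0.02,
            "avg_wait_hours": avg_wait < 24,
        },
    }
```

For example, 1,000 requests with 30 triggers, 5 of them false positives, gives a trigger rate of 3% and a false-positive rate of 5/975, roughly 0.5%, both inside the healthy ranges.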

Related terms

Prompt injection

An attack where user input manipulates the model into ignoring its system prompt or executing unintended actions.

AI governance

Policies, processes, and controls that make an AI system auditable and accountable.

Grounding

Anchoring model output to verifiable source material to reduce hallucination.

Hallucination

Plausible-sounding but factually incorrect output that an LLM generates without grounding in source material.
