Eval infrastructure
eval-harness-template
@ai-native-agency/eval-harness
Production-grade evaluation harness scaffold for LLM workflows. TypeScript + Python bindings, gating thresholds, regression detection, CI integration.
What it does
Lets you define labelled test sets, expected outputs, scoring rubrics, and gating thresholds, then runs the resulting evals in CI on every prompt or model change.
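To make the pieces above concrete, here is a minimal sketch in plain Python. It does not use the template's actual API, which is not shown here. Every name in it (EvalCase, exact_match, run_eval, gate, regressed, fake_model) is hypothetical and chosen only for illustration, and the model is a stand-in lookup table rather than a real LLM call.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class EvalCase:
    """One labelled example: an input prompt and the expected output."""
    prompt: str
    expected: str


def exact_match(expected: str, actual: str) -> float:
    """Simplest possible rubric: 1.0 on a case-insensitive match, else 0.0."""
    return 1.0 if expected.strip().lower() == actual.strip().lower() else 0.0


def run_eval(
    cases: Sequence[EvalCase],
    model: Callable[[str], str],
    scorer: Callable[[str, str], float] = exact_match,
) -> float:
    """Score every case with the rubric and return the mean score."""
    if not cases:
        raise ValueError("test set is empty")
    return sum(scorer(c.expected, model(c.prompt)) for c in cases) / len(cases)


def gate(score: float, threshold: float) -> bool:
    """Gating threshold: pass only if the score meets the minimum."""
    return score >= threshold


def regressed(score: float, baseline: float, tolerance: float = 0.0) -> bool:
    """Regression check: True if the score fell below baseline by more than tolerance."""
    return (baseline - score) > tolerance


def fake_model(prompt: str) -> str:
    """Hypothetical stand-in for a real model call."""
    answers = {"2 + 2 =": "4", "Capital of France?": "Paris"}
    return answers.get(prompt, "unknown")


CASES = [
    EvalCase("2 + 2 =", "4"),
    EvalCase("Capital of France?", "paris"),
    EvalCase("Largest planet?", "Jupiter"),
]


def main(threshold: float = 0.9, baseline: float = 0.8) -> int:
    """Return a process exit code: 0 if the gate passes, 1 otherwise."""
    score = run_eval(CASES, fake_model)
    ok = gate(score, threshold) and not regressed(score, baseline, tolerance=0.05)
    print(f"score={score:.3f} threshold={threshold} baseline={baseline} pass={ok}")
    return 0 if ok else 1
```

In a CI job, a step could call sys.exit(main()) so that a non-zero exit code fails the build when a prompt or model change drops the score below the threshold or the stored baseline. The thresholds and baseline here are made-up values, not recommendations.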
Why it exists
Every engagement we ship requires an eval harness. We open-sourced the template so clients can fork it and operate it themselves at exit.
Public launch: Q3 2026