Infrastructure we ship · Apache 2.0

Open source.

Five projects we use inside every engagement. We open-source the templates so clients can fork them at exit and operate independently. Each ships with the evals and audit log spec attached — production-grade, not toy demos.

Eval infrastructure

eval-harness-template

@ai-native-agency/eval-harness

Preparing launchApache 2.0

Production-grade evaluation harness scaffold for LLM workflows. TypeScript + Python bindings, gating thresholds, regression detection, CI integration.

What it does

Lets you define labelled test sets, expected outputs, scoring rubrics, and gating thresholds — and runs them in CI on every prompt or model change.

Why it exists

Every engagement we ship requires an eval harness. We open-sourced the template so clients can fork it and operate it themselves at exit.

Public launch: Q3 2026

MCP / tool registry

mcp-server-template

@ai-native-agency/mcp-server

Preparing launchApache 2.0

Production-ready Model Context Protocol server scaffold with auth, rate limiting, audit logging, and standard tool registry patterns.

What it does

Spin up an MCP server in 10 minutes with the security defaults regulated mid-market needs: auth, rate limiting, structured audit log, schema validation.

Why it exists

MCP is becoming the 2026 standard for tool registries. We needed a hardened starter — sharing it openly accelerates the ecosystem.

Public launch: Q3 2026

Compliance / governance

audit-log-spec

@ai-native-agency/audit-log-spec

Preparing launchApache 2.0

JSON Schema for AI inference audit logs. WORM-compatible, regulator-ready. Maps cleanly to HIPAA, FINRA, GDPR retention requirements.

What it does

Specifies the exact fields every AI inference must log to be defensible to an auditor: input fingerprint, retrieval bundle hash, model version, prompt hash, output, downstream action, reviewer disposition.

Why it exists

There is no industry-standard schema for AI audit logs. We published ours so other agencies and in-house teams can converge on a defensible baseline.

Public launch: Q3 2026

Cost / latency optimization

claude-multi-model-router

@ai-native-agency/claude-multi-model-router

Preparing launchApache 2.0

Cost-optimized router for Claude Opus / Sonnet / Haiku — picks the right model per request based on complexity, latency budget, and cost ceiling.

What it does

Wraps the Anthropic SDK with a router that downsizes routine calls to Haiku, escalates complex reasoning to Opus, and caches deterministically. Cuts mid-market inference bills by 40-70%.

Why it exists

Every engagement we run uses multi-model routing to keep Run costs in check. The router was tribal knowledge — now it's a public package.

Public launch: Q3 2026

Prompt engineering

prompt-library

@ai-native-agency/prompt-library

Preparing launchApache 2.0

Versioned, evaluated prompt patterns for production workflows: customer service triage, document extraction, compliance review, lead qualification.

What it does

Battle-tested prompt patterns for common mid-market workflows, each with a labelled eval set and benchmark scores against Claude/GPT/Gemini.

Why it exists

Most prompt libraries are personal collections without eval data. We ship ours with the evals attached so you can verify before adopting.

Public launch: Q4 2026

Why we publish our infrastructure

Every engagement we ship runs on this stack. Open-sourcing the templates means clients can read the code before paying the deposit, fork it at exit without lock-in, and contribute back if they extend it. We don't sell licenses or hosting — the artifact quality IS the proof. The engagement itself is the integration, customization, and operating discipline you pay for.

Ready to use this stack?

Start an AI Project

Every engagement ships against this stack — pre-hardened for production from day one. No reinventing the wheel.