Inside an AI-Native Engagement — What Week 1 to Week 12 Actually Look Like

Why this writeup

Most agency case studies read like marketing — outcomes, logos, a quote from the COO, a stack-shot. They tell you nothing about what the work actually looks like or whether you should buy.

This piece is the opposite. We walk through a real engagement we shipped, week by week, in enough detail that you can decide whether our process matches what you need — before paying a deposit. Client identity is withheld under NDA, but the timeline, the artifacts, the trade-offs, and the numbers are unchanged.

The setup

Client profile: a property-management operator in a GCC-region national market, managing multiple owners' associations across thousands of units. Annual revenue band: $25M–$50M. Operating stack at engagement start: a patchwork of spreadsheets, email threads, and disconnected accounting tools.

What they asked for: a full operational backbone — not a marketing site — covering property records, accounting workflows, governance, and resident-facing services in a single deployable system, inside a regulated jurisdiction with strict data-residency requirements.

What we committed to: production deployment on real operational data within 12 weeks, fixed-price, with the operator able to take over the codebase at any point. The plan was a two-week Discovery sprint, ten weeks of Build, and an optional Run phase after that.

Week 1 — Discovery kickoff

On Day 1 we sat down with three people: the operations director, the chief accountant, and the regulatory officer. No engineers in the room. The point was to map the workflow as it actually runs, not as it's documented.

By end of Day 3 we had a current-state workflow map covering 47 distinct operational tasks. By Day 5 we had the baseline KPIs documented: average time to resolve a maintenance request, accuracy rate of service-charge calculations, percentage of resident communications that bounced or were ignored. These numbers later became the eval baseline.

Week 1 artifact: 18-page Discovery brief. Workflow map, KPI baseline, system access requirements, regulatory constraints (data residency, audit log retention, breach notification windows), risk register.

Week 2 — Architecture decision record (ADR)

We wrote three architecture options on a single page, with explicit trade-offs. We do not present a single "recommended" architecture because the act of choosing between options forces the client to engage with the trade-offs.

Option A: lift-and-shift onto an existing property-management SaaS with custom modules. Fastest, but the weakest fit for the regulated-jurisdiction requirements. We rejected it within 48 hours after the regulatory officer walked us through the data-residency disqualifier.

Option B: build from scratch on a generic web stack. Highest customization, longest timeline, no AI augmentation. The operator's previous attempt had taken eighteen months.

Option C: AI-augmented scaffolding on a domain-modelled core. We picked C. The bet was that the data model (properties, units, owners, contracts, charges, audit events) was the hard part, and once that was right, the 55+ management screens could be scaffolded with AI-augmented code generation under a senior reviewer.

Week 2 artifact: 6-page Architecture Decision Record + a fixed-price Build SOW for $48k. Signed by end of week 2.

Week 3-4 — Domain model + database

Three weeks of work compressed into two. The data model is the foundation — get it wrong and every subsequent screen is paying interest on the mistake.

By end of week 3 we had 47 normalized tables, 12 typed enumerations, 8 distinct role-based access policies, and the audit event log structure (later open-sourced as our audit-log-spec). We wrote the data model as TypeScript types first, then generated the Postgres schema and the seed migrations from those types.
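A minimal sketch of what a types-first workflow of this kind can look like, assuming a small hand-rolled table description rather than any particular ORM or generator. Every name here (CHARGE_STATUS, TableDef, serviceCharge, toCreateTable) is hypothetical and chosen for illustration; the engagement's actual types and generator are not shown in this writeup.

```typescript
// Hypothetical illustration: none of these names come from the real codebase.

// A typed enumeration kept as a const tuple, so the same values are
// available to the type checker and to the schema generator at runtime.
const CHARGE_STATUS = ["draft", "issued", "paid", "disputed"] as const;
type ChargeStatus = (typeof CHARGE_STATUS)[number];

function isChargeStatus(value: string): value is ChargeStatus {
  return (CHARGE_STATUS as readonly string[]).indexOf(value) !== -1;
}

type ColumnKind = "uuid" | "text" | "numeric" | "timestamptz";

interface ColumnDef {
  kind: ColumnKind;
  nullable?: boolean;
  references?: string; // written as "table(column)"
  oneOf?: readonly string[]; // rendered as a CHECK constraint
}

interface TableDef {
  name: string;
  primaryKey: string;
  columns: Record<string, ColumnDef>;
}

// One table of the model, described once in TypeScript.
const serviceCharge: TableDef = {
  name: "service_charge",
  primaryKey: "id",
  columns: {
    id: { kind: "uuid" },
    unit_id: { kind: "uuid", references: "unit(id)" },
    amount: { kind: "numeric" },
    status: { kind: "text", oneOf: CHARGE_STATUS },
    issued_at: { kind: "timestamptz", nullable: true },
  },
};

// Render a Postgres CREATE TABLE statement from the description.
function toCreateTable(table: TableDef): string {
  const lines = Object.entries(table.columns).map(([name, col]) => {
    const parts: string[] = [name, col.kind];
    if (!col.nullable) parts.push("NOT NULL");
    if (col.references) parts.push(`REFERENCES ${col.references}`);
    if (col.oneOf) {
      const allowed = col.oneOf.map((v) => `'${v}'`).join(", ");
      parts.push(`CHECK (${name} IN (${allowed}))`);
    }
    return "  " + parts.join(" ");
  });
  lines.push(`  PRIMARY KEY (${table.primaryKey})`);
  return `CREATE TABLE ${table.name} (\n${lines.join(",\n")}\n);`;
}
```

Calling toCreateTable(serviceCharge) yields a single CREATE TABLE statement. The point of the design is that the enumeration values and column definitions live in one place, so the application types and the database constraints cannot drift apart silently.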

We also imported three years of historical data from the operator's spreadsheets in week 4. Excel import was a hard requirement because the operator could not start fresh. We wrote a normalization pipeline that mapped 14 spreadsheet shapes into the new model, with a manual reviewer queue for the rows that failed schema validation. The pipeline ran overnight; a reviewer cleared ambiguities the next morning.
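The split between accepted rows and a reviewer queue can be sketched as follows. This is an illustration under stated assumptions, not the pipeline we ran: the record fields (unitCode, amount, period), the validation rules, and the names ShapeMapping, normalizeRow, and runImport are all hypothetical.

```typescript
// Hypothetical illustration of "validate, else route to a human".

type RawRow = Record<string, string | undefined>;

interface ChargeRecord {
  unitCode: string;
  amount: number;
  period: string; // "YYYY-MM"
}

interface ReviewItem {
  row: RawRow;
  reasons: string[];
}

// Each spreadsheet shape supplies its own column headers for the model fields.
type ShapeMapping = Record<keyof ChargeRecord, string>;

function normalizeRow(row: RawRow, mapping: ShapeMapping): ChargeRecord | ReviewItem {
  const reasons: string[] = [];

  const unitCode = (row[mapping.unitCode] ?? "").trim();
  if (unitCode === "") reasons.push("missing unit code");

  // Strip thousands separators before parsing.
  const amountText = (row[mapping.amount] ?? "").replace(/,/g, "").trim();
  const amount = Number(amountText);
  if (amountText === "" || !Number.isFinite(amount)) reasons.push("amount is not a number");

  const period = (row[mapping.period] ?? "").trim();
  if (!/^\d{4}-(0[1-9]|1[0-2])$/.test(period)) reasons.push("period is not YYYY-MM");

  // Any failure sends the untouched original row to the reviewer queue.
  return reasons.length > 0 ? { row, reasons } : { unitCode, amount, period };
}

function runImport(rows: RawRow[], mapping: ShapeMapping) {
  const accepted: ChargeRecord[] = [];
  const reviewQueue: ReviewItem[] = [];
  for (const row of rows) {
    const result = normalizeRow(row, mapping);
    if ("reasons" in result) reviewQueue.push(result);
    else accepted.push(result);
  }
  return { accepted, reviewQueue };
}
```

Because the function keeps the original row and lists every reason it failed, a reviewer sees the whole problem at once instead of fixing one field per overnight run. Adding a new spreadsheet shape only requires a new ShapeMapping.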

Week 3-4 artifact: production-ready database with three years of clean historical data + a re-runnable import pipeline.

Week 5-6 — Operational surface (the 55 screens)

This is where AI-augmented scaffolding earned its keep. With the data model locked, we generated the first pass of 55+ management screens in three days — directory listings, detail views, edit forms, audit history, financial reports. Each screen was reviewed by a senior engineer before merging.

Reject rate on the first pass was 22%. Common failure modes: AI scaffolding missed cross-table validations (e.g., service-charge edits that would orphan unit ownership records), generated overly permissive access policies (e.g., a maintenance worker role that could see resident financial data), or chose less-readable patterns when a more explicit one was available.
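The access-policy failure mode is the easiest of these to guard against mechanically. Below is a hedged sketch of a deny-by-default policy table with a list of grants that must never exist. The roles, resources, and the names POLICY, can, MUST_DENY, and policyViolations are assumptions for illustration; the system's eight real policies are not reproduced here.

```typescript
// Hypothetical roles and resources, for illustration only.
type Role = "accountant" | "operations" | "maintenance_worker" | "resident";
type Resource = "maintenance_request" | "service_charge" | "resident_financials";
type Action = "read" | "write";

// Deny by default: a role can do only what is listed explicitly.
const POLICY: Record<Role, Partial<Record<Resource, readonly Action[]>>> = {
  accountant: { service_charge: ["read", "write"], resident_financials: ["read"] },
  operations: { maintenance_request: ["read", "write"], service_charge: ["read"] },
  maintenance_worker: { maintenance_request: ["read", "write"] },
  resident: { maintenance_request: ["read", "write"], service_charge: ["read"] },
};

function can(role: Role, action: Action, resource: Resource): boolean {
  const allowed = POLICY[role][resource];
  return allowed !== undefined && allowed.indexOf(action) !== -1;
}

// Grants that should fail review no matter how a screen was generated.
const MUST_DENY: ReadonlyArray<readonly [Role, Action, Resource]> = [
  ["maintenance_worker", "read", "resident_financials"],
  ["maintenance_worker", "write", "service_charge"],
  ["resident", "write", "service_charge"],
];

// Returns a human-readable line for each forbidden grant the policy allows.
function policyViolations(): string[] {
  return MUST_DENY
    .filter(([role, action, resource]) => can(role, action, resource))
    .map(([role, action, resource]) => `${role} may ${action} ${resource}`);
}
```

A check like policyViolations() can run in CI so that a generated change widening a role fails before a human reviewer ever sees it. It supplements senior review rather than replacing it, since it only catches the cases someone thought to list.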

We iterated through three passes. By end of week 6 the operational surface was in staging, running on the imported historical data, with the reviewer queue routing edge cases to internal staff. The operator's ops director ran her actual day's work on the staging environment for two hours and requested three changes, all minor, none requiring architectural rework.

Week 5-6 artifact: operational surface deployed to staging. KPI dashboard live. Reviewer queue with first cases routed.

Week 7 — Production cutover

We do not do big-bang cutovers. The production cutover for this engagement was four days, structured as a controlled migration:

Day 1: read-only deployment with the operator's historical data. Internal staff verified the data shapes.
Day 2: write access enabled for a single department (accounting). Compared every write against the staging instance for parity.
Day 3: write access enabled for two more departments (operations, governance). Reviewer queue caught two routing edge cases that we patched in production within 90 minutes.
Day 4: full cutover. The spreadsheets were archived. The new system became the system of record.
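The Day 2 parity check can be illustrated with a small field-by-field comparison. This is a sketch of the general idea only: how writes were actually captured and compared is not described in this writeup, and the name diffWrite and the ignored updated_at field are assumptions.

```typescript
// Hypothetical illustration of a per-write parity check.
type Row = Record<string, unknown>;

// Returns the sorted names of fields whose values differ between the two
// copies of a row. Fields listed in `ignore` (for example timestamps that
// legitimately differ between environments) are skipped.
function diffWrite(
  prod: Row,
  staging: Row,
  ignore: readonly string[] = ["updated_at"]
): string[] {
  // Union of keys from both sides, so a field missing on one side counts.
  const keys = Object.keys(prod).concat(
    Object.keys(staging).filter((k) => !(k in prod))
  );
  return keys
    .filter((k) => ignore.indexOf(k) === -1)
    // JSON comparison is a simplification: nested objects with the same
    // content but different key order would be reported as mismatches.
    .filter((k) => JSON.stringify(prod[k]) !== JSON.stringify(staging[k]))
    .sort();
}
```

An empty result means the two environments agree on that write. Anything else names the fields to investigate before widening write access to the next department.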

Week 7 artifact: production system serving real operational traffic. Audit log writing to WORM-compatible storage. KPI dashboard reporting against the Discovery baseline.

Week 8-10 — Resident portal + governance layer

The original SOW covered the operational backbone. By week 8, the operator wanted to extend it to a resident-facing portal — residents checking their service charges, reporting maintenance, viewing governance meeting minutes, voting in owners-association assemblies.

We scoped this as a Build extension ($18k incremental) rather than a new engagement, because the data model already supported it. The work was three layers:

Authentication layer: same identity provider as the staff portal, but with a resident role and tenant-scoped queries. We re-used the access policy enforcement we'd built for the staff portal.
Voting platform: cryptographic ballot signing, audit trail per vote, time-bound assembly sessions. Reviewed by the regulatory officer before launch.
Maintenance request flow: residents submit, internal staff triage, assigned contractor gets a scoped view, audit log on every transition.
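The maintenance request flow above can be sketched as a small state machine that refuses illegal transitions and records an audit event for every legal one. The statuses, including the final "completed" state, and the names NEXT, AuditEvent, and transition are assumptions for illustration, not the production model.

```typescript
// Hypothetical statuses: submitted by a resident, triaged by staff,
// assigned to a contractor, then completed.
type Status = "submitted" | "triaged" | "assigned" | "completed";

// Allowed next states for each state.
const NEXT: Record<Status, readonly Status[]> = {
  submitted: ["triaged"],
  triaged: ["assigned"],
  assigned: ["completed"],
  completed: [],
};

interface MaintenanceRequest {
  id: string;
  status: Status;
}

interface AuditEvent {
  requestId: string;
  from: Status;
  to: Status;
  actor: string;
  at: string; // ISO 8601 timestamp
}

// Applies one transition. The audit event is appended before the new
// state is returned, so no state change exists without a log entry.
function transition(
  req: MaintenanceRequest,
  to: Status,
  actor: string,
  log: AuditEvent[],
  now: Date = new Date()
): MaintenanceRequest {
  if (NEXT[req.status].indexOf(to) === -1) {
    throw new Error(`illegal transition ${req.status} -> ${to}`);
  }
  log.push({ requestId: req.id, from: req.status, to, actor, at: now.toISOString() });
  return { ...req, status: to };
}
```

Routing every status change through one function is what makes "audit log on every transition" enforceable: there is no other code path that can change a request's status.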

By end of week 10 the resident portal was live for one pilot owners-association (about 800 units). By end of week 12 it was rolled out to the operator's full portfolio.

What we got wrong

Three things, in order of consequence:

Underestimated regulatory officer review cycles. We budgeted 4 hours of regulatory review during Build. Actual was 18. Each architectural decision involving resident data needed sign-off, not just notification. We learned to put the regulatory officer on the daily standup invite from week 3 onward — that prevented week-long blocked-on-review delays.
Excel import edge cases were more than 2.5x what we scoped. We budgeted 200 reviewer-queue cases over the historical data import. Actual was 510. The operator had been silently fixing spreadsheet shapes for years without documenting the fixes. Our reviewer queue caught everything, but it cost three additional days.
The first AI-scaffolding pass on financial modules was dangerous. Two screens generated overly permissive write access on the service-charge tables. Caught in code review before staging deployment, but only because we mandate senior-engineer review on every AI-scaffolded screen. Without that review gate, we'd have shipped a regulatory incident waiting to happen.

What survived to production

Twelve months after cutover, the operator runs the entire portfolio on this system. Spreadsheets are deprecated. The reviewer queue handles ~30-40 edge cases per week (down from ~110 at week 12). The audit log has been queried twice by the regulator and both queries returned within minutes.

The operator extended the engagement to a Run contract at $4k/month for quarterly attestations and integration with two new accounting vendors. They have full access to the codebase. The handover doc we wrote in week 12 is what their internal team uses to onboard new engineers.

Why this works as an engagement model

Three structural reasons:

Domain modelling first. Most engagement failures we've seen come from teams building screens before settling the data model. Our two-week Discovery exists to get the model right before AI scaffolding accelerates the wrong abstraction.
AI augmentation under senior review. We use AI-augmented code generation aggressively in weeks 3-6 — that's what compresses an 18-month build into 10 weeks. But every generated artifact passes a senior reviewer before merge. The reviewer gate is non-negotiable; it caught the access-policy regression that would have been a security incident in production.
Fixed price + clean exit. The Build SOW was $48k fixed. The Extension SOW was $18k fixed. The operator could have walked at any point with their codebase and runbook. That clean-exit optionality is what made the engagement land — the operator wasn't locking in a vendor; they were buying a faster path to the system they would eventually have owned anyway.

If this matches what you need

This engagement shape works best for: regulated mid-market operators ($10M-$200M revenue) with a complex operational workflow currently running on spreadsheets and patchwork tools, a regulatory officer or equivalent who can clear weekly review cycles, and a 90-day window for the cutover.

It does not work for: pure greenfield startups (your domain model isn't settled enough to build around), B2C consumer apps (different interaction patterns), or engagements where the buyer wants AI scaffolding without senior review (we've walked away from these).

If the shape matches, scope your project under Start a project, or read about our team and operating model.
