§ APPROACH · REPORT · NO. 002 · VOL. I     MAY 2026

What a receipt has to contain

Defining the shape of what we need to know and why.

Showing the contract and where it fits within working processes.

In Report 001 I discussed that AI-assisted engineering produces process opacity, and that the missing artefact is a receipt — signed, dated, hash-chained, queryable. I sketched, in a list of bullets, what such a receipt might capture.

This report goes a layer deeper. It shows the actual schema — what a receipt has to contain to do the job, and why each piece is in there. It is not about implementation or data capture; it establishes the attributes necessary to answer the questions, and withstand the scrutiny, that agentic AI delivery and co-development invite.

That sequence — definition first, implementation second, measurement third — is deliberate. A clear contract makes the implementation testable. An unclear contract makes the implementation a moving target. So this report is about the contract.

The engineering lifecycle: six sequential phases from Define Goals through Deploy and Improve.
Six phases. Most engineering projects pass through all of them, in roughly this order.

The diagram above illustrates, at a high level, the conceptual steps in delivering a solution. Depending on the user, their requirements, and their capability, AI may be applied with varying density across the full breadth of this flow.

The smallest case that breaks

Start with the schema-naming drift from Report 001. Two products, same architect, same house style, divergent schemas. Three plausible explanations for the drift, indistinguishable from the outside.

Now imagine a receipt for the build that produced the second product. What would it have to contain to tell you which of the three scenarios actually occurred?

You'd need a record of the dialogue — every clarifying question the agent raised and how it was answered. Without that, you cannot tell whether the agent asked. That gives you steps[], an ordered, typed sequence of every action and exchange.

You'd need each step typed by intent, not just by mechanism. A tool call is a mechanism. "Agent asked a clarifying question; human gave a response; agent acknowledged the rule; agent did something different anyway" — that is a sequence of typed events: clarifying_question, response, confirmation, override. Without typing, you can reconstruct what happened but not what kind of conversation it was.

You'd need the halt events called out as first-class. The moment an agent encounters something not in its closed vocabulary and refuses to invent — "this pattern doesn't match what I'm allowed to do; I'm asking rather than guessing" — is the methodology working. That refusal needs to be visible in the receipt, not buried inside step content. So steps[].halt_triggered is its own boolean, and halt_reason is its own typed enum.
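To make the shape concrete, here is a minimal sketch of what a typed steps[] sequence might look like for the drift case. The field names follow the schema described in this report; the values, hashes, and the halt_reason vocabulary entry are invented for illustration.

```python
# Hypothetical steps[] fragment for the schema-naming drift case.
# Field names follow the report's schema; values are invented.
steps = [
    {"seq": 1, "step_type": "clarifying_question",
     "content_hash": "sha256:aa11...", "halt_triggered": False},
    {"seq": 2, "step_type": "response",
     "content_hash": "sha256:bb22...", "halt_triggered": False},
    {"seq": 3, "step_type": "confirmation",
     "content_hash": "sha256:cc33...", "halt_triggered": False},
    {"seq": 4, "step_type": "halt",
     "content_hash": "sha256:dd44...", "halt_triggered": True,
     "halt_reason": "pattern_not_in_vocabulary"},
]

# Because halts are first-class booleans, surfacing them is a filter,
# not a text search through step content.
halts = [s for s in steps if s["halt_triggered"]]
```

That last line is the point of the design: the methodology's refusal events become queryable without parsing dialogue.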

That gets you most of the way to diagnosing the schema-naming drift. But not all the way.

What the dialogue alone cannot tell you

A few more fields are needed before the three scenarios can be distinguished cleanly.

You need to know which agent ran, on whose authority. A receipt without a principal (the accountable human) and an actor (who actually executed) cannot answer the "on whose authority" question that regulated environments will eventually ask. And in any team that uses agents to invoke other agents — Marlow telling Iris to do something, in studio terms — you need a delegation chain, not just a flat actor field. So delegation_path[] captures the chain of authority transfers, each with its own scope and granted-at timestamp.
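A sketch of what a delegation chain might look like, using the Marlow-and-Iris example from above. The link shape, identifiers, scopes, and timestamps are all assumptions for illustration, not the published schema.

```python
# Hypothetical delegation_path[]: a principal delegates to Marlow,
# who in turn invokes Iris. Scopes and timestamps are invented.
delegation_path = [
    {"from": "human:principal", "to": "agent:marlow",
     "scope": "build_schema_migration", "granted_at": "2026-05-02T09:14:00Z"},
    {"from": "agent:marlow", "to": "agent:iris",
     "scope": "read_reference_docs", "granted_at": "2026-05-02T09:16:30Z"},
]

def authority_for(actor: str, path: list[dict]) -> list[str]:
    """Walk the chain and return the scopes granted to an actor."""
    return [link["scope"] for link in path if link["to"] == actor]
```

With a flat actor field, "who told Iris she could do that, and on what basis" is unanswerable; with the chain, it is a lookup.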

This field came from a specific incident. The studio's agent operating manual records, in its 2026-04-11 changelog, a violation where one agent directed another to read directly from a database layer it shouldn't have touched. The fact of the violation could be reconstructed afterwards from session logs. But the chain of authority — who told whom they could do what, and on what basis — could not. A flat actor field would not have helped. A delegation chain would have.

You need to know what the agent was running on. Model family, model version, sampling parameters, the system prompt the model was operating under. Drift between two builds may be explained by drift between two models — a Sonnet that halts reliably and an Opus that waves things through, or the reverse. Without runtime.model.{family, version} and runtime.system_prompt_hash, this hypothesis is untestable.

You need to know how full the context was when each step happened. There is a hypothesis the methodology cares about: agents drift more when context is loaded. The skills that produce halt-and-ask behaviour live in context; if the model is summarising or compressing context, those skills may be the first to degrade. If the hypothesis is true, halt rate should drop as context_state.tokens_used_pct rises. If it's false, no harm done — the field is cheap to capture. Either way, the only way to find out is to capture it at every step. So context_state appears at receipt level and at step level.
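The analysis that hypothesis implies can be sketched in a few lines: bucket steps by context utilisation and compare halt rates. The records below are invented; only the field names (tokens_used_pct, halt_triggered) come from the schema.

```python
# Invented per-step records; real ones would come from captured receipts.
step_records = [
    {"tokens_used_pct": 12, "halt_triggered": True},
    {"tokens_used_pct": 18, "halt_triggered": True},
    {"tokens_used_pct": 55, "halt_triggered": False},
    {"tokens_used_pct": 61, "halt_triggered": True},
    {"tokens_used_pct": 88, "halt_triggered": False},
    {"tokens_used_pct": 93, "halt_triggered": False},
]

def halt_rate(records, lo, hi):
    """Halt rate among steps whose context utilisation falls in [lo, hi)."""
    bucket = [r for r in records if lo <= r["tokens_used_pct"] < hi]
    return sum(r["halt_triggered"] for r in bucket) / len(bucket) if bucket else None

low = halt_rate(step_records, 0, 50)    # halt rate with light context
high = halt_rate(step_records, 50, 100)  # halt rate with loaded context
```

If the hypothesis holds over real receipts, `low` should consistently exceed `high`; if the two track each other, the hypothesis is falsified cheaply.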

You need inputs classified by sensitivity. PII, PHI, PCI, MNPI — the controlled vocabulary differs by regulatory regime, but every regulated environment cares whether the receipt-producing process touched data of a given class. And you need to record this by hash, not by content; the receipt must not become a secondary data exposure.
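A minimal sketch of record-by-hash, assuming SHA-256 and a record shape invented for illustration. The classification vocabulary is the one named above; everything else is an assumption.

```python
import hashlib

def input_record(source: str, classification: str, content: bytes,
                 redacted: bool) -> dict:
    """Record an input by hash; the raw content never enters the receipt."""
    return {
        "source": source,
        "classification": classification,   # e.g. "pii", "phi", "pci", "mnpi"
        "content_hash": "sha256:" + hashlib.sha256(content).hexdigest(),
        "redacted": redacted,
        # The content is hashed and discarded here, so the receipt
        # cannot become a secondary data exposure.
    }

rec = input_record("crm_export.csv", "pii", b"alice@example.com", redacted=True)
```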

You need a spec reference — the planning document, ADR, or specification the build was meant to satisfy — and a diff against spec with severity typed as a closed enum (none, cosmetic, material, breaking). Without this, divergence is invisible. With it, divergence becomes queryable: "show me all builds where the diff was material and the human didn't acknowledge it."
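The query phrased in prose above is mechanical once severity is a closed enum. The receipts below are invented minimal stubs carrying only the two fields the query needs.

```python
# Invented receipt stubs; real receipts carry the full schema.
receipts = [
    {"receipt_id": "r-001", "diff_against_spec":
        {"severity": "none", "human_acknowledged": True}},
    {"receipt_id": "r-002", "diff_against_spec":
        {"severity": "material", "human_acknowledged": False}},
    {"receipt_id": "r-003", "diff_against_spec":
        {"severity": "breaking", "human_acknowledged": True}},
]

# "Show me all builds where the diff was material or worse and the
# human didn't acknowledge it."
unreviewed = [r["receipt_id"] for r in receipts
              if r["diff_against_spec"]["severity"] in ("material", "breaking")
              and not r["diff_against_spec"]["human_acknowledged"]]
```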

You need a signature — the receipt signed by the principal, not just timestamped — and a chain link hashing the previous receipt. The first makes the principal's accountability cryptographically attached to the artefact. The second makes the chain tamper-evident: alteration of any earlier receipt invalidates the chain forward.
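A sketch of sealing a receipt, under two stated simplifications: JSON with sorted keys stands in for a proper canonicalisation scheme, and HMAC stands in for the asymmetric signature (e.g. Ed25519) a real principal would use; HMAC just keeps the example dependency-free.

```python
import hashlib, hmac, json

def receipt_hash(receipt: dict) -> str:
    """Hash of a sealed receipt, used as the next receipt's chain link."""
    return "sha256:" + hashlib.sha256(
        json.dumps(receipt, sort_keys=True).encode()).hexdigest()

def seal(body: dict, previous_receipt_hash: str, signing_key: bytes) -> dict:
    """Attach the chain link, then sign the canonical body."""
    body = dict(body, chain={"previous_receipt_hash": previous_receipt_hash})
    canonical = json.dumps(body, sort_keys=True).encode()
    body["signature"] = hmac.new(signing_key, canonical, hashlib.sha256).hexdigest()
    return body

r1 = seal({"receipt_id": "r-001"}, "sha256:genesis", b"principal-key")
r2 = seal({"receipt_id": "r-002"}, receipt_hash(r1), b"principal-key")
```

Because r2 commits to the hash of r1 including its signature, rewriting any part of r1 after the fact breaks the link.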

That is the receipt, in narrative form. Most of the rest of the schema — output hashes, signoff state, retention policy references — is plumbing. The fields above are what carry the methodology.

How the schema records its own basis

Every field in the receipt has an origin — one of reasoned, evidenced, regulatory, or operational. The origin is itself a typed value, part of the schema definition.

Reasoned fields came from first-principles thinking about what makes an artefact auditable. Evidenced fields came from a specific incident in the studio's own work. Regulatory fields are required by a named regime. Operational fields came from observing the studio's own agent operations and noticing a pattern worth capturing.
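Since the origin is itself a typed value, it can be expressed as a closed enum. The enum values are the four origins above; the field registry shown is an illustrative shape, not the published schema.

```python
from enum import Enum

class Origin(Enum):
    REASONED = "reasoned"       # first-principles argument
    EVIDENCED = "evidenced"     # a specific observed incident
    REGULATORY = "regulatory"   # required by a named regime
    OPERATIONAL = "operational" # observed pattern worth capturing

# Illustrative registry mapping fields to their basis.
field_origin = {
    "delegation_path[]": Origin.EVIDENCED,
    "inputs[].classification": Origin.REGULATORY,
    "context_state.tokens_used_pct": Origin.OPERATIONAL,
    "spec_reference": Origin.REASONED,
}

# "How do you know this field is necessary?" becomes a lookup:
basis = field_origin["delegation_path[]"].value
```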

The reason this matters: by tagging origin, the schema makes explicit what we want to track and why, and prioritises collection of the facets that are strictly necessary. A reader can ask "how do you know this field is necessary?" and the schema answers — "we reasoned it," or "we observed it," or "the FCA requires it," or "this incident showed us we needed it."

The honest disposition is that v0.1 is more reasoned than evidenced. The first hundred real receipts will tell us where the reasoning held and where it needs revising. Some fields will graduate from reasoned to evidenced. Some will turn out to be unnecessary and get retired. The methodology's evolution is not hidden; it is part of the schema's structure.

Agreeing and understanding what needs to be captured, and why, is as much a part of the methodology as the capturing and analysis itself.

The schema, v0.1

| Field | Description | Origin |
| --- | --- | --- |
| receipt_id | UUID for the receipt | reasoned |
| schema_version | Which version this receipt conforms to | reasoned |
| chain.previous_receipt_hash | Hash of the previous receipt | reasoned |
| chain.sequence_number | Position in the chain | reasoned |
| principal | The accountable human (id, role) | regulatory |
| actor | Who actually executed (id, type) | reasoned |
| delegation_path[] | Chain from principal to actor, with scope | evidenced |
| runtime.agent_name | Agent identity | reasoned |
| runtime.model.{family, version, provider} | Which model ran | reasoned |
| runtime.sampling | Temperature, top_p, max_tokens | reasoned |
| runtime.system_prompt_hash | Hash of the system prompt at session start | reasoned |
| runtime.skills_loaded[] | Skills in context, with version | operational |
| runtime.tools_available[] | Tools accessible at runtime | reasoned |
| context_state.tokens_used_pct | Context utilisation at receipt start | operational |
| context_state.position_in_session | Step N of total | reasoned |
| context_state.compaction_events | Times context was summarised | operational |
| inputs[].source | Where data crossed into the model context | reasoned |
| inputs[].classification | Sensitivity (public, pii, phi, pci, mnpi, etc.) | regulatory |
| inputs[].content_hash | Hash of the input — never the raw data | regulatory |
| inputs[].redacted | Whether redaction was applied before ingestion | regulatory |
| spec_reference | Hash or URI of the spec being built against | reasoned |
| steps[] | Ordered, typed sequence of dialogue and actions | reasoned |
| steps[].step_type | Typed by intent (clarifying_question, response, override, halt, etc.) | reasoned |
| steps[].halt_triggered | Whether this step was a halt-and-ask event | evidenced |
| steps[].halt_reason | Why the halt fired | evidenced |
| steps[].context_state | Per-step snapshot of context state | operational |
| outputs[].artefact_type | What was produced (ddl, code, doc, etc.) | reasoned |
| outputs[].artefact_hash | Hash of the produced artefact | reasoned |
| diff_against_spec.severity | none, cosmetic, material, breaking | reasoned |
| diff_against_spec.human_acknowledged | Did the principal review the diff | regulatory |
| signoff.state | signed, auto_approved, flagged, rejected, pending | regulatory |
| signoff.signature_method | cryptographic, attested, none | reasoned |
| signature | Cryptographic signature of the receipt body | reasoned |
| source_control.system | Version control system (git, mercurial, none, etc.) | reasoned |
| source_control.repository | Canonical URN or URL of the repository | regulatory |
| source_control.commit_hash | Commit at session start | regulatory |
| source_control.branch | Branch the work was done on | reasoned |
| source_control.is_dirty | Were there uncommitted changes at session start | evidenced |
| source_control.commits_during_session[] | Commits made during the session | reasoned |

Most rows speak for themselves. The narrative above covered the ones that warrant explanation.

Why hash-chained, not blockchain

One question this schema invites: if tamper-evidence matters, why not blockchain?

Three properties are worth distinguishing:

  1. Immutability — once written, cannot be altered.
  2. Tamper-evidence — alteration is detectable.
  3. Decentralised trust — no single party can rewrite history.

The receipt schema commits to (1) and (2). It does not commit to (3). The reason is operational: a regulator does not need decentralised consensus to trust evidence; they need the evidence to be signed by an accountable party and demonstrably unaltered. Hash-chained, signed receipts in append-only storage achieve that. Periodic anchoring to a public timestamp source provides external verifiability without putting receipt content on-chain.
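Tamper-evidence without consensus can be sketched in a few lines: recompute each receipt's hash and check the next receipt's back-pointer. The receipt stubs and the flat previous_receipt_hash placement are simplifications for illustration.

```python
import hashlib, json

def rhash(receipt: dict) -> str:
    """Content hash of a receipt (sorted-key JSON stands in for
    a proper canonicalisation scheme)."""
    return "sha256:" + hashlib.sha256(
        json.dumps(receipt, sort_keys=True).encode()).hexdigest()

def chain_intact(receipts: list[dict]) -> bool:
    """True iff every receipt's back-pointer matches its predecessor."""
    return all(receipts[i + 1]["previous_receipt_hash"] == rhash(receipts[i])
               for i in range(len(receipts) - 1))

a = {"receipt_id": "r-001", "previous_receipt_hash": "sha256:genesis"}
b = {"receipt_id": "r-002", "previous_receipt_hash": rhash(a)}
assert chain_intact([a, b])

a["receipt_id"] = "r-001-altered"   # rewrite history...
assert not chain_intact([a, b])     # ...and every link forward breaks
```

This is the whole of property (2); properties (1) and (3) are storage and governance choices layered around it, not part of the hash arithmetic.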

A blockchain would provide all three properties at the cost of latency, transaction fees, public exposure of receipt metadata, and operational overhead. The minimum mechanism that achieves the audit purpose is the right one. Where stronger properties are required — multi-party attestation, supply-chain provenance — receipts can compose with existing standards (in-toto, SCITT) designed for exactly those needs, rather than reinventing them.

This is the working principle. Don't reinvent. Compose. Cite. Adapt.

AI-Verifiable Engineering (AVE)

This methodology has been working under a placeholder for a while. The name I keep coming back to is AVE: AI-Verifiable Engineering. Engineering work that can be verified, after the fact, to have followed an agreed practice — with evidence captured at the time of the work rather than reconstructed afterwards.

The name is internally precise. It is not a claim that AVE verifies AI. It is a claim that AI-assisted engineering can be made verifiable — through closed vocabularies, halt conditions, and signed receipts of the dialogue that produced the work.

AVE is a practice. It is Data Argo's methodology, demonstrated through this journal series, supported by implementation for capture and collection, composing with industry-standard infrastructure (OpenTelemetry, in-toto, SCITT) rather than competing with it.

The schema above is AVE v0.1. It is a draft until it has survived contact with at least three real builds across two products. Some fields will graduate from reasoned to evidenced as further testing and utilisation occur. Some will turn out to be unnecessary. The version itself will be revised when the evidence requires it. That is how the methodology is meant to work — slowly, with evidence, with its own basis visible.

AVE coverage across the engineering lifecycle: solid core band over Plan Solution, Design Solution, Build Solution, and Validate Quality; dashed boundary extensions over Define Goals and Deploy and Improve.
Solid: receipts always emitted. Dashed: in scope when AI did substantive authorship.

The purpose of AVE is to provide a means to verify what was asked of AI, how it delivered on the ask, and what factors, direct or indirect, could affect the quality of that output. Receipts should be collected wherever AI lends assistance. Involvement at the definition and deployment stages will vary with how human and AI collaborate; from planning through build and validation it is near-certain, making those phases the highest-volume source of receipts.

What's coming

Report 003 will introduce the capture layer — where and how the attributes that make up a receipt will be captured. Report 004 will put the receipts to the test: where the schema held, where it didn't, and what changed when AVE entered the workflow.

The contract is now defined. The work continues.

Tony Purkins · Principal · Data Argo · 2 May 2026

Navigare necesse est


If this resonates — particularly if you're working in regulated environments where AI-assisted engineering work needs to be defensible — I'd be interested to hear from you. Detailed schema documentation, including frontmatter not surfaced here, lives in the AVE workspace and will move to its own published location when the contract is stable.