How a receipt gets captured

Aligning to standards that already exist, and what's left for the methodology to build.

TONY PURKINS, PRINCIPAL · DATA ARGO

Report 001 named the gap. AI-assisted engineering produces opaque process, and the missing artefact is a receipt. Report 002 defined the contract — what a receipt has to contain to do its job.

A receipt is older than computing. Invoices, signed deliveries, ledger entries — the audit trail of any regulated business is built from preserved receipts of small transactions, each one kept because at some point it might be asked for. The methodology this series introduces applies the same discipline to AI-assisted engineering, where the receipt has been absent.

This report goes one layer down. With the contract defined, the next question is how the receipt actually gets produced. Where does each part come from? Who produces it? What part of the work is capture, and what part is methodology?

This report doesn't show a captured receipt — that comes in Report 004. This one defines what capture has to do, what it borrows to do it, and what's left for the methodology itself to build.

The temptation, having defined a contract, is to invent a system to satisfy it. The temptation should be resisted — there is no need to reinvent the wheel. The methodology's own contributions are small and specific. Most of what carries those contributions — the data formats, the transport, the storage — is already solved by open standards maintained by communities older and broader than this methodology.

This report sets out the methodology's own contributions first, then walks through the borrowed parts that carry them. The order matters: once the methodology's own surface is fixed, everything beneath it can rest on established standards for capture, logging, and auditing rather than on new invention.

Recent discussion of agentic observability (Datadog, OpenLLMetry, Langfuse, and others) has converged on OpenTelemetry as the shared vocabulary, though each tool has its own focus, maturity, and intended use cases. That convergence is itself evidence that the borrowing strategy works. What none of these tools currently does is approach the problem from the angle of auditing engineering practice. OpenTelemetry is one part of that picture. AI-Verifiable Engineering (AVE) is what makes the picture an audit.

What AVE contributes

Four things, and only four.

The schema, and the discipline that maintains it. Report 002 defined the contract — what a receipt has to contain. The shape is AVE's; the controlled vocabularies that populate it (step_type, halt_reason, signoff.state, the classification enum, the severity enum) are AVE's. The discipline that keeps the vocabularies closed — refinable only by a written decision, never by silent invention — is AVE's. Two implementations differing on the schema or its vocabularies are not both running AVE. At most one is, and the other has produced a different kind of artefact.
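A closed vocabulary is easy to state and easy to erode, so it helps to see what enforcing one might look like. The sketch below is illustrative only: the field names step_type and halt_reason come from the contract, but every member value shown is a hypothetical placeholder, and the function name is an assumption rather than part of AVE.

```python
from enum import Enum


class StepType(str, Enum):
    # Hypothetical members; the real list is whatever the AVE schema fixes.
    MODEL_CALL = "model_call"
    TOOL_CALL = "tool_call"
    HUMAN_REVIEW = "human_review"


class HaltReason(str, Enum):
    # Hypothetical members, for illustration only.
    COMPLETED = "completed"
    ERROR = "error"
    HUMAN_STOP = "human_stop"


# Field name -> the closed vocabulary that governs it.
CLOSED_VOCABULARIES = {"step_type": StepType, "halt_reason": HaltReason}


def vocabulary_violations(step: dict) -> list:
    """Return one message per field whose value is outside its closed vocabulary."""
    problems = []
    for field, vocab in CLOSED_VOCABULARIES.items():
        allowed = {member.value for member in vocab}
        value = step.get(field)
        if value not in allowed:
            problems.append(f"{field}={value!r} is not in the closed vocabulary")
    return problems


ok_step = {"step_type": "tool_call", "halt_reason": "completed"}
invented_step = {"step_type": "tool_invocation", "halt_reason": "completed"}

print(vocabulary_violations(ok_step))
print(vocabulary_violations(invented_step))
```

The second step uses a plausible-looking value that nobody decided on. Rejecting it, rather than quietly accepting it, is the discipline in miniature: a new value would need a written decision and a change to the enum, not a producer's improvisation.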

The three-role signing model, realised as three attestation disciplines. AVE requires three signing events at three different points in a receipt's life — raw attestation at arrival, canonical attestation at derivation, principal attestation at the chain. Three different jobs, three different parties, three different keys. A receipt signed once at the end is not enough — it tells you who signed off, but it does not tell you the recording was independent of the agent doing the work. The three roles, and the three attestations they produce, are what makes a receipt audit-grade rather than a confession.
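The separation of roles can be sketched in a few lines. This is a toy under stated assumptions: HMAC from the Python standard library stands in for real asymmetric signatures, the three keys and the record shapes are hypothetical, and nothing here is AVE's actual attestation format. What it does show is three attestations over three different subjects, each checkable only against its own role's key.

```python
import hashlib
import hmac
import json

# Hypothetical keys, one per role. HMAC is a stdlib stand-in for real
# asymmetric signatures; a production system would not share secrets this way.
ROLE_KEYS = {
    "raw": b"recorder-key",
    "canonical": b"deriver-key",
    "principal": b"principal-key",
}


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def attest(role: str, subject_hash: str) -> dict:
    """Produce an attestation by `role` over a content hash."""
    tag = hmac.new(ROLE_KEYS[role], subject_hash.encode(), hashlib.sha256).hexdigest()
    return {"role": role, "subject": subject_hash, "tag": tag}


def verify(att: dict) -> bool:
    """Check the tag against the key of the role the attestation claims."""
    expected = hmac.new(
        ROLE_KEYS[att["role"]], att["subject"].encode(), hashlib.sha256
    ).hexdigest()
    return hmac.compare_digest(expected, att["tag"])


# 1. Raw attestation at arrival: over the bytes exactly as emitted.
raw_bytes = b'{"vendor_event": "tool finished"}'
raw_att = attest("raw", sha256_hex(raw_bytes))

# 2. Canonical attestation at derivation: over the derived record,
#    which points back at the raw hash.
canonical = {"derived_from": raw_att["subject"], "step_type": "tool_call"}
canonical_bytes = json.dumps(canonical, sort_keys=True).encode()
canonical_att = attest("canonical", sha256_hex(canonical_bytes))

# 3. Principal attestation at the chain: over a (hypothetical) chain head.
chain_head = sha256_hex((canonical_att["subject"] + "previous-head").encode())
principal_att = attest("principal", chain_head)

all_valid = all(verify(a) for a in (raw_att, canonical_att, principal_att))

# One role cannot pass its attestation off as another's.
relabelled = dict(canonical_att, role="raw")
print(all_valid, verify(relabelled))
```

The relabelled attestation fails because the tag was produced with the deriver's key, not the recorder's. That is the property a single end-of-pipeline signature cannot give.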

The chain that links receipts together. Each receipt anchors to the previous one by content hash. Alteration is mechanically detectable: change any earlier receipt and the link to the next one breaks. The chain is what makes a set of receipts an audit-grade record rather than a long-form log.
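The mechanics of the chain are simple enough to sketch. The code below is a minimal illustration, not AVE's chain format: the receipt bodies, the all-zero genesis value, and the JSON hashing scheme are assumptions made for the example.

```python
import hashlib
import json
from typing import Optional

GENESIS = "0" * 64  # hypothetical anchor for the first receipt


def receipt_hash(body: dict, prev_hash: str) -> str:
    """Hash a receipt body together with the hash of its predecessor."""
    payload = json.dumps({"body": body, "prev": prev_hash}, sort_keys=True)
    return hashlib.sha256(payload.encode()).hexdigest()


def build_chain(bodies: list) -> list:
    chain, prev = [], GENESIS
    for body in bodies:
        current = receipt_hash(body, prev)
        chain.append({"body": body, "prev": prev, "hash": current})
        prev = current
    return chain


def first_broken_link(chain: list) -> Optional[int]:
    """Return the index of the first receipt that fails verification, else None."""
    prev = GENESIS
    for index, receipt in enumerate(chain):
        recomputed = receipt_hash(receipt["body"], prev)
        if receipt["prev"] != prev or recomputed != receipt["hash"]:
            return index
        prev = receipt["hash"]
    return None


chain = build_chain([{"step": 1}, {"step": 2}, {"step": 3}])
intact = first_broken_link(chain)

# Alter the first receipt without touching anything else.
tampered = [dict(r) for r in chain]
tampered[0]["body"] = {"step": 1, "edited": True}
broken_at = first_broken_link(tampered)

# Recompute the altered receipt's own hash to hide the edit:
# the break moves to the next receipt instead of disappearing.
tampered[0]["hash"] = receipt_hash(tampered[0]["body"], GENESIS)
broken_after_rehash = first_broken_link(tampered)

print(intact, broken_at, broken_after_rehash)
```

Hiding an edit would mean rewriting every later receipt as well. That is why publishing the chain head to an external witness matters: it fixes the one value that a full rewrite would have to change.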

The conformance review. An adversarial check by an entity that does not share the producer's assumptions, distinct from integrity verification and signature verification. Integrity proves the bytes are unchanged. Signatures prove a key was applied. Conformance proves the receipt's fields mean what they say: that the field labelled tool result contains the tool's result, not something else. With only the first two, the audit record is a tamper-evident set of unverified claims.

That is AVE. Four contributions. The schema and its discipline; the three-role signing model and its disciplines; the chain; the conformance review. Everything else — the store, the encoding, the signing library, the query interface, the pipeline that produces the records — is implementation.

The methodology requires an implementation to enact it. It does not require any particular implementation.

The capture problem is layered

A receipt is not produced in a single act. It is assembled from facts that arise at different times, from different sources, with different reliability properties. Each of these streams produces facts that any audit-grade record of an AI-assisted build has to contain.

A model invocation emits the prompt, the response, the tokens used, the sampling parameters, the tool calls. A tool call emits which tool, what arguments, what result. A version control system emits which commit, dirty or clean, what changed. A signing tool emits who signed what, when, with whose key. A continuous integration system emits what built what, from what source, with what tooling.

Each of those streams already has a vocabulary. Each has at least one open standard that defines what the stream means. The job of the capture layer is to receive these streams, translate them into a common shape, and assemble the result into a receipt that meets the contract from Report 002.

Fig. 6 shows what AVE requires above the seam — three attestation disciplines bounded by a single AVE region — and the borrowed substrate that carries vendor emission upward into the raw attestation discipline. AVE composes; it does not invent.

Fig. 6. One AVE-owned region containing three attestation disciplines (raw, canonical, and principal), with boundary weights distinguishing methodologically required separations. The transport beneath the seam carries vendor emission upward, and a transparency-log witness sits off to the side. AVE composes what exists; it does not invent.

The standards AVE uses

The list is short and worth naming in full. Changes will be called out as they happen.

Standard	Role	What AVE borrows
OpenTelemetry gen_ai semconv	Vocabulary & transport	Field names for runtime, inputs, steps — carried by the OTel Collector
in-toto	Attestation envelope	The shape of integrity and output attestations
SLSA	Level claim	The provenance level itself
W3C Trace Context	Correlation	The trace identifier carried through every stage
Sigstore · Rekor · SCITT	Witness	Externally witnessed signature publication on the chain head
Durable persistence	Store	The discipline; the implementation pattern is a design choice

Six pieces, all maintained outside AVE, all with established communities and documented specifications. AVE cites them. It does not author them.

One of the six deserves a separate note. OpenTelemetry's gen_ai semantic conventions are the vocabulary the runtime, inputs, and steps parts of a receipt are encoded in. Attributes like gen_ai.system, gen_ai.request.model, gen_ai.usage.input_tokens, gen_ai.tool.name, gen_ai.event.user_prompt are not invented here — they are defined by the gen_ai working group and maintained alongside the broader OpenTelemetry specification.

This matters for portability. A receipt encoded in OTel gen_ai vocabulary can be read by anyone who reads OTel. Replacing the LLM vendor doesn't require re-encoding the receipt. Replacing the receipt-producing tool doesn't require re-encoding the vocabulary. When the gen_ai conventions evolve, the receipt evolves with them, in step with a standard maintained outside AVE. That portability is the methodological reason for using a borrowed vocabulary rather than inventing one.
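The portability argument can be made concrete with a small translation sketch. Only the gen_ai attribute names below come from the text above; the two vendor payload shapes, the vendor names, and the mapping table are hypothetical, invented purely to show two differently shaped emissions landing on the same vocabulary.

```python
# Hypothetical vendor field names -> gen_ai attribute names cited in the text.
VENDOR_FIELD_MAPS = {
    "vendor_a": {
        "model": "gen_ai.request.model",
        "prompt_tokens": "gen_ai.usage.input_tokens",
        "tool": "gen_ai.tool.name",
    },
    "vendor_b": {
        "model_id": "gen_ai.request.model",
        "usage_in": "gen_ai.usage.input_tokens",
        "function_name": "gen_ai.tool.name",
    },
}


def to_gen_ai_attributes(vendor: str, payload: dict) -> dict:
    """Translate one vendor payload into gen_ai-named attributes."""
    attributes = {"gen_ai.system": vendor}
    for vendor_field, attribute in VENDOR_FIELD_MAPS[vendor].items():
        if vendor_field in payload:
            attributes[attribute] = payload[vendor_field]
    return attributes


def input_tokens(attributes: dict) -> int:
    """A reader that knows only the shared vocabulary, not the vendor."""
    return attributes["gen_ai.usage.input_tokens"]


from_a = to_gen_ai_attributes(
    "vendor_a", {"model": "model-x", "prompt_tokens": 120, "tool": "run_tests"}
)
from_b = to_gen_ai_attributes(
    "vendor_b", {"model_id": "model-y", "usage_in": 95, "function_name": "run_tests"}
)

print(input_tokens(from_a), input_tokens(from_b))
```

The reader function never learns which vendor produced the record. Swapping a vendor changes one entry in the mapping table and nothing downstream of it.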

OpenTelemetry is the prominent borrowing because runtime telemetry is the bulk of a receipt's content. The other source streams — version control, attestation envelopes, supply-chain provenance — arrive via their own borrowed transports. The standards table names each; Fig. 8 shows them composed into a real receipt.
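The correlation row in the standards table is the most mechanical of the six, and a short sketch shows what carrying the trace identifier through every stage amounts to. The parser below handles the version 00 traceparent header layout from W3C Trace Context (version, trace id, parent id, flags, in lowercase hex). The stage records and their field names are hypothetical.

```python
import re
from collections import defaultdict

# Version "00" layout: version-traceid-parentid-flags, all lowercase hex.
TRACEPARENT = re.compile(
    r"^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$"
)


def parse_traceparent(header: str):
    """Return the parsed fields, or None if the header is not a valid traceparent."""
    match = TRACEPARENT.match(header.strip())
    if match is None:
        return None
    version, trace_id, parent_id, flags = match.groups()
    # The spec forbids version ff and all-zero trace or parent ids.
    if version == "ff" or trace_id == "0" * 32 or parent_id == "0" * 16:
        return None
    return {
        "trace_id": trace_id,
        "parent_id": parent_id,
        "sampled": bool(int(flags, 16) & 0x01),
    }


def group_by_trace(records: list) -> dict:
    """Group stage records (hypothetical shape) by the trace id they carry."""
    grouped = defaultdict(list)
    for record in records:
        parsed = parse_traceparent(record["traceparent"])
        if parsed is not None:
            grouped[parsed["trace_id"]].append(record["stage"])
    return dict(grouped)


records = [
    {"stage": "model_invocation",
     "traceparent": "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01"},
    {"stage": "tool_call",
     "traceparent": "00-4bf92f3577b34da6a3ce929d0e0e4736-b7ad6b7169203331-01"},
    {"stage": "unrelated", "traceparent": "not-a-traceparent"},
]
by_trace = group_by_trace(records)
print(by_trace)
```

Two stages with different parent ids share one trace id, which is all the capture layer needs to know that they belong to the same unit of work.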

Implementation glue

Between AVE's contract and the borrowed standards is the glue that produces the receipt. Different implementations pick different paths. AVE requires that the result satisfy the contract. It does not require any particular path to it.

One thing AVE does require: any implementation must keep the original runtime emission addressable alongside the canonical translation. The raw form is preserved and signed at arrival. The canonical form is derived from raw and references it by hash. The transformation has to be reviewable after the fact, and the cryptographic anchor is what makes the review more than a textual comparison. Where in the pipeline the original is kept, how it is stored, what retention policy applies — implementation choices. The fact that it is kept, and that derivation is anchored — methodology.
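A sketch makes the raw-to-canonical requirement concrete. It assumes a content-addressed store keyed by SHA-256 and a made-up vendor emission; the derive_canonical translation and the canonical record's shape are hypothetical, and signing is left out so that the hash anchor stays in view.

```python
import hashlib
import json


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def derive_canonical(raw_bytes: bytes) -> dict:
    """Hypothetical translation from one vendor emission to a canonical record."""
    raw = json.loads(raw_bytes)
    return {
        "raw_sha256": sha256_hex(raw_bytes),  # the anchor back to the original
        "gen_ai.tool.name": raw["tool"],
        "tool_result": raw["result"],
    }


def review_derivation(raw_store: dict, canonical: dict) -> bool:
    """Re-run the derivation from the preserved original and compare."""
    raw_bytes = raw_store.get(canonical["raw_sha256"])
    if raw_bytes is None:
        return False  # the original is no longer addressable
    return derive_canonical(raw_bytes) == canonical


# The emission is kept byte-for-byte as it arrived, addressed by its hash.
raw_bytes = b'{"tool": "run_tests", "args": ["-q"], "result": "12 passed"}'
raw_store = {sha256_hex(raw_bytes): raw_bytes}

canonical = derive_canonical(raw_bytes)
edited = dict(canonical, tool_result="13 passed")

print(review_derivation(raw_store, canonical))  # faithful derivation
print(review_derivation(raw_store, edited))     # canonical altered after the fact
print(review_derivation({}, canonical))         # original discarded
```

The third case is the one the requirement exists to prevent. A canonical record whose original has been discarded may well be accurate, but nobody can show that it is.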

Data Argo's own implementation, which will be introduced in a later report, is one valid realisation of the disciplines Fig. 6 names. Other realisations are possible, and the methodology is silent on which to choose.

Fig. 7 shows what happens when more than one runtime contributes to the same build. Each vendor emission is retained separately at the raw attestation discipline — signed at arrival, addressable, independent. Convergence happens at canonical, where the conformed shape is derived from all raw artefacts and references each by hash. Swap a vendor, add a vendor — the AVE region's structure does not change, and the convergence point remains where vendor-specific naming is removed.

Fig. 7. Multiple vendor emissions feeding the same AVE region. Each vendor's emission is retained separately at the raw attestation discipline; convergence happens at canonical. Swap a vendor or add one: the seam holds, and the AVE region's structure does not change.

The receipt: linked, not unified

Report 002 introduced the receipt as a structural envelope — a header, body sections, an integrity footer. That structure is AVE's. The parts that fill it come from different places.

Read with origins in view, the same envelope looks different. The runtime section is OpenTelemetry data, encoded in gen_ai vocabulary, signed by the recording role. The inputs and outputs sections are in-toto attestations, each signed independently, each verifiable on its own terms. The chain footer is AVE's own — the per-step hashes, the link to the previous receipt, the signature over the whole. The witness, if present, sits off to the side: a Sigstore-class transparency log entry that proves the chain head existed at a particular time.

What that means in practice: a receipt is a linked composite, not a single fused document. A regulator wanting to verify what happened at step 47 of a particular build doesn't pull the whole receipt and re-verify everything. They pull the in-toto attestation for that step, verify its signature against the recording role's key, check that the attestation's hash appears in the chain at sequence 47, and confirm the chain itself verifies end-to-end. Four checks, each of which can be run independently: the same pattern a regulator uses to verify a signed invoice against a ledger entry, applied to a new substrate. No piece requires trusting AVE; AVE only structures how the pieces relate.
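The step-47 walk-through can be sketched end to end. Everything concrete here is an assumption made for illustration: HMAC stands in for the recording role's signature, the attestation payloads are invented, and the chain entry layout is hypothetical rather than AVE's or in-toto's actual format. The point is the shape of the verification, four separate verdicts rather than one.

```python
import hashlib
import hmac
import json

RECORDER_KEY = b"recording-role-key"  # hypothetical; HMAC stands in for a signature
GENESIS = "0" * 64


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def sign(payload: bytes) -> str:
    return hmac.new(RECORDER_KEY, payload, hashlib.sha256).hexdigest()


def make_attestation(step: int) -> dict:
    body = {"step": step, "result": f"output-{step}"}
    payload = json.dumps(body, sort_keys=True).encode()
    return {"payload": payload, "signature": sign(payload)}


def link_hash(seq: int, attestation_sha256: str, prev: str) -> str:
    return sha256_hex(f"{seq}:{attestation_sha256}:{prev}".encode())


def build_chain(attestations: dict) -> list:
    chain, prev = [], GENESIS
    for seq in sorted(attestations):
        att_hash = sha256_hex(attestations[seq]["payload"])
        link = link_hash(seq, att_hash, prev)
        chain.append({"seq": seq, "attestation_sha256": att_hash,
                      "prev": prev, "link": link})
        prev = link
    return chain


def chain_verifies(chain: list) -> bool:
    prev = GENESIS
    for entry in chain:
        expected = link_hash(entry["seq"], entry["attestation_sha256"], prev)
        if entry["prev"] != prev or entry["link"] != expected:
            return False
        prev = entry["link"]
    return True


def verify_step(seq: int, attestations: dict, chain: list) -> dict:
    att = attestations.get(seq)                      # 1. pull the attestation
    fetched = att is not None
    signature_ok = fetched and hmac.compare_digest(  # 2. verify its signature
        sign(att["payload"]), att["signature"])
    entry = next((e for e in chain if e["seq"] == seq), None)
    anchored = (fetched and entry is not None        # 3. hash sits at that sequence
                and entry["attestation_sha256"] == sha256_hex(att["payload"]))
    return {"fetched": fetched, "signature_ok": signature_ok,
            "anchored_in_chain": anchored,
            "chain_ok": chain_verifies(chain)}       # 4. chain verifies end to end


attestations = {seq: make_attestation(seq) for seq in range(1, 51)}
chain = build_chain(attestations)
report = verify_step(47, attestations, chain)

# A validly signed attestation for a different step, presented as step 47.
swapped = dict(attestations)
swapped[47] = make_attestation(99)
swapped_report = verify_step(47, swapped, chain)

print(report)
print(swapped_report)
```

The swapped case passes the signature check and still fails, because the chain records which attestation belongs at sequence 47. No single check carries the audit; the verdict comes from how they relate.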

Figure: two states compared. With an implementation, the receipt is a real, queryable, related composite of signed records; without one, the same records exist as discrete signed logs on different systems with no structure holding them together. Implementation is what makes a receipt.

Without the implementation that assembles the parts, the parts are signed logs on different systems — each defensible on its own terms but unjoined and unqueryable as a whole. With it, they are a queryable receipt a regulator can review as one thing. The implementation is what makes the difference real.

Why alignment is an audit property

This composition is not a convenience. It is what makes the audit work.

A regulator asked to verify a receipt does not have to learn AVE to do it. They have to learn that AVE composes vocabularies they already trust — OTel for runtime, in-toto for attestation, SLSA for the provenance level, W3C Trace Context for correlation, a Sigstore-class transparency log for witness — and check whether the composition is faithful to each. The audit reduces to checking the seams.

If the receipt encoded its runtime in invented field names, the regulator would have to learn the field names, verify they were used consistently, and take AVE's word for what they meant. With OTel gen_ai vocabulary, the regulator can check the receipt against the public specification. The vocabulary is not AVE's to defend.

The same logic applies to each borrowed layer. AVE's defence is restricted to its own four contributions. The substrate is defended by communities older and broader than AVE. The receipt's longevity is borrowed from those communities: it can outlive the methodology if necessary, the implementation if necessary, the firm if necessary.

A methodology that invented its own substrate could not make this argument. A methodology that composes can.

What's coming

Report 004 will introduce capture in practice — what happens when these standards meet a real runtime, where the gaps are between what the contract asks for and what the runtime can actually provide, and what the first signed receipts look like. Report 005 will go deeper on the conformance review — why integrity is not enough, and what the third leg looks like operationally. Subsequent reports will look at delegation across multi-agent work, the dependency graph behind an artefact, and the questions a stable schema makes answerable across a hundred builds.

The borrowed stack is named. AVE's own surface is small and defensible. The work continues.

Don't reinvent. Compose. Cite. Adapt.

Tony Purkins · Principal · Data Argo · 15 May 2026

Navigare necesse est
