Whitepaper · Action Layer
A. Watts, Founder, Execution Protocol
The Action Layer
Why agent infrastructure needs a fourth layer, and what it does
The thesis
Models can generate. Agents can decide. Protocols can route.
None of them control what is allowed to happen, and that becomes the limiting factor the moment actions have consequences.
As soon as an AI system can spend money, modify a record, or commit a resource, that missing control becomes the most important part of the stack. Two questions become unavoidable: who decides whether the action happens, and what proof exists afterwards?
This paper argues that those two questions define a distinct layer of agent infrastructure, the action layer, and that the layer is not yet standardized across popular stacks today.
Once an agent can act, errors are no longer outputs. They are events. Events have cost, risk, and liability. The layer that handles them either exists, or it gets recreated, inconsistently, repeatedly, and eventually incorrectly, across every system that deploys agents.
Where we are in early May 2026
The agent stack now has clear shape. Three layers are well-developed and converging on standards:
The model layer produces tokens. OpenAI, Anthropic, Google, Meta, and the open-weights world all ship probabilistic systems that generate text, parse intent, and reason. This layer has its own infrastructure: inference services, prompt frameworks, evaluation harnesses, fine-tuning pipelines.
The orchestration layer turns models into agents. LangChain, LangGraph, AutoGen, CrewAI, and the Microsoft Agent Framework let developers compose models into multi-step workflows, give them tools, manage state, and run loops. This layer has matured fast since 2023 and now has reasonable standards for what an "agent" is.
The transport layer lets agents talk to each other and to systems. Model Context Protocol (MCP, Anthropic, late 2024) standardized tool invocation. Agent2Agent (A2A, Google, 2025–2026) standardized inter-agent messaging. Anthropic's MCP Connectors Directory (April 2026) curates the tool registry. Visa's Intelligent Commerce Connect (April 2026) and Mastercard's Verifiable Intent (March 2026) add commerce-specific transport.
These three layers are being built well. They share the same architectural assumption: the agent acts. The protocol gets out of the way.
This is the assumption that fails as soon as the agent is about to do something that costs money, allocates a real resource, or changes a record someone will later be audited against.
A concrete example
An agent working for a small business is asked to book a flight. The user said "book me to LA next Thursday, under $200." The agent calls a flight tool exposed via MCP, the tool returns a booking confirmation, the agent reports success.
Three things are unverifiable in this story:
Was the agent authorized to spend $200? The user said "$200" in a chat message. The agent inferred it. The tool got an instruction. Nothing in this chain is signed by the user, scoped to this transaction, or independently verifiable.
What did the agent actually do? The agent's report is its own assertion. The chat history is mutable. The tool's logs are the tool's logs. There's no shared, tamper-evident record of what action was taken.
What if the agent had been refused? If the flight cost $215 and the tool refused, where is the proof of refusal? The agent could retry, paraphrase, or silently fail. The user has no cryptographic evidence that the system stopped what it was supposed to stop.
These three questions — was it authorized, what was executed, what was refused — are the questions every CISO, every auditor, every compliance officer, every regulator under the EU AI Act, every payments fraud team, and every reasonable user will eventually ask.
The current stack does not answer them. It cannot answer them, because it was not designed to.
The shorthand: giving an agent authority to authorize its own actions is equivalent to letting a borrower approve their own loan. It works — until the first time it matters, and then it fails in a way that cannot be explained or proven.
The system that acts cannot be the system that decides whether it's allowed to act.
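The missing link in the booking story can be made concrete. Below is a minimal sketch (Python, standard library only) of what a user-signed, transaction-scoped mandate could look like. Every name and field here is hypothetical, and HMAC with a shared key stands in for the asymmetric signature (e.g. Ed25519) a real system would use so that verifiers need only a public key:

```python
import hashlib, hmac, json, secrets

# Hypothetical sketch: a transaction-scoped mandate the user signs once,
# versus a free-text chat instruction the agent merely paraphrases.
# HMAC stands in for an asymmetric scheme to keep this stdlib-only.

USER_KEY = secrets.token_bytes(32)  # held by the user, never by the agent

def canonical(body: dict) -> bytes:
    # Deterministic serialization: sorted keys, no whitespace variance.
    return json.dumps(body, sort_keys=True, separators=(",", ":")).encode()

def sign_mandate(body: dict, key: bytes) -> dict:
    sig = hmac.new(key, canonical(body), hashlib.sha256).hexdigest()
    return {"body": body, "sig": sig}

def verify_action(mandate: dict, action: dict, key: bytes) -> bool:
    # 1. The mandate itself must be authentic.
    expected = hmac.new(key, canonical(mandate["body"]), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, mandate["sig"]):
        return False
    # 2. The proposed action must fall inside the mandate's scope.
    scope = mandate["body"]
    return (action["type"] == scope["action"]
            and action["amount_usd"] <= scope["max_spend_usd"])

mandate = sign_mandate(
    {"action": "book_flight", "max_spend_usd": 200, "nonce": secrets.token_hex(8)},
    USER_KEY,
)

print(verify_action(mandate, {"type": "book_flight", "amount_usd": 185}, USER_KEY))  # True
print(verify_action(mandate, {"type": "book_flight", "amount_usd": 215}, USER_KEY))  # False
```

The point of the sketch is the shape, not the crypto: the "$200" ceiling is now a signed, scoped artifact the agent cannot paraphrase, and the $215 flight fails verification regardless of how the request is worded.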
Why this layer emerges in every system
This is not a new pattern. Every system that crosses from suggestion into execution converges on the same architecture:
Action is separated from authorization. Enforcement is deterministic, not probabilistic. Outcomes are auditable by someone other than the actor.
Financial systems work this way. The trader does not approve their own trade; the clearing house does. Operating systems work this way. The application does not grant itself privileges; the kernel does. Public infrastructure works this way. The driver does not certify their own vehicle; the inspector does. Web traffic works this way. The browser does not trust the server's word about its identity; it verifies a certificate signed by an authority neither party controls. Payment networks work this way. The merchant does not confirm settlement; the network does, and the receipt of settlement outlives the transaction.
The pattern recurs because it is the only one that scales past trust. In every domain where consequences accumulate, the actor and the authorizer have to be different things, and the record of what happened has to outlive the moment it was made.
AI is now crossing into that domain. The action layer is what the pattern looks like when the actor is an autonomous agent.
This pattern is not optional. It is what systems converge to once trust is no longer sufficient.
What "execution" actually requires
Before this can be a layer, it has to be pinned down as a set of properties. Here are the four that matter:
1. Deterministic enforcement. When the agent submits an action, the decision to permit or refuse it cannot depend on the model that produced the request, the temperature setting, or the time of day. The protocol's evaluation is deterministic — replays of the same canonical request body verify identically against the same policy. Probabilistic systems cannot provide this property; deterministic systems can.
2. A boundary the model cannot cross. Authorization decisions must be made on a structured, schema-conformant request — not on natural language, not on a prompt, not on anything an LLM can rephrase. The model produces intent; the action layer evaluates a structured message derived from that intent. The model is structurally incapable of authorizing its own actions. No prompt, no jailbreak, no clever phrasing changes that, because the model does not have a path to the authorization decision in the first place. This is not a safety guideline. It is an architectural constraint.
3. Cryptographic proof of every outcome. Every terminal state — executed, instructed, blocked, amended, cancelled — must produce a signed, hash-chained receipt verifiable with only a public key, with no call back to the issuing system. Logs are not proof. Assertions are not evidence. A receipt is what a court, auditor, or regulator can hold. If an outcome cannot be proven, it cannot be relied on.
4. Refusal as a first-class outcome. Modern systems are optimized to explain successful execution. The harder requirement is evidencing prevented execution — proving that the system blocked a bad action, rejected a prompt injection, or refused an over-spend. As autonomous systems move into regulated and financial domains, prevented actions become as operationally important as completed ones. A refused action and an executed action must therefore leave receipts of equal cryptographic weight, produced by the same primitive on both paths. A system that cannot prove refusal cannot prove control.
These four properties are what define the layer. They are not implementation details. They are observable from outside.
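One way to see how the four properties compose is a deliberately small sketch. Everything here is invented for illustration, not a real protocol, and HMAC again stands in for the public-key signature a production layer would use:

```python
import hashlib, hmac, json, secrets

# Illustrative sketch of the four properties in one loop. All names are
# hypothetical; a production layer would sign with a private key and let
# anyone verify with the public key alone.

LAYER_KEY = secrets.token_bytes(32)

def canonical(obj: dict) -> bytes:
    return json.dumps(obj, sort_keys=True, separators=(",", ":")).encode()

def evaluate(request: dict, policy: dict) -> tuple:
    # Property 1: deterministic. Same canonical request + same policy
    # always yields the same verdict; no model in the loop.
    # Property 2: the input is a schema-shaped dict, not natural language.
    if request["amount_usd"] > policy["max_spend_usd"]:
        return ("blocked", "amount exceeds policy limit")
    return ("executed", "within policy")

def issue_receipt(chain: list, request: dict, outcome: str, reason: str) -> dict:
    # Properties 3 and 4: every terminal state, refusal included, leaves
    # the same hash-chained, signed primitive.
    body = {
        "request_hash": hashlib.sha256(canonical(request)).hexdigest(),
        "outcome": outcome,
        "reason": reason,
        "prev": chain[-1]["sig"] if chain else "genesis",
    }
    sig = hmac.new(LAYER_KEY, canonical(body), hashlib.sha256).hexdigest()
    receipt = {"body": body, "sig": sig}
    chain.append(receipt)
    return receipt

policy = {"max_spend_usd": 200}
chain = []
for req in ({"action": "book_flight", "amount_usd": 185},
            {"action": "book_flight", "amount_usd": 215}):
    outcome, reason = evaluate(req, policy)
    issue_receipt(chain, req, outcome, reason)

print([r["body"]["outcome"] for r in chain])  # ['executed', 'blocked']
```

Note that the refusal produces the same primitive as the execution: same fields, same signature, same link into the chain. That symmetry is the fourth property, not an implementation nicety.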
How the layers fit
Model layer
OpenAI · Anthropic · Google · Meta · open-weights
Orchestration layer
LangChain · LangGraph · AutoGen · CrewAI · Microsoft Agent Framework
Transport layer
MCP · A2A · ACP · connector directories · commerce on-ramps
Action layer
Deterministic enforcement · model-impassable boundary · signed receipts
The action layer sits between the agent stack and the systems where consequences happen. Everything above it is probabilistic and fast-moving. Everything below it is deterministic and audited. The layer is the converter.
How this differs from what each layer already provides
A fair question: don't the existing layers already do some of this? They do, partially, and that partial coverage is exactly the problem.
| Capability | Model | Orchestration | Transport | Action |
|---|---|---|---|---|
| Generate intent from natural language | ✓ | — | — | — |
| Compose multi-step workflows | — | ✓ | — | — |
| Discover and call external tools | — | partial | ✓ | — |
| Authenticate the calling agent | — | partial | partial | ✓ |
| Evaluate authorization deterministically | — | — | — | ✓ |
| Produce signed proof of executed actions | — | — | — | ✓ |
| Produce signed proof of refused actions | — | — | — | ✓ |
| Verify outcomes without calling back | — | — | — | ✓ |
The existing stack answers two questions: who called, and what was requested. It does not answer the four that matter once consequences are involved: what decision was made, why it was made, whether that decision can be proven later, and whether that decision can be independently verified.
A few clarifications, because these comparisons are easy to overstate:
MCP and A2A both include authentication concepts, and MCP's authorization specification has continued to develop through 2025–2026. They specify who is calling. They do not specify what decision is made about the call, by what rule, with what proof. Authentication answers identity; authorization answers permission; both are necessary, neither is sufficient for the action-layer job.
Orchestration frameworks include guardrails, output validators, and human-in-the-loop checkpoints. These are useful inside the orchestration layer. They are not deterministic, are not signed, and are not externally verifiable. A guardrail's decision is not a receipt.
REST APIs can have any of these properties bolted on — middleware, signed responses, audit chains. They can. Most do not. And when they do, the implementation varies by service, which means the agent stack cannot rely on a uniform property.
Uniformity is the property doing the work here. A receipt format that means one thing in LangChain, something different in CrewAI, and something different again in a custom orchestrator is not a receipt format — it is seventeen audit nightmares wearing a trench coat. The action-layer contract is valuable precisely because it is the same contract regardless of what produced the request. An auditor verifying a refusal does not care which framework the agent was built in. A regulator evaluating Article 12 compliance does not want to learn a new evidentiary format per vendor. The layer earns its place by being indifferent to what sits above it.
The point is not that other layers are useless. The point is that the action-layer properties — deterministic enforcement, a model-impassable boundary, cryptographic proof of all outcomes — are not the job of any other layer, and bolting them on case-by-case at the orchestration or transport level produces a different property in every deployment.
What an action-layer interaction looks like
A single round-trip:
The agent submits a structured request. Not a prompt, not an instruction — a schema-conformant message describing what it wants to do. The model that produced it is upstream.
The action layer evaluates. Authentication, authorization, policy, and resource-availability checks happen against the structured request. The check is deterministic — given the same canonical request body and the same policy, the verdict is reproducible.
Exactly one of two classes of outcome is produced. Either the action executes (and a signed receipt is issued), or it does not (and a signed receipt is still issued, recording the refusal and its reason). There is no third state. There is no partial execution. There is no silent failure path.
Both outcomes leave the same primitive. The receipt is hash-chained and verifiable offline against a public key. Anyone — the user, the auditor, the regulator, the agent itself — can verify it without calling the action layer back.
This shape is not an opinion. It is the minimum viable contract for the layer. Anything less, and the questions at the start of this paper remain unanswered.
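The verifier's side of the contract can be sketched in a few lines. The receipt structure below is hypothetical, and a real receipt would carry a public-key signature checked alongside the hash links; the sketch shows only the tamper-evidence property, which needs nothing but the exported chain:

```python
import hashlib, json

# Verifier-side sketch: given an exported chain of receipts, confirm that
# no entry was altered or reordered, with no call back to the issuer.

def canonical(obj: dict) -> bytes:
    return json.dumps(obj, sort_keys=True, separators=(",", ":")).encode()

def link(receipt: dict) -> str:
    # Each receipt's hash becomes the "prev" field of the next one.
    return hashlib.sha256(canonical(receipt)).hexdigest()

def verify_chain(chain: list) -> bool:
    prev = "genesis"
    for receipt in chain:
        if receipt["prev"] != prev:  # broken or reordered link
            return False
        prev = link(receipt)
    return True

# Build a tiny well-formed chain, then tamper with it.
chain = []
prev = "genesis"
for outcome in ("executed", "blocked"):
    receipt = {"outcome": outcome, "prev": prev}
    chain.append(receipt)
    prev = link(receipt)

print(verify_chain(chain))                 # True
chain[0]["outcome"] = "executed_quietly"   # retroactive edit
print(verify_chain(chain))                 # False
```

This is what "verifiable offline" buys: the retroactive edit is detectable by anyone holding the chain, because changing one receipt breaks every link after it.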
Why this matters now
Three things are happening simultaneously, and the action layer is at the intersection of all of them.
Regulation has caught up. The EU AI Act enters enforcement on August 2, 2026. Article 12 requires automatic recording of events for high-risk AI systems, with logs that are tamper-evident and retained. "Tamper-evident" is a cryptographic property; conventional logging does not provide it. The fines run up to EUR 15 million or 3% of worldwide annual turnover, whichever is higher. Regulation does not create this requirement. It reveals it.
Agent commerce is being built without the action layer. Stripe ACP, Visa Intelligent Commerce Connect, Mastercard Verifiable Intent, and Google's Agent Payments Protocol all launched between January and April 2026. Each addresses a piece — Visa addresses transport, Mastercard addresses post-execution audit, Google addresses authorization mandates. None addresses the full set of action-layer properties as a uniform protocol. The early architectural choices are being made now.
Real incidents are landing. Across 2025 and into 2026, the public incident pattern has been consistent: an agent loops, an agent over-spends, an agent gets prompt-injected, an agent acts on stale or fabricated data. Each case prompts the same question: what evidence do we have of what actually happened? The action layer is the answer. Most teams do not yet have one.
The window for establishing what the action layer is, and what properties it must have, is short. Whoever defines the layer's contract — its terminal-output discipline, its boundary placement, its receipt format — sets the integration shape for everything downstream.
Within the next two years, the action layer will likely become a default expectation for enterprise systems permitting autonomous execution. The systems that lack one will be the case studies the others cite.
What the action layer is not
A few clarifications to forestall misreadings.
It is not a replacement for the model, the agent framework, or the transport. It sits beneath them and serves them. An agent built with LangChain, talking over MCP, calling a Visa endpoint, can use an action layer for the consequential calls. The other layers continue to do their jobs.
It is not a content moderator. Content safety lives upstream, at the model and orchestration layers. The action layer does not look at whether the natural-language input is harmful; it looks at whether the structured action it produces is permitted under deterministic policy.
It is not a database. Receipts are produced for verification, not as the system of record. Downstream systems still hold their own state. The action layer's contribution is a parallel, signed evidentiary stream that any party can verify without trusting the issuer.
It is not a single product. It is a layer with a defined contract. Multiple implementations of the layer can coexist, just as multiple implementations of MCP servers or orchestration frameworks coexist. The contract is what matters.
What this paper does not argue
This paper does not argue that any specific implementation is the right one. It argues that the layer is necessary, that its contract has observable, testable properties, and that the agent stack will not be production-ready for consequential actions until the layer is defined and adopted.
Subsequent papers will go deeper on specific properties of the action layer:
- The Authorization Boundary
- The Receipts Primitive
- The Protocol Shape
- The Payments Surface
Each will stand alone and assume only the framing established here.
A closing observation
The history of the network stack offers a useful pattern. TCP/IP did not arrive as a monolith. The link layer, the network layer, and the transport layer were defined separately, each with its own contract, each with multiple implementations. The discipline of the layered model — that each layer does its job and trusts its neighbors — is what made the whole thing scale.
The agent stack is going through the same process now, faster. The model layer is mature. The orchestration layer is converging. The transport layer is standardizing. The action layer is the gap that becomes obvious as soon as agents start doing things people care about being right.
The agent stack is becoming real infrastructure.
Infrastructure does not rely on trust. It relies on guarantees.
The action layer is where those guarantees must live. Naming the gap is the first step. Defining its contract is the second. Implementing it well is the third. The first step costs nothing and is overdue.
Every system that allows agents to act will either adopt this layer, or rebuild it, piece by piece, through its own failures.
A system without this layer is not insecure or under-engineered. It is incomplete. It cannot tell you what it did, what it refused, or by what authority — and "cannot tell you" is the only answer that matters once consequences are involved.