DEV Community

Michael "Mike" K. Saleme
Michael "Mike" K. Saleme

Posted on • Edited on

The Agentic Maturity Model Is Missing an Axis: Who Validated the Claim

On June 3, the OWASP GenAI Security Project published State of Agentic AI Security and Governance 2.0, and with it an Enterprise Adoption Maturity Model that grades two things at once.

One axis measures deployment: AT0 Shadow AI through AT5 custom in-house agents that you built and whose identity, tools, and boundaries you control. The other measures governance maturity: Level 0 ad hoc through Level 3, where agents are treated as critical infrastructure with governance-as-code, kill switches, and real-time drift dashboards.

It is the clearest two-axis picture we have seen published. It also shares a blind spot with the maturity models that preceded it.

Both axes describe what the organization does. Neither captures who verified that it does it.

Two organizations, same cell, different truth

Take two organizations that both self-place at Governance Level 3. Both claim governance-as-code. Both claim kill switches. Both claim continuous drift monitoring.

One arrived there through an internal red-team's self-attestation. The other arrived through independent adversarial assessment with a published, reproducible evidence base. On the matrix, they occupy the same cell. In a procurement review, in an incident post-mortem, in front of a regulator, they are not the same artifact.

A maturity model that measures what an organization does, but not who validated it, grades the claim and not the control.

The pattern already exists in established assurance

This is not a novel demand. Assurance practice has separated self-attestation from independent validation for decades. A SOC 2 Type I report describes controls as designed; a Type II report tests whether they operated over time. A vendor security questionnaire and a third-party penetration test answer different questions, and no mature buyer treats them as interchangeable. Vulnerability scoring encodes the same instinct: CVSS tempers a finding by its Exploit Maturity — Unproven, Proof-of-Concept, Functional, High — grading the evidence behind a claim, not only the claim's severity.

Agentic governance has not yet imported that distinction. The EU AI Act's high-risk obligations — now deferred to December 2027 because the supporting standards aren't ready — turn on demonstrable oversight, not asserted oversight. The maturity model needs the third axis the regulation will require: evidence type.

What the third axis looks like

Evidence type asks one question of every governance claim: what class of evidence supports it, and is the claim stronger than that evidence permits?

This pattern exists in disciplined evaluation work. For example, in the public agent-security-harness VS-R01 evaluation of agent-payment infrastructure, every finding is tagged with an evidence class:

  • E1 — static or documentation observation
  • E2 — admission-time runtime observation (the API's response at the input gate, before settlement)
  • E3 — settlement-time runtime observation
  • E4 — adversarial replay and persistence validated
  • E5 — cross-context isolation confirmed against both negative and positive controls

Each class maps to a maximum permitted claim strength. An E2 observation may describe how an API admits or refuses a crafted input; it may not claim the platform enforces a limit, because enforcement is a settlement-time property and settlement was not measured. A recurring failure mode in agent-security writeups — making an enforcement claim from admission evidence — becomes visible at review time instead of in production.

That is the third axis made concrete. It is reproducible from a public branch state by any reviewer with their own test enrollment, which is the property that separates evidence from assertion.

The cell isn't the credential

The OWASP model is a real advance, and the right place to put this. Adoption tells you how much autonomy an organization has handed its agents. Governance maturity tells you how much control it claims to have built. Evidence type tells you whether anyone outside the organization can check.

For agents that hold credentials, move money, and act on untrusted input, the third question is the one that survives contact with a regulator. Grade the evidence, not the claim.


Sources

Top comments (0)