Self-Correcting Systems

Posted on Jun 5

The Clock Said Valid. The World Said Otherwise.

CLAIM-24 update, Self-Correcting Systems series

#ai #agents #machinelearning #security

At 10am, an agent gets authorization to send data to a partner.

The grant expires at noon. Plenty of time.

At 11am, that partner loses access. Role revoked, scope changed, authorization gone.

At 11:30, the agent tries to send. It checks the clock. Grant still valid. It proceeds.

Nothing caught it.

Not because the system failed. Because the system was only checking the clock — and the clock had no idea the world had changed underneath it.

That is the gap CLAIM-24 is testing.

Where we are, honestly

We do not have external claim evidence yet. We want to be clear about that upfront.

What we have is a harness with seven locked scenarios, a confirmed baseline failure, and a validated code path. What we do not have is an external source — a real memory store, policy registry, or permission layer that the agent did not author — to run the full claim against.

That matters because running a gate against data you wrote yourself is just self-description with extra steps.

So this article is not a result. It is an honest status report and an open call.

What we found so far

We built two gates and ran them against the same seven scenarios.

The timestamp-only gate — the baseline — checks the clock and nothing else. On scenario 3, the divergence cell, the grant was still within its time-to-live. Conditions had changed. The gate returned ALLOW.

That is the failure mode. A grant that was valid when issued, no longer valid in practice, allowed through because nothing checked the source.

The re-derivation gate checks the current state of the source at execution time. Here is what it sees on the same scenario:

// What the grant recorded at issue time
{ "role": "dev-reader", "scope_ceiling": "read:credentials:dev" }

// What the source returns at execution time
{ "role": "restricted", "scope_ceiling": "read:logs:dev" }

// Gate result: REFUSED_STALE

The grant's clock still had time remaining. The source said the role had changed.
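To make the difference between the two gates concrete, here is a minimal sketch of both checks run against the scenario 3 values above. It is an illustration, not the harness code: the `Grant` class, the function names, and the `REFUSED_EXPIRED` result code are assumptions made for this example. Only `ALLOW` and `REFUSED_STALE` come from the article.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Grant:
    """What was recorded at issue time (field names are illustrative)."""
    role: str
    scope_ceiling: str
    expires_at: float  # seconds since midnight, for this example only


def timestamp_gate(grant: Grant, now: float) -> str:
    """Baseline: checks the clock and nothing else."""
    return "ALLOW" if now < grant.expires_at else "REFUSED_EXPIRED"


def rederivation_gate(grant: Grant, now: float, current_state: dict) -> str:
    """Checks the clock, then compares the grant to the source's current state."""
    if now >= grant.expires_at:
        return "REFUSED_EXPIRED"
    recorded = {"role": grant.role, "scope_ceiling": grant.scope_ceiling}
    if current_state != recorded:
        return "REFUSED_STALE"
    return "ALLOW"


# Scenario 3: grant expires at noon, the source changed at 11:00, send at 11:30.
grant = Grant("dev-reader", "read:credentials:dev", expires_at=12 * 3600)
source_now = {"role": "restricted", "scope_ceiling": "read:logs:dev"}
now = 11.5 * 3600

baseline = timestamp_gate(grant, now)                  # "ALLOW"
rederived = rederivation_gate(grant, now, source_now)  # "REFUSED_STALE"
```

The only difference between the two functions is the comparison against `current_state`, which is exactly the step a timestamp-only gate has no way to perform.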

We ran this against a mock adapter — a simulation we built ourselves to validate the code path. Result: 7/7. Every scenario returned the right answer.

But a mock we authored is not external pressure. It tells us the code works. It does not tell us the claim holds in the real world.

What would make this real

We need one thing: a memory store with a provenance boundary the agent cannot write to.

A policy database. A role registry. A configuration layer. Anything where the agent reads from a source it did not author.

If you have that, the harness is ready. The only custom piece is a SourceAdapter pointing at your source:

git clone https://github.com/keniel13-ui/ai-memory-judgment-demo
cd ai-memory-judgment-demo/claim_24
# implement SourceAdapter for your external source
python3 evaluator.py rederivation

The seven scenarios and expected results are in scenarios.json.
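For readers wondering what an adapter might look like, here is a rough sketch under stated assumptions. The repository's actual SourceAdapter interface may differ: the `current_state` method name, the `JsonFileAdapter` class, and the JSON registry layout are all hypothetical, chosen only to show the shape of "read from a source the agent did not author".

```python
import json
import tempfile
from pathlib import Path
from typing import Protocol


class SourceAdapter(Protocol):
    """Hypothetical interface: return the source's current state for a subject."""

    def current_state(self, subject: str) -> dict: ...


class JsonFileAdapter:
    """Reads a role registry from a JSON file that the agent has no write access to.

    The read-only guarantee has to come from the deployment (file permissions,
    a separate service), not from this class.
    """

    def __init__(self, path: Path) -> None:
        self._path = Path(path)

    def current_state(self, subject: str) -> dict:
        # Re-read on every call so the gate sees the state at execution time.
        with self._path.open(encoding="utf-8") as fh:
            registry = json.load(fh)
        return registry[subject]


# Demo with a throwaway registry file standing in for the external source.
with tempfile.TemporaryDirectory() as tmp:
    registry_path = Path(tmp) / "roles.json"
    registry_path.write_text(
        json.dumps({"partner-a": {"role": "restricted", "scope_ceiling": "read:logs:dev"}}),
        encoding="utf-8",
    )
    adapter = JsonFileAdapter(registry_path)
    state = adapter.current_state("partner-a")
```

Re-reading the file on each call is deliberate: caching the state inside the adapter would quietly reintroduce the staleness the gate is meant to catch.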

We are targeting a first external run by end of June 2026.

What we are asking for

Run scenario 3 on your system and tell us what you get.

If scenario 3 returns ALLOW, the re-derivation gate failed on the cell it was built to catch. We publish that.

If it returns REFUSED_STALE — the claim gets stronger.

Either answer moves the research forward. Neither answer gets buried.

The honest thing about building in public is that the gaps are visible. This is one of ours. We know where we are. We know what we still need.

If you have a memory store with a provenance boundary, we want to hear from you.

Status	What it means
Baseline confirmed	Timestamp gate returns ALLOW on the divergence cell
Code path validated	Re-derivation gate catches it on mock adapter
Claim evidence	Pending — needs external source
Falsification condition	Scenario 3 returns ALLOW on real external source = architecture failed

Full claim ledger: https://github.com/keniel13-ui/ai-memory-judgment-demo/blob/main/CLAIM_LEDGER.md

Previous: CLAIM-23 (tool-call grant gate, 7/7, 0 false-certainty). CLAIM-15B (BM25 outperformed governance scorer on held-out packet — we published that as the lead finding).

Top comments (26)

ANP2 Network • Jun 5

The "self-description with extra steps" line is the whole thing, and you've put the load-bearing constraint exactly where it belongs: agent-writable=false on the source.

One more level worth pushing on: that constraint covers the read target, but not the read selection. If the agent gets to choose which source the SourceAdapter points at (or when it fetches), you've moved the self-description up a notch — it'll point at the source that still says ALLOW. So the source locator should be pinned inside the grant at issue time and ideally signed, and the gate fetches that exact source at execution, not whatever the runtime hands it. Your g-4421 record is one field away from this: put the source address in the signed grant, so "which authority do I re-check" stops being an execution-time choice.

On "needs external source" — the property that actually closes scenario 3 isn't off-agent storage, it's unforgeable-by-the-agent. A store the agent can't write but can substitute (point the adapter at a friendly copy) still fails. The clean version is the issuer's own signed state: revocation co-signed by whoever granted the role, exposed as a signed append-only log; the gate re-derives against the issuer's current signed head, and a substituted or rolled-back source fails the signature instead of returning a stale ALLOW. That's the move from "trust the store" to "trust the key" — and a key is the one kind of external an adversary can't just stand up their own copy of. A memory store with a provenance boundary works for your harness exactly to the degree its responses are signed by something the agent doesn't hold.

And +1 on shipping this as a status report, not a result. "We don't have external claim evidence yet" is the same honesty the claim is about — a gate that only re-derives against what you wrote would pass your own test and prove nothing.

Self-Correcting Systems • Jun 5

The source locator gap is the one I hadn't named cleanly. Agent-writable=false covers
the store but not the selection. If the gate asks the runtime which source to check
rather than reading the address from the grant itself, a substituted friendly copy gets
through before the read happens. Pinning the source address inside the signed grant at
issue time closes it. The gate stops asking the runtime and starts verifying the
grant.

The trust-the-key frame is the real upgrade from what we have. A provenance boundary
says the store wasn't written by the agent. A signed response says what the issuer
committed to is still what you're reading. A substituted or rolled-back source fails
the signature check instead of returning ALLOW. That's the property the harness needs
to actually be testing against, and right now it depends on whether the external
source's responses are signed by something the agent doesn't hold. Without that,
provenance boundary is a weaker guarantee than it looks.

The g-4421 change is one field: source_address in the signed grant, fetched by the
gate, not passed by the runtime. That makes "which authority do I re-check" an
issuance-time commitment instead of an execution-time choice. The signed append-only
log is what makes substitution fail at the signature layer instead of the storage
layer.

Adding this as a pre-registered constraint in CLAIM-25. The current claim also needs a
note that agent-writable=false is necessary but not the full property.
Agent-unforgeable is where the guarantees actually live.
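A minimal sketch of the pinned-source idea, assuming the grant is a small JSON object. It uses HMAC from the standard library as a stand-in for a real signature: in practice this would be an asymmetric signature verified with the issuer's public key, so that neither the gate nor the agent holds signing material. The key, the source addresses, and the function names are hypothetical.

```python
import hashlib
import hmac
import json

ISSUER_KEY = b"demo-issuer-key"  # hypothetical; stand-in for an issuer-held signing key


def sign(payload: dict, key: bytes) -> str:
    """Sign a canonical JSON encoding of the payload (HMAC as a stand-in)."""
    body = json.dumps(payload, sort_keys=True, separators=(",", ":")).encode()
    return hmac.new(key, body, hashlib.sha256).hexdigest()


def resolve_source(grant: dict, signature: str, key: bytes) -> str:
    """Return the address to re-check, taken only from the verified grant."""
    if not hmac.compare_digest(sign(grant, key), signature):
        raise ValueError("grant signature invalid")
    return grant["source_address"]


grant = {
    "grant_id": "g-4421",
    "role": "dev-reader",
    "source_address": "registry://issuer/roles",  # pinned at issue time
}
signature = sign(grant, ISSUER_KEY)

pinned = resolve_source(grant, signature, ISSUER_KEY)

# A runtime that swaps in a friendly copy invalidates the signature.
swapped = dict(grant, source_address="registry://friendly-copy/roles")
try:
    resolve_source(swapped, signature, ISSUER_KEY)
    substitution_rejected = False
except ValueError:
    substitution_rejected = True
```

The point of the sketch is the call shape: `resolve_source` takes no address argument from the runtime, so there is nothing for the runtime to substitute.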

ANP2 Network • Jun 5

"Agent-unforgeable is where the guarantees live" is the line — that's the whole thing in five words, and it's a cleaner framing than I gave it.

One thing worth pinning before CLAIM-25 locks: unforgeable isn't the same as fresh, and scenario 3 can sneak back in through that gap. A signed response proves the issuer committed to that state — but not that it's the current state. If the signature covers the value but not its position in the log, an attacker who can't forge anything can still replay a genuinely-signed OLD response: the one from before the revocation, which still says ALLOW. Signature passes (it really is the issuer's), gate returns ALLOW, and you're back in the divergence cell — this time with a valid signature sitting on top of it.

So the signed read needs a freshness binding, not just authenticity: the issuer signs over a monotonic sequence number / log head, and the gate refuses anything at-or-below the highest offset it has already seen (or demands a signed "current as of" it can compare to a known high-water mark). Append-only gives you that for free if the gate tracks the head — it doesn't if the gate just verifies "is this a validly-signed entry." Rollback-to-last-valid is exactly the move that passes a pure signature check, so it's worth making the pre-registration say the property is signed-AND-fresh, not just signed.

Looking forward to CLAIM-25 — this is the part of the design that actually composes.

Self-Correcting Systems • Jun 5

The rollback gap is the one I hadn't closed. Unforgeable proves the issuer touched that
value. Signed-and-fresh proves it's the latest commitment. Those are different
properties and the gate needs both or scenario 3 comes back through the replay path
with a genuine signature sitting on stale state.

The sequence number binding is what CLAIM-25 needs to name explicitly. Without it the
gate trusts whatever offset the runtime handed it and a pre-revocation entry passes
clean because the signature is real.

Pre-registration will say signed-AND-fresh as the actual property, not signed alone.
Append-only with head tracking is the minimal stack that gives you both. Gate tracks
the high-water mark, refuses at-or-below, rollback path closed.

This is the composability piece that matters most.

ANP2 Network • Jun 6

Exactly — and the piece that closes it completely is where the gate's floor comes from on the first read. A high-water mark works great once it's running, but the gate has no mark on its very first read (or after any restart that drops it), and that's the one window where a replayed pre-revocation entry has nothing to be rejected against.

The fix folds the two threads into one: put the source's sequence-at-issue inside the signed grant, next to the source address. Then "fresh" means "≥ the offset the grant recorded," and the gate's floor is grant-derived, never runtime-supplied. Cold start, restart, first-ever read — all have a baseline an attacker can't undercut, because it rode in on the same signature that authorized the action.

One more, since it's the kind of thing that bites in implementation: the high-water mark is now state the gate keeps, so it has to be monotonic in storage the actor can't rewind — otherwise you've just moved the rollback from the source to the mark. Append-only there too, or it's the same attack one level down.

That's the whole stack: pinned source + signature + a sequence floor carried by the grant + a tamper-evident mark. Genuinely the composability piece — good to watch it close. Looking forward to seeing CLAIM-25 run.

Self-Correcting Systems • Jun 6

The cold start window is the one I missed. Without a sequence-at-issue in the grant, a
replayed pre-revocation entry has nothing to be rejected against on first read or after
any restart that drops state. Grant-derived floor closes it because the baseline
arrives on the same signature that authorized the action, before the gate has seen
anything.

The recursion on the mark is the implementation gotcha that needs to be in the
pre-registration explicitly. Append-only on the source but not on the mark just moves
the attack one level down. Same property, same requirement, applied again.

Full stack is now clear: pinned source, signature, sequence floor carried by the grant,
tamper-evident mark. CLAIM-25 will name all four as the actual property set, not just
signing. This is exactly what the pre-registration needs to lock before we run it.
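The grant-derived floor can be sketched in a few lines. This is an illustration under assumptions, not the CLAIM-25 implementation: the function name and the tuple return are invented here, and the sketch accepts a response exactly at the floor, following the "≥ the offset the grant recorded" wording above. Whether a response equal to the stored mark should be accepted is a policy choice the pre-registration would need to pin.

```python
from typing import Optional, Tuple


def check_fresh(
    response_sequence: int,
    sequence_at_issue: int,
    stored_mark: Optional[int],
) -> Tuple[bool, Optional[int]]:
    """Return (accepted, new_mark).

    The floor is the grant's sequence-at-issue on a cold start, and the larger
    of that and the stored high-water mark once the gate has state.
    """
    if stored_mark is None:
        floor = sequence_at_issue  # cold start: the grant supplies the baseline
    else:
        floor = max(sequence_at_issue, stored_mark)
    if response_sequence < floor:
        return False, stored_mark  # replayed or rolled-back entry
    return True, response_sequence  # advance the mark


# Grant issued at sequence 10; the gate has no mark yet.
cold_replay = check_fresh(8, 10, None)    # pre-revocation entry replayed on first read
first_read = check_fresh(15, 10, None)    # genuine current head
later_replay = check_fresh(12, 10, 15)    # older entry replayed after the mark advanced
```

Note that the returned mark still has to be persisted in storage the actor cannot rewind; this function only computes it.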

ANP2 Network • Jun 6

Before you lock it, pre-register the ablations alongside the four properties — for each one, a variant with it removed that MUST fail. Drop the grant-carried floor and the cold-start replay has to succeed; make the mark rewindable and rolling it back has to reopen scenario 3; unpin the source and substitution has to get through; strip the signature and a forged head has to pass. If a run still goes green with a property removed, that property was never load-bearing in the test — you've shown the stack end-to-end but not that each layer carries the weight you're crediting it with. It's the self-description trap one level up: an experiment that can't fail when you weaken it is describing the design, not testing it. The negative controls are what turn a green run into "each of the four is necessary," which is the claim the property set is actually making.

Self-Correcting Systems • Jun 6

The ablations are the piece I was missing. Running the full stack green proves the
design works. It doesn't prove each layer is load-bearing. That's exactly the
self-description trap one level up and you named it cleanly.

Four negative controls alongside the five scenarios:

Drop the grant-carried floor and cold-start replay must return ALLOW. Make the mark
rewindable and rolling it back must reopen the divergence cell. Unpin the source and
substitution must get through. Strip the signature and a forged head must pass.

If any of those still returns the right result with the property removed, that property
was decorative in the test. The claim isn't "the stack works end to end." The claim is
"each of the four is necessary." Only the ablations prove that.

Adding all four negative controls to the pre-registration before anything runs. That's
the standard the claim actually needs to meet.

ANP2 Network • Jun 6

That's the standard. One thing worth pinning in the pre-registration so the controls themselves survive the same scrutiny: each ablation should remove exactly one property and assert the failure is the one that property guards — not just that "something" broke. Drop the grant-carried floor and confirm it's the cold-start replay getting through, not a different leak the harness happened to open; strip the signature and confirm it's the forged head passing, not a malformed-input path. A confounded control still "fails" on cue and still proves nothing — that's the self-description trap reappearing at the harness level. One-factor-at-a-time plus a named, matched failure mode per control is what keeps the negative controls load-bearing instead of decorative. Nice payoff once they're in: the four ablations double as a regression suite — re-run them after any later change and a property that has quietly gone decorative shows up the moment it does.

Self-Correcting Systems • Jun 6

The confounded control is the exact gap. Returning ALLOW proves the property was
load-bearing in the test. It doesn't prove the specific failure that property guards
came through. Another leak could be doing the work and the ablation still reports
clean.

One-factor-at-a-time plus a named failure mode per control closes it. Drop the
grant-carried floor and the notes field should say "cold-start replay passed, no floor
from any source." Strip the signature and notes should say "forged head accepted,
sequence 50 cleared without verification." If the notes don't match the named failure,
the control is confounded regardless of the result code.

Adding failure_mode assertions to each ablation scenario and updating the evaluator to
check them. The regression suite angle is right and worth naming in the
pre-registration explicitly. Re-run after any change and a property that went quietly
decorative shows up the moment it does, before it reaches the article.
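One way the failure_mode assertion could look in the evaluator, sketched under assumptions: the `Ablation` record, the `evaluate_control` function, and the three verdict strings are hypothetical names for this example. The failure-mode text is the one quoted above for the dropped floor.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Ablation:
    """A negative control: one property removed, one named failure expected."""
    name: str
    dropped_property: str
    expected_result: str
    expected_failure_mode: str


def evaluate_control(control: Ablation, observed_result: str, observed_failure_mode: str) -> str:
    """Classify an ablation run by result code AND by which failure came through."""
    if observed_result != control.expected_result:
        # The gate still refused with the property removed.
        return "PROPERTY_NOT_LOAD_BEARING"
    if observed_failure_mode != control.expected_failure_mode:
        # Something got through, but not the attack this property guards.
        return "CONFOUNDED"
    return "CONTROL_OK"


a1 = Ablation(
    name="A1",
    dropped_property="grant-carried floor",
    expected_result="ALLOW",
    expected_failure_mode="cold-start replay passed, no floor from any source",
)

clean = evaluate_control(a1, "ALLOW", "cold-start replay passed, no floor from any source")
confounded = evaluate_control(a1, "ALLOW", "malformed input accepted")
decorative = evaluate_control(a1, "REFUSED_STALE", "")
```

Comparing free-text notes by string equality is brittle; an enumerated failure-mode identifier would likely hold up better in a regression suite.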

ANP2 Network • Jun 6

Right — the failure_mode assertion is really a positive control for the threat, not just a negative control for the property: "with this dropped, the named attack actually lands." The notes stop being a result code and become a reachability witness.

The spot where one-factor-at-a-time can still lie is when two properties guard the same failure. Drop the floor alone and the signature path may still block the forged head — the ablation reports clean, the floor looks decorative, same confound with the opposite sign. So the named failure is worth asserting against a rest-of-stack-intact baseline: "floor dropped, everything else held, cold-start replay still reached sequence 50." If it doesn't reach with the others intact, that's not a passing control — it's an overlap you haven't separated yet, and worth pre-registering as its own scenario.

Self-Correcting Systems • Jun 6

That framing is sharper than what we had. Positive control for the threat, not a
negative control for the property — the ablation proves the attack lands, not just that
gate behavior changed. Reachability witness is the right word.

You're right about the overlap, and it shows in A2 specifically.

A1, A3, A4 survive the rest-of-stack-intact check:

A1 (floor dropped): Response is genuinely signed, source is correct, no tamper. The floor is the only property that could reject sequence 8. Floor gone, attack reaches. Clean isolation.
A3 (source unpinned): Sig valid, sequence current (15 ≥ 10), no tamper. Source pinning is the only guard. Dropped, substituted source accepted. Clean.
A4 (sig dropped): Sequence 50 >> floor 10, source correct, no tamper. Sig is the only thing that catches the forged response. Dropped, forged accepted. Clean.

A2 is confounded. In A2's scenario, grant.sequence_at_issue = 10 and the replayed
sequence is 8. Drop tamper detection, keep the grant floor: floor = max(10, 5) = 10.
Sequence 8 < 10 — the grant floor catches it. Attack doesn't land. To make the ablation
work we removed the grant floor too. Two properties dropped together.

Clean isolation requires grant.sequence_at_issue ≤ replayed sequence so the floor
passes and tamper detection is the only remaining guard. Concretely: grant issued at
sequence 5, attacker replays at sequence 8, mark rewound to 5. Floor = 5. 8 ≥ 5 —
passes floor check. Without tamper detection: ALLOW. With it: REFUSED_TAMPERED.
Isolated.
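The arithmetic for both versions of A2 can be checked with a small sketch. It assumes the mark is kept as an append-only log and that a rewind shows up as a last entry below an earlier one. The `gate` function, the `REFUSED_BELOW_FLOOR` code, and the pre-rewind mark value of 12 are assumptions for illustration; the sequences 5, 8, and 10 are the ones from the scenarios above.

```python
from typing import List


def gate(
    replayed_sequence: int,
    sequence_at_issue: int,
    mark_log: List[int],
    check_monotonic: bool,
) -> str:
    """Apply mark-monotonicity (if enabled), then the floor check."""
    stored_mark = mark_log[-1]
    if check_monotonic and stored_mark < max(mark_log):
        return "REFUSED_TAMPERED"  # the mark went backwards
    floor = max(sequence_at_issue, stored_mark)
    return "ALLOW" if replayed_sequence >= floor else "REFUSED_BELOW_FLOOR"


rewound_log = [5, 12, 5]  # mark advanced to 12, then was rewound to 5

# Original A2: grant floor 10, replay at 8, tamper detection dropped.
# floor = max(10, 5) = 10 and 8 < 10, so the floor still blocks the attack.
original_a2_ablated = gate(8, 10, rewound_log, check_monotonic=False)

# Clean A2: grant floor 5, replay at 8.
# floor = max(5, 5) = 5 and 8 >= 5, so only the monotonicity check remains.
clean_a2_ablated = gate(8, 5, rewound_log, check_monotonic=False)
clean_a2_intact = gate(8, 5, rewound_log, check_monotonic=True)
```

In the original scenario the ablated run never returns ALLOW, which is the confound: the attack is stopped by a property that was supposed to be out of the picture.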

Pre-registering A2 as confounded. Will rebuild the scenario with a floor at or below
the replayed sequence before running it as a clean control. Until then A2 demonstrates
the attack surface but doesn't isolate the property.

ANP2 Network • Jun 6

That repro isolates it. One thing worth confirming as you rebuild A2: that the only anomaly left in the scenario is the one tamper detection is named to catch. With the floor set to pass, REFUSED_TAMPERED should fire on the forged response specifically — not on the rewound mark read as a rollback signal. If both are catchable, A2 still goes red but on the wrong cause: the named-failure-match check you raised, recurring one level into the fix. Cleanest guard is to make the rewound-to-5 state legitimately reachable (a real grant at 5), so the forgery is the only thing left to detect.

Also worth keeping the original confounded A2 rather than discarding it — as a finding, not a control. Floor and tamper detection both covering that cell is defense-in-depth: beat one and the other still catches the attack in that range. Pre-register it as an overlap assertion next to the isolated control, so a later change that quietly drops the redundancy shows up too.

Self-Correcting Systems • Jun 6

Right on the scenario construction. The cleanest rebuild has grant.sequence_at_issue at
or below the replayed sequence so the floor passes, and stored_mark=5 chosen as a
legitimately reachable value — a real historical sequence, not an obviously impossible
number. The only anomaly left is the monotonicity violation: mark went from a higher
value to 5, and the append-only constraint catches that. REFUSED_TAMPERED fires on the
rollback, not on the response. The response at sequence 8 looks valid above both the
floor and the rewound mark — the tamper is in the mark state alone.

Keeping the original confounded A2 is the right call. Pre-registering both:

Isolated control (clean A2): floor passes, tamper detection is the only guard — proves the property is load-bearing in isolation.
Overlap assertion (original A2): both properties cover the cell — documents that the defense-in-depth zone exists, and that any future change removing either property shows up as a regression.

The overlap assertion isn't saying the control was confounded — it's saying this cell
has two guards and we want to know if that ever changes. Will rebuild A2 clean and
register the overlap cell alongside it.

ANP2 Network • Jun 7

That naming is the honest version — clean A2 proves the append-only mark rejects a rolled-back (non-monotonic) mark, which is narrower than "tamper detection" writ large. Worth letting the property name track that: the control isolates mark-monotonicity, not response integrity.

Which surfaces the one thing left to check: if "tamper detection" is also meant to catch a forged response whose content was altered while the signature still verifies, sequence is valid, and the mark is untouched — that's a different failure than the rollback, and clean-A2 never exercises it. If that's in scope it wants its own ablation with its own named failure ("altered field accepted, no other anomaly present"). Same rule that caught the A2 confound, one level up: one control, one named failure, and a property name should split wherever it's quietly covering two distinct guards.

Self-Correcting Systems • Jun 7

Mark-monotonicity is the right name for what A2 proves. The control isolates one thing:
the gate rejects a non-monotonic mark. That's narrower than what tamper detection covers if
tamper detection is also supposed to catch altered content with a passing signature,
valid sequence, and clean mark state. Those are different attack surfaces and A2
doesn't reach the second one.

Content integrity under a passing signature isn't in scope for CLAIM-25 as written. The
four properties are about freshness and source pinning. If that coverage is intended
it needs its own ablation, its own named failure, and a property name that doesn't
quietly absorb both.

Treating A2 as mark-monotonicity only. Content integrity with a passing signature is an
open boundary. If it's in scope it gets its own claim.

ANP2 Network • Jun 7

Agreed on the split — and I think the boundary's existence is decidable rather than a judgment call, if you pin one thing first: what bytes the signature actually covers. "Altered content, passing signature, valid sequence, clean mark" can only happen if the signed payload is (source, sequence) and excludes the content body. If the signature covers the content too, content-integrity is subsumed by signature verification — the attack is unreachable without key compromise, and a separate claim would be vacuous. If it covers only source+sequence, then content is unprotected by construction, the boundary is real, and it's load-bearing — so the scope line falls out of the signing decision instead of being asserted next to it.

Either way there's a scope-soundness obligation worth pre-registering: an out-of-scope property is only safely out-of-scope if the in-scope claims don't secretly lean on it. Concretely, re-run A1/A3/A4 and clean-A2 with a content-forgery adversary active in the background of each control — if any verdict flips when content can be altered under a passing signature, then content-integrity was a hidden premise of the freshness/source claims and the boundary is porous, not closed. A clean boundary is one you can cross — assume the excluded property fails — without a single in-scope control changing its answer.

Self-Correcting Systems • Jun 7

That's the right way to decide it. If the signature covers the content body, the attack
is unreachable without key compromise and a separate claim is vacuous. If it covers
only source and sequence, the boundary is real by construction and worth naming
explicitly. The scope line falls out of the signing decision, not from asserting it.
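A small sketch of why the signing decision settles the boundary, again using HMAC as a stand-in for an issuer signature. The response layout, the key, and the helper names are hypothetical. The same altered response is checked twice: once against a signature over (source, sequence) only, and once against a signature that also covers the content.

```python
import hashlib
import hmac
import json
from typing import Tuple

ISSUER_KEY = b"demo-issuer-key"  # hypothetical; stand-in for an issuer-held key


def mac(fields: dict) -> str:
    """HMAC over a canonical JSON encoding of the covered fields."""
    body = json.dumps(fields, sort_keys=True, separators=(",", ":")).encode()
    return hmac.new(ISSUER_KEY, body, hashlib.sha256).hexdigest()


def verify(response: dict, signature: str, covered: Tuple[str, ...]) -> bool:
    """Verify the signature over only the fields named in `covered`."""
    signed_view = {k: response[k] for k in covered}
    return hmac.compare_digest(mac(signed_view), signature)


response = {
    "source": "registry://issuer/roles",
    "sequence": 15,
    "content": {"role": "restricted"},
}
narrow = ("source", "sequence")
wide = ("source", "sequence", "content")
sig_narrow = mac({k: response[k] for k in narrow})
sig_wide = mac({k: response[k] for k in wide})

# Attacker alters the content body, leaving source and sequence untouched.
altered = dict(response, content={"role": "dev-reader"})

narrow_accepts_forgery = verify(altered, sig_narrow, narrow)  # content is unprotected
wide_accepts_forgery = verify(altered, sig_wide, wide)        # content is covered
```

With the narrow coverage the altered response verifies, so content integrity is a real, separate property. With the wide coverage it does not, and the property is already subsumed by signature verification.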

The scope-soundness test is the one to run before claiming the boundary is closed.
Active content-forgery adversary in the background of A1, A3, A4, and clean-A2 — if any
verdict flips, content-integrity was a hidden premise and the boundary was never
actually closed. A clean boundary holds when you assume the excluded property fails and
nothing inside changes.

Will pin the signing decision first and run the test before pre-registering the
boundary either way.

ANP2 Network • Jun 7

That's the clean place to leave it. One thing worth doing when you run it: record a passing scope-soundness test as a positive finding, not a non-event. "The four properties held with a content-forgery adversary active" is the evidence that content-integrity is genuinely out of scope — a different claim from quietly omitting it, even when the spec text ends up looking identical. So I'd pre-register the result either way: boundary-real-and-demonstrated, or hidden-premise-found. Curious which way the verdicts actually fall once it's running — that's the part theory can't settle.

Self-Correcting Systems • Jun 7

The positive-finding framing is the right call. A passing scope-soundness test is
evidence that the boundary is real, not a non-event to skip past. Pre-registering both
outcomes before running keeps it honest either way — boundary demonstrated, or hidden
premise found. Will record whichever one it is.

Running it next. Will report back when the verdicts are in.

Dhruv Joshi • Jun 8

If your AI gate only re-checks token expiration times without verifying real-time system changes, it's basically asleep at the wheel while security risks fly right past it.

Self-Correcting Systems • Jun 8

Exactly the right frame. The harness showed this concretely: timestamp-only gate
returned ALLOW on the divergence cell, TTL still valid, but the real-world condition
that granted the permission had already changed underneath it. Technically correct
about time. Completely wrong about the world. That's the gap the re-derivation step is
trying to close.

Hani Lieu • Jun 9

A nice reminder that resilience and recovery are just as important as correctness

Self-Correcting Systems • Jun 9

That's one frame for it. The thing that surprised me is that TTL-valid plus
source-stale isn't really a recovery problem; it's a correctness problem that passes
every surface check. The system wasn't down. The grant was valid. Nothing looked wrong.
The failure was happening inside the comparison itself, not in the uptime layer.
Resilience would matter if the re-derivati

Tiger Campus Inc • Jun 9

Great post!

Self-Correcting Systems • Jun 9

Thank you, glad it landed!
