This isn't about AI being dangerous. It's about a habit most of us have developed without noticing.
You start a Claude Code session. The agent asks permission for the first action. You read it, approve. Second action — you read it, approve. Third action — you skim it, approve. By the fourth or fifth, you've clicked "don't ask again for this session" and gone back to whatever you were doing.
That's not carelessness. That's a completely rational response to an approval-fatigue problem that the tools themselves create. The agents ask too often, for too many things, and we adapt by tuning them out.
The problem gets worse when you're working remotely. I run AI coding sessions in two ways: sometimes through OpenClaw connected to Telegram, where I send messages and the agent executes actions on my machine, and sometimes through Claude.ai on my phone, running a remote session. Either way, you're watching a small screen, approving actions with limited context, and eventually you stop reading carefully.
One day I came back to my machine and found that the agent had modified files I didn't expect — not maliciously, just confidently. An .env file updated. A config changed. A dependency added. Nothing catastrophic. But I had no record of it. I couldn't tell what changed, when, or why.
So I built something to watch.
AgentGuard is a background daemon that monitors what AI coding agents do to your files during and between sessions. It doesn't try to stop the agent from working — it tries to give you visibility into what happened.
What it actually does:
It watches configured directories with a file watcher. When a sensitive file changes (.env, keys, CI configs, package.json, agent memory files like CLAUDE.md), it logs the event to an audit trail and optionally sends a Telegram message with Keep/Rollback buttons — even if you're not at the machine.
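A minimal sketch of the classify, log, and notify steps described above. This is not AgentGuard's actual code: the pattern list, the `AuditEntry` shape, and the function names are assumptions made for illustration. The Telegram part only builds a request body in the shape the Bot API's `sendMessage` method accepts (an inline keyboard under `reply_markup`); sending it and handling the button callbacks are left out.

```typescript
// Hypothetical sketch: names and patterns are illustrative, not taken from AgentGuard.

type ChangeKind = "create" | "modify" | "delete";

interface AuditEntry {
  id: string;        // short event id, also used in the Telegram button payload
  timestamp: string; // ISO 8601
  path: string;
  kind: ChangeKind;
  sensitive: boolean;
}

// Assumed patterns, loosely based on the file types named in the post.
const SENSITIVE_PATTERNS: RegExp[] = [
  /(^|\/)\.env(\..+)?$/,         // .env, .env.local, ...
  /\.(pem|key)$/,                // key material
  /(^|\/)\.github\/workflows\//, // one common CI config location
  /(^|\/)package\.json$/,
  /(^|\/)CLAUDE\.md$/,           // agent memory file
];

function isSensitive(filePath: string): boolean {
  // Normalize Windows separators so the same patterns apply everywhere.
  const normalized = filePath.replace(/\\/g, "/");
  return SENSITIVE_PATTERNS.some((pattern) => pattern.test(normalized));
}

function toAuditEntry(id: string, path: string, kind: ChangeKind, now: Date): AuditEntry {
  return { id, timestamp: now.toISOString(), path, kind, sensitive: isSensitive(path) };
}

// One JSON object per line keeps the audit trail append-only and easy to grep.
function toAuditLine(entry: AuditEntry): string {
  return JSON.stringify(entry);
}

// Request body for the Telegram Bot API sendMessage method.
// callback_data is limited to 64 bytes, so it carries the event id, not the path.
function buildTelegramAlert(chatId: number, entry: AuditEntry) {
  return {
    chat_id: chatId,
    text: "Sensitive file " + entry.kind + ": " + entry.path,
    reply_markup: {
      inline_keyboard: [[
        { text: "Keep", callback_data: "keep:" + entry.id },
        { text: "Rollback", callback_data: "rollback:" + entry.id },
      ]],
    },
  };
}
```

In a real daemon the `path` and `kind` values would come from whichever file watcher is in use, and only entries with `sensitive: true` would be passed to `buildTelegramAlert`.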
It runs as a permanent background daemon (launchd on macOS) so it's always watching, not just during explicit sessions.
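For readers unfamiliar with launchd, here is a rough sketch of what registering an always-on agent involves: a property list with a `Label`, the `ProgramArguments` to run, and `RunAtLoad` / `KeepAlive` set so the process starts at login and is restarted if it exits. Those four keys are standard launchd keys; the label and command shown in the usage note are made-up placeholders, not AgentGuard's real values.

```typescript
// Hypothetical sketch: generates a launchd property list for a user agent.

// Escape the five XML special characters so arbitrary paths are safe in the plist.
function escapeXml(value: string): string {
  return value
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;");
}

function buildLaunchAgentPlist(label: string, programArguments: string[]): string {
  const args = programArguments
    .map((arg) => "    <string>" + escapeXml(arg) + "</string>")
    .join("\n");
  return [
    '<?xml version="1.0" encoding="UTF-8"?>',
    '<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">',
    '<plist version="1.0">',
    "<dict>",
    "  <key>Label</key>",
    "  <string>" + escapeXml(label) + "</string>",
    "  <key>ProgramArguments</key>",
    "  <array>",
    args,
    "  </array>",
    "  <key>RunAtLoad</key>", // start when the agent is loaded (e.g. at login)
    "  <true/>",
    "  <key>KeepAlive</key>", // restart the process if it exits
    "  <true/>",
    "</dict>",
    "</plist>",
    "",
  ].join("\n");
}
```

For example, `buildLaunchAgentPlist("dev.example.watcher", ["/usr/local/bin/watcher", "run"])` produces a file that would typically be saved under `~/Library/LaunchAgents/` as `dev.example.watcher.plist`; both the label and the command here are placeholders.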
It has a macOS menu bar icon showing daemon status and recent activity — same idea as Docker Desktop's tray icon.
What I learned building it:
The hardest problem wasn't detection — it was deciding what to do about it. Block everything and the agent becomes useless. Block nothing and you're back where you started. The answer I landed on: log everything, alert on the things that actually matter (credential files, mass deletes, CI configs), and let the user decide.
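The "log everything, alert on what matters" policy can be sketched as a function that returns a decision for every event and attaches reasons only to the ones worth a notification. Everything here is an assumption for illustration: the threshold of 10 deletes in 5 seconds, the path patterns, and the type names are not AgentGuard's real rules.

```typescript
// Hypothetical sketch of a "log everything, alert selectively" policy.

interface FileEvent {
  path: string;
  kind: "create" | "modify" | "delete";
  timestampMs: number;
}

interface Decision {
  event: FileEvent;
  severity: "log" | "alert"; // every event is logged; "alert" also notifies
  reasons: string[];
}

interface PolicyOptions {
  massDeleteThreshold: number; // deletes inside the window that count as "mass"
  windowMs: number;            // sliding window length in milliseconds
}

// Assumed defaults, chosen only for the example.
const DEFAULT_OPTIONS: PolicyOptions = { massDeleteThreshold: 10, windowMs: 5000 };

function isCredentialOrCi(path: string): boolean {
  return /(^|\/)\.env(\..+)?$/.test(path)
    || /\.(pem|key)$/.test(path)
    || /(^|\/)\.github\/workflows\//.test(path);
}

// Expects events in ascending timestamp order; returns one decision per event.
function classify(events: FileEvent[], options: PolicyOptions = DEFAULT_OPTIONS): Decision[] {
  const recentDeletes: number[] = [];
  return events.map((event): Decision => {
    const reasons: string[] = [];
    if (isCredentialOrCi(event.path)) reasons.push("sensitive-path");
    if (event.kind === "delete") {
      recentDeletes.push(event.timestampMs);
      const cutoff = event.timestampMs - options.windowMs;
      // Drop deletes that fell out of the sliding window.
      while (recentDeletes.length > 0) {
        const oldest = recentDeletes[0];
        if (oldest === undefined || oldest >= cutoff) break;
        recentDeletes.shift();
      }
      if (recentDeletes.length >= options.massDeleteThreshold) reasons.push("mass-delete");
    }
    return { event, severity: reasons.length > 0 ? "alert" : "log", reasons };
  });
}
```

Nothing in this policy blocks the agent. The "let the user decide" step happens afterwards, when a person looks at the alert and chooses whether to keep or roll back the change.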
The second thing I learned: real-time command interception is harder than it sounds. Codex, for example, is a Rust binary that doesn't use the shell in an interceptable way. The file watcher ended up being more reliable than the command interceptor for most agents, because it sees the effect on disk no matter how the agent ran the command.
The open question I don't have an answer to:
Is this the right layer to solve this problem? Should the agents themselves have better audit trails? Should there be a standard for "what did this session change"? I genuinely don't know.
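To make the "what did this session change" question concrete, here is one possible shape for such a record, sketched as a diff between two snapshots of path-to-hash mappings taken before and after a session. No such standard exists as far as the post says; the `SessionChangeSummary` type and its field names are purely hypothetical.

```typescript
// Hypothetical sketch: one possible "what did this session change" record.

// Maps a file path to a content hash (any stable hash would do).
type Snapshot = Record<string, string>;

interface SessionChangeSummary {
  sessionId: string;
  added: string[];
  modified: string[];
  deleted: string[];
}

function hasPath(snapshot: Snapshot, path: string): boolean {
  return Object.prototype.hasOwnProperty.call(snapshot, path);
}

function summarizeSession(sessionId: string, before: Snapshot, after: Snapshot): SessionChangeSummary {
  const added: string[] = [];
  const modified: string[] = [];
  const deleted: string[] = [];
  for (const path of Object.keys(after)) {
    if (!hasPath(before, path)) {
      added.push(path);
    } else if (before[path] !== after[path]) {
      modified.push(path);
    }
  }
  for (const path of Object.keys(before)) {
    if (!hasPath(after, path)) deleted.push(path);
  }
  // Sort so the summary is stable regardless of key order in the snapshots.
  return { sessionId, added: added.sort(), modified: modified.sort(), deleted: deleted.sort() };
}
```

A record like this answers "what" but not "why", and it could be produced by either layer: by the agent itself at the end of a session, or by an outside watcher that snapshots the project.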
I built this because I needed it. It's been running on my machine for a few weeks watching two projects. The log is mostly quiet — which is either good news or means I'm not watching the right things.
If you use Claude Code, Codex, aider, or run agents remotely via OpenClaw or similar — I'd be curious whether this matches a problem you've actually experienced, or whether the approval-fatigue thing is just me.
npm install -g agentguard-dev
GitHub: github.com/Osva2023/AgentGuard
Top comments (2)
The "when you're not looking" framing is the crux, because the entire value prop of an agent is that you're NOT watching - it runs autonomously - which is exactly what makes observability non-optional rather than nice-to-have. With a human in the loop you catch the weird action live; with an autonomous agent the only record that it did something wrong is whatever you logged, so if you didn't capture the decision, the tool call, and the reasoning, a bad action is invisible until its downstream damage surfaces. Agents need more observability than normal services, not less, precisely because nobody's at the wheel - you're trading live supervision for after-the-fact auditability, and that trade only works if the audit trail is actually there.
This is why I treat the trace as a first-class artifact in Moonshift, the thing I build - a multi-agent pipeline that takes a prompt to a deployed SaaS, where every agent step (decision, tool call, verify result) is logged and gated, so "what did it do while I wasn't looking" always has an answer. Observability + a verify gate is what makes autonomy safe. Multi-model routing keeps a build ~$3 flat, first run free no card. Important question to be asking. What are you capturing as the unit of observability - just tool calls, or the reasoning/decision behind each action too? The reasoning is the expensive-but-crucial part when you're reconstructing a bad run.
Good framing — the trade you're describing (live supervision → after-the-fact auditability) is exactly the gap Ilum tries to close at the filesystem layer. Right now I capture tool calls and their effects (what files changed, what commands ran, which correlation rules fired), but not the reasoning behind each decision. That's a real limitation — if an agent deletes something it shouldn't, I know what it deleted but not why it decided to.
The "why" would require hooks at the LLM call level, which is a different layer than what I'm doing. Interesting to see how you're approaching that in Moonshift — does the verify gate run before or after the tool call executes?