Agentic AI Security Series (Part 3)
A Layered Security Model for Agentic AI Systems
In Part 1, we saw why AI agents break traditional security assumptions.
In Part 2, we used the OWASP Agentic AI Top 10 to understand how agents fail in production.
In Part 3, we answer the most important question for security leaders:
Where should controls actually live?
Not everything belongs in prompts.
Not everything belongs in a platform.
And not everything should be centralized on day one.
This post introduces a layered agent security model that maps risks to architecture — in a way that scales from early development to enterprise deployment.
A Top‑Down View: From Enterprise Security to Agentic AI
Enterprise security has always been layered.
- Infrastructure security protects compute and networks
- Application security protects logic and data flows
- Identity and governance define who can do what — and why
When AI systems were introduced, these layers expanded to include:
- Model access controls
- Training data governance
- Prompt and output filtering
- Evaluation and monitoring of model behavior
These controls work well for single‑step, stateless AI interactions.
Why Agentic Systems Change This
Agentic systems introduce:
- Long‑lived memory
- Tool execution
- Multi‑step planning
- Autonomous decision‑making across time
At this point, traditional AI and LLM security controls become necessary but insufficient.
Agentic AI does not replace enterprise security layers —
it sits on top of them, inheriting their assumptions and amplifying their failures.
This is why agentic security must be layered deliberately, rather than bolted onto prompts, frameworks, or platforms after the fact.
The Core Idea: Security Belongs at Multiple Layers
A common mistake organizations make is trying to solve agentic security in one place:
- “Let’s add better guardrails”
- “Let’s rely on the agent framework”
- “Let’s buy a platform and centralize everything”
None of these work alone.
Agentic security works only when controls are layered, with each layer having a clear responsibility.
At a high level, there are three layers:
- SDK Layer — close to developers and code
- Runtime Enforcement Layer — where actions are mediated
- Platform & Governance Layer — where organizations manage risk at scale
Each layer solves a different class of problems.
Layer 1 — SDKs (Developer‑Local Controls)
What this layer is
The SDK layer lives inside the application where agents are built.
It is closest to developers, fastest to adopt, and easiest to evolve.
This layer should handle baseline safety and containment, not enterprise‑wide governance.
What belongs here ✅
1. Input & Output Guardrails
- Prompt injection detection
- PII detection / redaction
- Content moderation
- Schema validation for outputs
These reduce likelihood of failure, but don’t contain blast radius.
2. Cost & Resource Controls
- Per‑request cost limits
- Token ceilings
- Retry caps
- Loop bounds
This directly mitigates runaway agents early.
3. Context Hygiene
- Treat retrieved documents and tool outputs as untrusted
- Basic provenance tagging
- Separation between user intent and retrieved data
Especially important for RAG‑heavy agents.
4. Lightweight Memory Guards
- Classify memory writes (facts vs preferences vs instructions)
- Block instruction‑like persistence by default
- Scope memory to user/session where possible
This addresses memory poisoning without building a platform.
What does not belong here ❌
- Centralized policy management
- Cross‑application identity governance
- Org‑wide audit correlation
- SOC workflows
SDKs should enable safety by default, not replace governance.
Layer 2 — Runtime Enforcement (Action‑Layer Security)
What this layer is
The runtime layer is where most organizations underinvest —
and where most agent incidents actually occur.
This layer sits between the agent and the real world.
Think of it as a control plane for actions, not for text.
What belongs here ✅
1. Tool & Action Mediation
Every tool call should pass through:
- Allow/deny checks
- Parameter constraints
- Least‑privilege credentials
- Rate limits and timeouts
Even if the model is compromised, actions must remain constrained.
2. Identity Binding
- Bind every action to a human initiator and tenant
- Enforce task‑scoped permissions
- Prevent model‑chosen resource identifiers
Agents should never operate as anonymous super‑users.
3. Plan Validation
For multi‑step agents:
- Validate plans before execution
- Gate high‑risk steps
- Require human approval for destructive actions
This is how goal hijacks become non‑events.
4. Real‑Time Observability
- Tool‑call telemetry
- Denied actions
- Plan drift
- Retry storms
- Cost curves
Critically: observe actions, not just outputs.
5. Response Hooks
- Kill switches
- Safe mode
- Token revocation
- Session quarantine
- Memory freeze
Without response, detection is useless.
What this layer does not do ❌
- Long‑term evidence management
- Cross‑org reporting
- Risk ownership tracking
- Compliance attestations
That’s the next layer.
Layer 3 — Platform & Governance (At Scale)
What this layer is
The platform layer exists once you have:
- Multiple agents
- Multiple teams
- Shared risk
- Regulatory or audit pressure
This layer turns controls into organizational capability.
What belongs here ✅
1. Central Policy Management
- Shared policy definitions
- Versioning and rollout
- Environment promotion (dev → prod)
- Exceptions and break‑glass workflows
2. Agent Inventory & Lifecycle
- Register agents
- Track ownership
- Scope capabilities
- Rotate credentials
- Decommission cleanly
Essential to prevent rogue agents.
3. Audit & Evidence
- Tamper‑evident logs
- Cross‑agent correlation
- Retention policies
- SIEM / GRC export
This is what auditors care about.
4. Risk & Compliance Mapping
- Map controls to frameworks (OWASP, NIST AI RMF, internal)
- Track coverage gaps
- Measure residual risk
5. Human Governance
- Approval workflows
- Incident playbooks
- Operator training
- Clear accountability
Governance is not automation — it’s decision clarity.
Why platform‑first fails
Organizations that start here usually:
- Slow down developers
- Centralize too early
- Build brittle processes
- Lose adoption
Platform works only after SDK and runtime layers exist.
How the Layers Work Together
- SDKs reduce likelihood
- Runtime enforcement limits impact
- Platform governance manages organizational risk
Or more simply:
SDKs keep agents well‑behaved.
Runtime keeps them contained.
Platforms keep organizations accountable.
A Practical Adoption Path
Phase 1 — Start with SDKs
- Guardrails
- Cost limits
- Context hygiene
Phase 2 — Add runtime enforcement
- Tool mediation
- Identity binding
- Kill switches
Phase 3 — Introduce platform governance
- Central policies
- Audit and evidence
- Risk ownership
Trying to skip phases usually backfires.
Final Takeaway for Security Leaders
Agentic AI security is not about finding the right control.
It’s about placing the right controls at the right layer.
You cannot govern what you cannot contain.
And you cannot contain what you cannot observe.
What’s Next (Part 4 Preview)
In Part 4, we’ll go deep into Layer 1: the SDK layer — the controls that should ship secure-by-default with every agent.
We’ll cover:
- input/output guardrails (prompt injection, PII, unsafe content)
- cost controls (token ceilings, retry caps, loop bounds)
- context hygiene for RAG (treat retrieval as untrusted)
- memory safety (what agents are allowed to remember — and what they must never persist)
Because if SDKs don’t establish baseline containment early, every later layer becomes harder to enforce.
Series Navigation
← Part 2 · Part 3
This series is written by a practitioner working on real‑world agentic AI security systems.
Some of the architectural insights here are informed by hands‑on experience building
developer‑first security tooling in the open.




Top comments (0)