Every agent framework I've used buries what the agent does inside the code that makes it run.
Agentic AI -- an LLM looping through tool use, planning, and reflection -- is how tools like Claude Code and OpenClaw operate autonomously. The "agentic" part isn't the model; it's the loop around the model. But in every framework I've tried, that loop is tangled with your application code: HTTP handlers, state management, orchestration. Changing what the agent says means deploying your app. Letting a domain expert tune prompts means giving them the codebase.
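Stripped of everything else, that loop is small. Here's a minimal sketch -- call_llm and TOOLS are placeholders for whatever model API and tools you use, not any particular SDK:

# The agentic loop in miniature: the model plans, requests tools,
# sees the results, and repeats until it produces a final answer.
# call_llm, TOOLS, and the reply fields are illustrative placeholders.
from typing import Callable

TOOLS: dict[str, Callable[[str], str]] = {}  # tool name -> tool function

def call_llm(messages: list[dict]) -> dict:
    raise NotImplementedError  # stand-in for a chat-completion call

def run_agent(task: str) -> str:
    messages = [{"role": "user", "content": task}]
    while True:
        reply = call_llm(messages)              # plan / reflect
        if reply.get("tool"):                   # the model asked for a tool
            result = TOOLS[reply["tool"]](reply["input"])
            messages.append({"role": "tool", "content": result})
            continue                            # feed the result back in
        return reply["content"]                 # done: final answer

Simple in outline -- but frameworks bury both this loop and the agent's definition inside your application.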
I built Perstack to fix this -- an open-source runtime that separates agent definitions from agent execution. The definition is what changes most; it shouldn't be the hardest to change.
The real problem: agents defined in code
Here's what a typical agent definition looks like in a framework:
class SecurityReviewer(Agent):
    def __init__(self):
        self.model = "claude-sonnet-4-5"
        self.tools = [FileReader(), CodeAnalyzer()]
        self.system_prompt = """
        You are a security-focused code reviewer.
        Check for SQL injection, XSS, and auth bypass.
        Explain findings with severity ratings.
        """

    async def run(self, query: str):
        # orchestration logic
        # tool calling logic
        # state management
        # error handling
        # ...
The three lines that actually define what this agent does are buried inside a class that handles how it runs. The behavior and the machinery share a file, a deployment pipeline, and a test suite.
This creates three structural problems:
Framework lock-in. Your agent definition is expressed in the framework's API. Switching to a different runtime means rewriting every agent, not because the behavior changed, but because the packaging did.
Developer-gated iteration. The person who knows the domain -- the security expert, the support lead, the analyst -- can't touch the agent definition without a developer. Prompt tuning becomes a JIRA ticket.
No standalone testing. You can't run the agent without running the app. Feedback loops don't start until the application is wired up -- which can mean weeks of work before you discover the agent doesn't handle your edge cases.
12 lines of TOML
Here's the same agent, defined outside the code:
[experts."security-reviewer"]
description = "Reviews code for security vulnerabilities"
instruction = """
You are a security-focused code reviewer.
Check for SQL injection, XSS, and authentication bypass.
Explain each finding with a severity rating and a suggested fix.
"""

[experts."security-reviewer".skills."@perstack/base"]
type = "mcpStdioSkill"
command = "npx"
packageName = "@perstack/base"
Twelve lines. The agent's identity, behavior, and tool access -- all in a single TOML file called perstack.toml. No imports, no classes, no orchestration code.
The instruction field is natural language. The skills section declares tool access via MCP -- @perstack/base is Perstack's built-in tool server, but any MCP-compatible server works (the same standard that Claude Desktop, Cursor, and other tools use). The runtime handles everything else: model access, tool execution, state management, context windows.
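Plugging in another server presumably follows the same shape as the @perstack/base entry above. The snippet below is hypothetical -- @some-org/code-scanner isn't a real package, it just shows the pattern:

# Hypothetical third-party MCP server, declared the same way as @perstack/base
[experts."security-reviewer".skills."@some-org/code-scanner"]
type = "mcpStdioSkill"
command = "npx"
packageName = "@some-org/code-scanner"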
This is not a simplification. It's a separation. The agent definition is what changes hourly -- prompts get tuned, tools get added, delegation chains get restructured. The runtime is what changes quarterly. Coupling them in the same codebase means they deploy together, break together, and bottleneck each other.
From idea to running agent
You don't even need to write TOML by hand. create-expert generates it from a description:
npx create-expert "A code reviewer that checks for security vulnerabilities and suggests fixes"
This isn't scaffolding. create-expert is itself an agent that generates the perstack.toml, test-runs the resulting Expert against sample inputs, analyzes the execution, and iterates on the definition until behavior stabilizes. You get a working agent -- not a template.
Run it:
npx perstack start security-reviewer "Review this login handler"
perstack start opens a text-based interactive UI. You see the agent reason, call tools, and produce output in real time. No application to deploy. No environment to configure beyond an LLM API key.
Want headless output for CI?
npx perstack run security-reviewer "Review this login handler"
JSON events to stdout. Pipe it wherever you want.
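If you'd rather consume those events from a script than a shell pipe, something like this works. I'm assuming one JSON object per line, which is what the CLI output suggests -- check the docs for the exact event shape:

# Minimal sketch: consume the JSON event stream from `perstack run`.
# Assumes one JSON object per stdout line; no event fields are assumed.
import json
import subprocess

proc = subprocess.Popen(
    ["npx", "perstack", "run", "security-reviewer", "Review this login handler"],
    stdout=subprocess.PIPE,
    text=True,
)
for line in proc.stdout:
    line = line.strip()
    if not line:
        continue
    try:
        event = json.loads(line)
    except json.JSONDecodeError:
        continue  # skip any non-JSON noise from npx
    print(event)  # forward to your own logging, queue, or UI instead
proc.wait()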
Multi-agent collaboration in TOML
Experts that need to collaborate? Same file. The delegates field defines which Experts can call which:
[experts."security-reviewer"]
description = "Coordinates security review across the codebase"
instruction = """
Conduct a comprehensive security review.
Delegate file-level analysis to the file-reviewer.
Aggregate findings into a prioritized report.
"""
delegates = ["@security-reviewer/file-reviewer"]

[experts."@security-reviewer/file-reviewer"]
description = "Reviews individual files for security issues"
instruction = "Analyze the given file for SQL injection, XSS, CSRF, and auth bypass vulnerabilities."
The coordinator delegates to specialists. Each Expert runs in its own context window -- no prompt bloat from cramming everything into one conversation. The runtime handles the delegation, result aggregation, and checkpoint management.
From prototype to production
The CLI is for prototyping. For production, Perstack provides lockfile-based deployment and runtime embedding via @perstack/runtime. Execution is event-driven -- every step emits structured events, so it fits naturally into containerized environments where you need to stream progress back to your application. The same perstack.toml drives all of it -- the definition doesn't change because the deployment target changed. The getting started walkthrough covers the full path from CLI to application integration.
Runtime vs. framework: why the distinction matters
Frameworks are opinionated about how you build your application. They provide agent classes, memory abstractions, tool registries, orchestration APIs. Your agent lives inside the framework.
A runtime is opinionated about how agents execute. It doesn't care how your application is built. Your application talks to the runtime over an API. The agent definition is data, not code.
This distinction has real consequences.
Event-sourced execution. Every step the agent takes is recorded as a structured event with step-level checkpoints. You can resume from any point, replay to debug, and diff across model or provider changes. This isn't a logging feature -- it's the execution model. Non-deterministic behavior becomes inspectable.
Isolation by design. Each Expert runs in its own context. Workspace boundaries, environment sandboxing, tool whitelisting. When you deploy to a container platform, the isolation model maps directly to infrastructure -- one container, one Expert, one job.
Independent lifecycles. The agent definition updates hourly. The application code deploys weekly. Environment secrets rotate on their own schedule. User conversations are real-time. A runtime lets these four axes move independently. A framework couples them into one deployment.
Provider independence. Eight LLM providers, one config change. Anthropic, OpenAI, Google, DeepSeek, Ollama, Azure, Bedrock, Vertex. The agent definition doesn't mention the provider.
The methodology shift
The deeper point isn't about TOML syntax or CLI commands. It's about who owns what.
When agent definitions are code, developers own everything. When agent definitions are natural language in a config file, domain experts own behavior and developers own integration. Each side ships on its own schedule. The prompt specialist doesn't wait for a deploy. The developer doesn't review prompt tweaks.
This is the same separation that happened with infrastructure (Terraform), CI/CD (YAML pipelines), and containerization (Dockerfiles). The pattern is: extract the thing that changes most into a declarative format, give it its own lifecycle, and let a runtime execute it.
This separation is overdue for agentic AI.
Perstack is open source under Apache 2.0. The getting started walkthrough covers everything in this article and more. The source is on GitHub.
I'm building this. If the separation between agent definition and agent execution matters to you, I'd like to hear how you're thinking about it.