Building a Sovereign AI Agent Stack: How I Built MAX ALPHA — A Real-World Hermes-Style Autonomous Agent
When Nous Research dropped the Hermes Agent framework, it validated something I'd already been building in the wild — a fully autonomous, self-improving AI agent that doesn't just answer questions, but actually does things.
This is a technical breakdown of how I designed and deployed MAX ALPHA: a sovereign, VPS-hosted AI agent with 100+ skills, real-time tool use, multi-surface communication, and a self-repair loop — all running without any dependency on managed AI platforms.
What Makes an Agent "Real"?
Most "AI agents" in 2026 are wrappers. They call an LLM, maybe chain two tools together, and call it agentic. A real agent needs:
- Persistent memory — knows who you are across sessions
- Tool use with verification — executes actions and confirms results
- Multi-surface output — acts across Telegram, email, web, APIs
- Self-repair capability — fixes its own broken code
- Autonomy under scheduling — runs on its own, not just on demand
MAX ALPHA checks all five. Here's how.
The Architecture
┌─────────────────────────────────────────────────────┐
│                   MAX ALPHA BRAIN                   │
│                                                     │
│   ┌──────────┐  ┌──────────┐  ┌──────────────────┐  │
│   │   Groq   │  │  Claude  │  │  Cogito v2 671B  │  │
│   │   Tool   │  │   Deep   │  │   Unrestricted   │  │
│   │ Executor │  │ Analysis │  │    Reasoning     │  │
│   └──────────┘  └──────────┘  └──────────────────┘  │
│                                                     │
│   ┌──────────────────────────────────────────────┐  │
│   │          SKILL LIBRARY (100+ tools)          │  │
│   │  FX Trading │ Lead Gen │ OSINT │ Video Gen   │  │
│   │  Web Scrape │ Email    │ Calls │ Quantum     │  │
│   └──────────────────────────────────────────────┘  │
│                                                     │
│   ┌──────────┐  ┌──────────┐  ┌──────────────────┐  │
│   │ Telegram │  │  Gmail   │  │   Dashboard UI   │  │
│   │   Bot    │  │  Watch   │  │   (Bloomberg-    │  │
│   │  Input   │  │  Input   │  │  style terminal) │  │
│   └──────────┘  └──────────┘  └──────────────────┘  │
└─────────────────────────────────────────────────────┘
               │                      │
         ┌─────▼──────┐         ┌─────▼──────┐
         │   SQLite   │         │  VPS Cron  │
         │ Memory DB  │         │ Automation │
         └────────────┘         └────────────┘
Everything runs on a $20/month VPS. No vendor lock-in. No managed services. Fully sovereign.
The Hermes Connection: Tool Use as First-Class Behavior
Nous Research's Hermes Agent architecture treats tool use not as a fallback but as the primary execution mode. The agent reasons about which tool to call, calls it, verifies the result, and chains to the next action. This is exactly the pattern I implemented.
Here's the core tool dispatch loop in power_agent.py:
def run_agent_loop(user_message: str, context: dict) -> str:
    # Step 1: Route to appropriate model
    model = route_model(user_message)  # Groq for tools, Claude for analysis

    # Step 2: Build tool-aware system prompt
    system = build_system_prompt(context, available_tools)

    # Step 3: First LLM call to get the tool selection
    response = call_llm(model, system, user_message, tools=TOOL_SCHEMAS)

    # Step 4: Execute tools if requested
    if response.tool_calls:
        results = []
        for tool_call in response.tool_calls:
            result = execute_tool(tool_call.name, tool_call.arguments)
            # CRITICAL: Verify result before continuing
            if not verify_result(result):
                result = retry_with_fallback(tool_call)
            results.append(result)

        # Step 5: Synthesize final response with real data
        final = call_llm(model, system, user_message, tool_results=results)
        return final

    return response.content
The key insight from Hermes: tool verification before synthesis. Never let the model hallucinate a result — always confirm the tool executed, then build the response on real data.
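The loop above calls `verify_result` and `retry_with_fallback` without showing them. Here is a minimal sketch of what such checks could look like, assuming each tool returns a dict with `ok`, `data`, and `error` keys. That result shape, the `ToolResult` alias, and `run_with_fallback` are illustrative assumptions, not code from power_agent.py.

```python
from typing import Any, Callable, Optional

# Hypothetical result shape: {"ok": bool, "data": Any, "error": Optional[str]}
ToolResult = dict


def verify_result(result: Optional[ToolResult]) -> bool:
    """Accept a result only if the tool reported success and returned data."""
    if not isinstance(result, dict):
        return False
    if not result.get("ok"):
        return False
    data = result.get("data")
    # Treat None and empty strings/containers as "nothing came back".
    return data is not None and data != "" and data != [] and data != {}


def run_with_fallback(
    tools: list[Callable[..., ToolResult]], **arguments: Any
) -> ToolResult:
    """Try each tool in order and return the first verified result."""
    last_error = "no tools supplied"
    for tool in tools:
        try:
            result = tool(**arguments)
        except Exception as exc:  # a crashing tool counts as a failed attempt
            last_error = f"{tool.__name__}: {exc}"
            continue
        if verify_result(result):
            return result
        last_error = f"{tool.__name__}: unverified result"
    # Surface the failure explicitly so the model cannot invent a result.
    return {"ok": False, "data": None, "error": last_error}
```

Returning an explicit failure record, rather than raising or returning nothing, gives the synthesis call something truthful to report when every attempt fails.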
The Skill Library: 100+ Modular Capabilities
Each skill is a standalone Python script in .agents/skills/. The agent can call any of them by name. Here's a sample from the FX trading skill:
# fx_bot_runner.py: runs on an OANDA practice account
# (helpers such as get_oanda_candles and place_trade are not shown in this excerpt)
import math

import oandapyV20
from qiskit import QuantumCircuit
from qiskit_aer import AerSimulator


def quantum_conviction_score(candles: list) -> float:
    """
    Run price action through a quantum circuit.
    Returns conviction score 0-100.
    Scores >= 60 = valid trade signal.
    """
    n_qubits = 8
    # No classical bits here: measure_all() adds its own register, and a
    # second, unused register would pad every result key with eight zeros,
    # so no bitstring could ever have a majority of 1s.
    qc = QuantumCircuit(n_qubits)
    # Encode price momentum as rotation angles
    for i, candle in enumerate(candles[:n_qubits]):
        momentum = (candle['close'] - candle['open']) / candle['open']
        theta = momentum * math.pi * 10  # amplify signal
        qc.ry(theta, i)
    # Entangle qubits to capture correlations
    for i in range(n_qubits - 1):
        qc.cx(i, i + 1)
    qc.measure_all()
    sim = AerSimulator()
    job = sim.run(qc, shots=1024)
    counts = job.result().get_counts()
    # Score = % of shots with majority-1 outcomes
    favorable = sum(v for k, v in counts.items() if k.count('1') > k.count('0'))
    return (favorable / 1024) * 100


def run_fx_cycle():
    # Get market data
    candles = get_oanda_candles('EUR_USD', count=20)
    # Quantum pre-filter
    qcs = quantum_conviction_score(candles)
    if qcs < 60:
        log("QCS below threshold, no trade")
        return
    # Classical technical analysis
    signal = analyze_technicals(candles)
    if signal['direction'] and signal['confidence'] > 0.7:
        place_trade(signal)
        notify_telegram(f"Trade placed: {signal}")
This runs every 30 minutes via VPS cron. No human needed.
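The scoring rule at the end of `quantum_conviction_score` can be checked without a simulator. The sketch below applies the same majority-of-ones rule to a hand-written counts dictionary. `majority_one_score` is an illustrative name, and the counts are made up for the example rather than taken from a real run.

```python
def majority_one_score(counts: dict[str, int]) -> float:
    """Percentage of shots whose bitstring has more 1s than 0s.

    Spaces are stripped because Qiskit separates classical registers
    with a space in result keys. Ties count as unfavorable.
    """
    total = sum(counts.values())
    if total == 0:
        return 0.0
    favorable = 0
    for bitstring, shots in counts.items():
        bits = bitstring.replace(" ", "")
        if bits.count("1") > bits.count("0"):
            favorable += shots
    return favorable / total * 100


# Made-up counts for an 8-qubit circuit, 1024 shots in total.
example_counts = {
    "00000001": 624,  # one 1, seven 0s: unfavorable
    "11110000": 100,  # 4-4 tie: unfavorable
    "11111000": 300,  # five 1s: favorable
}
print(majority_one_score(example_counts))  # 300 / 1024 * 100 = 29.296875
```

Dividing by the actual shot total instead of a hard-coded 1024 keeps the score correct if the shot count changes.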
Self-Aware Memory Architecture
The agent maintains a 3-layer memory system inspired by cognitive science:
HOT MEMORY → Always loaded. Permanent rules. ("Never use personal number for business")
CONTEXT MEMORY → Project/domain specific. Loaded on relevance.
ARCHIVE → Stale patterns. Reviewed weekly, pruned automatically.
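One way to read the three layers is as a loading policy: hot memory goes into every prompt, context memory only when it matches the request, and the archive never. Here is a minimal sketch, assuming context memory is split into named topics and relevance is a plain keyword overlap. Both are illustrative assumptions, since the agent's actual relevance test isn't described here.

```python
import re


def _words(text: str) -> set[str]:
    """Lowercase word set, used for a crude overlap test."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def build_memory_prompt(
    request: str,
    hot_rules: list[str],
    context_topics: dict[str, str],
    min_overlap: int = 1,
) -> str:
    """Assemble the memory section of a prompt for one request."""
    request_words = _words(request)
    sections = ["HOT MEMORY:"] + [f"- {rule}" for rule in hot_rules]
    for topic, notes in context_topics.items():
        # Load a context block only if its topic name shares words
        # with the request; archive entries are never passed in here.
        if len(_words(topic) & request_words) >= min_overlap:
            sections.append(f"CONTEXT ({topic}):")
            sections.append(notes)
    return "\n".join(sections)


prompt = build_memory_prompt(
    "What is the FX bot status?",
    hot_rules=["Never use personal number for business"],
    context_topics={  # hypothetical topics and notes
        "fx trading": "Runs on an OANDA practice account.",
        "lead gen": "Notes about outreach.",
    },
)
print(prompt)
```

Only the "fx trading" block is loaded for this request, which keeps unrelated notes out of the prompt.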
Memory isn't just stored — it's actively curated. A weekly automation runs at 2 AM Sunday that:
- Scans all memory files for duplicates
- Consolidates similar entries
- Moves stale items to archive
- Sends a digest to Telegram
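# From daily_autonomous_improvement.py
import json


def curate_memory():
    hot = load_file('.agents/memory/hot_memory.md')
    context = load_file('.agents/memory/context_memory.md')
    # Use LLM to identify duplicates and conflicts
    prompt = f"""
    Review these memory files and identify:
    1. Duplicate entries (same rule stated twice)
    2. Conflicting rules
    3. Stale entries (referenced tech/services no longer used)
    HOT: {hot}
    CONTEXT: {context}
    Return JSON: {{"duplicates": [], "conflicts": [], "stale": []}}
    """
    reply = call_groq(prompt)
    # If call_groq returns raw text, parse it; if it already returns a dict, use it as is
    analysis = json.loads(reply) if isinstance(reply, str) else reply
    apply_memory_cleanup(analysis)
    # The requested JSON has no "summary" key, so build one from the counts
    summary = ", ".join(
        f"{len(analysis.get(key, []))} {key}"
        for key in ("duplicates", "conflicts", "stale")
    )
    notify_telegram(f"Memory curated: {summary}")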
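Models asked to return JSON often wrap the object in prose or a code fence, so the reply usually needs defensive parsing before any cleanup is applied. Here is a small sketch, assuming the reply arrives as plain text. `parse_cleanup_reply` is a hypothetical helper, not part of daily_autonomous_improvement.py.

```python
import json

EXPECTED_KEYS = ("duplicates", "conflicts", "stale")


def parse_cleanup_reply(reply: str) -> dict[str, list]:
    """Extract the first JSON object from a model reply.

    Missing or non-list keys default to empty lists so downstream
    cleanup code can iterate without extra checks.
    """
    start = reply.find("{")
    if start == -1:
        raise ValueError("no JSON object found in model reply")
    # raw_decode parses one JSON value and ignores any trailing text.
    parsed, _ = json.JSONDecoder().raw_decode(reply[start:])
    return {
        key: parsed[key] if isinstance(parsed.get(key), list) else []
        for key in EXPECTED_KEYS
    }


# Made-up reply with surrounding chatter and a missing "conflicts" key.
reply = 'Here you go:\n{"duplicates": ["rule 3 = rule 7"], "stale": []}\nDone.'
analysis = parse_cleanup_reply(reply)
summary = ", ".join(f"{len(analysis[key])} {key}" for key in EXPECTED_KEYS)
print(summary)  # 1 duplicates, 0 conflicts, 0 stale
```

Malformed JSON still raises `json.JSONDecodeError`, which is a reasonable point to skip the cleanup run and alert instead of applying a half-parsed result.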
Multi-Surface Autonomous Operation
The agent operates across 5 surfaces simultaneously:
| Surface | Use Case | Automation |
|---|---|---|
| Telegram Bot | Primary user interface | Polling every 3s |
| Gmail Watcher | Email monitoring & response | Webhook-triggered |
| Dashboard UI | Bloomberg-style terminal | Live WebSocket |
| VPS Cron | Background automation | 15+ scheduled tasks |
| OANDA API | FX trading (practice account) | Every 30 minutes |
The Telegram bot is the most interesting surface because it's the primary command interface. The agent receives natural language, routes to tools, executes, verifies, and responds — all within seconds.
# Telegram command routing, from power_agent.py
COMMAND_MAP = {
    '/fx': run_fx_status,
    '/parlay': generate_parlay_picks,
    '/osint': run_osint_lookup,
    '/generate': generate_higgsfield_video,
    '/leads': run_lead_hunter,
    '/study': load_academic_skill,
    '/essay': write_humanized_essay,
    '/build': deploy_vps_feature,
}


async def handle_message(update):
    msg = update.message.text
    chat_id = update.message.chat.id
    # Security check first
    if chat_id != AUTHORIZED_CHAT_ID:
        # Pass chat_id here too, matching the send(result, chat_id) call below
        await send("🔒 Access denied.", chat_id)
        return
    # Known slash commands go to their handler; unknown commands and
    # natural language both fall through to the agent loop
    handler = COMMAND_MAP.get(msg.split()[0]) if msg.startswith('/') else None
    if handler:
        result = await handler(msg)
    else:
        result = await agent_loop(msg)
    await send(result, chat_id)
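Command lookup with `msg.split()[0]` misses two common cases: Telegram appends `@botname` to commands sent in group chats, and handlers usually want the arguments separated from the command. Here is a sketch of a small parser that handles both. `parse_command` is an illustrative helper and not part of power_agent.py, and the bot name in the example is made up.

```python
from typing import Optional


def parse_command(text: str) -> Optional[tuple[str, str]]:
    """Split '/cmd@bot arg1 arg2' into ('/cmd', 'arg1 arg2').

    Returns None for anything that is not a slash command, so the
    caller can fall through to the natural-language agent loop.
    """
    text = text.strip()
    if not text.startswith("/"):
        return None
    parts = text.split(None, 1)  # split on the first run of whitespace
    command = parts[0].split("@", 1)[0].lower()
    if command == "/":
        return None  # a bare slash is not a command
    args = parts[1].strip() if len(parts) > 1 else ""
    return command, args


print(parse_command("/fx@max_alpha_bot EUR_USD"))  # ('/fx', 'EUR_USD')
print(parse_command("what's the FX status?"))      # None
```

The returned command string can be used directly as a key into a dispatch table such as `COMMAND_MAP`.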
The Quantum Layer
One of the more experimental components is the quantum pre-filter applied to trading and sports analysis. Using Qiskit's AerSimulator (and eventually real IBM quantum hardware via the nightly calibration job), we encode signal data as qubit rotation angles, apply entanglement to capture correlations, and measure the distribution of outcomes as a conviction score.
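For intuition about the encoding step: an RY(θ) rotation applied to a qubit in |0⟩ gives a probability of sin²(θ/2) of measuring 1. The sketch below evaluates that standard formula for a single qubit in isolation, using the FX scaling from earlier (θ = momentum · π · 10). The sample momentum values are made up for illustration.

```python
import math


def p_one_after_ry(theta: float) -> float:
    """P(measure 1) for RY(theta) applied to |0>: sin^2(theta / 2)."""
    return math.sin(theta / 2) ** 2


def fx_theta(momentum: float) -> float:
    """Rotation angle using the FX skill's scaling: momentum * pi * 10."""
    return momentum * math.pi * 10


for momentum in (0.001, -0.001, 0.05, 0.1):  # illustrative candle moves
    theta = fx_theta(momentum)
    print(
        f"momentum={momentum:+.3f}  theta={theta:+.4f}  "
        f"P(1)={p_one_after_ry(theta):.6f}"
    )
```

Because sin² is an even function, a single qubit measured this way gives the same probability for +θ and −θ, and a 0.1% candle move maps to a probability well under 1%. Both are worth keeping in mind when choosing the scaling factor.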
# Quantum fight analysis, run for every boxing matchup
import math

from qiskit import QuantumCircuit, transpile
from qiskit_aer import AerSimulator


def quantum_fight_analysis(fighter_a_factors: dict, fighter_b_factors: dict) -> dict:
    results = {}
    sim = AerSimulator()
    for name, factors in [('a', fighter_a_factors), ('b', fighter_b_factors)]:
        n = len(factors)
        qc = QuantumCircuit(n, n)
        for i, val in enumerate(factors.values()):
            # Scale factor value to rotation angle
            theta = (val / 10.0) * math.pi * 1.35
            qc.ry(theta, i)
        # Entangle neighbouring qubits to capture factor interactions
        for i in range(n - 1):
            qc.cx(i, i + 1)
        # Hadamard on qubit 0, then a CZ linking the first and last qubits
        # (skipped for n == 1, since a CZ cannot act on one qubit twice)
        qc.h(0)
        if n > 1:
            qc.cz(0, n - 1)
        qc.measure(range(n), range(n))
        job = sim.run(transpile(qc, sim), shots=4096)
        counts = job.result().get_counts()
        favorable = sum(v for k, v in counts.items() if k.count('1') > k.count('0'))
        results[name] = (favorable / 4096) * 100
    total = results['a'] + results['b']
    if total == 0:
        # Neither fighter produced a majority-1 shot; avoid dividing by zero
        raise ValueError("both conviction scores are zero; cannot normalise")
    return {
        'fighter_a_prob': results['a'] / total * 100,
        'fighter_b_prob': results['b'] / total * 100,
        'qcs_a': results['a'],
        'qcs_b': results['b'],
    }
Tonight I ran this on 4 championship boxing bouts. The quantum scores aligned within 3-5% of the oddsmaker-implied probabilities on 3 of 4 fights — and flagged one massive value opportunity (a defending champion priced at +180 with a ~50% quantum win probability).
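To make the value claim concrete, American moneyline odds convert to an implied probability with a standard formula: 100 / (odds + 100) for positive odds and |odds| / (|odds| + 100) for negative odds. Here is a short sketch applying it to the +180 line mentioned above. The function names are illustrative, and the result ignores the bookmaker's margin.

```python
def implied_probability(american_odds: int) -> float:
    """Implied win probability (0-1) from American moneyline odds."""
    if american_odds == 0:
        raise ValueError("American odds cannot be zero")
    if american_odds > 0:
        return 100 / (american_odds + 100)
    return -american_odds / (-american_odds + 100)


def edge_points(model_probability: float, american_odds: int) -> float:
    """Model probability minus implied probability, in percentage points."""
    return (model_probability - implied_probability(american_odds)) * 100


print(f"{implied_probability(180):.1%}")  # 35.7%
print(f"{edge_points(0.50, 180):.1f}")    # 14.3
```

At +180 the market implies roughly a 36% chance, so a ~50% model estimate corresponds to a gap of about 14 percentage points.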
What I Learned Building This
1. Verification beats generation. The most important architectural decision was separating tool execution from response synthesis. Groq executes. Claude synthesizes. Never the same call.
2. Memory is the moat. An agent without memory is a chatbot. With structured, curated memory across hot/context/archive layers, the agent genuinely improves over time.
3. Sovereignty compounds. Running on your own VPS means no platform rate limits, no usage caps, and no vendor decisions affecting your stack. Ten automations running in parallel cost nothing extra in compute.
4. The Hermes pattern works. Tool-first reasoning, where the agent decides what to call before deciding what to say, produces dramatically better outputs than generation-first approaches. Nous Research got this right.
What's Next
- Connecting to real IBM quantum hardware (127-qubit Eagle processor) via the nightly calibration job
- Training a fine-tuned Hermes model on the agent's own interaction history
- Multi-agent coordination — spinning up sub-agents for parallel task execution
The full skill library and architecture notes are available on request. If you're building something similar or want to discuss the quantum pre-filter approach, drop a comment below.
— MAX ALPHA / Shawn Childs