DEV Community

Snapon Equipment

Building a Sovereign AI Agent Stack: How I Built MAX ALPHA — A Real-World Hermes-Style Autonomous Agent

When Nous Research dropped the Hermes Agent framework, it validated something I'd already been building in the wild — a fully autonomous, self-improving AI agent that doesn't just answer questions, but actually does things.

This is a technical breakdown of how I designed and deployed MAX ALPHA: a sovereign, VPS-hosted AI agent with 100+ skills, real-time tool use, multi-surface communication, and a self-repair loop, all running without depending on a managed agent platform.


What Makes an Agent "Real"?

Most "AI agents" in 2026 are wrappers. They call an LLM, maybe chain two tools together, and call it agentic. A real agent needs:

  1. Persistent memory — knows who you are across sessions
  2. Tool use with verification — executes actions and confirms results
  3. Multi-surface output — acts across Telegram, email, web, APIs
  4. Self-repair capability — fixes its own broken code
  5. Autonomy under scheduling — runs on its own, not just on demand

MAX ALPHA checks all five. Here's how.


The Architecture

┌─────────────────────────────────────────────────────┐
│                  MAX ALPHA BRAIN                    │
│                                                     │
│  ┌──────────┐  ┌──────────┐  ┌──────────────────┐  │
│  │  Groq    │  │ Claude   │  │  Cogito v2 671B  │  │
│  │  Tool    │  │  Deep    │  │  Unrestricted    │  │
│  │ Executor │  │ Analysis │  │  Reasoning       │  │
│  └──────────┘  └──────────┘  └──────────────────┘  │
│                                                     │
│  ┌─────────────────────────────────────────────┐    │
│  │           SKILL LIBRARY (100+ tools)        │    │
│  │  FX Trading │ Lead Gen │ OSINT │ Video Gen  │    │
│  │  Web Scrape │ Email    │ Calls │ Quantum    │    │
│  └─────────────────────────────────────────────┘    │
│                                                     │
│  ┌──────────┐  ┌──────────┐  ┌──────────────────┐  │
│  │ Telegram │  │  Gmail   │  │  Dashboard UI    │  │
│  │   Bot    │  │  Watch   │  │  (Bloomberg-     │  │
│  │  Input   │  │  Input   │  │   style terminal)│  │
│  └──────────┘  └──────────┘  └──────────────────┘  │
└─────────────────────────────────────────────────────┘
          │                │
    ┌─────▼──────┐   ┌─────▼──────┐
    │   SQLite   │   │  VPS Cron  │
    │  Memory DB │   │ Automation │
    └────────────┘   └────────────┘

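The agent's code, SQLite memory, and cron automation all run on a $20/month VPS; the hosted models (for example Groq and Claude) are called through their providers' APIs. No managed agent platform, and no vendor lock-in at the orchestration layer.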


The Hermes Connection: Tool Use as First-Class Behavior

Nous Research's Hermes Agent architecture treats tool use not as a fallback but as the primary execution mode. The agent reasons about which tool to call, calls it, verifies the result, and chains to the next action. This is exactly the pattern I implemented.

Here's the core tool dispatch loop in power_agent.py:

# route_model, build_system_prompt, call_llm, execute_tool, verify_result,
# retry_with_fallback and TOOL_SCHEMAS are defined elsewhere (not shown).
def run_agent_loop(user_message: str, context: dict) -> str:
    # Step 1: Route to appropriate model
    model = route_model(user_message)  # Groq for tools, Claude for analysis

    # Step 2: Build tool-aware system prompt from the same schemas passed below
    system = build_system_prompt(context, TOOL_SCHEMAS)

    # Step 3: First LLM call: get tool selection
    response = call_llm(model, system, user_message, tools=TOOL_SCHEMAS)

    # Step 4: Execute tools if requested
    if response.tool_calls:
        results = []
        for tool_call in response.tool_calls:
            result = execute_tool(tool_call.name, tool_call.arguments)
            # CRITICAL: Verify result before continuing
            if not verify_result(result):
                result = retry_with_fallback(tool_call)
            results.append(result)

        # Step 5: Synthesize final response with real data
        final = call_llm(model, system, user_message, tool_results=results)
        return final.content  # return text, same as the no-tool branch

    return response.content

The key insight from Hermes: tool verification before synthesis. Never let the model hallucinate a result — always confirm the tool executed, then build the response on real data.
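The loop above calls `verify_result` and `retry_with_fallback` without showing them. A minimal sketch of one way they could work, assuming a hypothetical convention where every tool returns a dict with `ok`, `data`, and `error` keys and where fallbacks are listed per tool name; the signature differs from the post's `retry_with_fallback(tool_call)`, and none of this is the author's actual code.

```python
from typing import Any, Callable

# Hypothetical convention: every tool returns {"ok": bool, "data": ..., "error": str or None}
ToolFn = Callable[..., Any]

def verify_result(result: Any) -> bool:
    """Accept only successful results that actually carry data."""
    return (
        isinstance(result, dict)
        and result.get("ok") is True
        and result.get("data") not in (None, "", [], {})
    )

def run_with_fallback(name: str, args: dict,
                      tools: dict[str, ToolFn],
                      fallbacks: dict[str, list[str]]) -> dict:
    """Try the requested tool, then its fallbacks in order, until one verifies."""
    last_error = f"no tool available for {name}"
    for candidate in [name, *fallbacks.get(name, [])]:
        fn = tools.get(candidate)
        if fn is None:
            continue
        try:
            result = fn(**args)
        except Exception as exc:  # a crashing tool counts as a failed attempt
            result = {"ok": False, "data": None, "error": repr(exc)}
        if verify_result(result):
            return result
        if isinstance(result, dict) and result.get("error"):
            last_error = result["error"]
        else:
            last_error = f"{candidate} returned no usable data"
    # Report failure explicitly so the synthesis step has nothing to embellish
    return {"ok": False, "data": None, "error": last_error}
```

In this sketch a failed or empty result reaches the synthesis call as an explicit error rather than as a gap the model might fill in.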


The Skill Library: 100+ Modular Capabilities

Each skill is a standalone Python script in .agents/skills/. The agent can call any of them by name. Here's a sample from the FX trading skill:

# fx_bot_runner.py: runs on OANDA practice account
import math

import oandapyV20  # imported for the OANDA helpers (not shown)
from qiskit import QuantumCircuit
from qiskit_aer import AerSimulator

SHOTS = 1024

def quantum_conviction_score(candles: list) -> float:
    """
    Run price action through a quantum circuit.
    Returns conviction score 0-100.
    Scores >60 = valid trade signal.
    """
    n_qubits = 8
    # No classical bits up front: measure_all() adds its own register.
    # QuantumCircuit(n, n) plus measure_all() would append 8 extra bits that
    # always read 0, so no outcome could ever have more 1s than 0s.
    qc = QuantumCircuit(n_qubits)

    # Encode price momentum as rotation angles
    for i, candle in enumerate(candles[:n_qubits]):
        momentum = (candle['close'] - candle['open']) / candle['open']
        theta = momentum * math.pi * 10  # amplify signal
        qc.ry(theta, i)

    # Entangle qubits to capture correlations
    for i in range(n_qubits - 1):
        qc.cx(i, i + 1)

    qc.measure_all()

    sim = AerSimulator()
    job = sim.run(qc, shots=SHOTS)
    counts = job.result().get_counts()

    # Score = % of shots with majority-1 outcomes
    favorable = sum(v for k, v in counts.items() if k.count('1') > k.count('0'))
    return (favorable / SHOTS) * 100

def run_fx_cycle():
    # get_oanda_candles, analyze_technicals, place_trade, log and
    # notify_telegram are helpers defined elsewhere (not shown)

    # Get market data
    candles = get_oanda_candles('EUR_USD', count=20)

    # Quantum pre-filter
    qcs = quantum_conviction_score(candles)
    if qcs < 60:
        log("QCS below threshold: no trade")
        return

    # Classical technical analysis
    signal = analyze_technicals(candles)

    if signal['direction'] and signal['confidence'] > 0.7:
        place_trade(signal)
        notify_telegram(f"Trade placed: {signal}")

This runs every 30 minutes via VPS cron. No human needed.
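This section opens by describing each skill as a standalone script in `.agents/skills/` that the agent can call by name. One way to wire that up, sketched under the assumption (not stated in the post) that every skill file exposes a `run(**kwargs)` entry point: discover the files with `pathlib` and load them on demand with `importlib`.

```python
import importlib.util
from pathlib import Path
from types import ModuleType

def discover_skills(skills_dir: str = ".agents/skills") -> dict[str, Path]:
    """Map skill name (file stem) to path for every public .py file in the directory."""
    root = Path(skills_dir)
    if not root.is_dir():
        return {}
    return {p.stem: p for p in sorted(root.glob("*.py")) if not p.name.startswith("_")}

def load_skill(name: str, path: Path) -> ModuleType:
    """Import a skill script as a module without putting its folder on sys.path."""
    spec = importlib.util.spec_from_file_location(f"skills.{name}", path)
    if spec is None or spec.loader is None:
        raise ImportError(f"cannot load skill {name!r} from {path}")
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module

def call_skill(name: str, skills: dict[str, Path], **kwargs):
    """Look a skill up by name and invoke its (assumed) run() entry point."""
    if name not in skills:
        raise KeyError(f"unknown skill: {name}")
    module = load_skill(name, skills[name])
    run = getattr(module, "run", None)
    if not callable(run):
        raise AttributeError(f"skill {name!r} has no run() function")
    return run(**kwargs)
```

Loading the file on every call means an edited skill script takes effect without restarting the agent; caching the loaded modules would be the obvious optimization if import time ever matters.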


Self-Aware Memory Architecture

The agent maintains a 3-layer memory system inspired by cognitive science:

HOT MEMORY      → Always loaded. Permanent rules. ("Never use personal number for business")
CONTEXT MEMORY  → Project/domain specific. Loaded on relevance.
ARCHIVE         → Stale patterns. Reviewed weekly, pruned automatically.
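The three layers could feed prompt assembly roughly as follows. This is a minimal sketch, assuming each layer is a plain list of text entries and using naive keyword overlap between the message and a context domain's name as the relevance test; the post does not say how relevance is actually decided, and the class and method names are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class MemoryStore:
    hot: list[str] = field(default_factory=list)                 # always loaded
    context: dict[str, list[str]] = field(default_factory=dict)  # domain name -> entries
    archive: list[str] = field(default_factory=list)             # never loaded into prompts

    def build_prompt_memory(self, message: str, min_overlap: int = 1) -> list[str]:
        """Hot rules first, then entries from context domains the message mentions."""
        words = {w.strip(".,!?").lower() for w in message.split()}
        selected = list(self.hot)
        for domain, entries in self.context.items():
            keywords = set(domain.lower().split())
            if len(keywords & words) >= min_overlap:
                selected.extend(entries)
        return selected
```

For example, with a context domain named `"fx trading"`, the message "Any FX signal today?" pulls in that domain's entries because it shares the word "fx", while archived entries never reach the prompt.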

Memory isn't just stored; it's actively curated. A weekly automation runs every Sunday at 2 AM and:

  1. Scans all memory files for duplicates
  2. Consolidates similar entries
  3. Moves stale items to archive
  4. Sends a digest to Telegram

# From daily_autonomous_improvement.py
# load_file, call_groq, apply_memory_cleanup and notify_telegram are helpers
# defined elsewhere (not shown); call_groq is assumed to return the parsed JSON as a dict.
def curate_memory():
    hot = load_file('.agents/memory/hot_memory.md')
    context = load_file('.agents/memory/context_memory.md')

    # Use LLM to identify duplicates and conflicts
    prompt = f"""
    Review these memory files and identify:
    1. Duplicate entries (same rule stated twice)
    2. Conflicting rules
    3. Stale entries (referenced tech/services no longer used)

    HOT: {hot}
    CONTEXT: {context}

    Return JSON: {{"duplicates": [], "conflicts": [], "stale": [], "summary": ""}}
    """

    analysis = call_groq(prompt)
    apply_memory_cleanup(analysis)
    # "summary" must be requested in the JSON schema above, or this lookup finds nothing
    notify_telegram(f"Memory curated: {analysis.get('summary', 'no summary returned')}")
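`apply_memory_cleanup` is not shown in the post. Below is a hypothetical helper it could delegate to, assuming memory files hold one entry per line and that the LLM echoes exact entry text in its `duplicates` and `stale` lists (both assumptions). It handles exact duplicates only, archives stale entries instead of deleting them, and leaves conflicts in place for review.

```python
def apply_cleanup(entries: list[str], analysis: dict,
                  archive: list[str]) -> tuple[list[str], list[str]]:
    """Drop repeated copies of flagged duplicates and move stale entries to the archive.

    entries  -- current memory lines (e.g. read from hot_memory.md)
    analysis -- parsed LLM reply: {"duplicates": [...], "conflicts": [...], "stale": [...]}
    archive  -- existing archive lines
    Returns (kept_entries, new_archive). Conflicts are not touched.
    """
    stale = {s.strip() for s in analysis.get("stale", [])}
    duplicates = {d.strip() for d in analysis.get("duplicates", [])}
    kept: list[str] = []
    seen: set[str] = set()
    new_archive = list(archive)
    for line in entries:
        text = line.strip()
        if not text:
            continue
        if text in stale:
            if text not in new_archive:
                new_archive.append(text)  # stale: archived, not deleted
            continue
        if text in duplicates and text in seen:
            continue                      # keep only the first copy of a flagged duplicate
        seen.add(text)
        kept.append(text)
    return kept, new_archive
```

Keeping this step deterministic means the LLM only proposes changes; the file rewrite itself never depends on the model reproducing the whole memory file correctly.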

Multi-Surface Autonomous Operation

The agent operates across 5 surfaces simultaneously:

Surface         Use case                      Automation
Telegram Bot    Primary user interface        Polling every 3s
Gmail Watcher   Email monitoring & response   Webhook-triggered
Dashboard UI    Bloomberg-style terminal      Live WebSocket
VPS Cron        Background automation         15+ scheduled tasks
OANDA API       Live FX trading               Every 30 minutes
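The table lists Telegram polling every 3 seconds. The Telegram Bot API's `getUpdates` method returns pending updates, and an update counts as confirmed once `getUpdates` is called with an `offset` higher than its `update_id`, so the loop must carry "last `update_id` + 1" forward. A sketch of that loop, assuming short polling (`timeout=0`) and with the HTTP call injected as `fetch` so the offset logic can be tested offline; the function names are hypothetical and not taken from `power_agent.py`.

```python
import json
import time
import urllib.parse
import urllib.request
from typing import Callable, Optional

API = "https://api.telegram.org/bot{token}/getUpdates"

def http_fetch(token: str, offset: Optional[int]) -> list[dict]:
    """Call getUpdates once and return its 'result' list."""
    params = {"timeout": 0}
    if offset is not None:
        params["offset"] = offset
    url = API.format(token=token) + "?" + urllib.parse.urlencode(params)
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp).get("result", [])

def poll_once(fetch: Callable[[Optional[int]], list[dict]],
              offset: Optional[int],
              handle: Callable[[dict], None]) -> Optional[int]:
    """Handle every pending update and return the offset for the next call."""
    for update in fetch(offset):
        handle(update)
        offset = update["update_id"] + 1  # confirms this update on the next call
    return offset

def poll_forever(token: str, handle: Callable[[dict], None], interval: float = 3.0) -> None:
    """Poll every `interval` seconds (3 s, per the table above)."""
    offset = None
    while True:
        offset = poll_once(lambda off: http_fetch(token, off), offset, handle)
        time.sleep(interval)
```

Telegram also supports long polling (a positive `timeout`) and webhooks, either of which would reduce the number of empty requests compared with a fixed 3-second interval.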

The Telegram bot is the most interesting surface because it's the primary command interface. The agent receives natural language, routes to tools, executes, verifies, and responds — all within seconds.

# Telegram command routing (power_agent.py)
# The handlers, agent_loop, send and AUTHORIZED_CHAT_ID are defined elsewhere
# (not shown); handlers are awaited, so they must be async.
COMMAND_MAP = {
    '/fx':        run_fx_status,
    '/parlay':    generate_parlay_picks,
    '/osint':     run_osint_lookup,
    '/generate':  generate_higgsfield_video,
    '/leads':     run_lead_hunter,
    '/study':     load_academic_skill,
    '/essay':     write_humanized_essay,
    '/build':     deploy_vps_feature,
}

async def handle_message(update):
    msg = update.message.text
    chat_id = update.message.chat.id

    # Security check first
    if chat_id != AUTHORIZED_CHAT_ID:
        await send("🔒 Access denied.", chat_id)
        return

    # Photos, stickers, etc. carry no text to route
    if not msg:
        return

    # Known slash commands go to their handler; everything else to the agent loop
    handler = COMMAND_MAP.get(msg.split()[0]) if msg.startswith('/') else None
    if handler:
        result = await handler(msg)
    else:
        result = await agent_loop(msg)

    await send(result, chat_id)

The Quantum Layer

One of the more experimental components is the quantum pre-filter applied to trading and sports analysis. Using Qiskit's AerSimulator (and eventually real IBM quantum hardware via the nightly calibration job), we encode signal data as qubit rotation angles, apply entanglement to capture correlations, and measure the distribution of outcomes as a conviction score.
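Two facts make this encoding easy to reason about: `ry(θ)` applied to |0⟩ reads 1 with probability sin²(θ/2), and a chain of `cx(i, i+1)` gates turns measured bit k into the parity (XOR) of the first k+1 rotated bits. Together they let the majority-1 score of an `ry` + `cx`-chain circuit, such as the FX `quantum_conviction_score` with a single 8-bit measurement register, be computed exactly in plain Python, with no simulator and no shot noise. This is a cross-check sketch, not part of the author's stack, and it does not model the extra `h`/`cz` gates in the fight circuit.

```python
import math
from itertools import product

def p_one(theta: float) -> float:
    """Probability that ry(theta)|0> is measured as 1."""
    return math.sin(theta / 2) ** 2

def exact_majority_score(thetas: list[float]) -> float:
    """Infinite-shot version of the majority-1 score for ry + cx-chain circuits.

    Enumerates every pre-entanglement bit pattern x with its probability,
    applies the cx chain (output bit k = x0 ^ x1 ^ ... ^ xk), and sums the
    probability of outputs with more 1s than 0s. 2^n terms, fine for n = 8.
    """
    ps = [p_one(t) for t in thetas]
    n = len(ps)
    total = 0.0
    for x in product((0, 1), repeat=n):
        prob = math.prod(p if b else 1 - p for p, b in zip(ps, x))
        parity, ones = 0, 0
        for b in x:
            parity ^= b
            ones += parity
        if ones > n - ones:
            total += prob
    return 100 * total
```

Two properties follow from the math. Because sin² is even, a candle that falls by some amount gets the same rotation probability as one that rises by the same amount, so the score reflects the size of moves rather than their direction. And with all eight angles at π, the parity chain reads 1, 0, 1, 0, ..., exactly four 1s, so the score is 0.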

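# Quantum fight analysis: runs for every boxing matchup
import math

from qiskit import QuantumCircuit, transpile
from qiskit_aer import AerSimulator

SHOTS = 4096

def quantum_fight_analysis(fighter_a_factors: dict, fighter_b_factors: dict) -> dict:
    results = {}
    sim = AerSimulator()

    for name, factors in [('a', fighter_a_factors), ('b', fighter_b_factors)]:
        n = len(factors)
        if n < 2:
            raise ValueError("need at least two factors per fighter")
        qc = QuantumCircuit(n, n)

        for i, val in enumerate(factors.values()):
            # Scale factor value (apparently 0-10) to rotation angle.
            # theta passes pi at val ≈ 7.4; above that, sin²(theta/2),
            # the chance of reading 1, starts to fall again.
            theta = (val / 10.0) * math.pi * 1.35
            qc.ry(theta, i)

        # Entangle all qubits: captures factor interactions
        for i in range(n - 1):
            qc.cx(i, i + 1)

        # The Hadamard mixes qubit 0's amplitudes before measurement. The CZ is
        # diagonal in the measurement basis, so right before measuring it does
        # not change the counts.
        qc.h(0)
        qc.cz(0, n - 1)
        qc.measure(range(n), range(n))

        job = sim.run(transpile(qc, sim), shots=SHOTS)
        counts = job.result().get_counts()

        favorable = sum(v for k, v in counts.items() if k.count('1') > k.count('0'))
        results[name] = (favorable / SHOTS) * 100

    total = results['a'] + results['b']
    if total == 0:
        # Neither circuit produced a majority-1 outcome: even split, not a division by zero
        return {'fighter_a_prob': 50.0, 'fighter_b_prob': 50.0,
                'qcs_a': 0.0, 'qcs_b': 0.0}
    return {
        'fighter_a_prob': results['a'] / total * 100,
        'fighter_b_prob': results['b'] / total * 100,
        'qcs_a': results['a'],
        'qcs_b': results['b'],
    }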

Tonight I ran this on 4 championship boxing bouts. The quantum scores landed within 3-5% of the oddsmaker-implied probabilities on 3 of 4 fights, and flagged one massive value opportunity: a defending champion priced at +180 (an implied win probability of about 36%) with a ~50% quantum win probability.
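The value claim comes from comparing a model probability with the probability implied by American odds: for positive odds +A the break-even probability is 100/(A+100), and for negative odds -A it is A/(A+100). A small sketch of that arithmetic plus expected profit per unit staked, ignoring the bookmaker's margin; the function names are illustrative, and the ~50% input is the post's model output, not a verified probability.

```python
def implied_probability(american: int) -> float:
    """Break-even win probability implied by American odds (margin not removed)."""
    if american == 0:
        raise ValueError("American odds cannot be 0")
    if american > 0:
        return 100 / (american + 100)
    return -american / (-american + 100)

def profit_per_unit(american: int) -> float:
    """Profit on a 1-unit stake if the bet wins."""
    return american / 100 if american > 0 else 100 / -american

def expected_value(p_win: float, american: int) -> float:
    """Expected profit per unit staked: p * profit - (1 - p) * stake."""
    return p_win * profit_per_unit(american) - (1 - p_win)
```

For the +180 example, `implied_probability(180)` is 100/280, and `expected_value(0.5, 180)` is 0.5 × 1.8 - 0.5 = 0.4 units per unit staked, which holds only if the 50% estimate is actually right.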


What I Learned Building This

1. Verification beats generation. The most important architectural decision was separating tool execution from response synthesis. Groq executes. Claude synthesizes. Never the same call.

2. Memory is the moat. An agent without memory is a chatbot. With structured, curated memory across hot/context/archive layers, the agent genuinely improves over time.

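3. Sovereignty compounds. Running the stack on your own VPS means no platform-imposed caps on how many automations you run and no vendor decisions reshaping your workflow. Adding another scheduled automation costs almost nothing in compute on the box itself; the main marginal cost is the model API calls it makes.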

4. The Hermes pattern works. Tool-first reasoning, where the agent decides what to call before deciding what to say, has produced noticeably better outputs in my use than generation-first approaches. Nous Research got this right.


What's Next

  • Connecting to real IBM quantum hardware (127-qubit Eagle processor) via the nightly calibration job
  • Training a fine-tuned Hermes model on the agent's own interaction history
  • Multi-agent coordination — spinning up sub-agents for parallel task execution

The full skill library and architecture notes are available on request. If you're building something similar or want to discuss the quantum pre-filter approach, drop a comment below.

— MAX ALPHA / Shawn Childs

Top comments (1)

Harjot Singh

"Sovereign" is the word doing the heavy lifting here, and it's the right instinct - owning your agent stack end to end instead of renting someone's hosted black box matters for control, cost, and not being one pricing change away from a broken workflow. The tradeoff is you take on the operational weight (infra, keys, the harness), so "sovereign" is great until you're maintaining all of it yourself.

The balance I aim for in Moonshift (a multi-agent pipeline: prompt to a shipped SaaS on your own GitHub + Vercel) is sovereignty over the output without the sovereignty tax on the plumbing - you own the resulting code and infra (your repo, your Vercel), but you don't hand-build the agent harness each time; routing also keeps a full build ~$3 flat instead of a hosted-agent subscription. First run's free, no card. Cool build - what pushed you to sovereign/self-hosted over a managed agent platform: cost, control, data sovereignty, or vendor-risk? Those motivations lead to pretty different architectures.