Building a Sovereign AI Stack: From Zero to POC

In an era where data privacy is paramount, relying on cloud-based AI providers isn't always an option. Whether for compliance, security, or just peace of mind, running a Sovereign AI Stack — a completely local, self-controlled AI infrastructure — is the ultimate goal for many organizations.

Today, we built a Proof of Concept (POC) for such a stack, leveraging open-source tools to create a private, observable, and searchable AI environment. Here is our journey.

The Architecture

Our stack consists of three core components, orchestrated by a Node.js application:

  1. AI Server: A local LLM running on llama.cpp (serving an OpenAI-compatible API). This provides the intelligence without data leaving the network.
  2. Search Engine: Manticore Search (running in Docker). We chose Manticore for its lightweight footprint and powerful full-text search capabilities, essential for RAG (Retrieval-Augmented Generation).
  3. Observability: AI Observer (running in Docker). You can't manage what you can't measure. This tool captures traces and metrics of our AI interactions.

The Architecture Visualized

┌──────────────────────────────────────────────────────┐
│                  Sovereign AI Stack                   │
├──────────────────────────────────────────────────────┤
│                                                       │
│   ┌───────────────────┐                               │
│   │   Orchestrator    │                               │
│   │   (Node.js/TS)    │                               │
│   └──┬──────┬─────┬───┘                               │
│      │      │     │                                   │
│      │1     │2    │3                                  │
│      ▼      ▼     ▼                                   │
│   ┌──────┐ ┌──────────┐ ┌──────────────┐              │
│   │Search│ │ AI Server│ │ AI Observer  │              │
│   │Engine│ │ llama.cpp│ │ (Telemetry)  │              │
│   └──────┘ └──────────┘ └──────────────┘              │
│   Manticore   192.168.x Ports: 4318 (OTLP)            │
│   :9308/:9312 :8080     + 8080 (Dashboard)            │
└──────────────────────────────────────────────────────┘

  1 = Search Context    2 = Send Prompt + Context
  3 = Log Telemetry

Component State Flow

  [Init] ──► [Indexing] ──► [Searching] ──► [RAG Construction] ──► [Inference]
                              │                                       │
                              ▼                                       ▼
                           [Error]                              [Success] or
                          (No Hits /                             [Timeout]
                           Retry)                            (Model Slow)

The flow is straightforward: we create a real-time (RT) index in Manticore, add documents, search for relevant context, construct a RAG prompt, and send it to the local LLM for inference.
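
As a preview of what the first two steps look like in code, here is a minimal ingest sketch: create the RT table and add documents over Manticore's HTTP SQL endpoint. It is illustrative rather than our exact code: the docs table, its schema, and the assumption that raw /sql mode on port 9308 accepts DDL/DML statements are placeholders to verify against your Manticore version.

  // Minimal ingest sketch (Node 18+ for global fetch).
  // Assumptions: Manticore is reachable on localhost:9308, raw /sql mode accepts DDL/DML,
  // and the "docs" table plus its schema are hypothetical placeholders.

  const MANTICORE_SQL = "http://localhost:9308/sql?mode=raw";

  export async function runSql(query: string): Promise<unknown> {
    const res = await fetch(MANTICORE_SQL, {
      method: "POST",
      body: new URLSearchParams({ query }),
    });
    if (!res.ok) throw new Error(`Manticore error ${res.status}: ${await res.text()}`);
    return res.json();
  }

  export async function ingest(): Promise<void> {
    // Real-time (RT) table with two full-text fields.
    await runSql("CREATE TABLE IF NOT EXISTS docs (title text, content text)");

    const documents = [
      { id: 1, title: "Sovereign AI", content: "A Sovereign AI Stack keeps all data on infrastructure you control. Ensures data privacy." },
      { id: 2, title: "RAG", content: "Retrieval-Augmented Generation grounds LLM answers in retrieved context." },
    ];

    // Escape single quotes before inlining values into the SQL statement.
    const esc = (s: string) => s.replace(/'/g, "\\'");
    for (const d of documents) {
      await runSql(
        `INSERT INTO docs (id, title, content) VALUES (${d.id}, '${esc(d.title)}', '${esc(d.content)}')`
      );
    }
  }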

The Implementation

1. Setting the Foundation (Docker)

We containerized Manticore and AI Observer using docker-compose. One integration challenge was networking: making sure the orchestrator (client) could reach both the containers and the external AI server. Mapping the ports correctly (9308 and 9312 for Manticore, 4318 and 8080 for AI Observer) was crucial.
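
A plain TCP probe is one way to confirm those mappings are actually reachable from the orchestrator before running the pipeline. The following sketch is illustrative only; the hosts and ports mirror the diagram above, and the LLM server address is a placeholder for your own setup.

  // Illustrative pre-flight check: confirm the orchestrator can reach every mapped port.
  // Hosts and ports mirror the architecture diagram; the LLM IP is a placeholder.
  import { Socket } from "node:net";

  function checkPort(host: string, port: number, timeoutMs = 2000): Promise<boolean> {
    return new Promise((resolve) => {
      const socket = new Socket();
      const done = (ok: boolean) => { socket.destroy(); resolve(ok); };
      socket.setTimeout(timeoutMs);
      socket.once("connect", () => done(true));
      socket.once("timeout", () => done(false));
      socket.once("error", () => done(false));
      socket.connect(port, host);
    });
  }

  const targets = [
    { name: "Manticore HTTP", host: "localhost", port: 9308 },
    { name: "Manticore binary", host: "localhost", port: 9312 },
    { name: "AI Observer OTLP", host: "localhost", port: 4318 },
    { name: "AI Observer UI", host: "localhost", port: 8080 },
    { name: "LLM server", host: "192.168.1.100", port: 8080 }, // external llama.cpp box (placeholder IP)
  ];

  for (const t of targets) {
    checkPort(t.host, t.port).then((ok) =>
      console.log(`${ok ? "OK  " : "FAIL"} ${t.name} ${t.host}:${t.port}`)
    );
  }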

Lesson learned: Manticore's SQL interface over HTTP (/sql) is powerful, but its response differs from the JSON /search endpoint that most clients use by default. We had to adapt our client to parse the SQL response structure correctly.
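
To make that difference concrete, here is a hedged sketch of both call styles. The response shapes (an array of result sets with columns/data for raw /sql, and hits.hits for /search) reflect the kind of adaptation we mean, but verify them against your Manticore version; the docs table and its fields are placeholders.

  // Sketch of the two response shapes we had to handle (shapes are assumptions, check your version).

  // Raw /sql mode returns something like: [{ columns: [...], data: [{ id, title, content }], error: "" }]
  interface SqlRawResultSet {
    columns: Array<Record<string, unknown>>;
    data: Array<Record<string, unknown>>;
    error?: string;
  }

  // /search returns the familiar JSON shape: { hits: { hits: [{ _id, _source: {...} }] } }
  interface SearchResponse {
    hits: { hits: Array<{ _id: number | string; _source: Record<string, unknown> }> };
  }

  export async function searchViaSql(term: string): Promise<Array<Record<string, unknown>>> {
    const res = await fetch("http://localhost:9308/sql?mode=raw", {
      method: "POST",
      body: new URLSearchParams({
        query: `SELECT id, title, content FROM docs WHERE MATCH('${term}') LIMIT 5`, // escape term in real code
      }),
    });
    const body = (await res.json()) as SqlRawResultSet[];
    if (body[0]?.error) throw new Error(body[0].error);
    return body[0]?.data ?? []; // rows come back as plain objects, not hits
  }

  export async function searchViaJson(term: string): Promise<Array<Record<string, unknown>>> {
    const res = await fetch("http://localhost:9308/search", {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ index: "docs", query: { match: { "*": term } }, limit: 5 }),
    });
    const body = (await res.json()) as SearchResponse;
    return body.hits.hits.map((h) => h._source); // rows are nested under hits.hits[]._source
  }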

2. The Orchestrator

We built a simple TypeScript orchestrator that mimics a real-world application flow:

  User ──► Orchestrator ──► Manticore (Index Data)
                │
                ├──► Manticore (Search: "Ensures data privacy")
                │         │
                │         ◄── Returns Context Documents
                │
                ├──► LLM (Prompt + Context)
                │         │
                │         ◄── Generated Answer
                │
                ├──► AI Observer (Log Telemetry)
                │
                ◄── Display Result to User

The pipeline follows the classic RAG pattern (steps 3 and 4 are sketched in code after the list):

  1. Ingest: Index sovereign data into Manticore.
  2. Retrieve: Search Manticore for relevant context (MATCH('Ensures data privacy')).
  3. Augment: Combine the retrieved context with a user prompt.
  4. Generate: Send the augmented prompt to the local LLM.
  5. Observe: Log every step to AI Observer.
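
Here is the promised sketch of the augment and generate steps. It relies on the llama.cpp server's OpenAI-compatible API mentioned earlier; the base URL, model name, and prompt template are placeholders rather than our exact values.

  // Sketch of steps 3 and 4 against a llama.cpp server's OpenAI-compatible endpoint.
  // LLM_BASE_URL, the model name, and the prompt wording are placeholders.

  const LLM_BASE_URL = process.env.LLM_BASE_URL ?? "http://192.168.1.100:8080";

  interface ContextDoc { title: string; content: string; }

  function buildRagPrompt(question: string, context: ContextDoc[]): string {
    const contextBlock = context
      .map((d, i) => `[${i + 1}] ${d.title}: ${d.content}`)
      .join("\n");
    return `Answer the question using only the context below.\n\nContext:\n${contextBlock}\n\nQuestion: ${question}`;
  }

  export async function generate(question: string, context: ContextDoc[]): Promise<string> {
    const res = await fetch(`${LLM_BASE_URL}/v1/chat/completions`, {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({
        model: "local-model", // llama.cpp typically serves whatever model it loaded
        messages: [{ role: "user", content: buildRagPrompt(question, context) }],
        temperature: 0.2,
      }),
    });
    if (!res.ok) throw new Error(`LLM error ${res.status}`);
    const data = (await res.json()) as { choices: Array<{ message: { content: string } }> };
    return data.choices[0].message.content;
  }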

3. Verification & Testing

We didn't just build it; we proved it works.

  • Integration Tests: Using vitest, we verified that documents are indexed correctly and retrievable (fixing a zero-hit issue by understanding RT index flushing); a test sketch follows this list.
  • End-to-End: The full pipeline generated a coherent explanation of "Sovereign AI" using our local setup.
  • Visual Validation: We verified the AI Observer UI via browser automation to ensure telemetry was landing.
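
As one illustration of the retrievability check (not our actual test file), a vitest test might look like the following. It reuses the hypothetical runSql and searchViaSql helpers sketched earlier and simply retries the search for a short window, since a freshly inserted RT document was not always visible immediately in our runs.

  // Illustrative vitest integration test: index a document, then poll until it is searchable.
  // runSql and searchViaSql are the hypothetical helpers sketched earlier in this post.
  import { describe, it, expect } from "vitest";
  import { runSql, searchViaSql } from "./manticore";

  async function searchWithRetry(term: string, attempts = 10, delayMs = 200) {
    for (let i = 0; i < attempts; i++) {
      const rows = await searchViaSql(term);
      if (rows.length > 0) return rows;
      await new Promise((r) => setTimeout(r, delayMs)); // give the RT index a moment
    }
    return [];
  }

  describe("Manticore RT indexing", () => {
    it("makes an inserted document retrievable", async () => {
      await runSql("CREATE TABLE IF NOT EXISTS docs (title text, content text)");
      // REPLACE avoids duplicate-id errors when the test is re-run.
      await runSql(
        "REPLACE INTO docs (id, title, content) VALUES (42, 'Privacy', 'Ensures data privacy by keeping everything local')"
      );

      const rows = await searchWithRetry("Ensures data privacy");
      expect(rows.length).toBeGreaterThan(0);
      expect(String(rows[0].content)).toContain("data privacy");
    }, 20_000); // generous timeout for a cold stack
  });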

Real-World Experience

The most striking realization was the latency trade-off. Our local LLM took ~18-80 seconds to produce a comprehensive answer on CPU. That is far slower than cloud APIs, but it buys you total privacy: no token costs, no data leaks.

Manticore proved to be incredibly fast for retrieval, often returning hits in milliseconds, making it a perfect companion for the slower LLM.

Key Takeaways

  Aspect          Cloud AI                    Sovereign AI Stack
  --------------  --------------------------  ----------------------------
  Privacy         Data leaves your network    Data stays local
  Cost            Per-token pricing           One-time hardware investment
  Latency         ~1-5 seconds                ~18-80 seconds (CPU)
  Compliance      Depends on provider         Full GDPR control
  Customization   Limited                     Complete

What's Next

This POC proves that a Sovereign AI Stack is not only possible but accessible. With tools like Manticore and AI Observer, you can build a robust, private RAG pipeline in an afternoon.

Next steps for production:

  • Implement a persistent vector store for semantic search
  • Optimize LLM inference speed (quantization, GPU offloading)
  • Build a chat UI on top of the orchestrator
  • Add document ingestion pipeline (PDF, DOCX, web scraping)

I'm Jane Alesi, Lead AI Architect at satware AG in Worms, Germany. I build enterprise AI systems with a focus on data sovereignty, GDPR compliance, and the saTway methodology — where technical excellence meets human empathy.

Follow me on GitHub | dev.to | LinkedIn | All links
