In an era where data privacy is paramount, relying on cloud-based AI providers isn't always an option. Whether for compliance, security, or just peace of mind, running a Sovereign AI Stack — a completely local, self-controlled AI infrastructure — is the ultimate goal for many organizations.
Today, we built a Proof of Concept (POC) for such a stack, leveraging open-source tools to create a private, observable, and searchable AI environment. Here is our journey.
The Architecture
Our stack consists of three core components, orchestrated by a Node.js application:
- AI Server: A local LLM running on llama.cpp (serving an OpenAI-compatible API). This provides the intelligence without data leaving the network.
- Search Engine: Manticore Search (running in Docker). We chose Manticore for its lightweight footprint and powerful full-text search capabilities, essential for RAG (Retrieval-Augmented Generation).
- Observability: AI Observer (running in Docker). You can't manage what you can't measure. This tool captures traces and metrics of our AI interactions.
The Architecture Visualized
┌──────────────────────────────────────────────────────┐
│                  Sovereign AI Stack                  │
├──────────────────────────────────────────────────────┤
│                                                      │
│                ┌───────────────────┐                 │
│                │   Orchestrator    │                 │
│                │   (Node.js/TS)    │                 │
│                └──┬──────┬─────┬───┘                 │
│                   │      │     │                     │
│                  1│     2│    3│                     │
│                   ▼      ▼     ▼                     │
│   ┌──────┐   ┌──────────┐   ┌──────────────┐         │
│   │Search│   │ AI Server│   │ AI Observer  │         │
│   │Engine│   │ llama.cpp│   │ (Telemetry)  │         │
│   └──────┘   └──────────┘   └──────────────┘         │
│  Manticore   192.168.x     Ports: 4318 (OTLP)        │
│  :9308/:9312 :8080         + 8080 (Dashboard)        │
│                                                      │
└──────────────────────────────────────────────────────┘
1 = Search Context 2 = Send Prompt + Context
3 = Log Telemetry
Component State Flow
[Init] ──► [Indexing] ──► [Searching] ──► [RAG Construction] ──► [Inference]
                               │                                      │
                               ▼                                      ▼
                            [Error]                             [Success] or
                          (No Hits /                              [Timeout]
                            Retry)                              (Model Slow)
The flow is straightforward: we create a real-time (RT) index in Manticore, add documents, search for relevant context, construct a RAG prompt, and send it to the local LLM for inference.
The Implementation
1. Setting the Foundation (Docker)
We containerized Manticore and AI Observer using docker-compose. One integration challenge was networking: ensuring our orchestrator (client) could talk to the containers AND the external AI server. Mapping the ports (9308 and 9312 for Manticore, 4318 and 8080 for AI Observer) was crucial.
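For reference, here is a minimal sketch of the compose file; the AI Observer image name and the volume layout are placeholders for whatever you build and run locally:

```yaml
services:
  manticore:
    image: manticoresearch/manticore
    ports:
      - "9308:9308"   # HTTP: JSON API and SQL-over-HTTP
      - "9312:9312"   # binary protocol
    volumes:
      - manticore-data:/var/lib/manticore

  ai-observer:
    image: ai-observer:local   # placeholder: your locally built observer image
    ports:
      - "4318:4318"   # OTLP/HTTP telemetry ingest
      - "8080:8080"   # dashboard UI

volumes:
  manticore-data:
```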
Lesson learned: Manticore's SQL interface over HTTP (/sql) is powerful, but its response format differs from that of the JSON-only /search endpoint most clients expect. We had to adapt our client to parse the SQL response structure properly.
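Here is roughly what that adaptation looks like: a minimal TypeScript helper, assuming Manticore's HTTP interface on port 9308 and its raw-mode /sql endpoint (the docs table and the sample rows are illustrative, not our production schema):

```typescript
// Sketch of a Manticore SQL-over-HTTP client. `mode=raw` lets us send DDL and
// INSERT statements as well as SELECTs; the response shape differs from /search.
const MANTICORE_URL = "http://localhost:9308";

export async function sqlQuery(query: string): Promise<any> {
  const res = await fetch(`${MANTICORE_URL}/sql?mode=raw`, {
    method: "POST",
    headers: { "Content-Type": "application/x-www-form-urlencoded" },
    body: `query=${encodeURIComponent(query)}`,
  });
  if (!res.ok) throw new Error(`Manticore returned HTTP ${res.status}`);
  return res.json(); // parse according to the raw SQL response structure
}

// Usage: create an RT table, index a document, run a full-text search.
await sqlQuery("CREATE TABLE IF NOT EXISTS docs (title TEXT, content TEXT)");
await sqlQuery(
  "INSERT INTO docs (title, content) VALUES " +
    "('Sovereign AI', 'Running the model locally ensures data privacy')"
);
const hits = await sqlQuery("SELECT * FROM docs WHERE MATCH('Ensures data privacy')");
console.log(hits);
```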
2. The Orchestrator
We built a simple TypeScript orchestrator that mimics a real-world application flow:
User ──► Orchestrator ──► Manticore (Index Data)
              │
              ├──► Manticore (Search: "Ensures data privacy")
              │         │
              │         ◄── Returns Context Documents
              │
              ├──► LLM (Prompt + Context)
              │         │
              │         ◄── Generated Answer
              │
              ├──► AI Observer (Log Telemetry)
              │
              ◄── Display Result to User
The pipeline follows the classic RAG pattern:
- Ingest: Index sovereign data into Manticore.
- Retrieve: Search Manticore for relevant context with `MATCH('Ensures data privacy')` (see the sketch after this list).
- Augment: Combine the retrieved context with a user prompt.
- Generate: Send the augmented prompt to the local LLM.
- Observe: Log every step to AI Observer.
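To make the pattern concrete, here is a compressed sketch of the orchestrator loop. It reuses the `sqlQuery()` helper from above and assumes a llama.cpp server exposing the OpenAI-compatible `/v1/chat/completions` endpoint; the server address and the response parsing are illustrative:

```typescript
// RAG loop sketch: retrieve context from Manticore, augment the prompt,
// and generate an answer with the local llama.cpp server.
const LLM_URL = "http://192.168.1.50:8080"; // placeholder address of the AI server

async function answerWithRag(question: string): Promise<string> {
  // Retrieve: full-text search for relevant context documents
  const result = await sqlQuery(
    "SELECT content FROM docs WHERE MATCH('Ensures data privacy') LIMIT 3"
  );
  // Raw /sql responses arrive as an array of result sets; the exact shape may
  // vary between versions, so treat this parsing as illustrative.
  const rows: Array<{ content: string }> = result?.[0]?.data ?? [];
  const context = rows.map((r) => r.content).join("\n");

  // Augment + Generate: send the prompt plus retrieved context to the local LLM
  const res = await fetch(`${LLM_URL}/v1/chat/completions`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      messages: [
        { role: "system", content: `Answer using only this context:\n${context}` },
        { role: "user", content: question },
      ],
      temperature: 0.2,
    }),
  });
  const completion = await res.json();
  // Observe: in the real orchestrator we also log each step to AI Observer
  // via OTLP on port 4318 (omitted here for brevity).
  return completion.choices[0].message.content;
}
```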
3. Verification & Testing
We didn't just build it; we proved it works.
- Integration Tests: Using `vitest`, we verified that documents are indexed correctly and retrievable, fixing a zero-hit issue by understanding RT index flushing (a simplified test follows this list).
- End-to-End: The full pipeline generated a coherent explanation of "Sovereign AI" using our local setup.
- Visual Validation: We verified the AI Observer UI via browser automation to ensure telemetry was landing.
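The retrieval test, in spirit (it reuses the `sqlQuery()` helper; the table name and assertions are simplified from the real suite):

```typescript
import { describe, expect, it } from "vitest";
import { sqlQuery } from "./manticore"; // the helper sketched earlier

describe("Manticore RT index", () => {
  it("indexes a document and finds it again", async () => {
    await sqlQuery("CREATE TABLE IF NOT EXISTS docs (title TEXT, content TEXT)");
    await sqlQuery(
      "INSERT INTO docs (title, content) VALUES ('privacy', 'Ensures data privacy')"
    );
    // The zero-hit issue mentioned above came down to RT index flushing;
    // make sure the insert is actually searchable before asserting.
    const result = await sqlQuery(
      "SELECT id FROM docs WHERE MATCH('Ensures data privacy')"
    );
    expect(result?.[0]?.data?.length).toBeGreaterThan(0);
  });
});
```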
Real-World Experience
The most striking realization was the latency trade-off. Our local LLM took ~18-80 seconds to produce a comprehensive answer on CPU. That is far slower than cloud APIs, but it buys you total privacy: no token costs, no data leaks.
Manticore proved to be incredibly fast for retrieval, often returning hits in milliseconds, making it a perfect companion for the slower LLM.
Key Takeaways
| Aspect | Cloud AI | Sovereign AI Stack |
|---|---|---|
| Privacy | Data leaves your network | Data stays local |
| Cost | Per-token pricing | One-time hardware investment |
| Latency | ~1-5 seconds | ~18-80 seconds (CPU) |
| Compliance | Depends on provider | Full GDPR control |
| Customization | Limited | Complete |
What's Next
This POC proves that a Sovereign AI Stack is not only possible but accessible. With tools like Manticore and AI Observer, you can build a robust, private RAG pipeline in an afternoon.
Next steps for production:
- Implement a persistent vector store for semantic search
- Optimize LLM inference speed (quantization, GPU offloading)
- Build a chat UI on top of the orchestrator
- Add document ingestion pipeline (PDF, DOCX, web scraping)
I'm Jane Alesi, Lead AI Architect at satware AG in Worms, Germany. I build enterprise AI systems with a focus on data sovereignty, GDPR compliance, and the saTway methodology — where technical excellence meets human empathy.