gharasathi ("for home" in Marathi) is a privacy-first household AI that connects finances, photos, and memories — running entirely on a $200 mini PC in my garage. No cloud. No subscriptions. No data leaving the house.
## The Problem
Household data is scattered everywhere. Bank transactions in three different apps. Photos split across iCloud and Google Photos. Bills in email inboxes or individual company portals. Memories in your head.
Every "smart" assistant that promises to unify this — Alexa, Google Home, ChatGPT — requires shipping your most intimate data to someone else's servers. Your spending patterns. Your family photos. Your location history. All flowing through infrastructure you don't control, governed by privacy policies that change quarterly.
I wanted something different: a private AI that ties all household data together and runs entirely on my home network. Something I could ask "How much did we spend during the Christmas trip?" and get an answer by traversing actual data, not hallucinating one. Something where the answer to "where is my data?" is always "in the living room."
## Architecture
The system is a set of microservices on Kubernetes. All services are named in Marathi — a language spoken in western India.
| Service | Language | Role |
|---|---|---|
| aapla-dhan (आपलं धन) | Go | Finance — syncs bank transactions, loans, investments |
| aaplya-athvani (आपल्या आठवणी) | Go | Memories — photo sync, tagging, storage |
| aapla-hushar (आपला हुशार) | Python + LangGraph | AI chat — intent routing, agents, Ollama sidecar |
| aapla-mahitisatha (आपला माहिती साठा) | Neo4j | Graph database for all structured data |
| gharasathi-ios | Swift/SwiftUI | Native iOS app |
| gharasathi-web | React/TypeScript | Browser interface |
Everything runs on a single ByteNUC mini PC — 6GB RAM, 2TB disk, no GPU — running Talos Linux. The entire stack — OS, K8s, database, LLM, and all services — fits in that 6GB with CPU-only inference.
A core design principle: services write data to Neo4j, and the LLM reads from it. The AI layer never modifies your data — it only queries and explains. This separation means the LLM can be swapped, restarted, or upgraded without any risk to your actual records.
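One lightweight way to enforce that separation at the application layer is to reject any Cypher containing a write clause before it ever reaches the database. This is a sketch of the idea, not the gharasathi codebase: the guard function, keyword list, and error message are all invented for illustration.

```python
import re

# Cypher clauses that mutate the graph; anything else is treated as read-only.
WRITE_CLAUSES = ("CREATE", "MERGE", "DELETE", "DETACH", "SET", "REMOVE", "DROP")

def assert_read_only(cypher: str) -> str:
    """Raise if the query contains a write clause; otherwise return it unchanged."""
    for clause in WRITE_CLAUSES:
        # Match the clause as a whole word, case-insensitively.
        if re.search(rf"\b{clause}\b", cypher, flags=re.IGNORECASE):
            raise PermissionError(f"AI layer attempted a write: {clause}")
    return cypher

# A read query passes through untouched...
assert_read_only("MATCH (e:Event) RETURN e.title")

# ...while a write is rejected before it can touch the data.
try:
    assert_read_only("MATCH (e:Event) SET e.title = 'oops'")
except PermissionError as err:
    print(err)  # AI layer attempted a write: SET
```

Neo4j Enterprise also supports fine-grained roles for this; a guard like the above is the poor man's version for Community Edition.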
## Why Neo4j?
Household data is inherently a graph. People own accounts. Accounts generate transactions. Transactions happen at places. Photos feature people at events. Events are held at places. Places contain other places.
The schema has 6 node types and 17 relationship types — from OWNS and SOURCE_OF to FEATURES, SIMILAR_TO, and LOCATED_IN. This density of connections is exactly what graph databases excel at.
Consider what "our Sydney trip" means in data terms. It's an Event node connected to: Transaction nodes (what we spent), Photo nodes (what we captured), Person nodes (who went), and Place nodes (where we went) — which themselves link upward via LOCATED_IN to "Sydney" to "NSW" to "Australia." In a relational database, that's a normalized nightmare. In a graph, it's just… the shape of the data.
## The Query That Sells It
"How much did we spend during the Christmas trip?"
In a relational database, this requires JOINing across 4+ tables: events, transactions, places, and a trip-transaction mapping table. In Neo4j, it's a single traversal:
```cypher
MATCH (e:Event)
WHERE e.title CONTAINS 'Christmas' AND e.startDate.year = 2025
// Aggregate each relationship in its own step so the optional
// matches don't fan out into a cross product and inflate the sum.
OPTIONAL MATCH (e)<-[:ATTENDED]-(person:Person)
WITH e, collect(DISTINCT person.name) AS attendees
OPTIONAL MATCH (e)<-[:PART_OF]-(photo:Photo)
WITH e, attendees, count(DISTINCT photo) AS photos
OPTIONAL MATCH (e)<-[:RELATED_TO]-(t:Transaction)
RETURN e.title, attendees, photos, sum(t.amount) AS totalSpent
```
One query. It starts at the Christmas event node, walks outward along relationships, and gathers everything connected: who attended, how many photos were taken, and the total cost. No JOINs. No subqueries. The relationships are the schema.
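The same walk can be sketched in a few lines of plain Python. This is an illustration of the traversal, not the actual service code; the edge list, amounts, and helper are invented for the example:

```python
# (source, relationship, target) triples standing in for graph edges.
edges = [
    ("Aai", "ATTENDED", "Christmas Trip"),
    ("Baba", "ATTENDED", "Christmas Trip"),
    ("IMG_101.jpg", "PART_OF", "Christmas Trip"),
    ("IMG_102.jpg", "PART_OF", "Christmas Trip"),
    ("txn-1", "RELATED_TO", "Christmas Trip"),
    ("txn-2", "RELATED_TO", "Christmas Trip"),
]
amounts = {"txn-1": 120.50, "txn-2": 89.25}

def neighbours(event: str, rel: str) -> list[str]:
    """Everything connected to `event` via the given relationship type."""
    return [src for src, r, dst in edges if r == rel and dst == event]

event = "Christmas Trip"
attendees = neighbours(event, "ATTENDED")
photos = len(neighbours(event, "PART_OF"))
total = sum(amounts[t] for t in neighbours(event, "RELATED_TO"))
print(attendees, photos, total)  # ['Aai', 'Baba'] 2 209.75
```

The point of the graph database is that `neighbours` is an index lookup on real relationships, not a scan plus JOIN logic you maintain by hand.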
This pattern repeats across every use case:
- "What are my top spending categories?" — aggregate Transaction nodes by category
-
"Show photos from the reef trip" — traverse
Photo → PART_OF → Event - "Find photos similar to this one" — vector similarity search on embeddings
- "What did we do last year?" — walk Events by date, gather connected Transactions and Photos
-
"Show spending by location" — traverse
Transaction → OCCURRED_AT → Place
The query file has 25+ patterns covering financial analysis, photo search, people lookup, event timelines, and cross-domain insights. Every one of them follows the same shape: start at a node, walk relationships, aggregate what you find.
## Vector Search Built In
Neo4j 5.11+ supports native vector indexes. Every Photo, Transaction, and Event node carries a 1536-dimension embedding vector. This enables semantic search without a separate vector database:
```cypher
// Find the 5 photos most similar to a reference photo.
// ('sunset.jpg' is a placeholder filename.)
MATCH (p:Photo {filename: 'sunset.jpg'})
CALL db.index.vector.queryNodes('photo_embedding', 5, p.embedding)
YIELD node, score
RETURN node.filename, node.aiDescription, score
```
"Find sunset photos" doesn't need exact keyword matching — it searches by meaning. No separate Pinecone or Weaviate instance needed — one database handles both structured queries and semantic search.
## Dual Storage: Graph + Object
One thing Neo4j shouldn't store is binary files. Photos and videos go to MinIO — a self-hosted, S3-compatible object store. Neo4j holds the metadata (who's in the photo, where it was taken, AI-generated description, embedding vector) while MinIO holds the actual JPEG. The Photo node's storagePath property links the two.
This keeps Neo4j lean — critical when you're running it in 1GB of RAM — while MinIO happily stores terabytes of photos on the 2TB disk.
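The link between the two stores is just a string: storagePath names a MinIO bucket and object key. A sketch of resolving it, where the path layout and helper are my assumptions rather than the project's actual scheme:

```python
def parse_storage_path(storage_path: str) -> tuple[str, str]:
    """Split 'bucket/key...' into the MinIO bucket and object key."""
    bucket, _, key = storage_path.partition("/")
    if not bucket or not key:
        raise ValueError(f"malformed storagePath: {storage_path!r}")
    return bucket, key

# The Photo node's metadata lives in Neo4j; the JPEG itself lives in MinIO.
bucket, key = parse_storage_path("photos/2025/12/IMG_0412.jpg")
print(bucket, key)  # photos 2025/12/IMG_0412.jpg
```

With bucket and key in hand, any S3-compatible client can fetch the binary, so the graph never has to hold more than a short string per file.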
## What's Next
The architecture makes sense on paper. But how do you run Neo4j + an LLM + 4 microservices on a machine with only 6GB RAM?
In Part 2, I cover the model selection journey — where research recommended one model, reality disagreed, and I had to learn the hard way that benchmarks don't mean much on constrained hardware.
This is Part 1 of a 3-part series on building a privacy-first household AI. Use the series navigation above to read Part 2 (LLM Model Selection) and Part 3 (Privacy & Lessons from OpenClaw).