Koustubh

gharasathi (घरासाठी) — A Privacy-First Household AI Running on a $200 Mini PC

gharasathi ("for home" in Marathi) is a privacy-first household AI that connects finances, photos, and memories — running entirely on a $200 mini PC in my garage. No cloud. No subscriptions. No data leaving the house.

The Problem

Household data is scattered everywhere. Bank transactions in three different apps. Photos split across iCloud and Google Photos. Bills in emails or individual company portals. Memories in your head.

Every "smart" assistant that promises to unify this — Alexa, Google Home, ChatGPT — requires shipping your most intimate data to someone else's servers. Your spending patterns. Your family photos. Your location history. All flowing through infrastructure you don't control, governed by privacy policies that change quarterly.

I wanted something different: a private AI that ties all household data together and runs entirely on my home network. Something I could ask "How much did we spend during the Christmas trip?" and get an answer by traversing actual data, not hallucinating one. Something where the answer to "where is my data?" is always "in the living room."

Architecture

The system is a set of microservices on Kubernetes. All services are named in Marathi, a language spoken in western India.

(Architecture diagram)

| Service | Language | Role |
| --- | --- | --- |
| aapla-dhan (आपलं धन) | Go | Finance: syncs bank transactions, loans, investments |
| aaplya-athvani (आपल्या आठवणी) | Go | Memories: photo sync, tagging, storage |
| aapla-hushar (आपला हुशार) | Python + LangGraph | AI chat: intent routing, agents, Ollama sidecar |
| aapla-mahitisatha (आपला माहिती साठा) | Neo4j | Graph database for all structured data |
| gharasathi-ios | Swift/SwiftUI | Native iOS app |
| gharasathi-web | React/TypeScript | Browser interface |

Everything runs on a single ByteNUC mini PC — 6GB RAM, 2TB disk, no GPU — running Talos Linux. The entire stack — OS, K8s, database, LLM, and all services — fits in that 6GB with CPU-only inference.

A core design principle: services write data to Neo4j, and the LLM reads from it. The AI layer never modifies your data — it only queries and explains. This separation means the LLM can be swapped, restarted, or upgraded without any risk to your actual records.
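As a minimal sketch of that split (the node labels, property names, and relationship direction here are illustrative guesses, apart from the Transaction label and the SOURCE_OF relationship described below), the sync services issue write statements while the AI layer only ever reads:

// Write path: aapla-dhan recording a synced bank transaction (illustrative schema)
MERGE (a:Account {accountId: $accountId})
CREATE (t:Transaction {amount: $amount, date: date($date), category: $category})
MERGE (a)-[:SOURCE_OF]->(t);

// Read path: aapla-hushar answering a question; no CREATE, MERGE, or DELETE ever
MATCH (a:Account)-[:SOURCE_OF]->(t:Transaction)
WHERE t.date >= date('2025-01-01')
RETURN a.accountId AS account, sum(t.amount) AS totalSpent;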

Why Neo4j?

Household data is inherently a graph. People own accounts. Accounts generate transactions. Transactions happen at places. Photos feature people at events. Events are held at places. Places contain other places.

(Graph schema diagram)

The schema has 6 node types and 17 relationship types — from OWNS and SOURCE_OF to FEATURES, SIMILAR_TO, and LOCATED_IN. This density of connections is exactly what graph databases excel at.

Consider what "our Sydney trip" means in data terms. It's an Event node connected to: Transaction nodes (what we spent), Photo nodes (what we captured), Person nodes (who went), and Place nodes (where we went) — which themselves link upward via LOCATED_IN to "Sydney" to "NSW" to "Australia." In a relational database, that's a normalized nightmare. In a graph, it's just… the shape of the data.
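To make that concrete, here is a small Cypher sketch of that subgraph. The node labels, LOCATED_IN, and the relationship names used in the query in the next section come from the post's schema; the property values are invented, and the Event-to-Place relationship name (HELD_AT) is my assumption.

// Illustrative subgraph for "our Sydney trip" (values invented; HELD_AT is assumed)
CREATE (syd:Place {name: 'Sydney'})-[:LOCATED_IN]->(:Place {name: 'NSW'})-[:LOCATED_IN]->(:Place {name: 'Australia'})
CREATE (trip:Event {title: 'Sydney Trip', startDate: date('2025-04-10')})
CREATE (trip)-[:HELD_AT]->(syd)
CREATE (:Person {name: 'Koustubh'})-[:ATTENDED]->(trip)
CREATE (:Transaction {amount: 182.50, category: 'Dining'})-[:RELATED_TO]->(trip)
CREATE (:Photo {filename: 'opera-house.jpg'})-[:PART_OF]->(trip)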

The Query That Sells It

"How much did we spend during the Christmas trip?"

In a relational database, this requires JOINing across 4+ tables: events, transactions, places, and a trip-transaction mapping table. In Neo4j, it's a single traversal:

MATCH (e:Event)
WHERE e.title CONTAINS 'Christmas' AND e.startDate.year = 2025
// Aggregate each branch separately so photo/person fan-out doesn't inflate the spend total
OPTIONAL MATCH (e)<-[:ATTENDED]-(person:Person)
WITH e, collect(DISTINCT person.name) AS attendees
OPTIONAL MATCH (e)<-[:PART_OF]-(photo:Photo)
WITH e, attendees, count(DISTINCT photo) AS photos
OPTIONAL MATCH (e)<-[:RELATED_TO]-(t:Transaction)
RETURN e.title,
  attendees,
  photos,
  sum(t.amount) AS totalSpent

One query. It starts at the Christmas event node, walks outward along relationships, and gathers everything connected: who attended, how many photos were taken, and the total cost. No JOINs. No subqueries. The relationships are the schema.

This pattern repeats across every use case:

  • "What are my top spending categories?" — aggregate Transaction nodes by category
  • "Show photos from the reef trip" — traverse Photo → PART_OF → Event
  • "Find photos similar to this one" — vector similarity search on embeddings
  • "What did we do last year?" — walk Events by date, gather connected Transactions and Photos
  • "Show spending by location" — traverse Transaction → OCCURRED_AT → Place

The query file has 25+ patterns covering financial analysis, photo search, people lookup, event timelines, and cross-domain insights. Every one of them follows the same shape: start at a node, walk relationships, aggregate what you find.
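For example, "spending by location" is one short traversal plus an aggregation (a sketch; the Place name property is an assumption):

// Walk Transaction -> OCCURRED_AT -> Place, then aggregate per place
MATCH (t:Transaction)-[:OCCURRED_AT]->(p:Place)
RETURN p.name AS place, sum(t.amount) AS totalSpent, count(t) AS transactions
ORDER BY totalSpent DESC
LIMIT 10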

Vector Search Built In

Neo4j 5.11+ supports native vector indexes. Every Photo, Transaction, and Event node carries a 1536-dimension embedding vector. This enables semantic search without a separate vector database:

// Find the 5 photos most similar to a given photo, by embedding distance
MATCH (p:Photo {filename: $filename})
CALL db.index.vector.queryNodes('photo_embedding', 5, p.embedding)
YIELD node, score
RETURN node.filename, node.aiDescription, score

"Find sunset photos" doesn't need exact keyword matching — it searches by meaning. No separate Pinecone or Weaviate instance needed — one database handles both structured queries and semantic search.

Dual Storage: Graph + Object

One thing Neo4j shouldn't store is binary files. Photos and videos go to MinIO — a self-hosted, S3-compatible object store. Neo4j holds the metadata (who's in the photo, where it was taken, AI-generated description, embedding vector) while MinIO holds the actual JPEG. The Photo node's storagePath property links the two.

This keeps Neo4j lean — critical when you're running it in 1GB of RAM — while MinIO happily stores terabytes of photos on the 2TB disk.
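On the graph side, a Photo node then looks something like this sketch: filename, aiDescription, storagePath, and embedding come from earlier in the post, while takenAt and the bucket layout are illustrative.

// Metadata in Neo4j; the actual JPEG lives in MinIO at storagePath
CREATE (:Photo {
  filename: 'IMG_2041.jpg',
  storagePath: 's3://photos/2025/12/IMG_2041.jpg',
  aiDescription: 'Sunset over the harbour',
  takenAt: datetime('2025-12-24T19:42:00'),
  embedding: $embedding  // 1536-dimension vector computed by the memories service
})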

What's Next

The architecture makes sense on paper. But how do you run Neo4j + an LLM + 4 microservices on a machine with only 6GB RAM?

In Part 2, I cover the model selection journey — where research recommended one model, reality disagreed, and I had to learn the hard way that benchmarks don't mean much on constrained hardware.


This is Part 1 of a 3-part series on building a privacy-first household AI. Use the series navigation above to read Part 2 (LLM Model Selection) and Part 3 (Privacy & Lessons from OpenClaw).
