Quick story.
I run a small homelab — one box, an NVIDIA card, around ten Docker containers, and a couple of local model servers (Ollama mostly, vLLM when I'm playing around).
Every "why is this model OOM-ing" turned into the same five minutes of archaeology:
nvidia-smi → pick a PID
ps -o cgroup -p → find the container ID
docker ps → map ID to name
Just to answer: which container, which model, is eating my VRAM right now?
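For anyone who wants to script that manual pipeline in the meantime, here is a minimal Python sketch of the same three steps. It is not code from the repo, just an illustration under a few assumptions. It runs on the host, so the PIDs from nvidia-smi match /proc. Containers use Docker's cgroupfs or systemd cgroup layout. The helper names (`parse_gpu_pids`, `container_id_from_cgroup`, `container_name`) are hypothetical. It reads /proc/<pid>/cgroup directly, which is the same data `ps -o cgroup` prints, and resolves names with `docker inspect` instead of eyeballing `docker ps`.

```python
from __future__ import annotations

import re
import shutil
import subprocess

# A 64-hex-char Docker container ID inside a cgroup path. Covers the cgroupfs
# layout (".../docker/<id>") and the systemd layout (".../docker-<id>.scope");
# podman, containerd/k8s and rootless setups use other prefixes and are not matched.
CONTAINER_ID_RE = re.compile(r"docker[-/]([0-9a-f]{64})")


def parse_gpu_pids(csv_text: str) -> dict[int, int]:
    """Map PID -> used VRAM (MiB) from
    `nvidia-smi --query-compute-apps=pid,used_memory --format=csv,noheader,nounits`."""
    usage: dict[int, int] = {}
    for line in csv_text.strip().splitlines():
        pid, mem = (field.strip() for field in line.split(",", 1))
        if mem.isdigit():  # skip rows where the driver reports "[N/A]"
            usage[int(pid)] = int(mem)
    return usage


def container_id_from_cgroup(cgroup_text: str) -> str | None:
    """Return the first Docker container ID found in /proc/<pid>/cgroup content."""
    match = CONTAINER_ID_RE.search(cgroup_text)
    return match.group(1) if match else None


def container_name(container_id: str) -> str:
    """Resolve a container ID to its name via the Docker CLI (drops the leading '/')."""
    out = subprocess.run(
        ["docker", "inspect", "--format", "{{.Name}}", container_id],
        capture_output=True, text=True, check=True,
    ).stdout
    return out.strip().lstrip("/")


def main() -> None:
    smi = subprocess.run(
        ["nvidia-smi", "--query-compute-apps=pid,used_memory",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    # Largest VRAM user first.
    for pid, mib in sorted(parse_gpu_pids(smi).items(), key=lambda kv: -kv[1]):
        try:
            with open(f"/proc/{pid}/cgroup") as f:
                cid = container_id_from_cgroup(f.read())
        except FileNotFoundError:
            continue  # process exited between nvidia-smi and this lookup
        owner = container_name(cid) if cid else "(not a Docker container)"
        print(f"{pid:>8}  {mib:>6} MiB  {owner}")


# Only run where both CLIs exist, i.e. on the GPU host itself.
if __name__ == "__main__" and shutil.which("nvidia-smi") and shutil.which("docker"):
    main()
```

Run on the host, it prints one line per GPU process with its VRAM and owning container. It answers "which container", but not "which model", which is what the model servers' own APIs are for.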
I tried Prometheus + Grafana + node-exporter + dcgm-exporter. It works, but for one box it's a stack-on-a-stack to answer a single question.
So I built a third option: one container, one page. GPU panel maps VRAM-using processes back to their Docker container automatically. AI Models panel queries each model server's own API (Ollama /api/ps, vLLM /v1/models, llama.cpp, TGI, A1111, ComfyUI) and shows you which model is loaded.
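To give a feel for the model-server side, here is a rough Python sketch of reading two of those endpoints. It is not how homelab-monitor is necessarily implemented. It assumes Ollama on its default port 11434 and vLLM on its default 8000, and the helper names are hypothetical. Ollama's /api/ps reports each loaded model's `size_vram` in bytes, which is enough for a per-model VRAM split. vLLM's OpenAI-compatible /v1/models only lists the served model IDs.

```python
from __future__ import annotations

import json
import urllib.request

# Default ports are an assumption; adjust to wherever your servers listen.
OLLAMA_PS_URL = "http://localhost:11434/api/ps"
VLLM_MODELS_URL = "http://localhost:8000/v1/models"


def fetch_json(url: str, timeout: float = 2.0) -> dict:
    """GET a URL and decode the JSON body (raises URLError if the server is down)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def ollama_loaded(ps: dict) -> list[tuple[str, int]]:
    """(model name, bytes resident in VRAM) for each model in an Ollama /api/ps reply."""
    return [(m["name"], int(m.get("size_vram", 0))) for m in ps.get("models", [])]


def vllm_served(listing: dict) -> list[str]:
    """Model IDs from a vLLM (OpenAI-compatible) /v1/models reply."""
    return [m["id"] for m in listing.get("data", [])]


def vram_split(loaded: list[tuple[str, int]]) -> dict[str, float]:
    """Fraction of the listed models' combined VRAM held by each model."""
    total = sum(size for _, size in loaded)
    if total == 0:
        return {}
    return {name: size / total for name, size in loaded}
```

Usage would look like `vram_split(ollama_loaded(fetch_json(OLLAMA_PS_URL)))` or `vllm_served(fetch_json(VLLM_MODELS_URL))`. Because an idle Ollama model still appears in /api/ps with a non-zero `size_vram`, this also surfaces models that hold VRAM without generating anything.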
docker compose up -d --build and that's the whole setup.
History in SQLite, downsampled on read. No agents, no cloud, no Prometheus.
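"Downsampled on read" means raw samples are stored as-is and only averaged into time buckets when a chart asks for them. Below is a minimal sketch of that idea with Python's built-in sqlite3. The `samples` table, its columns, and `pick_bucket`'s 300-point target are assumptions for illustration, not the project's actual schema.

```python
import sqlite3

# Hypothetical schema: one row per (timestamp, metric) sample, indexed for range scans.
SCHEMA = """
CREATE TABLE IF NOT EXISTS samples (ts INTEGER NOT NULL, metric TEXT NOT NULL, value REAL NOT NULL);
CREATE INDEX IF NOT EXISTS idx_samples_metric_ts ON samples (metric, ts);
"""


def record(conn: sqlite3.Connection, ts: int, metric: str, value: float) -> None:
    """Store a raw sample; no aggregation happens at write time."""
    conn.execute("INSERT INTO samples (ts, metric, value) VALUES (?, ?, ?)", (ts, metric, value))


def pick_bucket(range_s: int, max_points: int = 300) -> int:
    """Bucket width (seconds) so a chart of range_s seconds gets at most ~max_points points."""
    return max(1, range_s // max_points)


def downsample(conn: sqlite3.Connection, metric: str, since: int, bucket_s: int) -> list[tuple]:
    """(bucket_start, avg, max) per bucket. bucket_s must be an int so SQLite does
    integer division and (ts / b) * b snaps each timestamp to its bucket start."""
    return conn.execute(
        """
        SELECT (ts / ?) * ? AS bucket, AVG(value), MAX(value)
        FROM samples
        WHERE metric = ? AND ts >= ?
        GROUP BY bucket
        ORDER BY bucket
        """,
        (bucket_s, bucket_s, metric, since),
    ).fetchall()
```

The appeal of this design for a single box is that one query serves a 1-hour or a 30-day view; only the bucket width changes, and there is no retention or rollup job to maintain.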
The repo, with the longer technical write-up and screenshots:
👉 github.com/SikamikanikoBG/homelab-monitor
MIT licensed. NVIDIA-only on the GPU panel for now — AMD/Intel back-ends are a good first issue if anyone wants to extend.
Curious how others here solve the "who holds my VRAM" problem. Different tool? Different stack? Or did you also build something tiny because the big stacks felt like too much for one box?
Top comments (2)
Oh man, the nvidia-smi to ps to docker ps pipeline is painfully familiar. I do that exact same thing probably five times a week on my Ollama box, and it never gets less annoying.
The fact that this is one container with SQLite instead of a full Prometheus/Grafana stack is what makes it actually usable for a single-machine setup. I tried the Grafana route once and spent more time configuring dashboards than actually doing anything useful. Does it handle the case where Ollama has a model loaded but idle (still holding VRAM but not generating)? That's usually the sneaky one that eats your headroom without showing up in utilization. 😶
Yes, it shows whether a model is idle and how the VRAM is split per model. Very handy for understanding what is going on under the hood :)