This article was originally published on aifoss.dev
TL;DR: Three services, one Docker Compose file, zero data leaving your machine. This guide connects Ollama for inference, Open WebUI for the chat frontend, and PostgreSQL with pgvector for document embeddings. The trade-off vs. the simpler two-container setup: more initial config, but persistent RAG data, multi-process safety, and one database to back up.
What you'll have running after this guide:
- Ollama 0.30.x serving LLMs locally (Qwen2.5, Llama 3.3, Mistral, or any model in the library)
- Open WebUI 0.9.6 with knowledge bases backed by pgvector — all retrieval stays on-device
- PostgreSQL 17 + pgvector 0.8.2 storing embeddings persistently, safe for multi-worker Open WebUI deployments
Honest take: If your documents are sensitive enough that they can't touch OpenAI's or Anthropic's APIs, this stack is the right call. If you just want local chat with no document search, the basic two-container setup is simpler; stop reading here and follow the Ollama + Open WebUI Linux setup guide instead.
All three tools are free to self-host: Ollama is MIT licensed, Open WebUI uses a BSD-3-Clause-based license with an added branding clause, and pgvector is PostgreSQL-licensed (BSD-equivalent). No usage limits, no call-home telemetry, no per-query fees.
Why swap the default vector database?
Open WebUI ships with ChromaDB as its vector store. It works for a single user on a single machine. The problem shows up when:
- You run Open WebUI with multiple uvicorn workers — ChromaDB's PersistentClient uses SQLite under the hood, which isn't fork-safe. Workers inherit the same database connection and corrupt each other's state under concurrent writes.
- You restart the container and lose RAG context because the Chroma data volume wasn't correctly mounted.
- You want a single backup to cover everything — chat history, user accounts, and document embeddings — instead of backing up Chroma separately.
Switching to pgvector fixes all three. The extension runs inside the same PostgreSQL instance that this stack already uses for Open WebUI's application database (in place of Open WebUI's default SQLite file). One service, one backup, no extra containers.
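Because chat history, user accounts, and embeddings then live in one database, a single `pg_dump` covers all of them. The sketch below is a minimal example, assuming the container name `ai-stack-postgres-1` (Compose's default for a project directory named `ai-stack`, which this guide uses) and the `openwebui` user and database from the Step 1 compose file; `backup_openwebui_db` is a hypothetical helper name. It does not capture the `ollama_data` or `open_webui_data` volumes.

```shell
# Hypothetical helper: dump the whole Open WebUI database (app data plus
# pgvector embeddings) to a timestamped custom-format file in DIR.
backup_openwebui_db() {
  local dest=${1:-.}
  local file
  file="$dest/openwebui-$(date +%Y%m%d-%H%M%S).dump"
  # No -t on docker exec: a pseudo-TTY would mangle the binary dump on stdout.
  if docker exec ai-stack-postgres-1 pg_dump -U openwebui -d openwebui -Fc > "$file"; then
    echo "$file"
  else
    rm -f "$file"
    return 1
  fi
}

# Usage:
#   backup_openwebui_db ~/backups
# Restore into the running container, replacing existing objects:
#   docker exec -i ai-stack-postgres-1 pg_restore -U openwebui -d openwebui --clean --if-exists < FILE
```

The custom format (`-Fc`) is compressed and is the format `pg_restore` expects, which is why the restore line reads it from stdin with `docker exec -i`.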
For a deeper look at how pgvector compares to Qdrant and ChromaDB at scale, see the vector database comparison.
Hardware floor
| Setup | RAM | GPU | What runs |
|---|---|---|---|
| Minimum (CPU only) | 16 GB | None | 7B Q4_K_M at 4–8 tok/s; RAG adds 3–5s retrieval |
| Comfortable | 16 GB | RTX 3060 12GB | 7B at 28–35 tok/s; 13B at 15–22 tok/s |
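| Recommended | 32 GB | RTX 4070 12GB | 14B at 40–50 tok/s; 32B Q4 (~20 GB) only with partial CPU offload, at much lower speed |
| Heavy RAG / 70B | 64 GB | RTX 4090 24GB | 70B Q4_K_M (~43 GB) only with partial CPU offload, so expect much lower tok/s than the rows above; fast embedding |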
The embedding model (nomic-embed-text, 274MB) runs alongside your inference model. On an 8GB VRAM card, the two compete for VRAM, and you may see the inference model partially offloaded to CPU. 12GB+ keeps both fully on-GPU.
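To check whether that offload is actually happening, `ollama ps` shows a PROCESSOR column for each loaded model (`100% GPU` when it fits, a CPU/GPU split otherwise). A minimal sketch, assuming that column format and the `ai-stack-ollama-1` container name used later in this guide; `offloaded_models` is a hypothetical helper. It only sees models that are currently loaded, so run a chat and a RAG query first.

```shell
# Hypothetical helper: print loaded models that are NOT running 100% on the GPU.
# Assumes `ollama ps` prints a header row first and shows "100% GPU" in the
# PROCESSOR column for models that fit entirely in VRAM; the format may change.
offloaded_models() {
  docker exec ai-stack-ollama-1 ollama ps \
    | tail -n +2 \
    | grep -v '100% GPU' \
    | awk 'NF { print $1 }'
}

# Usage (empty output means every loaded model is fully on the GPU):
#   offloaded_models
```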
CPU-only setups work — expect 10–30s per response instead of 1–3s. If you occasionally need GPU scale for large document batches, RunPod rents A5000s (24GB VRAM) for under $0.30/hr without a long-term commitment.
For hardware build recommendations to pair with this stack, see the GPU server guides on runaihome.com.
Architecture
```
┌──────────────────────────────────────────────┐
│            Docker bridge network             │
│                                              │
│  ┌──────────────┐    ┌────────────────────┐  │
│  │    Ollama    │◄───│     Open WebUI     │  │
│  │    :11434    │    │       :8080        │  │
│  └──────────────┘    └─────────┬──────────┘  │
│                                │             │
│                  ┌─────────────▼───────────┐ │
│                  │      PostgreSQL 17      │ │
│                  │    + pgvector 0.8.2     │ │
│                  │          :5432          │ │
│                  └─────────────────────────┘ │
└──────────────────────────────────────────────┘
```
Open WebUI talks to Ollama for inference and to PostgreSQL for two things: its own application data (users, sessions, settings) and the RAG vector store (embeddings). PostgreSQL handles both roles — no separate Chroma service, no additional volume to manage.
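In this layout, Open WebUI reaches Ollama by service name on the bridge network, and the Step 1 compose file publishes Ollama's port on the host's loopback interface only. A minimal sketch to check both paths, assuming the default container names for a project directory called `ai-stack`, Ollama's `GET /api/version` endpoint, and that `curl` exists inside the Open WebUI container (current images ship it for their own health check, but treat that as an assumption). Both function names are hypothetical.

```shell
# Hypothetical helper: from inside Open WebUI, call Ollama by its service name
# and print the version it reports. Non-zero exit if nothing answers.
webui_sees_ollama() {
  local version
  version=$(docker exec ai-stack-open-webui-1 curl -sf http://ollama:11434/api/version \
    | sed -n 's/.*"version":"\([^"]*\)".*/\1/p')
  if [ -z "$version" ]; then
    echo "Open WebUI could not reach http://ollama:11434" >&2
    return 1
  fi
  echo "$version"
}

# Hypothetical helper: every host address publishing port 11434 must be
# 127.0.0.1 or [::1]; anything else is reachable from other machines.
ollama_loopback_only() {
  local mappings
  mappings=$(docker port ai-stack-ollama-1 11434) || return 1
  [ -n "$mappings" ] || return 1
  if printf '%s\n' "$mappings" | grep -qvE '^(127\.0\.0\.1|\[::1\]):'; then
    printf 'published beyond loopback:\n%s\n' "$mappings"
    return 1
  fi
  printf 'loopback only: %s\n' "$mappings"
}
```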
Step 1: Write the Docker Compose file
```shell
mkdir ai-stack && cd ai-stack
nano compose.yaml
```
Paste the following:
```yaml
services:
  postgres:
    image: pgvector/pgvector:pg17
    restart: unless-stopped
    environment:
      POSTGRES_DB: openwebui
      POSTGRES_USER: openwebui
      POSTGRES_PASSWORD: changeme_strong_password
    volumes:
      - postgres_data:/var/lib/postgresql/data
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U openwebui -d openwebui"]
      interval: 10s
      timeout: 5s
      retries: 5

  ollama:
    image: ollama/ollama:latest
    restart: unless-stopped
    ports:
      - "127.0.0.1:11434:11434"
    volumes:
      - ollama_data:/root/.ollama
    # Uncomment for NVIDIA GPU:
    # deploy:
    #   resources:
    #     reservations:
    #       devices:
    #         - driver: nvidia
    #           count: all
    #           capabilities: [gpu]

  open-webui:
    image: ghcr.io/open-webui/open-webui:main
    restart: unless-stopped
    ports:
      - "3000:8080"
    depends_on:
      postgres:
        condition: service_healthy
    environment:
      OLLAMA_BASE_URL: http://ollama:11434
      DATABASE_URL: postgresql://openwebui:changeme_strong_password@postgres:5432/openwebui
      PGVECTOR_DB_URL: postgresql://openwebui:changeme_strong_password@postgres:5432/openwebui
      VECTOR_DB: pgvector
      RAG_EMBEDDING_ENGINE: ollama
      RAG_EMBEDDING_MODEL: nomic-embed-text
    volumes:
      - open_webui_data:/app/backend/data

volumes:
  postgres_data:
  ollama_data:
  open_webui_data:
```
Three things worth calling out before you run it:
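pgvector/pgvector:pg17 ships with the vector extension already installed in the image, so there is nothing to compile. You also don't need to run CREATE EXTENSION vector manually: Open WebUI runs that step itself on first boot, which works because the official Postgres image makes POSTGRES_USER a superuser.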
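Ollama's port is published only on the host's loopback interface (127.0.0.1:11434), so tools on the host can reach it but other machines on your LAN can't. Open WebUI doesn't use the published port at all; it reaches Ollama over the Compose network at http://ollama:11434. This matters: unauthenticated Ollama instances have shown up in security research repeatedly. If you need LAN access, put a reverse proxy with authentication in front of Ollama rather than publishing port 11434 directly. See the Ollama security guide for the full explanation.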
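Change changeme_strong_password in all three places it appears (POSTGRES_PASSWORD, DATABASE_URL, PGVECTOR_DB_URL) before the first run, and use the same value in all three. The timing matters: the Postgres image applies POSTGRES_PASSWORD only when it initializes an empty data volume, so editing it later does not change the existing password. If the password contains characters such as @, : or /, percent-encode them in the two URLs.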
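One way to keep the three values in sync is Compose's variable interpolation: Compose reads a `.env` file in the project directory and substitutes `${VAR}` references in compose.yaml. A minimal sketch, assuming you replace the literal password with `${POSTGRES_PASSWORD}` as shown in the comments; `write_env_file` is a hypothetical helper. It generates an alphanumeric password, so nothing needs percent-encoding inside the two URLs.

```shell
# Hypothetical helper: create DIR/.env with a random 32-character alphanumeric
# POSTGRES_PASSWORD. Refuses to overwrite an existing .env.
write_env_file() {
  local dir=${1:-.}
  local pw
  if [ -e "$dir/.env" ]; then
    echo "refusing to overwrite $dir/.env" >&2
    return 1
  fi
  # Alphanumeric only, so the value is safe inside a postgresql:// URL.
  pw=$(LC_ALL=C tr -dc 'A-Za-z0-9' </dev/urandom | head -c 32)
  printf 'POSTGRES_PASSWORD=%s\n' "$pw" > "$dir/.env"
  chmod 600 "$dir/.env"
}

# Then, in compose.yaml, reference the variable instead of the literal:
#   POSTGRES_PASSWORD: ${POSTGRES_PASSWORD}
#   DATABASE_URL: postgresql://openwebui:${POSTGRES_PASSWORD}@postgres:5432/openwebui
#   PGVECTOR_DB_URL: postgresql://openwebui:${POSTGRES_PASSWORD}@postgres:5432/openwebui
# and inspect the substituted result with: docker compose config
```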
Step 2: Start the stack and pull models
```shell
docker compose up -d
```
Docker pulls the three images (roughly 2.5GB total on first run), then starts the services. After 30–60 seconds:
```
✔ Container ai-stack-postgres-1    Healthy
✔ Container ai-stack-ollama-1      Started
✔ Container ai-stack-open-webui-1  Started
```
The service_healthy condition in the compose file makes Open WebUI wait for PostgreSQL to accept connections before starting. If you skip the healthcheck and start all three simultaneously, you'll see Open WebUI crash-loop for 15–20 seconds while Postgres initializes — not a real problem, but noisy.
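The healthcheck only gates when Open WebUI starts; the web app itself can take a little longer before it serves requests. A minimal sketch for scripting the wait and then confirming pgvector, assuming Open WebUI's `/health` endpoint on the published port 3000 and the container names shown above; `wait_for_url` and `pgvector_status` are hypothetical helpers. `pg_available_extensions` is a standard PostgreSQL catalog view: an empty installed version means the image ships the extension but it hasn't been created in the database yet.

```shell
# Hypothetical helper: poll URL until curl gets a non-error HTTP status,
# giving up after ATTEMPTS tries spaced DELAY seconds apart.
wait_for_url() {
  local url=$1 attempts=${2:-30} delay=${3:-2} i=1
  while [ "$i" -le "$attempts" ]; do
    if curl -sf -o /dev/null "$url"; then
      echo "ready after $i attempt(s)"
      return 0
    fi
    sleep "$delay"
    i=$((i + 1))
  done
  echo "gave up on $url after $attempts attempts" >&2
  return 1
}

# Hypothetical helper: is the vector extension available in the image, and
# has it been created in the openwebui database?
pgvector_status() {
  local row available installed
  row=$(docker exec ai-stack-postgres-1 psql -U openwebui -d openwebui -tAc \
    "SELECT default_version, installed_version FROM pg_available_extensions WHERE name = 'vector';")
  if [ -z "$row" ]; then
    echo "vector extension not available in this image"
    return 1
  fi
  available=${row%%|*}   # psql -A separates columns with "|"
  installed=${row#*|}
  if [ -z "$installed" ]; then
    echo "available ($available) but not created yet"
    return 2
  fi
  echo "installed: $installed"
}

# Usage:
#   wait_for_url http://localhost:3000/health 60 2 && pgvector_status
```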
Now pull the inference and embedding models:
```shell
# Inference model: swap for any model that fits your VRAM
docker exec ai-stack-ollama-1 ollama pull qwen2.5:7b

# Embedding model Open WebUI will use for RAG
docker exec ai-stack-ollama-1 ollama pull nomic-embed-text
```
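Before uploading documents, it's worth confirming that the embedding model answers and returns the vector size you expect (768 for nomic-embed-text). A minimal sketch, assuming Ollama's `/api/embed` endpoint on the loopback port published in the compose file and a compact single-line JSON response; `embedding_dims` is a hypothetical helper that avoids needing jq by counting the comma-separated values in the first embedding.

```shell
# Hypothetical helper: embed a short string and print the vector length.
# Assumes compact JSON such as {"model":"...","embeddings":[[0.1,-0.2,...]],...}.
embedding_dims() {
  local model=${1:-nomic-embed-text} resp
  resp=$(curl -s http://127.0.0.1:11434/api/embed \
    -d "{\"model\": \"$model\", \"input\": \"dimension check\"}")
  case $resp in
    *'"embeddings":[['*) ;;
    *) echo "no embedding returned: $resp" >&2; return 1 ;;
  esac
  # Keep only the first vector's numbers, then count comma-separated fields.
  printf '%s\n' "$resp" \
    | sed -e 's/.*"embeddings":\[\[//' -e 's/]].*//' \
    | awk -F',' '{ print NF }'
}

# Usage (expect 768 for nomic-embed-text):
#   embedding_dims nomic-embed-text
```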
Why nomic-embed-text? It's 274MB, produces 768-dimensional vectors, and scores well on MTEB English retrieval benchmarks. For stronger retrieval at a larger size, mxbai-embed-large (670MB, 1024-dim) outperforms it, though it is primarily an English model rather than a multilingual one. For minimal footprint, all-minilm (46MB, 384-dim) works, but recall quality is lower.