Rahim Ranxx

Wired Django, Nextcloud, Grafana, Loki & Prometheus into a secure observability mesh over Tailnet (metrics & logs, dashboards).

Building an Observability Mesh with Grafana, Loki, and Prometheus
When multiple backend services run in isolation, debugging becomes guesswork. My recent sprint was about turning that guesswork into clarity by wiring up full observability across Django, Nextcloud, Grafana, Loki, and Prometheus.

Goal
Unify logs and metrics across services in a distributed setup, with everything communicating over TLS through Caddy on my Tailnet domain.
I wanted one dashboard that could tell me everything about my system’s health without SSH-ing into individual servers.

Architecture
Here’s the high-level design:

[Architecture flow diagram]

Stack Overview

  • Prometheus → scrapes metrics from Django and Nextcloud API endpoints

  • Loki → ingests logs from both services

  • Grafana → visualizes metrics and logs together

  • Caddy → reverse proxy with trusted TLS for all endpoints

  • Tailnet (Tailscale) → private network with identity-based access

Everything talks securely — no exposed ports, no unencrypted traffic.
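
The post focuses mostly on the metrics path, so for completeness here is roughly how the log side can be wired. This is a minimal sketch assuming Promtail tails the files and pushes them to Loki; the push URL, ports, and log paths are placeholders rather than my actual config:

# promtail.yml (sketch): tail local log files and push them to Loki
server:
  http_listen_port: 9080

positions:
  filename: /tmp/positions.yaml    # where Promtail remembers how far it has read

clients:
  # Placeholder URL; in this setup the Loki push endpoint sits on the Tailnet
  # behind Caddy, like everything else.
  - url: https://X.tail.ts.net/loki/api/v1/push

scrape_configs:
  - job_name: django
    static_configs:
      - targets: [localhost]
        labels:
          job: django
          __path__: /var/log/django/*.log       # placeholder path

  - job_name: nextcloud
    static_configs:
      - targets: [localhost]
        labels:
          job: nextcloud
          __path__: /var/log/nextcloud/*.log    # placeholder path

Grafana queries Loki by those job labels, which is what makes correlating a service's logs with its Prometheus metrics straightforward later on.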

Challenges

1. Grafana showed logs but no metrics
Root cause: Prometheus targets weren’t reachable after moving from localhost to Tailnet hostnames.

2. TLS verification issues in Prometheus
Solved by updating Caddy’s certificates and confirming Prometheus scrape configs pointed to HTTPS endpoints.
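
Concretely, the parts of the scrape config that matter for this are the scheme and, if the certificate chain isn’t already trusted, a tls_config block. A minimal sketch, reusing the placeholder hostname from the config highlights below:

scrape_configs:
  - job_name: "django"
    scheme: https                  # scrape over TLS instead of plain HTTP
    metrics_path: /metrics
    tls_config:
      # Only needed if Prometheus doesn't already trust the CA behind Caddy's
      # certificate; the name must match what the certificate presents.
      server_name: X.tail.ts.net
    static_configs:
      - targets: ["X.tail.ts.net:8000"]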

3. Cross-service routing
Caddy needed to handle routes like /metrics, /api/schema, and /api/* correctly between Django and Nextcloud.
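
One way to keep those routes from colliding is to give each service its own site block and port, matching the scrape targets below. This is a sketch of that layout, not my production Caddyfile; the upstream addresses are placeholders:

# Caddy terminates TLS for each service on its own port.
X.tail.ts.net:8000 {
    # Django: /metrics for Prometheus, /api/schema and /api/* for clients
    reverse_proxy 127.0.0.1:8001
}

X.tail.ts.net:8080 {
    # Nextcloud, including its own /metrics endpoint
    reverse_proxy 127.0.0.1:8082
}

Keeping the services on separate ports means both can expose /metrics without the paths clashing; if they shared one site block, Caddy’s handle directives with path matchers would have to split /api/* from the rest.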

Config Highlights

Here’s a simplified Prometheus scrape config example:

scrape_configs:
  - job_name: "django"
    metrics_path: /metrics
    static_configs:
      - targets: ["X.tail.ts.net:8000"]

  - job_name: "nextcloud"
    metrics_path: /metrics
    static_configs:
      - targets: ["X.tail.ts.net:8080"]

Both routes sit behind Caddy, which handles TLS termination using trusted Tailnet certificates.

Results
Once Prometheus started scraping successfully, Grafana dashboards came alive.

[Grafana example dashboard]

Now I can:

  • Correlate logs and metrics per request

  • Track uptime and performance trends

  • Visualize distributed system behavior across all nodes

It feels like operating my own mini control plane — distributed, secure, and explainable.

Next Steps

  • Add distributed tracing (OpenTelemetry)

  • Define Prometheus alert rules for critical endpoints (a first sketch follows after this list)

  • Automate observability config rollout via CI/CD
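
For the alert-rules item, here is roughly where I’d start: a minimal Prometheus rule file that fires when any scrape target disappears. The group name, severity, and five-minute window are placeholders, not settled values:

groups:
  - name: availability
    rules:
      - alert: TargetDown
        expr: up == 0              # any target Prometheus can no longer scrape
        for: 5m                    # placeholder duration before the alert fires
        labels:
          severity: critical
        annotations:
          summary: "{{ $labels.job }} on {{ $labels.instance }} has been unreachable for 5 minutes"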

Key Takeaway
Observability isn’t an add-on — it’s the nervous system of your infrastructure.
When your servers start talking, you start listening differently.
