I’ve been obsessed with the Quantified Self movement for over five years. Between my Garmin Fenix for trail running and my Oura Ring for sleep tracking, I have accumulated a mountain of health data. But here’s the problem: Wearable data is a hot mess. 💩
Garmin gives you granular FIT files or messy CSVs, and Oura provides JSON exports with completely different timestamp formats and missing values. If you want to answer a simple question like "How does my training load affect my deep sleep over time?", you're stuck in Excel hell.
In this tutorial, we're going to build a high-performance personal health OLAP engine using DuckDB, Python, and Evidence.dev. We'll do the data engineering right at the edge, turning five years of "dirty" data into a clean, queryable goldmine.
Pro Tip: If you're looking for more production-ready data patterns or advanced analytics architectures, definitely check out the deep dives over at the WellAlly Blog.
🏗 The Architecture
We need a system that is fast, local, and requires zero infrastructure overhead. DuckDB is the perfect "in-process" OLAP database for this, and Evidence.dev allows us to build beautiful dashboards using just SQL and Markdown.
```mermaid
graph TD
    A[Oura Ring JSON] -->|Python Script| B(DuckDB Raw Layer)
    C[Garmin CSVs] -->|DuckDB read_csv_auto| B
    B --> D{Data Transformation}
    D -->|SQL Cleaning| E[Cleaned Parquet Files]
    E --> F[Evidence.dev Dashboard]
    F --> G[Health Insights 📈]
```
🛠 Prerequisites
Before we dive in, make sure you have the following installed:
- Python 3.10+
- DuckDB: `pip install duckdb`
- Evidence.dev: a static-site generator for data (Node.js based)
Step 1: Ingesting the "Dirty" Data
DuckDB's superpower is its ability to query JSON and CSV files directly without a formal "Import" step. Here is how we handle the Oura JSON and Garmin CSVs simultaneously.
```python
import duckdb
# Initialize our DuckDB local instance
con = duckdb.connect('health_data.db')
# 1. Ingest Oura JSON (Handling nested structures)
con.execute("""
CREATE OR REPLACE TABLE raw_oura AS
SELECT * FROM read_json_auto('data/oura_export/*.json');
""")
# 2. Ingest Garmin CSV (Handling inconsistent headers)
con.execute("""
CREATE OR REPLACE TABLE raw_garmin AS
SELECT * FROM read_csv_auto('data/garmin_export/*.csv', header=True);
""")
print("✅ Data successfully ingested into the staging area!")
```
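Before moving on to cleaning, it's worth checking what DuckDB actually inferred from those files. Here's a quick sanity-check sketch, assuming the two staging tables above were created without errors:

```python
import duckdb

con = duckdb.connect('health_data.db')

# Inspect the column names and types DuckDB inferred for each staging table
for table in ("raw_oura", "raw_garmin"):
    print(f"\n--- {table} ---")
    con.sql(f"DESCRIBE {table}").show()

# Row counts give a rough sense of how much history actually landed
print(con.sql("SELECT count(*) AS oura_rows FROM raw_oura").fetchone())
print(con.sql("SELECT count(*) AS garmin_rows FROM raw_garmin").fetchone())
```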
Step 2: Cleaning & Normalizing (The SQL Magic)
The Oura data uses ISO date strings, while Garmin might use localized timestamps. We need to normalize both sources into unified daily tables; let's start with heart rate. We'll use DuckDB's UNION ALL here, and SELECT * EXCLUDE is handy when you want everything except a few noisy columns (see the sketch after the query below).
```sql
-- Create a unified view of health metrics
CREATE OR REPLACE TABLE analytics_heart_rate AS
SELECT
strptime(summary_date, '%Y-%m-%d')::DATE as report_date,
'oura' as source,
average_hr as avg_bpm,
rmssd as hrv_ms
FROM raw_oura
WHERE average_hr IS NOT NULL
UNION ALL
SELECT
"Timestamp"::DATE as report_date,
'garmin' as source,
"Average Heart Rate" as avg_bpm,
NULL as hrv_ms -- Garmin export might not include HRV in this specific CSV
FROM raw_garmin;
```
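Two quick follow-ups, sketched under assumptions: the SELECT * EXCLUDE syntax mentioned above (the column names `score` and `period_id` are placeholders; substitute whatever noisy fields your own Oura export contains), and the COPY ... TO step that produces the "Cleaned Parquet Files" from the architecture diagram.

```python
import duckdb

con = duckdb.connect('health_data.db')

# SELECT * EXCLUDE: keep every column except the ones we don't want to carry
# forward. 'score' and 'period_id' are placeholder names for this sketch.
con.execute("""
    CREATE OR REPLACE TABLE clean_oura AS
    SELECT * EXCLUDE (score, period_id)
    FROM raw_oura;
""")

# Materialize the unified heart-rate table as Parquet for downstream tools
con.execute("""
    COPY analytics_heart_rate
    TO 'data/analytics_heart_rate.parquet' (FORMAT PARQUET);
""")
```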
Step 3: Visualizing with Evidence.dev
Now that we have a clean health_data.db, we can use Evidence.dev to build a dashboard. Evidence is a game-changer because it treats SQL as a first-class citizen in Markdown.
In your index.md for Evidence:
````markdown
# 🏃‍♂️ Personal Health Dashboard

Using DuckDB to process 5 years of biometric trends.

```sql heart_rate_trends
select
    report_date,
    avg(daily_avg_bpm) over (
        order by report_date
        rows between 6 preceding and current row
    ) as rolling_avg_bpm
from (
    select report_date, avg(avg_bpm) as daily_avg_bpm
    from analytics_heart_rate
    group by 1
) daily
order by report_date
```

<LineChart
    data={heart_rate_trends}
    x=report_date
    y=rolling_avg_bpm
    title="Resting Heart Rate Over Time"
/>
````
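That's the dashboard, but remember the question from the intro: training load vs. deep sleep. DuckDB's corr() aggregate makes that a one-liner. The sketch below assumes you've built two more daily tables the same way we built analytics_heart_rate; `analytics_training` and `analytics_sleep` (and their columns) are hypothetical names, not tables created in this tutorial.

```python
import duckdb

con = duckdb.connect('health_data.db')

# Hypothetical tables for this sketch:
#   analytics_training(report_date, training_load)
#   analytics_sleep(report_date, deep_sleep_minutes)
correlation = con.sql("""
    SELECT corr(t.training_load, s.deep_sleep_minutes) AS load_vs_deep_sleep
    FROM analytics_training t
    JOIN analytics_sleep s USING (report_date)
""").fetchone()[0]

print(f"Correlation between training load and deep sleep: {correlation:.2f}")
```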
🥑 Why this Stack?
- DuckDB is insanely fast: it can process millions of rows of wearable data in milliseconds on your laptop (quick timing sketch after this list).
- Zero-Ops: no Docker, no Postgres, no cloud bills. Just a single `.db` file.
- SQL-centric: you don't need to learn complex BI tools. If you know SQL, you can build a full-stack data app.
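If you want to check that speed claim on your own data, here's a rough timing sketch against the table we built in Step 2:

```python
import time
import duckdb

con = duckdb.connect('health_data.db')

# Time a simple aggregation over the unified heart-rate table
start = time.perf_counter()
rows = con.sql("""
    SELECT source, avg(avg_bpm) AS mean_bpm
    FROM analytics_heart_rate
    GROUP BY source
""").fetchall()
elapsed_ms = (time.perf_counter() - start) * 1000
print(f"Aggregated {len(rows)} groups in {elapsed_ms:.1f} ms")
```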
For those of you looking to scale these patterns to enterprise-level data platforms—perhaps moving from local DuckDB to a cloud-based Snowflake or BigQuery environment—the team at WellAlly.tech/blog has some incredible resources on Data Engineering best practices and scalable ETL pipelines.
Conclusion
Quantifying yourself shouldn't be a chore. By using DuckDB for the heavy lifting and Evidence.dev for the presentation, you can turn years of messy health data into actionable insights in a single afternoon.
What are you tracking? Whether it's sleep, steps, or even your coffee intake, try throwing it into a DuckDB instance and see what patterns emerge! 🚀
Feel free to drop a comment below if you have questions about the SQL transformations or need help setting up Evidence.dev! 👇