TimescaleDB Outperforms MongoDB for JSON Logs (100M Document Benchmark)

When building Logtide, we needed to store millions of JSON log documents. The "obvious" choice seemed to be MongoDB — it's a document database designed for JSON, right?

We chose TimescaleDB instead. Here's why.

The "Boring Tech" Philosophy

We follow a simple rule: use proven, stable technology unless you have a specific reason not to.

TimescaleDB is PostgreSQL (battle-tested for 25+ years) with automatic time-series optimizations. It's reliable, well-documented, and deploys anywhere PostgreSQL runs.

Adding MongoDB would mean:

  • Another database to manage
  • Another skill set for the team
  • Another backup strategy
  • Another monitoring system

We wanted to keep our stack simple. TimescaleDB. That's it.

But we needed proof it could handle our workload.

The Benchmark Setup

I loaded 100 million JSON log documents into both databases to compare:

  • Write throughput (logs/second)
  • Query performance (complex filters)
  • Storage efficiency (compression)
  • Resource usage (RAM, CPU)

Test Data Schema

Real production log structure:

{
  "timestamp": "2026-02-16T10:30:45.123Z",
  "level": "error",
  "service": "api-gateway",
  "message": "Database connection timeout",
  "metadata": {
    "user_id": "usr_abc123",
    "request_id": "req_xyz789",
    "duration_ms": 5234,
    "endpoint": "/api/v1/users",
    "status_code": 500
  },
  "tags": ["production", "critical"],
  "context": {
    "environment": "prod-eu-west-1",
    "version": "v2.3.1"
  }
}

TimescaleDB Schema (PostgreSQL Extension)

-- Create hypertable (automatic time-based partitioning)
CREATE TABLE logs (
  id BIGSERIAL,
  time TIMESTAMPTZ NOT NULL,
  level TEXT NOT NULL,
  service TEXT NOT NULL,
  message TEXT,
  metadata JSONB,  -- JSON documents
  tags TEXT[],
  context JSONB,
  project_id UUID NOT NULL
);

-- Convert to hypertable (TimescaleDB magic)
SELECT create_hypertable('logs', 'time', chunk_time_interval => INTERVAL '1 day');

-- GIN indexes for JSONB queries
CREATE INDEX idx_logs_metadata_gin ON logs USING GIN (metadata);
CREATE INDEX idx_logs_context_gin ON logs USING GIN (context);

-- Composite index for common filters
CREATE INDEX idx_logs_project_service_time 
  ON logs (project_id, service, time DESC);

-- Automatic compression (90% space reduction)
ALTER TABLE logs SET (
  timescaledb.compress,
  timescaledb.compress_segmentby = 'project_id, service',
  timescaledb.compress_orderby = 'time DESC'
);

-- Compress data older than 1 day
SELECT add_compression_policy('logs', INTERVAL '1 day');

Key differences from plain PostgreSQL:

  • Hypertables: Automatic partitioning by time (daily chunks)
  • Compression: Columnar compression (like Parquet) on old data
  • Chunk management: Drop old chunks instantly (vs slow DELETE queries)

MongoDB Schema

db.createCollection("logs", {
  validator: {
    $jsonSchema: {
      required: ["timestamp", "level", "service"],
      properties: {
        timestamp: { bsonType: "date" },
        level: { bsonType: "string" },
        service: { bsonType: "string" },
        message: { bsonType: "string" },
        metadata: { bsonType: "object" },
        tags: { bsonType: "array" },
        context: { bsonType: "object" }
      }
    }
  }
});

// Indexes
db.logs.createIndex({ "timestamp": -1 });
db.logs.createIndex({ "service": 1, "timestamp": -1 });
db.logs.createIndex({ "metadata.user_id": 1 });

Benchmark Results

1. Write Throughput

Test: Insert 100M logs as fast as possible, batched (1000 logs per batch).

Database      Logs/Second   Time to 100M
TimescaleDB   18,500        1h 30m
MongoDB       14,200        1h 57m

TimescaleDB wins by 30% on writes.

Why? Hypertable chunking. TimescaleDB automatically partitions data into daily chunks, so inserts hit small, recent tables instead of one giant table. MongoDB's WiredTiger engine is fast, but PostgreSQL's COPY protocol for bulk inserts combined with chunking is hard to beat.
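
For context, here is roughly what one batched write looked like. The row is the sample document from earlier, the project_id is a placeholder, and the COPY variant is the higher-throughput path:

-- Sketch of one batched INSERT (the benchmark used 1,000 rows per statement);
-- project_id below is a placeholder, not a real Logtide value
INSERT INTO logs (time, level, service, message, metadata, tags, context, project_id)
VALUES (
  '2026-02-16T10:30:45.123Z',
  'error',
  'api-gateway',
  'Database connection timeout',
  '{"user_id": "usr_abc123", "request_id": "req_xyz789", "duration_ms": 5234, "endpoint": "/api/v1/users", "status_code": 500}',
  ARRAY['production', 'critical'],
  '{"environment": "prod-eu-west-1", "version": "v2.3.1"}',
  '11111111-1111-1111-1111-111111111111'
);

-- Even faster: stream batches through psql's COPY
-- \copy logs (time, level, service, message, metadata, tags, context, project_id) FROM 'batch.csv' WITH (FORMAT csv)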

Example: inserting logs for February 16, 2026 only touches that day's chunk (TimescaleDB names them like _hyper_1_47_chunk), not the entire 100M-row table.
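
You can see the chunking for yourself through TimescaleDB's information views (chunk names and time ranges will differ on your deployment):

-- List the chunks behind the logs hypertable and the time range each covers
SELECT chunk_name, range_start, range_end, is_compressed
FROM timescaledb_information.chunks
WHERE hypertable_name = 'logs'
ORDER BY range_start DESC
LIMIT 5;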

2. Query Performance

Test 1: Simple filter (last 24 hours, service = "api-gateway")

-- TimescaleDB
SELECT * FROM logs 
WHERE time > NOW() - INTERVAL '24 hours'
  AND service = 'api-gateway'
ORDER BY time DESC 
LIMIT 100;
// MongoDB
db.logs.find({
  timestamp: { $gte: new Date(Date.now() - 86400000) },
  service: "api-gateway"
}).sort({ timestamp: -1 }).limit(100);
Database      P50 Latency   P95 Latency   P99 Latency
TimescaleDB   42ms          68ms          95ms
MongoDB       58ms          103ms         187ms

TimescaleDB's chunk exclusion is the secret. The query planner knows "last 24 hours" = only scan yesterday's and today's chunks (2 chunks), not all 100 chunks. MongoDB scans indexes across the entire collection.
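
If you want to verify chunk exclusion on your own data, EXPLAIN shows exactly which chunks the planner touches (plan output varies by version):

-- Only the chunks overlapping the last 24 hours appear in the plan;
-- the rest are pruned before any rows are read
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM logs
WHERE time > NOW() - INTERVAL '24 hours'
  AND service = 'api-gateway'
ORDER BY time DESC
LIMIT 100;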

Test 2: Complex JSONB query (nested field filter)

-- TimescaleDB
SELECT * FROM logs 
WHERE metadata->>'user_id' = 'usr_abc123'
  AND (metadata->>'duration_ms')::int > 5000
  AND level = 'error'
ORDER BY time DESC 
LIMIT 100;
// MongoDB
db.logs.find({
  "metadata.user_id": "usr_abc123",
  "metadata.duration_ms": { $gt: 5000 },
  level: "error"
}).sort({ timestamp: -1 }).limit(100);
Database      P50 Latency   P95 Latency   P99 Latency
TimescaleDB   78ms          142ms         223ms
MongoDB       94ms          198ms         341ms

PostgreSQL's GIN indexes on JSONB (inherited by TimescaleDB) are insanely fast. MongoDB's multi-key indexes are good, but the PostgreSQL query planner optimizes JSONB queries aggressively.
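
One nuance if you reproduce this: a default jsonb_ops GIN index accelerates containment (@>) and key-existence (?) operators rather than ->> equality, so the filter is often written in containment form or backed by a btree expression index on the hot key. A sketch, not part of the original benchmark schema:

-- Containment form of the same filter; this shape is served directly by the GIN index
SELECT * FROM logs
WHERE metadata @> '{"user_id": "usr_abc123"}'
  AND (metadata->>'duration_ms')::int > 5000
  AND level = 'error'
ORDER BY time DESC
LIMIT 100;

-- Alternative: a btree expression index for frequent point lookups on one key
CREATE INDEX idx_logs_metadata_user_id ON logs ((metadata->>'user_id'));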

Test 3: Aggregation (count errors per service, last 7 days)

-- TimescaleDB
SELECT service, COUNT(*) 
FROM logs 
WHERE time > NOW() - INTERVAL '7 days'
  AND level = 'error'
GROUP BY service 
ORDER BY COUNT(*) DESC;
// MongoDB
db.logs.aggregate([
  { $match: { 
    timestamp: { $gte: new Date(Date.now() - 604800000) },
    level: "error" 
  }},
  { $group: { _id: "$service", count: { $sum: 1 } }},
  { $sort: { count: -1 }}
]);
Database      Query Time
TimescaleDB   1.2s
MongoDB       2.8s

TimescaleDB leverages PostgreSQL's parallel query execution (max_parallel_workers_per_gather=4) + chunk exclusion. Only 7 chunks scanned instead of 100.
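
The max_parallel_workers_per_gather=4 value above is the setting assumed for this run. If you want the same aggregation bucketed per day for a dashboard, TimescaleDB's time_bucket() is the usual idiom; a sketch:

-- Session-level parallelism setting quoted above (tune per machine)
SET max_parallel_workers_per_gather = 4;

-- Same error count, bucketed per day with time_bucket()
SELECT time_bucket('1 day', time) AS day, service, COUNT(*) AS errors
FROM logs
WHERE time > NOW() - INTERVAL '7 days'
  AND level = 'error'
GROUP BY day, service
ORDER BY day, errors DESC;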

3. Storage Efficiency

Test: 100M logs stored, with compression enabled.

Database               Raw Size   Compressed Size   Compression Ratio
TimescaleDB            150 GB     15 GB             10x
MongoDB (WiredTiger)   148 GB     42 GB             3.5x

TimescaleDB compression is the killer feature. It uses columnar compression on hypertable chunks, similar to Parquet files.

When you compress a chunk, TimescaleDB:

  1. Converts row-based storage → column-based
  2. Applies compression algorithms per column (different algos for text, numbers, JSONB)
  3. Stores compressed chunk separately
  4. Query planner transparently decompresses on-the-fly

MongoDB's WiredTiger compression is good (snappy/zlib block compression), but it operates on row-oriented documents. TimescaleDB's columnar approach, plus TOAST (The Oversized-Attribute Storage Technique) for large JSONB values, is hard to beat.

Our compression policy:

SELECT add_compression_policy('logs', INTERVAL '1 day');

Logs older than 1 day are automatically compressed. Query performance on compressed data? Still fast (50-150ms for most queries) because the query planner only decompresses relevant chunks.
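
To check what the compression policy is actually saving, TimescaleDB ships a stats function (a sketch; exact column names can vary slightly between versions):

-- Before/after sizes across all compressed chunks of the hypertable
SELECT
  pg_size_pretty(before_compression_total_bytes) AS before,
  pg_size_pretty(after_compression_total_bytes)  AS after,
  number_compressed_chunks
FROM hypertable_compression_stats('logs');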

4. Resource Usage

Test: 100M logs, 10 concurrent query clients.

Database      RAM Usage   CPU Usage (avg)
TimescaleDB   8 GB        45%
MongoDB       12 GB       62%

TimescaleDB's shared_buffers (2 GB) + OS page cache made it more memory-efficient. MongoDB's WiredTiger cache was set to 6 GB, but it still used more overall RAM.
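
For reference, the memory settings behind the TimescaleDB side, expressed as ALTER SYSTEM statements. Only shared_buffers = 2GB comes from the setup above; the other values are illustrative for an 8 GB machine:

-- shared_buffers matches the value quoted above; the rest are assumptions
ALTER SYSTEM SET shared_buffers = '2GB';         -- requires a restart
ALTER SYSTEM SET effective_cache_size = '6GB';   -- planner hint, reload is enough
ALTER SYSTEM SET work_mem = '64MB';              -- per sort/hash operation
SELECT pg_reload_conf();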

Chunk-based architecture helps: Only active chunks (recent logs) stay hot in memory. Old compressed chunks are rarely accessed.

Where MongoDB Wins

I'm not saying TimescaleDB is always better. MongoDB has real advantages:

1. Horizontal Scaling (Sharding)

MongoDB's sharding is production-ready out of the box. TimescaleDB's multi-node feature (distributed hypertables) has been deprecated, so scaling out generally means read replicas or application-level partitioning, which is more complex than MongoDB's built-in sharding.

If you need to scale beyond a single node (100M+ logs/day), MongoDB's architecture is designed for this. TimescaleDB single-node can handle a lot (we're doing 500k/day on 8GB RAM), but MongoDB wins for massive scale.

2. Schema Flexibility

MongoDB truly doesn't care about schema. TimescaleDB (PostgreSQL) requires a table definition. If your JSON structure is wildly unpredictable, MongoDB might be easier.

That said, JSONB columns give you schema flexibility where you need it. We define fixed columns (time, level, service) and dump everything else into metadata JSONB.

3. Change Streams

MongoDB's change streams (real-time notifications) are built-in. TimescaleDB uses PostgreSQL's LISTEN/NOTIFY, which works but is more limited.

However, TimescaleDB has continuous aggregates — pre-computed materialized views that refresh automatically. For dashboards, this is often better than change streams.

Where TimescaleDB Wins

1. Time-Series Superpowers

Hypertables, compression, continuous aggregates, retention policies — all automatic.

Example: Delete logs older than 90 days:

SELECT add_retention_policy('logs', INTERVAL '90 days');

TimescaleDB drops entire chunks (an instant metadata operation). MongoDB's closest equivalent is a TTL index, which still removes documents one by one in the background and churns every index along the way.
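
The retention policy is just a scheduled wrapper around drop_chunks, which you can also run by hand:

-- Drop every chunk whose data is entirely older than 90 days (metadata-only operation)
SELECT drop_chunks('logs', older_than => INTERVAL '90 days');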

Continuous aggregates:

CREATE MATERIALIZED VIEW logs_hourly_stats
WITH (timescaledb.continuous) AS
SELECT 
  time_bucket('1 hour', time) AS bucket,
  service,
  level,
  COUNT(*) as count
FROM logs
GROUP BY bucket, service, level;

Dashboard queries hit the pre-computed aggregate (10-50x faster).
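
To keep the view refreshing on a schedule, a refresh policy is added on top. The offsets and interval below are illustrative, not Logtide's actual values:

-- Refresh the most recent hours of the aggregate every 30 minutes
SELECT add_continuous_aggregate_policy('logs_hourly_stats',
  start_offset      => INTERVAL '3 hours',
  end_offset        => INTERVAL '1 hour',
  schedule_interval => INTERVAL '30 minutes');

-- Dashboards then read the aggregate instead of the raw hypertable
SELECT bucket, service, level, count
FROM logs_hourly_stats
WHERE bucket > NOW() - INTERVAL '24 hours'
ORDER BY bucket;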

2. Query Flexibility

SQL is more powerful than MongoDB's query language. Complex joins, window functions, CTEs, lateral joins — TimescaleDB can do it all.

Example: "Show me errors with their previous 5 logs for context":

WITH context_logs AS (
  SELECT *, 
    LAG(message, 1) OVER (PARTITION BY service ORDER BY time) AS prev_1,
    LAG(message, 2) OVER (PARTITION BY service ORDER BY time) AS prev_2
  FROM logs
  WHERE time > NOW() - INTERVAL '1 day'
)
SELECT * FROM context_logs WHERE level = 'error';

MongoDB's $setWindowFields (added in 5.0) can approximate this, but it is far more verbose than plain SQL window functions.

3. ACID Guarantees

TimescaleDB inherits PostgreSQL's full ACID transactions. MongoDB has supported multi-document transactions since 4.0, but PostgreSQL's transaction handling has decades more production mileage and is rock-solid.

4. Operational Simplicity

TimescaleDB is PostgreSQL. One extension, one database.

We use TimescaleDB for:

  • User data (relational tables)
  • Log data (hypertables + JSONB)
  • Session storage (regular tables)
  • Job queues (pg_cron, graphile-worker)
  • Metrics (continuous aggregates)

Everything in one database. One backup strategy. One monitoring system.

5. Cost (Cloud Hosting)

Managed PostgreSQL (RDS, Cloud SQL, Azure Database) + TimescaleDB Cloud is significantly cheaper than MongoDB Atlas at scale.

For 500 GB storage + 8 GB RAM:

  • AWS RDS PostgreSQL: ~$150/month
  • MongoDB Atlas M30: ~$500/month

Real-World Production Numbers (Logtide)

Our current deployment (TimescaleDB):

  • 30 million logs stored (growing)
  • 500k logs/day ingestion rate
  • P95 query latency: 50ms
  • Storage: 15 GB (compressed from 150 GB, 10x ratio)
  • Server: 4 vCPU, 8 GB RAM (single node)
  • Retention: 90 days automatic (drop_chunks policy)
  • Compression: Daily (1 day old chunks auto-compress)

TimescaleDB handles this effortlessly. We haven't needed to think about sharding, scaling, or performance tuning beyond the initial setup.

The Decision Matrix

Choose MongoDB if:

  • You need horizontal sharding NOW (multi-region, 100M+ docs/day)
  • Schema is truly unpredictable (no structure at all)
  • You're already a MongoDB shop with expertise

Choose TimescaleDB if:

  • You're storing time-series data (logs, metrics, events)
  • You want operational simplicity (PostgreSQL ecosystem)
  • You need complex queries (SQL > MongoDB query language)
  • Storage cost matters (10x compression vs 3.5x)
  • You value data consistency (ACID transactions)
  • You want automatic retention/compression (set-and-forget policies)

Conclusion

TimescaleDB (PostgreSQL + time-series superpowers) beats MongoDB for JSON logs.

We chose it for Logtide because:

  1. Time-series native (hypertables, compression, retention)
  2. Query power (SQL + JSONB operators)
  3. Operational simplicity (one database, PostgreSQL ecosystem)
  4. Storage efficiency (10x compression ratio)
  5. Cost (managed hosting is cheaper)

MongoDB is great for general-purpose document storage, but TimescaleDB is purpose-built for time-series JSON.

If your data has timestamps and you're querying by time ranges, TimescaleDB is the obvious choice.


Try it yourself:

# Logtide (TimescaleDB + JSONB)
git clone https://github.com/logtide-dev/logtide
cd logtide/docker
docker compose up -d

TimescaleDB is automatically configured in the Docker setup. Check docker-compose.yml for the PostgreSQL + TimescaleDB image configuration.

Full benchmark scripts: github.com/logtide-dev/logtide/tree/main/benchmarks


What's your experience with TimescaleDB vs MongoDB? I'd love to hear your benchmarks in the comments.
