Monitoring ClickHouse Inserts with Datadog Agent

Inserts in ClickHouse are easy to get wrong: frequent small inserts create many small parts and can trigger the notorious "too-many-parts" error. In this post, we discuss how to leverage the Datadog Agent together with the Datadog-ClickHouse integration to improve monitoring of insert operations.

Why

The existing integration provides many metrics, but they lack table-level granularity. For example, you cannot filter metrics by table because there is no table tag.

clickhouse.query.insert.count{*} by {table} # This won't work

(Screenshot: list of available tags)

Fortunately, the integration supports custom SQL queries, which Datadog can ingest as additional metrics. Since these are emitted through the Datadog-ClickHouse integration, they are treated as integration metrics and won't be charged as custom metrics.

We can leverage this feature to create custom SQL queries, emit them as extra metrics, and use them for visualization and alerting in Datadog.

SQL Queries

We can use these queries to monitor insert performance.

Initial Part Size

Tracks the creation of new parts after inserts.

SELECT
  query_id,
  part_type,
  rows,
  size_in_bytes,
  bytes_uncompressed
FROM system.part_log
WHERE
  database = 'my-db-name'
  AND table = 'my-table-name'
  AND event_type = 'NewPart'
  AND event_time > (now() - toIntervalSecond(60))

Example output

| query_id        | part_type | rows      | size_in_bytes  | bytes_uncompressed |
|-----------------|-----------|-----------|----------------|--------------------|
| 20260214_001    | Wide      | 1000000   | 524288000      | 1048576000         |
| 20260214_002    | Wide      | 2500000   | 1310720000     | 2621440000         |
| 20260214_003    | Compact   | 500000    | 104857600      | 209715200          |
| 20260214_004    | Wide      | 3000000   | 1572864000     | 3145728000         |
  • rows (Rows per insert)

    Measures how many rows were inserted in each part.

    Helps evaluate batch efficiency: very small batches → more overhead, slower throughput.

  • size_in_bytes (Compressed part size)

    ClickHouse splits data into parts.

    Larger parts compress better, but frequent small inserts create many parts and can lead to the notorious "too-many-parts" error.

  • bytes_uncompressed (Uncompressed part size)

    Used to calculate the compression ratio (bytes_uncompressed / size_in_bytes); see the query sketch after this list.

    Helps monitor storage efficiency and disk I/O impact, especially for large inserts.
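
To sanity-check the ratio before wiring it into Datadog, you can compute it directly in ClickHouse. A minimal sketch, reusing the placeholder names and filters from the query above:

SELECT
  query_id,
  rows,
  bytes_uncompressed / size_in_bytes AS compression_ratio
FROM system.part_log
WHERE
  database = 'my-db-name'
  AND table = 'my-table-name'
  AND event_type = 'NewPart'
  AND event_time > (now() - toIntervalSecond(60))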

Insert queries

Tracks insert execution performance for completed queries.

SELECT
  query_id,
  arrayJoin(tables) AS table,
  query_duration_ms,
  read_rows,
  read_bytes,
  written_rows,
  written_bytes,
  memory_usage
FROM
  system.query_log
WHERE
  type = 'QueryFinish'
  AND query_kind = 'Insert'
  -- entries in system.query_log.tables are database-qualified
  AND has(tables, 'my-db-name.my-table-name')
  AND has(databases, 'my-db-name')
  AND event_time > (now() - toIntervalSecond(60))

Example output

| query_id          | table          | query_duration_ms | read_rows | read_bytes | written_rows | written_bytes | memory_usage |
|-------------------|----------------|-------------------|-----------|------------|--------------|---------------|--------------|
| 20260214_101523_1 | my-table-name  | 120               | 0         | 0          | 500000       | 157286400     | 10485760     |
| 20260214_101524_2 | my-table-name  | 115               | 0         | 0          | 600000       | 188743680     | 12582912     |
| 20260214_101525_3 | my-table-name  | 130               | 0         | 0          | 450000       | 141557760     | 9437184      |
  • query_duration_ms

    Time taken for each insert query.

    Essential to detect slow inserts, which can indicate bottlenecks.

  • written_rows / written_bytes

    Actual data inserted per query.

    Use this with query_duration_ms to calculate insert throughput (rows/sec or bytes/sec); a sketch follows this list.

  • read_rows / read_bytes

    Shows any reads performed by the insert (e.g., an INSERT ... SELECT or materialized-view processing).

    Important for understanding I/O impact on the system during large inserts.

  • memory_usage

    Memory consumed by the insert query.

    Helps detect inserts that could cause OOM or memory spikes, especially for huge batches.
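
To compute throughput directly in ClickHouse, here is a minimal sketch reusing the filters from the query above (placeholder names; the query_duration_ms > 0 guard avoids division by zero for sub-millisecond inserts):

SELECT
  query_id,
  written_rows / (query_duration_ms / 1000) AS rows_per_sec,
  written_bytes / (query_duration_ms / 1000) AS bytes_per_sec
FROM system.query_log
WHERE
  type = 'QueryFinish'
  AND query_kind = 'Insert'
  AND has(tables, 'my-db-name.my-table-name')
  AND has(databases, 'my-db-name')
  AND event_time > (now() - toIntervalSecond(60))
  AND query_duration_ms > 0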

Datadog Config

This configuration enables the Datadog Agent to monitor ClickHouse inserts by querying system tables and emitting metrics.

It tracks new parts (rows, size_in_bytes, bytes_uncompressed) to monitor batching and compression, and insert query performance (query_duration_ms, read_rows, written_rows, memory_usage) to monitor throughput and resource usage.

Metrics are collected at regular intervals, allowing real-time monitoring and alerts.

The custom_queries section is where you define the SQL queries that emit metrics for both initial parts and insert query performance. All the metrics discussed later are generated based on these custom queries.

Config File: /etc/datadog-agent/conf.d/clickhouse.d/conf.yaml

After editing the file, restart the Datadog Agent for the changes to take effect.


init_config:
instances:
  - server: localhost
    username: datadog
    password: password
    custom_queries:
        # Query to monitor initial parts
        - query: |
            SELECT
              query_id,
              part_type,
              rows,
              size_in_bytes,
              bytes_uncompressed
            FROM system.part_log
            WHERE
              database = 'my-db-name'
              AND table = 'my-table-name'
              AND event_type = 'NewPart'
              AND event_time > (now() - toIntervalSecond(60))
          columns:
            - name: query_id
              type: tag
            - name: part_type
              type: tag
            - name: rows
              type: gauge
            - name: size_in_bytes
              type: gauge
            - name: bytes_uncompressed
              type: gauge
          collection_interval: 60
          # Hardcoded way to inject tags
          tags:
            - "table:my-table-name"
          metric_prefix: clickhouse.my_custom_queries.insert.initial_parts
        # Query to monitor insert queries
        - query: |
            SELECT
              query_id,
              arrayJoin(tables) AS table, -- expand array to one row per table
              query_duration_ms,
              read_rows,
              read_bytes,
              written_rows,
              written_bytes,
              memory_usage
            FROM
              system.query_log
            WHERE
              type = 'QueryFinish'
              AND query_kind = 'Insert'
              AND has(tables, 'my-db-name.my-table-name') -- entries are database-qualified
              AND has(databases, 'my-db-name')
              AND event_time > (now() - toIntervalSecond(60))
          columns:
            - name: query_id
              type: tag
            # We inject the table name as a tag using the result from the SQL query
            - name: table
              type: tag
            - name: query_duration_ms
              type: gauge
            - name: read_rows
              type: gauge
            - name: read_bytes
              type: gauge
            - name: written_rows
              type: gauge
            - name: written_bytes
              type: gauge
            - name: memory_usage
              type: gauge
          collection_interval: 60
          metric_prefix: clickhouse.my_custom_queries.insert.queries

Metrics Generated

These are the metrics generated by the Datadog Agent based on the custom_queries configuration.

From the initial parts query

We get 3 metrics:

  • clickhouse.my_custom_queries.insert.initial_parts.rows
  • clickhouse.my_custom_queries.insert.initial_parts.size_in_bytes
  • clickhouse.my_custom_queries.insert.initial_parts.bytes_uncompressed

Each of these metrics will have the tags query_id, part_type, and table.

(Screenshot: tags on the new initial-parts metrics)
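
These tags let you slice dashboard queries; for example, to compare batch sizes across part types (metric and tag names as configured above):

avg:clickhouse.my_custom_queries.insert.initial_parts.rows{table:my-table-name} by {part_type}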

From the insert queries query

We get 6 metrics:

  • clickhouse.my_custom_queries.insert.queries.query_duration_ms
  • clickhouse.my_custom_queries.insert.queries.read_rows
  • clickhouse.my_custom_queries.insert.queries.read_bytes
  • clickhouse.my_custom_queries.insert.queries.written_rows
  • clickhouse.my_custom_queries.insert.queries.written_bytes
  • clickhouse.my_custom_queries.insert.queries.memory_usage

Each of these metrics will have the tags query_id and table.

(Screenshot: example tags on the new metrics)

These metrics help monitor insert performance, including query duration, resource usage, and data throughput.

Using the metrics in Datadog

Once the metrics are emitted, they can be visualized and monitored for insert performance.

Number of insert queries

Track how many insert queries are executed per table over time using:

count_nonzero(sum:clickhouse.my_custom_queries.insert.queries.written_rows{table:default.nyc_taxi, db:default} by {query_id})

(Screenshot: number of insert queries)

Use the count_nonzero function to visualize how many distinct insert queries ran in each interval.

Number of rows in initial parts

Monitor how many rows each initial part contains:

max:clickhouse.my_custom_queries.insert.initial_parts.rows{table:nyc_taxi} by {query_id}

(Screenshot: number of rows in each initial part)

Monitoring compression ratio

Compression ratio helps evaluate storage efficiency for inserts.

Metric A

max:clickhouse.my_custom_queries.insert.initial_parts.bytes_uncompressed{table:nyc_taxi} by {query_id}

Metric B

max:clickhouse.my_custom_queries.insert.initial_parts.size_in_bytes{table:nyc_taxi} by {query_id}

Formula

Compression Ratio: A / B
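
In a Datadog timeseries widget this maps to two queries plus a formula (a and b are the widget's default query aliases):

a: max:clickhouse.my_custom_queries.insert.initial_parts.bytes_uncompressed{table:nyc_taxi} by {query_id}
b: max:clickhouse.my_custom_queries.insert.initial_parts.size_in_bytes{table:nyc_taxi} by {query_id}
formula: a / b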

(Screenshot: uncompressed size, compressed size, and compression ratio)
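
The same pattern extends to the remaining metrics. For example, a monitor on insert latency (the threshold is hypothetical and depends on your workload) could watch:

max:clickhouse.my_custom_queries.insert.queries.query_duration_ms{table:default.nyc_taxi} by {query_id}

and alert when it stays above an acceptable insert duration.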

Conclusion

Using the Datadog Agent with custom SQL queries, you can gain table-level visibility into ClickHouse inserts and monitor batching efficiency, part sizes, compression ratios, and query performance in real time.

This approach is also easily extensible. You can define additional custom queries to monitor specific SQL statements.

For example, suppose your application serves an API request like:

GET /userActionsCount?user-id=10

which translates to a ClickHouse query such as:

SELECT user_id, COUNT(*) AS total_actions
FROM user_activity
WHERE user_id = 10
  AND event_date >= today() - 7
GROUP BY user_id

You can track its performance by querying the system.query_log table for that specific query pattern and sending the results to Datadog as metrics:

- query: |
    SELECT
      query_id,
      query_duration_ms,
      read_rows,
      read_bytes,
      memory_usage
    FROM system.query_log
    WHERE
      type = 'QueryFinish'
      -- note: LIKE matches the raw query text, so the pattern must match how the client formats the query
      AND query LIKE 'SELECT user_id, COUNT(*) AS total_actions FROM user_activity WHERE user_id = % AND event_date >= today() - 7 GROUP BY user_id'
      AND event_time > (now() - toIntervalSecond(60))
  columns:
    - name: query_id
      type: tag
    - name: query_duration_ms
      type: gauge
    - name: read_rows
      type: gauge
    - name: read_bytes
      type: gauge
    - name: memory_usage
      type: gauge
  collection_interval: 60
  tags:
    - api:user_actions_count  # all metrics will have this tag
  metric_prefix: clickhouse.my_custom_queries.application_queries

This lets you monitor the performance of API-generated queries, detect bottlenecks, and set up alerts, giving full observability over both inserts and application-level queries.
