Inserts in ClickHouse are sensitive to how they are batched: small, frequent inserts can easily trigger the notorious "too-many-parts" error. In this post, we discuss how to leverage the Datadog Agent together with the Datadog-ClickHouse integration to improve monitoring of insert operations.
Why
The existing integration provides many metrics, but they lack table-level granularity. For example, you cannot filter metrics by table because there is no table tag.
clickhouse.query.insert.count{*} by {table} # This won't work
Fortunately, the integration supports custom SQL queries, which Datadog can ingest as additional metrics. Since these are emitted through the Datadog-ClickHouse integration, they are treated as integration metrics and won't be charged as custom metrics.
We can leverage this feature to define custom SQL queries, emit their results as extra metrics, and use them for visualization and alerting in Datadog.
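For example, once the custom metrics described in the rest of this post are in place, a query like the following does work, because the table tag is attached by our own SQL (the metric name assumes the metric_prefix used in the config below):

sum:clickhouse.my_custom_queries.insert.queries.written_rows{*} by {table}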
SQL Queries
We can use these queries to monitor insert performance.
Initial Part Size
Tracks the creation of new parts after inserts.
SELECT
query_id,
part_type,
rows,
size_in_bytes,
bytes_uncompressed
FROM system.part_log
WHERE
database = 'my-db-name'
AND table = 'my-table-name'
AND event_type = 'NewPart'
AND event_time > (now() - toIntervalSecond(60))
Example output
| query_id | part_type | rows | size_in_bytes | bytes_uncompressed |
|-----------------|-----------|-----------|----------------|--------------------|
| 20260214_001 | Wide | 1000000 | 524288000 | 1048576000 |
| 20260214_002 | Wide | 2500000 | 1310720000 | 2621440000 |
| 20260214_003 | Compact | 500000 | 104857600 | 209715200 |
| 20260214_004 | Wide | 3000000 | 1572864000 | 3145728000 |
- rows (rows per insert): measures how many rows were inserted in each part. Helps evaluate batch efficiency: very small batches → more overhead, slower throughput.
- size_in_bytes (compressed part size): ClickHouse splits data into parts. Large parts improve compression, but too many parts can lead to the notorious "too-many-parts" error if inserts are frequent.
- bytes_uncompressed (uncompressed part size): used to calculate the compression ratio (bytes_uncompressed / size_in_bytes). Helps monitor storage efficiency and disk I/O impact, especially for large inserts (see the ad-hoc sketch after this list).
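If you want a quick look before wiring anything into Datadog, the following ad-hoc aggregation is a sketch you can run directly in ClickHouse (the one-hour window and per-minute grouping are arbitrary choices; the placeholder database and table names are the same as above). It reports the number of new parts per minute, the average rows per part, and the overall compression ratio:

SELECT
    toStartOfMinute(event_time) AS minute,
    count() AS new_parts,              -- parts created in that minute
    avg(rows) AS avg_rows_per_part,    -- small values hint at inefficient batching
    sum(bytes_uncompressed) / sum(size_in_bytes) AS compression_ratio
FROM system.part_log
WHERE
    database = 'my-db-name'
    AND table = 'my-table-name'
    AND event_type = 'NewPart'
    AND event_time > (now() - toIntervalHour(1))
GROUP BY minute
ORDER BY minute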
Insert queries
Tracks insert execution performance for completed queries.
SELECT
query_id,
arrayJoin(tables) AS table,
query_duration_ms,
read_rows,
read_bytes,
written_rows,
written_bytes,
memory_usage
FROM
system.query_log
WHERE
type = 'QueryFinish'
AND query_kind = 'Insert'
AND has(tables, 'my-db-name.my-table-name') -- entries in tables are qualified as database.table
AND has(databases, 'my-db-name')
AND event_time > (now() - toIntervalSecond(60))
Example output
| query_id          | table                    | query_duration_ms | read_rows | read_bytes | written_rows | written_bytes | memory_usage |
|-------------------|--------------------------|-------------------|-----------|------------|--------------|---------------|--------------|
| 20260214_101523_1 | my-db-name.my-table-name | 120               | 0         | 0          | 500000       | 157286400     | 10485760     |
| 20260214_101524_2 | my-db-name.my-table-name | 115               | 0         | 0          | 600000       | 188743680     | 12582912     |
| 20260214_101525_3 | my-db-name.my-table-name | 130               | 0         | 0          | 450000       | 141557760     | 9437184      |
- query_duration_ms: time taken by each insert query. Essential for detecting slow inserts, which can indicate bottlenecks.
- written_rows / written_bytes: actual data inserted per query. Use these together with query_duration_ms to calculate insert throughput in rows/sec or bytes/sec (see the sketch after this list).
- read_rows / read_bytes: any reads performed by the insert itself (for example, an INSERT ... SELECT reading from another table). Important for understanding the I/O impact on the system during large inserts.
- memory_usage: memory consumed by the insert query. Helps detect inserts that could cause OOM errors or memory spikes, especially for huge batches.
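As a sketch of the throughput calculation mentioned above, the following query derives rows/sec and bytes/sec per insert directly in ClickHouse (placeholder names as before; the duration guard simply skips sub-millisecond queries):

SELECT
    query_id,
    written_rows,
    written_bytes,
    query_duration_ms,
    written_rows / (query_duration_ms / 1000) AS rows_per_second,
    written_bytes / (query_duration_ms / 1000) AS bytes_per_second
FROM system.query_log
WHERE
    type = 'QueryFinish'
    AND query_kind = 'Insert'
    AND has(databases, 'my-db-name')
    AND has(tables, 'my-db-name.my-table-name')
    AND query_duration_ms > 0
    AND event_time > (now() - toIntervalSecond(60))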
Datadog Config
This configuration enables the Datadog Agent to monitor ClickHouse inserts by querying system tables and emitting metrics.
It tracks new parts (rows, size_in_bytes, bytes_uncompressed) to monitor batching and compression, and insert query performance (query_duration_ms, read_rows, written_rows, memory_usage) to monitor throughput and resource usage.
Metrics are collected at regular intervals, allowing real-time monitoring and alerts.
The custom_queries section is where you define the SQL queries that emit metrics for both initial parts and insert query performance. All the metrics discussed later are generated based on these custom queries.
Config File: /etc/datadog-agent/conf.d/clickhouse.d/conf.yaml
init_config:

instances:
  - server: localhost
    username: datadog
    password: password
    custom_queries:
      # Query to monitor initial parts
      - query: |
          SELECT
            query_id,
            part_type,
            rows,
            size_in_bytes,
            bytes_uncompressed
          FROM system.part_log
          WHERE
            database = 'my-db-name'
            AND table = 'my-table-name'
            AND event_type = 'NewPart'
            AND event_time > (now() - toIntervalSecond(60))
        columns:
          - name: query_id
            type: tag
          - name: part_type
            type: tag
          - name: rows
            type: gauge
          - name: size_in_bytes
            type: gauge
          - name: bytes_uncompressed
            type: gauge
        collection_interval: 60
        # Hardcoded way to inject tags
        tags:
          - "table:my-table-name"
        metric_prefix: clickhouse.my_custom_queries.insert.initial_parts
      # Query to monitor insert queries
      - query: |
          SELECT
            query_id,
            arrayJoin(tables) AS table, -- expand array to one row per table
            query_duration_ms,
            read_rows,
            read_bytes,
            written_rows,
            written_bytes,
            memory_usage
          FROM
            system.query_log
          WHERE
            type = 'QueryFinish'
            AND query_kind = 'Insert'
            -- entries in tables are qualified as database.table
            AND has(tables, 'my-db-name.my-table-name')
            AND has(databases, 'my-db-name')
            AND event_time > (now() - toIntervalSecond(60))
        columns:
          - name: query_id
            type: tag
          # We inject the table name as a tag using the result from the SQL query
          - name: table
            type: tag
          - name: query_duration_ms
            type: gauge
          - name: read_rows
            type: gauge
          - name: read_bytes
            type: gauge
          - name: written_rows
            type: gauge
          - name: written_bytes
            type: gauge
          - name: memory_usage
            type: gauge
        collection_interval: 60
        metric_prefix: clickhouse.my_custom_queries.insert.queries
Metrics Generated
These are the metrics generated by the Datadog Agent based on the custom_queries configuration.
From the initial parts query
We get 3 metrics:
- clickhouse.my_custom_queries.insert.initial_parts.rows
- clickhouse.my_custom_queries.insert.initial_parts.size_in_bytes
- clickhouse.my_custom_queries.insert.initial_parts.bytes_uncompressed
Each of these metrics carries the tags query_id, part_type, and table.
From the insert queries query
We get 6 metrics:
- clickhouse.my_custom_queries.insert.queries.query_duration_ms
- clickhouse.my_custom_queries.insert.queries.read_rows
- clickhouse.my_custom_queries.insert.queries.read_bytes
- clickhouse.my_custom_queries.insert.queries.written_rows
- clickhouse.my_custom_queries.insert.queries.written_bytes
- clickhouse.my_custom_queries.insert.queries.memory_usage
Each of these metrics carries the tags query_id and table.
These metrics help monitor insert performance, including query duration, resource usage, and data throughput.
Using the metrics in Datadog
Once the metrics are emitted, they can be visualized and monitored for insert performance.
Number of insert queries
Track how many insert queries are executed per table over time using
count_nonzero(sum:clickhouse.my_custom_queries.insert.queries.written_rows{table:default.nyc_taxi, db:default} by {query_id})
Use the count_nonzero function to count the distinct query_id series reporting data, i.e., how many insert queries ran in each interval.
Number of rows in initial parts
Monitor how many rows each initial part contains using
max:clickhouse.my_custom_queries.insert.initial_parts.rows{table:nyc_taxi} by {query_id}
Monitoring compression ratio
Compression ratio helps evaluate storage efficiency for inserts.
Metric A
max:clickhouse.my_custom_queries.insert.initial_parts.bytes_uncompressed{table:nyc_taxi} by {query_id}
Metric B
max:clickhouse.my_custom_queries.insert.initial_parts.size_in_bytes{table:nyc_taxi} by {query_id}
Formula
Compression Ratio: A / B
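The same metrics can also drive alerts. As a sketch (the 5-minute window and the 1000-row threshold are arbitrary assumptions, not recommendations), a Datadog metric monitor could flag inserts that create suspiciously small parts, an early warning sign for the "too-many-parts" error:

avg(last_5m):avg:clickhouse.my_custom_queries.insert.initial_parts.rows{table:nyc_taxi} < 1000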
Conclusion
Using the Datadog Agent with custom SQL queries, you can gain table-level visibility into ClickHouse inserts, monitor batching efficiency, part sizes, compression ratios, and query performance in real time.
This approach is also easily extensible. You can define additional custom queries to monitor specific SQL statements.
For example, if your application serves an API call like:
GET /userActionsCount?user-id=10
which translates to a ClickHouse query such as
SELECT user_id, COUNT(*) AS total_actions
FROM user_activity
WHERE user_id = 10
AND event_date >= today() - 7
GROUP BY user_id
You can track its performance by querying the system.query_log table for that specific query pattern and sending the resulting metrics to Datadog:
- query: |
    SELECT
      query_id,
      query_duration_ms,
      read_rows,
      read_bytes,
      memory_usage
    FROM system.query_log
    WHERE
      type = 'QueryFinish'
      -- the pattern must match the query text as it was logged, including whitespace
      AND query LIKE 'SELECT user_id, COUNT(*) AS total_actions FROM user_activity WHERE user_id = % AND event_date >= today() - 7 GROUP BY user_id'
      AND event_time > (now() - toIntervalSecond(60))
  columns:
    - name: query_id
      type: tag
    - name: query_duration_ms
      type: gauge
    - name: read_rows
      type: gauge
    - name: read_bytes
      type: gauge
    - name: memory_usage
      type: gauge
  collection_interval: 60
  tags:
    - api:user_actions_count # all metrics will have this tag
  metric_prefix: clickhouse.my_custom_queries.application_queries
This lets you monitor the performance of API-generated queries, detect bottlenecks, and set up alerts, giving full observability over both inserts and application-level queries.
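For instance, with the configuration above, the resulting duration metric can be graphed or alerted on by the api tag:

avg:clickhouse.my_custom_queries.application_queries.query_duration_ms{api:user_actions_count}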





