Apache Kafka has evolved far beyond a simple pub/sub messaging system. For modern data engineers and architects, "knowing Kafka" now means understanding the massive architectural shifts that have occurred in the last few years.
From the removal of ZooKeeper to the separation of compute and storage, the system has matured into a true cloud-native streaming platform. This post dives into five advanced topics that distinguish a standard Kafka implementation from a high-performance, enterprise-grade architecture.
- The KRaft Revolution: Kafka Without ZooKeeper
The dependency on ZooKeeper has long been a bottleneck for Kafka metadata management. KRaft (Kafka Raft) mode removes this dependency entirely, embedding a Raft-based controller quorum directly into the Kafka nodes.
Why It Matters
Scalability: In the ZooKeeper era, the number of partitions a cluster could support was constrained by how quickly the controller could load and propagate metadata from ZooKeeper. KRaft allows for millions of partitions per cluster because metadata is stored in an internal topic (__cluster_metadata) rather than an external system, enabling snapshotting and much faster controller failover.
Simpler Ops: You no longer need to manage two distinct distributed systems. A single process handles both data plane and control plane duties (though in production, roles are often separated).
```properties
# server.properties for a combined broker/controller node
process.roles=broker,controller
node.id=1
controller.quorum.voters=1@localhost:9093
listeners=PLAINTEXT://localhost:9092,CONTROLLER://localhost:9093
controller.listener.names=CONTROLLER
```
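Unlike a ZooKeeper-based broker, a KRaft node must have its log directories formatted with a cluster ID before first startup. A typical bootstrap sequence looks like this (paths are illustrative and depend on your installation layout):

```shell
# Generate a cluster ID, then format the node's log directories with it
KAFKA_CLUSTER_ID="$(bin/kafka-storage.sh random-uuid)"
bin/kafka-storage.sh format -t "$KAFKA_CLUSTER_ID" -c config/kraft/server.properties
bin/kafka-server-start.sh config/kraft/server.properties
```

Every node in the quorum must be formatted with the same cluster ID, or it will refuse to join.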
- Tiered Storage: Decoupling Compute from Storage
Historically, Kafka’s retention was limited by the physical disk space on your brokers. If you wanted to store months of data, you had to add more brokers (compute) just to get more disk (storage). This "coupled" architecture is expensive.
Tiered Storage breaks this link by offloading old log segments to cheap object storage (like AWS S3 or GCS) while keeping the "hot" tail of the log on fast local NVMe SSDs.
How It Works
Hot Tier: Recent data is written to the broker’s local disk.
Cold Tier: As segments roll, a background thread copies them to the remote object store.
Transparent Reads: Consumers are unaware of the tiering. If they request an old offset, the broker fetches the slice from S3 seamlessly.
```properties
# Enable remote (tiered) storage on the broker
remote.log.storage.system.enable=true
# Property prefixes passed through to the remote storage and metadata manager plugins
remote.log.storage.manager.impl.prefix=rsm.config.
remote.log.metadata.manager.impl.prefix=rlmm.config.

# Retention split: keep 24 hours on fast local SSD, 30 days in total
log.local.retention.ms=86400000  # 24 hours on local disk
log.retention.ms=2592000000      # 30 days overall; older segments served from S3
```
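Once the broker-level plumbing is in place, tiering can also be toggled per topic. A sketch (the topic name, partition count, and retention values are illustrative):

```shell
bin/kafka-topics.sh --bootstrap-server localhost:9092 --create --topic clickstream \
  --partitions 6 \
  --config remote.storage.enable=true \
  --config local.retention.ms=86400000 \
  --config retention.ms=2592000000
```

This lets you reserve tiering for high-volume topics while leaving low-volume ones entirely on local disk.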
- Exactly-Once Semantics (EOS): The Holy Grail
"At-least-once" delivery is the default, but it forces downstream applications to handle deduplication. Kafka's Exactly-Once Semantics (EOS) ensures that records are processed exactly one time, even in the event of broker failures or producer retries.
This is achieved through two mechanisms working in tandem:
Idempotent Producers: Guarantees that retries don't create duplicates in the log using sequence numbers.
Transactional API: Allows writing to multiple topics/partitions atomically.
The Transaction Flow
The producer initiates a transaction with a unique transactional.id.
Writes are sent to the log but marked as "uncommitted."
The Transaction Coordinator (a specialized broker thread) manages the two-phase commit protocol.
Consumers must be configured with isolation.level=read_committed to ignore aborted or open transactions.
```java
// Producer setup
props.put(ProducerConfig.TRANSACTIONAL_ID_CONFIG, "my-order-processor");
Producer<String, String> producer = new KafkaProducer<>(props);
producer.initTransactions();

try {
    producer.beginTransaction();
    // process data and send records
    producer.send(record);
    // atomically commit the consumer's offsets as part of the transaction
    producer.sendOffsetsToTransaction(offsets, consumer.groupMetadata());
    producer.commitTransaction();
} catch (ProducerFencedException | OutOfOrderSequenceException | AuthorizationException e) {
    // Fatal errors: another instance has taken over, or we cannot recover. Close the producer.
    producer.close();
} catch (KafkaException e) {
    // Transient errors: abort so the transaction can be retried cleanly.
    producer.abortTransaction();
}
```
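On the consuming side, read_committed filtering is a plain configuration switch. A minimal sketch (the group id is illustrative):

```properties
# consumer.properties
group.id=order-consumers
# skip records from aborted or still-open transactions
isolation.level=read_committed
# offsets are committed via sendOffsetsToTransaction, not auto-commit
enable.auto.commit=false
```

Auto-commit is disabled because committing offsets outside the transaction would reintroduce the duplicate-processing window EOS is meant to close.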
- Cluster Linking vs. MirrorMaker 2
Multi-region disaster recovery (DR) is a standard requirement. The traditional tool, MirrorMaker 2 (MM2), is essentially a Kafka Connect cluster that pulls from Source and pushes to Target. It works, but it's operationally heavy and introduces "offset translation" issues (offsets in Source ≠ offsets in Target).
Cluster Linking (available in Confluent Server and increasingly via KIPs in open source) offers a superior architecture: the destination cluster fetches partitions directly over the replication protocol, preserving offsets byte-for-byte, with no separate Connect cluster to operate.
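For contrast, an MM2 deployment is driven by a properties file like the following (cluster aliases and bootstrap addresses are illustrative):

```properties
# mm2.properties — one-way replication from 'primary' to 'dr'
clusters = primary, dr
primary.bootstrap.servers = primary-kafka:9092
dr.bootstrap.servers = dr-kafka:9092
primary->dr.enabled = true
primary->dr.topics = .*
# periodically translate committed consumer offsets to the target cluster
sync.group.offsets.enabled = true
```

Note the offset-translation machinery (`sync.group.offsets.enabled`) that MM2 needs precisely because source and target offsets diverge — the problem Cluster Linking avoids by design.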
- Tuning RocksDB for Kafka Streams
If you use Kafka Streams (or ksqlDB), your state is likely stored in RocksDB, an embedded key-value store. Its defaults are general-purpose, not tuned for the memory-constrained, containerized SSD environments most Kafka apps run in.
The Memory Problem
A common issue is the application crashing with OOM (Out Of Memory) because RocksDB’s off-heap memory usage is unconstrained.
Essential Tuning Parameters
To master stateful performance, you must provide a custom RocksDBConfigSetter:
Block Cache: Limit the memory used for reading uncompressed blocks.
Write Buffer (MemTable): Controls how much data is held in RAM before flushing to disk.
Compaction Style: Switch to Level compaction for read-heavy workloads or Universal for write-heavy ones.
```java
public static class CustomRocksDBConfig implements RocksDBConfigSetter {
    // A shared cache with a strict capacity limit to prevent off-heap OOM
    private static final org.rocksdb.Cache CACHE =
        new org.rocksdb.LRUCache(100 * 1024 * 1024, -1, /* strictCapacityLimit */ true);

    @Override
    public void setConfig(final String storeName, final Options options,
                          final Map<String, Object> configs) {
        // Route all block reads through the capped cache
        BlockBasedTableConfig tableConfig = new BlockBasedTableConfig();
        tableConfig.setBlockCache(CACHE);
        options.setTableFormatConfig(tableConfig);
        // Increase parallelism for background flushes and compactions
        options.setMaxBackgroundJobs(4);
    }

    @Override
    public void close(final String storeName, final Options options) {
        // The cache is shared across all stores, so it is not closed per store
    }
}
```
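The setter only takes effect once it is registered with the Streams application; in a properties file that is a single line (the package name is illustrative):

```properties
# streams.properties
rocksdb.config.setter=com.example.CustomRocksDBConfig
```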
Conclusion: The New Standard for Streaming Data
Apache Kafka has crossed the chasm from being a simple, high-throughput "pipe" to becoming the central nervous system of modern digital architecture. The features discussed here—KRaft, Tiered Storage, Exactly-Once Semantics, Cluster Linking, and RocksDB tuning—are not just incremental updates; they represent a fundamental shift in how we build data platforms.
By adopting these advanced patterns, you move your engineering team from "maintenance mode"—constantly fighting ZooKeeper flakes or disk capacity issues—to "innovation mode," where the focus is entirely on building resilient, real-time applications.



