Continuous Self-Balancing
In an online Apache Kafka® cluster, traffic fluctuations, Topic creation and deletion, and Broker failures and restarts occur constantly. These changes can skew the distribution of traffic across cluster nodes, wasting resources and undermining business stability. It is therefore essential to proactively reassign a Topic's partitions across nodes to keep traffic and data balanced.
Challenges Faced by Open Source Solutions
Apache Kafka has long struggled to achieve data self-balancing. The community offers two approaches:
The official Apache Kafka partition reassignment tool requires operations personnel to devise specific reassignment plans. For Kafka clusters with hundreds or thousands of nodes, it is nearly impossible to manually monitor the cluster state and create a comprehensive partition reassignment plan.
The community also offers third-party external plugins such as Cruise Control [1] to help generate reassignment plans. However, Apache Kafka's self-balancing involves a large number of variables (replica distribution, leader traffic distribution, node resource utilization, and so on), and the data synchronization it triggers causes resource contention and can take hours or even days. As a result, existing solutions are complex and slow to make decisions, and executing a rebalancing plan still depends on operations personnel reviewing it and monitoring it continuously, so these tools do not truly solve Apache Kafka's data self-balancing problem.
AutoMQ's Architectural Advantages
Thanks to its deep integration with cloud-native capabilities, AutoMQ reimplements Apache Kafka's underlying storage entirely on cloud object storage, upgrading from a Shared Nothing architecture to a Shared Storage architecture. This makes partition reassignment possible in seconds and greatly simplifies the factors a reassignment plan must consider:
Node disk resources no longer need to be considered.
Partition leader distribution and replica distribution no longer need to be considered.
Partition reassignment involves no data synchronization or copying.
This gives us the opportunity to build a lightweight, built-in automatic data balancing component into AutoMQ that continuously monitors cluster status and automatically reassigns partitions.
AutoMQ Data Self-Balancing Implementation
The figure above shows the architecture of AutoMQ's built-in Auto Balancing component. It collects cluster metrics, automatically generates partition reassignment plans, and continuously balances traffic across the cluster.
AutoMQ implements Apache Kafka's MetricsReporter interface to observe all of Kafka's built-in metrics. It periodically samples the metrics of interest (such as network ingress and egress traffic, CPU utilization, etc.), pre-aggregates them on the Broker side, serializes the aggregated metrics into a Kafka message, and sends it to a designated internal Topic.
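As a rough illustration of this broker-side pipeline, the minimal sketch below implements Kafka's MetricsReporter interface, samples a subset of metrics on a fixed schedule, and publishes a pre-aggregated payload to an internal Topic. The class name, metric filter, topic name (__auto_balancer_metrics), and plain-string payload are illustrative assumptions, not AutoMQ's actual implementation.

```java
// Minimal sketch of a broker-side metrics reporter. Metric filtering, the internal
// topic name, and the string payload are illustrative placeholders.
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.metrics.KafkaMetric;
import org.apache.kafka.common.metrics.MetricsReporter;

import java.util.List;
import java.util.Map;
import java.util.Properties;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class BrokerMetricsReporter implements MetricsReporter {
    // Metrics of interest, keyed by "group:name" (e.g. byte-rate style metrics).
    private final Map<String, KafkaMetric> tracked = new ConcurrentHashMap<>();
    private final ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
    private KafkaProducer<String, String> producer;

    @Override
    public void configure(Map<String, ?> configs) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder; a real reporter derives this from broker config
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        producer = new KafkaProducer<>(props);
    }

    @Override
    public void init(List<KafkaMetric> metrics) {
        metrics.forEach(this::metricChange);
        // Periodically sample, pre-aggregate, and publish to the internal topic.
        scheduler.scheduleAtFixedRate(this::report, 10, 10, TimeUnit.SECONDS);
    }

    @Override
    public void metricChange(KafkaMetric metric) {
        String key = metric.metricName().group() + ":" + metric.metricName().name();
        if (key.contains("byte-rate")) { // illustrative filter for network traffic metrics
            tracked.put(key, metric);
        }
    }

    @Override
    public void metricRemoval(KafkaMetric metric) {
        tracked.remove(metric.metricName().group() + ":" + metric.metricName().name());
    }

    private void report() {
        StringBuilder payload = new StringBuilder();
        tracked.forEach((key, metric) ->
                payload.append(key).append('=').append(metric.metricValue()).append(';'));
        producer.send(new ProducerRecord<>("__auto_balancer_metrics", payload.toString()));
    }

    @Override
    public void close() {
        scheduler.shutdownNow();
        producer.close();
    }
}
```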
The AutoMQ Controller maintains an in-memory cluster state model describing the current state of the cluster, including Broker status, Broker resource capacity, and the traffic of the Topic-Partitions hosted by each Broker. By listening to KRaft log events, the cluster state model can promptly detect changes in Broker and Topic-Partition status and update itself to stay consistent with the actual cluster state.
Meanwhile, the metrics collector in the AutoMQ Controller consumes messages from the internal Topic in real time, deserializes them into specific metrics, and writes them into the cluster state model, building all the prerequisite information needed for data self-balancing.
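A condensed sketch of this Controller-side state might look like the following: an in-memory cluster model keyed by Broker ID, kept up to date both by metadata events and by a collector loop that consumes the internal metrics Topic. The class shape, topic name, and payload format are assumptions for illustration only.

```java
// Condensed sketch of the Controller side: an in-memory cluster model updated by
// metadata events and by a collector consuming the internal metrics topic.
// Topic name and payload format ("brokerId=1;bytesInPerSec=41943040") are assumptions.
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

import java.time.Duration;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;
import java.util.concurrent.ConcurrentHashMap;

public class ClusterModel {
    // brokerId -> observed network-in rate (bytes/s); membership mirrors KRaft metadata events.
    private final Map<Integer, Double> brokerNetworkIn = new ConcurrentHashMap<>();

    public void onBrokerRegistered(int brokerId)   { brokerNetworkIn.putIfAbsent(brokerId, 0.0); }
    public void onBrokerUnregistered(int brokerId) { brokerNetworkIn.remove(brokerId); }
    public void updateBrokerLoad(int brokerId, double bytesInPerSec) { brokerNetworkIn.put(brokerId, bytesInPerSec); }
    public Map<Integer, Double> snapshot()         { return Map.copyOf(brokerNetworkIn); }

    // Metrics collector: consumes serialized broker metrics and refreshes the model.
    public static void runCollector(ClusterModel model) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder
        props.put("group.id", "auto-balancer-collector");
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("__auto_balancer_metrics"));
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
                for (ConsumerRecord<String, String> record : records) {
                    Map<String, String> fields = parse(record.value());
                    model.updateBrokerLoad(Integer.parseInt(fields.get("brokerId")),
                            Double.parseDouble(fields.get("bytesInPerSec")));
                }
            }
        }
    }

    private static Map<String, String> parse(String payload) {
        Map<String, String> fields = new HashMap<>();
        for (String kv : payload.split(";")) {
            String[] parts = kv.split("=", 2);
            if (parts.length == 2) fields.put(parts[0], parts[1]);
        }
        return fields;
    }
}
```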
The AutoMQ Controller's scheduler periodically takes snapshots of the cluster state model, identifies over-utilized and under-utilized Brokers against predefined "goals," and then attempts to reassign or swap partitions to bring traffic back into balance.
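The sketch below shows one such balancing pass under a single, simplified goal: keep each Broker's network-in traffic within a tolerance band around the cluster average. The data types, the 10% tolerance, and the greedy move selection are illustrative assumptions; the actual scheduler evaluates multiple goals and applies cooldowns between reassignments.

```java
// Minimal sketch of a single goal-based scheduling pass over a cluster model snapshot.
// Moves the hottest partitions off over-utilized brokers onto the coldest broker until
// every broker falls back inside the tolerance band. Thresholds are illustrative.
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class NetworkInGoalScheduler {
    record PartitionLoad(String topicPartition, int brokerId, double bytesInPerSec) {}
    record Move(String topicPartition, int fromBroker, int toBroker) {}

    // brokerLoad: snapshot of the cluster model (brokerId -> bytes/s).
    // partitions: per-partition traffic attributed to its current broker.
    static List<Move> plan(Map<Integer, Double> brokerLoad, List<PartitionLoad> partitions) {
        Map<Integer, Double> load = new HashMap<>(brokerLoad); // working copy, adjusted as moves are planned
        double mean = load.values().stream().mapToDouble(Double::doubleValue).average().orElse(0.0);
        double upper = mean * 1.10; // +/-10% around the mean counts as balanced (illustrative)
        List<Move> moves = new ArrayList<>();

        for (int hot : brokerLoad.keySet()) {
            // Consider the hottest partitions first so fewer moves are needed.
            List<PartitionLoad> candidates = partitions.stream()
                    .filter(p -> p.brokerId() == hot)
                    .sorted(Comparator.comparingDouble(PartitionLoad::bytesInPerSec).reversed())
                    .toList();
            for (PartitionLoad p : candidates) {
                if (load.get(hot) <= upper) break; // broker is no longer over-utilized
                int coldest = load.entrySet().stream()
                        .min(Map.Entry.comparingByValue())
                        .orElseThrow().getKey();
                if (coldest == hot || load.get(coldest) + p.bytesInPerSec() > upper) continue;
                moves.add(new Move(p.topicPartition(), hot, coldest));
                load.merge(hot, -p.bytesInPerSec(), Double::sum);
                load.merge(coldest, p.bytesInPerSec(), Double::sum);
            }
        }
        return moves;
    }
}
```

The moves produced by such a pass would then be handed to partition reassignment, which in AutoMQ involves no data copying and completes in seconds rather than hours.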
AutoMQ Data Self-Balancing Example
Consider a three-node AutoMQ cluster. In the first phase, each of the three nodes handles approximately 40 MiB/s of write traffic. In the second phase, Broker-0's traffic is increased to about 80 MiB/s and Broker-1's to about 120 MiB/s, while Broker-2 stays at about 40 MiB/s. As shown in the figure above, automatic load balancing is triggered in the second phase, and traffic across the three Brokers gradually converges into a balanced range.
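In this example, the per-Broker average in the second phase is roughly (80 + 120 + 40) / 3 = 80 MiB/s, so balancing amounts to shifting partition traffic from Broker-1 toward Broker-2 until each Broker carries close to that average.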
NOTE: In this scenario, to facilitate observation, the partition reassignment cooldown time is manually extended. With the default configuration, the traffic balancing time is approximately 1 minute.
Quick Experience
Refer to Example: Continuous Data Self-Balancing to experience AutoMQ's continuous data self-balancing capabilities.
References
[1] Cruise Control, LinkedIn's open-source cluster balancing tool: https://github.com/linkedin/cruise-control