Second-level Partition Migration

Info

The term "AutoMQ Kafka" mentioned in this text specifically refers to the open-source project called automq-for-kafka, which is hosted under the GitHub organization AutoMQ, operated by AutoMQ CO.,LTD.

Introduction

Partition migration in Apache Kafka® currently depends on completing a large amount of data synchronization. Take a Kafka partition with 100 MB/s of write traffic as an example: after running for one day it has accumulated roughly 8.2 TiB of data. Migrating that partition to another Broker at this point means copying the entire data set, which takes hours even on a node with 1 Gbps of bandwidth, making real-time elasticity practically impossible for an Apache Kafka® cluster. Thanks to the compute-storage separation architecture of AutoMQ Kafka, only a small amount of data needs to be synchronized during an actual partition migration, which makes it possible to shorten the migration time to seconds.
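From the client's point of view, such a migration is triggered the same way as in Apache Kafka®. The following is a minimal sketch using the standard Kafka Admin API, assuming AutoMQ Kafka is reachable at `localhost:9092` and that a topic `my-topic` exists; the topic name, partition index, and target broker id are illustrative only.

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.Properties;
import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.NewPartitionReassignment;
import org.apache.kafka.common.TopicPartition;

public class MovePartitionExample {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        // Assumed bootstrap address; point this at your own cluster.
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");

        try (Admin admin = Admin.create(props)) {
            // Reassign partition 1 of "my-topic" so that broker 1 becomes its replica.
            TopicPartition p1 = new TopicPartition("my-topic", 1);
            Map<TopicPartition, Optional<NewPartitionReassignment>> reassignment =
                    Map.of(p1, Optional.of(new NewPartitionReassignment(List.of(1))));

            // Submits the reassignment; in AutoMQ Kafka the actual data movement
            // is limited to the small amount of data not yet in object storage.
            admin.alterPartitionReassignments(reassignment).all().get();
            System.out.println("Reassignment of " + p1 + " submitted");
        }
    }
}
```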

How AutoMQ Kafka Implements Partition Migration in Seconds

As shown in the figure above, take the migration of partition P1 from Broker-0 to Broker-1 as an example:

  1. When the KRaft Controller receives the partition migration command, it constructs the corresponding PartitionChangeRecord and commits it to the KRaft log layer, removing Broker-0 from the leader replica list and adding Broker-1 to the follower replica list. Broker-0 replays the KRaft log, observes the change to partition P1, and enters the partition shutdown process.
  2. When the partition is closed, if P1 still has data that has not been uploaded to object storage, a forced upload is triggered. In a stably running cluster this data usually amounts to no more than a few hundred MB; combined with the burst network bandwidth offered by today's cloud providers, the upload generally completes within seconds. Once P1's data has been fully uploaded, the partition can be safely closed and removed from Broker-0 (steps 2 to 4 are summarized in the sketch after this list).
  3. After P1 has been closed on the original Broker, a leader election is actively triggered. Broker-1, as the only remaining replica, is promoted to leader of P1 and enters the partition recovery process.
  4. During partition recovery, P1's metadata is pulled from object storage to restore its checkpoint, and data recovery is then performed according to how P1 was shut down (clean or unclean shutdown).
  5. The partition migration is now complete.
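The shutdown and recovery steps above can be condensed into the following sketch. All type and method names here (WalBuffer, ObjectStorage, PartitionMeta, forceUpload, fetchMeta) are hypothetical and only illustrate the control flow of steps 2 to 4; they are not AutoMQ Kafka's actual internal APIs.

```java
import java.util.concurrent.CompletableFuture;

// Hypothetical types used only to illustrate the migration control flow.
interface WalBuffer {                      // holds data not yet uploaded to object storage
    CompletableFuture<Void> forceUpload(); // step 2: flush the remaining data (typically a few hundred MB at most)
}

interface ObjectStorage {
    PartitionMeta fetchMeta(String topicPartition); // step 4: pull the partition's metadata/checkpoint
}

record PartitionMeta(long checkpointOffset, boolean cleanShutdown) {}

public class PartitionMigrationSketch {

    // Runs on the old leader (Broker-0) once it observes the PartitionChangeRecord.
    static void closePartition(WalBuffer wal) {
        wal.forceUpload().join(); // block until all of P1's data is in object storage
        // P1 can now be safely closed and removed from Broker-0.
    }

    // Runs on the new leader (Broker-1) after it is promoted in the leader election.
    static long recoverPartition(ObjectStorage storage) {
        PartitionMeta meta = storage.fetchMeta("my-topic-1");
        if (!meta.cleanShutdown()) {
            // Unclean shutdown: additional recovery of data beyond the checkpoint
            // would be performed here before serving traffic.
        }
        return meta.checkpointOffset(); // resume the partition from the recovered checkpoint
    }
}
```

Because only the small un-uploaded tail of the partition has to move, the overall wall-clock time of this flow is dominated by the forced upload in step 2, which is why the migration completes in seconds rather than hours.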