AutoMQ vs. Apache Kafka

In this article, "AutoMQ Kafka" refers specifically to the source-available project automq-for-kafka, hosted under the AutoMQ organization on GitHub by AutoMQ CO., LTD.

AutoMQ for Kafka (also known as AutoMQ Kafka) is a distribution of Apache Kafka. By making minimal changes to the storage layer of the Apache Kafka code, AutoMQ Kafka offloads broker data to shared storage, turning Kafka into a cloud-native application while remaining 100% compatible with Kafka protocols and semantics.

Because the code changes are minimal, AutoMQ Kafka can keep following and merging the Apache Kafka code, and so retains the same features and bug fixes as Apache Kafka. For enhancements and fixes of general interest, AutoMQ Kafka will contribute them to Apache Kafka first and then merge them back into AutoMQ Kafka, so that the two projects develop together.
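In practice, the compatibility claim suggests that an existing Kafka client should work against an AutoMQ Kafka cluster with nothing changed except the bootstrap address. The sketch below illustrates the idea with the third-party kafka-python client. The helper names, broker address, and topic name are hypothetical, and the example has not been verified against a specific AutoMQ Kafka release.

```python
"""Sketch: reuse an ordinary Kafka producer configuration against another cluster."""


def producer_config(bootstrap_servers: str) -> dict:
    """Build keyword arguments for kafka-python's KafkaProducer.

    Only the bootstrap address is cluster-specific; the remaining entries
    are ordinary Kafka producer options.
    """
    return {
        "bootstrap_servers": bootstrap_servers,
        "acks": "all",  # wait for full acknowledgement before reporting success
        "retries": 3,   # retry transient send failures
    }


def send_message(config: dict, topic: str, payload: bytes) -> None:
    """Send one message with a standard Kafka client (needs a reachable cluster)."""
    # Imported lazily so producer_config() works without kafka-python installed.
    from kafka import KafkaProducer  # third-party: pip install kafka-python

    producer = KafkaProducer(**config)
    try:
        producer.send(topic, value=payload)
        producer.flush()  # block until buffered records have been sent
    finally:
        producer.close()


# Usage (hypothetical address and topic; requires a running cluster):
#   send_message(producer_config("localhost:9092"), "demo-topic", b"hello")
```

The point of the sketch is that nothing in it is specific to AutoMQ Kafka: if the compatibility claim holds, pointing the same configuration at a different bootstrap address is the only change.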

The following comparison shows the feature differences between AutoMQ Kafka, after its cloud-oriented architectural redesign, and Apache Kafka:

Features/Products: AutoMQ for Kafka | Apache Kafka < 3.6.0 | Apache Kafka Tiered Storage >= 3.6.0

100% Kafka Protocol & Semantic Compatibility

Scaling in Minutes
AutoMQ for Kafka:
  • 1 min detection + 1 min scaling + 1 min self-balancing
  • Automatic scaling and continuous self-balancing included
Apache Kafka < 3.6.0: 🚫
  • Hours to days: Self-balancing time is directly proportional to the amount of stored data
Apache Kafka Tiered Storage >= 3.6.0: 🚫
  • About 1 hour: Although most of the data has been moved to S3, the last Segment of each Partition is still kept on local disk, so the transfer time is proportional to the number of Partitions

Partition Reassignment in Seconds
AutoMQ for Kafka:
  • Seconds: All data is offloaded to object storage
  • No business impact: Partition reassignment does not require data replication
Apache Kafka < 3.6.0: 🚫
  • Hours: The time required for reassignment is proportional to the amount of Partition data
  • Business impact: Replication traffic affects normal business traffic
Apache Kafka Tiered Storage >= 3.6.0: 🚫
  • Minutes: Reassignment only requires copying the data not yet uploaded to S3, and reassigning a single Partition usually takes minutes

Pay-as-you-go Storage
AutoMQ for Kafka:
  • All storage is offloaded to object storage and billed according to object storage usage
Apache Kafka < 3.6.0: 🚫
  • Compute and storage are coupled, so the capacity provisioned for compute peaks wastes disk space
  • Because self-balancing is a heavy operation, extra disk capacity is usually reserved
Apache Kafka Tiered Storage >= 3.6.0:
  • Most cold data storage can be paid for on demand
  • Hot data on local disks still suffers from the coupling of compute and storage

"Stateless"
AutoMQ for Kafka:
  • Can be gracefully scaled down in 1 minute
  • Spot instances can be used
Apache Kafka < 3.6.0: 🚫
  • Human intervention is required to move replicas, and a node can only be scaled down after all the data of its local replicas has been re-replicated on other nodes
Apache Kafka Tiered Storage >= 3.6.0: 🚫
  • Similar to Kafka without tiered storage, except that the amount of data to be copied is only at the GB level

High Availability with Only Single Replica
AutoMQ for Kafka:
  • Partitions can be reassigned between Broker nodes in seconds without data replication
Apache Kafka < 3.6.0: 🚫
  • Partition reassignment requires data replication
Apache Kafka Tiered Storage >= 3.6.0: 🚫
  • Same as without tiered storage

ZooKeeper Support
AutoMQ for Kafka: 🚫
  • Only supports KRaft mode
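
The reassignment figures in the comparison follow from how much data has to be copied before a Partition can move. As a rough back-of-the-envelope sketch (not a benchmark), the copy time can be modeled as the bytes to copy divided by the replication throughput. The partition size, the amount of data not yet uploaded, and the throughput below are made-up inputs, and the model ignores metadata updates and leader election.

```python
GIB = 1024 ** 3
MIB = 1024 ** 2


def copy_seconds(bytes_to_copy: int, throughput_bytes_per_s: int) -> float:
    """Time to copy the given data at a constant replication throughput."""
    if throughput_bytes_per_s <= 0:
        raise ValueError("throughput must be positive")
    return bytes_to_copy / throughput_bytes_per_s


# Hypothetical inputs, chosen only for illustration.
PARTITION_BYTES = 500 * GIB   # total data held by one Partition
NOT_YET_UPLOADED = 10 * GIB   # tiered storage: data still only on local disk
THROUGHPUT = 50 * MIB         # replication throughput in bytes per second

scenarios = {
    "local disks only (copy everything)": PARTITION_BYTES,
    "tiered storage (copy local remainder)": NOT_YET_UPLOADED,
    "shared storage (nothing to copy)": 0,
}

for name, to_copy in scenarios.items():
    seconds = copy_seconds(to_copy, THROUGHPUT)
    print(f"{name}: {seconds:,.1f} s ({seconds / 3600:.2f} h)")
```

With these inputs the three cases come out at roughly 2.8 hours, 3.4 minutes, and zero copy time, which matches the orders of magnitude in the comparison. Real figures depend on the workload and on replication throttle settings.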