Difference with Apache Kafka

AutoMQ represents a new generation of Apache Kafka® distribution, redesigned with cloud-native principles in mind, offering up to tenfold improvements in cost efficiency and scalability while maintaining full compatibility with the Apache Kafka protocol. This article primarily explores the principal distinctions and connections between AutoMQ and Apache Kafka.

Differences from Apache Kafka

Architecture: Shared Nothing vs. Shared Storage

Apache Kafka utilizes local disk storage and creates a highly dependable storage system through software-layer replication logic (ISR mechanism), providing a sort of "infinite" streaming storage abstraction to the application layer. All Kafka data is stored on local disks in a manner often described as the Shared Nothing architecture.

Conversely, AutoMQ separates compute from storage, moving away from local disks and storing data in a shared object storage service instead. AutoMQ has developed S3Stream, a storage library that replaces the local log storage in Apache Kafka. This keeps the upper-layer Apache Kafka functional semantics intact while transparently storing Kafka data in object storage, an approach known as the Shared Storage architecture.

A comparison of the differing architectures of Apache Kafka and AutoMQ is outlined below:

| Apache Kafka | AutoMQ |
|---|---|
| Adopts Shared Nothing architecture | Adopts Shared Storage architecture |
| Data is stored on local disks, requiring implementation of multi-replica replication across nodes | Data is stored in S3 shared storage (highly reliable with three replicas), eliminating the need for multi-replica replication |
| Data is isolated across nodes, with data access bound to specific nodes | Data is shared across nodes, allowing cross-node access |
| Adding nodes for horizontal scaling or replacing failed nodes requires reassignment of shard data | Adding or replacing nodes does not require data reassignment, allowing a seamless transition |

Note:

Apache Kafka® introduced tiered storage in version 3.6 as an early-access feature (not yet production-ready), which allows historical data to be offloaded to object storage services. This setup has both similarities to and differences from AutoMQ, which relies solely on object storage for its data layer. For a detailed comparison, see Difference with Tiered Storage.

Elasticity: Reassigning Partitions in Seconds Versus Hours

Reassigning partitions is a common and unavoidable task in production Kafka deployments, particularly when handling node failures, scaling the cluster, or relieving localized hotspots.

Apache Kafka® employs a Shared Nothing architecture, where each partition's data is exclusively stored on a designated storage node. If reassignment of a partition is necessary, it involves transferring the entire data set to a new target node, a process that is often lengthy and fraught with unpredictability.

Example:

Consider a Kafka partition with a write throughput of 100 MiB/s: it generates about 8.2 TiB of data in one day. Reassigning this partition means transferring all of that data to another node. Even with 1 Gbps of network bandwidth dedicated to the transfer, the reassignment takes roughly 20 hours.
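The figures in this example can be checked with a few lines of arithmetic. The sketch below assumes that a full day of data has to be moved and that the entire 1 Gbps link is available for the copy; in practice a reassignment shares bandwidth with live traffic, so it would take longer.

```python
# Back-of-the-envelope check of the reassignment example above.
MIB = 1024 ** 2
TIB = 1024 ** 4

write_rate_bytes_per_s = 100 * MIB      # 100 MiB/s, from the example
seconds_per_day = 24 * 60 * 60

# Data produced by the partition in one day.
daily_bytes = write_rate_bytes_per_s * seconds_per_day
daily_tib = daily_bytes / TIB

# Assumption: the full 1 Gbps (10**9 bits/s) is usable for the transfer.
link_bytes_per_s = 1_000_000_000 / 8
transfer_hours = daily_bytes / link_bytes_per_s / 3600

print(f"data per day: {daily_tib:.2f} TiB")
print(f"transfer time at 1 Gbps: {transfer_hours:.1f} hours")
```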

AutoMQ employs a compute-storage separation architecture, where the full data of each partition is stored in S3 object storage. During partition reassignment, only a small amount of metadata needs to be synchronized to complete the switch. For partitions of any write throughput scale, AutoMQ ensures the switch is completed in seconds.

Because partitions can be reassigned within seconds, AutoMQ handles scenarios such as cluster scaling and failure recovery significantly faster and more reliably than Apache Kafka.
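The idea that reassignment only touches metadata can be illustrated with a toy model. This is not AutoMQ code: the class, its fields, and the partition name are hypothetical, and an in-memory dictionary stands in for object storage.

```python
# Toy model (not AutoMQ code): partition data lives in a shared store,
# and brokers are linked to partitions only through an ownership map.
from dataclasses import dataclass, field


@dataclass
class SharedStorageCluster:
    # Hypothetical stand-in for object storage: partition -> records.
    object_store: dict = field(default_factory=dict)
    # Metadata only: partition -> id of the owning broker.
    owner: dict = field(default_factory=dict)

    def append(self, partition: str, record: bytes) -> None:
        self.object_store.setdefault(partition, []).append(record)

    def reassign(self, partition: str, new_broker: int) -> int:
        """Switch ownership and return the number of data bytes copied."""
        self.owner[partition] = new_broker  # metadata update only
        return 0                            # partition data stays in place

    def read(self, partition: str, broker: int) -> list:
        if self.owner.get(partition) != broker:
            raise PermissionError("broker does not own this partition")
        return self.object_store[partition]


cluster = SharedStorageCluster()
cluster.owner["orders-0"] = 1
for _ in range(1000):
    cluster.append("orders-0", b"x" * 1024)

copied = cluster.reassign("orders-0", new_broker=2)
print(copied)                                   # 0
print(len(cluster.read("orders-0", broker=2)))  # 1000
```

However much data the partition holds, the work done by `reassign` stays constant, which is the property the Shared Storage design relies on.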

Cost: 10 Times Cost Difference

Because of the architectural differences described above, AutoMQ and Apache Kafka also differ significantly in their compute and storage cost structures. AutoMQ does not need to copy each message to multiple replicas across nodes at write time, which removes most of the traffic and load caused by cross-node replication. Additionally, AutoMQ uses S3 object storage as its storage medium, which is substantially cheaper than the EBS block storage mounted on each node in a typical public cloud environment.

Specific comparison items are as follows:

| Cost Comparison | Apache Kafka | AutoMQ |
|---|---|---|
| Storage Unit Price | Scenario: 1 GB of data requires 3 GB of EBS (three replicas). Cost: 0.288 USD/month | Scenario: 1 GB of business data requires 1 GB of S3. Cost: 0.023 USD/month |
| Cross-node Replication Traffic | Scenario: Writing 1 GB of data, requiring cross-node replication of 2 GB traffic (three replicas). Cost: 0.04 USD | Scenario: Writing 1 GB of data, direct upload to S3, no cross-node traffic required (three replicas). Cost: 0 USD |

Notes:

The storage unit prices listed above compare AWS EBS GP3 volumes with S3 Standard storage in the AWS US East region, with more details available at reference link.

The cost for cross-node replication traffic is demonstrated using AWS AZ inter-zone traffic transfer rates.

For an in-depth cost comparison between AutoMQ and Apache Kafka®, refer to Cost-Effective: AutoMQ vs. Apache Kafka.
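To see how the unit prices add up, the sketch below applies the figures from the comparison table to a hypothetical workload. The retained and written volumes are illustrative assumptions, and the model deliberately ignores compute, S3 request charges, and any other cost not listed in the table.

```python
# Unit prices copied from the comparison table above (USD).
kafka_storage_per_gb_month = 0.288    # 1 GB of data on 3 GB of EBS
automq_storage_per_gb_month = 0.023   # 1 GB of data on 1 GB of S3
kafka_replication_per_gb = 0.04       # 2 GB of cross-node traffic per GB written
automq_replication_per_gb = 0.0       # no cross-node replication traffic

# Hypothetical workload, chosen only for illustration.
retained_gb = 10_000                  # data kept in storage
written_gb_per_month = 30_000         # data produced each month

kafka_monthly = (retained_gb * kafka_storage_per_gb_month
                 + written_gb_per_month * kafka_replication_per_gb)
automq_monthly = (retained_gb * automq_storage_per_gb_month
                  + written_gb_per_month * automq_replication_per_gb)
storage_ratio = kafka_storage_per_gb_month / automq_storage_per_gb_month

print(f"Apache Kafka: {kafka_monthly:.2f} USD/month")
print(f"AutoMQ:       {automq_monthly:.2f} USD/month")
print(f"storage unit price ratio: {storage_ratio:.1f}x")
```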

Capacity: Reserved vs. Pay-as-you-go

Capacity planning presents a significant challenge in large-scale Kafka deployments in production settings. Due to architectural variances and differences in storage mediums between AutoMQ and Apache Kafka®, capacity planning considerations differ:

| Apache Kafka® | AutoMQ |
|---|---|
| Uses local disks, integrates storage and computation | Uses S3 object storage, separates storage and computation |
| Disk space must be reserved in advance | Storage space requires no reservation and operates on a pay-as-you-go basis |
| Single-node storage is limited, with poor scalability | S3 object storage offers nearly limitless space and excellent scalability |

100% Compatible with Apache Kafka®

AutoMQ, as a next-generation redesign of Kafka, guarantees 100% compatibility with Apache Kafka® while offering a cost-effective and flexible alternative. Applications developed on Apache Kafka® can be directly migrated to AutoMQ without any need for modifications.

As outlined in the architecture comparison above, AutoMQ introduces the S3Stream storage library at the storage layer, replacing the local log storage of Apache Kafka®. This change keeps the same Partition abstraction, so upper layers such as KRaft metadata management, Coordinator, ReplicaManager, and KafkaApis can reuse the existing code logic. As a result, AutoMQ remains fully compatible with the protocols and semantics of Apache Kafka® while continuously integrating its latest features and bug fixes.
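From an application's point of view, a migration under this compatibility claim would amount to pointing the existing client at a different endpoint. The sketch below assumes a client that takes a librdkafka-style configuration dictionary; the host names, the client id, and the `point_at` helper are hypothetical placeholders.

```python
# Sketch: the same client settings, pointed at a different cluster endpoint.
# Host names and client id are hypothetical placeholders.
kafka_config = {
    "bootstrap.servers": "kafka-broker-1.example.com:9092",
    "client.id": "orders-service",
    "acks": "all",
}


def point_at(config: dict, bootstrap_servers: str) -> dict:
    """Return a copy of config that targets another Kafka-compatible cluster."""
    migrated = dict(config)
    migrated["bootstrap.servers"] = bootstrap_servers
    return migrated


automq_config = point_at(kafka_config, "automq-broker-1.example.com:9092")

# Only the endpoint differs; every other setting is carried over unchanged.
changed = {key for key in kafka_config if kafka_config[key] != automq_config[key]}
print(changed)  # {'bootstrap.servers'}
```

A dictionary of this shape is what a client such as `confluent_kafka.Producer` accepts as its configuration, so application code built around it would stay the same.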

For further details on the compatibility between AutoMQ and Apache Kafka®, refer to Compatibility with Apache Kafka.