Differences from Tiered Storage
AutoMQ uses object storage as its core storage service. Apache Kafka®, by contrast, introduced a tiered storage solution in version 3.6.0 through KIP-405, leveraging object storage only to offload historical data. This article discusses how AutoMQ differs from, and what advantages it offers over, Apache Kafka's tiered storage version.
Architecture: EBS as Disk vs. EBS as Service
According to the design outlined in KIP-405, Apache Kafka's tiered storage adopts a two-tier approach that relies on both local disks and object storage. Message data is first written to the local disk and then asynchronously uploaded to object storage according to a cool-down policy. Because local disks are susceptible to failure, each message must still be replicated across disks on multiple nodes via the ISR mechanism to ensure durability.
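For concreteness, here is a minimal sketch of switching tiered storage on for a single topic through the Kafka Admin API (Kafka 3.6+). It assumes a broker already started with remote.log.storage.system.enable=true and a configured RemoteStorageManager plugin; the bootstrap address and the topic name orders are illustrative.

```java
import java.util.List;
import java.util.Map;
import java.util.Properties;

import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.AlterConfigOp;
import org.apache.kafka.clients.admin.ConfigEntry;
import org.apache.kafka.common.config.ConfigResource;

public class EnableTieredStorage {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // illustrative address

        try (Admin admin = Admin.create(props)) {
            ConfigResource topic = new ConfigResource(ConfigResource.Type.TOPIC, "orders");
            // Offload closed segments to object storage, keeping at most
            // ~1 GiB (roughly one active segment) on the local/EBS tier.
            List<AlterConfigOp> ops = List.of(
                    new AlterConfigOp(new ConfigEntry("remote.storage.enable", "true"),
                            AlterConfigOp.OpType.SET),
                    new AlterConfigOp(new ConfigEntry("local.retention.bytes",
                            String.valueOf(1024L * 1024 * 1024)), AlterConfigOp.OpType.SET));
            admin.incrementalAlterConfigs(Map.of(topic, ops)).all().get();
        }
    }
}
```

Even with these settings, the active segment of every partition must remain on the first tier, which is the root of the cost problem discussed below.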
Currently, when the tiered storage version is deployed in a public cloud environment, Apache Kafka's architecture remains unchanged: EBS simply stands in for the local disks, and messages must still be replicated across multiple EBS volumes. In short, Apache Kafka treats EBS as an ordinary block device, no different in principle from a physical hard drive in an on-premises data center.
Public cloud providers, however, offer EBS with strong durability and availability guarantees. AutoMQ treats EBS as a cloud storage service, leveraging its built-in multi-replica durability (ranging from five nines to nine nines, depending on the provider) and its failover capabilities within and across availability zones. As a result, AutoMQ avoids an extra replication layer on top of EBS, saving significant storage, network, and compute resources.
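A back-of-the-envelope comparison makes the savings concrete; the 1 TiB of message data below is an illustrative figure, not a benchmark.

```java
// Rough resource amplification of ISR replication vs. relying on EBS/object storage.
public class ReplicationAmplification {
    public static void main(String[] args) {
        double dataTiB = 1.0; // illustrative volume of message data

        // Apache Kafka with replication.factor = 3: every byte lands on three
        // brokers' EBS volumes, and the leader ships two copies over the network.
        double kafkaEbs = dataTiB * 3;
        double kafkaTraffic = dataTiB * 2;

        // AutoMQ: EBS and object storage replicate internally, so one logical
        // copy suffices and no broker-to-broker replication traffic is needed.
        double automqStorage = dataTiB;
        double automqTraffic = 0;

        System.out.printf("Kafka : %.0f TiB on EBS, %.0f TiB replication traffic%n",
                kafkaEbs, kafkaTraffic);
        System.out.printf("AutoMQ: %.0f TiB stored, %.0f TiB replication traffic%n",
                automqStorage, automqTraffic);
    }
}
```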
Cost: EBS as Storage vs. EBS as Recovery WAL
In Apache Kafka's tiered storage architecture, the first tier (EBS) still serves as the primary storage for reads and writes, and every Kafka partition must keep at least its latest active segment there. This has the following consequences:
- EBS space requirements are unpredictable and scale directly with the number of partitions in the cluster.
- Production deployments must reserve generous EBS capacity to reduce risk.
- Reserved EBS capacity is expensive, so the cost reduction tiered storage can deliver is limited.
Example: with Apache Kafka®'s default segment size of 1 GB, a cluster with 1,000 active partitions must still reserve at least 1 TB of EBS for the active segments alone.
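The linear relationship between partition count and reserved EBS is easy to see; the partition counts below are illustrative.

```java
// Minimum EBS reservation for active segments under Kafka's default segment size.
public class EbsReservation {
    public static void main(String[] args) {
        long segmentBytes = 1024L * 1024 * 1024; // Kafka default segment.bytes = 1 GiB
        for (int partitions : new int[] {1_000, 5_000, 10_000}) {
            double reservedGiB = partitions * (double) segmentBytes / (1L << 30);
            System.out.printf("%,6d active partitions -> reserve >= %,.0f GiB of EBS%n",
                    partitions, reservedGiB);
        }
    }
}
```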
In AutoMQ's architecture, object storage is the primary storage, and EBS serves only as a write-ahead log (WAL) for failure recovery, implemented as circular writes to the raw block device. Each AutoMQ broker needs only a 2 GB EBS volume, which buffers roughly 500 MB of not-yet-uploaded data (both sizes are configurable).
This design makes AutoMQ's EBS footprint fixed and predictable, and the EBS storage cost negligible: a 2 GB volume costs about 1 CNY per month.
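To illustrate what circular writes to a raw device look like, here is a simplified sketch of a fixed-capacity, wrap-around WAL. This is not AutoMQ's actual implementation: a regular file stands in for the raw EBS block device, and the bookkeeping is reduced to two logical offsets.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Records are acknowledged once fsync'd to the WAL; their space is reclaimed
// after the data has been persisted to object storage.
public class CircularWal {
    private final FileChannel channel;
    private final long capacity; // fixed volume size, e.g. 2 GiB
    private long head;           // logical offset of the next write
    private long tail;           // logical offset of the oldest un-uploaded byte

    public CircularWal(Path device, long capacity) throws IOException {
        this.channel = FileChannel.open(device, StandardOpenOption.CREATE,
                StandardOpenOption.READ, StandardOpenOption.WRITE);
        this.capacity = capacity;
    }

    // Appends a record; returns false when the un-uploaded window is full,
    // applying back-pressure until uploads to object storage catch up.
    public synchronized boolean append(ByteBuffer record) throws IOException {
        int len = record.remaining();
        if (head - tail + len > capacity) {
            return false; // never overwrite data that is not yet in object storage
        }
        long pos = head % capacity;
        if (pos + len <= capacity) {
            channel.write(record, pos);
        } else { // wrap around at the end of the device
            int first = (int) (capacity - pos);
            channel.write(record.slice(record.position(), first), pos);
            channel.write(record.slice(record.position() + first, len - first), 0);
        }
        channel.force(false); // fsync before acknowledging the producer
        head += len;
        return true;
    }

    // Called once everything up to uploadedOffset is safely in object storage.
    public synchronized void trimTo(long uploadedOffset) {
        tail = Math.max(tail, uploadedOffset);
    }
}
```

Because the window between head and tail is capped (about 500 MB by default), the state a failed broker leaves behind is strictly bounded, which is what makes recovery fast.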
Elasticity: Hour-level Migration vs. Second-level Migration
Because the primary storage footprint in Apache Kafka's tiered storage architecture is not fixed, the amount of data each partition leaves on EBS varies as well. The time required for operations such as elastic scaling and failure-driven reassignment is therefore unpredictable, and fast scaling is unachievable.
AutoMQ's WAL buffer, by contrast, holds at most about 500 MB of data awaiting upload to object storage. The upload completes within seconds, enabling second-level partition reassignment, as the rough calculation below shows.
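The throughput figure here is an assumption; actual parallel multipart-PUT throughput varies by provider and instance type.

```java
// Worst-case time to drain the WAL buffer to object storage before reassignment.
public class DrainTime {
    public static void main(String[] args) {
        double bufferedMiB = 500;     // maximum un-uploaded WAL data per broker
        double uploadMiBPerSec = 250; // assumed aggregate PUT throughput
        System.out.printf("Worst-case drain time: %.1f s%n",
                bufferedMiB / uploadMiBPerSec);
    }
}
```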
For reference, Confluent reports that expanding a high-traffic cluster takes 43 hours with a non-tiered storage architecture, and still takes 1.4 hours with tiered storage.
Summary
Compared to Apache Kafka's tiered storage solution, AutoMQ represents a qualitative leap rather than an incremental optimization. Through architectural redesign it makes brokers effectively stateless, allowing arbitrary scaling and second-level partition reassignment. Apache Kafka's tiered storage, in contrast, remains an optimized but still stateful architecture, which makes lightweight scaling and partition reassignment hard to achieve.