Overview

Apache Kafka® was created in 2011 and designed for traditional data centers. It used the classic Shared Nothing architecture to scale horizontally and gradually evolved toward the Tiered Storage architecture to benefit from the lower cost of cloud storage. Today, AutoMQ introduces a Shared Storage architecture that takes full advantage of cloud-native environments, offering a tenfold cost advantage and a hundredfold increase in operational efficiency compared to Apache Kafka®.

Kafka Architecture Evolution

Shared Nothing Architecture

The Shared Nothing architecture is the classic design of Apache Kafka®. It addresses the scalability of distributed storage software in traditional data center environments by coupling storage and compute on the same nodes. Kafka relies on an ISR-based replication mechanism[1] to ensure data durability and system availability.

As cloud computing matured, demand for elasticity emerged, and the classic Shared Nothing architecture cannot meet it. When Broker nodes are added to an Apache Kafka® cluster, partition reassignment requires copying a substantial amount of data between Brokers, which often takes several hours.
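To see why reassignment takes hours, a rough estimate is enough: the time is dominated by how much partition data must be copied and how fast it can be copied. The sketch below is only back-of-the-envelope arithmetic; the function name and the example figures (1 TiB of data, 100 MiB/s of sustained replication throughput) are illustrative assumptions, not measurements from the source.

```python
def reassignment_hours(data_gib: float, throughput_mib_s: float) -> float:
    """Hours needed to copy `data_gib` GiB at a sustained `throughput_mib_s` MiB/s."""
    if throughput_mib_s <= 0:
        raise ValueError("throughput must be positive")
    seconds = data_gib * 1024 / throughput_mib_s  # GiB -> MiB, then MiB / (MiB/s)
    return seconds / 3600


# Assumed example: 1 TiB of partition data, 100 MiB/s replication throughput.
hours = reassignment_hours(data_gib=1024, throughput_mib_s=100)
print(f"{hours:.1f} h")  # prints "2.9 h"
```

In practice, replication traffic is usually throttled to protect production traffic, so the real figure tends to be higher than this simple estimate.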

In addition, Apache Kafka® is typically run with three replicas per partition. When deploying in the cloud, users have two storage options:

  • Use EBS cloud storage as the storage medium for the Broker, which is expensive. EBS keeps three replicas internally, so combined with three-way ISR replication the data is stored nine times, wasting a significant amount of storage space.

  • Use local disks as the storage medium for the Broker, which is relatively cost-effective. However, users face high operational costs, and the advantages of moving to the cloud are diminished.
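The nine-copy figure follows from multiplying the two replication layers. As a minimal sketch of that arithmetic, using the replica counts stated above (the function names and the 100 GiB example are hypothetical):

```python
def physical_copies(kafka_replicas: int, medium_replicas: int) -> int:
    """Copies of each byte: Kafka-level replicas times replicas kept by the storage medium."""
    return kafka_replicas * medium_replicas


def raw_storage_gib(logical_gib: float, kafka_replicas: int, medium_replicas: int) -> float:
    """Raw capacity consumed for `logical_gib` GiB of log data."""
    return logical_gib * physical_copies(kafka_replicas, medium_replicas)


# Three Kafka replicas on EBS, taking the three internal EBS replicas stated above.
print(physical_copies(3, 3))       # prints 9
# Three Kafka replicas on local disks with no replication below Kafka.
print(physical_copies(3, 1))       # prints 3
# Assumed example: 100 GiB of logical data on EBS.
print(raw_storage_gib(100, 3, 3))  # prints 900
```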

Tiered Storage Architecture

As cloud computing matured and scaled, object storage was the first service to benefit. With its low storage cost and pay-as-you-go pricing, object storage has pushed storage software to evolve toward a Tiered Storage architecture.

As the name suggests, this architecture adds a secondary storage tier. After data is stored in the primary storage, it is asynchronously transferred to the secondary storage. This architecture can leverage the cost advantages of object storage to some extent while alleviating the elasticity issues of a Shared Nothing architecture.

However, the Tiered Storage architecture does not fundamentally solve the problems of Apache Kafka, for several reasons:

  • The space consumed by primary storage can be reduced, but the extent of reduction varies by scenario and still requires rigorous capacity evaluation. The high EBS costs incurred by ISR replication cannot be completely eliminated.

  • Scaling remains slow. Data in primary storage still has to be migrated and replicated during scaling operations; optimization shortens this from tens of hours to hours, but no further.

Simply put, primary storage in a Tiered Storage architecture does not fundamentally differ from storage in a Shared Nothing architecture. Apart from the smaller footprint, the partition layout in the file system and the ISR replication mechanism remain unchanged.

Shared Storage Architecture

AutoMQ's Shared Storage architecture completely replaces Apache Kafka's storage layer, offloading data to cloud storage and making the Broker stateless. In this architecture:

  • Data is persistently written to raw EBS devices using Direct IO and then uploaded to object storage in near real-time.

  • EBS acts as shared WAL[2] storage. Its contents are used only on the recovery path, so a small EBS volume, such as 10 GiB, is sufficient.

  • EBS is often mistaken for local disk storage, but it is in fact a shareable cloud storage service. AutoMQ uses EBS Multi-Attach[3] and the NVMe protocol's PR lock mechanism[4] to employ EBS as shared WAL storage.
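The write path described above can be sketched as a toy in-memory model: records are appended to a small WAL, uploaded to object storage in batches, and trimmed from the WAL once uploaded. If a Broker fails, another Broker attaches the same WAL and uploads whatever was left. All class and function names here (`SharedWal`, `ObjectStore`, `Broker`, `recover`) are hypothetical; this is not the S3Stream API, and it does not model Direct IO, Multi-Attach, or NVMe reservations.

```python
from dataclasses import dataclass, field

Record = tuple[int, bytes]  # (offset, payload)


@dataclass
class SharedWal:
    """Stand-in for the small shared EBS volume; holds only records not yet uploaded."""
    capacity_bytes: int
    records: list[Record] = field(default_factory=list)

    def used_bytes(self) -> int:
        return sum(len(payload) for _, payload in self.records)

    def append(self, offset: int, payload: bytes) -> None:
        if self.used_bytes() + len(payload) > self.capacity_bytes:
            raise RuntimeError("WAL full: upload to object storage is lagging")
        self.records.append((offset, payload))

    def trim(self, up_to_offset: int) -> None:
        """Drop records with offset <= up_to_offset (already in object storage)."""
        self.records = [(o, p) for o, p in self.records if o > up_to_offset]


@dataclass
class ObjectStore:
    """Stand-in for object storage, the primary data store."""
    objects: dict[str, list[Record]] = field(default_factory=dict)

    def put(self, key: str, batch: list[Record]) -> None:
        self.objects[key] = list(batch)

    def last_offset(self) -> int:
        return max((o for batch in self.objects.values() for o, _ in batch), default=-1)


class Broker:
    """Stateless broker: all durable state lives in the WAL and the object store."""

    def __init__(self, wal: SharedWal, store: ObjectStore, upload_threshold: int = 3):
        self.wal, self.store, self.upload_threshold = wal, store, upload_threshold
        last_in_wal = wal.records[-1][0] if wal.records else -1
        self.next_offset = max(last_in_wal, store.last_offset()) + 1

    def append(self, payload: bytes) -> int:
        offset = self.next_offset
        self.wal.append(offset, payload)  # the write is durable once it is in the WAL
        self.next_offset += 1
        if len(self.wal.records) >= self.upload_threshold:
            self.flush()
        return offset

    def flush(self) -> None:
        """Upload everything in the WAL as one object, then trim the WAL."""
        if not self.wal.records:
            return
        batch = list(self.wal.records)
        first, last = batch[0][0], batch[-1][0]
        self.store.put(f"stream-0/{first}-{last}", batch)
        self.wal.trim(last)


def recover(wal: SharedWal, store: ObjectStore) -> Broker:
    """A new broker takes over the WAL and uploads any records left behind."""
    broker = Broker(wal, store)
    broker.flush()
    return broker


wal, store = SharedWal(capacity_bytes=1024), ObjectStore()
failed = Broker(wal, store)
for i in range(4):
    failed.append(f"m{i}".encode())  # offsets 0-2 are uploaded, offset 3 stays in the WAL
successor = recover(wal, store)      # simulate failover to another broker
print(sorted(store.objects))         # prints ['stream-0/0-2', 'stream-0/3-3']
```

Because the WAL only ever holds records that have not yet been uploaded, its size is bounded by the upload lag rather than by the retained data, which is why a small volume is enough.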

AutoMQ Shared Storage Architecture Overview

The Shared Storage architecture introduced by AutoMQ follows the principle of separating storage and compute. By offloading storage to cloud services, it makes the computation layer completely stateless, rendering the entire architecture elastic and fully exploiting cloud-native advantages.

  • Shared Storage Layer: AutoMQ uses the cloud providers' largest-scale storage services, object storage and EBS, as the storage media for the shared storage layer. EBS serves as shared WAL storage, while object storage functions as the primary data store.

  • Stateless Compute Layer: AutoMQ replaces Apache Kafka's native Log storage with its self-developed stream storage library, S3Stream, offloading Broker storage to cloud storage. This combines the high performance of EBS with the low cost and high throughput of object storage, and makes the compute layer stateless.

  • Control Layer: With the storage state fully offloaded, partition reassignment within seconds, automatic scaling, and continuous traffic self-balancing become easy to achieve. AutoMQ therefore builds Controller components into its core, such as Auto Scaling and Auto Balancing, which are responsible for cluster scaling and traffic self-balancing, respectively.
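Once Brokers hold no partition data, moving a partition amounts to changing an ownership entry in metadata, which is what makes continuous self-balancing cheap. The sketch below illustrates the idea with a simple greedy balancer. It is an assumption-laden illustration, not AutoMQ's Auto Balancing algorithm: the `rebalance` function, the tolerance parameter, and the traffic figures are all hypothetical.

```python
def rebalance(
    assignment: dict[str, str],
    traffic: dict[str, float],
    brokers: list[str],
    tolerance: float = 0.1,
) -> list[tuple[str, str, str]]:
    """Greedily move partitions from the hottest to the coldest broker.

    Mutates `assignment` (partition -> broker) and returns the moves as
    (partition, from_broker, to_broker). Stops when the gap between the hottest
    and coldest broker is within `tolerance` of the mean load, or when no
    single move can narrow it.
    """
    moves = []
    while True:
        load = {b: 0.0 for b in brokers}
        for partition, broker in assignment.items():
            load[broker] += traffic[partition]
        hot = max(brokers, key=load.get)
        cold = min(brokers, key=load.get)
        gap = load[hot] - load[cold]
        mean = sum(load.values()) / len(brokers)
        if mean == 0 or gap <= tolerance * mean:
            return moves
        # Only partitions with 0 < traffic < gap shrink the imbalance when moved.
        candidates = [
            p for p, b in assignment.items() if b == hot and 0 < traffic[p] < gap
        ]
        if not candidates:
            return moves
        best = min(candidates, key=lambda p: abs(traffic[p] - gap / 2))
        assignment[best] = cold  # metadata-only change: no partition data is copied
        moves.append((best, hot, cold))


# Assumed example: broker b3 has just been added and owns nothing yet.
assignment = {"p0": "b1", "p1": "b1", "p2": "b1", "p3": "b2"}
traffic = {"p0": 40.0, "p1": 30.0, "p2": 10.0, "p3": 40.0}  # e.g. MiB/s
moves = rebalance(assignment, traffic, brokers=["b1", "b2", "b3"])
print(moves)  # prints [('p0', 'b1', 'b3')]
```

In a Shared Nothing cluster the same decision would trigger a copy of the partition's data to the target Broker; here the only cost is the metadata update.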

References

[1]. Kafka ISR Replication Mechanism: https://kafka.apache.org/documentation/#replication

[2]. WAL Wiki: https://en.wikipedia.org/wiki/Write-ahead_logging

[3]. AWS EBS Multi-Attach: https://docs.aws.amazon.com/ebs/latest/userguide/ebs-volumes-multi.html

[4]. NVMe Reservations: https://aws.amazon.com/blogs/aws/new-nvme-reservations-for-amazon-elastic-block-store-io2-volumes/