Skip to Main Content

Overview

Apache Kafka® was born in 2011, designed for traditional data centers using a classic Shared Nothing architecture to address horizontal scalability issues. It has gradually evolved to a Tiered Storage architecture to leverage the cost benefits of cloud storage. Today, AutoMQ brings a more pure Shared Storage architecture that fully harnesses cloud-native advantages, offering ten times the cost advantage and a hundredfold increase in operational efficiency compared to Apache Kafka.

Kafka Architecture Evolution

Shared Nothing Architecture

The Shared Nothing architecture, as the most classic architecture of Apache Kafka, addresses the scalability issues of distributed storage software in traditional data center environments by integrating storage and computation. Kafka employs a replication mechanism based on ISR[1] to provide data durability and system availability.

As cloud computing matures, it drives the need for elasticity in business, but the classic Shared Nothing architecture fails to meet this demand. When expanding Apache Kafka's Broker nodes, a massive amount of data replication is required to complete partition reassignment, typically taking tens of hours.

On the other hand, Apache Kafka requires triple replication. When deploying on the cloud, users have two storage options:

  • Choosing cloud storage EBS as the storage medium for Brokers, but this is costly. The inherent triple replication mechanism of EBS combined with ISR replication leads to data being stored nine times, resulting in significant storage space wastage.

  • Choosing local disks as the storage medium for Brokers, which is relatively cost-controllable, but users incur high operational costs, negating the advantages of cloud deployment.

Tiered Storage Architecture

As cloud computing matures and scales, object storage is among the first to benefit. The low storage costs and pay-as-you-go model of object storage have driven many storage software solutions to evolve into a Tiered Storage architecture.

As the name implies, this architecture adds a secondary storage layer. Data is initially stored in primary storage and then asynchronously transferred to secondary storage. This architecture leverages the cost advantages of object storage while alleviating some elasticity issues associated with Shared Nothing architectures.

However, the Tiered Storage architecture does not fundamentally solve the problems of Apache Kafka®, primarily for several reasons:

  • While the space consumed by primary storage can be reduced, the extent of this reduction varies by scenario and still requires rigorous capacity planning. The high EBS costs brought by ISR replication cannot be entirely mitigated.

  • The issue of rapid scaling remains; data in primary storage still needs to be reassigned and replicated during scaling operations, which may take several hours instead of tens of hours with optimizations.

In summary, the primary storage in Tiered Storage architecture does not fundamentally differ from the Shared Nothing architecture. Apart from reduced space, the partition storage layout on the file system and the ISR replication mechanism remain unchanged.

Shared Storage Architecture

AutoMQ's introduction of the Shared Storage architecture completely replaces the storage layer of Apache Kafka®, innovatively offloading data to cloud storage, making Brokers stateless. In this architecture:

  • Data is persistently written to raw EBS devices using Direct IO and is then uploaded to object storage in near-real-time.

  • In this architecture, EBS serves as a shared WAL storage used only for the recovery path, requiring only a minimal EBS volume, such as 10 GiB.

  • EBS is often misunderstood as local disk storage, but in reality, EBS is a fully shareable cloud storage service. AutoMQ innovatively uses multi-attach and the NVMe protocol's PR lock mechanism to use EBS as shared WAL storage.

AutoMQ Shared Storage Architecture Overview

AutoMQ's Shared Storage architecture adheres to the concept of separating storage from computing. By offloading storage to cloud services, computing becomes entirely stateless, making the architecture elastic and fully leveraging the benefits of cloud-native environments.

  • Shared Storage Layer: AutoMQ selects the largest scale object storage and EBS storage from cloud providers as the storage media for the shared storage layer. In this setup, EBS serves as shared WAL storage, while object storage acts as the primary data store.

  • Stateless Computing Layer: By replacing Apache Kafka®'s native log storage with its proprietary stream storage repository, S3Stream, AutoMQ offloads Broker storage to cloud storage. This leverages the high performance of EBS, the low cost, and high throughput of object storage, thus making the computing layer stateless.

  • Control Layer: Once AutoMQ completely offloads the storage state, it becomes highly feasible to achieve second-level partition reassignment, automatic scaling, continuous self-balancing, and other features. As a result, AutoMQ has built-in Controller components within its kernel, such as Auto Scaling and Auto Balancing components, which are responsible for cluster scaling and traffic self-balancing, respectively.

References

[1]. Kafka ISR Replication Mechanism: https://kafka.apache.org/documentation/\#replication

[2]. WAL Wiki:https://en.wikipedia.org/wiki/Write-ahead_logging

[3]. AWS EBS Multi-Attach: https://docs.aws.amazon.com/ebs/latest/userguide/ebs-volumes-multi.html

[4]. NVMe Reservations:https://aws.amazon.com/blogs/aws/new-nvme-reservations-for-amazon-elastic-block-store-io2-volumes/