Stateless Broker

AutoMQ separates storage from compute: S3Stream offloads Kafka's storage layer to cloud storage, which makes the Broker nodes stateless. Stateless Brokers are significantly easier to operate and scale. In addition, a stateless AutoMQ cluster can be deployed on Spot instances in the cloud, further reducing compute costs.

How AutoMQ Achieves Complete Statelessness

S3Stream consists of two storage components:

  • WAL storage, which can use different storage media, such as EBS or S3.

  • S3 storage, which uses object storage as the primary data store.

S3Stream accesses object storage over HTTP, which is a completely stateless access method. Therefore, if S3 is chosen for WAL storage when deploying AutoMQ, the entire architecture remains completely stateless. Choosing EBS for the WAL, however, means accessing it through the file API on a volume attached to a specific Broker, which makes a stateless architecture harder to achieve.

AutoMQ's core solution is to leverage EBS's multi-attach capability to transform EBS into shared storage, thereby achieving complete statelessness. The core process is straightforward:

  • When the Controller detects a failure in Broker A, it mounts Broker A's EBS WAL to Broker B using multi-attach.

  • Broker B reads the WAL and uploads to S3 the small amount of data that had not yet been stored there, completing the recovery.

  • At this point, Broker A's state has been fully offloaded, and the Controller evenly redistributes the partitions that originally belonged to Broker A across the other Brokers.

The same process applies to fault recovery as well as to scaling down and decommissioning, which makes the design fully stateless.
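The three steps above can be illustrated with a small in-memory simulation. This is only a sketch under stated assumptions, not AutoMQ's implementation: the `Broker` class, the `fail_over` function, and the list standing in for object storage are all hypothetical names introduced here, attaching the WAL volume is modelled simply as the surviving broker reading the failed broker's WAL list, and "evenly redistribute" is approximated with round-robin assignment.

```python
from dataclasses import dataclass, field


@dataclass
class Broker:
    """Hypothetical in-memory model of a broker; not an AutoMQ class."""
    name: str
    partitions: list = field(default_factory=list)
    wal: list = field(default_factory=list)  # records not yet uploaded to object storage
    alive: bool = True


def fail_over(failed, brokers, object_store):
    """Simulate the steps: attach the WAL, recover it to object storage, reassign partitions."""
    survivors = [b for b in brokers if b.alive and b is not failed]
    if not survivors:
        raise RuntimeError("no surviving broker to take over the WAL")

    # Step 1: the controller attaches the failed broker's WAL volume to a survivor.
    # Modelled here by letting the chosen survivor read failed.wal directly.
    recoverer = survivors[0]

    # Step 2: the survivor uploads the WAL records that are not in object storage yet.
    object_store.extend(failed.wal)
    failed.wal.clear()

    # Step 3: the failed broker holds no state any more, so its partitions are
    # spread round-robin across the survivors (an assumed placement policy).
    for i, partition in enumerate(failed.partitions):
        survivors[i % len(survivors)].partitions.append(partition)
    failed.partitions.clear()
    return recoverer


if __name__ == "__main__":
    a = Broker("A", partitions=["t-0", "t-1", "t-2"], wal=["r1", "r2"], alive=False)
    b, c = Broker("B"), Broker("C")
    store = []
    fail_over(a, [a, b, c], store)
    print(store, b.partitions, c.partitions)
```

Because the failed broker ends with an empty WAL and no partitions, the same routine could in principle model a planned scale-down or decommission: the only difference is that the broker is drained deliberately rather than after a failure.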

Advantages of Statelessness

Storage software built on a tightly coupled storage-compute architecture is generally stateful and faces significant challenges in operations and in scaling out and in. By turning Apache Kafka into stateless software, AutoMQ becomes as simple to operate as a microservice application.

  • Simplified Operations: For AutoMQ, daily operations become straightforward. After a Broker node is shut down, its state is completely transferred without impacting clients, giving operations personnel ample time to decide whether the shutdown Broker needs to be brought back online or permanently decommissioned. Cluster upgrades can also be performed cost-effectively and with minimal risk through rolling updates.

  • Auto Scaling: Stateless AutoMQ can be scaled up or down as freely as a microservice application or a Kubernetes Deployment, achieving true auto-scaling and saving significant costs.

  • Utilization of Spot Instances: Cloud providers offer Spot instances, which can be up to 90% cheaper than standard virtual machines. However, because Spot instances are ephemeral and can be reclaimed by the provider, only stateless applications can take full advantage of them.