Stateless Broker
AutoMQ separates storage from compute: S3Stream offloads Kafka's storage layer to cloud storage, which makes Broker nodes stateless. Stateless Brokers offer significant advantages in operations and scalability. In addition, a stateless AutoMQ cluster can run on cloud Spot instances, further reducing compute costs.
How AutoMQ Achieves Complete Statelessness
S3Stream comprises two storage components:
WAL Storage: The write-ahead log (WAL) can be backed by different storage media, such as EBS or S3.
S3 Storage: Object storage is used as the primary storage for data.
S3Stream accesses object storage via the HTTP protocol, which is entirely stateless. Therefore, if S3 is chosen as the storage medium for WAL when deploying AutoMQ, the entire architecture is completely stateless. However, if EBS is chosen for WAL and accessed through the file API, achieving a stateless architecture becomes challenging, because the WAL may still hold data that has not yet been uploaded to S3.
AutoMQ's approach centers on EBS's multi-attach capability. By turning EBS into shared storage, another Broker can read a failed Broker's WAL, so complete statelessness can still be achieved. The core process is straightforward:
Upon detecting the failure of Broker A, the Controller attaches Broker A's EBS WAL volume to Broker B using multi-attach.
Broker B takes over the WAL and uploads to S3 the small amount of data that had not yet been stored there, completing the recovery.
At this point, the state of Broker A is offloaded, and the Controller will subsequently reassign the partitions originally belonging to Broker A evenly to other Brokers.
The same process applies to fault recovery as well as to scale-down and decommissioning, which makes the design fully stateless.
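The three steps above can be sketched as a small in-memory simulation. This is only an illustrative model, not AutoMQ's implementation: the `Broker` class, the `fail_over` function, and the use of plain Python lists for the WAL and for S3 are all hypothetical stand-ins. The multi-attach step is modelled simply as a surviving Broker being allowed to read the failed Broker's WAL.

```python
from dataclasses import dataclass, field


@dataclass
class Broker:
    """Hypothetical stand-in for a Broker node."""
    name: str
    partitions: list = field(default_factory=list)
    wal: list = field(default_factory=list)  # records not yet uploaded to S3
    alive: bool = True


def fail_over(failed, cluster, s3):
    """Hypothetical controller routine mirroring the three steps above."""
    survivors = [b for b in cluster if b.alive and b is not failed]
    if not survivors:
        raise RuntimeError("no healthy broker available to take over the WAL")

    # Step 1: attach the failed broker's WAL volume to a healthy broker.
    # Multi-attach is modelled as the helper being able to read failed.wal.
    helper = survivors[0]

    # Step 2: the helper uploads the WAL records that never reached S3.
    s3.extend(failed.wal)
    failed.wal.clear()

    # Step 3: spread the failed broker's partitions round-robin over survivors.
    for i, partition in enumerate(failed.partitions):
        survivors[i % len(survivors)].partitions.append(partition)
    failed.partitions.clear()
    return helper


if __name__ == "__main__":
    a = Broker("A", partitions=["t-0", "t-1", "t-2", "t-3"], wal=["r1", "r2"], alive=False)
    b, c = Broker("B"), Broker("C")
    s3 = ["r0"]
    helper = fail_over(a, [a, b, c], s3)
    print(helper.name, s3, b.partitions, c.partitions)
```

Once the function returns, the failed Broker holds neither WAL data nor partitions, which is why the same routine could serve for planned scale-down or decommissioning as well as for crash recovery.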
Advantages of Statelessness
Storage software built on an integrated storage-compute architecture is generally stateful, which makes operations, scaling up, and scaling down difficult. AutoMQ transforms Apache Kafka® into stateless storage software, making AutoMQ as simple to operate as a microservice application.
Simplified Operations: Day-to-day operation of AutoMQ is simple. After a Broker node shuts down, its state is completely transferred and clients are unaffected. Operators have ample time to decide whether to bring the Broker back online or decommission it permanently. Cluster upgrades can also be completed at low cost and low risk through rolling updates.
Automatic Scaling: Stateless AutoMQ can scale up or down freely, similar to a microservice application or a Kubernetes Deployment, achieving true auto-scaling and saving significant costs.
Use of Spot Instances: Cloud providers offer Spot instances that can be up to 90% cheaper than regular virtual machines. However, because Spot instances can be terminated at any time, only stateless applications can take full advantage of them.