WAL Storage
In S3Stream, AutoMQ's stream storage library, the WAL (write-ahead log) is a core component with two main purposes:
Providing low-latency, high-performance persistent writes. Once data has been written to the WAL successfully, an acknowledgment is returned to the client.
Recovering data during failover. When a Broker node fails, data in the WAL that has not yet been uploaded to S3 is recovered from it.
The WAL does not serve regular consumer reads, so it places modest IOPS demands on the storage medium. The choice of medium depends mainly on cost, latency, and throughput.
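The two purposes above can be illustrated with a toy model. This is a minimal sketch, not S3Stream's implementation: `ToyWal`, its methods, and the dictionary standing in for S3 are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ToyWal:
    """In-memory stand-in for a WAL device (hypothetical, for illustration only)."""
    records: list = field(default_factory=list)  # (offset, payload) pairs still held in the WAL
    next_offset: int = 0

    def append(self, payload: bytes) -> int:
        """Persist a record and return its offset; the client can be acknowledged now."""
        offset = self.next_offset
        self.records.append((offset, payload))
        self.next_offset += 1
        return offset

    def trim(self, uploaded_up_to: int) -> None:
        """Drop records with offsets <= uploaded_up_to (already safe in object storage)."""
        self.records = [(o, p) for o, p in self.records if o > uploaded_up_to]

    def recover(self) -> list:
        """Return the records that were acknowledged but not yet uploaded."""
        return list(self.records)


wal = ToyWal()
object_store = {}  # stand-in for S3: offset -> payload

# Purpose 1: each append is durable in the WAL, so the client is acknowledged here.
acked = [wal.append(msg) for msg in (b"a", b"b", b"c")]

# A background upload copies only the first two records before the node fails.
for offset, payload in wal.recover()[:2]:
    object_store[offset] = payload
wal.trim(uploaded_up_to=1)

# Purpose 2: during failover, the remaining WAL records are read back and uploaded.
pending = wal.recover()
for offset, payload in pending:
    object_store[offset] = payload
```

In this model the WAL is read only on the recovery path, which is why its read-side demands stay low.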
In different runtime environments, AutoMQ offers various storage medium implementations of WAL, including EBS WAL, Regional EBS WAL, and S3 WAL.
EBS WAL
The block storage service EBS provided by cloud providers is the best storage medium for implementing WAL, featuring the following characteristics:
Low latency: EBS generally provides sub-millisecond IO latency, which is enough to meet the latency requirements of all Kafka business scenarios.
Built-in multi-replica mechanism: EBS offers high data durability, typically around five nines. This aligns well with AutoMQ’s durability separation philosophy and is the reason AutoMQ does not need an additional multi-replica mechanism of its own.
Low cost: S3Stream requires only a few GiB of space on the WAL disk. A 10GiB EBS volume can deliver around 120MiB/s of throughput and approximately 3000 IOPS, yet costs only a few dollars per month.
The WAL storage of S3Stream is optimized for write throughput and latency [1], with the following write characteristics:
Centralized writes: unlike Apache Kafka®, AutoMQ does not need to write a separate log file for each partition. It mixes the data from all partitions into the WAL, so writes stay efficient even with a large number of partitions.
Raw device writes: AutoMQ only needs to write one file, so EBS can be used as a raw device. This removes the need to mount a file system and avoids file system overhead, which gives the best performance and latency.
Sequential writes and group commits: data is written sequentially into the WAL and combined with a group commit mechanism, so high-throughput writes require only a small number of IOPS.
Direct IO writes: AutoMQ bypasses the kernel's Page Cache and writes data directly to EBS. It returns an acknowledgment as soon as the write succeeds and is unaffected by Page Cache dirty page reclamation, which is the foundation of AutoMQ's stable low-latency performance.
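The effect of centralized, sequential, group-committed writes can be sketched as follows. This is an illustrative model only: the `bytearray` stands in for the raw device, and the 4096-byte block size, the record framing, and the `GroupCommitBuffer` class are assumptions rather than S3Stream's actual on-disk format. It performs no real direct IO; it only shows why block-aligned batching keeps the IO count low.

```python
BLOCK_SIZE = 4096  # assumed device block size; direct IO requires block-aligned writes


class GroupCommitBuffer:
    """Collects records from many partitions and writes them as one aligned run of blocks."""

    def __init__(self, device: bytearray):
        self.device = device   # stand-in for the raw EBS device
        self.write_pos = 0     # next sequential write position
        self.pending = []      # encoded records waiting for the next commit
        self.io_count = 0      # number of device writes issued so far

    def append(self, partition: int, payload: bytes) -> None:
        # Hypothetical framing: 4-byte partition id, 4-byte length, then the payload.
        header = partition.to_bytes(4, "big") + len(payload).to_bytes(4, "big")
        self.pending.append(header + payload)

    def commit(self) -> int:
        """Flush all pending records in a single sequential write; return bytes written."""
        if not self.pending:
            return 0
        body = b"".join(self.pending)
        padded_len = -(-len(body) // BLOCK_SIZE) * BLOCK_SIZE  # round up to a block boundary
        block = body.ljust(padded_len, b"\x00")
        self.device[self.write_pos:self.write_pos + padded_len] = block
        self.write_pos += padded_len
        self.io_count += 1
        self.pending.clear()
        return padded_len


device = bytearray(1 << 20)
wal = GroupCommitBuffer(device)
for i in range(1000):
    wal.append(partition=i % 200, payload=b"x" * 100)  # 1000 records across 200 partitions
written = wal.commit()
```

Here 1000 records from 200 partitions reach the device in one write (`wal.io_count` is 1), whereas a file-per-partition layout would issue at least one write per partition.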
Regional EBS WAL
Although EBS has built-in multi-replica mechanisms, these replicas are often distributed within the same availability zone, meaning that in the event of an availability zone-level failure, a small amount of data on the WAL may be at risk of being unrecoverable.
Fortunately, more and more cloud providers, such as Azure [2], GCP [3], and Alibaba Cloud [4], are starting to offer Regional EBS products. These products typically have replicas distributed across three availability zones, providing high data durability and availability.
Using Regional EBS as the WAL implementation for S3Stream avoids data replication across availability zones for AutoMQ, thus saving a significant amount of cross-zone traffic costs.
S3 WAL
In addition to using various EBS storage media as WAL storage, AutoMQ can also implement WAL using object storage APIs. Adopting S3 WAL can completely eliminate the dependency on EBS, transforming the entire architecture to be directly based on S3, with the following advantages:
Reduced dependencies and simplified operations.
Broader cloud coverage. Storage services vary across cloud providers, and where EBS capabilities are insufficient, AutoMQ can be deployed with S3 WAL instead.
Private cloud support. In many private cloud scenarios, an S3 implementation may be the only storage available, and S3 WAL lets AutoMQ run there as well.
However, the disadvantages of S3 WAL are also quite apparent:
S3 WAL has higher latency, reaching several hundred milliseconds, so it is best suited to latency-tolerant use cases such as logging, observability, and offline computing.
Compared to EBS WAL, writing to S3 WAL will consume additional network egress bandwidth, which lowers the peak read and write throughput that Broker nodes can handle, thereby increasing the overall cost.
The S3 WAL form of AutoMQ is particularly suitable for applications that can tolerate latencies in the hundred-millisecond range, enabling them to fully benefit from a simpler shared storage architecture.
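One way to picture a WAL built on object storage APIs is to batch records and upload each batch as a single object. The sketch below is an assumption about the general technique, not AutoMQ's actual object layout: `FakeObjectStore` is an in-memory stand-in for an S3 bucket, and the key format and record framing are hypothetical.

```python
class FakeObjectStore:
    """In-memory stand-in for an S3-compatible bucket (hypothetical)."""

    def __init__(self):
        self.objects = {}
        self.put_requests = 0

    def put(self, key: str, body: bytes) -> None:
        self.objects[key] = body
        self.put_requests += 1

    def list_keys(self, prefix: str) -> list:
        return sorted(k for k in self.objects if k.startswith(prefix))


class ObjectWal:
    """Buffers records and uploads each batch as one sequence-numbered object."""

    def __init__(self, store: FakeObjectStore, prefix: str = "wal/"):
        self.store = store
        self.prefix = prefix
        self.next_seq = 0
        self.batch = []

    def append(self, payload: bytes) -> None:
        self.batch.append(payload)

    def flush(self) -> None:
        """Upload the current batch; writers are acknowledged only after this returns."""
        if not self.batch:
            return
        # Zero-padded sequence numbers keep lexicographic and numeric order identical.
        key = f"{self.prefix}{self.next_seq:020d}"
        # Hypothetical framing: 4-byte big-endian length before each payload.
        body = b"".join(len(p).to_bytes(4, "big") + p for p in self.batch)
        self.store.put(key, body)
        self.next_seq += 1
        self.batch = []

    def recover(self) -> list:
        """Replay every record from the uploaded WAL objects, in order."""
        records = []
        for key in self.store.list_keys(self.prefix):
            body, pos = self.store.objects[key], 0
            while pos < len(body):
                size = int.from_bytes(body[pos:pos + 4], "big")
                records.append(body[pos + 4:pos + 4 + size])
                pos += 4 + size
        return records


store = FakeObjectStore()
wal = ObjectWal(store)
for i in range(5):
    wal.append(f"msg-{i}".encode())
wal.flush()
wal.append(b"msg-5")
wal.flush()
recovered = ObjectWal(store).recover()  # a different node replays the shared WAL
```

Batching trades latency for fewer PUT requests: every record waits for its batch to upload, which is consistent with the hundred-millisecond latencies described above.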
S3 Express WAL
Although typical object storage services have higher latencies, some cloud providers offer low-latency object storage products, such as AWS S3 Express One Zone [5] and Azure Premium Blob Storage [6]. These products can achieve single-digit millisecond latencies, meeting the low-latency requirements of Kafka applications.
Additionally, for self-hosted object storage scenarios such as MinIO, configuring high-speed storage media can also deliver low-latency object storage deployments.
When the environment provides a low-latency S3 implementation, AutoMQ can fully adopt the S3 WAL form to eliminate dependence on EBS, making the architecture simpler while satisfying the low-latency requirements of applications.
Multi WAL
The WAL storage implementation of S3Stream already uses multithreading to significantly increase IO queue depth, allowing a single WAL to achieve GiB-level write bandwidth.
However, to reduce EBS costs, AutoMQ recommends using small disks for WAL storage. Cloud providers offer 120MiB/s of throughput even on the lowest-spec disks, but additional throughput requires purchasing additional EBS capacity. For example, on Alibaba Cloud [7], two 20GiB PL1 ESSDs provide 260MiB/s of combined bandwidth at a monthly cost of 40 RMB. A single ESSD would need 280GiB of capacity to reach the same throughput, which costs seven times as much.
Therefore, combining multiple small disks into a multi-WAL setup is the most cost-effective approach on public cloud. It raises the throughput limit of each node at low cost, which keeps the number of nodes in a large-scale AutoMQ cluster within a manageable range.
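The Alibaba Cloud comparison above can be reproduced with a few lines of arithmetic. The throughput model in this sketch (a 120MiB/s base plus 0.5MiB/s per GiB, capped at 350) is an assumption chosen because it matches the figures quoted in the text, and the price of 1 RMB per GiB per month is derived from the stated 40 RMB for two 20GiB disks. Actual specifications and prices should be taken from the provider's documentation.

```python
def pl1_throughput_mib(capacity_gib: int) -> float:
    """Assumed PL1 model: 120 MiB/s base plus 0.5 MiB/s per GiB, capped at 350 MiB/s."""
    return min(120 + 0.5 * capacity_gib, 350)


PRICE_PER_GIB = 1.0  # RMB per GiB per month, implied by 40 RMB for 2 x 20 GiB

# Multi-WAL: two small disks, throughput adds up across disks.
multi_throughput = 2 * pl1_throughput_mib(20)
multi_cost = 2 * 20 * PRICE_PER_GIB

# Single WAL: one disk sized to reach the same throughput.
single_capacity = 280
single_throughput = pl1_throughput_mib(single_capacity)
single_cost = single_capacity * PRICE_PER_GIB

cost_ratio = single_cost / multi_cost

print(f"multi-WAL:  {multi_throughput:.0f} MiB/s for {multi_cost:.0f} RMB/month")
print(f"single WAL: {single_throughput:.0f} MiB/s for {single_cost:.0f} RMB/month")
print(f"cost ratio: {cost_ratio:.0f}x")
```

Under this model each extra disk contributes the full 120MiB/s base throughput, while growing one disk adds only 0.5MiB/s per GiB, which is where the sevenfold gap comes from.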
Because AWS S3 Express One Zone and EBS each keep their data within a single availability zone, AutoMQ also supports combining EBS WAL and S3 Express WAL to provide a low-latency, multi-availability-zone, cost-effective architecture on AWS.
References
[1]. How AutoMQ Achieves High-Performance WAL on Bare Metal: https://mp.weixin.qq.com/s/rPBOFyVXbmauj-Yjy-rkbg
[2]. Azure Regional EBS: https://learn.microsoft.com/en-us/azure/virtual-machines/disks-redundancy#zone-redundant-storage-for-managed-disks
[3]. GCP Regional EBS: https://cloud.google.com/compute/docs/disks/regional-persistent-disk
[4]. Alibaba Cloud Regional EBS: https://developer.aliyun.com/special/live/regionalessd_bdrc
[5]. AWS S3 Express One Zone: https://aws.amazon.com/s3/storage-classes/express-one-zone/
[6]. Azure Premium Blob Storage: https://learn.microsoft.com/en-us/azure/storage/blobs/storage-blob-block-blob-premium
[7]. Alibaba Cloud ESSD Disk Specifications: https://help.aliyun.com/ecs/user-guide/essds