WAL Storage

In S3Stream, AutoMQ's stream storage library, the WAL (Write-Ahead Log) is one of the core components and serves two primary purposes:

  • Low-latency, high-performance data persistence: Once data has been written to the WAL, a confirmation is returned to the client immediately.

  • Data recovery during Broker node failover: Data that has not yet been uploaded to S3 is recovered from the WAL.

The WAL does not serve regular reads from consumers, so it does not need high IOPS from the storage medium. Its key metrics are cost, latency, and throughput.
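The two roles can be sketched with a toy file-backed log. This is a minimal sketch under stated assumptions. `append` makes a record durable with `fsync` before returning, which is the point where the client could be acknowledged. `mark_uploaded` advances a trim point once data has reached S3. `recover` returns everything after that trim point. `SimpleWal` and its methods are hypothetical names chosen for illustration, not S3Stream's API. A real WAL would also persist its trim point and checksum each record.

```python
import os
import struct


class SimpleWal:
    """Toy file-backed WAL (hypothetical; not S3Stream's API)."""

    _HEADER = struct.Struct(">I")  # 4-byte big-endian payload length

    def __init__(self, path: str) -> None:
        self.path = path
        # Everything before this offset is assumed to be uploaded to S3 already.
        # A real WAL would persist this trim point so a new node could read it.
        self.uploaded_offset = 0
        self._fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644)

    def append(self, payload: bytes) -> int:
        """Durably append one record and return its start offset.

        Once this returns, the caller may acknowledge the client."""
        offset = os.lseek(self._fd, 0, os.SEEK_END)
        os.write(self._fd, self._HEADER.pack(len(payload)) + payload)
        os.fsync(self._fd)  # data is on stable storage before we return
        return offset

    def mark_uploaded(self, end_offset: int) -> None:
        """Record that every record before end_offset has reached S3."""
        self.uploaded_offset = max(self.uploaded_offset, end_offset)

    def recover(self) -> list[bytes]:
        """Return records written to the WAL but not yet uploaded to S3."""
        records = []
        with open(self.path, "rb") as f:
            f.seek(self.uploaded_offset)
            while header := f.read(self._HEADER.size):
                (length,) = self._HEADER.unpack(header)
                records.append(f.read(length))
        return records

    def close(self) -> None:
        os.close(self._fd)
```

In this sketch, a record stays recoverable from the WAL only until `mark_uploaded` moves past it. This mirrors why only a few GiB of WAL space are needed.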

AutoMQ provides multiple WAL implementations using different storage media for various operating environments, including EBS WAL, Regional EBS WAL, and S3 WAL.

EBS WAL

The block storage services offered by cloud providers (EBS) are the ideal storage medium for implementing the WAL, for the following reasons:

  • Low latency: IO latency is typically sub-millisecond, which meets the latency requirements of all Kafka business scenarios.

  • High durability: The built-in multi-replica mechanism generally provides around 99.999% (five nines) data durability. This fits AutoMQ's philosophy of offloading durability to cloud storage, and it is why AutoMQ does not need to implement its own multi-replica mechanism.

  • Low cost: S3Stream needs only a few GiB of space on the WAL disk. A 10 GiB EBS volume provides around 120 MiB/s of throughput and approximately 3,000 IOPS, at a cost of only a few dollars per month.
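The figures in the last item also put a lower bound on the average write size. To reach about 120 MiB/s within a budget of about 3,000 IOPS, each IO must carry roughly 41 KiB on average. This back-of-the-envelope calculation uses only the numbers quoted above:

```latex
\frac{120\ \mathrm{MiB/s}}{3000\ \mathrm{IO/s}}
  = \frac{120 \times 1024\ \mathrm{KiB}}{3000\ \mathrm{IO}}
  = 40.96\ \mathrm{KiB\ per\ IO}
```

When individual records are much smaller than this, writing each one as its own IO would exhaust the IOPS budget well before the throughput limit. Combining many records into each IO avoids that.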

The WAL storage in S3Stream is optimized for write throughput and latency, featuring the following write characteristics:

  • Centralized writing: Unlike Apache Kafka, AutoMQ does not write a separate log file for each partition. Data from all partitions is interleaved into a single WAL, so writes stay efficient even with a very large number of partitions.

  • Raw device writing: Because AutoMQ writes to only a single log, it can use the EBS volume as a raw device without mounting a file system. This avoids file system overhead and delivers the best performance and latency.

  • Sequential writing and group commit: Data is written to the WAL sequentially, and the group commit mechanism combines many writes into one IO, achieving high throughput with few IOPS.

  • Direct IO writing: AutoMQ bypasses the kernel Page Cache and writes data directly to EBS. A confirmation is returned as soon as the write completes, unaffected by Page Cache dirty-page writeback. This is fundamental to AutoMQ's stable low-latency performance.
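The first, third, and fourth points can be combined in one small sketch. Records from many partitions are queued, and each queued record gets a future. `flush` then writes the whole batch as one sequential IO, padded to a block boundary, and acknowledges every record in it. The record header layout, the 4 KiB alignment unit, and the names `GroupCommitter`, `submit`, and `flush` are assumptions for illustration, not AutoMQ's on-disk format. Any file-like object stands in for the raw device. A real implementation would open the device with `O_DIRECT` and use aligned buffers.

```python
import struct
import zlib
from concurrent.futures import Future

BLOCK_SIZE = 4096  # assumed alignment unit for direct IO
# Assumed record header: partition id, payload length, CRC32 (3 x uint32).
RECORD_HEADER = struct.Struct(">III")


class GroupCommitter:
    """Batch records from many partitions into one write (hypothetical sketch)."""

    def __init__(self, device) -> None:
        self.device = device  # file-like object opened for binary writing
        self.position = 0     # next write offset; always block-aligned
        self.pending: list[tuple[bytes, Future]] = []

    def submit(self, partition_id: int, payload: bytes) -> Future:
        """Queue a record; its future resolves once the batch is persisted."""
        header = RECORD_HEADER.pack(partition_id, len(payload), zlib.crc32(payload))
        future: Future = Future()
        self.pending.append((header + payload, future))
        return future

    def flush(self) -> int:
        """Write all queued records as one sequential, block-aligned IO.

        Returns the number of bytes written, including padding."""
        if not self.pending:
            return 0
        batch = b"".join(record for record, _ in self.pending)
        # Round the length up to a multiple of BLOCK_SIZE, as direct IO
        # typically requires aligned offsets and lengths.
        padded_len = -(-len(batch) // BLOCK_SIZE) * BLOCK_SIZE
        self.device.seek(self.position)
        self.device.write(batch.ljust(padded_len, b"\0"))
        self.device.flush()
        start = self.position
        self.position += padded_len
        for _, future in self.pending:
            future.set_result(start)  # acknowledge every record in the batch
        self.pending.clear()
        return padded_len
```

For example, three 100-byte records from three different partitions become a single 4 KiB write. This is how one IO can serve many partitions and many client requests at once.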

Regional EBS WAL

Although EBS has a built-in multi-replica mechanism, the replicas are usually located within a single availability zone. If an entire availability zone fails, the small amount of WAL data that has not yet been uploaded to S3 may become unrecoverable.

Fortunately, a growing number of cloud providers, such as Azure, Google Cloud, and Alibaba Cloud, now offer Regional EBS products. These products replicate data synchronously across multiple availability zones (for example, three zones for Azure zone-redundant managed disks and two zones for Google Cloud regional persistent disks), providing high data durability and availability.

Using Regional EBS as the WAL implementation for S3Stream avoids the need for AutoMQ to replicate data across availability zones, thus saving a significant amount on cross-AZ traffic costs.

S3 WAL

In addition to the various EBS media, AutoMQ can implement the WAL on top of object storage APIs. S3 WAL removes the dependency on EBS entirely, so the whole architecture is built directly on S3. The advantages of this approach include:

  • Fewer dependencies, which keeps operations simple.

  • Cloud providers offer different storage services, so on a cloud where EBS capabilities are lacking, AutoMQ can be deployed with S3 WAL.

  • In many private cloud environments, S3-compatible object storage may be the only storage available, and S3 WAL makes AutoMQ usable there as well.

However, the disadvantages of S3 WAL are also evident:

  • S3 WAL has higher latency, up to several hundred milliseconds, which makes it suitable for scenarios such as logging, observability, and offline computation.

  • Compared to EBS WAL, writing to S3 WAL consumes additional network egress bandwidth, which reduces the peak read/write throughput that Broker nodes can handle, thereby increasing the overall cost.

The S3 WAL form of AutoMQ is particularly suitable for workloads that can tolerate latency at the level of hundreds of milliseconds, allowing them to fully benefit from a simpler shared storage architecture.
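A batching writer shows where the latency comes from. To keep the number of PUT requests (and their per-request cost) down, records are buffered until the batch reaches a size limit or a wait deadline, and each batch is uploaded as one object. A record's acknowledgment latency is then roughly its time in the buffer plus the PUT latency. This sketch assumes that batching design. `S3WalWriter`, `InMemoryObjectStore`, `recover`, and the thresholds are hypothetical, and the in-memory store only stands in for a real S3 client.

```python
import struct
import time

LENGTH = struct.Struct(">I")  # 4-byte length prefix per record


class InMemoryObjectStore:
    """Stand-in for an S3-compatible bucket (hypothetical, for illustration)."""

    def __init__(self) -> None:
        self.objects: dict[str, bytes] = {}

    def put_object(self, key: str, body: bytes) -> None:
        self.objects[key] = body

    def get_object(self, key: str) -> bytes:
        return self.objects[key]

    def list_keys(self) -> list[str]:
        return sorted(self.objects)


class S3WalWriter:
    """Buffer WAL records and upload each batch as one object (hypothetical)."""

    def __init__(self, store, max_batch_bytes=4 * 1024 * 1024, max_wait_s=0.25,
                 clock=time.monotonic) -> None:
        self.store = store
        self.max_batch_bytes = max_batch_bytes
        self.max_wait_s = max_wait_s
        self.clock = clock
        self.buffer: list[bytes] = []
        self.buffered_bytes = 0
        self.batch_started_at = None
        self.next_seq = 0

    def append(self, record: bytes) -> None:
        if not self.buffer:
            self.batch_started_at = self.clock()
        encoded = LENGTH.pack(len(record)) + record
        self.buffer.append(encoded)
        self.buffered_bytes += len(encoded)
        self.maybe_upload()

    def maybe_upload(self) -> bool:
        """Upload the batch if it is large enough or has waited long enough.

        A real writer would also call this periodically from a timer."""
        if not self.buffer:
            return False
        waited = self.clock() - self.batch_started_at
        if self.buffered_bytes < self.max_batch_bytes and waited < self.max_wait_s:
            return False
        # Zero-padded sequence numbers make key order match write order.
        key = f"wal/{self.next_seq:020d}"
        self.store.put_object(key, b"".join(self.buffer))
        self.next_seq += 1
        self.buffer.clear()
        self.buffered_bytes = 0
        self.batch_started_at = None
        return True


def recover(store) -> list[bytes]:
    """Replay all records from the WAL objects in write order."""
    records = []
    for key in store.list_keys():
        body = store.get_object(key)
        pos = 0
        while pos < len(body):
            (length,) = LENGTH.unpack_from(body, pos)
            pos += LENGTH.size
            records.append(body[pos:pos + length])
            pos += length
    return records
```

A larger wait window means fewer, bigger objects and lower request cost, but it also means higher latency. This is the trade-off that makes S3 WAL a fit for latency-tolerant workloads.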

S3 Express WAL

Although typical object storage services have high latency, some cloud providers offer low-latency object storage products. For example, AWS S3 Express One Zone and Azure Premium Blob Storage both achieve single-digit millisecond latency, which can meet the low-latency requirements of Kafka workloads.

Similarly, self-hosted object storage such as MinIO can deliver low latency when it is deployed on high-speed storage media.

When the environment provides low-latency S3 implementations, AutoMQ can fully adopt the S3 WAL form, eliminating the dependency on EBS and simplifying the overall architecture while meeting the business's low-latency requirements.

Multiple WALs

S3Stream's WAL implementation uses multiple threads to raise the IO queue depth significantly, so that a single WAL can sustain write bandwidth on the order of GiB/s.

However, to reduce EBS costs, AutoMQ recommends small disks for WAL storage. Cloud providers offer about 120 MiB/s of throughput even for their smallest disks, and a single disk only gains additional throughput when more capacity is purchased. For example, on Alibaba Cloud, two 20 GiB PL1 ESSDs provide 260 MiB/s of bandwidth for 40 RMB per month in total. A single ESSD would need 280 GiB of capacity to reach the same throughput, which is a sevenfold cost difference.
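The arithmetic behind the sevenfold figure can be checked with a short calculation. It rests on two assumptions. The first is a per-disk PL1 ESSD throughput formula of min(120 + 0.5 × capacity, 350) in MB/s, following Alibaba Cloud's published PL1 specification; verify it against the current documentation. The second is a price of 1 RMB per GiB-month, which is implied by "two 20 GiB disks for 40 RMB per month". The function and constant names are hypothetical.

```python
import math

# Assumptions for illustration (see lead-in): PL1 throughput formula and price.
PL1_MIN_CAPACITY_GIB = 20
PL1_MAX_THROUGHPUT = 350      # MB/s cap for a single PL1 disk
PRICE_PER_GIB_MONTH = 1.0     # RMB, implied by the example above


def pl1_throughput(capacity_gib: int) -> float:
    """Throughput (MB/s) of one PL1 ESSD: min(120 + 0.5 * capacity, 350)."""
    return min(120 + 0.5 * capacity_gib, PL1_MAX_THROUGHPUT)


def single_disk_capacity_for(target: float) -> int:
    """Smallest single PL1 disk (GiB) whose throughput reaches `target` MB/s."""
    if target > PL1_MAX_THROUGHPUT:
        raise ValueError("target exceeds what one PL1 disk can deliver")
    needed = math.ceil((target - 120) / 0.5)
    return max(PL1_MIN_CAPACITY_GIB, needed)


def monthly_cost(total_gib: int) -> float:
    return total_gib * PRICE_PER_GIB_MONTH


# Two small disks versus one large disk at the same aggregate throughput.
multi_throughput = 2 * pl1_throughput(20)
single_gib = single_disk_capacity_for(multi_throughput)
ratio = monthly_cost(single_gib) / monthly_cost(2 * 20)
print(multi_throughput, single_gib, ratio)  # 260.0 280 7.0
```

Under these assumptions, most of a small disk's throughput comes from the fixed 120 MB/s baseline rather than from its capacity. That is why adding disks is cheaper than enlarging one.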

Combining multiple small disks into a multi-WAL setup is therefore the most cost-effective approach on the public cloud. It raises the per-node throughput limit cheaply and keeps the number of nodes in a large-scale AutoMQ cluster manageable.
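Spreading writes over several small disks can be sketched as round-robin striping with one writer thread per WAL. Writes to different disks are in flight at the same time, while each disk still receives its own batches in order. This is a sketch under assumptions. `MultiWal` is a hypothetical name, and ordinary files stand in for raw EBS devices. It does not describe AutoMQ's actual scheduling. A real implementation would also record enough ordering information to merge the WALs during recovery.

```python
import os
from concurrent.futures import Future, ThreadPoolExecutor
from itertools import cycle


class MultiWal:
    """Stripe write batches across several small WAL devices (hypothetical)."""

    def __init__(self, paths: list[str]) -> None:
        # Ordinary files stand in for raw block devices here.
        self.fds = [os.open(p, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644)
                    for p in paths]
        # One single-threaded executor per WAL: batches on the same WAL stay in
        # order, while batches on different WALs are written concurrently.
        self.executors = [ThreadPoolExecutor(max_workers=1) for _ in paths]
        self._next_wal = cycle(range(len(paths)))

    def _write(self, index: int, batch: bytes) -> int:
        os.write(self.fds[index], batch)
        os.fsync(self.fds[index])
        return index

    def submit(self, batch: bytes) -> Future:
        """Hand the batch to the next WAL in round-robin order.

        The returned future yields the index of the WAL that persisted it."""
        index = next(self._next_wal)
        return self.executors[index].submit(self._write, index, batch)

    def close(self) -> None:
        for executor in self.executors:
            executor.shutdown(wait=True)
        for fd in self.fds:
            os.close(fd)
```

With two WALs, four submitted batches land on WAL 0, 1, 0, 1. Aggregate throughput can then approach the sum of the per-disk limits.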

Because AWS S3 Express One Zone and EBS both store data within a single availability zone, AutoMQ also supports combining EBS WAL with S3 Express WAL to provide a low-latency, multi-availability-zone, and cost-effective architecture on AWS.
