Cloud Native Reshapes Kafka Architecture

Info

The term "AutoMQ Kafka" mentioned in this text specifically refers to the open-source project called automq-for-kafka, which is hosted under the GitHub organization AutoMQ, operated by AutoMQ CO.,LTD.

Opportunities for Project Creation

As the cloud becomes the default infrastructure of modern IT, more and more enterprises are migrating their applications and the underlying components they depend on to the cloud, or building directly on it. But if moving to the cloud only means switching from physical machines in a self-managed data center to virtual machines in the cloud, it is hard to benefit from the cloud's elasticity, pay-as-you-go billing, and low-cost, large-scale services.

Apache Kafka® was born as a distributed system for large-scale log transport. Its architecture was designed for the data center: it ensures data reliability through the multi-replica ISR mechanism, and it achieves extremely high throughput by persisting data to local disks and using zero-copy techniques. In the era when deploying applications in data centers was mainstream, with ample bandwidth and cheap HDDs available on premises, Apache Kafka® gave users a high-throughput, low-cost streaming channel and became the de facto standard for event streaming in real-time stream processing.

In the cloud era, the ideal operating model is to avoid reserving resources for peak load in advance and instead scale elastically and pay as you go. But Apache Kafka®'s data is bound to machines, and after scaling out or in it takes hours or even a day to finish rebalancing traffic. Users therefore still have to provision for peak load even when they deploy Apache Kafka® in the cloud. In addition, cloud disks from mainstream cloud providers already offer very high reliability: AWS EBS io2, for example, is designed for 99.999% durability. On top of such disks, Apache Kafka®'s multi-replica mechanism does not add meaningful reliability, but it wastes storage and bandwidth, further increasing cost.

So it’s time to redesign a cloud-native Kafka based on the cloud!

True Cloud-native Applications

First, you need to define what a true cloud-native application is, and then you can design a cloud-native Kafka.

We believe that true cloud-native applications have the following two characteristics:

Cloud Economy: Architecting Cloud-oriented Billing Items

  • Design for resource limitations: Resources on the cloud are limited, so the architecture of a cloud-native application must identify and avoid resource bottlenecks, because the scarcest resource sets the theoretical ceiling. Take Apache Kafka® as an example: with three replicas and a produce/consume ratio of 1:1, the theoretical upper limit of a single AWS EC2 m6in.large instance is (baseline bandwidth 390 MB/s) / (1 consumption stream + 2 replication streams) = 130 MB/s. If data is instead stored on cloud disks without additional replicas, the upper limit of the same m6in.large becomes (baseline bandwidth 390 MB/s) / (1 consumption stream) = 390 MB/s, so the theoretical throughput on identical hardware is 3 times the original.
  • Make full use of free quotas from cloud providers: Cloud services usually include free or discounted quotas at small sizes. Cloud-native applications should take full advantage of these small sizes to cut costs, and serve large-scale workloads by combining many small units. For example, an AWS GP3 cloud disk includes a free quota of 3,000 IOPS and 125 MB/s. Ten GP3 volumes that each stay within the free quota save ¥1,300 per month compared with a single GP3 volume provisioned at 30,000 IOPS and 1,250 MB/s.
  • Build the storage system on Object storage: There are hundreds of engineers behind AWS S3. After several years of continuous optimization, it is now one of the cheapest storage services in the world and offers extremely high availability. Choosing Object storage as the primary storage not only cuts storage costs significantly, but also reduces software complexity by offloading data to Object storage.
  • Make full use of Spot instances: Major public clouds offer Spot instances that are cheaper than on-demand instances. The trade-off is that, after a Spot instance has run for some time, the cloud provider may reclaim or interrupt it depending on its spare compute capacity or the Spot bidding situation. AutoMQ Kafka can make full use of Spot instances to save up to 90% of compute instance cost.
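The bandwidth arithmetic in the first bullet can be checked with a few lines of Python. This is only a sketch of the back-of-the-envelope model described there: the function name `max_ingress_mb_s` is illustrative, and the 390 MB/s figure is the baseline bandwidth quoted above rather than a measured value.

```python
def max_ingress_mb_s(bandwidth_mb_s: float, consume_streams: int, replica_streams: int) -> float:
    """Upper bound on producer traffic when outbound bandwidth is the limit.

    Each produced byte leaves the broker once per consumer stream and once
    per follower replica, so the bandwidth budget is divided among them.
    """
    outbound_streams = consume_streams + replica_streams
    if outbound_streams <= 0:
        raise ValueError("at least one outbound stream is required")
    return bandwidth_mb_s / outbound_streams


BASELINE_MB_S = 390.0  # m6in.large baseline bandwidth, as quoted in the text

three_replicas = max_ingress_mb_s(BASELINE_MB_S, consume_streams=1, replica_streams=2)
no_replicas = max_ingress_mb_s(BASELINE_MB_S, consume_streams=1, replica_streams=0)

print(f"3 replicas: {three_replicas:.0f} MB/s")
print(f"cloud disk, no extra replicas: {no_replicas:.0f} MB/s")
print(f"ratio: {no_replicas / three_replicas:.1f}x")
```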

Ultimate Flexibility: Pay-as-you-go

  • Minute-level scaling: All resources on the cloud are billed pay-as-you-go, with billing granularity as fine as one second. Cloud-native applications should be pay-as-you-go as well: when traffic increases, capacity can be added in minutes, and when traffic decreases, unneeded resources can be released promptly.
  • Fine-grained elasticity: Cloud-native applications need to be designed for small instance sizes. Small sizes mean finer-grained elasticity, and ultimately a lower Serverless cost.

AutoMQ for Kafka - Truly Cloud-native Kafka

AutoMQ for Kafka (AutoMQ Kafka for short) is a cloud-native Kafka redesigned for the cloud. AutoMQ Kafka is 100% compatible with the Kafka protocol and takes full advantage of what the cloud offers. Compared with Apache Kafka®, AutoMQ Kafka provides up to 10x cost savings and up to 800x efficiency improvements.

Architecture Overview

AutoMQ Kafka rebuilds Apache Kafka® on a cloud-native architecture:

  • Controller: The control plane continuously monitors Broker load and automatically initiates Partition traffic rebalancing among the Brokers.
  • Broker: A "stateless" compute node. All data is offloaded to the shared storage layer, S3Stream, so a Broker can be shut down and taken offline in seconds.
  • S3 Stream: A low-latency, high-throughput, low-cost stream storage library with unlimited capacity, built on cloud disks and Object storage. The cloud disk serves as a write buffer, the [Delta WAL](/docs/automq-s3kafka/Q8fNwoCDGiBOV6k8CDSccKKRn9d# Delta-wal) (2 GB of space), to provide millisecond-level responses, while Object storage serves as the primary storage to provide unlimited capacity at low cost.

100% Protocol Compatible

AutoMQ Kafka makes minimal changes to the Apache Kafka® code. Starting from the smallest storage unit, the LogSegment, it maps each type of file in a LogSegment to a corresponding S3Stream, which completes the offloading of Broker data to shared storage. Almost all upper-layer code, such as networking, transactions, and compacted topics, is reused unchanged, achieving 100% protocol and semantic compatibility.

10x Cost Savings

Storage

Object storage is one of the cheapest storage options on the cloud (AWS S3 storage costs only 28% as much as GP3). It is pay-as-you-go and has almost unlimited capacity, but its read and write latency is usually above 100 ms, and it is also billed by the number of API calls. Persisting data to it directly would make write latency and API call costs too high. A high-performance, low-latency store is therefore needed as a buffer layer that turns high-frequency small writes into low-frequency large writes.

After evaluating self-managed multi-replica SSD, NAS, and cloud disks as the buffer layer, we concluded that cloud disks are the best choice:

  • Self-managed multi-replica SSD: Keeping 3 replicas spends 2 extra copies of outbound network traffic, which increases compute cost.
  • NAS: At the same IOPS and throughput, cloud NAS costs more and has higher write latency than cloud disks.
  • Cloud disk: The lowest cost, plus a generous free quota on the cloud.

AutoMQ Kafka uses the cloud disk as the buffer layer. Once data written to S3Stream has been persisted to the Delta WAL on the cloud disk, the write is acknowledged to the upper layer, so the Delta WAL provides high-performance, low-latency writes. When the Delta WAL has accumulated 500 MB, a background task asynchronously uploads a Stream Set Object to Object storage. Object storage provides pay-as-you-go, low-cost, unlimited capacity, which makes the storage layer itself pay-as-you-go.
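The buffering behaviour described above can be illustrated with a toy model. The sketch below is not AutoMQ's code: the class `DeltaWalSketch` and its methods are hypothetical, the cloud disk and Object storage are replaced by in-memory counters, and the upload runs inline rather than asynchronously. It only shows how acknowledging after the WAL append turns many small writes into a few large uploads.

```python
from dataclasses import dataclass, field
from typing import List

UPLOAD_THRESHOLD_BYTES = 500 * 1024 * 1024  # the 500 MB trigger from the text


@dataclass
class DeltaWalSketch:
    """Toy write path: acknowledge after the WAL append, upload in bulk later."""

    threshold: int = UPLOAD_THRESHOLD_BYTES
    buffered: int = 0  # bytes in the WAL that are not yet uploaded
    uploaded_objects: List[int] = field(default_factory=list)  # size of each upload

    def append(self, record_size: int) -> bool:
        # Persist to the cloud-disk WAL (modelled here as a byte counter).
        self.buffered += record_size
        # Once enough data has accumulated, upload it as one large object.
        # The real system does this in the background; the sketch does it inline.
        if self.buffered >= self.threshold:
            self.flush()
        # The caller is acknowledged as soon as the WAL append is durable.
        return True

    def flush(self) -> None:
        """Upload everything buffered so far (also what a graceful shutdown needs)."""
        if self.buffered:
            self.uploaded_objects.append(self.buffered)
            self.buffered = 0


# Small threshold so the effect is visible: 25 writes produce only 2 uploads.
wal = DeltaWalSketch(threshold=1000)
for _ in range(25):
    wal.append(100)
print(len(wal.uploaded_objects), "uploads,", wal.buffered, "bytes still buffered")
```

Because Object storage is billed per API call, batching in this way reduces the number of billable requests as well as the write latency seen by producers.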

Compared with Apache Kafka® (versions before 3.6.0, without tiered storage) running three replicas on the cloud, AutoMQ Kafka uses Object storage as its primary storage and can cut storage cost by up to 10x.

(Apache Kafka® storage cost: GP3 $0.08/GB per month × 3 replicas / 0.8 disk utilization) / (AutoMQ Kafka storage cost: S3 $0.023/GB per month + S3 API call charges) ≈ 10x
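As a quick check of this ratio, the sketch below plugs in the prices quoted above. The S3 API call charge is left as a parameter because the text gives no figure for it; the function name and the sample values passed to it are illustrative assumptions.

```python
GP3_USD_PER_GB_MONTH = 0.08   # quoted above
S3_USD_PER_GB_MONTH = 0.023   # quoted above
REPLICAS = 3
DISK_UTILIZATION = 0.8        # only 80% of each provisioned disk holds data


def storage_cost_ratio(s3_api_usd_per_gb_month: float) -> float:
    """Apache Kafka storage cost divided by AutoMQ Kafka storage cost, per GB-month."""
    kafka_cost = GP3_USD_PER_GB_MONTH * REPLICAS / DISK_UTILIZATION
    automq_cost = S3_USD_PER_GB_MONTH + s3_api_usd_per_gb_month
    return kafka_cost / automq_cost


# With API charges ignored the ratio is about 13x.
print(f"{storage_cost_ratio(0.0):.1f}x")
# An API charge of about $0.007 per GB-month (an assumed value) gives 10x.
print(f"{storage_cost_ratio(0.007):.1f}x")
```

Under these prices, the quoted figure of roughly 10x corresponds to API charges of about $0.007 per GB-month; workloads with fewer or larger uploads would land closer to 13x.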

Compute

Apache Kafka® is an IO-intensive application, and the network usually becomes the bottleneck first. AutoMQ Kafka writes data to the Delta WAL on the cloud disk, so there is no need to replicate data over the network to other machines; the cloud disk itself ensures high data reliability, and reads and writes to cloud disks do not count against network bandwidth. The asynchronous background upload to Object storage consumes one share of traffic. Apache Kafka® with three replicas fans out two shares of replication traffic, so AutoMQ Kafka consumes only half as much traffic for data persistence. With a produce/consume ratio of 1:1, the single-machine throughput of AutoMQ Kafka relative to Apache Kafka® is (1 consumption + 2 replication) / (1 consumption + 1 upload to Object storage) = 1.5x.
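The same ratio can be written as a count of outbound traffic shares per produced byte. This is a minimal sketch of the model in the paragraph above; the helper name `outbound_shares` is illustrative.

```python
from fractions import Fraction


def outbound_shares(consume: int, replicate: int, upload: int) -> int:
    """Number of times each produced byte leaves the broker over the network."""
    return consume + replicate + upload


# Apache Kafka, three replicas: 1 share to consumers, 2 shares to followers.
kafka_shares = outbound_shares(consume=1, replicate=2, upload=0)
# AutoMQ Kafka: 1 share to consumers, 1 share uploaded to Object storage.
automq_shares = outbound_shares(consume=1, replicate=0, upload=1)

# With the same network budget, throughput scales inversely with the shares used.
speedup = Fraction(kafka_shares, automq_shares)
print(f"single-machine throughput ratio: {float(speedup)}x")
```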

Beyond delivering higher performance at the same machine cost, AutoMQ Kafka's Serverless, minute-level elastic scaling is also a powerful way to save cost for businesses with traffic peaks and valleys.

  • Minute-level node scaling: AutoMQ Kafka continuously monitors the Brokers' average network utilization through the cloud provider's Auto Scaling Group. When utilization rises above or falls below the configured threshold, the cluster is scaled out or in directly.
  • Second-level traffic rebalancing: The control plane of AutoMQ Kafka continuously monitors traffic across Brokers and completes Partition load balancing in seconds.

Through minute-level node scaling and second-level traffic rebalancing, AutoMQ Kafka can keep the cluster sized to exactly what the business needs, realizing pay-as-you-go at the compute layer.
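A threshold rule of this kind can be sketched as follows. The thresholds (80% and 30%), the function name, and the one-broker step size are assumptions chosen for illustration; the text does not state the values AutoMQ Kafka or the Auto Scaling Group actually use.

```python
from statistics import mean
from typing import Sequence


def scaling_step(
    broker_network_utilization: Sequence[float],
    scale_out_above: float = 0.8,  # assumed threshold, not from the source
    scale_in_below: float = 0.3,   # assumed threshold, not from the source
) -> int:
    """Return +1 to add a broker, -1 to remove one, 0 to leave the cluster as is."""
    if not broker_network_utilization:
        raise ValueError("need at least one broker sample")
    average = mean(broker_network_utilization)
    if average > scale_out_above:
        return 1
    # Never scale in below a single broker.
    if average < scale_in_below and len(broker_network_utilization) > 1:
        return -1
    return 0


print(scaling_step([0.90, 0.85, 0.95]))  # busy cluster
print(scaling_step([0.10, 0.20, 0.15]))  # idle cluster
print(scaling_step([0.50, 0.60, 0.40]))  # within the band
```

Keeping a gap between the two thresholds avoids oscillating between scale-out and scale-in when load hovers near a single cut-off.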

For businesses with obvious traffic peaks and valleys, the Serverless form of AutoMQ Kafka combined with the new computing architecture can achieve a 10x computing cost saving.

800x Efficiency Improvement

AutoMQ Kafka is a "stateless" application. Although an AutoMQ Kafka Broker has a cloud disk-based Delta WAL, the Broker is in fact still "stateless".

Let’s explore how AutoMQ Kafka achieves “statelessness” through two scenarios: minute-level Broker scale-in and second-level Partition migration:

  • Minute-level Broker scale-in: As mentioned earlier, the Delta WAL buffers less than 500 MB of data. When a Broker receives the kill -15 graceful shutdown signal, it uploads the buffered data to Object storage and then completes the graceful shutdown. Machines on the cloud usually have network burst capability. Taking m6in.large as an example, the network burst bandwidth is 25 Gbps, so the Delta WAL upload can complete in less than 2 seconds.
  • Second-level Partition migration: This works much like Broker scale-in. When a Partition is to be moved to another Broker, an upload of the current Broker's Delta WAL is triggered first; the target Broker can then read the corresponding data when it opens the Partition.
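The upload-time claim follows from simple arithmetic on the figures above. The sketch below assumes the full burst bandwidth is available for the upload and ignores Object storage request latency, so it gives a lower bound rather than a measured time; the function name is illustrative.

```python
def upload_seconds(buffer_mb: float, link_gbps: float) -> float:
    """Ideal time to push a buffer over a link, ignoring request overhead."""
    if link_gbps <= 0:
        raise ValueError("link bandwidth must be positive")
    bytes_per_second = link_gbps * 1e9 / 8  # gigabits per second to bytes per second
    return buffer_mb * 1e6 / bytes_per_second


# At most 500 MB buffered in the Delta WAL, 25 Gbps burst on m6in.large.
ideal = upload_seconds(buffer_mb=500, link_gbps=25)
print(f"ideal upload time: {ideal:.2f} s")
```

The ideal figure is well under the 2 seconds quoted above, which leaves headroom for request latency and for bandwidth shared with live traffic.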

Combined with Auto Scaling, which monitors capacity and triggers cluster scaling automatically, AutoMQ Kafka can run Serverless with no manual operations.

For comparison, consider Apache Kafka®. Borrowing Confluent's example, in a cluster of three Brokers holding 777.6 TB in total, rebalancing takes 43 hours after adding one Broker.

Let’s explore an example of an OSS Kafka cluster with three brokers [0, 1, 2]. The cluster retains messages for 30 days and receives 100 MBps on average from its producers. Given a replication factor of three, the cluster will retain up to 777.6 TB of storage, or about 259.2 TB per broker with equal balance.

Now when we scale the cluster by one broker, we need to bring the new machine up to speed. Assuming perfect data balancing, this means the new broker will be storing 194.4 TB (¾ of 259.2 TB), which will need to be read from the existing brokers. Assuming the full bandwidth was available for replication, it could take 43 hours using open source Kafka.

Measured by the time needed to scale and rebalance, the operational efficiency of AutoMQ Kafka relative to Apache Kafka® is (OSS Kafka: 43 hours of rebalancing) / (AutoMQ Kafka: 1 min detection + 1 min scaling + 1 min traffic rebalancing) = 860x.
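The numbers in this comparison can be reproduced from the inputs of the Confluent example. The sketch below uses only figures quoted above; the replication bandwidth is not stated in the quoted passage, so it is derived here from the 43-hour result rather than assumed.

```python
SECONDS_PER_DAY = 86_400

ingress_mb_s = 100       # average producer traffic
retention_days = 30
replication_factor = 3
brokers_before = 3

# Total retained data across all replicas, in TB (1 TB = 1e6 MB).
total_tb = ingress_mb_s * SECONDS_PER_DAY * retention_days * replication_factor / 1e6
per_broker_tb = total_tb / brokers_before
# After adding one broker, perfect balance puts 1/4 of the data on the new one.
new_broker_tb = total_tb / (brokers_before + 1)

rebalance_hours = 43     # figure from the quoted example
implied_mb_s = new_broker_tb * 1e6 / (rebalance_hours * 3600)

automq_minutes = 1 + 1 + 1  # detection + scaling + traffic rebalancing
speedup = rebalance_hours * 60 / automq_minutes

print(f"total: {total_tb:.1f} TB, per broker: {per_broker_tb:.1f} TB")
print(f"data to copy to the new broker: {new_broker_tb:.1f} TB")
print(f"implied replication bandwidth: {implied_mb_s:.0f} MB/s")
print(f"efficiency ratio: {speedup:.0f}x")
```

The implied bandwidth is roughly that of a 10 Gbps link, which matches the example's assumption that the full bandwidth is available for replication.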

Operators can run AutoMQ Kafka like an ordinary business application, turning high-risk Apache Kafka® scaling and traffic rebalancing into low-risk, automated operations.