CubeFS
Preface
CubeFS [1] is a pioneering cloud-native storage solution, currently in its incubation phase under the auspices of the Cloud Native Computing Foundation (CNCF). It supports a variety of access protocols including S3, POSIX, and HDFS, and provides two storage engines: replication and erasure coding. CubeFS offers functionalities like multi-tenancy, multi-AZ deployment, and cross-region replication, making it ideal for use in diverse scenarios such as big data, AI, container platforms, databases, middleware storage-compute separation, data sharing, and data protection.
![](/assets/images/1-86e77d1089cd2b65608f56b43c0ff41f.png)
AutoMQ leverages a cutting-edge shared storage architecture that benefits from cost-effective object storage, and CubeFS's support for an S3-compatible interface is a perfect match. The ObjectNode feature in CubeFS offers an S3-compatible object storage interface to manage files, facilitating the use of tools like S3Browser, S3Cmd, or the native Amazon S3 SDK. Consequently, AutoMQ can be seamlessly integrated with CubeFS to create a streaming system that not only aligns with Kafka’s capabilities but also provides enhanced cost efficiency, ultimate scalability, and sub-millisecond latency.
This article explores the steps to deploy the AutoMQ cluster in a private data center using CubeFS.
Prerequisites
Prepare a CubeFS Cluster
- An existing CubeFS environment. If you have not set up CubeFS yet, please consult the official documentation for dependency configuration [3] and establishing a basic CubeFS cluster [4].
A standard CubeFS build produces a build/bin directory containing the command-line tools used for cluster management. This article uses these tools for the remaining configuration steps.
Verify the cluster status with the CubeFS command line tool to confirm a successful setup:
# Run the Command
./build/bin/cfs-cli cluster info
# Review the Output
[Cluster]
Cluster name : cfs_dev
Master leader : 172.16.1.101:17010
Master-1 : 172.16.1.101:17010
Master-2 : 172.16.1.102:17010
Master-3 : 172.16.1.103:17010
Auto allocate : Enabled
MetaNode count (active/total) : 4/4
MetaNode used : 0 GB
MetaNode available : 21 GB
MetaNode total : 21 GB
DataNode count (active/total) : 4/4
DataNode used : 44 GB
DataNode available : 191 GB
DataNode total : 235 GB
Volume count : 2
...
Note: The IP and port of the master node in the CubeFS cluster are needed for the upcoming object gateway configuration.
Activate the Object Gateway
To enable CubeFS's support for the object storage protocol, activate the object gateway [5]. The object gateway provides an S3-compatible interface, allowing CubeFS to support both the traditional POSIX file system interface and the S3-compatible object storage interface. This dual-interface capability offers users a versatile data storage and access solution. Specifically, once the object gateway is enabled, users can manage files in CubeFS using the native Amazon S3 SDK, leveraging the benefits of object storage.
Begin by creating an objectnode.json configuration file in the CubeFS root directory. Here's an example of what the objectnode.json configuration file might look like:
{
"role": "objectnode",
"listen": "17410",
"domains": [
"object.cfs.local"
],
"logDir": "/cfs/Logs/objectnode",
"logLevel": "info",
"masterAddr": [
"172.16.1.101:17010",
"172.16.1.102:17010",
"172.16.1.103:17010"
],
"exporterPort": 9503,
"prof": "7013"
}
Note: The masterAddr's IP and port details can be sourced from the CubeFS cluster information mentioned earlier.
Then, use the following command to initiate the object gateway:
nohup ./build/bin/cfs-server -c objectnode.json &
Create a CubeFS User
- Create a CubeFS user and retrieve its AccessKey and Secret Key.
For creating and querying user information, refer to the User Management Documentation [6].
CubeFS supports a variety of creation methods, including using the AWS SDK [7] or through an HTTP request. Here, we will demonstrate the process using an HTTP request:
- Specify the user ID, password, and type, and then access the creation interface:
curl -H "Content-Type:application/json" -X POST --data '{"id":"automq","pwd":"12345","type":3}' "http://172.16.1.101:17010/user/create"
- Query user information using the user ID:
curl -v "http://172.16.1.101:17010/user/info?user=automq" | python -m json.tool
- Example response:
{
"user_id": "automq",
"access_key": "UZONf5FF6WKwFCj4",
"secret_key": "TRZzfPitQkxOLXqPhKMBRrDYUyXXMpWG",
"policy": {
"own_vols": ["vol1"],
"authorized_vols": {
"ltptest": [
"perm:builtin:ReadOnly",
"perm:custom:PutObjectAction"
]
}
},
"user_type": 3,
"create_time": "2024-06-06 09:25:04"
}
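The access_key and secret_key fields in this response are the credentials AutoMQ will need later. As a convenience, they can be pulled out of the raw response programmatically; this sketch assumes the response has the shape shown above:

```python
import json

def extract_credentials(response_text: str) -> tuple[str, str]:
    """Parse a CubeFS /user/info response and return (access_key, secret_key)."""
    info = json.loads(response_text)
    return info["access_key"], info["secret_key"]

# Example using the sample response shown above.
sample = '''{
    "user_id": "automq",
    "access_key": "UZONf5FF6WKwFCj4",
    "secret_key": "TRZzfPitQkxOLXqPhKMBRrDYUyXXMpWG",
    "user_type": 3
}'''
access_key, secret_key = extract_credentials(sample)
print(access_key)  # -> UZONf5FF6WKwFCj4
```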
Create a Bucket Using the S3 Interface
Use the AWS CLI tool on CubeFS to create the necessary bucket for AutoMQ cluster deployment.
Obtain the user's AccessKey and Secret Key, configure them with aws configure, and then create the buckets using the AWS CLI tool:
aws s3api create-bucket --bucket automq-data --endpoint=http://172.16.1.101:17410
aws s3api create-bucket --bucket automq-ops --endpoint=http://172.16.1.101:17410
Use the following command to list the existing buckets:
aws s3 ls --endpoint=http://172.16.1.101:17410
Prepare the Machines Required for AutoMQ Deployment
Prepare 5 hosts for the AutoMQ cluster, ideally Linux amd64 machines with 2 cores and 16 GB of memory each, and attach two virtual storage volumes to each host. For example:
Role | IP | Node ID | System Volume | Data Volume |
---|---|---|---|---|
CONTROLLER | 192.168.0.1 | 0 | EBS 20GB | EBS 20GB |
CONTROLLER | 192.168.0.2 | 1 | EBS 20GB | EBS 20GB |
CONTROLLER | 192.168.0.3 | 2 | EBS 20GB | EBS 20GB |
BROKER | 192.168.0.4 | 3 | EBS 20GB | EBS 20GB |
BROKER | 192.168.0.5 | 4 | EBS 20GB | EBS 20GB |
Tips:
- Ensure these machines are in the same subnet and can communicate with each other.
- In non-production settings, you can deploy a single CONTROLLER, which by default also serves as a BROKER.
- Download the latest official binary package from AutoMQ Github Releases to install AutoMQ.
Install and Start the AutoMQ Cluster
Configure the S3 URL
Step 1: Generate an S3 URL
AutoMQ provides the automq-kafka-admin.sh tool, which enables the rapid deployment of AutoMQ. Just supply an S3 URL with the necessary access points and authentication details, and you can launch AutoMQ using a single command without manually generating a cluster ID or formatting storage.
Command-line Usage Example
bin/automq-kafka-admin.sh generate-s3-url \
--s3-access-key=xxx \
--s3-secret-key=yyy \
--s3-region=cn-northwest-1 \
--s3-endpoint=s3.cn-northwest-1.amazonaws.com.cn \
--s3-data-bucket=automq-data \
--s3-ops-bucket=automq-ops
If an error occurs, ensure the accuracy of the parameters and their format.
When using CubeFS, the following configuration can be used to generate the S3 URL.
Parameter Name | Default Value in This Example | Description |
---|---|---|
--s3-access-key | XXX | Replace with the AccessKey of the CubeFS user created earlier |
--s3-secret-key | YYY | Replace with the Secret Key of the CubeFS user created earlier |
--s3-region | auto | This can be set to the cluster name, or left as auto |
--s3-endpoint | http://<host ip>:17410 | The S3 access point of the CubeFS object gateway |
--s3-data-bucket | automq-data | CubeFS bucket name |
--s3-ops-bucket | automq-ops | CubeFS bucket name |
Output Result
After this command runs, the process automatically moves through the following stages:
- Detect the core S3 features of the endpoint using the provided accessKey and secretKey, verifying compatibility between AutoMQ and the S3 service.
- Create an s3url from the identity details and access point information.
- Generate an example AutoMQ startup command from the s3url. In this command, replace --controller-list and --broker-list with the addresses of the actual CONTROLLER and BROKER hosts to be deployed.
The example of the execution result is as follows:
############ Ping S3 ########################
[ OK ] Write s3 object
[ OK ] Read s3 object
[ OK ] Delete s3 object
[ OK ] Write s3 object
[ OK ] Upload s3 multipart object
[ OK ] Read s3 multipart object
[ OK ] Delete s3 object
############ String of S3url ################
Your s3url is:
s3://s3.cn-northwest-1.amazonaws.com.cn?s3-access-key=xxx&s3-secret-key=yyy&s3-region=cn-northwest-1&s3-endpoint-protocol=https&s3-data-bucket=automq-data&s3-path-style=false&s3-ops-bucket=automq-ops&cluster-id=40ErA_nGQ_qNPDz0uodTEA
############ Usage of S3url ################
To start AutoMQ, generate the start commandline using s3url.
bin/automq-kafka-admin.sh generate-start-command \
--s3-url="s3://s3.cn-northwest-1.amazonaws.com.cn?s3-access-key=XXX&s3-secret-key=YYY&s3-region=cn-northwest-1&s3-endpoint-protocol=https&s3-data-bucket=automq-data&s3-path-style=false&s3-ops-bucket=automq-ops&cluster-id=40ErA_nGQ_qNPDz0uodTEA" \
--controller-list="192.168.0.1:9093;192.168.0.2:9093;192.168.0.3:9093" \
--broker-list="192.168.0.4:9092;192.168.0.5:9092"
TIPS: Please replace the controller-list and broker-list with your actual IP addresses.
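As the output shows, the s3url is simply an s3:// address whose query string carries the credentials, endpoint protocol, bucket names, and cluster ID. The following sketch illustrates how such a string is assembled; the build_s3_url helper is illustrative and not part of the AutoMQ tooling, and the parameter names are taken from the example output above:

```python
from urllib.parse import urlencode

def build_s3_url(endpoint: str, access_key: str, secret_key: str,
                 region: str, data_bucket: str, ops_bucket: str,
                 cluster_id: str, protocol: str = "http") -> str:
    """Assemble an s3url string in the format produced by generate-s3-url."""
    params = {
        "s3-access-key": access_key,
        "s3-secret-key": secret_key,
        "s3-region": region,
        "s3-endpoint-protocol": protocol,
        "s3-data-bucket": data_bucket,
        "s3-path-style": "false",
        "s3-ops-bucket": ops_bucket,
        "cluster-id": cluster_id,
    }
    return f"s3://{endpoint}?{urlencode(params)}"

# Compose an s3url for the CubeFS object gateway used in this article.
url = build_s3_url("172.16.1.101:17410", "XXX", "YYY", "auto",
                   "automq-data", "automq-ops", "40ErA_nGQ_qNPDz0uodTEA")
print(url)
```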
Step 2: Generate a List of Startup Commands
In the command generated in the previous step, replace --controller-list and --broker-list with your host details, specifically with the IP addresses of the 3 CONTROLLERS and 2 BROKERS mentioned in the environment preparation, using the default ports 9092 and 9093.
bin/automq-kafka-admin.sh generate-start-command \
--s3-url="s3://s3.cn-northwest-1.amazonaws.com.cn?s3-access-key=XXX&s3-secret-key=YYY&s3-region=cn-northwest-1&s3-endpoint-protocol=https&s3-data-bucket=automq-data&s3-path-style=false&s3-ops-bucket=automq-ops&cluster-id=40ErA_nGQ_qNPDz0uodTEA" \
--controller-list="192.168.0.1:9093;192.168.0.2:9093;192.168.0.3:9093" \
--broker-list="192.168.0.4:9092;192.168.0.5:9092"
Parameter Description
Parameter Name | Required | Description |
---|---|---|
--s3-url | Yes | Generated by the command line tool bin/automq-kafka-admin.sh generate-s3-url , includes authentication, cluster ID, and other information |
--controller-list | Yes | Requires at least one address, serving as the IP and port list for the CONTROLLER host. Format: IP1:PORT1; IP2:PORT2; IP3:PORT3 |
--broker-list | Yes | Requires at least one address, serving as the IP and port list for the BROKER host. Format: IP1:PORT1; IP2:PORT2; IP3:PORT3 |
--controller-only-mode | No | Determines whether the CONTROLLER node only assumes the role of CONTROLLER. Defaults to false, meaning the deployed CONTROLLER node also acts as a BROKER. |
Output Result
After executing the command, it generates the command needed to launch AutoMQ.
############ Start Commandline ##############
To start an AutoMQ Kafka server, please navigate to the directory where your AutoMQ tgz file is located and run the following command.
Before running the command, make sure that Java 17 is installed on your host. You can verify the Java version by executing 'java -version'.
bin/kafka-server-start.sh --s3-url="s3://s3.cn-northwest-1.amazonaws.com.cn?s3-access-key=XXX&s3-secret-key=YYY&s3-region=cn-northwest-1&s3-endpoint-protocol=https&s3-data-bucket=automq-data&s3-path-style=false&s3-ops-bucket=automq-ops&cluster-id=40ErA_nGQ_qNPDz0uodTEA" --override process.roles=broker,controller --override node.id=0 --override controller.quorum.voters=0@192.168.0.1:9093,1@192.168.0.2:9093,2@192.168.0.3:9093 --override listeners=PLAINTEXT://192.168.0.1:9092,CONTROLLER://192.168.0.1:9093 --override advertised.listeners=PLAINTEXT://192.168.0.1:9092
bin/kafka-server-start.sh --s3-url="s3://s3.cn-northwest-1.amazonaws.com.cn?s3-access-key=XXX&s3-secret-key=YYY&s3-region=cn-northwest-1&s3-endpoint-protocol=https&s3-data-bucket=automq-data&s3-path-style=false&s3-ops-bucket=automq-ops&cluster-id=40ErA_nGQ_qNPDz0uodTEA" --override process.roles=broker,controller --override node.id=1 --override controller.quorum.voters=0@192.168.0.1:9093,1@192.168.0.2:9093,2@192.168.0.3:9093 --override listeners=PLAINTEXT://192.168.0.2:9092,CONTROLLER://192.168.0.2:9093 --override advertised.listeners=PLAINTEXT://192.168.0.2:9092
bin/kafka-server-start.sh --s3-url="s3://s3.cn-northwest-1.amazonaws.com.cn?s3-access-key=XXX&s3-secret-key=YYY&s3-region=cn-northwest-1&s3-endpoint-protocol=https&s3-data-bucket=automq-data&s3-path-style=false&s3-ops-bucket=automq-ops&cluster-id=40ErA_nGQ_qNPDz0uodTEA" --override process.roles=broker,controller --override node.id=2 --override controller.quorum.voters=0@192.168.0.1:9093,1@192.168.0.2:9093,2@192.168.0.3:9093 --override listeners=PLAINTEXT://192.168.0.3:9092,CONTROLLER://192.168.0.3:9093 --override advertised.listeners=PLAINTEXT://192.168.0.3:9092
bin/kafka-server-start.sh --s3-url="s3://s3.cn-northwest-1.amazonaws.com.cn?s3-access-key=XXX&s3-secret-key=YYY&s3-region=cn-northwest-1&s3-endpoint-protocol=https&s3-data-bucket=automq-data&s3-path-style=false&s3-ops-bucket=automq-ops&cluster-id=40ErA_nGQ_qNPDz0uodTEA" --override process.roles=broker --override node.id=3 --override controller.quorum.voters=0@192.168.0.1:9093,1@192.168.0.2:9093,2@192.168.0.3:9093 --override listeners=PLAINTEXT://192.168.0.4:9092 --override advertised.listeners=PLAINTEXT://192.168.0.4:9092
bin/kafka-server-start.sh --s3-url="s3://s3.cn-northwest-1.amazonaws.com.cn?s3-access-key=XXX&s3-secret-key=YYY&s3-region=cn-northwest-1&s3-endpoint-protocol=https&s3-data-bucket=automq-data&s3-path-style=false&s3-ops-bucket=automq-ops&cluster-id=40ErA_nGQ_qNPDz0uodTEA" --override process.roles=broker --override node.id=4 --override controller.quorum.voters=0@192.168.0.1:9093,1@192.168.0.2:9093,2@192.168.0.3:9093 --override listeners=PLAINTEXT://192.168.0.5:9092 --override advertised.listeners=PLAINTEXT://192.168.0.5:9092
TIPS: Start controllers first and then the brokers.
The node.id is automatically generated starting from 0.
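These five commands follow a fixed pattern: every node shares the same --s3-url and controller.quorum.voters value, while process.roles, node.id, and the listener addresses vary per host. As a rough illustration of the logic behind generate-start-command (the node_overrides helper below is hypothetical, not part of AutoMQ), the per-node overrides can be derived like this:

```python
def node_overrides(controllers: list[str], brokers: list[str]) -> list[str]:
    """Derive the per-node --override flags for each host.

    controllers/brokers are "ip:port" strings; node IDs are assigned
    sequentially starting from 0, controllers first.
    """
    voters = ",".join(f"{i}@{addr}" for i, addr in enumerate(controllers))
    commands = []
    for node_id, addr in enumerate(controllers + brokers):
        ip = addr.split(":")[0]
        is_controller = node_id < len(controllers)
        roles = "broker,controller" if is_controller else "broker"
        listeners = f"PLAINTEXT://{ip}:9092"
        if is_controller:
            listeners += f",CONTROLLER://{ip}:9093"
        commands.append(
            f"--override process.roles={roles} "
            f"--override node.id={node_id} "
            f"--override controller.quorum.voters={voters} "
            f"--override listeners={listeners} "
            f"--override advertised.listeners=PLAINTEXT://{ip}:9092"
        )
    return commands

cmds = node_overrides(
    ["192.168.0.1:9093", "192.168.0.2:9093", "192.168.0.3:9093"],
    ["192.168.0.4:9092", "192.168.0.5:9092"],
)
for c in cmds:
    print(c)
```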
Step 3: Start AutoMQ
To start the cluster, execute the command list from the previous step on the designated CONTROLLER or BROKER host. For example, to launch the first CONTROLLER process on 192.168.0.1, use the first command template from the generated startup command list.
bin/kafka-server-start.sh --s3-url="s3://s3.cn-northwest-1.amazonaws.com.cn?s3-access-key=XXX&s3-secret-key=YYY&s3-region=cn-northwest-1&s3-endpoint-protocol=https&s3-data-bucket=automq-data&s3-path-style=false&s3-ops-bucket=automq-ops&cluster-id=40ErA_nGQ_qNPDz0uodTEA" --override process.roles=broker,controller --override node.id=0 --override controller.quorum.voters=0@192.168.0.1:9093,1@192.168.0.2:9093,2@192.168.0.3:9093 --override listeners=PLAINTEXT://192.168.0.1:9092,CONTROLLER://192.168.0.1:9093 --override advertised.listeners=PLAINTEXT://192.168.0.1:9092
Parameter Explanation
When using the startup command, parameters not specified will adopt Apache Kafka's default configuration. For parameters newly added by AutoMQ, AutoMQ's default values will be applied. To override the default configurations, you can append additional --override key=value parameters at the end of the command.
Parameter Name | Required | Description |
---|---|---|
s3-url | Yes | Generated by the bin/automq-kafka-admin.sh generate-s3-url command line tool, includes authentication, cluster ID, etc. |
process.roles | Yes | Options are broker or controller. If a host serves as both CONTROLLER and BROKER, set the value to broker,controller. |
node.id | Yes | An integer used to uniquely identify a BROKER or CONTROLLER within a Kafka cluster; it must remain unique within the cluster. |
controller.quorum.voters | Yes | Information of hosts participating in the KRAFT election, including nodeid, ip, and port, for example: 0@192.168.0.1:9093, 1@192.168.0.2:9093, 2@192.168.0.3:9093 |
listeners | Yes | The IP address and port on which the node listens. |
advertised.listeners | Yes | The access address provided by the BROKER for Clients. |
log.dirs | No | Directory where KRAFT and BROKER metadata are stored. |
s3.wal.path | No | In production environments, it is recommended to store AutoMQ WAL data on a dedicated, newly mounted raw device for better performance, as AutoMQ supports writing directly to raw devices, reducing latency. Ensure the path is correctly configured to store WAL data. |
autobalancer.controller.enable | No | The default value is false, not enabling traffic self-balancing. Once traffic self-balancing is enabled, the auto balancer component of AutoMQ will automatically reassign partitions to ensure that overall traffic is balanced. |
Tips:
For ongoing traffic self-balancing or to handle changes in cluster nodes, consider setting the parameter --override autobalancer.controller.enable=true when launching the Controller.
When deploying AutoMQ in a private cloud for production, it is crucial to ensure the reliability of the local SSDs. CubeFS does not provide a high-availability block storage protocol and cannot manage disk redundancy or backup directly; this can be addressed by implementing a RAID [8] solution.
Run in Background
To run AutoMQ in the background, append the following to the end of the startup command:
command > /dev/null 2>&1 &
With this, you have successfully deployed your AutoMQ cluster using CubeFS, which delivers a cost-effective, low-latency, and elastic Kafka cluster with sub-second scalability. For more insights into AutoMQ’s sub-second reassignment and self-balancing capabilities, refer to the official example.
References
[1] CubeFS: https://www.cubefs.io/
[2] CubeFS Multi-level Cache: https://www.cubefs.io/docs/master/overview/introduction.html
[3] Dependency Configuration: CubeFS | A Cloud Native Distributed Storage System
[4] CubeFS Single Node Deployment: www.cubefs.io
[5] Object Gateway: https://www.cubefs.io/docs/master/design/objectnode.html
[6] CubeFS User Management Documentation: CubeFS | A Cloud Native Distributed Storage System
[7] CubeFS AWS SDK: https://www.cubefs.io/docs/master/user-guide/objectnode.html#%E6%94%AF%E6%8C%81%E7%9A%84sdk