CubeFS

Preface

CubeFS [1] is a pioneering cloud-native storage solution, currently in its incubation phase under the auspices of the Cloud Native Computing Foundation (CNCF). It supports a variety of access protocols including S3, POSIX, and HDFS, and provides two storage engines: replication and erasure coding. CubeFS offers functionalities like multi-tenancy, multi-AZ deployment, and cross-region replication, making it ideal for use in diverse scenarios such as big data, AI, container platforms, databases, middleware storage-compute separation, data sharing, and data protection.

AutoMQ leverages a shared storage architecture built on cost-effective object storage, and CubeFS's support for an S3-compatible interface makes it a natural match. The ObjectNode component in CubeFS exposes an S3-compatible object storage interface for managing files, so tools such as S3 Browser, s3cmd, or the native Amazon S3 SDK work out of the box. Consequently, AutoMQ can be integrated seamlessly with CubeFS to build a streaming system that matches Kafka's capabilities while offering better cost efficiency, strong elasticity, and low latency.

This article explores the steps to deploy the AutoMQ cluster in a private data center using CubeFS.

Prerequisites

Prepare a CubeFS Cluster

The standard installation package of CubeFS includes a build/bin directory filled with command-line tools essential for cluster management. This article will utilize these tools for further configurations.

Verify the cluster status with the CubeFS command line tool to confirm a successful setup:


# Run the Command
./build/bin/cfs-cli cluster info

# Review the Output
[Cluster]
Cluster name : cfs_dev
Master leader : 172.16.1.101:17010
Master-1 : 172.16.1.101:17010
Master-2 : 172.16.1.102:17010
Master-3 : 172.16.1.103:17010
Auto allocate : Enabled
MetaNode count (active/total) : 4/4
MetaNode used : 0 GB
MetaNode available : 21 GB
MetaNode total : 21 GB
DataNode count (active/total) : 4/4
DataNode used : 44 GB
DataNode available : 191 GB
DataNode total : 235 GB
Volume count : 2
...

Note: The IP and port of the master node in the CubeFS cluster are needed for the upcoming object gateway configuration.

Activate the Object Gateway

To enable CubeFS's support for the object storage protocol, activate the object gateway [5]. The object gateway provides an S3-compatible interface, allowing CubeFS to serve both the traditional POSIX file system interface and an S3-compatible object storage interface. This dual-interface capability gives users a versatile data storage and access solution. Specifically, once the object gateway is enabled, users can manage files in CubeFS with the native Amazon S3 SDK and take advantage of object storage semantics.

Begin by creating an objectnode.json configuration file in the CubeFS root directory. For example:


{
  "role": "objectnode",
  "listen": "17410",
  "domains": [
    "object.cfs.local"
  ],
  "logDir": "/cfs/Logs/objectnode",
  "logLevel": "info",
  "masterAddr": [
    "172.16.1.101:17010",
    "172.16.1.102:17010",
    "172.16.1.103:17010"
  ],
  "exporterPort": 9503,
  "prof": "7013"
}

Note: The masterAddr's IP and port details can be sourced from the CubeFS cluster information mentioned earlier.

Then, use the following command to initiate the object gateway:


nohup ./build/bin/cfs-server -c objectnode.json &
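
Before continuing, you can sanity-check that the gateway is up. The probe below is a minimal sketch, assuming the gateway runs on 172.16.1.101 with the listen port from objectnode.json; any HTTP status code in the response (for example, 403 for an unsigned request) confirms the S3 endpoint is listening:

# Probe the ObjectNode S3 endpoint; any HTTP status code means it is listening
curl -s -o /dev/null -w "%{http_code}\n" http://172.16.1.101:17410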

Create a CubeFS User

  • Create a CubeFS user and retrieve its AccessKey and SecretKey.

For creating and querying user information, refer to the User Management Documentation [6].

CubeFS supports a variety of creation methods, including using the AWS SDK [7] or through an HTTP request. Here, we will demonstrate the process using an HTTP request:

  • Specify the user ID, password, and type, and then access the creation interface:

curl -H "Content-Type:application/json" -X POST --data '{"id":"automq","pwd":"12345","type":3}' "http://172.16.1.101:17010/user/create"

  • Query user information using the user ID:

curl -v "http://172.16.1.101:17010/user/info?user=automq" | python -m json.tool

  • Example response:

{
  "user_id": "automq",
  "access_key": "UZONf5FF6WKwFCj4",
  "secret_key": "TRZzfPitQkxOLXqPhKMBRrDYUyXXMpWG",
  "policy": {
    "own_vols": ["vol1"],
    "authorized_vols": {
      "ltptest": [
        "perm:builtin:ReadOnly",
        "perm:custom:PutObjectAction"
      ]
    }
  },
  "user_type": 3,
  "create_time": "2024-06-06 09:25:04"
}
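
For scripting, the two keys can be pulled out of the /user/info response with jq. This is just a convenience sketch: it assumes jq is installed, and it handles both a bare payload and one wrapped in a data field, since the response envelope can vary by CubeFS version:

# Fetch the user info and extract the S3 credentials into shell variables
USER_INFO=$(curl -s "http://172.16.1.101:17010/user/info?user=automq")
AK=$(echo "$USER_INFO" | jq -r '.data.access_key // .access_key')
SK=$(echo "$USER_INFO" | jq -r '.data.secret_key // .secret_key')
echo "AccessKey=$AK SecretKey=$SK"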

Create a Bucket Using the S3 Interface

Use the AWS CLI against the CubeFS S3 endpoint to create the buckets required for deploying the AutoMQ cluster.

Obtain the user's AccessKey and SecretKey, configure them with aws configure, and then create the buckets using the AWS CLI.
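
For example, the credentials can be wired in non-interactively with aws configure set (the key values below are the sample ones from the earlier response; substitute your own):

aws configure set aws_access_key_id UZONf5FF6WKwFCj4
aws configure set aws_secret_access_key TRZzfPitQkxOLXqPhKMBRrDYUyXXMpWG
aws configure set default.region auto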


aws s3api create-bucket --bucket automq-data --endpoint=http://172.16.1.101:17410
aws s3api create-bucket --bucket automq-ops --endpoint=http://172.16.1.101:17410

List the existing buckets to verify:


aws s3 ls --endpoint=http://172.16.1.101:17410
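
To confirm a specific bucket is reachable, head-bucket exits silently with status 0 on success (same endpoint and credentials as above):

aws s3api head-bucket --bucket automq-data --endpoint-url=http://172.16.1.101:17410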

Prepare the Machines Required for AutoMQ Deployment

Prepare five hosts for deploying the AutoMQ cluster, ideally Linux amd64 machines with 2 CPU cores and 16 GB of memory, each equipped with two virtual storage volumes. For example:

| Role | IP | Node ID | System Volume | Data Volume |
|------|----|---------|---------------|-------------|
| CONTROLLER | 192.168.0.1 | 0 | EBS 20GB | EBS 20GB |
| CONTROLLER | 192.168.0.2 | 1 | EBS 20GB | EBS 20GB |
| CONTROLLER | 192.168.0.3 | 2 | EBS 20GB | EBS 20GB |
| BROKER | 192.168.0.4 | 3 | EBS 20GB | EBS 20GB |
| BROKER | 192.168.0.5 | 4 | EBS 20GB | EBS 20GB |

Tips:

  • Ensure these machines are in the same subnet and can communicate with each other (a quick reachability check follows these tips).

  • In non-production settings, you can deploy a single Controller, which defaults to serving both the Controller and Broker roles.
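
The sketch below checks reachability from one host to the others using plain ping; run it from each machine, removing the local address from the list as needed:

# Verify that every peer responds; before the cluster is up this checks routing only
for ip in 192.168.0.1 192.168.0.2 192.168.0.3 192.168.0.4 192.168.0.5; do
  ping -c 1 -W 1 "$ip" > /dev/null && echo "$ip reachable" || echo "$ip UNREACHABLE"
done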

Install and Start the AutoMQ Cluster

Configure the S3 URL

Step 1: Generate an S3 URL

AutoMQ provides the automq-kafka-admin.sh tool, which enables rapid deployment of AutoMQ. Just supply an S3 URL containing the endpoint and authentication details, and you can launch AutoMQ with a single command, without manually generating a cluster ID or formatting storage.


### Command-line Usage Example.
bin/automq-kafka-admin.sh generate-s3-url \
--s3-access-key=xxx \
--s3-secret-key=yyy \
--s3-region=cn-northwest-1 \
--s3-endpoint=s3.cn-northwest-1.amazonaws.com.cn \
--s3-data-bucket=automq-data \
--s3-ops-bucket=automq-ops

If an error occurs, ensure the accuracy of the parameters and their format.

When using CubeFS, the following configuration can be used to generate the S3 URL.

| Parameter Name | Default Value in This Example | Description |
|----------------|-------------------------------|-------------|
| --s3-access-key | XXX | Replace with the AccessKey of the CubeFS user created earlier |
| --s3-secret-key | YYY | Replace with the SecretKey of the CubeFS user created earlier |
| --s3-region | auto | Can be set to the cluster name, or left as auto |
| --s3-endpoint | http://<host-ip>:17410 | The S3 access point exposed by the CubeFS ObjectNode |
| --s3-data-bucket | automq-data | CubeFS bucket name |
| --s3-ops-bucket | automq-ops | CubeFS bucket name |

Output Result

After this command is run, the process will automatically move through the following stages:

  1. Detect core features of S3 using the provided AccessKey and SecretKey to verify compatibility between AutoMQ and S3.

  2. Create an s3url from the credentials and endpoint information.

  3. Use the s3url to generate an example startup command for AutoMQ. In that command, replace --controller-list and --broker-list with the actual CONTROLLER and BROKER hosts to be deployed.

The example of the execution result is as follows:


############ Ping S3 ########################

[ OK ] Write s3 object
[ OK ] Read s3 object
[ OK ] Delete s3 object
[ OK ] Write s3 object
[ OK ] Upload s3 multipart object
[ OK ] Read s3 multipart object
[ OK ] Delete s3 object
############ String of S3url ################

Your s3url is:

s3://s3.cn-northwest-1.amazonaws.com.cn?s3-access-key=xxx&s3-secret-key=yyy&s3-region=cn-northwest-1&s3-endpoint-protocol=https&s3-data-bucket=automq-data&s3-path-style=false&s3-ops-bucket=automq-ops&cluster-id=40ErA_nGQ_qNPDz0uodTEA


############ Usage of S3url ################
To start AutoMQ, generate the start commandline using s3url.
bin/automq-kafka-admin.sh generate-start-command \
--s3-url="s3://s3.cn-northwest-1.amazonaws.com.cn?s3-access-key=XXX&s3-secret-key=YYY&s3-region=cn-northwest-1&s3-endpoint-protocol=https&s3-data-bucket=automq-data&s3-path-style=false&s3-ops-bucket=automq-ops&cluster-id=40ErA_nGQ_qNPDz0uodTEA" \
--controller-list="192.168.0.1:9093;192.168.0.2:9093;192.168.0.3:9093" \
--broker-list="192.168.0.4:9092;192.168.0.5:9092"

TIPS: Please replace the controller-list and broker-list with your actual IP addresses.

Step 2: Generate a List of Startup Commands

In the command generated in the previous step, replace --controller-list and --broker-list with your host information, specifically the IP addresses of the three CONTROLLER hosts and two BROKER hosts prepared earlier, using the default ports 9093 and 9092 respectively.


bin/automq-kafka-admin.sh generate-start-command \
--s3-url="s3://s3.cn-northwest-1.amazonaws.com.cn?s3-access-key=XXX&s3-secret-key=YYY&s3-region=cn-northwest-1&s3-endpoint-protocol=https&s3-data-bucket=automq-data&s3-path-style=false&s3-ops-bucket=automq-ops&cluster-id=40ErA_nGQ_qNPDz0uodTEA" \
--controller-list="192.168.0.1:9093;192.168.0.2:9093;192.168.0.3:9093" \
--broker-list="192.168.0.4:9092;192.168.0.5:9092"

Parameter Description

Parameter Name
Required
Description
--s3-url
Yes
Generated by the command line tool bin/automq-kafka-admin.sh generate-s3-url, includes authentication, cluster ID, and other information
--controller-list
Yes
Requires at least one address, serving as the IP and port list for the CONTROLLER host. Format: IP1:PORT1; IP2:PORT2; IP3:PORT3
--broker-list
Yes
Requires at least one address, serving as the IP and port list for the BROKER host. Format: IP1:PORT1; IP2:PORT2; IP3:PORT3
--controller-only-mode
No
Determines whether the CONTROLLER node only assumes the role of CONTROLLER. Defaults to false, meaning the deployed CONTROLLER node also acts as a BROKER.

Output Result

Running this command generates the commands needed to launch AutoMQ.


############ Start Commandline ##############
To start an AutoMQ Kafka server, please navigate to the directory where your AutoMQ tgz file is located and run the following command.

Before running the command, make sure that Java 17 is installed on your host. You can verify the Java version by executing 'java -version'.

bin/kafka-server-start.sh --s3-url="s3://s3.cn-northwest-1.amazonaws.com.cn?s3-access-key=XXX&s3-secret-key=YYY&s3-region=cn-northwest-1&s3-endpoint-protocol=https&s3-data-bucket=automq-data&s3-path-style=false&s3-ops-bucket=automq-ops&cluster-id=40ErA_nGQ_qNPDz0uodTEA" --override process.roles=broker,controller --override node.id=0 --override controller.quorum.voters=0@192.168.0.1:9093,1@192.168.0.2:9093,2@192.168.0.3:9093 --override listeners=PLAINTEXT://192.168.0.1:9092,CONTROLLER://192.168.0.1:9093 --override advertised.listeners=PLAINTEXT://192.168.0.1:9092

bin/kafka-server-start.sh --s3-url="s3://s3.cn-northwest-1.amazonaws.com.cn?s3-access-key=XXX&s3-secret-key=YYY&s3-region=cn-northwest-1&s3-endpoint-protocol=https&s3-data-bucket=automq-data&s3-path-style=false&s3-ops-bucket=automq-ops&cluster-id=40ErA_nGQ_qNPDz0uodTEA" --override process.roles=broker,controller --override node.id=1 --override controller.quorum.voters=0@192.168.0.1:9093,1@192.168.0.2:9093,2@192.168.0.3:9093 --override listeners=PLAINTEXT://192.168.0.2:9092,CONTROLLER://192.168.0.2:9093 --override advertised.listeners=PLAINTEXT://192.168.0.2:9092

bin/kafka-server-start.sh --s3-url="s3://s3.cn-northwest-1.amazonaws.com.cn?s3-access-key=XXX&s3-secret-key=YYY&s3-region=cn-northwest-1&s3-endpoint-protocol=https&s3-data-bucket=automq-data&s3-path-style=false&s3-ops-bucket=automq-ops&cluster-id=40ErA_nGQ_qNPDz0uodTEA" --override process.roles=broker,controller --override node.id=2 --override controller.quorum.voters=0@192.168.0.1:9093,1@192.168.0.2:9093,2@192.168.0.3:9093 --override listeners=PLAINTEXT://192.168.0.3:9092,CONTROLLER://192.168.0.3:9093 --override advertised.listeners=PLAINTEXT://192.168.0.3:9092

bin/kafka-server-start.sh --s3-url="s3://s3.cn-northwest-1.amazonaws.com.cn?s3-access-key=XXX&s3-secret-key=YYY&s3-region=cn-northwest-1&s3-endpoint-protocol=https&s3-data-bucket=automq-data&s3-path-style=false&s3-ops-bucket=automq-ops&cluster-id=40ErA_nGQ_qNPDz0uodTEA" --override process.roles=broker --override node.id=3 --override controller.quorum.voters=0@192.168.0.1:9093,1@192.168.0.2:9093,2@192.168.0.3:9093 --override listeners=PLAINTEXT://192.168.0.4:9092 --override advertised.listeners=PLAINTEXT://192.168.0.4:9092

bin/kafka-server-start.sh --s3-url="s3://s3.cn-northwest-1.amazonaws.com.cn?s3-access-key=XXX&s3-secret-key=YYY&s3-region=cn-northwest-1&s3-endpoint-protocol=https&s3-data-bucket=automq-data&s3-path-style=false&s3-ops-bucket=automq-ops&cluster-id=40ErA_nGQ_qNPDz0uodTEA" --override process.roles=broker --override node.id=4 --override controller.quorum.voters=0@192.168.0.1:9093,1@192.168.0.2:9093,2@192.168.0.3:9093 --override listeners=PLAINTEXT://192.168.0.5:9092 --override advertised.listeners=PLAINTEXT://192.168.0.5:9092


TIPS: Start controllers first and then the brokers.

The node.id is automatically generated starting from 0.

Step 3: Start AutoMQ

To start the cluster, execute the list of commands generated in the previous step on the corresponding CONTROLLER or BROKER hosts. For example, to start the first CONTROLLER process on 192.168.0.1, run the first command from the generated startup command list.


bin/kafka-server-start.sh --s3-url="s3://s3.cn-northwest-1.amazonaws.com.cn?s3-access-key=XXX&s3-secret-key=YYY&s3-region=cn-northwest-1&s3-endpoint-protocol=https&s3-data-bucket=automq-data&s3-path-style=false&s3-ops-bucket=automq-ops&cluster-id=40ErA_nGQ_qNPDz0uodTEA" --override process.roles=broker,controller --override node.id=0 --override controller.quorum.voters=0@192.168.0.1:9093,1@192.168.0.2:9093,2@192.168.0.3:9093 --override listeners=PLAINTEXT://192.168.0.1:9092,CONTROLLER://192.168.0.1:9093 --override advertised.listeners=PLAINTEXT://192.168.0.1:9092

Parameter Explanation

When using the startup command, any parameter not specified falls back to Apache Kafka's default configuration, while parameters newly introduced by AutoMQ use AutoMQ's defaults. To override a default, append additional --override key=value arguments to the end of the command.
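
For instance, extra overrides simply chain onto the end of a generated command. The last two overrides below are illustrative choices, not required settings, and the S3 URL is abbreviated; use the full one generated in Step 1:

bin/kafka-server-start.sh --s3-url="s3://...&cluster-id=40ErA_nGQ_qNPDz0uodTEA" \
  --override process.roles=broker,controller --override node.id=0 \
  --override controller.quorum.voters=0@192.168.0.1:9093,1@192.168.0.2:9093,2@192.168.0.3:9093 \
  --override listeners=PLAINTEXT://192.168.0.1:9092,CONTROLLER://192.168.0.1:9093 \
  --override advertised.listeners=PLAINTEXT://192.168.0.1:9092 \
  --override autobalancer.controller.enable=true \
  --override log.dirs=/data/automq/metadata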

| Parameter Name | Required | Description |
|----------------|----------|-------------|
| s3-url | Yes | Generated by the bin/automq-kafka-admin.sh generate-s3-url command-line tool; includes authentication, cluster ID, and other information |
| process.roles | Yes | Options are CONTROLLER or BROKER. If a host serves as both CONTROLLER and BROKER, the value is CONTROLLER,BROKER. |
| node.id | Yes | An integer that uniquely identifies a BROKER or CONTROLLER within the Kafka cluster; it must remain unique within the cluster. |
| controller.quorum.voters | Yes | The hosts participating in the KRaft election, given as node ID, IP, and port, for example: 0@192.168.0.1:9093,1@192.168.0.2:9093,2@192.168.0.3:9093 |
| listeners | Yes | The IP and port the server listens on |
| advertised.listeners | Yes | The access address the BROKER advertises to clients |
| log.dirs | No | Directory where KRaft and BROKER metadata are stored |
| s3.wal.path | No | In production, it is recommended to place the AutoMQ WAL on a dedicated, newly mounted raw device for better performance, as AutoMQ supports writing data directly to raw devices, reducing latency. Ensure this path is configured correctly. |
| autobalancer.controller.enable | No | Defaults to false (traffic self-balancing disabled). When enabled, AutoMQ's auto balancer component automatically reassigns partitions to keep overall traffic balanced. |

Tips:

  • For ongoing traffic self-balancing, or to adapt automatically when cluster nodes change, consider setting --override autobalancer.controller.enable=true when launching the CONTROLLERs.

  • When deploying AutoMQ in a private cloud for production, ensure the reliability of the local SSDs that hold the WAL. CubeFS does not provide a highly available block-storage protocol and cannot manage disk redundancy or backups itself; this can be addressed with a RAID [8] setup.

Run in Background

To run a server in the background, append the following to the end of its startup command:


command > /dev/null 2>&1 &
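
Applied to the first CONTROLLER, that looks like the following; here the output is kept in a log file rather than discarded, and the log path is an arbitrary choice. Again, substitute the full S3 URL from Step 1:

# Launch the first CONTROLLER in the background, keeping its output in a log file
nohup bin/kafka-server-start.sh --s3-url="s3://...&cluster-id=40ErA_nGQ_qNPDz0uodTEA" \
  --override process.roles=broker,controller --override node.id=0 \
  --override controller.quorum.voters=0@192.168.0.1:9093,1@192.168.0.2:9093,2@192.168.0.3:9093 \
  --override listeners=PLAINTEXT://192.168.0.1:9092,CONTROLLER://192.168.0.1:9093 \
  --override advertised.listeners=PLAINTEXT://192.168.0.1:9092 \
  > /tmp/automq-controller-0.log 2>&1 &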

With this, you have successfully deployed an AutoMQ cluster on CubeFS: a cost-effective, low-latency, elastic Kafka cluster with second-level scalability. To learn more about AutoMQ's rapid partition reassignment and continuous self-balancing, refer to the official examples.
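As a final check, the standard Kafka CLI bundled with AutoMQ can create a topic and round-trip a message through the new cluster (broker address taken from the host table above):

# Create a test topic, produce one record, then consume it back
bin/kafka-topics.sh --create --topic smoke-test --partitions 1 --replication-factor 1 \
  --bootstrap-server 192.168.0.4:9092
echo "hello cubefs" | bin/kafka-console-producer.sh --topic smoke-test \
  --bootstrap-server 192.168.0.4:9092
bin/kafka-console-consumer.sh --topic smoke-test --from-beginning --max-messages 1 \
  --bootstrap-server 192.168.0.4:9092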

References

[1] CubeFS: https://www.cubefs.io/

[2] CubeFS Multi-level Cache: https://www.cubefs.io/docs/master/overview/introduction.html

[3] CubeFS Dependency Configuration: https://www.cubefs.io/

[4] CubeFS Single Node Deployment: https://www.cubefs.io/

[5] Object Gateway: https://www.cubefs.io/docs/master/design/objectnode.html

[6] CubeFS User Management Documentation: https://www.cubefs.io/

[7] CubeFS AWS SDK: https://www.cubefs.io/docs/master/user-guide/objectnode.html#%E6%94%AF%E6%8C%81%E7%9A%84sdk

[8] RAID: https://www.cnblogs.com/chuncn/p/6008173.html