CubeFS
Preface
CubeFS [1] is a pioneering cloud-native storage solution, currently in its incubation phase under the auspices of the Cloud Native Computing Foundation (CNCF). It supports a variety of access protocols including S3, POSIX, and HDFS, and provides two storage engines: replication and erasure coding. CubeFS offers functionalities like multi-tenancy, multi-AZ deployment, and cross-region replication, making it ideal for use in diverse scenarios such as big data, AI, container platforms, databases, middleware storage-compute separation, data sharing, and data protection.
![](/assets/images/1-86e77d1089cd2b65608f56b43c0ff41f.png)
AutoMQ leverages a cutting-edge shared storage architecture that benefits from cost-effective object storage, and CubeFS's support for an S3-compatible interface is a perfect match. The ObjectNode feature in CubeFS offers an S3-compatible object storage interface to manage files, facilitating the use of tools like S3Browser, S3Cmd, or the native Amazon S3 SDK. Consequently, AutoMQ can be seamlessly integrated with CubeFS to create a streaming system that not only aligns with Kafka’s capabilities but also provides enhanced cost efficiency, ultimate scalability, and sub-millisecond latency.
This article explores the steps to deploy the AutoMQ cluster in a private data center using CubeFS.
Prerequisites
Prepare a CubeFS Cluster
- An existing CubeFS environment. If you have not set up CubeFS yet, please consult the official documentation for dependency configuration [3] and establishing a basic CubeFS cluster [4].
A standard CubeFS build produces a build/bin directory containing the command-line tools used for cluster management. This article uses these tools for the remaining configuration steps.
Verify the cluster status with the CubeFS command line tool to confirm a successful setup:
# Run the Command
./build/bin/cfs-cli cluster info
# Review the Output
[Cluster]
Cluster name : cfs_dev
Master leader : 172.16.1.101:17010
Master-1 : 172.16.1.101:17010
Master-2 : 172.16.1.102:17010
Master-3 : 172.16.1.103:17010
Auto allocate : Enabled
MetaNode count (active/total) : 4/4
MetaNode used : 0 GB
MetaNode available : 21 GB
MetaNode total : 21 GB
DataNode count (active/total) : 4/4
DataNode used : 44 GB
DataNode available : 191 GB
DataNode total : 235 GB
Volume count : 2
...
Note: The IP and port of the master node in the CubeFS cluster are needed for the upcoming object gateway configuration.
Activate the Object Gateway
To enable CubeFS's support for the object storage protocol, activate the object gateway [5]. The object gateway provides an S3-compatible interface, allowing CubeFS to support both the traditional POSIX file system interface and the S3-compatible object storage interface. This dual-interface capability offers users a versatile data storage and access solution. Specifically, once the object gateway is enabled, users can manage files in CubeFS using the native Amazon S3 SDK, leveraging the benefits of object storage.
Begin by creating an objectnode.json configuration file in the CubeFS root directory. Here's an example of what the objectnode.json configuration file might look like:
{
"role": "objectnode",
"listen": "17410",
"domains": [
"object.cfs.local"
],
"logDir": "/cfs/Logs/objectnode",
"logLevel": "info",
"masterAddr": [
"172.16.1.101:17010",
"172.16.1.102:17010",
"172.16.1.103:17010"
],
"exporterPort": 9503,
"prof": "7013"
}
Note: The masterAddr's IP and port details can be sourced from the CubeFS cluster information mentioned earlier.
Then, use the following command to initiate the object gateway:
nohup ./build/bin/cfs-server -c objectnode.json &
Create a CubeFS User
- Create a CubeFS user and retrieve its AccessKey and Secret Key.
For creating and querying user information, refer to the User Management Documentation [6].
CubeFS supports a variety of creation methods, including using the AWS SDK [7] or through an HTTP request. Here, we will demonstrate the process using an HTTP request:
- Specify the user ID, password, and type, and then access the creation interface:
curl -H "Content-Type:application/json" -X POST --data '{"id":"automq","pwd":"12345","type":3}' "http://172.16.1.101:17010/user/create"
- Query user information using the user ID:
curl -v "http://172.16.1.101:17010/user/info?user=automq" | python -m json.tool
- Example response:
{
"user_id": "automq",
"access_key": "UZONf5FF6WKwFCj4",
"secret_key": "TRZzfPitQkxOLXqPhKMBRrDYUyXXMpWG",
"policy": {
"own_vols": ["vol1"],
"authorized_vols": {
"ltptest": [
"perm:builtin:ReadOnly",
"perm:custom:PutObjectAction"
]
}
},
"user_type": 3,
"create_time": "2024-06-06 09:25:04"
}
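The access_key and secret_key fields in this response are the credentials AutoMQ will need later. As a convenience, they can be pulled out of the raw response programmatically; this sketch assumes the response has the shape shown above:

```python
import json

def extract_credentials(response_text: str) -> tuple[str, str]:
    """Parse a CubeFS /user/info response and return (access_key, secret_key)."""
    info = json.loads(response_text)
    return info["access_key"], info["secret_key"]

# Example using the sample response shown above.
sample = '''{
    "user_id": "automq",
    "access_key": "UZONf5FF6WKwFCj4",
    "secret_key": "TRZzfPitQkxOLXqPhKMBRrDYUyXXMpWG",
    "user_type": 3
}'''
access_key, secret_key = extract_credentials(sample)
print(access_key)  # -> UZONf5FF6WKwFCj4
```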
Create a Bucket Using the S3 Interface
Use the AWS CLI tool on CubeFS to create the necessary bucket for AutoMQ cluster deployment.
Obtain the user's AccessKey and Secret Key, configure them with aws configure, and then create the buckets using the AWS CLI tool:
aws s3api create-bucket --bucket automq-data --endpoint=http://172.16.1.101:17410
aws s3api create-bucket --bucket automq-ops --endpoint=http://172.16.1.101:17410
Use the following command to list the existing buckets:
aws s3 ls --endpoint=http://172.16.1.101:17410
Prepare the Machines Required for AutoMQ Deployment
Prepare 5 hosts for the AutoMQ cluster, ideally Linux amd64 machines with 2 cores and 16 GB of memory each, and attach two virtual storage volumes to each host. For example:
Role | IP | Node ID | System Volume | Data Volume |
---|---|---|---|---|
CONTROLLER | 192.168.0.1 | 0 | EBS 20GB | EBS 20GB |
CONTROLLER | 192.168.0.2 | 1 | EBS 20GB | EBS 20GB |
CONTROLLER | 192.168.0.3 | 2 | EBS 20GB | EBS 20GB |
BROKER | 192.168.0.4 | 3 | EBS 20GB | EBS 20GB |
BROKER | 192.168.0.5 | 4 | EBS 20GB | EBS 20GB |
Tips:
- Ensure these machines are in the same subnet and can communicate with each other.
- In non-production settings, you can deploy a single CONTROLLER, which by default also serves as a BROKER.
- Download the latest official binary package from AutoMQ Github Releases to install AutoMQ.
Install and Start the AutoMQ Cluster
Configure the S3 URL
Step 1: Generate an S3 URL
AutoMQ provides the automq-kafka-admin.sh tool, which enables the rapid deployment of AutoMQ. Just supply an S3 URL with the necessary access points and authentication details, and you can launch AutoMQ using a single command without manually generating a cluster ID or formatting storage.
Command-line Usage Example
bin/automq-kafka-admin.sh generate-s3-url \
--s3-access-key=xxx \
--s3-secret-key=yyy \
--s3-region=cn-northwest-1 \
--s3-endpoint=s3.cn-northwest-1.amazonaws.com.cn \
--s3-data-bucket=automq-data \
--s3-ops-bucket=automq-ops
If an error occurs, ensure the accuracy of the parameters and their format.
When using CubeFS, the following configuration can be used to generate the S3 URL.
Parameter Name | Default Value in This Example | Description |
---|---|---|
--s3-access-key | XXX | Replace with the AccessKey of the CubeFS user created earlier |
--s3-secret-key | YYY | Replace with the Secret Key of the CubeFS user created earlier |
--s3-region | auto | This can be set to the cluster name, or left as auto |
--s3-endpoint | http://<host ip>:17410 | The S3 access point of the CubeFS object gateway |
--s3-data-bucket | automq-data | CubeFS bucket name |
--s3-ops-bucket | automq-ops | CubeFS bucket name |
Output Result
After this command runs, the process automatically moves through the following stages:
- Detect the core S3 features of the endpoint using the provided accessKey and secretKey, verifying compatibility between AutoMQ and the S3 service.
- Create an s3url from the identity details and access point information.
- Generate an example AutoMQ startup command from the s3url. In this command, replace --controller-list and --broker-list with the addresses of the actual CONTROLLER and BROKER hosts to be deployed.
The example of the execution result is as follows:
############ Ping S3 ########################
[ OK ] Write s3 object
[ OK ] Read s3 object
[ OK ] Delete s3 object
[ OK ] Write s3 object
[ OK ] Upload s3 multipart object
[ OK ] Read s3 multipart object
[ OK ] Delete s3 object
############ String of S3url ################
Your s3url is:
s3://s3.cn-northwest-1.amazonaws.com.cn?s3-access-key=xxx&s3-secret-key=yyy&s3-region=cn-northwest-1&s3-endpoint-protocol=https&s3-data-bucket=automq-data&s3-path-style=false&s3-ops-bucket=automq-ops&cluster-id=40ErA_nGQ_qNPDz0uodTEA
############ Usage of S3url ################
To start AutoMQ, generate the start commandline using s3url.
bin/automq-kafka-admin.sh generate-start-command \
--s3-url="s3://s3.cn-northwest-1.amazonaws.com.cn?s3-access-key=XXX&s3-secret-key=YYY&s3-region=cn-northwest-1&s3-endpoint-protocol=https&s3-data-bucket=automq-data&s3-path-style=false&s3-ops-bucket=automq-ops&cluster-id=40ErA_nGQ_qNPDz0uodTEA" \
--controller-list="192.168.0.1:9093;192.168.0.2:9093;192.168.0.3:9093" \
--broker-list="192.168.0.4:9092;192.168.0.5:9092"
TIPS: Please replace the controller-list and broker-list with your actual IP addresses.
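As the output shows, the s3url is simply an s3:// address whose query string carries the credentials, endpoint protocol, bucket names, and cluster ID. The following sketch illustrates how such a string is assembled; the build_s3_url helper is illustrative and not part of the AutoMQ tooling, and the parameter names are taken from the example output above:

```python
from urllib.parse import urlencode

def build_s3_url(endpoint: str, access_key: str, secret_key: str,
                 region: str, data_bucket: str, ops_bucket: str,
                 cluster_id: str, protocol: str = "http") -> str:
    """Assemble an s3url string in the format produced by generate-s3-url."""
    params = {
        "s3-access-key": access_key,
        "s3-secret-key": secret_key,
        "s3-region": region,
        "s3-endpoint-protocol": protocol,
        "s3-data-bucket": data_bucket,
        "s3-path-style": "false",
        "s3-ops-bucket": ops_bucket,
        "cluster-id": cluster_id,
    }
    return f"s3://{endpoint}?{urlencode(params)}"

# Compose an s3url for the CubeFS object gateway used in this article.
url = build_s3_url("172.16.1.101:17410", "XXX", "YYY", "auto",
                   "automq-data", "automq-ops", "40ErA_nGQ_qNPDz0uodTEA")
print(url)
```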
Step 2: Generate a List of Startup Commands
In the command generated in the previous step, replace --controller-list and --broker-list with your host details, specifically with the IP addresses of the 3 CONTROLLERS and 2 BROKERS mentioned in the environment preparation, using the default ports 9092 and 9093.
bin/automq-kafka-admin.sh generate-start-command \
--s3-url="s3://s3.cn-northwest-1.amazonaws.com.cn?s3-access-key=XXX&s3-secret-key=YYY&s3-region=cn-northwest-1&s3-endpoint-protocol=https&s3-data-bucket=automq-data&s3-path-style=false&s3-ops-bucket=automq-ops&cluster-id=40ErA_nGQ_qNPDz0uodTEA" \
--controller-list="192.168.0.1:9093;192.168.0.2:9093;192.168.0.3:9093" \
--broker-list="192.168.0.4:9092;192.168.0.5:9092"
Parameter Description
Parameter Name | Required | Description |
---|---|---|
--s3-url | Yes | Generated by the command line tool bin/automq-kafka-admin.sh generate-s3-url , includes authentication, cluster ID, and other information |
--controller-list | Yes | Requires at least one address, serving as the IP and port list for the CONTROLLER host. Format: IP1:PORT1; IP2:PORT2; IP3:PORT3 |
--broker-list | Yes | Requires at least one address, serving as the IP and port list for the BROKER host. Format: IP1:PORT1; IP2:PORT2; IP3:PORT3 |
--controller-only-mode | No | Determines whether the CONTROLLER node only assumes the role of CONTROLLER. Defaults to false, meaning the deployed CONTROLLER node also acts as a BROKER. |
Output Result
After executing the command, it generates the command needed to launch AutoMQ.
############ Start Commandline ##############
To start an AutoMQ Kafka server, please navigate to the directory where your AutoMQ tgz file is located and run the following command.
Before running the command, make sure that Java 17 is installed on your host. You can verify the Java version by executing 'java -version'.
bin/kafka-server-start.sh --s3-url="s3://s3.cn-northwest-1.amazonaws.com.cn?s3-access-key=XXX&s3-secret-key=YYY&s3-region=cn-northwest-1&s3-endpoint-protocol=https&s3-data-bucket=automq-data&s3-path-style=false&s3-ops-bucket=automq-ops&cluster-id=40ErA_nGQ_qNPDz0uodTEA" --override process.roles=broker,controller --override node.id=0 --override controller.quorum.voters=0@192.168.0.1:9093,1@192.168.0.2:9093,2@192.168.0.3:9093 --override listeners=PLAINTEXT://192.168.0.1:9092,CONTROLLER://192.168.0.1:9093 --override advertised.listeners=PLAINTEXT://192.168.0.1:9092
bin/kafka-server-start.sh --s3-url="s3://s3.cn-northwest-1.amazonaws.com.cn?s3-access-key=XXX&s3-secret-key=YYY&s3-region=cn-northwest-1&s3-endpoint-protocol=https&s3-data-bucket=automq-data&s3-path-style=false&s3-ops-bucket=automq-ops&cluster-id=40ErA_nGQ_qNPDz0uodTEA" --override process.roles=broker,controller --override node.id=1 --override controller.quorum.voters=0@192.168.0.1:9093,1@192.168.0.2:9093,2@192.168.0.3:9093 --override listeners=PLAINTEXT://192.168.0.2:9092,CONTROLLER://192.168.0.2:9093 --override advertised.listeners=PLAINTEXT://192.168.0.2:9092
bin/kafka-server-start.sh --s3-url="s3://s3.cn-northwest-1.amazonaws.com.cn?s3-access-key=XXX&s3-secret-key=YYY&s3-region=cn-northwest-1&s3-endpoint-protocol=https&s3-data-bucket=automq-data&s3-path-style=false&s3-ops-bucket=automq-ops&cluster-id=40ErA_nGQ_qNPDz0uodTEA" --override process.roles=broker,controller --override node.id=2 --override controller.quorum.voters=0@192.168.0.1:9093,1@192.168.0.2:9093,2@192.168.0.3:9093 --override listeners=PLAINTEXT://192.168.0.3:9092,CONTROLLER://192.168.0.3:9093 --override advertised.listeners=PLAINTEXT://192.168.0.3:9092
bin/kafka-server-start.sh --s3-url="s3://s3.cn-northwest-1.amazonaws.com.cn?s3-access-key=XXX&s3-secret-key=YYY&s3-region=cn-northwest-1&s3-endpoint-protocol=https&s3-data-bucket=automq-data&s3-path-style=false&s3-ops-bucket=automq-ops&cluster-id=40ErA_nGQ_qNPDz0uodTEA" --override process.roles=broker --override node.id=3 --override controller.quorum.voters=0@192.168.0.1:9093,1@192.168.0.2:9093,2@192.168.0.3:9093 --override listeners=PLAINTEXT://192.168.0.4:9092 --override advertised.listeners=PLAINTEXT://192.168.0.4:9092
bin/kafka-server-start.sh --s3-url="s3://s3.cn-northwest-1.amazonaws.com.cn?s3-access-key=XXX&s3-secret-key=YYY&s3-region=cn-northwest-1&s3-endpoint-protocol=https&s3-data-bucket=automq-data&s3-path-style=false&s3-ops-bucket=automq-ops&cluster-id=40ErA_nGQ_qNPDz0uodTEA" --override process.roles=broker --override node.id=4 --override controller.quorum.voters=0@192.168.0.1:9093,1@192.168.0.2:9093,2@192.168.0.3:9093 --override listeners=PLAINTEXT://192.168.0.5:9092 --override advertised.listeners=PLAINTEXT://192.168.0.5:9092
TIPS: Start controllers first and then the brokers.
The node.id is automatically generated starting from 0.
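These five commands follow a fixed pattern: every node shares the same --s3-url and controller.quorum.voters value, while process.roles, node.id, and the listener addresses vary per host. As a rough illustration of the logic behind generate-start-command (the node_overrides helper below is hypothetical, not part of AutoMQ), the per-node overrides can be derived like this:

```python
def node_overrides(controllers: list[str], brokers: list[str]) -> list[str]:
    """Derive the per-node --override flags for each host.

    controllers/brokers are "ip:port" strings; node IDs are assigned
    sequentially starting from 0, controllers first.
    """
    voters = ",".join(f"{i}@{addr}" for i, addr in enumerate(controllers))
    commands = []
    for node_id, addr in enumerate(controllers + brokers):
        ip = addr.split(":")[0]
        is_controller = node_id < len(controllers)
        roles = "broker,controller" if is_controller else "broker"
        listeners = f"PLAINTEXT://{ip}:9092"
        if is_controller:
            listeners += f",CONTROLLER://{ip}:9093"
        commands.append(
            f"--override process.roles={roles} "
            f"--override node.id={node_id} "
            f"--override controller.quorum.voters={voters} "
            f"--override listeners={listeners} "
            f"--override advertised.listeners=PLAINTEXT://{ip}:9092"
        )
    return commands

cmds = node_overrides(
    ["192.168.0.1:9093", "192.168.0.2:9093", "192.168.0.3:9093"],
    ["192.168.0.4:9092", "192.168.0.5:9092"],
)
for c in cmds:
    print(c)
```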
Step 3: Start AutoMQ
To start the cluster, execute the command list from the previous step on the designated CONTROLLER or BROKER host. For example, to launch the first CONTROLLER process on 192.168.0.1, use the first command template from the generated startup command list.
bin/kafka-server-start.sh --s3-url="s3://s3.cn-northwest-1.amazonaws.com.cn?s3-access-key=XXX&s3-secret-key=YYY&s3-region=cn-northwest-1&s3-endpoint-protocol=https&s3-data-bucket=automq-data&s3-path-style=false&s3-ops-bucket=automq-ops&cluster-id=40ErA_nGQ_qNPDz0uodTEA" --override process.roles=broker,controller --override node.id=0 --override controller.quorum.voters=0@192.168.0.1:9093,1@192.168.0.2:9093,2@192.168.0.3:9093 --override listeners=PLAINTEXT://192.168.0.1:9092,CONTROLLER://192.168.0.1:9093 --override advertised.listeners=PLAINTEXT://192.168.0.1:9092
Parameter Explanation
When using the startup command, parameters not specified will adopt Apache Kafka's default configuration. For parameters newly added by AutoMQ, AutoMQ's default values will be applied. To override the default configurations, you can append additional --override key=value parameters at the end of the command.
Parameter Name | Required | Description |
---|---|---|
s3-url | Yes | Generated by the bin/automq-kafka-admin.sh generate-s3-url command line tool, includes authentication, cluster ID, etc. |
process.roles | Yes | Options are broker or controller. If a host serves as both CONTROLLER and BROKER, set the value to broker,controller. |
node.id | Yes | An integer used to uniquely identify a BROKER or CONTROLLER within a Kafka cluster; it must remain unique within the cluster. |
controller.quorum.voters | Yes | Information of hosts participating in the KRAFT election, including nodeid, ip, and port, for example: 0@192.168.0.1:9093, 1@192.168.0.2:9093, 2@192.168.0.3:9093 |
listeners | Yes | The IP address and port on which the node listens. |
advertised.listeners | Yes | The access address provided by the BROKER for Clients. |
log.dirs | No | Directory where KRAFT and BROKER metadata are stored. |
s3.wal.path | No | In production environments, it is recommended to store AutoMQ WAL data on a dedicated, newly mounted raw device for better performance, as AutoMQ supports writing directly to raw devices, reducing latency. Ensure the path is correctly configured to store WAL data. |
autobalancer.controller.enable | No | The default value is false, not enabling traffic self-balancing. Once traffic self-balancing is enabled, the auto balancer component of AutoMQ will automatically reassign partitions to ensure that overall traffic is balanced. |
Tips:
For ongoing traffic self-balancing or to handle changes in cluster nodes, consider setting the parameter --override autobalancer.controller.enable=true when launching the Controller.
When deploying AutoMQ in a private cloud for production, it is crucial to ensure the reliability of the local SSDs. CubeFS does not provide a high-availability block storage protocol and cannot manage disk redundancy or backup directly; this can be addressed by implementing a RAID [8] solution.
Run in Background
To run AutoMQ in the background, append the following to the end of the startup command:
command > /dev/null 2>&1 &
With this, you have successfully deployed your AutoMQ cluster using CubeFS, which delivers a cost-effective, low-latency, and elastic Kafka cluster with sub-second scalability. For more insights into AutoMQ’s sub-second reassignment and self-balancing capabilities, refer to the official example.
References
[1] CubeFS: https://www.cubefs.io/
[2] CubeFS Multi-level Cache: https://www.cubefs.io/docs/master/overview/introduction.html
[3] Dependency Configuration: CubeFS | A Cloud Native Distributed Storage System
[4] CubeFS Single Node Deployment: www.cubefs.io
[5] Object Gateway: https://www.cubefs.io/docs/master/design/objectnode.html
[6] CubeFS User Management Documentation: CubeFS | A Cloud Native Distributed Storage System
[7] CubeFS AWS SDK: https://www.cubefs.io/docs/master/user-guide/objectnode.html#%E6%94%AF%E6%8C%81%E7%9A%84sdk