Flashcat
Introduction
In modern enterprises, with the growing demand for data processing, AutoMQ as gradually become a key component for real-time data processing due to its efficiency and cost-effectiveness as a stream processing system. However, as the cluster size expands and business complexity increases, ensuring the stability, high availability, and performance optimization of the AutoMQ cluster becomes crucial. Therefore, integrating a robust and comprehensive monitoring system is essential for maintaining the healthy operation of the AutoMQ cluster. The Nightingale Monitoring System with its efficient data collection, flexible alert management, and rich visualization capabilities, is an ideal choice for enterprises to monitor AutoMQ clusters. By using the Nightingale Monitoring System, enterprises can grasp the operational status of the AutoMQ cluster in real-time, promptly identify and resolve potential issues, optimize system performance, and ensure business continuity and stability.
Overview of AutoMQ
AutoMQ is a cloud-based redesigned stream processing system that maintains 100% compatibility with Apache Kafka while significantly enhancing cost efficiency and elasticity by decoupling storage to object storage. Specifically, AutoMQ offloads storage to shared cloud storage such as EBS and S3 provided by cloud vendors by building the S3Stream stream storage repository on S3, offering low-cost, low-latency, high-availability, high-reliability, and virtually unlimited capacity stream storage capabilities. Compared with the traditional Shared Nothing architecture, AutoMQ adopts a Shared Storage architecture, significantly reducing storage and operational complexity while enhancing system elasticity and reliability.
The design philosophy and technical advantages of AutoMQ make it an ideal choice for replacing existing Kafka clusters in enterprises. By adopting AutoMQ, enterprises can significantly reduce storage costs, simplify operations, and achieve automatic cluster scaling and traffic self-balancing, thereby more efficiently responding to changes in business demands. Additionally, AutoMQ’s architecture supports efficient cold read operations and zero service interruption, ensuring stable system operation under high load and sudden traffic conditions. Its storage structure is as follows:
Overview of Nightingale
The Nightingale Monitoring System (Nightingale) is an open-source, cloud-native observability and analysis tool that adopts an All-in-One design philosophy, integrating data collection, visualization, monitoring alerts, and data analysis into one platform. Its main advantages include efficient data collection capabilities, flexible alert strategies, and rich visualization features. Nightingale is closely integrated with various cloud-native ecosystems, supporting multiple data sources and storage backends, and providing low-latency, high-reliability monitoring services. By using Nightingale, enterprises can achieve comprehensive monitoring and management of complex distributed systems, quickly identify and resolve issues, thereby optimizing system performance and enhancing business continuity.
Prerequisites
To monitor the status of the cluster, you will need the following setup:
Deploy an available AutoMQ node/cluster and open the Metrics collection port
Deploy Nightingale monitoring and its dependencies
Deploy Prometheus to collect Metrics data
Deploy AutoMQ, Prometheus, and Nightingale Monitoring
Deploy AutoMQ
Refer to the AutoMQ documentation: Cluster Deployment | AutoMQ. Before starting the deployment, add the following configuration parameters to enable the Prometheus pull interface. After starting the AutoMQ cluster with the following parameters, each node will additionally open an HTTP interface for us to pull AutoMQ monitoring metrics. These metrics follow the Prometheus Metrics format.
bin/kafka-server-start.sh ...\
--override s3.telemetry.metrics.exporter.type=prometheus \
--override s3.metrics.exporter.prom.host=0.0.0.0 \
--override s3.metrics.exporter.prom.port=8890 \
....
Once AutoMQ monitoring metrics are enabled, you can pull Prometheus-format monitoring metrics via HTTP protocol from any node. The address is: http://{node_ip}:8890. An example response is as follows:
....
kafka_request_time_mean_milliseconds{otel_scope_name="io.opentelemetry.jmx",type="DescribeDelegationToken"} 0.0 1720520709290
kafka_request_time_mean_milliseconds{otel_scope_name="io.opentelemetry.jmx",type="CreatePartitions"} 0.0 1720520709290
...
Regarding the introduction to metrics, you can refer to the AutoMQ official documentation: Metrics | AutoMQ [6].
Deploying Prometheus
Prometheus can be deployed by downloading the binary package or using Docker. Below is an introduction to these two deployment methods.
Binary Deployment
For ease of use, you can create a new script and modify the Prometheus download version as needed. Then, execute the script to complete the deployment. First, create a new script:
cd /home
vim install_prometheus.sh
# !!! Paste the Following Script Content and Save and Exit
# Grant Permissions
chmod +x install_prometheus.sh
# Execute the Script
./install_prometheus.sh
version=2.45.3
filename=prometheus-${version}.linux-amd64
mkdir -p /opt/prometheus
wget https://github.com/prometheus/prometheus/releases/download/v${version}/${filename}.tar.gz
tar xf ${filename}.tar.gz
cp -far ${filename}/* /opt/prometheus/
# Config as a Service
cat <<EOF >/etc/systemd/system/prometheus.service
[Unit]
Description="prometheus"
Documentation=https://prometheus.io/
After=network.target
[Service]
Type=simple
ExecStart=/opt/prometheus/prometheus --config.file=/opt/prometheus/prometheus.yml --storage.tsdb.path=/opt/prometheus/data --web.enable-lifecycle --web.enable-remote-write-receiver
Restart=on-failure
SuccessExitStatus=0
LimitNOFILE=65536
StandardOutput=syslog
StandardError=syslog
SyslogIdentifier=prometheus
[Install]
WantedBy=multi-user.target
EOF
systemctl enable prometheus
systemctl restart prometheus
systemctl status prometheus
Subsequently, modify the Prometheus configuration file to add a task to collect observability data from AutoMQ and restart Prometheus by executing the following command:
# Add the Following Content to the Configuration File:
vim /opt/prometheus/prometheus.yml
# Restart Prometheus
systemctl restart prometheus
Refer to the following configuration file content. Please change client_ip
to the address exposing observability data from AutoMQ:
# My Global Config
global:
scrape_interval: 15s # Set the Scrape Interval to Every 15 Seconds. Default Is Every 1 Minute.
evaluation_interval: 15s # Evaluate Rules Every 15 Seconds. the Default Is Every 1 Minute.
scrape_configs:
# The Job Name Is Added as a Label `job=<job_name>` to Any Timeseries Scraped from This Config.
- job_name: "prometheus"
static_configs:
- targets: ["localhost:9090"]
- job_name: "automq"
static_configs:
- targets: ["{client_ip}:8890"]
After deployment, you can access Prometheus through a browser to verify if it has indeed collected the metrics data from AutoMQ by visiting http://{client_ip}:9090/targets:
Docker Deployment
If you already have a running Prometheus Docker container, please execute the following command to remove the container:
docker stop prometheus
docker rm prometheus
Create a new configuration file and mount it during Docker startup:
mkdir -p /opt/prometheus
vim /opt/prometheus/prometheus.yml
# Refer to the Configuration Mentioned in the "Binary Deployment" Section Above for the Content
Start the Docker container:
docker run -d \
--name=prometheus \
-p 9090:9090 \
-v /opt/prometheus/prometheus.yml:/etc/prometheus/prometheus.yml \
-m 500m \
prom/prometheus \
--config.file=/etc/prometheus/prometheus.yml \
--enable-feature=otlp-write-receiver \
--web.enable-remote-write-receiver
This gives you a Prometheus service that collects AutoMQ Metrics. For more information on integrating AutoMQ Metrics with Prometheus, refer to: Integrating Metrics into Prometheus | AutoMQ [7].
Deploy Nightingale Monitoring
Nightingale Monitoring can be deployed in the following three ways. For more detailed deployment instructions, refer to the official documentation [8]:
Deploy using Docker compose
Deploy using binary
Helm Deployment
Next, I will proceed with the deployment using a binary method.
Download Nightingale
Please select the appropriate version for download from the Nightingale Github releases page. Here, we will use version v7.0.0-beta.14. If you are using an AMD architecture machine, you can execute the following command directly:
cd /home
# Download
wget https://github.com/ccfos/nightingale/releases/download/v7.0.0-beta.14/n9e-v7.0.0-beta.14-linux-amd64.tar.gz
mkdir -p /home/flashcat
# Extract the Files to the /home/flashcat Folder
tar -xzf /home/n9e-v7.0.0-beta.14-linux-amd64.tar.gz -C /home/flashcat
# Navigate to the Home Directory
cd /home/flashcat
Configure the Dependency Environment
Nightingale depends on MySQL and Redis, so you need to install these environments beforehand. You can deploy them via Docker or by executing the commands as follows:
# Install Mysql
yum -y install mariadb*
systemctl enable mariadb
systemctl restart mariadb
mysql -e "SET PASSWORD FOR 'root'@'localhost' = PASSWORD('1234');"
# Install Redis
yum install -y redis
systemctl enable redis
systemctl restart redis
Here, Redis is set with no password. Additionally, the MySQL database password is set to 1234. If you need to change this to a different password, you must configure it in the Nightingale configuration file to ensure Nightingale can connect to your database. Modify the Nightingale configuration file:
vim /home/flashcat/etc/config.toml
Modify the username and password under [DB]:
[DB]
# Postgres: Host=%s Port=%s User=%s Dbname=%s Password=%s Sslmode=%s
# Postgres: DSN="host=127.0.0.1 Port=5432 User=root Dbname=n9e_v6 Password=1234 Sslmode=disable"
# Sqlite: DSN="/path/to/filename.db"
DSN = "{username}:{password}@tcp(127.0.0.1:3306)/n9e_v6?charset=utf8mb4&parseTime=True&loc=Local&allowNativePasswords=true"
# Enable Debug Mode or Not
Import Database Tables
Execute the following command:
mysql -uroot -p1234 < n9e.sql
Use a database tool to check if the database tables were successfully imported:
> show databases;
+--------------------+
| Database |
+--------------------+
| n9e_v6 |
+--------------------+
> show tables;
+-----------------------+
| Tables_in_n9e_v6 |
+-----------------------+
| alert_aggr_view |
| alert_cur_event |
| alert_his_event |
| alert_mute |
| alert_rule |
| alert_subscribe |
| alerting_engines |
| board |
| board_busigroup |
| board_payload |
| builtin_cate |
| builtin_components |
| builtin_metrics |
······
Modify the Nightingale Configuration File
You need to modify the Nightingale configuration file to set up the Prometheus data source:
vim /home/flashcat/etc/config.toml
# Modify the [[Pushgw.Writers]] Section to
[[Pushgw.Writers]]
# Url = "http://127.0.0.1:8480/insert/0/prometheus/api/v1/write"
Url = "http://{client_ip}:9090/api/v1/write"
Start Nightingale
In the root directory of Nightingale, /home/flashcat, execute: ./n9e. Once successfully started, you can access it in your browser at http://{client_ip}:17000. The default login credentials are:
Username: root
Password: root.2020
Integrate Prometheus Data Source
Left sidebar Integration -> Data Source -> Prometheus.
At this point, our Nightingale monitoring setup is complete.
Nightingale Monitoring AutoMQ Cluster Status
Next, I will introduce some of the features provided by Nightingale Monitoring to help you better understand the available functionalities when integrating Nightingale with AutoMQ.
Instant Query
Select built-in AutoMQ metrics:
You can try querying some data, such as the average Fetch request processing time kafka_request_time_50p_milliseconds
:
Additionally, you can customize some metrics and use expressions to aggregate these metrics:
Alerting Feature
Select the left sidebar Alert -> Alert Rules -> New Rule. For example, we can set an alert for kafka_network_io_bytes_total
. This metric indicates the total number of bytes sent or received through the network by a Kafka Broker node. By setting an expression for this metric, you can calculate the inbound network I/O rate for Kafka Broker nodes. The expression is:
sum by(job, instance) (rate(kafka_network_io_bytes_total{direction="in"}[1m]))
Setting Alert Rules:
Data preview:
You can also configure groups to be notified when an alert occurs:
After creating the alert, let's simulate a high-concurrency message processing scenario: a total of 2,500,000 messages will be sent to the AutoMQ nodes in a short period. I use the Kafka SDK to send messages, with 50 topics, each receiving 500 messages, repeated 100 times. Here is an example:
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.NewTopic;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.clients.producer.RecordMetadata;
import org.apache.kafka.common.serialization.StringSerializer;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;
import java.util.concurrent.ExecutionException;
public class KafkaTest {
private static final String BOOTSTRAP_SERVERS = "http://{}:9092"; // your automq broker ip
private static final int NUM_TOPICS = 50;
private static final int NUM_MESSAGES = 500;
public static void main(String[] args) throws Exception {
KafkaTest test = new KafkaTest();
// test.createTopics(); // create 50 topics
for(int i = 0; i < 100; i++){
test.sendMessages(); // 25,000 messages will be sent each time, and 500 messages will be sent to each of 50 topics.
}
}
public void createTopics() {
Properties props = new Properties();
props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, BOOTSTRAP_SERVERS);
try (AdminClient adminClient = AdminClient.create(props)) {
List<NewTopic> topics = new ArrayList<>();
for (int i = 1; i <= NUM_TOPICS; i++) {
topics.add(new NewTopic("Topic-" + i, 1, (short) 1));
}
adminClient.createTopics(topics).all().get();
System.out.println("Topics created successfully");
} catch (InterruptedException | ExecutionException e) {
e.printStackTrace();
}
}
public void sendMessages() {
Properties props = new Properties();
props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, BOOTSTRAP_SERVERS);
props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
for (int i = 1; i <= NUM_TOPICS; i++) {
String topic = "Topic-" + i;
for (int j = 1; j <= NUM_MESSAGES; j++) {
String key = "key-" + j;
String value = "{\"userId\": " + j + ", \"action\": \"visit\", \"timestamp\": " + System.currentTimeMillis() + "}";
ProducerRecord<String, String> record = new ProducerRecord<>(topic, key, value);
producer.send(record, (RecordMetadata metadata, Exception exception) -> {
if (exception == null) {
System.out.printf("Sent message to topic %s partition %d with offset %d%n", metadata.topic(), metadata.partition(), metadata.offset());
} else {
exception.printStackTrace();
}
});
}
}
System.out.println("Messages sent successfully");
}
}
}
We can then see the alert information in the Nightingale console:
Alert Details:
Dashboard
First, we can use known metrics to build our own dashboard. Below is a dashboard showing statistics on AutoMQ message processing time, total messages, and network IO bits:
Additionally, we can utilize the official built-in dashboards for monitoring. Left sidebar -> Aggregation -> Template Center:
Choosing AutoMQ, you will see several DashBoards available:
We select the Topic Metrics dashboard, displaying the following content:
This dashboard shows the message input and output utilization, message input and request rates, and message sizes of the AutoMQ cluster over a recent period. These metrics are used to monitor and optimize the performance and stability of the AutoMQ cluster: By assessing the message input and output utilization, you can evaluate the load on producers and consumers, ensuring the cluster can handle the message flow properly. The message input rate is used for real-time monitoring of the rate at which producers send messages, identifying potential bottlenecks or sudden spikes in traffic. The request rate helps understand the frequency of client requests, optimizing resource allocation and processing capabilities. The message size metric analyzes the average size of messages to adjust configurations for optimizing storage and network transmission efficiency. Monitoring these metrics allows you to detect and resolve performance issues promptly, ensuring the efficient and stable operation of the AutoMQ cluster.
This concludes our integration process. For more usage methods, you can refer to Nightingale's official documentation [10] for further experience.
Summary
Through this article, we have detailed how to comprehensively monitor an AutoMQ cluster using the Nightingale monitoring system. We started with the basic concepts of AutoMQ and Nightingale and gradually explained how to deploy AutoMQ, Prometheus, and Nightingale, and configure monitoring and alerting rules. With this integration, enterprises can monitor the running status of the AutoMQ cluster in real-time, promptly discover and resolve potential issues, optimize system performance, and ensure business continuity and stability. Nightingale's powerful data collection capabilities, flexible alerting mechanisms, and rich visualization features make it an ideal choice for monitoring complex distributed systems. We hope this article provides valuable references for your practical applications, helping your system operations become more efficient and stable.
References
[1] AutoMQ:https://www.automq.com/zh
[2] Nightingale Monitoring: https://flashcat.cloud/docs/content/flashcat-monitor/nightingale-v7/introduction/
[3] Nightingale Architecture: https://flashcat.cloud/docs/content/flashcat-monitor/nightingale-v7/introduction/
[4] Prometheus:https://prometheus.io/docs/prometheus/latest/getting_started/
[5] Cluster Deployment | AutoMQ: https://docs.automq.com/automq/getting-started/cluster-deployment-on-linux
[6] Metrics | AutoMQ:https://docs.automq.com/automq/observability/metrics
[7] Integrating Metrics into Prometheus: https://docs.automq.com/automq/observability/integrating-metrics-with-prometheus
[8] Deployment Instructions: https://flashcat.cloud/docs/content/flashcat-monitor/nightingale-v7/install/intro/
[9] Nightingale GitHub Releases: https://github.com/ccfos/nightingale
[10] Nightingale Official Documentation: https://flashcat.cloud/docs/content/flashcat-monitor/nightingale-v7/overview/