
Guance Cloud

Preface

Guance Cloud

Guance Cloud [1] is a unified real-time monitoring application designed for cloud platforms, cloud-native environments, applications, and business-related needs. It integrates the three major observability signals (metrics, logs, and traces) and covers testing, prerelease, and production environments, providing observability throughout the software development lifecycle. With Guance Cloud, enterprises can build end-to-end application observability and improve the overall transparency and controllability of their IT architecture.

As a powerful data analytics platform, Guance Cloud includes several core modules, such as DataKit [2], a unified data collector, and the DataFlux Func data processing development platform.

AutoMQ

AutoMQ [3] is a next-generation Apache Kafka® distribution redesigned around cloud-native principles. It offers up to 10 times better cost efficiency and elasticity while remaining 100% compatible with the Apache Kafka protocol. AutoMQ stores data entirely on S3, so it can respond quickly to sudden traffic spikes: expanding the cluster requires no data replication. Apache Kafka, by contrast, needs significant bandwidth to replicate partition data after scaling, which makes sudden traffic difficult to handle. Automatic scaling, self-balancing, and automatic fault recovery give AutoMQ a high degree of autonomy and higher availability without human intervention.

[Figure: AutoMQ's shared storage architecture]

Observability Interfaces of AutoMQ

Because AutoMQ is fully compatible with Kafka and exposes open, Prometheus-based metrics collection endpoints, it can be integrated with the Guance Cloud platform through DataKit, Guance Cloud's data collection tool. This integration lets users monitor and manage the status of AutoMQ clusters conveniently. The Guance Cloud platform supports user-defined aggregation and querying of metrics data. Using the provided dashboard templates or custom dashboards, we can compile various statistics for AutoMQ clusters, such as statistics on Topics, Brokers, Partitions, and Groups.

Based on the same metrics data, we can also query errors encountered while AutoMQ clusters are running, along with current system usage metrics such as JVM CPU usage, JVM heap usage, and cache size. These metrics help us quickly identify and resolve issues when a cluster behaves abnormally, which benefits both high availability and rapid recovery. Next, I will introduce how to use the Guance Cloud platform to monitor the status of AutoMQ clusters.

Steps to Integrate with Guance Cloud

Enable Metric Fetching Interface in AutoMQ

Refer to the AutoMQ documentation: Cluster Deployment | AutoMQ [4]. Before starting the deployment, add the following configuration parameters to enable the Prometheus fetching interface. When the AutoMQ cluster is started with these parameters, each node exposes an additional HTTP interface for fetching AutoMQ monitoring metrics. The metrics are served in the Prometheus metrics format.


bin/kafka-server-start.sh ...\
--override s3.telemetry.metrics.exporter.type=prometheus \
--override s3.metrics.exporter.prom.host=0.0.0.0 \
--override s3.metrics.exporter.prom.port=8890 \
....

Once AutoMQ monitoring metrics are enabled, you can fetch Prometheus-formatted monitoring metrics from any node via HTTP at the address: http://{node_ip}:8890. A sample response is as follows:


....
kafka_request_time_mean_milliseconds{otel_scope_name="io.opentelemetry.jmx",type="DescribeDelegationToken"} 0.0 1720520709290
kafka_request_time_mean_milliseconds{otel_scope_name="io.opentelemetry.jmx",type="CreatePartitions"} 0.0 1720520709290
...

For an introduction to the metrics, refer to the AutoMQ official documentation: Metrics | AutoMQ [5].
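To inspect what a node exposes before wiring up a collector, the response can be parsed with a few lines of Python. The following is a minimal sketch, not an official client: the function names parse_metrics and fetch_metrics are hypothetical, the parser only handles simple sample lines like the ones shown above, and the exact URL path to pass to fetch_metrics depends on your deployment.

```python
import re
import urllib.request

# Matches sample lines such as: name{label="value",...} 0.0 1720520709290
SAMPLE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)'
    r'(?:\{(?P<labels>.*)\})?'
    r'\s+(?P<value>\S+)'
    r'(?:\s+(?P<ts>-?\d+))?\s*$'
)
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_metrics(text):
    """Parse Prometheus text-format sample lines into a list of dicts."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = SAMPLE_RE.match(line)
        if match is None:
            continue  # skip anything this simple parser does not understand
        ts = match.group("ts")
        samples.append({
            "name": match.group("name"),
            "labels": dict(LABEL_RE.findall(match.group("labels") or "")),
            "value": float(match.group("value")),
            "timestamp_ms": int(ts) if ts else None,
        })
    return samples


def fetch_metrics(url, timeout=5):
    """Download the raw metrics text from one AutoMQ node (hypothetical helper)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


# The two sample lines from the response shown above.
SAMPLE = '''kafka_request_time_mean_milliseconds{otel_scope_name="io.opentelemetry.jmx",type="DescribeDelegationToken"} 0.0 1720520709290
kafka_request_time_mean_milliseconds{otel_scope_name="io.opentelemetry.jmx",type="CreatePartitions"} 0.0 1720520709290
'''

if __name__ == "__main__":
    for sample in parse_metrics(SAMPLE):
        print(sample["name"], sample["labels"]["type"], sample["value"])
```

Against a live node, the text returned by fetch_metrics can be passed straight to parse_metrics, for example to confirm that the endpoint is reachable and returns Kafka metrics before DataKit is configured.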

Install and Configure DataKit Collection Tool

DataKit is an open-source monitoring collection tool provided by Guance Cloud that supports fetching Prometheus metrics. You can use DataKit to fetch monitoring data from AutoMQ and aggregate it into the Guance Cloud platform.

Install the DataKit Tool

Tips: For more details on DataKit installation, refer to the documentation: Host Installation - Guance Documentation [6].

First, register for a Guance account and log in. Then, in the main interface, click "Integration" on the left side and select "DataKit" at the top to see the DataKit installation command.


DK_DATAWAY="https://openway.guance.com?token=<TOKEN>" bash -c "$(curl -L https://static.guance.com/datakit/install.sh)"

Copy the above command and run it on all nodes in the cluster to complete the DataKit installation.

Tips: DataKit needs to be installed on all Brokers in the cluster that need to be monitored.

After the installation command completes, run the command datakit monitor to verify that DataKit has been installed successfully.

AutoMQ Collector Configuration and Activation

Next, configure the DataKit AutoMQ collector on every node from which data should be collected. Navigate to the directory /usr/local/datakit/conf.d/prom and create a collector configuration file named prom.conf. The configuration specifies the metrics endpoint to fetch, the collector name, the tags to attach or rename, and the collection interval. You can adjust the configuration on each server as needed:


[[inputs.prom]]

urls = ["http://clientIP:8890/metrics"] # Replace clientIP with your own server address.
source = "AutoMQ"

## Keep existing metric names.
## If keep_exist_metric_name is true, keep the raw value for field names.
keep_exist_metric_name = true

## Collection interval. It is set here, before the sub-tables, because TOML
## would otherwise read it as a key of [inputs.prom.tags].
interval = "10s"

[inputs.prom.tags_rename]
overwrite_exist_tags = true

[inputs.prom.tags_rename.mapping]
service_name = "job"
service_instance_id = "instance"

[inputs.prom.tags]
component = "AutoMQ"


Parameter Adjustment Instructions:

urls: The AutoMQ metrics address. Provide the metrics URL exposed by the component.
source: The collector alias. Use a distinct name so that this data can be told apart from other sources.
interval: The collection interval, that is, how often the collector fetches data.
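Since the same file has to be created on every monitored node with only the address changing, it can be convenient to generate it. The sketch below is one possible approach under stated assumptions: render_prom_conf is a hypothetical helper, the IP addresses in the usage loop are placeholders, and interval is written at the top level of the collector table on the assumption that this is where DataKit reads it.

```python
# Template for the collector configuration described above.
# interval sits before the sub-tables so TOML assigns it to [[inputs.prom]].
CONF_TEMPLATE = """[[inputs.prom]]
  urls = ["http://{node_ip}:{port}/metrics"]
  source = "{source}"
  keep_exist_metric_name = true
  interval = "{interval}"

  [inputs.prom.tags_rename]
    overwrite_exist_tags = true

    [inputs.prom.tags_rename.mapping]
      service_name = "job"
      service_instance_id = "instance"

  [inputs.prom.tags]
    component = "AutoMQ"
"""


def render_prom_conf(node_ip, port=8890, source="AutoMQ", interval="10s"):
    """Return the prom.conf text for one AutoMQ node (hypothetical helper)."""
    return CONF_TEMPLATE.format(
        node_ip=node_ip, port=port, source=source, interval=interval
    )


if __name__ == "__main__":
    # Placeholder addresses; replace them with your own broker IPs.
    for ip in ["10.0.0.1", "10.0.0.2"]:
        print("# ---- prom.conf for " + ip + " ----")
        print(render_prom_conf(ip))
```

The rendered text still has to be saved as prom.conf on the matching node, and DataKit restarted there, before it takes effect.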

Execute datakit service -R to restart DataKit so that the collector configuration takes effect.

Monitor AutoMQ Clusters Via GUI-Based Management

Guance Cloud has integrated AutoMQ and provides multiple default dashboards. You can view the Dashboard Examples [7]. Several commonly used templates are described below, along with what each one shows:

Cluster Monitoring

The main view shows the number of active Brokers, the total number of Topics and Partitions, and more. You can also narrow the query by selecting a value in Cluster_id.

By monitoring the status of the Kafka cluster, we can promptly identify and resolve potential issues such as node failures, insufficient disk space, and network latency, keeping the system controllable and stable.

Broker Monitoring

The AutoMQ Broker dashboard on Guance Cloud shows various metrics for all Brokers, such as the number of connections, the number of partitions, the number of messages received per second (ops), and the amount of input and output data per second, measured in bytes.
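Per-second figures like these are typically derived from two readings of a cumulative counter. The sketch below illustrates that arithmetic only; it assumes the underlying metric is a monotonically increasing counter with millisecond timestamps, as in the sample response earlier, and it is not a description of how the dashboards compute their values. The function name per_second_rate and the sample numbers are hypothetical.

```python
def per_second_rate(earlier, later):
    """Average per-second increase between two (timestamp_ms, value) samples."""
    t0, v0 = earlier
    t1, v1 = later
    elapsed_s = (t1 - t0) / 1000.0
    if elapsed_s <= 0:
        raise ValueError("samples must be in increasing time order")
    # A lower value than before suggests the counter was reset (for example,
    # after a broker restart); in that case count only the new value.
    delta = v1 if v1 < v0 else v1 - v0
    return delta / elapsed_s


if __name__ == "__main__":
    # 500 more messages over 10 seconds.
    print(per_second_rate((1720520709290, 1000), (1720520719290, 1500)))  # 50.0
```

With a 10-second collection interval, two consecutive samples are about 10,000 ms apart, so an increase of 500 messages corresponds to 50 messages per second.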

Topic Monitoring

This dashboard provides an overview of all the Topics across all nodes. As mentioned above, you can also select a specific node to query its Topic information. The metrics mainly include the space occupied by each Topic, the number of messages received, and Request Throughput, which indicates how many requests are processed per unit of time.

Through Guance Cloud, we successfully monitored the status of the AutoMQ cluster, with dashboard data aggregated or queried based on metrics.

Summary

This article introduced how to integrate Guance Cloud with AutoMQ to monitor the status of AutoMQ clusters. More advanced operations are also available, such as custom alerts and custom data queries, which are described in the official guidelines. You can experiment with these operations to find what suits your needs. We hope this article is helpful as you combine Guance Cloud with AutoMQ!

References

[1] Guance Cloud: https://docs.guance.com/getting-started/product-introduction/

[2] DataKit: https://docs.guance.com/datakit/

[3] AutoMQ: https://www.automq.com

[4] Cluster Deployment of AutoMQ: https://docs.automq.com/docs/automq-opensource/IyXrw3lHriVPdQkQLDvcPGQdnNh

[5] Metrics | AutoMQ: https://docs.automq.com/automq/observability/metrics

[6] Host Installation - Guance Documentation: https://docs.guance.com/datakit/datakit-install/

[7] Dashboard Example: https://console.guance.com/scene/dashboard/createDashboard?w=wksp_63b96920660e4962a07429b65ef163e7&lak=Scene