Metrics
This article describes the observability metrics that AutoMQ exposes, to help you understand AutoMQ's performance and operational status.
AutoMQ metrics are defined and exposed in Prometheus format. If you need another protocol format, you must convert the metrics yourself.
General Metrics
Kafka_server_connection_count
The current number of established connections on a node.
- Type: Gauge
Kafka_network_threads_idle_rate
The idle rate of Kafka SocketServer network threads, in the range [0, 1.0].
- Type: Gauge
Kafka_io_threads_idle_time_nanoseconds_total
The idle time of Kafka request handler threads, in nanoseconds. It is the cumulative form of Apache Kafka's native RequestHandlerAvgIdlePercent metric. Dividing the increase in this value by the elapsed time (also in nanoseconds) gives the thread idle rate. Note that on a combined node (one that serves as both Controller and Broker), the request handler idle rates of the Controller and the Broker are summed, so the maximum idle rate is 2.0.
- Type: Counter
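As a rough illustration of the calculation above, the following Python sketch computes the idle rate from two scrapes of this counter. The `Sample` type, the `io_thread_idle_rate` function, and the `combined_node` flag are hypothetical helpers for this example, not part of AutoMQ. Halving the result on a combined node is one assumed way to map the summed value back to [0, 1.0].

```python
from dataclasses import dataclass


@dataclass
class Sample:
    """One scrape of the idle-time counter (hypothetical helper type)."""
    timestamp_ns: int   # scrape time, in nanoseconds
    idle_ns_total: int  # counter value, in nanoseconds


def io_thread_idle_rate(prev: Sample, curr: Sample, combined_node: bool = False) -> float:
    """Idle nanoseconds per elapsed nanosecond between two scrapes."""
    elapsed_ns = curr.timestamp_ns - prev.timestamp_ns
    if elapsed_ns <= 0:
        raise ValueError("samples must be in increasing time order")
    idle_ns = curr.idle_ns_total - prev.idle_ns_total
    if idle_ns < 0:
        # The counter restarted (e.g., node restart); this pair is unusable.
        raise ValueError("counter reset between samples")
    rate = idle_ns / elapsed_ns
    # Assumption: a combined node reports Controller + Broker idle rates summed
    # (max 2.0), so halving gives their average in [0, 1.0].
    return rate / 2 if combined_node else rate
```

For example, if the counter grows by 7,500,000,000 ns over a 10 s window, the idle rate is 0.75, meaning the request handler threads were idle about 75% of the time on average.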
Controller Metrics
Kafka_controller_active_count
Whether the current Controller node is the active Controller: 1 means active, 0 means inactive.
- Type: Gauge
Kafka_broker_active_count
The number of active Brokers in the current cluster.
- Type: Gauge
Kafka_broker_fenced_count
The number of fenced Brokers in the current cluster.
- Type: Gauge
Kafka_topic_count
The total number of Topics in the current cluster.
- Type: Gauge
Kafka_partition_total_count
The total number of partitions in the current cluster.
- Type: Gauge
Kafka_partition_offline_count
The total number of partitions without a leader in the current cluster.
- Type: Gauge
Kafka_stream_auto_balancer_metrics_time_delay_milliseconds
The latency of the AutoBalancer monitoring metrics reported by each Broker node in the cluster. If the latency exceeds a certain threshold, AutoBalancer considers the Broker node out of sync and excludes it from partition reassignment.
Type: Gauge
Labels:
- node_id: The node ID reporting AutoBalancer monitoring metrics.
Kafka_stream_s3_object_count
The total number of objects uploaded to object storage by the current cluster, broken down by object state.
Type: Gauge
Labels:
- state: The object state, one of:
  - prepared: Objects that have not finished writing and have not been committed.
  - committed: Objects that have finished writing and have been committed.
  - mark_destroyed: Objects marked for deletion, which are removed from object storage after a delay.
Kafka_stream_s3_object_size_bytes
The total size of objects uploaded to object storage by the current cluster.
- Type: Gauge
Kafka_stream_stream_object_num
The number of StreamObjects uploaded to object storage by the current cluster.
- Type: Gauge
Kafka_stream_stream_set_object_num
The number of StreamSetObjects uploaded to object storage by each Broker in the current cluster.
Type: Gauge
Labels:
- node_id: The corresponding Broker node ID
Broker Metrics
Kafka_message_count_total
The total number of messages received by the Broker node. Its rate over time gives the message throughput.
Type: Counter
Labels:
- topic
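Counters such as this one only increase until the process restarts, so throughput is derived from their increase over a time window. The sketch below is a simplified, assumed version of that calculation (similar in spirit to PromQL's `rate()`, but without its extrapolation): any drop in value is treated as a counter reset. The `counter_rate` function and its `(timestamp_s, value)` input format are hypothetical.

```python
def counter_rate(samples: list[tuple[float, float]]) -> float:
    """Average per-second increase of a counter over (timestamp_s, value) samples."""
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    increase = 0.0
    for (_, prev), (_, curr) in zip(samples, samples[1:]):
        # A drop means the counter restarted from zero (e.g., Broker restart),
        # so the post-reset value is the increase for this interval.
        increase += curr - prev if curr >= prev else curr
    elapsed_s = samples[-1][0] - samples[0][0]
    if elapsed_s <= 0:
        raise ValueError("samples must span a positive time interval")
    return increase / elapsed_s
```

With scrapes of 100, 600, 100, and 400 messages taken 10 s apart, the increase is 500 + 100 + 300 = 900 messages over 30 s, or 30 messages per second.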
Kafka_network_io_bytes_total
The total number of bytes received and sent by the Broker node. Its rate over time gives the byte throughput.
Type: Counter
Labels:
- topic
- partition
- direction:
  - in: messages received
  - out: messages sent
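Because this counter is labeled per partition and direction, per-topic throughput requires summing across partitions. The sketch below assumes you have already computed each series' increase over the same window (handling counter resets); the dictionary keyed by `(topic, partition, direction)` and the `topic_throughput` function are a hypothetical input format and helper.

```python
from collections import defaultdict


def topic_throughput(
    deltas: dict[tuple[str, str, str], float], window_s: float
) -> dict[tuple[str, str], float]:
    """Sum per-partition byte increases into bytes/s keyed by (topic, direction)."""
    totals: dict[tuple[str, str], float] = defaultdict(float)
    for (topic, _partition, direction), delta in deltas.items():
        totals[(topic, direction)] += delta
    return {key: total / window_s for key, total in totals.items()}
```

For instance, increases of 6,000 and 4,000 inbound bytes on two partitions of one topic over 10 s add up to 1,000 bytes/s inbound for that topic.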
Kafka_topic_request_count_total
The total number of requests received for each Topic on the Broker node. Only produce and fetch requests are counted.
Type: Counter
Labels:
- topic
- type: The request type, one of:
  - produce
  - fetch
Kafka_topic_request_failed_total
The total number of failed requests for each Topic on the Broker node. Only produce and fetch requests are counted.
Type: Counter
Labels:
- topic
- type: The request type, one of:
  - produce
  - fetch
Kafka_request_count_total
The total number of requests received by the Broker node.
Type: Counter
Labels:
- type: Request Type
- version: The API version of the request type.
Kafka_request_error_count_total
The total number of requests processed by the Broker node, broken down by error code. Despite its name, this metric also counts successful requests, whose error code is NONE.
Type: Counter
Labels:
- type: Request Type
- error: The error code. NONE indicates that the request succeeded.
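Because successful requests appear here with the error code NONE, a failure ratio must exclude that series from the numerator while keeping it in the denominator. A minimal sketch, assuming per-`(type, error)` increases over one window have already been computed; the `error_ratio` function, the input layout, and the label values in the example (`Produce`, `NOT_LEADER_OR_FOLLOWER`) are illustrative assumptions.

```python
def error_ratio(deltas: dict[tuple[str, str], float], request_type: str) -> float:
    """Fraction of requests of one type whose error code is not NONE."""
    total = 0.0
    failed = 0.0
    for (rtype, error), delta in deltas.items():
        if rtype != request_type:
            continue
        total += delta  # successful (NONE) requests still count toward the total
        if error != "NONE":
            failed += delta
    return failed / total if total else 0.0
```

For example, 990 increases under `("Produce", "NONE")` and 10 under `("Produce", "NOT_LEADER_OR_FOLLOWER")` give a Produce error ratio of 0.01.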
Kafka_request_size_bytes_total
The total size of requests received by the Broker node.
Type: Counter
Labels:
- type: Request Type
Kafka_request_size_50p(99p/mean/max)_bytes
The size of requests received by the Broker node, reported as separate 50th percentile, 99th percentile, mean, and maximum values.
Type: Gauge
Labels:
- type: Request Type
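The heading above uses a shorthand that, as read here, stands for four separate gauges: one each for the 50th percentile, 99th percentile, mean, and maximum. The same shorthand appears in other headings in this article. The `expand_metric_names` helper below is a hypothetical reading aid that expands it; check the resulting names against your deployment's scraped output, since the exact names exposed may differ (for example, in capitalization).

```python
def expand_metric_names(shorthand: str) -> list[str]:
    """Expand 'prefix_50p(99p/mean/max)_suffix' into one name per statistic."""
    head, sep, tail = shorthand.partition("(")
    if not sep:
        return [shorthand]  # no shorthand group: already a single metric name
    alternatives, _, suffix = tail.partition(")")
    prefix, _, first = head.rpartition("_")  # split off the first statistic, e.g. "50p"
    stats = [first, *alternatives.split("/")]
    return [f"{prefix}_{stat}{suffix}" for stat in stats]
```

Applied to `Kafka_request_size_50p(99p/mean/max)_bytes`, it yields `Kafka_request_size_50p_bytes`, `Kafka_request_size_99p_bytes`, `Kafka_request_size_mean_bytes`, and `Kafka_request_size_max_bytes`.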
Kafka_request_time_milliseconds_total
The total time the Broker node has spent processing requests.
Type: Counter
Labels:
- type: Request Type
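Dividing the increase of this counter by the increase of Kafka_request_count_total over the same window gives an average processing time per request. A sketch under two assumptions: both counters cover the same set of requests, and their `type` label values match. Because Kafka_request_count_total also carries a `version` label, the sketch sums it across versions first. The function and parameter names are hypothetical.

```python
from collections import defaultdict


def average_request_time_ms(
    time_deltas_ms: dict[str, float],
    count_deltas: dict[tuple[str, str], float],
) -> dict[str, float]:
    """Average ms per request, keyed by request type.

    time_deltas_ms: increase of the request-time counter, keyed by type.
    count_deltas:   increase of the request-count counter, keyed by (type, version).
    """
    counts: dict[str, float] = defaultdict(float)
    for (rtype, _version), delta in count_deltas.items():
        counts[rtype] += delta  # sum across API versions to match the time metric
    return {
        rtype: total_ms / counts[rtype]
        for rtype, total_ms in time_deltas_ms.items()
        if counts.get(rtype)  # skip types with no requests in this window
    }
```

For example, 1,500 ms spent on 300 Produce requests (200 at one API version, 100 at another) averages 5 ms per request.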
Kafka_request_time_50p(99p/mean/max)_milliseconds
The request processing time on the Broker node, reported as 50th percentile, 99th percentile, mean, and maximum values.
Type: Gauge
Labels:
- type: Request Type
Kafka_request_queue_time_milliseconds_total
The total time requests have spent waiting in the request queue on the Broker node. Request queue time rises when Kafka IO threads are busy.
Type: Counter
Labels:
- type: Request Type
Kafka_request_queue_time_50p(99p/mean/max)_milliseconds
The request queue time on the Broker node, reported as 50th percentile, 99th percentile, mean, and maximum values.
Type: Gauge
Labels:
- type: Request Type
Kafka_response_queue_time_milliseconds_total
The total time responses have spent in the response queue on the Broker node. Response queue time rises when Kafka network threads are busy.
Type: Counter
Labels:
- type: Request Type
Kafka_response_queue_time_50p(99p/mean/max)_milliseconds
The response queue time on the Broker node, reported as 50th percentile, 99th percentile, mean, and maximum values.
Type: Gauge
Labels:
- type: Request Type
Kafka_request_queue_size
The size of the request queue on the Broker node.
- Type: Gauge
Kafka_response_queue_size
The response queue size of the Broker node.
- Type: Gauge
Kafka_purgatory_size
The number of requests waiting in the produce or fetch purgatory on the Broker node.
Type: Gauge
Labels:
- type:
  - Produce
  - Fetch
Kafka_partition_count
The number of partitions currently assigned to the Broker node.
- Type: Gauge
Kafka_logs_flush_time_50p(99p/mean/max)_milliseconds
The log flush time on the Broker node. In AutoMQ, this is the flush time of the Delta WAL, reported as 50th percentile, 99th percentile, mean, and maximum values.
- Type: Gauge
Kafka_log_end_offset
The maximum logical offset of each partition on the Broker node.
Type: Gauge
Labels:
- topic
- partition
Kafka_log_size
The total message size of each partition on the Broker node.
Type: Gauge
Labels:
- topic
- partition
Kafka_group_commit_offset
The committed offset of each Consumer Group on each partition. This metric is reported by the Broker that hosts the Group Coordinator of the Consumer Group.
Type: Gauge
Labels:
- consumer_group
- topic
- partition
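Combined with Kafka_log_end_offset, this metric lets you estimate consumer lag per partition as the end offset minus the committed offset. The sketch below assumes both metrics were scraped at roughly the same moment; because they can come from different Brokers (the partition's host and the group's coordinator), negative differences are clamped to zero. The `consumer_lag` function and its input layout are hypothetical.

```python
def consumer_lag(
    end_offsets: dict[tuple[str, str], int],
    committed: dict[tuple[str, str, str], int],
) -> dict[tuple[str, str, str], int]:
    """Lag per (consumer_group, topic, partition): end offset minus committed offset."""
    lag: dict[tuple[str, str, str], int] = {}
    for (group, topic, partition), offset in committed.items():
        end = end_offsets.get((topic, partition))
        if end is None:
            continue  # partition's end offset not present in this scrape
        # Scrapes from different Brokers can be slightly out of step,
        # so clamp to zero instead of reporting a negative lag.
        lag[(group, topic, partition)] = max(end - offset, 0)
    return lag
```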
Kafka_group_count
The number of Consumer Groups whose Group Coordinator resides on the Broker node.
- Type: Gauge
Kafka_group_preparing_rebalance_count
The number of Consumer Groups preparing for a rebalance.
- Type: Gauge
Kafka_group_completing_rebalance_count
The number of Consumer Groups awaiting state assignment from the group leader.
- Type: Gauge
Kafka_group_stable_count
The number of Consumer Groups in a stable state.
- Type: Gauge
Kafka_group_empty_count
The number of Consumer Groups that have no members but have not yet expired.
- Type: Gauge
Kafka_group_dead_count
The number of Consumer Groups with no members and whose metadata has been removed.
- Type: Gauge
Kafka_stream_upload_size_bytes_total
The total size of data uploaded by Broker nodes to object storage.
- Type: Counter
Kafka_stream_download_size_bytes_total
The total size of data downloaded by Broker nodes from object storage.
- Type: Counter
Kafka_stream_network_inbound_usage_bytes_total
The total inbound network traffic of the Broker node, including received messages and data downloaded from object storage. Its rate over time gives the inbound throughput.
- Type: Counter
Kafka_stream_network_outbound_usage_bytes_total
The total outbound network traffic of the Broker node, including messages delivered to consumers and data uploaded to object storage. Its rate over time gives the outbound throughput.
- Type: Counter
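Since the inbound counter includes data downloaded from object storage and the outbound counter includes data uploaded to it, comparing their increases with those of Kafka_stream_download_size_bytes_total and Kafka_stream_upload_size_bytes_total over the same window shows roughly how much of each direction's traffic is object storage I/O. A minimal sketch, assuming all four increases come from the same node and window; the `object_storage_shares` function is hypothetical, and the ratios are only approximate if the counters are measured at different points in the stack.

```python
def object_storage_shares(
    inbound_delta: float,
    outbound_delta: float,
    download_delta: float,
    upload_delta: float,
) -> tuple[float, float]:
    """Approximate fractions of inbound/outbound traffic that are object storage I/O."""
    # Downloads are part of inbound traffic; uploads are part of outbound traffic.
    share_in = download_delta / inbound_delta if inbound_delta else 0.0
    share_out = upload_delta / outbound_delta if outbound_delta else 0.0
    return share_in, share_out
```

For instance, if inbound traffic grew by 1,000 bytes of which 250 were downloads, about a quarter of the inbound traffic came from object storage.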
Kafka_stream_network_inbound_available_bandwidth_bytes
The inbound bandwidth reserved for cold reads and compaction on the Broker node. If cold reads and compaction need more inbound bandwidth than this value, their requests are placed in a throttling queue; regular produce and consume traffic is not affected by this throttling. This value is only an instantaneous sample taken at scrape time. Because of the sampling interval and the details of the throttling strategy, use it for reference only.
- Type: Gauge
Kafka_stream_network_outbound_available_bandwidth_bytes
The outbound bandwidth reserved for cold reads and compaction on the Broker node. If cold reads and compaction need more outbound bandwidth than this value, their requests are placed in a throttling queue; regular produce and consume traffic is not affected by this throttling. This value is only an instantaneous sample taken at scrape time. Because of the sampling interval and the details of the throttling strategy, use it for reference only.
- Type: Gauge
Kafka_stream_network_inbound_limiter_queue_time_50p(99p/mean/max)_nanoseconds
The time that inbound cold-read and compaction requests spend in the throttling queue before they are executed, reported as 50th percentile, 99th percentile, mean, and maximum values.
- Type: Gauge
Kafka_stream_network_outbound_limiter_queue_time_50p(99p/mean/max)_nanoseconds
The time that outbound cold-read and compaction requests spend in the throttling queue before they are executed, reported as 50th percentile, 99th percentile, mean, and maximum values.
- Type: Gauge
Kafka_stream_operation_latency_50p(99p/mean/max)_nanoseconds
The latency of each operation stage in the AutoMQ S3Stream module.
Type: Gauge
Labels:
- operation_type
- operation_name