Metrics
This article defines the observable metrics that AutoMQ exposes, which provide insight into AutoMQ's performance and operational status.
AutoMQ exposes these metrics in the Prometheus format. If you need them in a different protocol format, you must perform the conversion yourself.
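Because the metrics use the Prometheus text format, a short script can read them directly. The sketch below is a simplified parser for that format, not a complete implementation. It skips comment lines, drops timestamps, does not unescape label values, and assumes no `}` appears inside a label value. The function name `parse_exposition` and the sample input are hypothetical.

```python
import re
from typing import Dict, List, Tuple

# A sample line looks like: metric_name{key="value",...} value [timestamp]
_SAMPLE_RE = re.compile(
    r"^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)"
    r"(?:\{(?P<labels>[^}]*)\})?"
    r"\s+(?P<value>\S+)"
    r"(?:\s+-?\d+)?\s*$"
)
# key="value" pairs inside the braces (escaped quotes are kept as-is)
_LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')

Sample = Tuple[str, Dict[str, str], float]


def parse_exposition(text: str) -> List[Sample]:
    """Return (metric name, labels, value) for each sample line in the text."""
    samples: List[Sample] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank line or # HELP / # TYPE comment
        m = _SAMPLE_RE.match(line)
        if m is None:
            continue  # a shape this simplified parser does not handle
        try:
            value = float(m.group("value"))  # also accepts NaN, +Inf, -Inf
        except ValueError:
            continue
        labels = dict(_LABEL_RE.findall(m.group("labels") or ""))
        samples.append((m.group("name"), labels, value))
    return samples


sample_text = """# TYPE kafka_topic_count gauge
kafka_topic_count 12
kafka_message_count_total{topic="orders"} 3500
"""
for name, labels, value in parse_exposition(sample_text):
    print(name, labels, value)
```

For production use, an established Prometheus client library parser is a safer choice than a hand-written regex.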
General Metrics
kafka_server_connection_count
Current number of connections established with the node.
- Type: Gauge
kafka_network_threads_idle_rate
Idle rate of the Kafka SocketServer network threads, in the range [0, 1.0].
- Type: Gauge
kafka_io_threads_idle_time_nanoseconds_total
Cumulative idle time of the Kafka request handler threads, in nanoseconds. This is the cumulative counterpart of the native Apache Kafka metric RequestHandlerAvgIdlePercent. The rate of this counter over time gives the idle rate of the request handler threads. When the node runs as a combined node (acting as both Controller and Broker), the Controller and the Broker each have their own request handler threads. In that case this metric reflects their combined idle time, and the derived idle rate can reach at most 2.0.
- Type: Counter
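The idle rate can be derived from two scrapes of this counter. It is the idle time accumulated between the scrapes divided by the wall-clock time that elapsed. This sketch assumes that the counter already averages across the handler threads, as the 2.0 upper bound for combined nodes suggests. The function name and sample values are hypothetical. In PromQL, assuming the metric is scraped under the lowercase name `kafka_io_threads_idle_time_nanoseconds_total`, this is roughly `rate(kafka_io_threads_idle_time_nanoseconds_total[1m]) / 1e9`.

```python
def io_threads_idle_rate(prev_ns: float, curr_ns: float,
                         prev_ts_s: float, curr_ts_s: float) -> float:
    """Average request handler idle rate between two scrapes of the counter.

    prev_ns, curr_ns: counter values in nanoseconds.
    prev_ts_s, curr_ts_s: scrape timestamps in seconds.
    """
    elapsed_ns = (curr_ts_s - prev_ts_s) * 1e9
    if elapsed_ns <= 0:
        raise ValueError("scrapes must be in increasing time order")
    idle_ns = curr_ns - prev_ns
    if idle_ns < 0:
        # The counter went backwards (e.g. after a restart): count only the post-reset value.
        idle_ns = curr_ns
    return idle_ns / elapsed_ns


# Hypothetical scrapes 60 s apart, during which 45 s of idle time accumulated.
idle_rate = io_threads_idle_rate(1.200e12, 1.245e12, 1000.0, 1060.0)
```

A single-role node yields a value in [0, 1.0]. A combined node can yield up to 2.0, so an alert threshold must account for the node's role.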
Controller Metrics
kafka_controller_active_count
Indicates whether this Controller node is the active Controller. A value of 1 means active, and 0 means inactive.
- Type: Gauge
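In a healthy cluster, exactly one Controller is expected to report itself as active. The sketch below turns the per-node values of this Gauge into a health verdict. It assumes the values have already been collected into a mapping from node identifier to metric value. The function names and node identifiers are hypothetical.

```python
from typing import List, Mapping


def active_controllers(values: Mapping[str, float]) -> List[str]:
    """Node IDs whose active-controller Gauge reads 1."""
    return sorted(node for node, v in values.items() if v == 1)


def controller_health(values: Mapping[str, float]) -> str:
    """Report whether exactly one Controller is active."""
    active = active_controllers(values)
    if len(active) == 1:
        return f"ok: active controller is {active[0]}"
    if not active:
        return "alert: no active controller"
    return f"alert: multiple active controllers {active}"


print(controller_health({"0": 1, "1": 0, "2": 0}))
```

Summing the Gauge across all Controller nodes and alerting when the sum is not 1 expresses the same check in a single expression.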
kafka_broker_active_count
Number of active Brokers currently in the cluster.
- Type: Gauge
kafka_broker_fenced_count
Number of Brokers currently fenced in the cluster.
- Type: Gauge
kafka_topic_count
Total number of Topics in the cluster.
- Type: Gauge
kafka_partition_total_count
Total count of partitions across the cluster.
- Type: Gauge
kafka_partition_offline_count
Total number of partitions without a leader in the cluster.
- Type: Gauge
kafka_stream_auto_balancer_metrics_time_delay_milliseconds
Delay of the AutoBalancer monitoring metrics reported by each Broker node in the cluster. If this delay exceeds a configured threshold, the AutoBalancer considers the node out of sync and excludes it from partition reassignment.
- Type: Gauge
- Labels:
  - node_id: identifier of the node that reported the AutoBalancer monitoring metrics
kafka_stream_s3_object_count
Current number of objects uploaded to object storage by the cluster, broken down by object state.
- Type: Gauge
- Labels:
  - state: object state, one of:
    - prepared: objects that are not yet fully written or committed
    - committed: objects that have been fully written and committed
    - mark_destroyed: objects marked for deletion, which are purged from object storage after a configured delay
kafka_stream_s3_object_size_bytes
Total size of objects uploaded to object storage by the current cluster.
- Type: Gauge
kafka_stream_stream_object_num
Number of StreamObjects uploaded to object storage by the current cluster.
- Type: Gauge
kafka_stream_stream_set_object_num
Number of StreamSetObjects uploaded to object storage by each Broker in the current cluster.
- Type: Gauge
- Labels:
  - node_id: identifier of the corresponding Broker node
Broker Metrics
kafka_message_count_total
Total number of messages received by the Broker node. The rate of this counter over time gives the message throughput in messages per second.
- Type: Counter
- Labels:
  - topic
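Turning a Counter such as this one into a per-second throughput requires dividing its increase by the elapsed time and handling counter resets. The sketch below does this from a list of `(timestamp_seconds, value)` samples and treats any drop in value as a reset. Unlike PromQL's `rate()`, it does not extrapolate to the edges of the time window. The function name and sample data are hypothetical.

```python
from typing import Sequence, Tuple


def counter_rate(samples: Sequence[Tuple[float, float]]) -> float:
    """Per-second rate of a counter from (timestamp_seconds, value) samples."""
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    elapsed = samples[-1][0] - samples[0][0]
    if elapsed <= 0:
        raise ValueError("timestamps must be increasing")
    increase = 0.0
    for (_, prev), (_, curr) in zip(samples, samples[1:]):
        # A drop means the counter was reset; the post-reset value is the increase.
        increase += curr - prev if curr >= prev else curr
    return increase / elapsed


# Hypothetical scrapes of the message counter for one topic, 30 s apart.
msgs_per_sec = counter_rate([(0.0, 1000.0), (30.0, 4000.0), (60.0, 7000.0)])
```

The same helper applies to the other `_total` Counters in this article, such as byte or request counters.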
kafka_network_io_bytes_total
Total size of the messages received and sent by the Broker node. The rate of this counter over time gives the message throughput in bytes per second.
- Type: Counter
- Labels:
  - topic
  - partition
  - direction:
    - in: messages received
    - out: messages sent
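Because this Counter is labeled per partition, a per-topic view requires summing the partitions for each direction. The sketch below does this for the increases observed over one scrape interval. It is equivalent in spirit to the PromQL expression `sum by (topic, direction) (rate(kafka_network_io_bytes_total[1m]))`, assuming the metric is scraped under that lowercase name. The function name and sample values are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Mapping, Tuple


def throughput_by_topic(
    deltas: Iterable[Tuple[Mapping[str, str], float]], interval_s: float
) -> Dict[Tuple[str, str], float]:
    """Sum per-partition byte increases into (topic, direction) -> bytes per second."""
    if interval_s <= 0:
        raise ValueError("interval must be positive")
    totals: Dict[Tuple[str, str], float] = defaultdict(float)
    for labels, delta in deltas:
        totals[(labels["topic"], labels["direction"])] += delta
    return {key: total / interval_s for key, total in totals.items()}


# Hypothetical increases over a 60 s interval.
per_topic = throughput_by_topic(
    [
        ({"topic": "orders", "partition": "0", "direction": "in"}, 6e6),
        ({"topic": "orders", "partition": "1", "direction": "in"}, 3e6),
        ({"topic": "orders", "partition": "0", "direction": "out"}, 12e6),
    ],
    60.0,
)
```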
kafka_topic_request_count_total
Total number of requests received for each Topic on the Broker node. Only produce and fetch requests are counted.
- Type: Counter
- Labels:
  - topic
  - type: request type, one of:
    - produce
    - fetch
kafka_topic_request_failed_total
Total number of failed requests for each Topic on the Broker node. Only produce and fetch requests are counted.
- Type: Counter
- Labels:
  - topic
  - type: request type, one of:
    - produce
    - fetch
kafka_request_count_total
Total number of requests received by the Broker node.
- Type: Counter
- Labels:
  - type: request type
  - version: API version of the request
kafka_request_error_count_total
Total number of requests handled by the Broker node, broken down by error code. Despite its name, this metric counts all requests, including successful ones, which are recorded with the error code NONE.
- Type: Counter
- Labels:
  - type: request type
  - error: error code; NONE indicates a successful request
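Because successful requests are also counted here under the error code NONE, a failure ratio must exclude them from the numerator but keep them in the denominator. The sketch below computes that ratio for one request `type`. It takes the counter increases over an interval, keyed by the `error` label. The function name and the example error codes are hypothetical inputs chosen for illustration.

```python
from typing import Mapping


def request_error_ratio(counts_by_error: Mapping[str, float]) -> float:
    """Fraction of failed requests, given counter increases keyed by the error label."""
    total = sum(counts_by_error.values())
    if total == 0:
        return 0.0
    # NONE marks successful requests: part of the total, not of the failures.
    failed = total - counts_by_error.get("NONE", 0.0)
    return failed / total


ratio = request_error_ratio(
    {"NONE": 990, "NOT_LEADER_OR_FOLLOWER": 6, "REQUEST_TIMED_OUT": 4}
)
```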
kafka_request_size_bytes_total
Total size of requests received by the Broker node.
- Type: Counter
- Labels:
  - type: request type
kafka_request_size_50p(99p/mean/max)_bytes
Size of requests received by the Broker node, reported as the 50th percentile, 99th percentile, mean, and maximum.
- Type: Gauge
- Labels:
  - type: request type
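The `50p(99p/mean/max)` notation in this and similar names is shorthand for four separate Gauges, one per statistic. The sketch below lists the individual names. It assumes the shorthand expands by substituting each statistic in place of `50p`. The function name is hypothetical, and the exact exposed names should be confirmed against a real scrape.

```python
import re
from typing import List

# prefix_, first statistic, (remaining/statistics), _suffix
_SHORTHAND_RE = re.compile(
    r"^(?P<prefix>.*_)(?P<first>[^_()]+)\((?P<rest>[^()]+)\)(?P<suffix>_.*)$"
)


def expand_stat_names(shorthand: str) -> List[str]:
    """Expand e.g. name_50p(99p/mean/max)_unit into one metric name per statistic."""
    m = _SHORTHAND_RE.match(shorthand)
    if m is None:
        return [shorthand]  # not in shorthand form; return it unchanged
    stats = [m.group("first")] + m.group("rest").split("/")
    return [f"{m.group('prefix')}{s}{m.group('suffix')}" for s in stats]


names = expand_stat_names("kafka_request_size_50p(99p/mean/max)_bytes")
```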
kafka_request_time_milliseconds_total
Total time the Broker node has spent processing requests.
- Type: Counter
- Labels:
  - type: request type
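Dividing the increase of this Counter by the increase in the number of requests gives the mean processing time per request over an interval. This sketch assumes that this Counter and the request-count Counter cover the same requests. The request-count Counter also carries a `version` label, so it is summed across versions first. In PromQL this corresponds roughly to `sum by (type) (rate(kafka_request_time_milliseconds_total[1m])) / sum by (type) (rate(kafka_request_count_total[1m]))`, assuming those lowercase names. The function name and the `type` values `Produce` and `Fetch` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Mapping, Tuple


def mean_request_time_ms(
    time_deltas: Mapping[str, float],
    count_deltas: Iterable[Tuple[Mapping[str, str], float]],
) -> Dict[str, float]:
    """Mean processing time per request type over one interval.

    time_deltas: increase of the total-time Counter, keyed by request type.
    count_deltas: (labels, increase) pairs from the request-count Counter;
        the 'version' label is summed away.
    """
    counts: Dict[str, float] = defaultdict(float)
    for labels, delta in count_deltas:
        counts[labels["type"]] += delta
    # Skip types with no requests in the interval to avoid dividing by zero.
    return {t: time_deltas[t] / counts[t] for t in time_deltas if counts.get(t, 0) > 0}


means = mean_request_time_ms(
    {"Produce": 5000.0, "Fetch": 90000.0},
    [
        ({"type": "Produce", "version": "9"}, 800),
        ({"type": "Produce", "version": "8"}, 200),
        ({"type": "Fetch", "version": "13"}, 1500),
    ],
)
```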
kafka_request_time_50p(99p/mean/max)_milliseconds
Time the Broker node spends processing requests, reported as the 50th percentile, 99th percentile, mean, and maximum.
- Type: Gauge
- Labels:
  - type: request type
kafka_request_queue_time_milliseconds_total
Total time requests have spent waiting in the Broker node's request queue. This value grows faster when the Kafka IO threads are busy.
- Type: Counter
- Labels:
  - type: request type
kafka_request_queue_time_50p(99p/mean/max)_milliseconds
Time requests spend waiting in the Broker node's request queue, reported as the 50th percentile, 99th percentile, mean, and maximum.
- Type: Gauge
- Labels:
  - type: request type
kafka_response_queue_time_milliseconds_total
Total time responses have spent waiting in the Broker node's response queue. This value grows faster when the Kafka network threads are busy.
- Type: Counter
- Labels:
  - type: request type
kafka_response_queue_time_50p(99p/mean/max)_milliseconds
Time responses spend waiting in the Broker node's response queue, reported as the 50th percentile, 99th percentile, mean, and maximum.
- Type: Gauge
- Labels:
  - type: request type
kafka_request_queue_size
Number of requests currently in the Broker node's request queue.
- Type: Gauge
kafka_response_queue_size
Number of responses currently in the Broker node's response queue.
- Type: Gauge
kafka_purgatory_size
Number of requests currently waiting in the Broker node's produce or fetch purgatory.
- Type: Gauge
- Labels:
  - type:
    - Produce
    - Fetch
kafka_partition_count
Number of partitions currently assigned to the Broker node.
- Type: Gauge
kafka_logs_flush_time_50p(99p/mean/max)_milliseconds
Log flush time on the Broker node, reported as the 50th percentile, 99th percentile, mean, and maximum. In AutoMQ, this corresponds to the Delta WAL flush time.
- Type: Gauge
kafka_log_end_offset
Maximum logical offset of each partition on the Broker node.
- Type: Gauge
- Labels:
  - topic
  - partition
kafka_log_size
Total size of the messages stored in each partition on the Broker node.
- Type: Gauge
- Labels:
  - topic
  - partition
kafka_group_commit_offset
Committed consumption offset of each Consumer Group on each partition. It is reported by the Broker that acts as the Group Coordinator for that Consumer Group.
- Type: Gauge
- Labels:
  - consumer_group
  - topic
  - partition
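The log end offset and the committed offset Gauges can be combined into a per-partition consumer lag. This sketch assumes the usual Kafka convention that a committed offset is the offset of the next record to consume, and that the log end offset follows the same "next offset" convention. If the log end offset instead reports the offset of the last record, add one to each lag. Both Gauges may come from different Brokers and be scraped at slightly different moments, so the sketch skips partitions with no reported end offset and clamps negative values to zero. The function name and sample data are hypothetical.

```python
from typing import Dict, Mapping, Tuple

PartitionKey = Tuple[str, int]       # (topic, partition)
GroupKey = Tuple[str, str, int]      # (consumer_group, topic, partition)


def consumer_lag(
    log_end_offsets: Mapping[PartitionKey, int],
    committed: Mapping[GroupKey, int],
) -> Dict[GroupKey, int]:
    """Per-partition lag for each consumer group: end offset minus committed offset."""
    lag: Dict[GroupKey, int] = {}
    for (group, topic, partition), offset in committed.items():
        end = log_end_offsets.get((topic, partition))
        if end is None:
            continue  # end offset not reported, e.g. the hosting Broker was not scraped
        lag[(group, topic, partition)] = max(end - offset, 0)
    return lag


lags = consumer_lag(
    {("orders", 0): 1200, ("orders", 1): 800},
    {("billing", "orders", 0): 1150, ("billing", "orders", 1): 800},
)
```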
kafka_group_count
Number of Consumer Groups managed by the Broker node in its role as Group Coordinator.
- Type: Gauge
kafka_group_preparing_rebalance_count
Number of Consumer Groups in the PreparingRebalance state, that is, preparing to rebalance.
- Type: Gauge
kafka_group_completing_rebalance_count
Number of Consumer Groups in the CompletingRebalance state, that is, waiting for the group leader's partition assignment.
- Type: Gauge
kafka_group_stable_count
Number of Consumer Groups in the Stable state.
- Type: Gauge
kafka_group_empty_count
Number of Consumer Groups in the Empty state, which have no members but whose metadata has not yet expired.
- Type: Gauge
kafka_group_dead_count
Number of Consumer Groups in the Dead state, which have no members and whose metadata has been removed.
- Type: Gauge
kafka_stream_upload_size_bytes_total
Total volume of data uploaded to object storage by the Broker node.
- Type: Counter
kafka_stream_download_size_bytes_total
Total volume of data downloaded from object storage by the Broker node.
- Type: Counter
kafka_stream_network_inbound_usage_bytes_total
Total inbound network usage of the Broker node, covering both messages received and data downloaded from object storage. The rate of this counter over time gives the inbound throughput.
- Type: Counter
kafka_stream_network_outbound_usage_bytes_total
Total outbound network usage of the Broker node, covering both messages sent to consumers and data uploaded to object storage. The rate of this counter over time gives the outbound throughput.
- Type: Counter
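The inbound and outbound usage Counters include object storage traffic. Subtracting the rate of the download or upload Counter from the rate of the matching usage Counter gives a rough estimate of the remaining traffic. According to the descriptions above, that remainder is mainly messages received (inbound) or messages sent to consumers (outbound). This sketch assumes both Counters are measured over the same interval and in the same way, which is not guaranteed. It clamps the remainder at zero. The function name is hypothetical.

```python
from typing import Dict


def split_network_usage(total_bps: float, object_storage_bps: float) -> Dict[str, float]:
    """Split a Broker's inbound or outbound rate into its object storage part and the rest.

    total_bps: rate of the inbound or outbound usage Counter (bytes/s).
    object_storage_bps: rate of the matching download or upload Counter (bytes/s).
    """
    if total_bps <= 0:
        return {"object_storage": 0.0, "other": 0.0, "object_storage_share": 0.0}
    # Rates from separately sampled Counters can disagree slightly; never go negative.
    other = max(total_bps - object_storage_bps, 0.0)
    return {
        "object_storage": object_storage_bps,
        "other": other,
        "object_storage_share": object_storage_bps / total_bps,
    }


# Hypothetical outbound rate of 100 MB/s, of which 25 MB/s are uploads.
outbound = split_network_usage(100e6, 25e6)
```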
kafka_stream_network_inbound_available_bandwidth_bytes
Inbound bandwidth that the Broker node has reserved for cold reads and replication. When this value falls below the inbound demand from cold reads and replication, the affected requests are placed in the throttling queue. Normal message traffic is not affected by this throttling. This metric is only a snapshot taken at sampling time. Because of the sampling interval and the way the throttling policy is implemented, treat it as a reference only.
- Type: Gauge
kafka_stream_network_outbound_available_bandwidth_bytes
Outbound bandwidth that the Broker node has reserved for cold reads and replication. When this value falls below the outbound demand from cold reads and replication, the affected requests are placed in the throttling queue. Normal message traffic is not affected by this throttling. As with the inbound metric, this is only a snapshot taken at sampling time. It is limited by the sampling interval and the way the throttling policy is implemented, so treat it as a reference only.
- Type: Gauge
kafka_stream_network_inbound_limiter_queue_time_50p(99p/mean/max)_nanoseconds
Time that inbound cold-read and replication requests spend waiting in the throttling queue, reported as the 50th percentile, 99th percentile, mean, and maximum.
- Type: Gauge
kafka_stream_network_outbound_limiter_queue_time_50p(99p/mean/max)_nanoseconds
Time that outbound cold-read and replication requests spend waiting in the throttling queue, reported as the 50th percentile, 99th percentile, mean, and maximum.
- Type: Gauge
kafka_stream_operation_latency_50p(99p/mean/max)_nanoseconds
Duration of each operation phase within the AutoMQ S3Stream module, reported as the 50th percentile, 99th percentile, mean, and maximum.
- Type: Gauge
- Labels:
  - operation_type
  - operation_name