Metrics

This article defines the metrics that AutoMQ exposes for observing its performance and operational status.

The metrics are described here in the Prometheus format. If you need them in another protocol format, you must handle the conversion yourself.
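As a starting point for such a conversion, the sketch below parses a single sample line of the Prometheus text exposition format into a plain dictionary that can be serialized to JSON. It is a minimal illustration rather than a complete parser: escape sequences in label values are left undecoded, and the helper name `parse_sample` is hypothetical. The metric name is copied as written in this article; check the exact casing your deployment exports.

```python
import json
import re
from typing import Dict, Optional

# One sample line looks like: metric_name{label="value",...} value [timestamp]
SAMPLE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)'
    r'(?:\{(?P<labels>.*)\})?'
    r'\s+(?P<value>\S+)'
    r'(?:\s+(?P<timestamp>-?\d+))?\s*$'
)
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_sample(line: str) -> Optional[Dict[str, object]]:
    """Parse one exposition line; return None for blank lines and # comments."""
    line = line.strip()
    if not line or line.startswith("#"):
        return None
    match = SAMPLE_RE.match(line)
    if match is None:
        raise ValueError(f"not a Prometheus sample line: {line!r}")
    labels = dict(LABEL_RE.findall(match.group("labels") or ""))
    return {
        "name": match.group("name"),
        "labels": labels,  # escape sequences are left undecoded in this sketch
        "value": float(match.group("value")),
    }


if __name__ == "__main__":
    # Illustrative input line, not captured from a real cluster.
    line = 'Kafka_network_io_bytes_total{topic="orders",partition="0",direction="in"} 1024'
    print(json.dumps(parse_sample(line)))
```

For production use, an established Prometheus client library or an existing exporter pipeline is likely a better choice than a hand-written parser.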

General Metrics

Kafka_server_connection_count

Current number of connections established by the node.

  • Type: Gauge

Kafka_network_threads_idle_rate

Idle rate of Kafka SocketServer network threads, range: [0, 1.0].

  • Type: Gauge

Kafka_io_threads_idle_time_nanoseconds_total

Cumulative idle time of the Kafka request handler (IO) threads, in nanoseconds. This is the cumulative form of the native Apache Kafka metric RequestHandlerAvgIdlePercent: dividing the increase of this counter over a time window by the length of that window (in nanoseconds) gives the thread idle rate. When the node runs as a combined node (serving as both Controller and Broker), the Controller and the Broker each have their own request handlers, and this metric reports the sum of their idle time. On such a node the derived idle rate can therefore be as high as 2.0.

  • Type: Counter
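The idle-rate calculation described above can be sketched as follows. The `CounterSample` type and the `idle_rate` helper are hypothetical names used only for illustration, and the sample values are made up; the handling of counter resets (treating the later value as the increase since the restart) is an assumption borrowed from common Prometheus practice.

```python
from dataclasses import dataclass

NANOS_PER_SECOND = 1_000_000_000


@dataclass(frozen=True)
class CounterSample:
    timestamp_s: float  # scrape time, in seconds
    value: float        # cumulative idle time, in nanoseconds


def idle_rate(earlier: CounterSample, later: CounterSample) -> float:
    """Idle nanoseconds accumulated per elapsed nanosecond between two scrapes."""
    elapsed_s = later.timestamp_s - earlier.timestamp_s
    if elapsed_s <= 0:
        raise ValueError("samples must be in increasing time order")
    increase_ns = later.value - earlier.value
    if increase_ns < 0:
        # Assumed counter reset (for example a process restart): count only
        # what has accumulated since the reset.
        increase_ns = later.value
    return increase_ns / (elapsed_s * NANOS_PER_SECOND)


if __name__ == "__main__":
    # 45 s of idle time accumulated over a 60 s window.
    print(idle_rate(CounterSample(0.0, 0), CounterSample(60.0, 45 * NANOS_PER_SECOND)))
```

On a combined node, the same calculation can return values above 1.0, because the Controller and Broker request handlers both contribute idle time.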

Controller Metrics

Kafka_controller_active_count

Indicates whether the current Controller node is the active Controller; a metric value of 1 signifies active, while 0 denotes inactive.

  • Type: Gauge

Kafka_broker_active_count

Number of active Brokers currently in the cluster.

  • Type: Gauge

Kafka_broker_fenced_count

Number of Brokers currently fenced in the cluster.

  • Type: Gauge

Kafka_topic_count

Total count of Topics within the cluster.

  • Type: Gauge

Kafka_partition_total_count

Total count of partitions across the cluster.

  • Type: Gauge

Kafka_partition_offline_count

Total number of partitions without leaders in the cluster.

  • Type: Gauge

Kafka_stream_auto_balancer_metrics_time_delay_milliseconds

Delay of the AutoBalancer monitoring metrics reported by each Broker node in the cluster. If this delay exceeds a specified threshold, the AutoBalancer treats the node as out of sync and excludes it from partition reassignment.

  • Type: Gauge

  • Labels:

    • node_id: Identifier for the node that reported the AutoBalancer monitoring metrics
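The threshold check described above can be illustrated with a short sketch. The function name `out_of_sync_nodes`, the threshold value, and the sample delays are all hypothetical; the actual threshold is whatever your AutoBalancer configuration specifies, and this code only mirrors the comparison, not the AutoBalancer's implementation.

```python
from typing import Dict, List


def out_of_sync_nodes(delay_ms_by_node: Dict[str, float], threshold_ms: float) -> List[str]:
    """Return the node_id values whose reported metrics delay exceeds the threshold."""
    return sorted(
        node_id
        for node_id, delay_ms in delay_ms_by_node.items()
        if delay_ms > threshold_ms
    )


if __name__ == "__main__":
    # Illustrative gauge values keyed by the node_id label.
    delays = {"0": 1200.0, "1": 95000.0, "2": 800.0}
    print(out_of_sync_nodes(delays, threshold_ms=60000.0))
```

A check like this is useful for alerting on nodes that the AutoBalancer is likely to skip during reassignment.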

Kafka_stream_s3_object_count

Current total number of Objects uploaded to Object storage in the cluster, organized by the status of these Objects.

  • Type: Gauge

  • Labels:

    • state: Classification of object states into three types:

      • prepared: Objects that are incomplete and uncommitted

      • committed: Objects that have been fully written and committed

      • mark_destroyed: Objects designated for deletion, scheduled to be purged from object storage after a specified delay

Kafka_stream_s3_object_size_bytes

Total size of objects uploaded to object storage by the current cluster.

  • Type: Gauge

Kafka_stream_stream_object_num

Number of StreamObjects uploaded to object storage by the current cluster.

  • Type: Gauge

Kafka_stream_stream_set_object_num

Number of StreamSetObjects uploaded to object storage by each Broker in the current cluster.

  • Type: Gauge

  • Labels:

    • node_id: Identifier for the corresponding Broker node.

Broker Metrics

Kafka_message_count_total

Total number of messages received by Broker nodes. Computing its rate over time gives the message-count throughput.

  • Type: Counter

  • Labels:

    • topic
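A minimal sketch of deriving per-topic message throughput from two scrapes of this counter follows. The function name `per_topic_rate` and the topic names are hypothetical, and the reset handling (a value lower than the previous scrape is treated as a restart from zero) is an assumption.

```python
from typing import Dict


def per_topic_rate(
    earlier: Dict[str, float],
    later: Dict[str, float],
    elapsed_s: float,
) -> Dict[str, float]:
    """Messages per second for each topic between two scrapes of the counter."""
    if elapsed_s <= 0:
        raise ValueError("elapsed_s must be positive")
    rates: Dict[str, float] = {}
    for topic, value in later.items():
        previous = earlier.get(topic, 0.0)  # topic first seen in this window
        # Assumed counter reset when the value went down.
        increase = value - previous if value >= previous else value
        rates[topic] = increase / elapsed_s
    return rates


if __name__ == "__main__":
    before = {"orders": 1000.0}
    after = {"orders": 4000.0, "payments": 600.0}
    print(per_topic_rate(before, after, elapsed_s=30.0))
```

The same calculation applies to the other Counter metrics in this article, for example byte counters, where the result is bytes per second.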

Kafka_network_io_bytes_total

Total bytes of messages received and sent by Broker nodes. Computing its rate over time gives the byte throughput.

  • Type: Counter

  • Labels:

    • topic

    • partition

    • direction:

      • in: indicates the reception of messages

      • out: indicates the dispatch of messages

Kafka_topic_request_count_total

Total number of requests received for each Topic on Broker nodes. Only produce and fetch requests are counted.

  • Type: Counter

  • Labels:

    • topic

    • type: request type

      • produce

      • fetch

Kafka_topic_request_failed_total

Total number of failed requests for each Topic on Broker nodes. Only produce and fetch requests are counted.

  • Type: Counter

  • Labels:

    • topic

    • type: request type

      • produce

      • fetch

Kafka_request_count_total

Total number of requests received by the Broker node.

  • Type: Counter

  • Labels:

    • type: request type

    • version: API Version for this type of request

Kafka_request_error_count_total

Total number of requests on the Broker node, broken down by error code. Despite its name, this metric counts all requests, including successful ones; a successful request has the error code NONE.

  • Type: Counter

  • Labels:

    • type: request type

    • error: error code, NONE indicates a successful request
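Because successful requests are included under the error code NONE, a failure ratio has to exclude that label value from the numerator. The sketch below assumes you already have the counter increases over some window, keyed by the `type` and `error` labels; the function name `failure_ratio` and the label values shown are illustrative, not taken from a real cluster.

```python
from typing import Mapping, Tuple


def failure_ratio(
    increases: Mapping[Tuple[str, str], float],
    request_type: str,
) -> float:
    """Share of requests of one type whose error label is not NONE.

    `increases` maps (type, error) label pairs to the counter increase
    observed over the window of interest.
    """
    total = 0.0
    failed = 0.0
    for (labelled_type, error), increase in increases.items():
        if labelled_type != request_type:
            continue
        total += increase
        if error != "NONE":  # NONE marks a successful request
            failed += increase
    return failed / total if total > 0 else 0.0


if __name__ == "__main__":
    window = {
        ("Produce", "NONE"): 980.0,
        ("Produce", "NOT_LEADER_OR_FOLLOWER"): 20.0,
        ("Fetch", "NONE"): 500.0,
    }
    print(failure_ratio(window, "Produce"))
```

Returning 0.0 when no requests were seen is a design choice for this sketch; depending on your alerting rules you may prefer to treat that case as missing data.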

Kafka_request_size_bytes_total

Total size of requests received by the Broker node.

  • Type: Counter

  • Labels:

    • type: Request type

Kafka_request_size_50p(99p/mean/max)_bytes

Size of requests received by Broker nodes, reported as the 50th percentile, 99th percentile, mean, and maximum.

  • Type: Gauge

  • Labels:

    • type: Request type

Kafka_request_time_milliseconds_total

Total time spent by Broker nodes processing requests.

  • Type: Counter

  • Labels:

    • type: Request type
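One way to use this counter is to derive an average processing time per request over a window by dividing its increase by the increase of Kafka_request_count_total for the same request type. This is a sketch under the assumption that both counters cover the same set of requests; the article does not state this explicitly, and the function name `mean_request_time_ms` is hypothetical. Because the count metric carries an additional `version` label, its increases are summed across versions first.

```python
from typing import Mapping, Optional


def mean_request_time_ms(
    time_increase_ms: float,
    count_increase_by_version: Mapping[str, float],
) -> Optional[float]:
    """Average processing time per request over a window, in milliseconds.

    Returns None when no requests of this type were observed in the window.
    """
    total_requests = sum(count_increase_by_version.values())
    if total_requests <= 0:
        return None
    return time_increase_ms / total_requests


if __name__ == "__main__":
    # Illustrative increases for one request type over the same window.
    print(mean_request_time_ms(12000.0, {"9": 900.0, "8": 100.0}))
```

Unlike the mean gauge, this average can be computed over any window length you choose.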

Kafka_request_time_50p(99p/mean/max)_milliseconds

Time spent by Broker nodes processing requests, reported as the 50th percentile, 99th percentile, mean, and maximum.

  • Type: Gauge

  • Labels:

    • type: Request type

Kafka_request_queue_time_milliseconds_total

Total queue time for requests at Broker nodes, which can increase when Kafka IO threads are busy.

  • Type: Counter

  • Labels:

    • type: Request type

Kafka_request_queue_time_50p(99p/mean/max)_milliseconds

Request queue time on the Broker node, reported as the 50th percentile, 99th percentile, mean, and maximum.

  • Type: Gauge

  • Labels:

    • type: Request type

Kafka_response_queue_time_milliseconds_total

Total response queue time on the Broker node, which can increase when Kafka network threads are busy.

  • Type: Counter

  • Labels:

    • type: Request type

Kafka_response_queue_time_50p(99p/mean/max)_milliseconds

Response queue time on the Broker node, reported as the 50th percentile, 99th percentile, mean, and maximum.

  • Type: Gauge

  • Labels:

    • type: Request type

Kafka_request_queue_size

Size of the request queue on the Broker node.

  • Type: Gauge

Kafka_response_queue_size

Size of the response queue on the Broker node.

  • Type: Gauge

Kafka_purgatory_size

Number of requests waiting in the produce or fetch purgatory on the Broker node.

  • Type: Gauge

  • Labels:

    • type:

      • Produce

      • Fetch

Kafka_partition_count

Current tally of partitions allocated on the Broker node.

  • Type: Gauge

Kafka_logs_flush_time_50p(99p/mean/max)_milliseconds

Log flush duration on the Broker node, reported as the 50th percentile, 99th percentile, mean, and maximum. In AutoMQ, this corresponds to the Delta WAL flush time.

  • Type: Gauge

Kafka_log_end_offset

Maximum logical offset for each partition on the Broker node.

  • Type: Gauge

  • Labels:

    • topic

    • partition

Kafka_log_size

Size of the messages stored in each partition on the Broker node.

  • Type: Gauge

  • Labels:

    • topic

    • partition

Kafka_group_commit_offset

Committed consumption offset of each Consumer Group for each partition, reported by the Broker that acts as the Group Coordinator for that Consumer Group.

  • Type: Gauge

  • Labels:

    • consumer_group

    • topic

    • partition
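A common use of this metric is to estimate consumer lag by subtracting the committed offset from Kafka_log_end_offset for the same topic and partition. The article does not define a lag metric, so the sketch below is an assumption about how the two gauges can be combined; `consumer_lag` is a hypothetical helper. Note that the two values may be reported by different Brokers (the partition's Broker and the Group Coordinator's Broker), so they need to be joined on the topic and partition labels.

```python
from typing import Dict, Tuple

PartitionKey = Tuple[str, int]    # (topic, partition)
GroupKey = Tuple[str, str, int]   # (consumer_group, topic, partition)


def consumer_lag(
    log_end_offsets: Dict[PartitionKey, int],
    committed_offsets: Dict[GroupKey, int],
) -> Dict[GroupKey, int]:
    """Messages not yet consumed, per consumer group and partition."""
    lag: Dict[GroupKey, int] = {}
    for (group, topic, partition), committed in committed_offsets.items():
        end_offset = log_end_offsets.get((topic, partition))
        if end_offset is None:
            continue  # no end offset scraped for this partition
        # The two gauges are sampled at different moments, so clamp at zero.
        lag[(group, topic, partition)] = max(end_offset - committed, 0)
    return lag


if __name__ == "__main__":
    ends = {("orders", 0): 1500, ("orders", 1): 900}
    commits = {("billing", "orders", 0): 1420, ("billing", "orders", 1): 900}
    print(consumer_lag(ends, commits))
```

Since both inputs are point-in-time gauges, the result is an approximation that is most useful as a trend over time.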

Kafka_group_count

Number of Consumer Groups managed by the Group Coordinator on each Broker node.

  • Type: Gauge

Kafka_group_preparing_rebalance_count

Number of Consumer Groups in the PreparingRebalance state, that is, preparing for a rebalance.

  • Type: Gauge

Kafka_group_completing_rebalance_count

Number of Consumer Groups in the CompletingRebalance state, that is, waiting for the partition assignment from the group leader.

  • Type: Gauge

Kafka_group_stable_count

Number of Consumer Groups in a Stable state.

  • Type: Gauge

Kafka_group_empty_count

Number of Consumer Groups without members but not yet expired.

  • Type: Gauge

Kafka_group_dead_count

Number of Consumer Groups in the Dead state, that is, groups with no members whose metadata has been removed.

  • Type: Gauge

Kafka_stream_upload_size_bytes_total

Total volume of data uploaded to Object storage by Broker nodes.

  • Type: Counter

Kafka_stream_download_size_bytes_total

Total volume of data downloaded from Object storage by Broker nodes.

  • Type: Counter

Kafka_stream_network_inbound_usage_bytes_total

Total inbound network traffic of Broker nodes, including messages received and data downloaded from Object storage. Computing its rate over time gives the inbound throughput.

  • Type: Counter

Kafka_stream_network_outbound_usage_bytes_total

Total outbound network traffic of Broker nodes, including messages consumed and data uploaded to Object storage. Computing its rate over time gives the outbound throughput.

  • Type: Counter

Kafka_stream_network_inbound_available_bandwidth_bytes

Inbound throughput that the Broker node reserves for cold reads and replication. When this value falls below the inbound demand of cold reads and replication, the affected requests are placed in the throttling queue; normal message transmission is not affected by this throttling. This metric is only a snapshot taken at sampling time. Because of the sampling interval and the specific implementation of the throttling policy, treat it as a reference only.

  • Type: Gauge

Kafka_stream_network_outbound_available_bandwidth_bytes

Outbound throughput that the Broker node reserves for cold reads and replication. When this value falls below the outbound demand of cold reads and replication, the affected requests are placed in the throttling queue; normal message transmission is not affected by this throttling. This metric is only a snapshot taken at sampling time. Because of the sampling interval and the specific implementation of the throttling policy, treat it as a reference only.

  • Type: Gauge

Kafka_stream_network_inbound_limiter_queue_time_50p(99p/mean/max)_nanoseconds

Time that inbound requests for cold reads and replication spend in the throttling queue.

  • Type: Gauge

Kafka_stream_network_outbound_limiter_queue_time_50p(99p/mean/max)_nanoseconds

Time that outbound requests for cold reads and replication spend in the throttling queue.

  • Type: Gauge

Kafka_stream_operation_latency_50p(99p/mean/max)_nanoseconds

Latency of each operation within the AutoMQ S3Stream module.

  • Type: Gauge

  • Labels:

    • operation_type

    • operation_name