

This article defines the metrics that AutoMQ exposes for observing its performance and operational status.

AutoMQ metrics are described here in the Prometheus format. If you need them in a different protocol or format, you must handle the conversion yourself.
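
As one possible starting point for such a conversion, the sketch below parses samples in the Prometheus text exposition format into plain Python dictionaries, which can then be serialized to another format. It is a minimal illustration, not an AutoMQ tool: the function name is hypothetical, and the parser ignores timestamps, escaped characters inside label values, and HELP/TYPE metadata.

```python
import re

# name, optional {labels}, then the sample value (a trailing timestamp is ignored)
_SAMPLE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)')
_LABEL = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_prometheus_text(text):
    """Parse Prometheus text-format samples into a list of dicts (hypothetical helper)."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and comment/metadata lines such as "# TYPE ...".
        if not line or line.startswith("#"):
            continue
        match = _SAMPLE.match(line)
        if match is None:
            continue
        name, raw_labels, value = match.groups()
        labels = dict(_LABEL.findall(raw_labels or ""))
        samples.append({"name": name, "labels": labels, "value": float(value)})
    return samples
```

The resulting list can be passed to `json.dumps` or mapped onto whatever schema the target system expects.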

General Metrics


Current number of connections established by the node.

  • Type: Gauge


Idle rate of Kafka SocketServer network threads, range: [0, 1.0].

  • Type: Gauge


Cumulative idle time of the Kafka request handler threads, measured in nanoseconds. It is the cumulative counterpart of the native Apache Kafka metric RequestHandlerAvgIdlePercent; computing its rate of change over time (idle nanoseconds per elapsed nanosecond) yields the thread idle rate. Note that on a combined node (one serving as both Controller and Broker), the Controller and the Broker each have their own request handlers, so this metric reports their combined idle time and the derived idle rate can reach at most 2.0.

  • Type: Counter
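
The calculation described above can be sketched as follows: take two samples of the cumulative idle-time counter and divide the increase by the elapsed wall-clock time in nanoseconds. The function name and arguments are hypothetical, and treating a decrease as a counter reset is an assumption about how you want to handle restarts.

```python
def idle_rate(prev_idle_ns, curr_idle_ns, prev_ts_s, curr_ts_s):
    """Average idle rate between two samples of the cumulative idle-time counter.

    prev_idle_ns / curr_idle_ns: counter values in nanoseconds.
    prev_ts_s / curr_ts_s: sample timestamps in seconds.
    """
    elapsed_ns = (curr_ts_s - prev_ts_s) * 1_000_000_000
    if elapsed_ns <= 0:
        raise ValueError("samples must be in increasing time order")
    delta_ns = curr_idle_ns - prev_idle_ns
    if delta_ns < 0:
        # Assumption: a decrease means the counter was reset (e.g. a restart),
        # so the current value is taken as the increase since the reset.
        delta_ns = curr_idle_ns
    return delta_ns / elapsed_ns
```

For example, 30 seconds of accumulated idle time over a 60-second window gives an idle rate of 0.5. On a combined node the result should be read against a maximum of 2.0 rather than 1.0.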

Controller Metrics


Indicates whether the current Controller node is the active Controller; a metric value of 1 signifies active, while 0 denotes inactive.

  • Type: Gauge


Number of active Brokers currently in the cluster.

  • Type: Gauge


Number of Brokers currently fenced in the cluster.

  • Type: Gauge


Total count of Topics within the cluster.

  • Type: Gauge


Total count of partitions across the cluster.

  • Type: Gauge


Total number of partitions without leaders in the cluster.

  • Type: Gauge


Latency of the AutoBalancer monitoring metrics reported by each Broker node in the cluster. If this latency exceeds a specified threshold, the AutoBalancer treats the node as out of sync and excludes it from partition reassignment.

  • Type: Gauge

  • Labels:

    • node_id: Identifier for the node that reported the AutoBalancer monitoring metrics


Current total number of Objects uploaded to Object storage in the cluster, organized by the status of these Objects.

  • Type: Gauge

  • Labels:

    • state: Classification of object states into three types:

      • prepared: Objects that are incomplete and uncommitted

      • committed: Objects that have been fully written and committed

      • mark_destroyed: Objects designated for deletion, scheduled to be purged from object storage after a specified delay


Total size of objects uploaded to object storage by the current cluster.

  • Type: Gauge


Number of StreamObjects uploaded to object storage by the current cluster.

  • Type: Gauge


Number of StreamSetObjects uploaded to object storage by each Broker in the current cluster.

  • Type: Gauge

  • Labels:

    • node_id: Identifier for the corresponding Broker node.

Broker Metrics


Total number of messages received by Broker nodes; its rate of change over time gives the message count throughput.

  • Type: Counter

  • Labels:

    • topic


Total size of messages received and sent by Broker nodes; its rate of change over time gives the message size throughput.

  • Type: Counter

  • Labels:

    • topic

    • partition

    • direction:

      • in: indicates the reception of messages

      • out: indicates the dispatch of messages


The total number of requests received for each Topic on Broker nodes, specifically including only produce and fetch request types.

  • Type: Counter

  • Labels:

    • topic

    • type: request type

      • produce

      • fetch


The total number of request failures for each Topic on Broker nodes, specifically including only produce and fetch request types.

  • Type: Counter

  • Labels:

    • topic

    • type: request type

      • produce

      • fetch


Total number of requests received by the Broker node.

  • Type: Counter

  • Labels:

    • type: request type

    • version: API Version for this type of request


Total number of requests on the Broker node, broken down by error code. Note that this metric counts all requests, including successful ones; a successful request has the error code NONE.

  • Type: Counter

  • Labels:

    • type: request type

    • error: error code, NONE indicates a successful request
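
Because successful requests are counted under the NONE error code, a failure ratio has to exclude that label value explicitly. The sketch below shows the arithmetic on a hypothetical mapping from error code to the counter increase observed over some window; the function name and the input shape are assumptions for illustration.

```python
def failure_ratio(counts_by_error):
    """Fraction of requests whose error code is not NONE.

    counts_by_error: dict mapping an error code (the `error` label value)
    to the number of requests counted with that code in a time window.
    """
    total = sum(counts_by_error.values())
    if total == 0:
        return 0.0  # no requests in the window
    failed = total - counts_by_error.get("NONE", 0)
    return failed / total
```

For instance, 95 requests with NONE and 5 with some other error code give a ratio of 5 out of 100.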


Total size of requests received by the Broker node.

  • Type: Counter

  • Labels:

    • type: Request type


Size of requests received by Broker nodes, represented by different percentiles.

  • Type: Gauge

  • Labels:

    • type: Request type


Total time spent by Broker nodes processing requests.

  • Type: Counter

  • Labels:

    • type: Request type


Time spent by Broker nodes processing requests, represented by different percentiles.

  • Type: Gauge

  • Labels:

    • type: Request type


Total queue time for requests at Broker nodes, which can increase when Kafka IO threads are busy.

  • Type: Counter

  • Labels:

    • type: Request type


Broker node request queue time, represented by different percentiles.

  • Type: Gauge

  • Labels:

    • type: Request type


Total response queue time for requests at Broker nodes, which can increase when Kafka network threads are busy.

  • Type: Counter

  • Labels:

    • type: Request type


Broker node response queue time, represented by different percentiles.

  • Type: Gauge

  • Labels:

    • type: Request type


Size of the request queue on the Broker node.

  • Type: Gauge


Size of the response queue on the Broker node.

  • Type: Gauge


Number of requests pending in the produce or fetch purgatory on the Broker node.

  • Type: Gauge

  • Labels:

    • type:

      • Produce

      • Fetch


Current number of partitions assigned to the Broker node.

  • Type: Gauge


Log flush duration on the Broker node; in AutoMQ, this corresponds to the Delta WAL flush time, represented at various percentiles.

  • Type: Gauge


Maximum logical offset for each partition on the Broker node.

  • Type: Gauge

  • Labels:

    • topic

    • partition


Message size allocated to each partition on the Broker node.

  • Type: Gauge

  • Labels:

    • topic

    • partition


Consumption offset for each Consumer Group on the relevant partition, as reported by the Group Coordinator's Broker associated with each Consumer Group.

  • Type: Gauge

  • Labels:

    • consumer_group

    • topic

    • partition
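
One common use of the committed offset is to estimate consumer lag by subtracting it from the partition's maximum logical offset (the per-partition metric described earlier in this section). The sketch below assumes that pairing, which follows the usual Kafka definition of lag rather than anything stated by AutoMQ; the function name and the dictionary shapes are hypothetical.

```python
def consumer_lag(end_offsets, committed_offsets):
    """Estimate lag per (consumer_group, topic, partition).

    end_offsets: dict {(topic, partition): maximum logical offset}
    committed_offsets: dict {(consumer_group, topic, partition): committed offset}
    """
    lag = {}
    for (group, topic, partition), committed in committed_offsets.items():
        end = end_offsets.get((topic, partition))
        if end is None:
            # No end offset was sampled for this partition; skip it.
            continue
        # The two metrics are sampled at different times, so clamp at zero.
        lag[(group, topic, partition)] = max(end - committed, 0)
    return lag
```

Since the two values are reported by different Brokers and sampled at different moments, the result is best treated as an approximation.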


Count of Consumer Groups overseen by each Group Coordinator's Broker node.

  • Type: Gauge


Number of Consumer Groups preparing for a rebalance.

  • Type: Gauge


Number of Consumer Groups waiting for the assignment from the group Leader.

  • Type: Gauge


Number of Consumer Groups in a Stable state.

  • Type: Gauge


Number of Consumer Groups without members but not yet expired.

  • Type: Gauge


Number of Consumer Groups that have no members and whose metadata has been removed.

  • Type: Gauge


Total volume of data uploaded to Object storage by Broker nodes.

  • Type: Counter


Total volume of data downloaded from Object storage by Broker nodes.

  • Type: Counter


Total inbound traffic of Broker nodes, including messages received and data downloaded from Object storage; its rate of change over time gives the inbound throughput.

  • Type: Counter


Total outbound traffic of Broker nodes, including messages consumed and data uploaded to Object storage; its rate of change over time gives the outbound throughput.

  • Type: Counter


Inbound throughput that the Broker node reserves for cold reads and replication. When this value falls below the inbound demand of cold reads and replication, the affected requests are placed in the throttling queue; normal message transmission is not affected by this throttling. Note that this metric is only a snapshot taken at sampling time. Because of the sampling interval and the specific implementation of the throttling policy, treat it as a reference only.

  • Type: Gauge


Outbound throughput that the Broker node reserves for cold reads and replication. When this value falls below the outbound demand of cold reads and replication, the affected requests are placed in the throttling queue; normal message transmission is not affected by this throttling. Note that this metric is only a snapshot taken at sampling time. Because of the sampling interval and the specific implementation of the throttling policy, treat it as a reference only.

  • Type: Gauge


The queuing time within the throttling queue when executing inflow requests for cold reads and replication.

  • Type: Gauge


The queuing time within the throttling queue when executing outflow requests for cold reads and replication.

  • Type: Gauge


The operational duration of each phase within the AutoMQ S3Stream module.

  • Type: Gauge

  • Labels:

    • operation_type

    • operation_name