
# Prometheus Metrics

> AutoMQ exposes Kafka-compatible metrics in the Prometheus format. This page describes each metric so that you can understand cluster performance in depth.

This document describes the observability metrics that AutoMQ exposes, to help you better understand its performance and operational status.

<Info>
  AutoMQ metrics are defined and exposed in the Prometheus format. If you need another protocol format, you must convert the metrics yourself.
</Info>

## **General Metrics**

### Kafka\_server\_connection\_count

The current number of connections established by the node.

* Type: Gauge

### Kafka\_network\_threads\_idle\_rate

The idle rate of Kafka SocketServer network threads, ranging from \[0, 1.0].

* Type: Gauge

### Kafka\_io\_threads\_idle\_time\_nanoseconds\_total

The cumulative idle time, in nanoseconds, of Kafka request handler threads. It corresponds to the Apache Kafka native metric RequestHandlerAvgIdlePercent. Differentiating this counter over time gives the thread idle rate. When a node acts as both Controller and Broker, each role has its own request handler, so this metric is the sum of both; in that case the idle rate derived from it can reach a maximum of 2.0.

* Type: Counter
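
The differentiation step can be sketched in Python as follows. The function name and parameters are illustrative, and the sketch assumes the counter is normalized per request handler pool, which is consistent with the 2.0 maximum described above for nodes that act as both Controller and Broker.

```python
def io_idle_rate(prev_ns: float, curr_ns: float, interval_s: float) -> float:
    """Derive the request handler idle rate from two idle-time counter samples.

    Hypothetical helper. Assumption: the counter is normalized per request handler
    pool, so one fully idle pool yields 1.0 and a combined Controller + Broker
    node can yield up to 2.0.
    """
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    # A decrease means the counter was reset (for example, after a restart),
    # so the current value is the idle time accumulated since the reset.
    delta_ns = curr_ns - prev_ns if curr_ns >= prev_ns else curr_ns
    # Nanoseconds of idle time per second of wall-clock time, expressed as a ratio.
    return delta_ns / (interval_s * 1_000_000_000)


if __name__ == "__main__":
    # 30 s of idle time accumulated over a 60 s window
    print(io_idle_rate(prev_ns=0, curr_ns=30_000_000_000, interval_s=60))  # 0.5
```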

## Controller Metrics

### Kafka\_controller\_active\_count

This indicates whether the current Controller node is active. A metric value of 1 signifies it is active, while 0 indicates it is inactive.

* Type: Gauge

### Kafka\_broker\_active\_count

The number of active Brokers in the current cluster.

* Type: Gauge

### Kafka\_broker\_fenced\_count

The number of Brokers that are fenced in the current cluster.

* Type: Gauge

### Kafka\_topic\_count

Total number of topics in the current cluster.

* Type: Gauge

### Kafka\_partition\_total\_count

Total number of partitions in the current cluster.

* Type: Gauge

### Kafka\_partition\_offline\_count

Total number of partitions without leaders in the current cluster.

* Type: Gauge

### Kafka\_stream\_auto\_balancer\_metrics\_time\_delay\_milliseconds

The delay with which each broker node in the cluster reports AutoBalancer monitoring metrics. If this delay exceeds a threshold, AutoBalancer considers the broker node out of sync and excludes it from partition reassignment.

* Type: Gauge

* Labels:

  * node\_id: The ID of the node reporting AutoBalancer monitoring metrics.

### Kafka\_stream\_s3\_object\_count

The current total number of objects uploaded to object storage by the cluster, categorized by object state.

* Type: Gauge

* Labels:

  * State: Object state, divided into the following three categories:

    * Prepared: Objects that have not yet completed writing and have not been committed

    * Committed: Objects that have completed writing and have been committed

    * Mark\_destroyed: Objects marked for deletion, which will be removed from object storage after a certain delay

### Kafka\_stream\_s3\_object\_size\_bytes

The total size of objects uploaded to object storage by the current cluster.

* Type: Gauge

### Kafka\_stream\_stream\_object\_num

The number of StreamObjects uploaded to object storage by the current cluster.

* Type: Gauge

### Kafka\_stream\_stream\_set\_object\_num

The number of StreamSetObjects uploaded to object storage by each Broker within the current cluster.

* Type: Gauge

* Labels:

  * node\_id: The corresponding Broker node ID

## Broker Metrics

### Kafka\_message\_count\_total

The total number of messages received by the Broker node. Differentiating this counter over time gives the message throughput, in messages per second.

* Type: Counter

* Labels:

  * topic
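
In Prometheus this differentiation is usually done with `rate()`. To make the calculation concrete, here is a simplified Python sketch (hypothetical helper) that turns `(timestamp, value)` samples of a counter such as this one into a per-second rate. Unlike Prometheus `rate()`, it does not extrapolate to the edges of the window; it only sums the increases between samples and treats any drop as a counter reset.

```python
from typing import Sequence, Tuple


def counter_rate(samples: Sequence[Tuple[float, float]]) -> float:
    """Per-second rate of a monotonically increasing counter (hypothetical helper).

    samples: (timestamp_seconds, value) pairs in chronological order.
    Simplified compared with Prometheus rate(): no extrapolation to window edges.
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    increase = 0.0
    for (_, prev), (_, curr) in zip(samples, samples[1:]):
        # A drop means the counter was reset to zero, so the new value is all increase.
        increase += curr - prev if curr >= prev else curr
    elapsed = samples[-1][0] - samples[0][0]
    if elapsed <= 0:
        raise ValueError("samples must span a positive time interval")
    return increase / elapsed


if __name__ == "__main__":
    # 500 messages in the first 10 s, then a reset followed by 200 more messages.
    print(counter_rate([(0, 500), (10, 1000), (20, 200)]))  # 35.0
```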

### Kafka\_network\_io\_bytes\_total

The total size, in bytes, of messages received and sent by the Broker node. Differentiating this counter over time gives the byte throughput.

* Type: Counter

* Labels:

  * topic

  * partition

  * direction:

    * "in": indicates incoming messages

    * "out": indicates outgoing messages
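
Because this counter is labeled per partition, topic-level throughput comes from summing across partitions while keeping `topic` and `direction`. The Python sketch below assumes you have already computed each series' byte increase over a window (for example, from two scrapes); the function and type names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Tuple

# Hypothetical key: (topic, partition, direction) -> byte increase over the window
SeriesKey = Tuple[str, int, str]


def topic_throughput(increases: Dict[SeriesKey, float],
                     window_s: float) -> Dict[Tuple[str, str], float]:
    """Sum per-partition byte increases into bytes/s per (topic, direction)."""
    if window_s <= 0:
        raise ValueError("window_s must be positive")
    totals: Dict[Tuple[str, str], float] = defaultdict(float)
    for (topic, _partition, direction), delta_bytes in increases.items():
        totals[(topic, direction)] += delta_bytes
    return {key: total / window_s for key, total in totals.items()}
```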

### Kafka\_topic\_request\_count\_total

The total number of requests received by the Broker node for each topic. Only produce and fetch requests are counted.

* Type: Counter

* Labels:

  * topic

  * type: Request Type

    * produce

    * fetch

### Kafka\_topic\_request\_failed\_total

The total number of request failures for each topic on the Broker node, including only produce and fetch request types.

* Type: Counter

* Labels:

  * topic

  * type: Request Type

    * produce

    * fetch
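
Dividing the increase of this counter by the increase of `Kafka_topic_request_count_total` over the same window gives a per-topic failure ratio. A minimal Python sketch, assuming both counters' increases have been collected into dicts keyed by the same `(topic, type)` label pair; the helper name is hypothetical.

```python
from typing import Dict, Tuple

# Hypothetical key: (topic, type) where type is "produce" or "fetch"
TopicKey = Tuple[str, str]


def topic_failure_ratios(failed: Dict[TopicKey, float],
                         total: Dict[TopicKey, float]) -> Dict[TopicKey, float]:
    """Failure ratio per (topic, type) from counter increases over the same window."""
    ratios: Dict[TopicKey, float] = {}
    for key, requests in total.items():
        # Skip series with no traffic in the window to avoid dividing by zero.
        if requests > 0:
            ratios[key] = failed.get(key, 0.0) / requests
    return ratios
```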

### Kafka\_request\_count\_total

The total number of requests received by the Broker node.

* Type: Counter

* Labels:

  * type: Request Type

  * version: The API version for the request of this type

### Kafka\_request\_error\_count\_total

The total number of requests handled by the Broker node, broken down by error code. Despite its name, this metric also counts successful requests, which carry the error code NONE.

* Type: Counter

* Labels:

  * type: Request Type

  * error: Error code, with NONE indicating a successful request
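
Because successful requests are included with `error="NONE"`, an error ratio has to exclude that series from the numerator but keep it in the denominator. A minimal Python sketch for one request type, assuming the per-error-code counter increases over a window are already collected into a dict; the helper name is hypothetical, and the error codes in the example are standard Kafka error codes used for illustration.

```python
from typing import Dict


def request_error_ratio(increase_by_error: Dict[str, float]) -> float:
    """Error ratio for one request type over a window (hypothetical helper).

    increase_by_error maps the `error` label value to that series' counter
    increase. NONE marks successful requests, so it counts toward the total
    but not toward the failures.
    """
    total = sum(increase_by_error.values())
    if total <= 0:
        return 0.0
    failed = total - increase_by_error.get("NONE", 0.0)
    return failed / total


if __name__ == "__main__":
    print(request_error_ratio({"NONE": 90, "NOT_LEADER_OR_FOLLOWER": 8, "REQUEST_TIMED_OUT": 2}))  # 0.1
```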

### Kafka\_request\_size\_bytes\_total

The total size of requests received by the Broker node.

* Type: Counter

* Labels:

  * type: Request Type

### Kafka\_request\_size\_50p(99p/mean/max)\_bytes

The size of requests received by the Broker node, reported as the 50th percentile (50p), 99th percentile (99p), mean, and maximum.

* Type: Gauge

* Labels:

  * type: Request Type
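
Headings such as this one use a shorthand: `50p(99p/mean/max)` stands for four separate gauges, one per statistic. The Python sketch below expands such a name into individual metric names, assuming each statistic simply replaces the parenthesized group in place; the helper is hypothetical, so verify the resulting names against what your Prometheus server actually scrapes.

```python
import re
from typing import List

# Matches "<prefix>_<first>(<stat>/<stat>/...)_<suffix>", e.g. "..._50p(99p/mean/max)_bytes".
_SHORTHAND = re.compile(r"(.*_)(\w+)\(([\w/]+)\)(_.*)")


def expand_stat_shorthand(name: str) -> List[str]:
    """Expand a shorthand metric name into one name per statistic (hypothetical helper)."""
    match = _SHORTHAND.fullmatch(name)
    if match is None:
        return [name]  # not a shorthand name; return it unchanged
    prefix, first, others, suffix = match.groups()
    return [f"{prefix}{stat}{suffix}" for stat in [first, *others.split("/")]]
```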

### Kafka\_request\_time\_milliseconds\_total

The total time taken by Broker nodes to process requests.

* Type: Counter

* Labels:

  * type: Request Type
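
Dividing the increase of this counter by the increase of `Kafka_request_count_total` over the same window gives the mean processing time per request type. Since the count metric also carries a `version` label, it has to be summed over versions first. A minimal Python sketch, assuming both counters use the same `type` label values; the helper name and the example values (`Produce`, `Fetch`, version numbers) are illustrative.

```python
from typing import Dict, Tuple


def mean_request_time_ms(time_ms_increase: Dict[str, float],
                         count_increase: Dict[Tuple[str, str], float]) -> Dict[str, float]:
    """Mean processing time per request type over one window (hypothetical helper).

    time_ms_increase: increase of the total-time counter, keyed by `type`.
    count_increase: increase of the request-count counter, keyed by (`type`, `version`).
    Versions are summed so that both inputs are keyed by `type` alone.
    """
    count_by_type: Dict[str, float] = {}
    for (req_type, _version), count in count_increase.items():
        count_by_type[req_type] = count_by_type.get(req_type, 0.0) + count
    return {
        req_type: time_ms_increase[req_type] / count
        for req_type, count in count_by_type.items()
        if count > 0 and req_type in time_ms_increase
    }
```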

### Kafka\_request\_time\_50p(99p/mean/max)\_milliseconds

The time the Broker node takes to process requests, reported as the 50th percentile, 99th percentile, mean, and maximum.

* Type: Gauge

* Labels:

  * type: Request Type

### Kafka\_request\_queue\_time\_milliseconds\_total

The total time requests spend in the Broker node's request queue. This time increases when Kafka IO threads are busy.

* Type: Counter

* Labels:

  * type: Request Type

### Kafka\_request\_queue\_time\_50p(99p/mean/max)\_milliseconds

The time requests spend in the Broker node's request queue, reported as the 50th percentile, 99th percentile, mean, and maximum.

* Type: Gauge

* Labels:

  * type: Request Type

### Kafka\_response\_queue\_time\_milliseconds\_total

The total time responses spend in the Broker node's response queue. This time increases when Kafka network threads are busy.

* Type: Counter

* Labels:

  * type: Request Type

### Kafka\_response\_queue\_time\_50p(99p/mean/max)\_milliseconds

The time responses spend in the Broker node's response queue, reported as the 50th percentile, 99th percentile, mean, and maximum.

* Type: Gauge

* Labels:

  * type: Request Type

### Kafka\_request\_queue\_size

The request queue size for the broker node.

* Type: Gauge

### Kafka\_response\_queue\_size

The response queue size for the broker node.

* Type: Gauge

### Kafka\_purgatory\_size

The number of requests waiting in the produce or fetch purgatory on the Broker node.

* Type: Gauge

* Labels:

  * type:

    * Produce

    * Fetch

### Kafka\_partition\_count

The number of partitions currently assigned to the broker node.

* Type: Gauge

### Kafka\_logs\_flush\_time\_50p(99p/mean/max)\_milliseconds

The log flush time of the Broker node. In AutoMQ, this is the flush time of the Delta WAL, reported as the 50th percentile, 99th percentile, mean, and maximum.

* Type: Gauge

### Kafka\_log\_end\_offset

The maximum logical offset for each partition on the broker node.

* Type: Gauge

* Labels:

  * topic

  * partition

### Kafka\_log\_size

The total size of messages in each partition on the Broker node.

* Type: Gauge

* Labels:

  * topic

  * partition

### Kafka\_group\_commit\_offset

The committed offset of each Consumer Group on each partition. This metric is reported by the Broker that hosts the Group Coordinator for the Consumer Group.

* Type: Gauge

* Labels:

  * consumer\_group

  * topic

  * partition
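
A common use of this metric is computing consumer lag: for each partition, subtract the group's committed offset from `Kafka_log_end_offset`. Because the two metrics may be reported by different Brokers (the Group Coordinator's Broker and the partition's Broker), they have to be joined on `topic` and `partition`. A minimal Python sketch for a single consumer group; the helper name is hypothetical.

```python
from typing import Dict, Tuple

# Hypothetical key: (topic, partition) identifies one partition series.
TopicPartition = Tuple[str, int]


def consumer_lag(log_end_offsets: Dict[TopicPartition, int],
                 committed_offsets: Dict[TopicPartition, int]) -> Dict[TopicPartition, int]:
    """Per-partition lag of one consumer group: log end offset minus committed offset."""
    lag: Dict[TopicPartition, int] = {}
    for tp, committed in committed_offsets.items():
        end = log_end_offsets.get(tp)
        if end is None:
            continue  # no log end offset sampled for this partition
        # Clamp at zero: the two gauges are scraped independently and may be briefly out of step.
        lag[tp] = max(end - committed, 0)
    return lag
```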

### Kafka\_group\_count

The number of Consumer Groups whose Group Coordinator is located on the Broker node.

* Type: Gauge

### Kafka\_group\_preparing\_rebalance\_count

The number of Consumer Groups that are preparing to rebalance.

* Type: Gauge

### Kafka\_group\_completing\_rebalance\_count

Number of Consumer Groups waiting for the partition assignment from the group leader.

* Type: Gauge

### Kafka\_group\_stable\_count

Number of Consumer Groups in a Stable state.

* Type: Gauge

### Kafka\_group\_empty\_count

Number of Consumer Groups with no members but not yet expired.

* Type: Gauge

### Kafka\_group\_dead\_count

Number of Consumer Groups with no members and metadata already removed.

* Type: Gauge

### Kafka\_stream\_upload\_size\_bytes\_total

Total size of data uploaded by Broker nodes to object storage.

* Type: Counter

### Kafka\_stream\_download\_size\_bytes\_total

The total size of data downloaded from object storage by the Broker node.

* Type: Counter

### Kafka\_stream\_network\_inbound\_usage\_bytes\_total

The total inbound network traffic of the Broker node, including received messages and data downloaded from object storage. Differentiating this counter over time gives the inbound throughput.

* Type: Counter

### Kafka\_stream\_network\_outbound\_usage\_bytes\_total

The total outbound network traffic of the Broker node, including consumed messages and data uploaded to object storage. Differentiating this counter over time gives the outbound throughput.

* Type: Counter
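
Since the inbound and outbound usage counters include object storage traffic, comparing them with `Kafka_stream_download_size_bytes_total` and `Kafka_stream_upload_size_bytes_total` shows how much of a Broker's bandwidth goes to object storage rather than clients. A minimal Python sketch, assuming the download and upload counters measure the same bytes that the usage counters include; the helper names are hypothetical.

```python
from typing import Tuple


def _share(part: float, whole: float) -> float:
    # Guard against empty windows, and clamp because the counters are sampled separately.
    if whole <= 0:
        return 0.0
    return min(max(part / whole, 0.0), 1.0)


def object_storage_shares(download_delta: float, inbound_delta: float,
                          upload_delta: float, outbound_delta: float) -> Tuple[float, float]:
    """Return (inbound share from downloads, outbound share from uploads) for one window."""
    return _share(download_delta, inbound_delta), _share(upload_delta, outbound_delta)
```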

### Kafka\_stream\_network\_inbound\_available\_bandwidth\_bytes

The inbound bandwidth available to cold reads and Compaction on the Broker node. When this value is lower than the inbound traffic that cold reads and Compaction require, those requests wait in the rate-limiting queue so that normal produce and consume traffic is not affected. This metric reflects only the instantaneous value at sampling time; because of the sampling interval and the way rate limiting is implemented, use it for reference only.

* Type: Gauge

### Kafka\_stream\_network\_outbound\_available\_bandwidth\_bytes

The outbound bandwidth available to cold reads and Compaction on the Broker node. When this value is lower than the outbound traffic that cold reads and Compaction require, those requests wait in the rate-limiting queue so that normal produce and consume traffic is not affected. This metric reflects only the instantaneous value at sampling time; because of the sampling interval and the way rate limiting is implemented, use it for reference only.

* Type: Gauge

### Kafka\_stream\_network\_inbound\_limiter\_queue\_time\_50p(99p/mean/max)\_nanoseconds

The time that inbound cold-read and Compaction requests spend waiting in the rate-limiting queue, reported as the 50th percentile, 99th percentile, mean, and maximum.

* Type: Gauge

### Kafka\_stream\_network\_outbound\_limiter\_queue\_time\_50p(99p/mean/max)\_nanoseconds

The time that outbound cold-read and Compaction requests spend waiting in the rate-limiting queue, reported as the 50th percentile, 99th percentile, mean, and maximum.

* Type: Gauge

### Kafka\_stream\_operation\_latency\_50p(99p/mean/max)\_nanoseconds

The latency of each operation stage in the AutoMQ S3Stream module, reported as the 50th percentile, 99th percentile, mean, and maximum.

* Type: Gauge

* Labels:

  * operation\_type

  * operation\_name

### Kafka\_stream\_cert\_expiry\_timestamp\_milliseconds

The expiration time of the TLS certificate, as a UNIX timestamp in milliseconds.

* Type: Gauge

* Labels:

  * instance: Instance ID.

  * job: Job name.

  * host\_name: System hostname.

  * cert\_subject: Certificate subject.

  * cert\_type: Certificate type, where `server_cert` represents a server certificate, and `truststore_cert` represents a CA certificate.
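
Because the value is a UNIX timestamp in milliseconds, the time left before expiry is a subtraction followed by a unit conversion. The Python sketch below (hypothetical helper) converts it to fractional days; `Kafka_stream_cert_days_remaining` may round differently, so expect small discrepancies.

```python
import time
from typing import Optional

MS_PER_DAY = 24 * 60 * 60 * 1000


def days_until_expiry(expiry_ms: int, now_ms: Optional[int] = None) -> float:
    """Days between now and a certificate expiry given as a UNIX timestamp in milliseconds.

    Hypothetical helper. Negative values mean the certificate has already expired.
    """
    if now_ms is None:
        now_ms = int(time.time() * 1000)
    return (expiry_ms - now_ms) / MS_PER_DAY
```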

### Kafka\_stream\_cert\_days\_remaining

The number of days remaining until the TLS certificate expires, counted from the current time.

* Type: Gauge

* Labels:

  * instance: Instance ID.

  * job: Job name.

  * host\_name: System hostname.

  * cert\_subject: Certificate subject.

  * cert\_type: Certificate type, where `server_cert` represents a server certificate, and `truststore_cert` represents a CA certificate.
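
To find certificates that are close to expiry, filter the series by a threshold. The Python sketch below assumes the values have already been collected into a dict keyed by `(host_name, cert_subject, cert_type)`; the helper name is hypothetical, and the 30-day default is an illustrative choice rather than an AutoMQ recommendation.

```python
from typing import Dict, List, Tuple

# Hypothetical key: (host_name, cert_subject, cert_type) identifies one certificate series.
CertKey = Tuple[str, str, str]


def expiring_soon(days_remaining: Dict[CertKey, float],
                  threshold_days: float = 30) -> List[CertKey]:
    """Return certificate series whose remaining lifetime is below threshold_days, sorted."""
    return sorted(key for key, days in days_remaining.items() if days < threshold_days)
```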
