Metrics

In this article, the term AutoMQ Kafka specifically refers to the open-source automq-for-kafka project by AutoMQ HK Limited, available through the AutoMQ organization on GitHub.

This article describes the monitoring metrics that AutoMQ for Kafka exposes, shown in Prometheus format.

General Metrics

Kafka_server_connection_count

The current number of connections established by the node.

  • Type: Gauge

Kafka_network_threads_idle_rate

The idle rate of the Kafka SocketServer network threads, in the range [0, 1.0].

  • Type: Gauge

Kafka_io_threads_idle_time_nanoseconds_total

The cumulative idle time of the Kafka request handler threads, in nanoseconds. This metric is the cumulative form of the Apache Kafka native metric RequestHandlerAvgIdlePercent. Taking its rate of change over time (idle nanoseconds per nanosecond of elapsed time) yields the thread idle rate. Note that on a combined node (acting as both Controller and Broker), the Controller and the Broker each have their own request handlers, so this metric is the sum of both and the derived idle rate can reach 2.0.

  • Type: Counter
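The derivation of the idle rate from this counter can be sketched as follows. This is a minimal illustration, assuming two samples of the counter taken a known number of seconds apart; the function name and the sample values are hypothetical and not part of AutoMQ.

```python
NANOS_PER_SECOND = 1_000_000_000


def idle_rate(prev_idle_ns: int, curr_idle_ns: int, interval_seconds: float) -> float:
    """Return idle nanoseconds accumulated per nanosecond of elapsed time."""
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    delta_ns = curr_idle_ns - prev_idle_ns
    if delta_ns < 0:
        # The counter went backwards (for example after a node restart):
        # treat the current value as the increase since the reset.
        delta_ns = curr_idle_ns
    return delta_ns / (interval_seconds * NANOS_PER_SECOND)


# Hypothetical samples taken 15 seconds apart.
rate = idle_rate(120_000_000_000, 129_000_000_000, 15.0)
print(rate)
```

On a combined node the result may exceed 1.0, because the counter sums the Controller and Broker request handlers.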

Controller Metrics

Kafka_controller_active_count

Indicates whether the current Controller node is the active Controller. A metric value of 1 means it is active, while 0 means it is not active.

  • Type: Gauge

Kafka_broker_active_count

The number of active Brokers in the current cluster.

  • Type: Gauge

Kafka_broker_fenced_count

The number of fenced Brokers in the current cluster.

  • Type: Gauge

Kafka_topic_count

The total number of Topics in the current cluster.

  • Type: Gauge

Kafka_partition_total_count

The total number of partitions in the current cluster.

  • Type: Gauge

Kafka_partition_offline_count

The total number of offline partitions in the current cluster.

  • Type: Gauge

Kafka_stream_auto_balancer_metrics_time_delay_milliseconds

The delay in reporting AutoBalancer monitoring metrics by each Broker node in the cluster. If the delay exceeds a certain threshold, the Broker node will be considered out-of-sync by AutoBalancer and will no longer participate in AutoBalancer's partition reassignment.

  • Type: Gauge

  • Labels:

    • node_id: The ID of the node reporting AutoBalancer monitoring metrics.
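The out-of-sync rule described for this metric can be illustrated with a short sketch. The threshold value and the function name are hypothetical; this article does not state the threshold that AutoBalancer actually uses.

```python
def out_of_sync_brokers(delay_ms_by_node: dict[int, float], threshold_ms: float) -> list[int]:
    """Return the node IDs whose metrics reporting delay exceeds the threshold."""
    return sorted(
        node_id
        for node_id, delay_ms in delay_ms_by_node.items()
        if delay_ms > threshold_ms
    )


# Hypothetical delays in milliseconds, keyed by the node_id label.
delays = {0: 850.0, 1: 42_000.0, 2: 1_200.0}
stale = out_of_sync_brokers(delays, threshold_ms=30_000.0)  # assumed threshold
print(stale)
```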

Kafka_stream_s3_object_count

The total number of Objects uploaded to Object storage by the current cluster, categorized by Object state.

  • Type: Gauge

  • Labels:

    • state: Object states are categorized into three types:

      • prepared: Objects that have not yet been fully written and committed

      • committed: Objects that have been fully written and committed

      • mark_destroyed: Objects marked for deletion, which will be removed from object storage after a certain delay

Kafka_stream_s3_object_size_bytes

The total size of objects uploaded to object storage by the current cluster.

  • Type: Gauge

Kafka_stream_stream_object_num

The number of StreamObjects uploaded to object storage by the current cluster.

  • Type: Gauge

Kafka_stream_stream_set_object_num

The number of StreamSetObjects uploaded to object storage by each broker in the current cluster.

  • Type: Gauge

  • Labels:

    • node_id: Corresponding broker node ID

Broker Metrics

Kafka_message_count_total

The total number of messages received by the Broker node. Its rate of change over time gives the message throughput.

  • Type: Counter

  • Labels:

    • topic

Kafka_network_io_bytes_total

The total size, in bytes, of messages received and sent by the Broker node. Its rate of change over time gives the byte throughput.

  • Type: Counter

  • Labels:

    • topic

    • partition

    • direction:

      • in: indicates receiving messages

      • out: indicates sending messages
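As a rough sketch of how throughput can be read from this counter, the snippet below sums the per-series increase between two scrapes and groups it by the direction label. The data layout (a dict keyed by topic, partition, and direction), the function name, and the sample values are assumptions for illustration only.

```python
from collections import defaultdict

SeriesKey = tuple[str, int, str]  # (topic, partition, direction)


def throughput_by_direction(
    prev: dict[SeriesKey, float],
    curr: dict[SeriesKey, float],
    interval_seconds: float,
) -> dict[str, float]:
    """Return bytes per second for each direction between two scrapes."""
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    totals: dict[str, float] = defaultdict(float)
    for key, value in curr.items():
        # A series missing from the previous scrape is assumed to start at 0.
        increase = value - prev.get(key, 0.0)
        if increase < 0:
            # Counter reset: count only what accumulated since the reset.
            increase = value
        totals[key[2]] += increase / interval_seconds
    return dict(totals)


# Hypothetical scrapes taken 10 seconds apart.
before = {("orders", 0, "in"): 1000.0, ("orders", 1, "in"): 2000.0, ("orders", 0, "out"): 500.0}
after = {("orders", 0, "in"): 4000.0, ("orders", 1, "in"): 5000.0, ("orders", 0, "out"): 2000.0}
print(throughput_by_direction(before, after, 10.0))
```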

Kafka_topic_request_count_total

The total number of requests received by each Topic on the Broker node, including only produce and fetch types of requests.

  • Type: Counter

  • Labels:

    • topic

    • type: request type

      • produce

      • fetch

Kafka_topic_request_failed_total

The total number of failed requests for each Topic on the Broker node, including only produce and fetch types of requests.

  • Type: Counter

  • Labels:

    • topic

    • type: Request type

      • produce

      • fetch

Kafka_request_count_total

Total number of requests received by the Broker node.

  • Type: Counter

  • Labels:

    • type: Request type

    • version: API version of the request type

Kafka_request_error_count_total

Total number of requests on the Broker node, broken down by error code. Despite the metric name, successful requests are also counted here, with an error code of NONE.

  • Type: Counter

  • Labels:

    • type: Request type

    • error: Error code, where NONE indicates the request was successful
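Because successful requests are counted under the NONE error code, a failure ratio has to exclude that series explicitly. The sketch below shows one way to do this, assuming the per-error counts (ideally increases over a time window rather than raw totals) have already been collected into a dict; the function name and the sample numbers are hypothetical.

```python
def failed_request_ratio(counts_by_error: dict[str, float]) -> float:
    """Return the share of requests whose error label is not NONE."""
    total = sum(counts_by_error.values())
    if total == 0:
        return 0.0
    failed = sum(count for error, count in counts_by_error.items() if error != "NONE")
    return failed / total


# Hypothetical counts keyed by the error label.
sample = {"NONE": 980.0, "NOT_LEADER_OR_FOLLOWER": 15.0, "REQUEST_TIMED_OUT": 5.0}
print(failed_request_ratio(sample))
```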

Kafka_request_size_bytes_total

Total size of requests received by the Broker node.

  • Type: Counter

  • Labels:

    • type: Request type

Kafka_request_size_50p(99p/mean/max)_bytes

The size of the requests received by the Broker node, reported as the 50th percentile, 99th percentile, mean, and maximum.

  • Type: Gauge

  • Labels:

    • type: Request type

Kafka_request_time_milliseconds_total

The total time taken by the Broker node to process the requests.

  • Type: Counter

  • Labels:

    • type: Request type

Kafka_request_time_50p(99p/mean/max)_milliseconds

The time taken by the Broker node to process the requests, reported as the 50th percentile, 99th percentile, mean, and maximum.

  • Type: Gauge

  • Labels:

    • type: Request type

Kafka_request_queue_time_milliseconds_total

The total request queue time on the Broker node, which increases when Kafka IO threads are busy.

  • Type: Counter

  • Labels:

    • type: Request type

Kafka_request_queue_time_50p(99p/mean/max)_milliseconds

The request queue time on the Broker node, reported as the 50th percentile, 99th percentile, mean, and maximum.

  • Type: Gauge

  • Labels:

    • type: Request type

Kafka_response_queue_time_milliseconds_total

The total response queue time on the Broker node, which increases when Kafka network threads are busy.

  • Type: Counter

  • Labels:

    • type: Request type

Kafka_response_queue_time_50p(99p/mean/max)_milliseconds

The response queue time on the Broker node, reported as the 50th percentile, 99th percentile, mean, and maximum.

  • Type: Gauge

  • Labels:

    • type: Request type

Kafka_request_queue_size

Broker node request queue size.

  • Type: Gauge

Kafka_response_queue_size

The size of the response queue for Broker nodes.

  • Type: Gauge

Kafka_purgatory_size

The number of requests in the produce or fetch purgatory on the Broker node.

  • Type: Gauge

  • Labels:

    • type:

      • Produce

      • Fetch

Kafka_partition_count

The number of partitions currently assigned to the Broker node.

  • Type: Gauge

Kafka_logs_flush_time_50p(99p/mean/max)_milliseconds

The log flush time on the Broker node, which in AutoMQ for Kafka represents the flush time of Delta WAL, expressed in different percentiles.

  • Type: Gauge

Kafka_log_end_offset

The maximum logical offset of each partition on the Broker node.

  • Type: Gauge

  • Labels:

    • topic

    • partition

Kafka_log_size

The total size of the messages in each partition on the Broker node.

  • Type: Gauge

  • Labels:

    • topic

    • partition

Kafka_group_commit_offset

The consumption offsets of each Consumer Group on the corresponding partitions. Note that this metric is reported by the Broker where the Group Coordinator of each Consumer Group resides.

  • Type: Gauge

  • Labels:

    • consumer_group

    • topic

    • partition
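One possible use of this metric, which this article does not itself describe, is to estimate consumer lag by subtracting the committed offset from the log end offset of the same partition. The sketch below assumes both metrics have already been collected into dicts; since the two metrics can be reported by different Brokers (partition leader versus Group Coordinator), the join is done on topic and partition only. All names and values are hypothetical.

```python
PartitionKey = tuple[str, int]    # (topic, partition)
GroupKey = tuple[str, str, int]   # (consumer_group, topic, partition)


def consumer_lag(
    log_end_offsets: dict[PartitionKey, int],
    committed_offsets: dict[GroupKey, int],
) -> dict[GroupKey, int]:
    """Return end offset minus committed offset for each group and partition."""
    lag: dict[GroupKey, int] = {}
    for (group, topic, partition), committed in committed_offsets.items():
        end = log_end_offsets.get((topic, partition))
        if end is None:
            # No end-offset sample for this partition; skip rather than guess.
            continue
        # Samples come from different scrapes, so clamp small negatives to 0.
        lag[(group, topic, partition)] = max(end - committed, 0)
    return lag


# Hypothetical samples.
ends = {("orders", 0): 1500, ("orders", 1): 900}
commits = {("billing", "orders", 0): 1400, ("billing", "orders", 1): 900}
print(consumer_lag(ends, commits))
```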

Kafka_group_count

The number of Consumer Groups managed by the Group Coordinator on the Broker node.

  • Type: Gauge

Kafka_group_preparing_rebalance_count

Number of Consumer Groups currently preparing for rebalance.

  • Type: Gauge

Kafka_group_completing_rebalance_count

Number of Consumer Groups waiting for state assignment from the group leader.

  • Type: Gauge

Kafka_group_stable_count

Number of Consumer Groups in Stable state.

  • Type: Gauge

Kafka_group_empty_count

Number of Consumer Groups with no members but not yet expired.

  • Type: Gauge

Kafka_group_dead_count

Number of Consumer Groups with no members and metadata removed.

  • Type: Gauge

Kafka_stream_upload_size_bytes_total

Total data size uploaded to object storage by broker nodes.

  • Type: Counter

Kafka_stream_download_size_bytes_total

Total data size downloaded from object storage by broker nodes.

  • Type: Counter

Kafka_stream_network_inbound_usage_bytes_total

Total inbound bandwidth usage of the Broker node, including received messages and data downloaded from object storage. Its rate of change over time gives the inbound throughput.

  • Type: Counter

  • Labels:

    • type:

      • bypass: Indicates the inbound bandwidth usage that is not throttled, equivalent to the message ingress traffic of the Broker node.

      • catchup: Refers to the inbound traffic generated by cold reads, i.e., the ingress traffic from reading data from S3 due to cache misses or prefetch strategies.

      • compaction: Indicates the inbound traffic generated by Stream Set Object Compaction, i.e., the ingress traffic from reading data from S3 during compaction.

Kafka_stream_network_outbound_usage_bytes_total

Total outbound bandwidth usage of the Broker node, including consumed messages and data uploaded to object storage. Its rate of change over time gives the outbound throughput.

  • Type: Counter

  • Labels:

    • type:

      • bypass: Indicates the outbound bandwidth usage that is not throttled, such as the message egress traffic of the Broker node when consuming hot data, or the outbound traffic when the Broker uploads Delta WAL to S3.

      • catchup: Refers to the outbound traffic generated by cold reads, equivalent to the egress traffic of the Broker node when consuming cold data.

      • compaction: Indicates the outbound traffic generated by Stream Set Object Compaction, i.e., the outbound traffic when uploading data to S3 during compaction.

Kafka_stream_network_inbound_available_bandwidth_bytes

The inbound traffic throughput reserved by the Broker node for cold reads and compaction. When this value is less than the inbound traffic demand for cold reads and compaction, the corresponding requests will be placed in the throttling queue, and the normal message send/receive traffic will not be affected by this throttling. Note that this metric represents only an instantaneous value at the time of sampling and should be used for reference only due to the limitations of sampling intervals and the specific implementation of the throttling strategy.

  • Type: Gauge

Kafka_stream_network_outbound_available_bandwidth_bytes

The outbound traffic throughput reserved for cold reads and compaction on the Broker nodes. When this value is less than the outbound traffic demand for cold reads and compaction, the corresponding requests will be placed in a throttling queue. Normal message traffic is not affected by this throttling. Note that this metric only represents an instantaneous value at the time of sampling and, due to sampling intervals and throttling strategy implementations, this metric is for reference only.

  • Type: Gauge
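The throttling rule described for the two available-bandwidth metrics can be pictured with a toy model. This only illustrates the stated rule (cold-read and compaction demand above the reserved amount waits in a queue); it is not AutoMQ's actual limiter, and the class and method names are hypothetical.

```python
from collections import deque


class ReservedBandwidth:
    """Toy model of a reserved budget with a FIFO throttling queue."""

    def __init__(self, available_bytes: int) -> None:
        self.available_bytes = available_bytes
        self.queue: deque[int] = deque()

    def request(self, size_bytes: int) -> bool:
        """Admit at once if nothing is waiting and the budget suffices; otherwise queue."""
        if not self.queue and size_bytes <= self.available_bytes:
            self.available_bytes -= size_bytes
            return True
        self.queue.append(size_bytes)
        return False

    def refill(self, size_bytes: int) -> int:
        """Add budget, admit queued requests in FIFO order, and return how many were admitted."""
        self.available_bytes += size_bytes
        admitted = 0
        while self.queue and self.queue[0] <= self.available_bytes:
            self.available_bytes -= self.queue.popleft()
            admitted += 1
        return admitted


limiter = ReservedBandwidth(available_bytes=100)
print(limiter.request(60))   # fits within the budget
print(limiter.request(70))   # exceeds what is left, so it is queued
print(limiter.refill(50))    # budget returns and the queued request is admitted
```

In this model, the value left in available_bytes at any moment plays the role of the sampled gauge, which is why a single sample says little about sustained demand.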

Kafka_stream_network_inbound_limiter_queue_time_50p(99p/max/sum)_nanoseconds

The queuing time in the throttling queue when inbound traffic requests for cold reads and compaction are being processed.

  • Type: Gauge

Kafka_stream_network_outbound_limiter_queue_time_50p(99p/max/sum)_nanoseconds

The queuing time in the throttling queue when outbound traffic requests for cold reads and compaction are being processed.

  • Type: Gauge

Kafka_stream_operation_latency_50p(99p/max/sum)_nanoseconds

The operation latency at each stage of the AutoMQ for Kafka S3Stream module.

  • Type: Gauge

  • Labels:

    • operation_type

    • operation_name