Metrics

In this article, the term AutoMQ Kafka refers specifically to the open-source automq-for-kafka project, hosted by AutoMQ CO. under the AutoMQ organization on GitHub.

This article describes the monitoring metrics that AutoMQ for Kafka provides. The metrics are exposed in Prometheus format.
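Because the metrics are exposed in Prometheus format, a scrape response can be read as plain text. The following is a minimal sketch of a parser for simple sample lines. The function name `parse_samples` is hypothetical, the parser ignores timestamps and does not unescape label values, and the casing of the metric names exposed by a real endpoint may differ from the spelling used in this article.

```python
import re

# Matches a sample line such as: metric_name{label="value"} 12.5
_SAMPLE = re.compile(r'^([A-Za-z_:][A-Za-z0-9_:]*)(?:\{(.*)\})?\s+(\S+)')
# Matches one label pair inside the braces: key="value"
_LABEL = re.compile(r'([A-Za-z_][A-Za-z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_samples(text):
    """Return a list of (name, labels, value) tuples from exposition text.

    Comment lines (# HELP, # TYPE) and blank lines are skipped.
    Label values are returned as written, without unescaping.
    """
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        match = _SAMPLE.match(line)
        if match is None:
            continue  # not a line this simple sketch understands
        name, raw_labels, value = match.groups()
        labels = dict(_LABEL.findall(raw_labels or ""))
        samples.append((name, labels, float(value)))
    return samples
```

For production use, an established Prometheus client library is likely a better choice than a hand-written parser.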

General Metrics

Kafka_server_connection_count

Current number of established connections per node.

  • Type: Gauge

Kafka_network_threads_idle_rate

Idle rate of Kafka SocketServer network threads, in the range [0, 1.0].

  • Type: Gauge

Kafka_io_threads_idle_time_nanoseconds_total

Cumulative idle time of the Kafka request handler (IO) threads, measured in nanoseconds. This metric is the cumulative form of the native Apache Kafka® metric RequestHandlerAvgIdlePercent. Dividing the increase in this counter by the elapsed time (in nanoseconds) gives the idle rate of the threads. Note that when a node runs as a combined node (serving as both Controller and Broker), the metric combines the values from the Controller and the Broker, so the derived idle rate can be as high as 2.0.

  • Type: Counter
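As a rough illustration of the calculation described above, the sketch below derives the idle rate from two samples of the counter. The function name and the choice of passing scrape timestamps in seconds are assumptions made for this example. Treating a decrease as a counter reset is a common convention, not something this article specifies.

```python
def io_thread_idle_rate(idle_ns_prev, idle_ns_curr, t_prev_s, t_curr_s):
    """Idle rate between two samples of the cumulative idle-time counter.

    idle_ns_prev, idle_ns_curr: counter values in nanoseconds.
    t_prev_s, t_curr_s: sample timestamps in seconds.
    """
    elapsed_ns = (t_curr_s - t_prev_s) * 1e9
    if elapsed_ns <= 0:
        raise ValueError("the second sample must be later than the first")
    delta_ns = idle_ns_curr - idle_ns_prev
    if delta_ns < 0:
        # The counter went backwards, for example after a restart;
        # count only what has accumulated since the reset.
        delta_ns = idle_ns_curr
    return delta_ns / elapsed_ns
```

On a Broker-only node the result is expected to stay within [0, 1.0]; on a combined node it can reach 2.0, as noted above.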

Controller Metrics

Kafka_controller_active_count

Indicates whether the current Controller node is the active Controller: a value of 1 means active, and 0 means not active.

  • Type: Gauge

Kafka_broker_active_count

Current active broker count in the cluster.

  • Type: Gauge

Kafka_broker_fenced_count

Number of brokers in the cluster that are fenced.

  • Type: Gauge

Kafka_topic_count

Total number of topics in the cluster.

  • Type: Gauge

Kafka_partition_total_count

Total number of partitions in the cluster.

  • Type: Gauge

Kafka_partition_offline_count

Total number of leaderless partitions in the cluster.

  • Type: Gauge

Kafka_stream_auto_balancer_metrics_time_delay_milliseconds

The latency of each broker node in reporting AutoBalancer monitoring metrics; if the latency exceeds a specific threshold, the broker node is deemed out-of-sync and is excluded from the AutoBalancer's partition reassignments.

  • Type: Gauge

  • Labels:

    • node_id: Node ID reporting the AutoBalancer monitoring metrics
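To make the out-of-sync rule concrete, here is a small sketch that flags nodes whose reporting delay exceeds a threshold. The function name and the threshold value are hypothetical; the actual threshold used by the AutoBalancer is not stated in this article.

```python
def out_of_sync_nodes(delay_ms_by_node, threshold_ms):
    """Return the node IDs whose reporting delay exceeds threshold_ms.

    delay_ms_by_node: mapping of node_id -> latest value of the
    time-delay gauge, in milliseconds.
    threshold_ms: hypothetical cut-off chosen by the caller.
    """
    return sorted(
        node_id
        for node_id, delay_ms in delay_ms_by_node.items()
        if delay_ms > threshold_ms
    )
```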

Kafka_stream_s3_object_count

Total number of objects uploaded to object storage in the current cluster, categorized by the status of the objects.

  • Type: Gauge

  • Labels:

    • state: Object status, categorized into three types:

      • prepared: Objects that are still being written and have not been committed

      • committed: Objects that have finished writing and have been committed

      • mark_destroyed: Objects designated for deletion, to be removed from object storage after a certain delay

Kafka_stream_s3_object_size_bytes

Total size of Objects uploaded to object storage by the current cluster.

  • Type: Gauge

Kafka_stream_stream_object_num

Number of StreamObjects uploaded to object storage by the current cluster.

  • Type: Gauge

Kafka_stream_stream_set_object_num

Number of StreamSetObjects uploaded to object storage by each Broker in the current cluster.

  • Type: Gauge

  • Labels:

    • node_id: The corresponding Broker node id

Broker Metrics

Kafka_message_count_total

The total number of messages received by Broker nodes. Take the rate of change of this counter over time to obtain the message throughput.

  • Type: Counter

  • Labels:

    • topic

Kafka_network_io_bytes_total

The total size, in bytes, of messages received and sent by Broker nodes. Take the rate of change of this counter over time to obtain the byte throughput.

  • Type: Counter

  • Labels:

    • topic

    • partition

    • direction:

      • in: indicates messages received

      • out: indicates messages sent
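The sketch below shows one way to turn two scrapes of this counter into per-direction throughput. The function name and the dictionary layout are assumptions for illustration. Series that are missing from the earlier scrape are skipped so that a newly created partition does not produce an artificial spike.

```python
from collections import defaultdict


def throughput_by_direction(prev, curr, interval_s):
    """Bytes per second for each direction between two scrapes.

    prev, curr: mapping of (topic, partition, direction) -> counter
    value in bytes. interval_s: seconds between the two scrapes.
    """
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    totals = defaultdict(float)
    for key, value in curr.items():
        if key not in prev:
            continue  # new series: no earlier sample to compare with
        delta = value - prev[key]
        if delta < 0:
            delta = value  # counter reset, for example after a restart
        totals[key[2]] += delta
    return {direction: total / interval_s for direction, total in totals.items()}
```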

Kafka_topic_request_count_total

The total number of requests received by each Topic on Broker nodes, specifically including only produce and fetch request types.

  • Type: Counter

  • Labels:

    • topic

    • type: request type

      • produce

      • fetch

Kafka_topic_request_failed_total

The total number of failed requests for each Topic on Broker nodes, specifically including only produce and fetch request types.

  • Type: Counter

  • Labels:

    • topic

    • type: Request Type

      • produce

      • fetch

Kafka_request_count_total

Total number of requests received by Broker nodes.

  • Type: Counter

  • Labels:

    • type: Request Type

    • version: API version of the request type

Kafka_request_error_count_total

Total number of requests on Broker nodes, broken down by error code. Note that successful requests are also counted in this metric, with an error code of NONE.

  • Type: Counter

  • Labels:

    • type: Request Type

    • error: error code, NONE indicates the request was successful
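Because successful requests are included with the error code NONE, a failure ratio has to exclude that label value from the numerator. A minimal sketch, assuming the per-error increases over some window have already been computed (the function name and the sample error code in the usage note are illustrative):

```python
def request_error_ratio(increase_by_error):
    """Fraction of requests that failed within a window.

    increase_by_error: mapping of error code -> increase of the counter
    over the window. The code "NONE" marks successful requests.
    """
    total = sum(increase_by_error.values())
    if total == 0:
        return 0.0  # no requests in the window
    failed = sum(
        count for code, count in increase_by_error.items() if code != "NONE"
    )
    return failed / total
```

For example, 990 requests with NONE and 10 with some other error code give a ratio of 0.01.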

Kafka_request_size_bytes_total

Total size of requests received by Broker nodes.

  • Type: Counter

  • Labels:

    • type: Request Type

Kafka_request_size_50p(99p/mean/max)_bytes

Size of requests received by Broker nodes, reported as the 50th percentile, 99th percentile, mean, and maximum.

  • Type: Gauge

  • Labels:

    • type: Request Type

Kafka_request_time_milliseconds_total

Total time spent by Broker nodes in processing requests.

  • Type: Counter

  • Labels:

    • type: Request Type
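One plausible use of this counter is estimating the mean processing time per request over a window by dividing its increase by the increase in Kafka_request_count_total for the same request type. This is a sketch under that assumption. The function name is hypothetical, and because Kafka_request_count_total also carries a version label, its values would need to be summed across versions first.

```python
def mean_request_time_ms(time_ms_prev, time_ms_curr, count_prev, count_curr):
    """Mean processing time per request between two scrapes.

    time_ms_*: values of the request-time counter, in milliseconds.
    count_*: values of the request-count counter for the same request
    type, already summed across API versions.
    Returns None when no requests were observed in the window.
    """
    requests = count_curr - count_prev
    if requests <= 0:
        return None
    return (time_ms_curr - time_ms_prev) / requests
```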

Kafka_request_time_50p(99p/mean/max)_milliseconds

Time spent by Broker nodes in processing requests, reported as the 50th percentile, 99th percentile, mean, and maximum.

  • Type: Gauge

  • Labels:

    • type: Request Type

Kafka_request_queue_time_milliseconds_total

Total queuing time of requests at Broker nodes, which increases when Kafka IO threads are busy.

  • Type: Counter

  • Labels:

    • type: Request Type

Kafka_request_queue_time_50p(99p/mean/max)_milliseconds

Queuing time of requests at Broker nodes, reported as the 50th percentile, 99th percentile, mean, and maximum.

  • Type: Gauge

  • Labels:

    • type: Request Type

Kafka_response_queue_time_milliseconds_total

Total queuing time of responses at Broker nodes, which can increase when the Apache Kafka® network threads are busy.

  • Type: Counter

  • Labels:

    • type: Request Type

Kafka_response_queue_time_50p(99p/mean/max)_milliseconds

Queuing time of responses at Broker nodes, reported as the 50th percentile, 99th percentile, mean, and maximum.

  • Type: Gauge

  • Labels:

    • type: Request Type

Kafka_request_queue_size

Broker node request queue size.

  • Type: Gauge

Kafka_response_queue_size

Size of the response queue for broker nodes.

  • Type: Gauge

Kafka_purgatory_size

Number of requests waiting in the produce or fetch purgatory on broker nodes.

  • Type: Gauge

  • Labels:

    • type:

      • Produce

      • Fetch

Kafka_partition_count

Current count of partitions allocated to broker nodes.

  • Type: Gauge

Kafka_logs_flush_time_50p(99p/mean/max)_milliseconds

Log flush time for broker nodes. In AutoMQ for Kafka, this is the flush time of the Delta WAL, reported as the 50th percentile, 99th percentile, mean, and maximum.

  • Type: Gauge

Kafka_log_end_offset

Maximum logical offset for each partition on broker nodes.

  • Type: Gauge

  • Labels:

    • topic

    • partition

Kafka_log_size

Total size of the messages stored in each partition on broker nodes.

  • Type: Gauge

  • Labels:

    • topic

    • partition

Kafka_group_commit_offset

Committed consumption offset of each Consumer Group on the respective partition. Note that this metric is reported by the broker that acts as the Group Coordinator for the Consumer Group.

  • Type: Gauge

  • Labels:

    • consumer_group

    • topic

    • partition
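A common use of this metric, together with Kafka_log_end_offset, is estimating consumer lag as the log end offset minus the committed offset. The sketch below assumes both metrics have already been collected into dictionaries; the function name and data layout are illustrative. Because the two metrics are reported by different brokers and scraped at slightly different times, the result is an approximation and is clamped at zero.

```python
def consumer_lag(log_end_offsets, committed_offsets):
    """Approximate lag per (consumer_group, topic, partition).

    log_end_offsets: mapping of (topic, partition) -> log end offset.
    committed_offsets: mapping of (consumer_group, topic, partition)
    -> committed offset.
    Partitions without a known log end offset are skipped.
    """
    lag = {}
    for (group, topic, partition), committed in committed_offsets.items():
        end = log_end_offsets.get((topic, partition))
        if end is None:
            continue
        # Clamp at zero: the samples are not taken at the same instant.
        lag[(group, topic, partition)] = max(end - committed, 0)
    return lag
```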

Kafka_group_count

Number of Consumer Groups overseen by each Group Coordinator's broker node.

  • Type: Gauge

Kafka_group_preparing_rebalance_count

Number of Consumer Groups preparing for a rebalance.

  • Type: Gauge

Kafka_group_completing_rebalance_count

Number of Consumer Groups awaiting state assignments from the Leader.

  • Type: Gauge

Kafka_group_stable_count

Number of Consumer Groups in a Stable state.

  • Type: Gauge

Kafka_group_empty_count

Number of Consumer Groups without any members but not yet expired.

  • Type: Gauge

Kafka_group_dead_count

Number of Consumer Groups without any members and with metadata removed.

  • Type: Gauge

Kafka_stream_upload_size_bytes_total

Total size of data uploaded to Object storage by Broker nodes.

  • Type: Counter

Kafka_stream_download_size_bytes_total

Total size of data downloaded from Object storage by Broker nodes.

  • Type: Counter

Kafka_stream_network_inbound_usage_bytes_total

Total inbound bandwidth usage of Broker nodes, including received messages and data downloaded from object storage. Take the rate of change of this counter over time to obtain the inbound throughput.

  • Type: Counter

  • Labels:

    • type:

      • bypass: refers to the inbound bandwidth usage that is not subject to rate limiting, equivalent to the message inflow of a Broker node.

      • catchup: represents the inbound traffic generated by cold reads, that is, due to cache misses or prefetching strategies from S3.

      • compaction: indicates the inbound traffic generated by Stream Set Object Compaction, i.e., data read from S3 during compaction.

Kafka_stream_network_outbound_usage_bytes_total

Total outbound bandwidth usage of Broker nodes, including messages sent to consumers and data uploaded to object storage. Take the rate of change of this counter over time to obtain the outbound throughput.

  • Type: Counter

  • Labels:

    • type:

      • bypass: represents the outbound bandwidth usage that is not subject to rate limiting, such as the Broker node's message outflow when consuming hot data or when uploading Delta WAL to S3.

      • catchup: represents the outbound traffic generated by cold reads, equivalent to the Broker node's outflow when consuming cold data.

      • compaction: indicates the outbound traffic generated by Stream Set Object Compaction, i.e., data uploaded to S3 during compaction.

Kafka_stream_network_inbound_available_bandwidth_bytes

Inbound throughput that the Broker node reserves for cold reads and compaction. When this value is lower than the inbound traffic demanded by cold reads and compaction, the corresponding requests wait in a rate limiting queue. Note that this metric only represents the instantaneous value at the time of sampling. It depends on the sampling interval and on the specific implementation of the rate limiting policy, so it should be used for reference only.

  • Type: Gauge

Kafka_stream_network_outbound_available_bandwidth_bytes

Outbound throughput that the Broker node reserves for cold reads and compaction. When this value is lower than the outbound traffic demanded by cold reads and compaction, the corresponding requests wait in the rate limiting queue, while normal message transmission remains unaffected. Note that this metric only represents the instantaneous value at the time of sampling. It depends on the sampling interval and on the specific implementation of the rate limiting policy, so it should be used for reference only.

  • Type: Gauge

Kafka_stream_network_inbound_limiter_queue_time_50p(99p/max/sum)_nanoseconds

Time that inbound traffic requests from cold reads and compaction spend waiting in the rate limiting queue.

  • Type: Gauge

Kafka_stream_network_outbound_limiter_queue_time_50p(99p/max/sum)_nanoseconds

Time that outbound traffic requests from cold reads and compaction spend waiting in the rate limiting queue.

  • Type: Gauge

Kafka_stream_operation_latency_50p(99p/max/sum)_nanoseconds

Duration of the operations in each phase of the AutoMQ for Kafka S3Stream module.

  • Type: Gauge

  • Labels:

    • operation_type

    • operation_name