Rule Metrics

The following sections describe the alarm rule metrics by component type.

  • Broker
  • Kafka Network
  • Partition
  • Node
  • ZooKeeper
  • Schema Registry
  • Consumer Group
  • Topic
  • Connect
  • CMPS
  • Connector
  • Data Mirroring

Broker Metrics

| Metric | Description |
| --- | --- |
| Number of brokers in a cluster | An alarm is triggered when the number of online brokers meets the metric detail settings |
| Abnormal broker status (not running) | An alarm is triggered when the broker state is anything other than running |
| Abnormal number of active controller | An alarm is triggered when there is no active controller broker |
| Broker disk skewed | Disk usage skew is calculated by comparing the broker with the highest disk usage against the broker with the lowest disk usage. If the skew meets the metric detail settings, disk usage is deemed unbalanced and an alarm is triggered |
| Increasing producer request failures | An alarm is triggered when the producer request failure rate meets the metric detail settings |
| Kafka Broker instance DOWN | An alarm is triggered when all broker instances (servers) are down |
| Produce messages | An alarm is triggered when the number of messages produced meets the metric detail settings |
| Produce bytes | An alarm is triggered when the size (in bytes) of messages produced meets the metric detail settings |
| Consume bytes | An alarm is triggered when the size (in bytes) of messages consumed by the consumer meets the metric detail settings |
| Consumer lag | An alarm is triggered when the consumer lag (the difference between the offset of the latest data written by the producer and the offset of the data last read by the consumer) meets the metric detail settings |
| KRaft controller broker DOWN | An alarm is triggered when the controller broker goes down in a KRaft cluster |
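
As a rough illustration of the "Broker disk skewed" comparison described above, the following sketch computes the gap between the most-used and least-used broker disks. The function names `disk_skew` and `is_skewed`, the use of percentage points, and the "at or above the threshold" rule are assumptions made for illustration; the actual formula and comparison are whatever the metric detail settings define.

```python
def disk_skew(usage_by_broker: dict[int, float]) -> float:
    """Return the gap, in percentage points, between the highest and lowest disk usage."""
    if not usage_by_broker:
        raise ValueError("at least one broker is required")
    values = usage_by_broker.values()
    return max(values) - min(values)


def is_skewed(usage_by_broker: dict[int, float], threshold: float) -> bool:
    """Hypothetical rule: alarm when the gap is at or above the threshold."""
    return disk_skew(usage_by_broker) >= threshold


# Broker id -> disk usage (%); the numbers are made up.
usage = {1: 82.0, 2: 41.0, 3: 55.0}
print(disk_skew(usage))        # 41.0 (broker 1 vs. broker 2)
print(is_skewed(usage, 30.0))  # True
```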

Kafka Network Metrics

| Metric | Description |
| --- | --- |
| Remaining network resource | An alarm is triggered when the idle rate, calculated as the ratio of idle network resources to total network resources, meets the metric detail settings |
| Request latency - Fetch follower | An alarm is triggered when the time it takes for a partition's follower replica to receive a response after sending a replication request meets the metric detail settings |
| Request latency - Fetch consumer | An alarm is triggered when the time it takes for the consumer to receive a response after sending a consumption request meets the metric detail settings |
| Request latency - Produce | An alarm is triggered when the time it takes for the client to receive a response after sending a produce request meets the metric detail settings |
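
The idle rate behind "Remaining network resource" is a simple ratio, sketched below. The function names, the fractional (0 to 1) scale, and the "at or below a floor" alarm rule are assumptions for illustration only; the source does not say how the resources are measured or which comparison the settings apply.

```python
def idle_rate(idle: float, total: float) -> float:
    """Return the idle share of a resource as a fraction between 0 and 1."""
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= idle <= total:
        raise ValueError("idle must be between 0 and total")
    return idle / total


def low_idle_alarm(idle: float, total: float, min_idle_rate: float) -> bool:
    """Hypothetical rule: alarm when the idle rate is at or below the floor."""
    return idle_rate(idle, total) <= min_idle_rate


# Made-up numbers: 2 of 8 units of network capacity are idle.
print(idle_rate(2, 8))            # 0.25
print(low_idle_alarm(2, 8, 0.3))  # True
```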

Partition Metrics

| Metric | Description |
| --- | --- |
| Broker partition skewed | An alarm is triggered when the difference in partition counts between the broker with the most partitions and the broker with the fewest partitions meets the metric detail settings |
| Leader partition skewed | An alarm is triggered when the difference in leader partition counts between the broker with the most leader partitions and the broker with the fewest leader partitions meets the metric detail settings |
| Number of the offline-partitions | An alarm is triggered when the number of offline partitions meets the metric detail settings |
| Number of the partitions in a broker | An alarm is triggered when the total number of partitions within a broker meets the metric detail settings |
| Number of the partitions in a cluster | An alarm is triggered when the total number of partitions in the cluster meets the metric detail settings |
| Number of the under-min-ISR partitions | An alarm is triggered when the number of partitions whose ISR (in-sync replicas: the replicas synchronized with the leader partition) is smaller than the required minimum meets the metric detail settings |
| Number of the under-replicated partitions | An alarm is triggered when the number of under-replicated partitions meets the metric detail settings |
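
To make the difference between the under-min-ISR and under-replicated counts concrete, here is a minimal sketch. The `PartitionState` type and its field names are hypothetical; the sketch assumes a partition is under-min-ISR when its ISR is smaller than the topic's `min.insync.replicas`, and under-replicated when its ISR is smaller than its assigned replica set.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PartitionState:
    topic: str
    partition: int
    replicas: tuple[int, ...]  # broker ids assigned to the partition
    isr: tuple[int, ...]       # broker ids currently in sync with the leader
    min_isr: int               # the topic's min.insync.replicas value


def count_under_min_isr(parts: list[PartitionState]) -> int:
    """Count partitions whose ISR is smaller than the required minimum."""
    return sum(1 for p in parts if len(p.isr) < p.min_isr)


def count_under_replicated(parts: list[PartitionState]) -> int:
    """Count partitions whose ISR is smaller than the assigned replica set."""
    return sum(1 for p in parts if len(p.isr) < len(p.replicas))


# Made-up cluster state for a hypothetical "orders" topic.
parts = [
    PartitionState("orders", 0, (1, 2, 3), (1, 2, 3), 2),  # healthy
    PartitionState("orders", 1, (1, 2, 3), (1, 2), 2),     # under-replicated only
    PartitionState("orders", 2, (1, 2, 3), (3,), 2),       # under-replicated and under-min-ISR
]
print(count_under_min_isr(parts))     # 1
print(count_under_replicated(parts))  # 2
```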

Node Metrics

| Metric | Description |
| --- | --- |
| CPU usage | An alarm is triggered when CPU usage (%) meets the metric detail settings |
| Node disk usage | An alarm is triggered when disk usage (%) on the node meets the metric detail settings |
| Node DOWN | An alarm is triggered when all nodes are down |
| Memory usage | An alarm is triggered when memory usage (%) meets the metric detail settings |
| Node disk usage per mount point | An alarm is triggered when disk usage (%) per mount point on the node meets the metric detail settings |

ZooKeeper Metrics

| Metric | Description |
| --- | --- |
| ZooKeeper connection status warning | An alarm is triggered when the connection between the broker and ZooKeeper is interrupted |
| ZooKeeper instance DOWN | An alarm is triggered when all instances (servers) of the registered ZooKeeper are down |

Schema Registry Metrics

| Metric | Description |
| --- | --- |
| Schema Registry instance DOWN | An alarm is triggered when all instances (servers) of the registered Schema Registry Cluster are down |

Consumer Group Metrics

| Metric | Description |
| --- | --- |
| Consumer group lag | An alarm is triggered when the consumer group's lag (the difference between the offset of the latest data written by the producer and the offset of the data last read by the consumer) meets the metric detail settings |
| Consumer group status is not STABLE | An alarm is triggered when one or more partitions within a consumer group are detected to be in a delayed, paused, or rewound state. In these cases, the consumer group's status is deemed abnormal |
| Number of consumer instances in a consumer group | An alarm is triggered when the number of consumer instances in the consumer group meets the metric detail settings |
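
The lag definition above (producer-side offset minus consumer-side offset) can be sketched as follows. The function names, the choice to sum lag across partitions, and the treatment of a partition with no committed offset as offset 0 are assumptions for illustration; the source does not say how per-partition lag is aggregated into a group-level value.

```python
TopicPartition = tuple[str, int]  # (topic name, partition number)


def partition_lag(end_offset: int, committed_offset: int) -> int:
    """Lag for one partition; clamped at 0 in case the commit is ahead of a stale end offset."""
    return max(end_offset - committed_offset, 0)


def group_lag(
    end_offsets: dict[TopicPartition, int],
    committed: dict[TopicPartition, int],
) -> int:
    """Sum per-partition lag (assumption: the group value is a sum, not a maximum)."""
    total = 0
    for tp, end in end_offsets.items():
        # Assumption: a partition with no committed offset counts from offset 0.
        total += partition_lag(end, committed.get(tp, 0))
    return total


# Made-up offsets for a hypothetical "orders" topic with two partitions.
end_offsets = {("orders", 0): 120, ("orders", 1): 80}
committed = {("orders", 0): 100, ("orders", 1): 80}
print(group_lag(end_offsets, committed))  # 20
```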

Topic Metrics

| Metric | Description |
| --- | --- |
| Topic message-in/sec | An alarm is triggered when the number of messages produced per second for the topic meets the metric detail settings |
| Topic byte-in/sec | An alarm is triggered when the size (in bytes) of messages produced per second for the topic meets the metric detail settings |
| Topic byte-out/sec | An alarm is triggered when the size (in bytes) of messages consumed per second from the topic meets the metric detail settings |
| Increasement of topic message-in in the last T minutes | An alarm is triggered when the increase in the number of messages produced for the topic in the last T minutes meets the metric detail settings |
| Increasement of topic byte-in in the last T minutes | An alarm is triggered when the increase in the size (in bytes) of messages produced for the topic in the last T minutes meets the metric detail settings |
| Increasement of topic byte-out in the last T minutes | An alarm is triggered when the increase in the size (in bytes) of messages consumed from the topic in the last T minutes meets the metric detail settings |
| Increasement of topic message-in in the last T hours | An alarm is triggered when the increase in the number of messages produced for the topic in the last T hours meets the metric detail settings |
| Increasement of topic byte-in in the last T hours | An alarm is triggered when the increase in the size (in bytes) of messages produced for the topic in the last T hours meets the metric detail settings |
| Increasement of topic byte-out in the last T hours | An alarm is triggered when the increase in the size (in bytes) of messages consumed from the topic in the last T hours meets the metric detail settings |
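
One plausible reading of "increase in the last T minutes" is the growth of a cumulative counter over a trailing window. The sketch below takes that reading as an assumption; the function name, the sample format, and the lack of handling for counter resets are all illustrative choices, not the product's documented behavior.

```python
from bisect import bisect_right


def increase_over_window(
    samples: list[tuple[float, int]], now: float, window_seconds: float
) -> int:
    """Growth of a cumulative counter over the trailing window.

    samples: (unix_timestamp, cumulative_count) pairs, sorted by timestamp.
    """
    if not samples:
        return 0
    timestamps = [t for t, _ in samples]
    # Last sample at or before "now".
    end_idx = bisect_right(timestamps, now) - 1
    if end_idx < 0:
        return 0
    # Last sample at or before the window start; fall back to the oldest sample.
    start_idx = max(bisect_right(timestamps, now - window_seconds) - 1, 0)
    return samples[end_idx][1] - samples[start_idx][1]


# Made-up samples taken once a minute: (timestamp in seconds, total messages so far).
samples = [(0, 100), (60, 160), (120, 250), (180, 400)]
print(increase_over_window(samples, now=180, window_seconds=120))  # 240
print(increase_over_window(samples, now=180, window_seconds=60))   # 150
```

For the "last T hours" variants, the same function applies with `window_seconds` set to `T * 3600`.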

Connect Metrics

| Metric | Description |
| --- | --- |
| Connect instance DOWN | An alarm is triggered when all instances (servers) of the registered Connect Cluster are down |

CMPS Metrics

| Metric | Description |
| --- | --- |
| Cluster Message Consumption Per Second | An alarm is triggered when the cluster's message consumption per second meets the metric detail settings |
| Cluster Message Consumption Per Second (Consumer Group Level) | An alarm is triggered when the cluster's message consumption per second for a consumer group meets the metric detail settings |
| Cluster Message Consumption Per Second (Consumer Group - Topic Level) | An alarm is triggered when the cluster's message consumption per second for a consumer group and topic pair meets the metric detail settings |

Connector Metrics

| Metric | Description |
| --- | --- |
| Task status abnormal (failed) | An alarm is triggered when the task status changes to failed |
| Messages polled per second (poll) [Source Connector] | An alarm is triggered when the number of messages polled per second by the source connector meets the metric detail settings |
| Messages written per second (write) [Source Connector] | An alarm is triggered when the number of messages written per second by the source connector meets the metric detail settings |
| Messages read per second (read) [Sink Connector] | An alarm is triggered when the number of messages read per second by the sink connector meets the metric detail settings |
| Messages sent per second (send) [Sink Connector] | An alarm is triggered when the number of messages sent per second by the sink connector meets the metric detail settings |
| Number of records failed by connector | An alarm is triggered when the number of records the connector failed to process meets the metric detail settings |
| Number of write failures to DLT (Dead Letter Topic) | Counts the failed attempts to write to the DLT for records the connector could not process. An alarm is triggered when this count meets the metric detail settings |

Data Mirroring Metrics

| Metric | Description |
| --- | --- |
| Messages processed per second by topic (bytes) | An alarm is triggered when the size (in bytes) of messages replicated per second for a topic meets the metric detail settings |
| Mirroring job lag | An alarm is triggered when the mirroring job lag meets the metric detail settings |