Rule Metrics

The following sections describe the alarm rule metrics for each component type.

Broker
Kafka Network
Partition
Node
ZooKeeper
Schema Registry
Consumer Group
Topic
Connect
CMPS
Connector
Data mirroring

Broker Metrics

Metric	Description
Number of brokers in a cluster	An alarm is triggered when the number of online brokers meets the settings in the metric details
Abnormal broker status (not running)	An alarm is triggered when the broker state is anything other than running
Abnormal number of active controller	An alarm is triggered when there is no active controller broker
Broker disk skewed	The distribution of disk usage is calculated by comparing the broker with the highest disk usage and the broker with the lowest disk usage. If this distribution meets the settings in the metric details, the disk usage is deemed unbalanced, triggering an alarm
Increasing producer request failures	An alarm is triggered when the producer request failure rate meets the settings in the metric details
Kafka Broker instance DOWN	An alarm is triggered when all broker instances (servers) are down
Produce messages	An alarm is triggered when the number of messages produced meets the metric detail settings
Produce bytes	An alarm is triggered when the size (in Bytes) of messages produced meets the metric detail settings
Consume bytes	An alarm is triggered when the size (in Bytes) of messages consumed by the consumer meets the metric detail settings
Consumer lag	An alarm is triggered when the consumer lag (the difference between the latest offset written by the producer and the offset most recently read by the consumer) meets the metric detail settings
KRaft controller broker DOWN	An alarm is triggered when the controller broker goes down in a KRaft cluster
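
The "Broker disk skewed" comparison can be illustrated with a minimal sketch. The page does not state the exact formula, so this example assumes the skew is the difference, in percentage points, between the highest and lowest broker disk usage, and that the rule fires when that difference reaches a threshold. The function names, broker names, and threshold are hypothetical.

```python
def disk_skew(usage_by_broker: dict[str, float]) -> float:
    """Gap between the highest and lowest broker disk usage, in percentage points.

    Assumption: skew is measured as max - min; the product may use another formula.
    """
    if not usage_by_broker:
        raise ValueError("at least one broker is required")
    values = list(usage_by_broker.values())
    return max(values) - min(values)


def is_disk_skewed(usage_by_broker: dict[str, float], threshold_pct: float) -> bool:
    """Hypothetical rule check: fire when the gap reaches the threshold."""
    return disk_skew(usage_by_broker) >= threshold_pct


# Illustrative disk usage (%) per broker.
usage = {"broker-1": 82.0, "broker-2": 41.5, "broker-3": 60.0}
print(disk_skew(usage))             # 40.5
print(is_disk_skewed(usage, 30.0))  # True
```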

Kafka Network Metrics

Metric	Description
Remaining network resource	An alarm is triggered when the idle rate, calculated as the ratio of idle network resources to total network resources, meets the metric detail settings
Request latency - Fetch follower	An alarm is triggered when the time it takes for the follower replica of the partition to receive a response after sending a replication request meets the metric detail settings
Request latency - Fetch consumer	An alarm is triggered when the time it takes for the consumer to receive a response after sending a consumption request meets the metric detail settings
Request latency - Produce	An alarm is triggered when the time it takes for the client to receive a response after sending a produce request meets the metric detail settings
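
As a small illustration of the idle rate used by "Remaining network resource", the ratio described above can be sketched as follows. The function name, the input units, and the example threshold are assumptions; the page does not say how the idle and total resources are measured.

```python
def idle_rate(idle: float, total: float) -> float:
    """Ratio of idle network resources to total network resources (0.0 to 1.0)."""
    if total <= 0:
        raise ValueError("total must be positive")
    return idle / total


# Hypothetical sample: 25 of 100 resource units are idle.
rate = idle_rate(25, 100)
print(rate)         # 0.25
print(rate <= 0.3)  # True: a rule set to fire at or below 30% idle would trigger
```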

Partition Metrics

Metric	Description
Broker partition skewed	An alarm is triggered when the distribution of partition counts between the broker with the most partitions and the broker with the least partitions meets the metric detail settings
Leader partition skewed	An alarm is triggered when the distribution of leader partition counts between the broker with the most leader partitions and the broker with the least leader partitions meets the metric detail settings
Number of the offline-partitions	An alarm is triggered when the number of offline partitions meets the metric detail settings
Number of the partitions in a broker	An alarm is triggered when the total number of partitions within a broker meets the metric detail settings
Number of the partitions in a cluster	An alarm is triggered when the total number of partitions meets the metric detail settings
Number of the under-min-ISR partitions	An alarm is triggered when the number of partitions whose ISR (in-sync replicas: the replicas that are synchronized with the leader partition) count is below the required minimum meets the metric detail settings
Number of the under-replicated partitions	An alarm is triggered when the number of under-replicated partitions (partitions with fewer in-sync replicas than assigned replicas) meets the metric detail settings
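
The partition metrics above can be sketched from a snapshot of partition state. This is a minimal example under stated assumptions: `PartitionState` and all function names are hypothetical, skew is assumed to be the difference between the largest and smallest per-broker count, and the counts follow the usual Kafka definitions (offline means no leader, under-replicated means fewer in-sync replicas than assigned replicas).

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PartitionState:
    topic: str
    partition: int
    leader: Optional[int]      # broker id, or None when the partition has no leader
    replicas: tuple[int, ...]  # broker ids assigned to hold a replica
    isr: tuple[int, ...]       # broker ids currently in sync


def count_offline(parts: list[PartitionState]) -> int:
    return sum(1 for p in parts if p.leader is None)


def count_under_replicated(parts: list[PartitionState]) -> int:
    return sum(1 for p in parts if len(p.isr) < len(p.replicas))


def count_under_min_isr(parts: list[PartitionState], min_isr: int) -> int:
    return sum(1 for p in parts if len(p.isr) < min_isr)


def replica_skew(parts: list[PartitionState], broker_ids: list[int]) -> int:
    """Assumed skew: most replicas on a broker minus fewest (empty brokers count as 0)."""
    counts = Counter({b: 0 for b in broker_ids})
    for p in parts:
        counts.update(p.replicas)
    return max(counts.values()) - min(counts.values())


def leader_skew(parts: list[PartitionState], broker_ids: list[int]) -> int:
    """Same idea as replica_skew, but counting only leader partitions."""
    counts = Counter({b: 0 for b in broker_ids})
    counts.update(p.leader for p in parts if p.leader is not None)
    return max(counts.values()) - min(counts.values())


parts = [
    PartitionState("orders", 0, leader=1, replicas=(1, 2, 3), isr=(1, 2, 3)),
    PartitionState("orders", 1, leader=2, replicas=(2, 3, 1), isr=(2,)),
    PartitionState("orders", 2, leader=None, replicas=(3, 1, 2), isr=()),
]
brokers = [1, 2, 3, 4]

print(count_offline(parts))           # 1
print(count_under_replicated(parts))  # 2
print(count_under_min_isr(parts, 2))  # 2
print(replica_skew(parts, brokers))   # 3
print(leader_skew(parts, brokers))    # 1
```

Passing the full broker list matters for the skew figures: a broker that holds no partitions would otherwise be missing from the counts, and the cluster would look more balanced than it is.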

Node Metrics

Metric	Description
CPU usage	An alarm is triggered when CPU usage (%) meets the metric detail settings
Node disk usage	An alarm is triggered when disk usage (%) on the node meets the metric detail settings
Node DOWN	An alarm is triggered when all nodes are down
Memory usage	An alarm is triggered when memory usage (%) meets the metric detail settings
Node disk usage per mount point	An alarm is triggered when disk usage (%) per mount point on the node meets the metric detail settings

ZooKeeper Metrics

Metric	Description
ZooKeeper connection status warning	An alarm is triggered when the connection between the broker and ZooKeeper is interrupted
ZooKeeper instance DOWN	An alarm is triggered when all instances (servers) of the registered ZooKeeper are down

Schema Registry Metrics

Metric	Description
Schema Registry instance DOWN	An alarm is triggered when all instances (servers) of the registered Schema Registry Cluster are down

Consumer Group Metrics

Metric	Description
Consumer group lag	An alarm is triggered when the lag in the consumer group (the difference between the latest offset written by the producer and the offset most recently read by the consumer) meets the metric detail settings
Consumer group status is not STABLE	An alarm is triggered when one or more partitions within a consumer group are detected to be in a delayed, paused, or rewound state. In these cases, the consumer group's status is deemed abnormal
Number of consumer instances in a consumer group	An alarm is triggered when the number of consumer instances in the consumer group meets the metric detail settings
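
The lag definition above can be sketched per partition and then rolled up to the group. This is an illustration only: summing over partitions is an assumption (the page does not say how the group value is aggregated), a partition with no recorded consumer offset is assumed to start from 0, and the function names and offsets are hypothetical.

```python
TopicPartition = tuple[str, int]


def partition_lag(log_end_offset: int, consumer_offset: int) -> int:
    """Latest produced offset minus the consumer's offset, never negative."""
    return max(log_end_offset - consumer_offset, 0)


def group_lag(
    end_offsets: dict[TopicPartition, int],
    consumer_offsets: dict[TopicPartition, int],
) -> int:
    """Assumed aggregation: total lag across all partitions the group reads."""
    return sum(
        partition_lag(end, consumer_offsets.get(tp, 0))  # missing offset treated as 0
        for tp, end in end_offsets.items()
    )


end_offsets = {("orders", 0): 1500, ("orders", 1): 1200}
consumer_offsets = {("orders", 0): 1450, ("orders", 1): 1200}

print(group_lag(end_offsets, consumer_offsets))  # 50
```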

Topic Metrics

Metric	Description
Topic message-in/sec	An alarm is triggered when the number of messages produced per second for the topic meets the metric detail settings
Topic byte-in/sec	An alarm is triggered when the size (in Bytes) of messages produced per second for the topic meets the metric detail settings
Topic byte-out/sec	An alarm is triggered when the size (in Bytes) of messages consumed per second from the topic meets the metric detail settings
Increasement of topic message-in in the last T minutes	An alarm is triggered when the increase in the number of messages produced for the topic in the last T minutes meets the metric detail settings
Increasement of topic byte-in in the last T minutes	An alarm is triggered when the increase in the size (in Bytes) of messages produced for the topic in the last T minutes meets the metric detail settings
Increasement of topic byte-out in the last T minutes	An alarm is triggered when the increase in the size (in Bytes) of messages consumed from the topic in the last T minutes meets the metric detail settings
Increasement of topic message-in in the last T hours	An alarm is triggered when the increase in the number of messages produced for the topic in the last T hours meets the metric detail settings
Increasement of topic byte-in in the last T hours	An alarm is triggered when the increase in the size (in Bytes) of messages produced for the topic in the last T hours meets the metric detail settings
Increasement of topic byte-out in the last T hours	An alarm is triggered when the increase in the size (in Bytes) of messages consumed from the topic in the last T hours meets the metric detail settings
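
The "in the last T minutes" and "in the last T hours" metrics compare a value at the start and end of a time window. A minimal sketch, assuming the underlying value is a cumulative counter sampled at intervals (the page does not describe how samples are collected); the function name and sample data are hypothetical.

```python
from datetime import datetime, timedelta


def increase_over_window(
    samples: list[tuple[datetime, int]],
    now: datetime,
    window: timedelta,
) -> int:
    """Increase of a cumulative counter between the first and last sample in the window.

    `samples` must be sorted by timestamp. Returns 0 when fewer than two
    samples fall inside the window, since no increase can be measured.
    """
    start = now - window
    in_window = [(ts, value) for ts, value in samples if start <= ts <= now]
    if len(in_window) < 2:
        return 0
    return in_window[-1][1] - in_window[0][1]


base = datetime(2024, 1, 1, 12, 0)
samples = [
    (base, 1000),
    (base + timedelta(minutes=5), 1400),
    (base + timedelta(minutes=10), 2100),
    (base + timedelta(minutes=15), 2300),
]
now = base + timedelta(minutes=15)

print(increase_over_window(samples, now, timedelta(minutes=10)))  # 900
print(increase_over_window(samples, now, timedelta(hours=1)))     # 1300
```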

Connect Metrics

Metric	Description
Connect instance DOWN	An alarm is triggered when all instances (servers) of the registered Connect Cluster are down

CMPS Metrics

Metric	Description
Cluster Message Consumption Per Second	An alarm is triggered when the number of messages consumed per second in the cluster meets the metric detail settings
Cluster Message Consumption Per Second (Consumer Group Level)	An alarm is triggered when the number of messages consumed per second by a consumer group meets the metric detail settings
Cluster Message Consumption Per Second (Consumer Group - Topic Level)	An alarm is triggered when the number of messages consumed per second by a consumer group from a topic meets the metric detail settings

Connector Metrics

Metric	Description
Task status abnormal (failed)	An alarm is triggered when the task status changes to failed
Messages polled per second (poll) [Source Connector]	An alarm is triggered when the number of messages polled per second by the source connector meets the metric detail settings
Messages written per second (write) [Source Connector]	An alarm is triggered when the number of messages written per second by the source connector meets the metric detail settings
Messages read per second (read) [Sink Connector]	An alarm is triggered when the number of messages read per second by the sink connector meets the metric detail settings
Messages sent per second (send) [Sink Connector]	An alarm is triggered when the number of messages sent per second by the sink connector meets the metric detail settings
Number of records failed by connector	An alarm is triggered when the number of records the connector failed to process meets the metric detail settings
Number of write failures to DLT (Dead Letter Topic)	An alarm is triggered when the number of failed attempts to write to the DLT, for records the connector failed to process, meets the metric detail settings

Data Mirroring Metrics

Metric	Description
Messages processed per second by topic (bytes)	An alarm is triggered when the number of bytes of messages replicated per second by topic meets the metric detail settings
Mirroring job lag	An alarm is triggered when the mirroring job lag meets the metric detail settings
