Monitoring
Kafka uses Yammer Metrics for metrics reporting in both the server and the client. It can be configured to report stats through pluggable stats reporters that hook into your monitoring system.
The easiest way to see the available metrics is to fire up jconsole and point it at a running Kafka client or server; this will allow browsing all metrics with JMX.
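The MBeans that jconsole shows are identified by JMX object names of the form `domain:key=value,key=value`. As a minimal sketch of how such a name breaks down, the hypothetical helper below (`parse_mbean_name` is not part of Kafka or any JMX library) splits a name into its domain and key properties. It assumes unquoted values that contain no commas, which holds for simple broker names such as the one used here.

```python
def parse_mbean_name(name: str) -> tuple[str, dict[str, str]]:
    """Split 'domain:key=value,key=value' into (domain, properties).

    Assumption: property values are unquoted and contain no commas.
    """
    domain, _, props = name.partition(":")
    if not domain or not props:
        raise ValueError(f"not a JMX object name: {name!r}")

    properties: dict[str, str] = {}
    for pair in props.split(","):
        key, sep, value = pair.partition("=")
        if not sep or not key:
            raise ValueError(f"malformed key property: {pair!r}")
        properties[key] = value
    return domain, properties


if __name__ == "__main__":
    # Example name taken from the broker metrics in this section.
    domain, props = parse_mbean_name(
        "kafka.server:type=BrokerTopicMetrics,name=MessagesInPerSec"
    )
    print(domain)         # the JMX domain, shown as the top-level folder in jconsole
    print(props["type"])  # the metric group
    print(props["name"])  # the metric name
```

In jconsole, the domain appears as the top-level node of the MBeans tree and the key properties form the levels beneath it.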
We do graphing and alerting on the following metrics:
Description | MBean name | Normal value |
---|---|---|
Message in rate | kafka.server:type=BrokerTopicMetrics,name=MessagesInPerSec | |
Byte in rate | kafka.server:type=BrokerTopicMetrics,name=BytesInPerSec | |
Request rate | kafka.network:type=RequestMetrics,name=RequestsPerSec,request={Produce\|FetchConsumer\|FetchFollower} | |
Byte out rate | kafka.server:type=BrokerTopicMetrics,name=BytesOutPerSec | |
Log flush rate and time | kafka.log:type=LogFlushStats,name=LogFlushRateAndTimeMs | |
Number of under-replicated partitions (\|ISR\| < \|all replicas\|) | kafka.server:type=ReplicaManager,name=UnderReplicatedPartitions | 0 |
Is controller active on broker | kafka.controller:type=KafkaController,name=ActiveControllerCount | only one broker in the cluster should have 1 |
Leader election rate | kafka.controller:type=ControllerStats,name=LeaderElectionRateAndTimeMs | non-zero when there are broker failures |
Unclean leader election rate | kafka.controller:type=ControllerStats,name=UncleanLeaderElectionsPerSec | 0 |
Partition counts | kafka.server:type=ReplicaManager,name=PartitionCount | mostly even across brokers |
Leader replica counts | kafka.server:type=ReplicaManager,name=LeaderCount | mostly even across brokers |
ISR shrink rate | kafka.server:type=ReplicaManager,name=IsrShrinksPerSec | If a broker goes down, ISR for some of the partitions will shrink. When that broker is up again, ISR will be expanded once the replicas are fully caught up. Other than that, the expected value for both ISR shrink rate and expansion rate is 0. |
ISR expansion rate | kafka.server:type=ReplicaManager,name=IsrExpandsPerSec | See above |
Max lag in messages between follower and leader replicas | kafka.server:type=ReplicaFetcherManager,name=MaxLag,clientId=Replica | < replica.lag.max.messages |
Lag in messages per follower replica | kafka.server:type=FetcherLagMetrics,name=ConsumerLag,clientId=([-.\w]+),topic=([-.\w]+),partition=([0-9]+) | < replica.lag.max.messages |
Requests waiting in the producer purgatory | kafka.server:type=ProducerRequestPurgatory,name=PurgatorySize | non-zero if ack=-1 is used |
Requests waiting in the fetch purgatory | kafka.server:type=FetchRequestPurgatory,name=PurgatorySize | size depends on fetch.wait.max.ms in the consumer |
Request total time | kafka.network:type=RequestMetrics,name=TotalTimeMs,request={Produce\|FetchConsumer\|FetchFollower} | broken into queue, local, remote and response send time |
Time the request waits in the request queue | kafka.network:type=RequestMetrics,name=QueueTimeMs,request={Produce\|FetchConsumer\|FetchFollower} | |
Time the request is processed at the leader | kafka.network:type=RequestMetrics,name=LocalTimeMs,request={Produce\|FetchConsumer\|FetchFollower} | |
Time the request waits for the follower | kafka.network:type=RequestMetrics,name=RemoteTimeMs,request={Produce\|FetchConsumer\|FetchFollower} | non-zero for produce requests when ack=-1 |
Time to send the response | kafka.network:type=RequestMetrics,name=ResponseSendTimeMs,request={Produce\|FetchConsumer\|FetchFollower} | |
Number of messages the consumer lags behind the producer by | kafka.consumer:type=ConsumerFetcherManager,name=MaxLag,clientId=([-.\w]+) | |
The average fraction of time the network processors are idle | kafka.network:type=SocketServer,name=NetworkProcessorAvgIdlePercent | between 0 and 1, ideally > 0.3 |
The average fraction of time the request handler threads are idle | kafka.server:type=KafkaRequestHandlerPool,name=RequestHandlerAvgIdlePercent | between 0 and 1, ideally > 0.3 |
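Several rows above state a concrete normal value, which makes them natural alerting rules. The sketch below shows one way those rules might be evaluated. It assumes the metric values have already been collected (for example over JMX) into a dictionary keyed by broker id and then by the `name=` part of the MBean; that snapshot layout, the function name `check_cluster`, and the message wording are all hypothetical and not part of Kafka.

```python
# Guideline from the table: idle fractions should ideally stay above 0.3.
IDLE_FLOOR = 0.3

IDLE_METRICS = (
    "NetworkProcessorAvgIdlePercent",
    "RequestHandlerAvgIdlePercent",
)


def check_cluster(snapshot: dict[str, dict[str, float]]) -> list[str]:
    """Return human-readable problems found in a per-broker metrics snapshot.

    `snapshot` maps a broker id to {metric name: value}; this layout is an
    assumption made for the example.
    """
    problems: list[str] = []

    # Exactly one broker in the cluster should report an active controller.
    active = sum(m["ActiveControllerCount"] for m in snapshot.values())
    if active != 1:
        problems.append(
            f"cluster: expected exactly 1 active controller, found {active:g}"
        )

    for broker, m in sorted(snapshot.items()):
        # The normal value for both of these metrics is 0.
        for name in ("UnderReplicatedPartitions", "UncleanLeaderElectionsPerSec"):
            if m[name] != 0:
                problems.append(f"{broker}: {name} is {m[name]:g}, expected 0")

        # Idle fractions lie between 0 and 1; low values suggest saturation.
        for name in IDLE_METRICS:
            if m[name] <= IDLE_FLOOR:
                problems.append(
                    f"{broker}: {name} is {m[name]:.2f}, "
                    f"not above the {IDLE_FLOOR} guideline"
                )

    return problems
```

A monitoring job could call `check_cluster` on each polling interval and raise an alert whenever the returned list is non-empty. Rows without a fixed normal value, such as the message and byte rates, are better suited to graphing against a baseline than to a static rule like these.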
Common monitoring metrics for producer/consumer/connect
The following metrics are available on producer/consumer/connector instances. For specific metrics, please see the following sections.
METRIC/ATTRIBUTE NAME |
DESCRIPTION |
MBEAN NAME |
connection-close-rate |
Connections closed per second in the window. |
kafka.[producer|consumer|connect]:type=[producer|consumer|connect]-metrics,client-id=([-.\w]+) |
connection-creation-rate |
New connections established per second in the window. |
kafka.[producer|consumer|connect]:type=[producer|consumer|connect]-metrics,client-id=([-.\w]+) |
network-io-rate |
The average number of network operations (reads or writes) on all connections per second. |
kafka.[producer|consumer|connect]:type=[producer|consumer|connect]-metrics,client-id=([-.\w]+) |
outgoing-byte-rate |
The average number of outgoing bytes sent per second to all servers. |
kafka.[producer|consumer|connect]:type=[producer|consumer|connect]-metrics,client-id=([-.\w]+) |
request-rate |
The average number of requests sent per second. |
kafka.[producer|consumer|connect]:type=[producer|consumer|connect]-metrics,client-id=([-.\w]+) |
request-size-avg |
The average size of all requests in the window. |
kafka.[producer|consumer|connect]:type=[producer|consumer|connect]-metrics,client-id=([-.\w]+) |
request-size-max |
The maximum size of any request sent in the window. |
kafka.[producer|consumer|connect]:type=[producer|consumer|connect]-metrics,client-id=([-.\w]+) |
incoming-byte-rate |
Bytes/second read off all sockets. |
kafka.[producer|consumer|connect]:type=[producer|consumer|connect]-metrics,client-id=([-.\w]+) |
response-rate |
Responses received per second. |
kafka.[producer|consumer|connect]:type=[producer|consumer|connect]-metrics,client-id=([-.\w]+) |
select-rate |
Number of times the I/O layer checked for new I/O to perform per second. |