Preface
While Flume's transaction mechanism keeps collected log data from being lost, you still need to keep an eye on whether events are flowing normally between Source, Channel, and Sink: how many events went from Source to Channel, how many from Channel to Sink, and whether those two counts diverge too much.
I. How Flume monitoring works
Flume provides a monitoring mechanism (http://flume.apache.org/FlumeUserGuide.html#monitoring) that reports all of the agent's internal counters. Four reporting methods are available: JMX reporting, Ganglia reporting, JSON reporting, and custom reporting. The simplest one, JSON reporting, is used as the example here.
When starting the Flume agent, add two parameters:
flume-ng agent -n agent_lxw1234 --conf . -f kafka-flume-hdfs.conf -Dflume.monitoring.type=http -Dflume.monitoring.port=34545
flume.monitoring.type=http selects HTTP reporting, and flume.monitoring.port=34545 sets the port of the embedded HTTP server.
Once started, an HTTP service runs on the agent's host; opening http://<hostname>:34545/metrics returns a JSON report:
{
  "SINK.k1": {
    "ConnectionCreatedCount": "2",
    "BatchCompleteCount": "12",
    "EventWriteFail": "0",
    "BatchEmptyCount": "24",
    "EventDrainAttemptCount": "1456",
    "StartTime": "1625050876306",
    "BatchUnderflowCount": "4",
    "ChannelReadFail": "0",
    "ConnectionFailedCount": "0",
    "ConnectionClosedCount": "2",
    "Type": "SINK",
    "EventDrainSuccessCount": "1456",
    "StopTime": "0"
  },
  "CHANNEL.c1": {
    "Unhealthy": "0",
    "ChannelSize": "0",
    "EventTakeAttemptCount": "1484",
    "StartTime": "1625050875820",
    "Open": "true",
    "CheckpointWriteErrorCount": "0",
    "ChannelCapacity": "1000000",
    "ChannelFillPercentage": "0.0",
    "EventTakeErrorCount": "0",
    "Type": "CHANNEL",
    "EventTakeSuccessCount": "1456",
    "Closed": "0",
    "CheckpointBackupWriteErrorCount": "0",
    "EventPutAttemptCount": "1456",
    "EventPutSuccessCount": "1456",
    "EventPutErrorCount": "0",
    "StopTime": "0"
  },
  "SOURCE.r1": {
    "KafkaEventGetTimer": "10004",
    "AppendBatchAcceptedCount": "0",
    "EventAcceptedCount": "1456",
    "AppendReceivedCount": "0",
    "StartTime": "1625050876407",
    "AppendBatchReceivedCount": "0",
    "KafkaCommitTimer": "32",
    "EventReceivedCount": "1456",
    "Type": "SOURCE",
    "KafkaEmptyCount": "19",
    "AppendAcceptedCount": "0",
    "OpenConnectionCount": "0",
    "StopTime": "0"
  }
}
For the source, "EventReceivedCount": "1456" equals "EventAcceptedCount": "1456", so every event read was successfully handed to the channel.
For the channel, "EventPutSuccessCount": "1456" shows every put succeeded.
For the sink, "EventDrainSuccessCount": "1456" is the number of events successfully written out.
The source, channel, and sink counts all agree, so the pipeline is healthy.
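The consistency check above is easy to automate. The sketch below fetches the JSON report and compares the three success counters; the component names SOURCE.r1 / CHANNEL.c1 / SINK.k1 and the host/port are the ones used in this article and would need adjusting for another agent:

```python
import json
from urllib.request import urlopen


def fetch_metrics(url="http://hadoop101:34545/metrics"):
    """Fetch the agent's JSON counter report (host/port as in this article)."""
    with urlopen(url) as resp:
        return json.load(resp)


def check_pipeline(metrics, source="SOURCE.r1", channel="CHANNEL.c1", sink="SINK.k1"):
    """Compare the success counters that should agree when the pipeline
    is healthy. All values in the report are strings, hence int()."""
    counts = {
        "source_accepted": int(metrics[source]["EventAcceptedCount"]),
        "channel_put_ok": int(metrics[channel]["EventPutSuccessCount"]),
        "sink_drain_ok": int(metrics[sink]["EventDrainSuccessCount"]),
    }
    consistent = len(set(counts.values())) == 1
    return counts, consistent


# Demo with the counters from the healthy report shown above:
sample = {
    "SOURCE.r1": {"EventAcceptedCount": "1456"},
    "CHANNEL.c1": {"EventPutSuccessCount": "1456"},
    "SINK.k1": {"EventDrainSuccessCount": "1456"},
}
counts, ok = check_pipeline(sample)
print(counts, ok)  # all three counters are 1456, so ok is True
```

In a live setup you would call `check_pipeline(fetch_metrics())` periodically (e.g. from cron) and alert when the counters diverge.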
II. Demonstrating Flume monitoring
Generate test data: 774 lines of application log.
[hadoop@hadoop101 log]$ cat app.2021-06-30.log |wc
774 2889 402277
III. Starting Flume
1. Flume collects data into Kafka through a KafkaChannel, but Kafka has not been started yet
Check the monitoring report at http://hadoop101:34545/metrics in a browser. The output comes back as a single line, so run it through any online JSON formatter first:
{
"CHANNEL.c1": {
"KafkaEventGetTimer": "0",
"ChannelSize": "0",
"EventTakeAttemptCount": "0",
"StartTime": "1625032314583",
"ChannelCapacity": "0",
"ChannelFillPercentage": "1.7976931348623157E308",
"KafkaCommitTimer": "0",
"Type": "CHANNEL",
"EventTakeSuccessCount": "0",
"RollbackCount": "0",
"KafkaEventSendTimer": "0",
"EventPutSuccessCount": "0",
"EventPutAttemptCount": "100",
"StopTime": "0"
},
"SOURCE.r1": {
"AppendBatchAcceptedCount": "0",
"GenericProcessingFail": "0",
"EventAcceptedCount": "0",
"AppendReceivedCount": "0",
"StartTime": "1625032314917",
"AppendBatchReceivedCount": "1",
"ChannelWriteFail": "0",
"EventReceivedCount": "100",
"EventReadFail": "0",
"Type": "SOURCE",
"AppendAcceptedCount": "0",
"OpenConnectionCount": "0",
"StopTime": "0"
}
}
Source counters: "EventReceivedCount": "100", "EventAcceptedCount": "0"
KafkaChannel counter: "EventPutSuccessCount": "0"
So Flume did read data: of the 774 lines, it picked up a first batch of 100, but none of those events made it into the channel or on to Kafka. (The strange "ChannelFillPercentage": "1.7976931348623157E308" is Double.MAX_VALUE, likely a side effect of "ChannelCapacity" reporting "0" here.)
Why exactly 100? The TAILDIR source reads in batches; from the Flume documentation:
batchSize | 100 | Max number of lines to read and send to the channel at a time. Using the default is usually fine.
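For reference, a TAILDIR-to-KafkaChannel agent matching this setup might be configured roughly as follows. This is a sketch, not the article's actual kafka-flume-hdfs.conf; the file paths, topic name, and Kafka bootstrap server are illustrative assumptions (the agent and component names come from the command line and metrics shown above):

```properties
# Hypothetical sketch of the agent used in this article.
agent_lxw1234.sources = r1
agent_lxw1234.channels = c1

# TAILDIR source; batchSize = 100 is the default discussed above.
agent_lxw1234.sources.r1.type = TAILDIR
agent_lxw1234.sources.r1.filegroups = f1
agent_lxw1234.sources.r1.filegroups.f1 = /opt/module/applog/log/app.*.log
agent_lxw1234.sources.r1.positionFile = /opt/module/flume/taildir_position.json
agent_lxw1234.sources.r1.batchSize = 100
agent_lxw1234.sources.r1.channels = c1

# KafkaChannel writing straight to a Kafka topic (no sink needed).
agent_lxw1234.channels.c1.type = org.apache.flume.channel.kafka.KafkaChannel
agent_lxw1234.channels.c1.kafka.bootstrap.servers = hadoop101:9092
agent_lxw1234.channels.c1.kafka.topic = topic_log
agent_lxw1234.channels.c1.parseAsFlumeEvent = false
```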
2. Start Kafka, then check the monitoring data again
http://hadoop101:34545/metrics
{
"CHANNEL.c1": {
"KafkaEventGetTimer": "0",
"ChannelSize": "0",
"EventTakeAttemptCount": "0",
"StartTime": "1625032314583",
"ChannelCapacity": "0",
"ChannelFillPercentage": "1.7976931348623157E308",
"KafkaCommitTimer": "0",
"Type": "CHANNEL",
"EventTakeSuccessCount": "0",
"RollbackCount": "0",
"KafkaEventSendTimer": "268",
"EventPutSuccessCount": "774",
"EventPutAttemptCount": "1074",
"StopTime": "0"
},
"SOURCE.r1": {
"AppendBatchAcceptedCount": "8",
"GenericProcessingFail": "0",
"EventAcceptedCount": "774",
"AppendReceivedCount": "0",
"StartTime": "1625032314917",
"AppendBatchReceivedCount": "11",
"ChannelWriteFail": "3",
"EventReceivedCount": "1074",
"EventReadFail": "0",
"Type": "SOURCE",
"AppendAcceptedCount": "0",
"OpenConnectionCount": "0",
"StopTime": "0"
}
}
EventAcceptedCount = 774 and EventPutSuccessCount = 774: all 774 log lines were collected and delivered to Kafka, so the pipeline is back to normal. (EventReceivedCount = 1074 is 300 higher than the accepted count; with ChannelWriteFail = 3, this matches three 100-event batches that failed while Kafka was down and were re-read once it came up.)
Summary
Enable Flume's monitoring port when starting the agent so its counters can be pulled as JSON over HTTP; comparing the source, channel, and sink event counts tells you at a glance whether the pipeline is losing or stalling data.