Collecting DataNode Logs into HDFS with Flume
1. Inspect the DataNode log content (the log format has already been converted to JSON; for the conversion steps, see "Converting DataNode Log Output to JSON")
{"time":"2018-01-16 12:07:10,846","logtype":"INFO","loginfo":"org.apache.hadoop.hdfs.server.datanode.DataNode:PacketResponder: BP-1517073770-172.16.15.80-1508233672475:blk_1074112639_371833, type=HAS_DOWNSTREAM_IN_PIPELINE terminating"}
{"time":"2018-01-16 12:07:18,106","logtype":"INFO","loginfo":"org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetAsyncDiskService:Scheduling blk_1074112639_371833 file /data/1/dn/current/BP-1517073770-172.16.15.80-1508233672475/current/finalized/subdir5/subdir168/blk_1074112639 for deletion"}
{"time":"2018-01-16 12:07:18,106","logtype":"INFO","loginfo":"org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetAsyncDiskService:Deleted BP-1517073770-172.16.15.80-1508233672475 blk_1074112639_371833 file /data/1/dn/current/BP-1517073770-172.16.15.80-1508233672475/current/finalized/subdir5/subdir168/blk_1074112639"}
Fields: log time, log level, log message
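The three fields can be pulled out of a raw line with a short sketch. This is illustrative only: the sample line below is shortened from the output above, and splitting `loginfo` on the first colon to separate the logger class from the message is an assumption based on the format shown.

```python
import json

# Sample raw DataNode log line (already converted to JSON; shortened here)
line = ('{"time":"2018-01-16 12:07:18,106","logtype":"INFO",'
        '"loginfo":"org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.'
        'FsDatasetAsyncDiskService:Scheduling blk_1074112639_371833 for deletion"}')

record = json.loads(line)
# "loginfo" packs the logger class and the message, separated by the
# first colon (an assumption based on the sample lines above).
logger, _, message = record["loginfo"].partition(":")
print(record["time"], record["logtype"], logger)
```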
2. Flume configuration
a1.sources = r1
a1.sinks = k1
a1.channels = c1
# com.onlinelog.analysis.ExecSource_JSON is a customized source class that
# adds the hostname and service name to each event, so that when we inspect
# the collected data we can tell which machine and which service a given
# log line came from
a1.sources.r1.type = com.onlinelog.analysis.ExecSource_JSON
a1.sources.r1.command = tail -F /var/log/hadoop-hdfs/hadoop-cmf-hdfs-DATANODE-hadoop001.log.out
a1.sources.r1.hostname = hadoop001
a1.sources.r1.servicename = DataNode
a1.sources.r1.channels = c1
a1.channels.c1.type = memory
a1.channels.c1.capacity = 10000
a1.channels.c1.transactionCapacity = 10000
a1.channels.c1.byteCapacityBufferPercentage = 20
a1.channels.c1.byteCapacity = 800000
a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = hdfs://nameservice1:8020/data/flume/exec/%y-%m-%d/%H%M/%S
a1.sinks.k1.hdfs.filePrefix=event-
a1.sinks.k1.hdfs.batchSize = 10
a1.sinks.k1.hdfs.fileType = DataStream
a1.sinks.k1.hdfs.writeFormat = Text
a1.sinks.k1.hdfs.useLocalTimeStamp = true
a1.sinks.k1.channel = c1
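The escape sequences in `hdfs.path` (`%y-%m-%d/%H%M/%S`) behave much like `strftime` patterns, and with `useLocalTimeStamp = true` the sink stamps each event with the local time. A minimal sketch of the directory a given event would land in, using a hypothetical timestamp:

```python
from datetime import datetime

# Hypothetical event time; Flume's %y-%m-%d/%H%M/%S escapes in hdfs.path
# resolve much like strftime, giving one directory per second.
ts = datetime(2018, 1, 16, 11, 34, 47)
path = "/data/flume/exec/" + ts.strftime("%y-%m-%d/%H%M/%S")
print(path)  # /data/flume/exec/18-01-16/1134/47
```

This is why the `hadoop fs -text` command in section 3 reads from a path like `/data/flume/exec/18-01-16/1134/47/*`. Note that a per-second directory layout creates many small files on HDFS; coarser escapes (or rolling settings) are worth considering in production.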
Start Flume:
flume-ng agent \
--name a1 \
--conf $FLUME_HOME/conf \
--conf-file $FLUME_HOME/conf/exec-memory-hdfs.conf \
-Dflume.root.logger=INFO,console
3. Check the logs in the HDFS directory
[root@hadoop001 hadoop-hdfs]# hadoop fs -text /data/flume/exec/18-01-16/1134/47/*
18/01/16 12:11:30 INFO lzo.GPLNativeCodeLoader: Loaded native gpl library
18/01/16 12:11:30 INFO lzo.LzoCodec: Successfully loaded & initialized native-lzo library [hadoop-lzo rev 674c65bbf0f779edc3e00a00c953b121f1988fe1]
{"hostname":"hadoop001","servicename":"DataNode","time":"2018-01-16 11:33:09,020","logtype":"INFO","loginfo":"org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetAsyncDiskService:Deleted BP-1517073770-172.16.15.80-1508233672475 blk_1074112132_371326 file /data/1/dn/current/BP-1517073770-172.16.15.80-1508233672475/current/finalized/subdir5/subdir166/blk_1074112132"}
You can see that each log entry now contains: hostname, service name, log time, log level, and log message.
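With `hostname` and `servicename` stamped on every record, the collected logs can be aggregated per machine and service. A minimal sketch, using two shortened sample records (the second record and its WARN level are invented for illustration):

```python
import json
from collections import Counter

# Two sample enriched records, shortened; the WARN line is hypothetical.
lines = [
    '{"hostname":"hadoop001","servicename":"DataNode",'
    '"time":"2018-01-16 11:33:09,020","logtype":"INFO","loginfo":"..."}',
    '{"hostname":"hadoop001","servicename":"DataNode",'
    '"time":"2018-01-16 11:33:10,101","logtype":"WARN","loginfo":"..."}',
]

# Tally log levels per (machine, service) pair.
counts = Counter()
for line in lines:
    r = json.loads(line)
    counts[(r["hostname"], r["servicename"], r["logtype"])] += 1

print(dict(counts))
```

This kind of grouping is what the added fields enable: without them, lines collected from many agents into one HDFS directory would be indistinguishable by origin.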