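Flume Lecture 3: Taildir Source (reliable)
The Taildir source tails files without losing data, although in extreme cases (for example, a crash before the latest offset has been saved) it may collect some data twice.
How it works
It dynamically collects a large number of files under a directory.
It records each file's read offset in the configured positionFile, in JSON format.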
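The offset mechanism can be illustrated with a minimal Python sketch. It is not Flume's implementation: the function name `read_new_lines` and the single-file position record are assumptions made for illustration, and it only shows the idea of reading from a saved byte offset and then persisting the new one.

```python
import json
import os
import tempfile


def read_new_lines(log_path, position_path):
    """Return the lines appended to log_path since the offset saved in position_path."""
    # Load the previously saved byte offset; start at 0 on the first run.
    pos = 0
    if os.path.exists(position_path):
        with open(position_path) as f:
            for entry in json.load(f):
                if entry["file"] == log_path:
                    pos = entry["pos"]

    # Read everything after the saved offset.
    with open(log_path, "rb") as f:
        f.seek(pos)
        data = f.read()
        new_pos = f.tell()

    # Persist the new offset after reading. If the process died between the
    # read and this write, the same lines would be read again on restart.
    record = [{"inode": os.stat(log_path).st_ino, "pos": new_pos, "file": log_path}]
    with open(position_path, "w") as f:
        json.dump(record, f)

    return data.decode().splitlines()


# Demo in a temporary directory (hypothetical file names).
with tempfile.TemporaryDirectory() as tmp:
    log = os.path.join(tmp, "a.log")
    position = os.path.join(tmp, "position.json")
    with open(log, "a") as f:
        f.write("first\n")
    first_read = read_new_lines(log, position)
    with open(log, "a") as f:
        f.write("second\n")
    second_read = read_new_lines(log, position)

print(first_read)
print(second_read)
```

The second call returns only the newly appended line because the first call saved its offset. The comment inside the function marks the window in which a crash would lead to duplicate collection.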
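Important parameters:
filegroups: a space-separated list of group names, each naming a batch of files, e.g. g1 g2.
filegroups.<filegroupName>: the absolute path of each file group; a regular expression may be used for the file name only.
positionFile: the path of the JSON file that records the offsets; if not set, it defaults to ~/.flume/taildir_position.json.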
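To get a feel for which files a file group pattern selects, here is a small Python sketch. It assumes the pattern must match the whole file name, and it uses Python's `re` module rather than Java's regex engine, so treat it as an approximation. `matching_files` and the sample file names are hypothetical.

```python
import os
import re


def matching_files(filegroup_path, filenames):
    """Return the names in filenames that fully match the file-name regex of filegroup_path."""
    # Split "/logs22/.*log" into the directory "/logs22" and the pattern ".*log".
    _directory, pattern = os.path.split(filegroup_path)
    regex = re.compile(pattern)
    return [name for name in filenames if regex.fullmatch(name)]


names = ["a.log", "b.log", "catalog", "a.log.1", "notes.txt"]
matched = matching_files("/logs22/.*log", names)
print(matched)
```

Under this assumption the pattern `.*log` selects any name that ends in `log`, so a file called `catalog` is included while `a.log.1` is not. A stricter pattern would be needed to select only names with a `.log` extension.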
Create a new configuration file on the virtual machine:
[root@doit02 agent]# vi tailDir-m-logger.conf
a1.sources = s1
a1.channels = c1
a1.sinks = k1
# Set the type of this source
a1.sources.s1.type = TAILDIR
a1.sources.s1.channels = c1
# Path of the file in which the read offsets are stored
a1.sources.s1.positionFile = /root/offset/position.json
# Space-separated group names; each group represents a batch of files
a1.sources.s1.filegroups = g1
# Collect every file under /logs22 whose name ends in log (this includes all .log files)
a1.sources.s1.filegroups.g1 = /logs22/.*log
# Add a header to each event that stores the absolute path of the file it came from
a1.sources.s1.fileHeader = true
# Name of the header key that stores the file path
a1.sources.s1.fileHeaderKey = filepath
# Batch size: process up to 200 events per batch
a1.sources.s1.batchSize = 200
a1.channels.c1.type = memory
a1.channels.c1.capacity = 300
# Up to 200 events per transaction
a1.channels.c1.transactionCapacity = 200
a1.sinks.k1.type = logger
a1.sinks.k1.channel = c1
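The three sizes in this configuration are related: the source's batchSize should not exceed the channel's transactionCapacity, which in turn should not exceed the channel's capacity. The Python sketch below checks that relationship. The helper names `parse_properties` and `check_sizes` are hypothetical and the parser handles only simple key = value lines, so it is a sanity check rather than a full properties reader.

```python
def parse_properties(text):
    """Parse simple 'key = value' lines, skipping blank lines and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def check_sizes(props):
    """Return a list of problems with the batch and channel sizes (empty if none)."""
    batch = int(props["a1.sources.s1.batchSize"])
    txn = int(props["a1.channels.c1.transactionCapacity"])
    cap = int(props["a1.channels.c1.capacity"])
    problems = []
    if batch > txn:
        problems.append("batchSize exceeds transactionCapacity")
    if txn > cap:
        problems.append("transactionCapacity exceeds capacity")
    return problems


config = """
a1.sources.s1.batchSize = 200
a1.channels.c1.capacity = 300
# Up to 200 events per transaction
a1.channels.c1.transactionCapacity = 200
"""

problems = check_sizes(parse_properties(config))
print(problems)
```

With the values used here (200, 200, 300) the check reports no problems. Raising batchSize above 200 without also raising transactionCapacity would be flagged.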
Run Flume with this configuration file:
[root@doit02 flume-1.9.0-bin]# bin/flume-ng agent -n a1 -c conf -f \
agent/tailDir-m-logger.conf -Dflume.root.logger=INFO,console
Append data to a file in the monitored directory:
[root@doit02 agent]# echo "pengche===ngccccccccc" >> c.log
[root@doit02 agent]# echo "pengche===ngccccccccc" >> c.log
[root@doit02 agent]# echo "pengche===ngccccccccc" >> c.log
Output
2020-04-21 22:55:21,406 (SinkRunner-PollingRunner-DefaultSinkProcessor) [INFO - org.apache.flume.sink.LoggerSink.process(LoggerSink.java:95)] Event: { headers:{filepath=/logs22/a.log} body: 68 61 68 61 68 61 68 61 68 61 68 61 hahahahahaha }
2020-04-21 22:55:24,399 (SinkRunner-PollingRunner-DefaultSinkProcessor) [INFO - org.apache.flume.sink.LoggerSink.process(LoggerSink.java:95)] Event: { headers:{filepath=/logs22/a.log} body: 68 61 68 61 68 61 68 61 68 61 68 61 hahahahahaha }
2020-04-21 22:55:29,400 (SinkRunner-PollingRunner-DefaultSinkProcessor) [INFO - org.apache.flume.sink.LoggerSink.process(LoggerSink.java:95)] Event: { headers:{filepath=/logs22/a.log} body: 68 61 68 61 68 61 68 61 68 61 68 61 hahahahahaha }
2020-04-21 22:55:59,406 (PollableSourceRunner-TaildirSource-s1) [INFO - org.apache.flume.source.taildir.TaildirSource.closeTailFiles(TaildirSource.java:307)] Closed file: /logs22/c.log, inode: 912955, pos: 19
2020-04-21 22:55:59,406 (PollableSourceRunner-TaildirSource-s1) [INFO - org.apache.flume.source.taildir.TaildirSource.closeTailFiles(TaildirSource.java:307)] Closed file: /logs22/b.log, inode: 912947,
Offset records
[root@doit02 offset]# pwd
/root/offset
[root@doit02 offset]# cat position.json
[{"inode":913089,"pos":287,"file":"/logs22/a.log"},
{"inode":912947,"pos":36,"file":"/logs22/b.log"},
{"inode":912955,"pos":19,"file":"/logs22/c.log"}]
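Interpreting the offsets
inode: the file's inode number, which identifies the file in the file system; pos: the byte offset read so far; file: the absolute path of the monitored file.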
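The following Python sketch shows where these two numbers come from on a POSIX file system. It uses a temporary directory and hypothetical file names, and it only demonstrates the operating-system side (inode numbers and byte sizes), not Flume itself.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    original = os.path.join(tmp, "a.log")
    renamed = os.path.join(tmp, "a.log.1")

    # Write one line: 12 characters plus a newline.
    with open(original, "w") as f:
        f.write("hahahahahaha\n")

    inode_before = os.stat(original).st_ino
    size_bytes = os.stat(original).st_size  # a fully read file would have pos equal to this

    # Renaming within the same directory keeps the inode; only the name changes.
    os.rename(original, renamed)
    inode_after = os.stat(renamed).st_ino

print(inode_before == inode_after)
print(size_bytes)
```

Because the inode survives a rename, a record that stores the inode alongside the path can still recognise a file after it has been renamed.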
A common trick in production is to set pos to 0; after a restart, Flume then collects the file again from the beginning.
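Editing the JSON by hand works, but the reset can also be scripted. The Python sketch below is one way to do it. `reset_positions` is a hypothetical helper, and the demo runs against a temporary copy of the records shown above rather than the real /root/offset/position.json. It assumes the agent is stopped first, since a running agent would rewrite the file.

```python
import json
import os
import tempfile


def reset_positions(position_path, only_file=None):
    """Set pos to 0 for every entry, or only for the entry whose file equals only_file."""
    with open(position_path) as f:
        entries = json.load(f)
    for entry in entries:
        if only_file is None or entry["file"] == only_file:
            entry["pos"] = 0
    with open(position_path, "w") as f:
        json.dump(entries, f)
    return entries


sample = [
    {"inode": 913089, "pos": 287, "file": "/logs22/a.log"},
    {"inode": 912947, "pos": 36, "file": "/logs22/b.log"},
    {"inode": 912955, "pos": 19, "file": "/logs22/c.log"},
]

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "position.json")
    with open(path, "w") as f:
        json.dump(sample, f)
    # Reset only c.log; the other files keep their offsets.
    updated = reset_positions(path, only_file="/logs22/c.log")

print(updated)
```

Resetting a single file limits the re-collection, and therefore the duplicate events, to that file.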