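With the Spooling Directory Source, you collect data by placing files into a "spooling" directory. The source watches that directory and parses new files as they appear. The event handling logic is pluggable. Once a file has been fully read into the channel, it is renamed or, optionally, deleted. This example uses renaming.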
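Note that a file must not be modified after it has been placed in the spooling directory; if it is, Flume reports an error and stops processing. File names must also be unique: if a file whose name has already been used is placed in the directory, Flume likewise reports an error and stops processing.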
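The source reads a file as soon as it appears. One common way to satisfy both rules is to write each file somewhere else first, move it into the spooling directory only once it is complete, and give it a name that is never reused. Below is a minimal Java sketch of that pattern. The staging directory, the `log-<millis>-<uuid>.log` naming scheme and the `deliver` helper are assumptions for illustration, not part of Flume. The sketch also assumes the staging directory is on the same filesystem as the spooling directory, because `ATOMIC_MOVE` cannot cross filesystems.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.Arrays;
import java.util.List;
import java.util.UUID;

public class SpoolDelivery {

    /**
     * Hypothetical helper: writes the lines to a file in stagingDir, then moves the
     * finished file into spoolDir in one atomic step, so Flume never sees a
     * partially written file. Timestamp plus random UUID keeps every name unique.
     * Assumes stagingDir and spoolDir are on the same filesystem.
     */
    static Path deliver(Path stagingDir, Path spoolDir, List<String> lines) throws IOException {
        String name = "log-" + System.currentTimeMillis() + "-" + UUID.randomUUID() + ".log";
        Path staged = stagingDir.resolve(name);
        Files.write(staged, lines, StandardCharsets.UTF_8);
        // Throws AtomicMoveNotSupportedException rather than silently copying
        // when the move cannot be done atomically (e.g. across filesystems).
        return Files.move(staged, spoolDir.resolve(name), StandardCopyOption.ATOMIC_MOVE);
    }

    public static void main(String[] args) throws IOException {
        // Demo with temporary directories; in this setup spoolDir would be /opt/data.
        Path base = Files.createTempDirectory("spool-demo");
        Path staging = Files.createDirectories(base.resolve("staging"));
        Path spool = Files.createDirectories(base.resolve("data"));
        Path delivered = deliver(staging, spool, Arrays.asList("first line", "second line"));
        System.out.println("Delivered " + delivered);
    }
}
```

The staging directory only needs to live outside the spooling directory, so the source never picks up a file that is still being written.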
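Properties (the full list is long, so only the required properties plus fileSuffix are shown here; see the official documentation for the rest):
type – the component type; must be set to "spooldir"
spoolDir – the directory to read files from, i.e. the spooling directory
fileSuffix – the suffix appended to files that have been fully processed (optional, default .COMPLETED)
Example: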
1. Create the corresponding HBase table
hbase(main):017:0> create 'test01','cf_log'
2. Write the configuration file
a1.sources = r1
a1.channels = c1
a1.sinks = k1
a1.sources.r1.type = spooldir
a1.sources.r1.spoolDir = /opt/data
a1.sources.r1.batchSize = 1
a1.sources.r1.channels = c1
a1.channels.c1.type = memory
a1.channels.c1.capacity = 10000
a1.channels.c1.transactionCapacity = 10000
a1.channels.c1.byteCapacityBufferPercentage = 20
a1.channels.c1.byteCapacity = 800000
a1.sinks.k1.batchSize = 1
a1.sinks.k1.type = hbase
a1.sinks.k1.table = test01
a1.sinks.k1.columnFamily = cf_log
a1.sinks.k1.serializer = org.apache.flume.sink.hbase.RegexHbaseEventSerializer
a1.sinks.k1.serializer.rowKeyIndex = 0
a1.sinks.k1.serializer.regex=(\\d+-\\d\\d-\\d\\d\\s\\d+:\\d+:\\d+)\\s.*cmd\\\":\\\"(\\d+)\\\".*\\\"phoneNum\\\":\\\"(\\d{0,13})\\\",\\\"username\\\":\\\"([a-z0-9_-]{3,16})\\\",\\\"name\\\":\\\"(.*)\\\",\\\"workOrgName\\\":\\\"(.*)\\\",\\\"workOrg\\\":\\\"(\\d+).*$
a1.sinks.k1.serializer.colNames=ROW_KEY,cmd,phone,userName,name,workOrgName,workOrg
a1.sinks.k1.channel = c1
a1.sinks.k1.kerberosPrincipal = hbase/hdp29@BIGDATA.COM
a1.sinks.k1.kerberosKeytab = /etc/security/keytabs/hbase.service.keytab
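To show how the serializer settings fit together, the Java sketch below applies the same regular expression to the 10:20:30 line from the sample input. It maps capture group i+1 to colNames[i] and uses the group at rowKeyIndex as the row key. This mirrors our understanding of RegexHbaseEventSerializer: the whole event body must match, and the group count must equal the number of column names, otherwise the event is not written. It is an illustration, not the Flume class itself, and `toRow` is a hypothetical helper. The regex is pasted unchanged from the config, assuming Flume reads the file as a standard Java properties file. Java string literals and properties files both reduce `\\` to `\` and `\"` to `"`, so the pasted text yields the same pattern.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexRowSketch {

    // serializer.regex from the configuration, pasted unchanged (split into three pieces).
    static final Pattern PATTERN = Pattern.compile(
        "(\\d+-\\d\\d-\\d\\d\\s\\d+:\\d+:\\d+)\\s.*cmd\\\":\\\"(\\d+)\\\".*\\\"phoneNum\\\":\\\"(\\d{0,13})\\\","
        + "\\\"username\\\":\\\"([a-z0-9_-]{3,16})\\\",\\\"name\\\":\\\"(.*)\\\","
        + "\\\"workOrgName\\\":\\\"(.*)\\\",\\\"workOrg\\\":\\\"(\\d+).*$");

    // serializer.colNames and serializer.rowKeyIndex from the configuration.
    static final List<String> COL_NAMES = Arrays.asList(
        "ROW_KEY", "cmd", "phone", "userName", "name", "workOrgName", "workOrg");
    static final int ROW_KEY_INDEX = 0;

    // workOrgName from the sample input, written as Unicode escapes so the
    // file compiles under any source encoding.
    static final String ORG = "\u5e7f\u5dde\u5e02\u5e02\u6c11\u670d\u52a1\u548c\u793e"
        + "\u4f1a\u4fdd\u969c\u5361\u7ba1\u7406\u4e2d\u5fc3";

    // The 10:20:30 line from the sample input (its Chinese prefix is escaped too).
    static final String SAMPLE_LINE = "2017-08-18 10:20:30 [http-bio-7080-exec-16] DEBUG - <=="
        + "\u8bf7\u6c42\u62a5\u6587:{\"cmd\":\"103006\",\"common\":{\"user\":{\"phoneNum\":\"88888888\","
        + "\"username\":\"admin\",\"name\":\"admin\",\"workOrgName\":\"" + ORG + "\","
        + "\"workOrg\":\"0000001\"}},\"params\":{\"batchType\":\"1\"}} (WebServiceImpl.java:53)";

    /**
     * Returns "rowKey" plus "cf_log:<column>" entries for one event body, or null
     * when the body would be skipped (no full match, or a group count that differs
     * from the number of column names).
     */
    static Map<String, String> toRow(String body) {
        Matcher m = PATTERN.matcher(body);
        if (!m.matches() || m.groupCount() != COL_NAMES.size()) {
            return null;
        }
        Map<String, String> row = new LinkedHashMap<>();
        for (int i = 0; i < COL_NAMES.size(); i++) {
            // Capture group i + 1 feeds colNames[i]; the rowKeyIndex group becomes the row key.
            String key = (i == ROW_KEY_INDEX) ? "rowKey" : "cf_log:" + COL_NAMES.get(i);
            row.put(key, m.group(i + 1));
        }
        return row;
    }

    public static void main(String[] args) {
        toRow(SAMPLE_LINE).forEach((column, value) -> System.out.println(column + " = " + value));
    }
}
```

The extracted row key and six cf_log columns correspond to the HBase rows shown at the end. Because the row key is a timestamp with one-second resolution, two requests logged in the same second map to the same row. With HBase's default of one version per cell, the later event's values replace the earlier ones.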
3. Start Flume, for example:
flume-ng agent -n a1 -c ../conf -f test01.conf -Dflume.root.logger=DEBUG,console
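Once Flume has started, it parses every file already in the configured directory into HBase. It renames each processed file by appending the fileSuffix, which is .COMPLETED by default. New files added to the directory afterwards are parsed automatically.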
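A quick way to see which files the source has not finished with yet is to look for names without the completed suffix. The Java sketch below lists such files in the spooling directory. The `pendingFiles` helper is hypothetical and assumes the default fileSuffix of .COMPLETED. It skips directories and hidden entries, because Flume keeps its tracking metadata in a hidden subdirectory (by default .flumespool). It only inspects file names and does not read Flume's internal state.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class SpoolStatus {

    // Default value of the fileSuffix property (assumed unchanged here).
    static final String COMPLETED_SUFFIX = ".COMPLETED";

    /** Sorted names of regular, non-hidden files in spoolDir that have not been renamed yet. */
    static List<String> pendingFiles(Path spoolDir) throws IOException {
        List<String> pending = new ArrayList<>();
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(spoolDir)) {
            for (Path entry : entries) {
                String name = entry.getFileName().toString();
                // Skip directories (including Flume's hidden tracker directory),
                // hidden files, and files already marked as completed.
                if (Files.isRegularFile(entry)
                        && !name.startsWith(".")
                        && !name.endsWith(COMPLETED_SUFFIX)) {
                    pending.add(name);
                }
            }
        }
        Collections.sort(pending);
        return pending;
    }

    public static void main(String[] args) throws IOException {
        // Defaults to spoolDir from the configuration; pass another path as the first argument.
        Path spoolDir = Paths.get(args.length > 0 ? args[0] : "/opt/data");
        System.out.println("Not yet processed: " + pendingFiles(spoolDir));
    }
}
```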
Sample input file:
2017-08-18 10:16:10 [http-bio-7080-exec-16] DEBUG - <==请求报文:{"cmd":"101005","common":{"user":{"phoneNum":"88888888","username":"admin","name":"admin","workOrgName":"广州市市民服务和社会保障卡管理中心","workOrg":"0000001"}},"params":{"batchType":"1"}} (WebServiceImpl.java:53)
2017-08-18 10:17:55 [http-bio-7080-exec-16] DEBUG - <==请求报文:{"cmd":"104009","common":{"user":{"phoneNum":"88888888","username":"admin","name":"admin","workOrgName":"广州市市民服务和社会保障卡管理中心","workOrg":"0000001"}},"params":{"batchType":"1"}} (WebServiceImpl.java:53)
2017-08-18 10:18:18 [http-bio-7080-exec-16] DEBUG - <==请求报文:{"cmd":"103002","common":{"user":{"phoneNum":"88888888","username":"admin","name":"admin","workOrgName":"广州市市民服务和社会保障卡管理中心","workOrg":"0000001"}},"params":{"batchType":"1"}} (WebServiceImpl.java:53)
2017-08-18 10:19:17 [http-bio-7080-exec-16] DEBUG - <==请求报文:{"cmd":"105005","common":{"user":{"phoneNum":"88888888","username":"admin","name":"admin","workOrgName":"广州市市民服务和社会保障卡管理中心","workOrg":"0000001"}},"params":{"batchType":"1"}} (WebServiceImpl.java:53)
2017-08-18 10:20:30 [http-bio-7080-exec-16] DEBUG - <==请求报文:{"cmd":"103006","common":{"user":{"phoneNum":"88888888","username":"admin","name":"admin","workOrgName":"广州市市民服务和社会保障卡管理中心","workOrg":"0000001"}},"params":{"batchType":"1"}} (WebServiceImpl.java:53)
2017-08-18 10:23:40 [http-bio-7080-exec-16] DEBUG - <==请求报文:{"cmd":"107009","common":{"user":{"phoneNum":"88888888","username":"admin","name":"admin","workOrgName":"广州市市民服务和社会保障卡管理中心","workOrg":"0000001"}},"params":{"batchType":"1"}} (WebServiceImpl.java:53)
....
Querying the table in HBase after parsing gives:
2017-08-18 10:20:30 column=cf_log:cmd, timestamp=1503890226394, value=103006
2017-08-18 10:20:30 column=cf_log:name, timestamp=1503890226394, value=admin
2017-08-18 10:20:30 column=cf_log:phone, timestamp=1503890226394, value=88888888
2017-08-18 10:20:30 column=cf_log:userName, timestamp=1503890226394, value=admin
2017-08-18 10:20:30 column=cf_log:workOrg, timestamp=1503890226394, value=0000001
2017-08-18 10:20:30 column=cf_log:workOrgName, timestamp=1503890226394, value=\xE5\xB9\xBF\xE5\xB7\x9E\xE5\xB8\x82\xE5\xB8\x82\xE6\xB0\x91\xE6\x9C\x8D\xE5\x8A\xA1\xE5\x92\x8C\xE7\xA4\xBE\xE4\xBC\x9A\xE4\xBF\x9D\xE9\x9A\x9C\xE5\x8D\xA1\xE7\xAE\xA1\xE7\x90\x86\xE4\xB8\xAD\xE5\xBF\x83
2017-08-18 10:23:40 column=cf_log:cmd, timestamp=1503890226396, value=107009
2017-08-18 10:23:40 column=cf_log:name, timestamp=1503890226396, value=admin
2017-08-18 10:23:40 column=cf_log:phone, timestamp=1503890226396, value=88888888
2017-08-18 10:23:40 column=cf_log:userName, timestamp=1503890226396, value=admin
2017-08-18 10:23:40 column=cf_log:workOrg, timestamp=1503890226396, value=0000001
2017-08-18 10:23:40 column=cf_log:workOrgName, timestamp=1503890226396, value=\xE5\xB9\xBF\xE5\xB7\x9E\xE5\xB8\x82\xE5\xB8\x82\xE6\xB0\x91\xE6\x9C\x8D\xE5\x8A\xA1\xE5\x92\x8C\xE7\xA4\xBE\xE4\xBC\x9A\xE4\xBF\x9D\xE9\x9A\x9C\xE5\x8D\xA1\xE7\xAE\xA1\xE7\x90\x86\xE4\xB8\xAD\xE5\xBF\x83
......
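The workOrgName values appear as \xHH sequences because the value is stored as UTF-8 bytes, and the HBase shell prints bytes outside printable ASCII in escaped form. The data itself is intact. The Java sketch below turns such shell output back into readable text. The `decode` helper is our own, not an HBase API. It assumes every character outside a \xHH escape is plain ASCII, as in the shell output above.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

public class ShellBytesDecoder {

    // The workOrgName value exactly as the HBase shell printed it above.
    static final String SHOWN = "\\xE5\\xB9\\xBF\\xE5\\xB7\\x9E\\xE5\\xB8\\x82\\xE5\\xB8\\x82"
        + "\\xE6\\xB0\\x91\\xE6\\x9C\\x8D\\xE5\\x8A\\xA1\\xE5\\x92\\x8C\\xE7\\xA4\\xBE\\xE4\\xBC\\x9A"
        + "\\xE4\\xBF\\x9D\\xE9\\x9A\\x9C\\xE5\\x8D\\xA1\\xE7\\xAE\\xA1\\xE7\\x90\\x86\\xE4\\xB8\\xAD"
        + "\\xE5\\xBF\\x83";

    /**
     * Converts HBase shell output back to text: each \xHH escape becomes one byte,
     * every other character is taken as a single ASCII byte, and the collected
     * bytes are decoded as UTF-8.
     */
    static String decode(String shown) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        int i = 0;
        while (i < shown.length()) {
            char c = shown.charAt(i);
            if (c == '\\' && i + 3 < shown.length() && shown.charAt(i + 1) == 'x') {
                // Two hex digits after the backslash-x prefix form one byte.
                bytes.write(Integer.parseInt(shown.substring(i + 2, i + 4), 16));
                i += 4;
            } else {
                bytes.write(c); // assumes plain ASCII outside escapes
                i += 1;
            }
        }
        return new String(bytes.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(decode(SHOWN));
    }
}
```

Decoding the value above yields 广州市市民服务和社会保障卡管理中心, the same workOrgName as in the sample input.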