Using Flume to parse files in a specified directory and load them into HBase (with Kerberos enabled)

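  • With the Spooling Directory Source, you place the data to be collected in a "spooling" directory. The source watches that directory and parses new files as they appear. The event-parsing logic is pluggable. Once a file has been read completely into the channel, it is either renamed or, optionally, deleted. This example renames it.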

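  • Note that a file must not be modified after it has been placed in the spooling directory; if it is, Flume reports an error. File names must not be reused either: if a file whose name has already been used is placed in the directory, Flume reports an error.
    Properties (the full list is long, so only the essential ones are given here; see the official documentation for the rest. Properties marked ! are required):
    !type – the source type; must be set to "spooldir"
    !spoolDir – the directory to read files from, i.e. the spooling directory
    fileSuffix – the suffix appended to completely processed files (default: .COMPLETED)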
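
One way to respect both constraints is to write each file somewhere else first and move it into the spooling directory only once it is complete. The sketch below is a minimal illustration and not part of Flume: `deliver_to_spool`, the staging directory, and the file names are hypothetical. It assumes the staging and spooling directories are on the same filesystem (so that `os.rename` is atomic on POSIX) and that the default `.COMPLETED` suffix is in use.

```python
import os
from pathlib import Path

# Assumption: the source uses Flume's default fileSuffix.
COMPLETED_SUFFIX = ".COMPLETED"


def deliver_to_spool(content: bytes, staging_dir: str, spool_dir: str, name: str) -> Path:
    """Write `content` in a staging directory, then move it into the spool directory.

    Both directories must already exist and live on the same filesystem.
    """
    staging_path = Path(staging_dir) / name
    target = Path(spool_dir) / name
    processed = Path(spool_dir) / (name + COMPLETED_SUFFIX)

    # Refuse to reuse a name that is pending or has already been processed.
    # This check is not race-free; it is only meant for a single producer.
    if target.exists() or processed.exists():
        raise FileExistsError(f"file name already used in spool directory: {name}")

    # Finish writing before the file becomes visible to the source.
    with open(staging_path, "wb") as f:
        f.write(content)
        f.flush()
        os.fsync(f.fileno())

    # The file appears in the spool directory in one step and is never modified afterwards.
    os.rename(staging_path, target)
    return target
```

A producer could call, for example, `deliver_to_spool(data, "/opt/staging", "/opt/data", "app-2017-08-18.log")`, where `/opt/staging` and the file name are placeholders and `/opt/data` is the `spoolDir` used in the example that follows.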

  • Example:
    1. Create the HBase table

hbase(main):017:0> create 'test01','cf_log'

2. Write the configuration file

a1.sources = r1
a1.channels = c1 
a1.sinks = k1

a1.sources.r1.type = spooldir
a1.sources.r1.spoolDir = /opt/data
a1.sources.r1.batchSize = 1
a1.sources.r1.channels = c1 

a1.channels.c1.type = memory
a1.channels.c1.capacity = 10000
a1.channels.c1.transactionCapacity = 10000
a1.channels.c1.byteCapacityBufferPercentage = 20
a1.channels.c1.byteCapacity = 800000

a1.sinks.k1.batchSize = 1
a1.sinks.k1.type = hbase
a1.sinks.k1.table = test01
a1.sinks.k1.columnFamily = cf_log
a1.sinks.k1.serializer = org.apache.flume.sink.hbase.RegexHbaseEventSerializer

a1.sinks.k1.serializer.rowKeyIndex = 0
a1.sinks.k1.serializer.regex=(\\d+-\\d\\d-\\d\\d\\s\\d+:\\d+:\\d+)\\s.*cmd\\\":\\\"(\\d+)\\\".*\\\"phoneNum\\\":\\\"(\\d{0,13})\\\",\\\"username\\\":\\\"([a-z0-9_-]{3,16})\\\",\\\"name\\\":\\\"(.*)\\\",\\\"workOrgName\\\":\\\"(.*)\\\",\\\"workOrg\\\":\\\"(\\d+).*$
a1.sinks.k1.serializer.colNames=ROW_KEY,cmd,phone,userName,name,workOrgName,workOrg


a1.sinks.k1.channel = c1
a1.sinks.k1.kerberosPrincipal = hbase/hdp29@BIGDATA.COM
a1.sinks.k1.kerberosKeytab = /etc/security/keytabs/hbase.service.keytab 
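
The doubled backslashes in `serializer.regex` are there because the configuration is a Java properties file, where a backslash is itself an escape character. The sketch below approximates that un-escaping in Python to show which pattern the serializer ends up compiling. `unescape_properties_value` is a hypothetical helper written for this illustration (it ignores line continuations), and `RAW_VALUE` is a shortened version of the regex above, not the full pattern.

```python
import re


def unescape_properties_value(raw: str) -> str:
    """Approximate how java.util.Properties un-escapes a single-line value."""
    simple = {"t": "\t", "n": "\n", "r": "\r", "f": "\f"}
    out = []
    i = 0
    while i < len(raw):
        ch = raw[i]
        if ch != "\\" or i + 1 >= len(raw):
            out.append(ch)
            i += 1
            continue
        nxt = raw[i + 1]
        if nxt == "u" and i + 6 <= len(raw):
            # \uXXXX: four hex digits naming one character
            out.append(chr(int(raw[i + 2:i + 6], 16)))
            i += 6
        else:
            # \t, \n, \r, \f are control characters; any other escaped
            # character (including a backslash or a quote) stands for itself
            out.append(simple.get(nxt, nxt))
            i += 2
    return "".join(out)


# Shortened version of the value as it is written in the properties file.
RAW_VALUE = r'(\\d+-\\d\\d-\\d\\d\\s\\d+:\\d+:\\d+)\\s.*cmd\\\":\\\"(\\d+)\\\".*$'

effective_regex = unescape_properties_value(RAW_VALUE)
print(effective_regex)

# The un-escaped text is an ordinary regular expression with two capture groups.
pattern = re.compile(effective_regex)
print(pattern.groups)
```

Each `\\d` in the file becomes `\d` in the compiled pattern, and each `\\\"` becomes `\"`, which matches a literal double quote.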

Example command to start Flume:

flume-ng  agent -n a1  -c ../conf  -f test01.conf   -Dflume.root.logger=DEBUG,console

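After a successful start, Flume parses every file in the directory named in the configuration file into HBase and renames each file once it has been parsed. Files added to the directory later are parsed automatically.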

Source file:

2017-08-18 10:16:10 [http-bio-7080-exec-16] DEBUG  - <==请求报文:{"cmd":"101005","common":{"user":{"phoneNum":"88888888","username":"admin","name":"admin","workOrgName":"广州市市民服务和社会保障卡管理中心","workOrg":"0000001"}},"params":{"batchType":"1"}} (WebServiceImpl.java:53)
2017-08-18 10:17:55 [http-bio-7080-exec-16] DEBUG  - <==请求报文:{"cmd":"104009","common":{"user":{"phoneNum":"88888888","username":"admin","name":"admin","workOrgName":"广州市市民服务和社会保障卡管理中心","workOrg":"0000001"}},"params":{"batchType":"1"}} (WebServiceImpl.java:53)
2017-08-18 10:18:18 [http-bio-7080-exec-16] DEBUG  - <==请求报文:{"cmd":"103002","common":{"user":{"phoneNum":"88888888","username":"admin","name":"admin","workOrgName":"广州市市民服务和社会保障卡管理中心","workOrg":"0000001"}},"params":{"batchType":"1"}} (WebServiceImpl.java:53)
2017-08-18 10:19:17 [http-bio-7080-exec-16] DEBUG  - <==请求报文:{"cmd":"105005","common":{"user":{"phoneNum":"88888888","username":"admin","name":"admin","workOrgName":"广州市市民服务和社会保障卡管理中心","workOrg":"0000001"}},"params":{"batchType":"1"}} (WebServiceImpl.java:53)
2017-08-18 10:20:30 [http-bio-7080-exec-16] DEBUG  - <==请求报文:{"cmd":"103006","common":{"user":{"phoneNum":"88888888","username":"admin","name":"admin","workOrgName":"广州市市民服务和社会保障卡管理中心","workOrg":"0000001"}},"params":{"batchType":"1"}} (WebServiceImpl.java:53)
2017-08-18 10:23:40 [http-bio-7080-exec-16] DEBUG  - <==请求报文:{"cmd":"107009","common":{"user":{"phoneNum":"88888888","username":"admin","name":"admin","workOrgName":"广州市市民服务和社会保障卡管理中心","workOrg":"0000001"}},"params":{"batchType":"1"}} (WebServiceImpl.java:53)

....
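
Before starting the agent, it can help to check the regex against one of these log lines outside Flume. The following is a rough Python sketch of what the serializer configuration describes, not Flume's own code: `parse_line` is a hypothetical helper, the pattern is the properties value with the properties-file escaping already removed, and it assumes that the serializer requires the whole line to match and that the number of capture groups must equal the number of `colNames` entries.

```python
import re

# serializer.regex with the properties-file escaping removed
LOG_PATTERN = re.compile(
    r'(\d+-\d\d-\d\d\s\d+:\d+:\d+)\s.*cmd\":\"(\d+)\".*'
    r'\"phoneNum\":\"(\d{0,13})\",\"username\":\"([a-z0-9_-]{3,16})\",'
    r'\"name\":\"(.*)\",\"workOrgName\":\"(.*)\",\"workOrg\":\"(\d+).*$'
)
# serializer.colNames and serializer.rowKeyIndex from the configuration
COL_NAMES = "ROW_KEY,cmd,phone,userName,name,workOrgName,workOrg".split(",")
ROW_KEY_INDEX = 0


def parse_line(line):
    """Return (row_key, {column: value}) for a matching line, else None."""
    m = LOG_PATTERN.fullmatch(line)
    if m is None or len(m.groups()) != len(COL_NAMES):
        return None
    values = dict(zip(COL_NAMES, m.groups()))
    # The group at rowKeyIndex becomes the row key, not a column.
    row_key = values.pop(COL_NAMES[ROW_KEY_INDEX])
    return row_key, values


SAMPLE = (
    '2017-08-18 10:20:30 [http-bio-7080-exec-16] DEBUG  - <==请求报文:'
    '{"cmd":"103006","common":{"user":{"phoneNum":"88888888",'
    '"username":"admin","name":"admin",'
    '"workOrgName":"广州市市民服务和社会保障卡管理中心","workOrg":"0000001"}},'
    '"params":{"batchType":"1"}} (WebServiceImpl.java:53)'
)

row_key, columns = parse_line(SAMPLE)
print(row_key)
for name, value in columns.items():
    print(f"cf_log:{name} = {value}")
```

Because the row key here is a timestamp with one-second resolution, two log lines written in the same second would map to the same row.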

Query results in HBase after parsing:

 2017-08-18 10:20:30    column=cf_log:cmd, timestamp=1503890226394, value=103006
 2017-08-18 10:20:30    column=cf_log:name, timestamp=1503890226394, value=admin
 2017-08-18 10:20:30    column=cf_log:phone, timestamp=1503890226394, value=88888888
 2017-08-18 10:20:30    column=cf_log:userName, timestamp=1503890226394, value=admin
 2017-08-18 10:20:30    column=cf_log:workOrg, timestamp=1503890226394, value=0000001
 2017-08-18 10:20:30    column=cf_log:workOrgName, timestamp=1503890226394, value=\xE5\xB9\xBF\xE5\xB7\x9E\xE5\xB8\x82\xE5\xB8\x82\xE6\xB0\x91\xE6\x9C\x8D\xE5\x8A\xA1\xE5\x92\x8C\xE7\xA4\xBE\xE4\xBC\x9A\xE4\xBF\x9D\xE9\x9A\x9C\xE5\x8D\xA1\xE7\xAE\xA1\xE7\x90\x86\xE4\xB8\xAD\xE5\xBF\x83
 2017-08-18 10:23:40    column=cf_log:cmd, timestamp=1503890226396, value=107009
 2017-08-18 10:23:40    column=cf_log:name, timestamp=1503890226396, value=admin
 2017-08-18 10:23:40    column=cf_log:phone, timestamp=1503890226396, value=88888888
 2017-08-18 10:23:40    column=cf_log:userName, timestamp=1503890226396, value=admin
 2017-08-18 10:23:40    column=cf_log:workOrg, timestamp=1503890226396, value=0000001
 2017-08-18 10:23:40    column=cf_log:workOrgName, timestamp=1503890226396, value=\xE5\xB9\xBF\xE5\xB7\x9E\xE5\xB8\x82\xE5\xB8\x82\xE6\xB0\x91\xE6\x9C\x8D\xE5\x8A\xA1\xE5\x92\x8C\xE7\xA4\xBE\xE4\xBC\x9A\xE4\xBF\x9D\xE9\x9A\x9C\xE5\x8D\xA1\xE7\xAE\xA1\xE7\x90\x86\xE4\xB8\xAD\xE5\xBF\x83
......
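
The `workOrgName` values look garbled only because the HBase shell prints non-ASCII bytes as `\xNN` escapes; the stored bytes are the UTF-8 encoding of the original text. A small sketch to confirm this, assuming the value was written as UTF-8 (`decode_hbase_shell_value` is a hypothetical helper, not an HBase API):

```python
import re


def decode_hbase_shell_value(escaped: str) -> str:
    """Turn the \\xNN escapes printed by the HBase shell back into text."""
    raw = re.sub(
        rb"\\x([0-9A-Fa-f]{2})",
        lambda m: bytes([int(m.group(1), 16)]),
        escaped.encode("ascii"),
    )
    # Assumption: the cell holds UTF-8 encoded text.
    return raw.decode("utf-8")


shell_value = (
    r"\xE5\xB9\xBF\xE5\xB7\x9E\xE5\xB8\x82\xE5\xB8\x82\xE6\xB0\x91"
    r"\xE6\x9C\x8D\xE5\x8A\xA1\xE5\x92\x8C\xE7\xA4\xBE\xE4\xBC\x9A"
    r"\xE4\xBF\x9D\xE9\x9A\x9C\xE5\x8D\xA1\xE7\xAE\xA1\xE7\x90\x86"
    r"\xE4\xB8\xAD\xE5\xBF\x83"
)
print(decode_hbase_shell_value(shell_value))
```

The decoded string is the same `workOrgName` that appears in the source log lines.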