Flume Interceptors

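What interceptors do: an interceptor is a simple pluggable component that sits between a source and a channel. Before the events received by a source are written to the channel, interceptors can transform or drop them. Each interceptor only processes events received by the source it is attached to. Custom interceptors can also be written.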

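First, an important object, the Event: an event is the smallest unit of data that Flume transports. After a source reads data, it wraps the data in an event and sends the event to a channel; a sink then takes events from the channel and consumes them. An event has two parts, headers and a body: the headers are a Map<String, String>, and the body is a byte array (byte[]), which is where the actual data is stored.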
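As a rough mental model only (plain Python, not the Flume API; the names `Event`, `run_interceptors`, `add_static_header` and `drop_empty` are made up for illustration), an event can be pictured as a header map plus a bytes body, and an interceptor as a function that returns the event, possibly modified, or nothing at all to drop it:

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class Event:
    """Illustrative stand-in for a Flume event: string headers plus a bytes body."""
    body: bytes
    headers: Dict[str, str] = field(default_factory=dict)


# An interceptor takes an event and returns it (possibly modified) or None to drop it.
Interceptor = Callable[[Event], Optional[Event]]


def run_interceptors(events: List[Event], chain: List[Interceptor]) -> List[Event]:
    """Apply each interceptor in order; dropped events never reach the channel."""
    kept = []
    for event in events:
        current: Optional[Event] = event
        for interceptor in chain:
            current = interceptor(current)
            if current is None:
                break
        if current is not None:
            kept.append(current)
    return kept


def add_static_header(event: Event) -> Event:
    event.headers["source"] = "demo"  # hypothetical key and value
    return event


def drop_empty(event: Event) -> Optional[Event]:
    return event if event.body else None


result = run_interceptors([Event(b"hello"), Event(b"")], [add_static_header, drop_empty])
print([(e.headers, e.body) for e in result])
```

The second event has an empty body, so the sketch drops it before it would be handed to the channel.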

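Timestamp Interceptor: adds the current time to the event headers.
Host Interceptor: adds the hostname or IP address of the agent's host to the event headers.
Static Interceptor: adds a user-defined key and value to the event headers.
Regex Filtering Interceptor: uses a regular expression to exclude or include the events that match it.
Regex Extractor Interceptor: uses a regular expression to add the specified keys to the headers, with the matched parts as their values.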

Timestamp Interceptor

Adds a timestamp header to each event; commonly used for naming output files and directories.


Flume configuration:

a1.sources = r1  
a1.sinks = k1  
a1.channels = c1  

# Describe/configure the source  
a1.sources.r1.type = syslogtcp  
a1.sources.r1.port = 50000  
a1.sources.r1.host = 192.168.109.112  
a1.sources.r1.channels = c1     
a1.sources.r1.interceptors = i1  
# The %Y, %m, %d, %H and %M escapes in the sink path need a timestamp header,
# which this interceptor adds
a1.sources.r1.interceptors.i1.type = timestamp
a1.sources.r1.interceptors.i1.preserveExisting = false

# Describe the sink  
a1.sinks.k1.type = hdfs  
a1.sinks.k1.channel = c1  
a1.sinks.k1.hdfs.path =hdfs://carl:9000/flume/%Y-%m-%d/%H%M  
a1.sinks.k1.hdfs.filePrefix = looklook5.  
a1.sinks.k1.hdfs.fileType=DataStream  

# Use a channel which buffers events in memory
a1.channels.c1.type = memory  
a1.channels.c1.capacity = 1000  
a1.channels.c1.transactionCapacity = 100  

Send data:
echo "TimestampInterceptor" | nc 192.168.109.112 50000

Resulting file path:
cat /flume/2017-10-16/1542/looklook5.flumedata.1714568549601
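To make the link between the timestamp header and the sink path concrete, here is a small Python sketch of the substitution. It is only an illustration, not Flume's code: `resolve_path` is a made-up helper, the sample timestamp is arbitrary, and UTC is used purely to keep the output deterministic (a real agent would apply its own time zone settings).

```python
from datetime import datetime, timezone


def resolve_path(template: str, headers: dict) -> str:
    """Fill in the %Y, %m, %d, %H and %M escapes from the event's timestamp header."""
    millis = int(headers["timestamp"])  # epoch milliseconds, stored as a string
    # UTC is an assumption made here only so the example prints the same thing everywhere.
    moment = datetime.fromtimestamp(millis / 1000, tz=timezone.utc)
    for escape in ("%Y", "%m", "%d", "%H", "%M"):
        template = template.replace(escape, moment.strftime(escape))
    return template


headers = {"timestamp": "1414568549601"}
path = resolve_path("hdfs://carl:9000/flume/%Y-%m-%d/%H%M", headers)
print(path)  # hdfs://carl:9000/flume/2014-10-29/0742
```

Without a timestamp header there is nothing to format, which is why the interceptor has to be configured on the source.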

Host Interceptor: adds the hostname or IP address of the agent's host to the event headers


# Name the components on this agent  
a1.sources = r1  
a1.sinks = k1  
a1.channels = c1  

# Describe/configure the source  
a1.sources.r1.type = syslogtcp  
a1.sources.r1.port = 50000  
a1.sources.r1.host = 192.168.233.128  
a1.sources.r1.channels = c1  

a1.sources.r1.interceptors = i1 i2
a1.sources.r1.interceptors.i1.type = timestamp
a1.sources.r1.interceptors.i1.preserveExisting = false
# Host interceptor: writes this agent's hostname or IP address into a header
a1.sources.r1.interceptors.i2.type = host
# Name of the header key that receives the value
a1.sources.r1.interceptors.i2.hostHeader = hostname
# true stores the IP address, false stores the hostname
a1.sources.r1.interceptors.i2.useIP = false

# Describe the sink  
a1.sinks.k1.type = hdfs  
a1.sinks.k1.channel = c1  
a1.sinks.k1.hdfs.path =hdfs://carl:9000/flume/%Y-%m-%d/%H%M  
# %{hostname} is replaced with the value of the hostname header set by the host interceptor
a1.sinks.k1.hdfs.filePrefix = %{hostname}
a1.sinks.k1.hdfs.fileType=DataStream  

# Use a channel which buffers events in memory
a1.channels.c1.type = memory  
a1.channels.c1.capacity = 1000  
a1.channels.c1.transactionCapacity = 100  

Send data:

echo "Time&hostInterceptor1" | nc 192.168.233.128 50000
echo "Time&hostInterceptor2" | nc 192.168.233.128 50000

Generated file (car1 is the hostname):

 cat car1.flumedata.1414568549601
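The way the host header ends up in the file name can be sketched in a few lines of Python. This is a simplified model, not Flume's implementation: `host_interceptor` and `expand_headers` are hypothetical helpers, the hostname and IP are passed in by hand, and replacing a missing header with an empty string is an assumption of this sketch.

```python
import re


def host_interceptor(headers: dict, hostname: str, ip: str,
                     host_header: str = "host", use_ip: bool = True,
                     preserve_existing: bool = False) -> dict:
    """Mimic the host interceptor: store the IP or the hostname under host_header."""
    if preserve_existing and host_header in headers:
        return headers  # keep whatever value is already there
    headers[host_header] = ip if use_ip else hostname
    return headers


def expand_headers(template: str, headers: dict) -> str:
    """Replace each %{name} escape with the value of the header called name."""
    # Assumption: a header that is not present expands to an empty string.
    return re.sub(r"%\{([^}]+)\}", lambda m: headers.get(m.group(1), ""), template)


headers = host_interceptor({}, hostname="car1", ip="192.168.233.128",
                           host_header="hostname", use_ip=False)
prefix = expand_headers("%{hostname}", headers)
print(prefix)  # car1
```

With `use_ip=True` the same call would store 192.168.233.128 instead, and the file prefix would change accordingly.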

Static Interceptor: of little practical value here, so it is skipped.

Regex Filtering Interceptor: filters events with a regular expression


Here we filter out every event whose body consists entirely of digits.

a1.sources = r1  
a1.sinks = k1  
a1.channels = c1  

# Describe/configure the source  
a1.sources.r1.type = syslogtcp  
a1.sources.r1.port = 50000  
a1.sources.r1.host = 192.168.233.128  
a1.sources.r1.channels = c1  
a1.sources.r1.interceptors = i1  
a1.sources.r1.interceptors.i1.type = regex_filter
a1.sources.r1.interceptors.i1.regex = ^[0-9]*$
# excludeEvents defaults to false: only events matching the regex are kept.
# When true, events matching the regex are dropped instead.
a1.sources.r1.interceptors.i1.excludeEvents = true

# Describe the sink  
a1.sinks.k1.type = logger  

# Use a channel which buffers events in memory
a1.channels.c1.type = memory  
a1.channels.c1.capacity = 1000  
a1.channels.c1.transactionCapacity = 100  

# Bind the source and sink to the channel  
a1.sources.r1.channels = c1  
a1.sinks.k1.channel = c1  

Send data:

echo "a" | nc 192.168.233.128 50000
echo "1222" | nc 192.168.233.128 50000
echo "a222" | nc 192.168.233.128 50000

Only the second message is filtered out.
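The outcome can be checked offline with a short Python sketch. It assumes the interceptor applies the regex to the event body as a string and treats any match as a hit; `regex_filter` is a made-up helper, and Python's `re` stands in for Java's regex engine, which behaves the same way for this simple pattern.

```python
import re


def regex_filter(bodies, pattern, exclude_events=False):
    """Keep or drop bodies depending on whether the pattern is found in them."""
    compiled = re.compile(pattern)
    kept = []
    for body in bodies:
        matched = compiled.search(body) is not None
        # exclude_events=True drops matches; False keeps only matches.
        if matched != exclude_events:
            kept.append(body)
    return kept


messages = ["a", "1222", "a222"]
kept = regex_filter(messages, r"^[0-9]*$", exclude_events=True)
print(kept)  # ['a', 'a222']
```

Because the pattern is anchored at both ends, a body such as "1abc" would also pass through; dropping everything that merely starts with a digit would need a pattern like `^[0-9]`.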
