Flume Interceptors (Including Custom Interceptors)


I. Flume Interceptors

1.1 Timestamp Interceptor

(1) Timestamp.conf

#1. Define the agent and the names of the source, channel, and sink
a4.sources = r1
a4.channels = c1
a4.sinks = k1
 
#2. Configure the source
a4.sources.r1.type = spooldir
a4.sources.r1.spoolDir = /opt/module/flume-1.8.0/upload

# Define the interceptor, which adds a timestamp header to each event
a4.sources.r1.interceptors = timestamp
a4.sources.r1.interceptors.timestamp.type = org.apache.flume.interceptor.TimestampInterceptor$Builder

 

# Configure the channel
a4.channels.c1.type = memory
a4.channels.c1.capacity = 10000
a4.channels.c1.transactionCapacity = 100

# Configure the sink
a4.sinks.k1.type = hdfs
a4.sinks.k1.hdfs.path = hdfs://bigdata111:9000/flume-interceptors/%H
a4.sinks.k1.hdfs.filePrefix = events-
a4.sinks.k1.hdfs.fileType = DataStream

# Do not roll files based on event count
a4.sinks.k1.hdfs.rollCount = 0

# Roll a new file when it reaches 128 MB on HDFS
a4.sinks.k1.hdfs.rollSize = 134217728

# Roll a new file every 60 seconds
a4.sinks.k1.hdfs.rollInterval = 60

# Bind the source and sink to the channel
a4.sources.r1.channels = c1
a4.sinks.k1.channel = c1

(2) Start command:

/opt/module/flume-1.8.0/bin/flume-ng agent -n a4 \
-f /opt/module/flume-1.8.0/jobconf/flume-interceptors.conf \
-c /opt/module/flume-1.8.0/conf \
-Dflume.root.logger=INFO,console
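
To sanity-check the interceptor, you can drop a file into the spooling directory and confirm that the HDFS sink resolves the %H escape from the timestamp header. This is a minimal sketch; test.log is only a placeholder file name.

# Copy any text file into the spooling directory watched by the source
cp test.log /opt/module/flume-1.8.0/upload/

# The timestamp header lets the HDFS sink resolve %H (the current hour),
# so the data should land under a directory such as /flume-interceptors/14
hdfs dfs -ls /flume-interceptors/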

1.2 Host Interceptor

(1) Host.conf

#1. Define the agent
a1.sources= r1
a1.sinks = k1
a1.channels = c1

#2. Define the source
a1.sources.r1.type = exec
a1.sources.r1.channels = c1
a1.sources.r1.command = tail -F /opt/plus

# Interceptor
a1.sources.r1.interceptors = i1
a1.sources.r1.interceptors.i1.type = host

# When useIP is true the IP address (e.g. 192.168.1.111) is used; when false the hostname is used. The default is true.
a1.sources.r1.interceptors.i1.useIP = false
a1.sources.r1.interceptors.i1.hostHeader = agentHost

#3. Define the sink
a1.sinks.k1.type=hdfs
a1.sinks.k1.channel = c1
a1.sinks.k1.hdfs.path = hdfs://bigdata111:9000/flumehost/%{agentHost}
a1.sinks.k1.hdfs.filePrefix = plus_%{agentHost}

# Add the .log suffix to generated files
a1.sinks.k1.hdfs.fileSuffix = .log
a1.sinks.k1.hdfs.fileType = DataStream
a1.sinks.k1.hdfs.writeFormat = Text
a1.sinks.k1.hdfs.rollInterval = 10
a1.sinks.k1.hdfs.useLocalTimeStamp = true
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1

(2) Start command:

bin/flume-ng agent -c conf/ \
-f jobconf/host.conf \
-n a1 -Dflume.root.logger=INFO,console
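
To verify the host interceptor, append a line to the tailed file and check which directory it lands in. This is a minimal sketch, assuming the agent runs on the host bigdata111 used elsewhere in this post.

# Append a test line to the file tailed by the exec source
echo "host interceptor test" >> /opt/plus

# Because useIP = false, the agentHost header carries the hostname,
# so files should appear under a directory like /flumehost/bigdata111
hdfs dfs -ls /flumehost/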

1.3 UUID Interceptor

(1) uuid.conf

a1.sources = r1
a1.sinks = k1
a1.channels = c1
a1.sources.r1.type = exec
a1.sources.r1.channels = c1
a1.sources.r1.command = tail -F /opt/plus
a1.sources.r1.interceptors = i1

# The type cannot be written simply as uuid; the fully qualified class name is required, otherwise the class will not be found
a1.sources.r1.interceptors.i1.type = org.apache.flume.sink.solr.morphline.UUIDInterceptor$Builder

# If a UUID header already exists, preserve it
a1.sources.r1.interceptors.i1.preserveExisting = true
a1.sources.r1.interceptors.i1.prefix = UUID_

# If the sink type were changed to HDFS, the header information would not appear in the text written to HDFS
a1.sinks.k1.type = logger
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1

(2) Start command:

bin/flume-ng agent -c conf/ \
-f jobconf/uuid.conf \
-n a1 -Dflume.root.logger=INFO,console
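
Since the sink here is a logger, you can verify the interceptor directly on the agent console. This is a minimal sketch; the header layout in the comment only illustrates the expected shape (the UUIDInterceptor stores the value under the id header by default, prefixed with UUID_ as configured above).

# Append a test line to the file tailed by the exec source
echo "uuid interceptor test" >> /opt/plus

# On the agent console each event should be printed with an id header, roughly:
#   Event: { headers:{id=UUID_xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx} body: ... }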

1.4 Search and Replace Interceptor

(1) search.conf

#1 agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1

#2 source
a1.sources.r1.type = exec
a1.sources.r1.channels = c1
a1.sources.r1.command = tail -F /opt/plus
a1.sources.r1.interceptors = i1
a1.sources.r1.interceptors.i1.type = search_replace

# Replace every run of digits with ***, e.g. A123 becomes A***
a1.sources.r1.interceptors.i1.searchPattern = [0-9]+
a1.sources.r1.interceptors.i1.replaceString = ***
a1.sources.r1.interceptors.i1.charset = UTF-8

#3 sink
a1.sinks.k1.type = logger

#4 Channel
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

#5 bind
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1

(2) Start command:

bin/flume-ng agent -c conf/ \
-f jobconf/search.conf \
-n a1 -Dflume.root.logger=INFO,console
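
A quick way to confirm the replacement is to append a line containing digits and watch the logger output. This is a minimal sketch of such a check.

# Append a line with digits to the file tailed by the exec source
echo "A123 order456" >> /opt/plus

# Each run of digits is replaced by ***, so the logger sink should print
# a body equivalent to: A*** order***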

1.5 Regex Filter Interceptor

(1) filter.conf

#1 agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1

#2 source
a1.sources.r1.type = exec
a1.sources.r1.channels = c1
a1.sources.r1.command = tail -F /opt/plus
a1.sources.r1.interceptors = i1
a1.sources.r1.interceptors.i1.type = regex_filter
a1.sources.r1.interceptors.i1.regex = ^A.*

# If excludeEvents is false, events that do not start with A are dropped (only matching events pass); if excludeEvents is true, events starting with A are dropped.
a1.sources.r1.interceptors.i1.excludeEvents = true
a1.sinks.k1.type = logger
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1

(2) Start command:

bin/flume-ng agent -c conf/ \
-f jobconf/filter.conf \
-n a1 -Dflume.root.logger=INFO,console
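
To confirm the filter, append one line that matches ^A.* and one that does not, then watch the logger output. This is a minimal sketch of such a check.

# Append one matching and one non-matching line to the tailed file
echo "Apple line" >> /opt/plus
echo "banana line" >> /opt/plus

# With excludeEvents = true the matching line is dropped,
# so only "banana line" should reach the logger sink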

1.6 Regex Extractor Interceptor

(1) extractor.conf

#1 agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1

#2 source
a1.sources.r1.type = exec
a1.sources.r1.channels = c1
a1.sources.r1.command = tail -F /opt/plus
a1.sources.r1.interceptors = i1
a1.sources.r1.interceptors.i1.type = regex_extractor

# Sample input line: hostname is bigdata111 ip is 192.168.20.111
a1.sources.r1.interceptors.i1.regex = hostname is (.*?) ip is (.*)
a1.sources.r1.interceptors.i1.serializers = s1 s2

# hostname (custom header name) = first group (.*?) -> bigdata111
a1.sources.r1.interceptors.i1.serializers.s1.name = hostname

# ip (custom header name) = second group (.*) -> 192.168.20.111
a1.sources.r1.interceptors.i1.serializers.s2.name = ip
a1.sinks.k1.type = logger
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1

(2) Start command:

bin/flume-ng agent -c conf/ \
-f jobconf/extractor.conf \
-n a1 -Dflume.root.logger=INFO,console

Note: the headers created by the regex extractor interceptor do not appear in file names or file contents.
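
To see the extracted headers, append a line in the expected format and watch the logger output. This is a minimal sketch; the header layout in the comment only illustrates what the logger sink typically prints.

# Append a line matching the regex to the tailed file
echo "hostname is bigdata111 ip is 192.168.20.111" >> /opt/plus

# The logger sink should print the extracted headers, roughly:
#   Event: { headers:{hostname=bigdata111, ip=192.168.20.111} body: ... }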

II. Custom Flume Interceptors

Goal: convert lowercase letters in the event body to uppercase

2.1 Add pom.xml Dependencies

<dependencies>
	<!-- Flume core dependency -->
	<dependency>
		<groupId>org.apache.flume</groupId>
		<artifactId>flume-ng-core</artifactId>
		<version>1.8.0</version>
	</dependency>
</dependencies>
<build>
	<plugins>
		<!-- Packaging plugin -->
		<plugin>
			<groupId>org.apache.maven.plugins</groupId>
			<artifactId>maven-jar-plugin</artifactId>
			<version>2.4</version>
			<configuration>
				<archive>
					<manifest>
						<addClasspath>true</addClasspath>
						<classpathPrefix>lib/</classpathPrefix>
						<mainClass></mainClass>
					</manifest>
				</archive>
			</configuration>
		</plugin>
		<!-- Compiler plugin -->
		<plugin>
			<groupId>org.apache.maven.plugins</groupId>
			<artifactId>maven-compiler-plugin</artifactId>
			<configuration>
				<source>1.8</source>
				<target>1.8</target>
				<encoding>utf-8</encoding>
			</configuration>
		</plugin>
	</plugins>
</build>

2.2 Implement the Custom Interceptor

// Package name must match the type configured in the agent (ToUpCase.MyInterceptor$Builder)
package ToUpCase;

import org.apache.flume.Context;
import org.apache.flume.Event;
import org.apache.flume.interceptor.Interceptor;

import java.util.ArrayList;
import java.util.List;

public class MyInterceptor implements Interceptor {

	@Override
	public void initialize() {
	}

	@Override
	public void close() {
	}

	/**
	 * Intercepts a single event sent from the source to the channel.
	 * @param event the event to process
	 * @return the event after the business logic has been applied
	 */
	@Override
	public Event intercept(Event event) {
		// Get the byte array from the event body
		byte[] arr = event.getBody();

		// Convert the body to upper case
		event.setBody(new String(arr).toUpperCase().getBytes());

		// Return the modified event
		return event;
	}

	// Intercepts a batch of events
	@Override
	public List<Event> intercept(List<Event> events) {
		List<Event> list = new ArrayList<>();
		for (Event event : events) {
			list.add(intercept(event));
		}
		return list;
	}

	public static class Builder implements Interceptor.Builder {
		// Called by Flume to create the interceptor instance
		@Override
		public Interceptor build() {
			return new MyInterceptor();
		}

		// Reads properties from the agent configuration (none are needed here)
		@Override
		public void configure(Context context) {
		}
	}
}

Package the project into a JAR with Maven, create a jar directory under the Flume installation directory (mkdir jar), and upload the JAR to that directory, as sketched below.
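
A minimal packaging sketch, assuming the Maven artifact is named Flume with version 1.0-SNAPSHOT so that the JAR name matches the one referenced in the start command further down:

# Build the JAR from the project root
mvn clean package

# Create the jar directory under the Flume installation and copy the JAR into it
mkdir -p /opt/module/flume-1.8.0/jar
cp target/Flume-1.0-SNAPSHOT.jar /opt/module/flume-1.8.0/jar/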

2.3 Flume Configuration File

(1) ToUpCase.conf

#1.agent 
a1.sources = r1
a1.sinks =k1
a1.channels = c1

# Describe/configure the source
a1.sources.r1.type = exec
a1.sources.r1.command = tail -F /opt/plus
a1.sources.r1.interceptors = i1

# Fully qualified class name followed by $Builder
a1.sources.r1.interceptors.i1.type = ToUpCase.MyInterceptor$Builder

# Describe the sink
a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = /ToUpCase1
a1.sinks.k1.hdfs.filePrefix = events-
a1.sinks.k1.hdfs.round = true
a1.sinks.k1.hdfs.roundValue = 10
a1.sinks.k1.hdfs.roundUnit = minute
a1.sinks.k1.hdfs.rollInterval = 3
a1.sinks.k1.hdfs.rollSize = 20
a1.sinks.k1.hdfs.rollCount = 5
a1.sinks.k1.hdfs.batchSize = 1
a1.sinks.k1.hdfs.useLocalTimeStamp = true

# File type of the generated files; the default is SequenceFile, while DataStream writes plain text
a1.sinks.k1.hdfs.fileType = DataStream

# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1

(2) Start command:

bin/flume-ng agent -c conf/ -n a1 \
-f jar/ToUpCase.conf \
-C jar/Flume-1.0-SNAPSHOT.jar \
-Dflume.root.logger=DEBUG,console
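
To verify the custom interceptor end to end, append a lowercase line to the tailed file and read the output back from HDFS. This is a minimal sketch; the exact file name under /ToUpCase1 depends on the roll settings above.

# Append a lowercase line to the file tailed by the exec source
echo "hello flume" >> /opt/plus

# The interceptor upper-cases the event body, so the HDFS output should contain: HELLO FLUME
hdfs dfs -cat /ToUpCase1/events-*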

III. Other Examples

See the previous post: https://blog.csdn.net/weixin_43520450/article/details/105644663
