flume

Flume's main job is collecting and moving data. To use it you write an agent configuration file (an agent consists of sources, channels, and sinks):
1. Flume sources: listen on a port, watch a fixed directory, run a command via exec, etc.
2. Flume channels: buffer events in memory, on disk, or in memory with spill-to-disk.
3. Flume sinks: deliver data to HDFS, print it with the logger, send it to Kafka, or forward it to the next Flume agent.
4. Flume interceptors.

Flume listening on a port (the source is a netcat port, the channel is memory, the sink prints to the console)
Steps:
(1) Write the config file: vi flume/conf/a1.conf and put in the contents below.
(2) Run the agent from the flume directory: ./bin/flume-ng agent --name a1 --conf ./conf/ --conf-file ./conf/a1.conf -Dflume.root.logger=INFO,console
(3) Connect to the port: nc localhost 44444 or telnet localhost 44444 (install with yum install nc or yum install telnet). A sample test session is shown after the config below.

#Component names: a1 is the agent name passed to the flume command as --name a1; so1, c1, and s1 are the names we give the agent's three components
a1.sources=so1
a1.channels=c1
a1.sinks=s1
#Source settings. The type is netcat, which listens on a TCP port. Note the port (44444 here) is only open once the flume command is running; only then can telnet localhost 44444 or nc localhost 44444 connect to it.
a1.sources.so1.type=netcat
a1.sources.so1.bind=localhost
a1.sources.so1.port=44444
#Channel settings. memory buffers events in memory, file buffers on disk; capacity is the maximum number of events the channel can hold; transactionCapacity is the maximum number of events per transaction when a source puts or a sink takes events
a1.channels.c1.type=memory
a1.channels.c1.capacity=1000
a1.channels.c1.transactionCapacity=100
#Sink settings. The type is logger, i.e. events are printed to the log
a1.sinks.s1.type=logger
#Wiring: source-to-channels (plural, because one source can feed several channels) and sink-to-channel (singular, because each sink drains exactly one channel)
a1.sources.so1.channels=c1
a1.sinks.s1.channel=c1
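
Once the agent is up, a quick test from a second terminal looks like the session below (the netcat source acknowledges each accepted line with OK by default, and the same lines then show up as events in the agent's console output):

nc localhost 44444
hello flume
OK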

Writing files to HDFS with Flume (the source is a spooling directory, the channel is backed by disk, the sink writes to HDFS)
Steps: start HDFS -> write the conf file -> create the directories (the source directory, the checkpoint directory, the data directory, and the target directory on HDFS; example commands below) -> move the source files into the source directory -> run the agent.
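
For example, the directories from the steps could be created as follows (paths match the config below; the sample file name is an assumption picked to satisfy the includePattern):

mkdir -p /opt/flumlogfile/sources/locale
mkdir -p /opt/flumlogfile/checkpoint/locale
mkdir -p /opt/flumlogfile/data/locale
hdfs dfs -mkdir -p /kb11file/locale
cp locale_2021-05-25.txt /opt/flumlogfile/sources/locale/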

#Define the agent's three components: source, channel, sink
locale.sources=localeSource
locale.channels=localeChannel
locale.sinks=localeSink

#Source settings: type, spool directory, file name pattern, deserializer (read line by line), and max line length

locale.sources.localeSource.type=spooldir
locale.sources.localeSource.spoolDir=/opt/flumlogfile/sources/locale
locale.sources.localeSource.deserializer=LINE
locale.sources.localeSource.deserializer.maxLineLength=32000
locale.sources.localeSource.includePattern=locale_[0-9]{4}-[0-9]{2}-[0-9]{2}.txt

#Channel settings: type, checkpoint directory, data directory

locale.channels.localeChannel.type=file
locale.channels.localeChannel.checkpointDir=/opt/flumlogfile/checkpoint/locale
locale.channels.localeChannel.dataDirs=/opt/flumlogfile/data/locale

#Sink settings: sink type, file type, file prefix and suffix, HDFS path, use of the local timestamp, batch size (events per batch), and the roll policy (rollCount in events, rollSize in bytes, rollInterval in seconds; 0 disables that trigger)

locale.sinks.localeSink.type=hdfs
locale.sinks.localeSink.hdfs.fileType=DataStream
locale.sinks.localeSink.hdfs.path=hdfs://192.168.236.8:9000/kb11file/locale
locale.sinks.localeSink.hdfs.filePrefix=locale
locale.sinks.localeSink.hdfs.fileSuffix=.txt
locale.sinks.localeSink.hdfs.useLocalTimeStamp=true
locale.sinks.localeSink.hdfs.batchSize=640
locale.sinks.localeSink.hdfs.rollCount=0
locale.sinks.localeSink.hdfs.rollSize=6400000
locale.sinks.localeSink.hdfs.rollInterval=30
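#With these values a file rolls after 30 seconds or at about 6.4 MB, whichever comes first; rollCount=0 disables count-based rolling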

#Wire the source to its channel and the sink to its channel

locale.sources.localeSource.channels=localeChannel
locale.sinks.localeSink.channel=localeChannel

Run:

./bin/flume-ng agent --name locale --conf ./conf/ --conf-file ./conf/kb11job/locale-flume-logger.conf -Dflume.root.logger=INFO,console

Once a file has been fully ingested, the spooling directory source renames it with a .COMPLETED suffix by default, so the same file is not read twice.
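
The output on HDFS can then be checked from the command line:

hdfs dfs -ls /kb11file/locale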

Monitoring a file with the exec source (tail)

a1.sources=so1
a1.channels=c1
a1.sinks=s1
#Source settings. The type is exec with the command to run; you can feed it dynamically with echo "xxxx" >> /opt/events/exec
a1.sources.so1.type=exec
a1.sources.so1.command=tail -f /opt/events/exec

a1.channels.c1.type=memory
a1.channels.c1.capacity=1000
a1.channels.c1.transactionCapacity=100

a1.sinks.s1.type=logger

a1.sources.so1.channels=c1
a1.sinks.s1.channel=c1
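
To test, start the agent with the same flume-ng command as before (pointing --conf-file at this config) and append lines to the tailed file; each appended line shows up as an event in the logger output:

echo "hello exec source" >> /opt/events/exec

Note that tail -f fails if /opt/events/exec does not exist when the agent starts; tail -F, which follows the file by name and retries, is the more robust choice for files that are created later or rotated.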

Interceptors (they sit between the source and the channel, and can be written in Java with IDEA + Maven or configured directly in the agent file). Example: read data from port 44444, route each event into one of three channels depending on whether its body starts with hello, hi, or anything else, and deliver the three streams to HDFS, Kafka, and the logger respectively.

Mistakes made along the way:
Agent config: the interceptor's class name did not match the jar's package and $Builder was missing; the logger sink was declared without .type; HDFS listens on port 9000, not 9092 (9092 is Kafka's port).
Startup: HDFS was not running when the agent was started.

a1.sources=so1
a1.channels=hellochannel hichannel otherchannel
a1.sinks=hellosink hisink othersink

a1.sources.so1.type=netcat
a1.sources.so1.bind=localhost
a1.sources.so1.port=44444
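#The interceptor (InterceptorDemo, defined below) stamps each event with a 'type' header; the multiplexing selector then routes the event to whichever channel is mapped to that header value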
a1.sources.so1.interceptors=interceptor1
a1.sources.so1.interceptors.interceptor1.type=kb11.flume.InterceptorDemo$Builder
a1.sources.so1.selector.type=multiplexing
a1.sources.so1.selector.mapping.hello=hellochannel
a1.sources.so1.selector.mapping.hi=hichannel
a1.sources.so1.selector.mapping.other=otherchannel
a1.sources.so1.selector.header=type

a1.channels.hellochannel.type=memory
a1.channels.hellochannel.capacity=1000
a1.channels.hellochannel.transactionCapacity=100

a1.channels.hichannel.type=memory
a1.channels.hichannel.capacity=1000
a1.channels.hichannel.transactionCapacity=100

a1.channels.otherchannel.type=memory
a1.channels.otherchannel.capacity=1000
a1.channels.otherchannel.transactionCapacity=100
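
#Three sinks: HDFS for hello events, Kafka for hi events, the logger for everything else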

a1.sinks.hellosink.type=hdfs
a1.sinks.hellosink.hdfs.fileType=DataStream
a1.sinks.hellosink.hdfs.filePrefix=hello
a1.sinks.hellosink.hdfs.fileSuffix=.csv
a1.sinks.hellosink.hdfs.path=hdfs://192.168.236.8:9000/kb11file/hello
a1.sinks.hellosink.hdfs.useLocalTimeStamp=true
a1.sinks.hellosink.hdfs.batchSize=640
a1.sinks.hellosink.hdfs.rollCount=0
a1.sinks.hellosink.hdfs.rollSize=6400000
a1.sinks.hellosink.hdfs.rollInterval=3

a1.sinks.hisink.type=org.apache.flume.sink.kafka.KafkaSink
a1.sinks.hisink.batchSize=640
a1.sinks.hisink.brokerList=192.168.236.8:9092
a1.sinks.hisink.topic=hi
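
Assuming Kafka is reachable at 192.168.236.8:9092, the hi events can be checked with the console consumer from Kafka's bin directory (flag names vary by version; --bootstrap-server is the form for reasonably recent Kafka releases):

kafka-console-consumer.sh --bootstrap-server 192.168.236.8:9092 --topic hi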

a1.sinks.othersink.type=logger

a1.sources.so1.channels=hellochannel hichannel otherchannel
a1.sinks.hellosink.channel=hellochannel
a1.sinks.hisink.channel=hichannel
a1.sinks.othersink.channel=otherchannel

Java code for the interceptor; it must be packaged as a jar and dropped into Flume's lib directory (build and copy commands follow the code).

// Required dependency
        <dependency>
            <groupId>org.apache.flume</groupId>
            <artifactId>flume-ng-core</artifactId>
            <version>1.6.0</version>
        </dependency>
// The interceptor
package kb11.flume;


import org.apache.flume.Context;
import org.apache.flume.Event;
import org.apache.flume.interceptor.Interceptor;

import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/**
 * @Author Lu Changshuai
 * @Date 2021/5/25
 * @Description
 * Inspects each event the source receives (an event is headers plus a body): if the body
 * starts with hello, put a hello tag in the event's headers; if the body starts with hi,
 * tag it hi; otherwise tag it other.
 */

public class InterceptorDemo implements Interceptor {

    // Buffer reused by the batch intercept(List) overload below
    private ArrayList<Event> addHeaderEvents = null;

    @Override
    public void initialize() {
        addHeaderEvents = new ArrayList<>();
    }

    // Tag a single event according to how its body starts
    @Override
    public Event intercept(Event event) {
        Map<String, String> headers = event.getHeaders();
        byte[] body = event.getBody();
        String bodyStr = new String(body);
        if (bodyStr.startsWith("hello")) {
            headers.put("type", "hello");
        } else if (bodyStr.startsWith("hi")) {
            headers.put("type", "hi");
        } else {
            headers.put("type", "other");
        }
        return event;
    }
    // The batch overload runs intercept on every event in the list and returns the collection
    @Override
    public List<Event> intercept(List<Event> list) {
        addHeaderEvents.clear();
        for (Event event : list) {
            addHeaderEvents.add(intercept(event));
        }
        return addHeaderEvents;
    }

    @Override
    public void close() {
        addHeaderEvents.clear();
        addHeaderEvents = null;
    }

    // Builder hands Flume the interceptor instance; this is the $Builder class named in the agent config
    public static class Builder implements Interceptor.Builder {
        @Override
        public Interceptor build() {
            return new InterceptorDemo();
        }

        @Override
        public void configure(Context context) {

        }
    }
}
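
A minimal build-and-install flow (the jar name depends on your pom's artifactId and version, so the name below is just a placeholder; adjust the Flume lib path to your install):

mvn clean package
cp target/flume-interceptor-1.0.jar /opt/flume/lib/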
      

Run the agent:
./bin/flume-ng agent --name a1 --conf ./conf/ --conf-file ./conf/kb11job/three-flume-logger.conf -Dflume.root.logger=INFO,console
Then connect and send data:
telnet localhost 44444
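
For example, typing these three lines into the telnet session should land the first event in HDFS under /kb11file/hello, send the second to the Kafka topic hi, and print the third through the console logger:

hello flume
hi flume
goodbye flume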
