Using Flume
1. Log collection
2. Data processing
3. What is Flume
4. Deploying Flume
5. Events
6. Using Flume
1. Collecting data to a logger (console)
1. netcat
2. exec
3. spooldir
4. taildir
2. Writing files to HDFS (hdfs sink)
1. Config file contents
2. Solving the small-file problem
3. Writing files to Hive
1. Hive regular table
2. Hive partitioned table
3. hive sink
4. Hive regular table + table with transactions enabled [ACID]
4. File compression and the file channel
5. avro
6. Sink processors
1. Failover
2. Load balancing
7. Data distribution: channel selectors
8. Data cleaning: interceptors
1. Log collection
A => batchSize
data gathering: collect the data onto the server
data collection: move the data to a designated location
2. Data processing:
1. offline processing: batch processing
the data is already sitting in place
2. real-time processing:
each record is processed as soon as it is produced
3. Flume
1. official site: flume.apache.org
2. workflow:
collecting => source
aggregating => channel
moving => sink
3. streaming data flows: Flume collects data as streams, in real time
4. core concepts: a user job is just the configuration written for an agent
agent:
source channel sink
source: collects data
interceptors => mainly process the collected data: data transformation / data cleaning
channel selectors => decide which channel the collected data is sent to
channel: buffers the collected data
sink: sends the collected data onward
sink processors => decide which sink the data is sent to
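The concepts above map onto config as a minimal skeleton (a1/r1/c1/k1 are just chosen names; every example below follows this shape):

```
# skeleton shared by every agent definition in these notes
a1.sources = r1
a1.channels = c1
a1.sinks = k1
# ... set a1.sources.r1.type and a1.sinks.k1.type per example ...
a1.sources.r1.channels = c1   # source writes into the channel
a1.sinks.k1.channel = c1      # sink reads from the channel
```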
4. Deployment
1. unpack the tarball
2. set environment variables
3. configure Flume
vim /home/hadoop/app/flume/conf/flume-env.sh
export JAVA_HOME=/home/hadoop/app/java
5. Event: one record of data
headers: descriptive metadata
body: the actual data
tagging: headers carry the markers
body carries the content
goal: land the right data in the right directory
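As a mental model only (not Flume's Java API), an event is headers plus body, and tagging is just writing a marker into the headers, which is what the static interceptor does later in these notes:

```python
# Illustrative model of a Flume event: headers (metadata) + body (raw bytes).
def tag(event, key, value):
    """Stamp a marker into the headers, in the spirit of a static
    interceptor, so the right data can later land in the right directory."""
    event["headers"][key] = value
    return event

e = {"headers": {}, "body": b"2022-01-01 GET /index.html"}
print(tag(e, "source", "access-log")["headers"])  # {'source': 'access-log'}
```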
6. Using Flume
1. Collecting data to a logger (console)
1. netcat:
reads from a given port
a1.sources = r1
a1.sinks = k1
a1.channels = c1
# netcat source
a1.sources.r1.type = netcat
# bind address (local)
a1.sources.r1.bind = localhost
# port
a1.sources.r1.port = 44444
a1.channels.c1.type = memory
# sink type is logger (console)
a1.sinks.k1.type = logger
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
Start:
flume-ng agent \
--name a1 \
--conf ${FLUME_HOME}/conf \
--conf-file /home/hadoop/project/flume/nc-mem-logger.conf \
-Dflume.root.logger=info,console
Send data to the port:
telnet localhost 44444
# or: nc localhost 44444
2. exec
reads from a given file
a1.sources = r1
a1.sinks = k1
a1.channels = c1
# exec source
a1.sources.r1.type = exec
# command that tails the file continuously
a1.sources.r1.command = tail -F /home/hadoop/emp/flume/1.log
a1.channels.c1.type = memory
# sink type is logger (console)
a1.sinks.k1.type = logger
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
Start:
flume-ng agent \
--name a1 \
--conf ${FLUME_HOME}/conf \
--conf-file /home/hadoop/project/flume/exec-mem-logger.conf \
-Dflume.root.logger=info,console
Problems with exec:
1. tail -F only picks up lines written while the agent is running
2. exec keeps no read offsets, so if Flume dies and restarts, data can be re-read (duplicated) or lost
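taildir (next section) fixes this by persisting read offsets in a JSON position file. A simplified sketch of that idea (the file format here is illustrative, not Flume's exact on-disk layout):

```python
import json, os

# Simplified offset tracking in the spirit of taildir's position file.
def read_new_lines(path, pos_file):
    """Return only lines appended since the last call; persist the offset."""
    offset = 0
    if os.path.exists(pos_file):
        offset = json.load(open(pos_file)).get(path, 0)
    with open(path) as f:
        f.seek(offset)          # resume where the previous run stopped
        lines = f.readlines()
        offset = f.tell()
    json.dump({path: offset}, open(pos_file, "w"))
    return lines
```

Run it twice over the same log: the second call returns nothing until new lines are appended, which is exactly the restart behavior exec lacks.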
3. spooldir
reads the contents of a given directory
a1.sources = r1
a1.sinks = k1
a1.channels = c1
# spooldir source
a1.sources.r1.type = spooldir
# directory to watch
a1.sources.r1.spoolDir = /home/hadoop/emp/flume/test/
a1.channels.c1.type = memory
# sink type is logger (console)
a1.sinks.k1.type = logger
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
Start:
flume-ng agent \
--name a1 \
--conf ${FLUME_HOME}/conf \
--conf-file /home/hadoop/project/flume/spooldir-mem-logger.conf \
-Dflume.root.logger=info,console
4. taildir
reads from given files and directories
a1.sources = r1
a1.sinks = k1
a1.channels = c1
# taildir source
a1.sources.r1.type = TAILDIR
# file groups f1, f2, ... to collect
a1.sources.r1.filegroups = f1 f2
# path for f1
a1.sources.r1.filegroups.f1=/home/hadoop/emp/flume/1.log
# pattern for f2
a1.sources.r1.filegroups.f2=/home/hadoop/emp/flume/test/.*.log
a1.channels.c1.type = memory
a1.sinks.k1.type = logger
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
Start:
flume-ng agent \
--name a1 \
--conf ${FLUME_HOME}/conf \
--conf-file /home/hadoop/project/flume/taildir-mem-logger.conf \
-Dflume.root.logger=info,console
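Unlike exec, taildir survives restarts because it persists its read offsets. The setting below is taildir's documented positionFile option; the path chosen here is only an example:

```
# where taildir persists per-file read offsets (survives agent restarts)
a1.sources.r1.positionFile = /home/hadoop/emp/flume/taildir_position.json
```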
2. Writing files to HDFS (hdfs sink)
1. Config file contents
a1.sources = r1
a1.sinks = k1
a1.channels = c1
a1.sources.r1.type = TAILDIR
a1.sources.r1.filegroups = f1
a1.sources.r1.filegroups.f1=/home/hadoop/emp/flume/1.log
a1.channels.c1.type = memory
# sink type is hdfs
a1.sinks.k1.type = hdfs
# HDFS path
a1.sinks.k1.hdfs.path=hdfs://bigdata13:9000/flume/log/
# write a plain data stream (the default SequenceFile output looks garbled when viewed)
a1.sinks.k1.hdfs.fileType=DataStream
# output serialization format
a1.sinks.k1.hdfs.writeFormat=Text
# file prefix
a1.sinks.k1.hdfs.filePrefix=events
# file suffix
a1.sinks.k1.hdfs.fileSuffix=.log
# use the local machine's timestamp (may be wrong if the local clock is off)
a1.sinks.k1.hdfs.useLocalTimeStamp=true
# file rolling
# roll to a new file every 60 s
a1.sinks.k1.hdfs.rollInterval=60
# roll every 128 MB
a1.sinks.k1.hdfs.rollSize=134217728
# roll every 1000 events
a1.sinks.k1.hdfs.rollCount=1000
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
2. Solving the small-file problem
1. hdfs.batchSize: not for this (it only controls events flushed per batch)
2. path rounding (may help, by bucketing output directories)
hdfs.round => whether to round down the timestamp used in the HDFS path
hdfs.roundUnit => rounding unit: second, minute or hour
hdfs.roundValue => rounding value
3. the settings that actually matter: file rolling
hdfs.rollInterval => roll by time (seconds)
hdfs.rollSize => roll by file size (134217728 bytes => 128 MB)
hdfs.rollCount => roll by number of events
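A quick sanity check of the size constant, plus a sketch of the rolling rule (my reading of the docs: a file rolls when any enabled threshold is hit, and a value of 0 disables that threshold):

```python
# 134217728 bytes is 128 MB (not 128 GB)
assert 134217728 == 128 * 1024 * 1024

def should_roll(elapsed_s, size_bytes, event_count,
                roll_interval=60, roll_size=134217728, roll_count=1000):
    """Roll when ANY enabled threshold is reached; a setting of 0 disables it."""
    return bool((roll_interval and elapsed_s >= roll_interval)
                or (roll_size and size_bytes >= roll_size)
                or (roll_count and event_count >= roll_count))

print(should_roll(10, 1024, 999))  # False: no threshold reached yet
print(should_roll(61, 1024, 5))    # True: 60 s elapsed
```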
4. Config file contents
a1.sources = r1
a1.sinks = k1
a1.channels = c1
a1.sources.r1.type = TAILDIR
a1.sources.r1.filegroups = f1
a1.sources.r1.filegroups.f1=/home/hadoop/emp/flume/1.log
a1.channels.c1.type = memory
a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path=hdfs://bigdata13:9000/flume/log/
a1.sinks.k1.hdfs.fileType=DataStream
# for reference
a1.sinks.k1.hdfs.writeFormat=Text
# round the path timestamp down to 1-minute buckets
a1.sinks.k1.hdfs.round=true
a1.sinks.k1.hdfs.roundUnit=minute
a1.sinks.k1.hdfs.roundValue=1
# file rolling
a1.sinks.k1.hdfs.rollInterval=60
a1.sinks.k1.hdfs.rollSize=134217728
a1.sinks.k1.hdfs.rollCount=10
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
Start Flume:
flume-ng agent \
--name a1 \
--conf ${FLUME_HOME}/conf \
--conf-file /home/hadoop/project/flume/taildir-mem-hdfs-round.conf \
-Dflume.root.logger=info,console
3. Writing files to Hive
1. Hive regular table
a1.sources = r1
a1.sinks = k1
a1.channels = c1
a1.sources.r1.type = TAILDIR
a1.sources.r1.filegroups = f1
# local source file
a1.sources.r1.filegroups.f1=/home/hadoop/emp/1.txt
a1.channels.c1.type = memory
# sink type is hdfs
a1.sinks.k1.type = hdfs
# the Hive table's warehouse path
a1.sinks.k1.hdfs.path = hdfs://bigdata13:9000/user/hive/warehouse/bigdata_hive.db/emp
a1.sinks.k1.hdfs.fileType=DataStream
a1.sinks.k1.hdfs.writeFormat=Text
a1.sinks.k1.hdfs.filePrefix=events
a1.sinks.k1.hdfs.fileSuffix=.log
a1.sinks.k1.hdfs.rollInterval=60
a1.sinks.k1.hdfs.rollSize=134217728
a1.sinks.k1.hdfs.rollCount=100
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
Start Flume:
flume-ng agent \
--name a1 \
--conf ${FLUME_HOME}/conf \
--conf-file /home/hadoop/project/flume/hive/taildir-mem--hdfs-emp.conf \
-Dflume.root.logger=info,console
2. Hive partitioned table
a1.sources = r1
a1.sinks = k1
a1.channels = c1
a1.sources.r1.type = TAILDIR
a1.sources.r1.filegroups = f1
a1.sources.r1.filegroups.f1=/home/hadoop/tmp/000000_0
a1.channels.c1.type = memory
a1.sinks.k1.type = hdfs
# table warehouse path + partition directory
a1.sinks.k1.hdfs.path = hdfs://bigdata13:9000/user/hive/warehouse/bigdata_hive.db/emp_p/deptno=10
a1.sinks.k1.hdfs.fileType=DataStream
a1.sinks.k1.hdfs.writeFormat=Text
a1.sinks.k1.hdfs.filePrefix=events
a1.sinks.k1.hdfs.fileSuffix=.log
a1.sinks.k1.hdfs.rollInterval=60
a1.sinks.k1.hdfs.rollSize=134217728
a1.sinks.k1.hdfs.rollCount=100
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
Start Flume:
flume-ng agent \
--name a1 \
--conf ${FLUME_HOME}/conf \
--conf-file /home/hadoop/project/flume/hive/taildir-mem-hdfs-emp_p.conf \
-Dflume.root.logger=info,console
3. hive sink
1. emp.txt
2. Hive emp regular table
source: taildir
channel: mem
sink: hive sink
3. Config file contents
a1.sources = r1
a1.sinks = k1
a1.channels = c1
a1.sources.r1.type = TAILDIR
a1.sources.r1.filegroups = f1
a1.sources.r1.filegroups.f1=/home/hadoop/tmp/emp.txt
a1.channels.c1.type = memory
a1.sinks.k1.type = hive
a1.sinks.k1.hive.metastore= => requires Hive's metastore service to be running
a1.sinks.k1.hive.database=bigdata_hive
a1.sinks.k1.hive.table=emp
a1.sinks.k1.serializer=DELIMITED ==> the field delimiter used in the table
a1.sinks.k1.serializer.delimiter=','
a1.sinks.k1.serializer.fieldnames=empno,ename,job,mgr,hiredate,sal,comm,deptno
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
---------------
a1.sources = r1
a1.sinks = k1
a1.channels = c1
a1.sources.r1.type = TAILDIR
a1.sources.r1.filegroups = f1
a1.sources.r1.filegroups.f1=/home/hadoop/project/flume/hive/bucket_00000
a1.channels.c1.type = memory
a1.channels.c1.transactionCapacity=15000
a1.sinks.k1.type = hive
a1.sinks.k1.hive.metastore= thrift://127.0.0.1:9083
a1.sinks.k1.hive.database=bigdata_hive
a1.sinks.k1.hive.table=emp
a1.sinks.k1.serializer=DELIMITED
a1.sinks.k1.serializer.delimiter=','
a1.sinks.k1.serializer.fieldnames=empno,ename,job,mgr,hiredate,sal,comm,deptno
a1.sinks.k1.batchSize=100
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
Errors hit: 1. set the channel's transactionCapacity to 15000, or the sink's batchSize to 100 (the hive sink's batchSize defaults to 15000);
keep the channel's transactionCapacity >= the sink's batchSize
2. add hive-hcatalog-streaming-3.1.3.jar to Flume's lib directory
Start:
flume-ng agent \
--name a1 \
--conf ${FLUME_HOME}/conf \
--conf-file /home/hadoop/project/flume/hive/taildir-mem-hive-emp.conf \
-Dflume.root.logger=info,console
4. Hive regular table + table with transactions enabled [ACID]
1. differences:
1. source: emp.txt => row-oriented storage
2. table: Hive ACID requires ORC => column-oriented storage
loading data: insert into table table_name select * from a staging table that holds emp.txt
2. sink types seen so far:
hdfs
hive => hdfs
logger (console)
avro => serialization, used to chain agents
3. a two-tier Flume setup is usually not needed
4. log => flume => hdfs
             => real-time processing
     => kafka => real-time processing
4. Compression and the file channel
source: exec / taildir
channel: mem / file
sink: hdfs => bzip2
agent:
agent1.sources = r1
agent1.sinks = k1
agent1.channels = c1
agent1.sources.r1.type = TAILDIR
agent1.sources.r1.filegroups = f1
agent1.sources.r1.filegroups.f1=/home/hadoop/tmp/codec01.log
# channel type is file
agent1.channels.c1.type = file
# checkpoint directory
agent1.channels.c1.checkpointDir = /home/hadoop/project/flume/codec
agent1.channels.c1.dataDirs = /home/hadoop/project/flume/data/codec
agent1.sinks.k1.type = hdfs
agent1.sinks.k1.hdfs.path = hdfs://bigdata13:9000/flume/bzip2/
# sink writes a compressed stream
agent1.sinks.k1.hdfs.fileType=CompressedStream
agent1.sinks.k1.hdfs.writeFormat=Text
# compression codec: bzip2
agent1.sinks.k1.hdfs.codeC=bzip2
# file prefix and suffix
agent1.sinks.k1.hdfs.filePrefix=events
agent1.sinks.k1.hdfs.fileSuffix=.bz2
# file rolling
agent1.sinks.k1.hdfs.rollInterval=60
agent1.sinks.k1.hdfs.rollSize=134217728
agent1.sinks.k1.hdfs.rollCount=100
agent1.sources.r1.channels = c1
agent1.sinks.k1.channel = c1
Start:
flume-ng agent \
--name agent1 \
--conf ${FLUME_HOME}/conf \
--conf-file /home/hadoop/project/flume/taildir-file-hdfs-bzip2.conf \
-Dflume.root.logger=info,console
5. avro: the first agent's sink becomes the second agent's source
Requirement: read data from port 1111, send it to port 2222, and finally write the data from port 2222 out (hdfs in the requirement; the example below prints to the console instead)
agents:
nc-mem-avro (opens port 1111, sinks to avro on 2222)
avro-mem-hdfs (would write the data from 2222 into hdfs)
avro-mem-logger (prints the data from 2222 to the console)
agent1: telnet localhost 1111
agent2: nc-mem-avro.conf
a1.sources = r1
a1.sinks = k1
a1.channels = c1
a1.sources.r1.type = netcat
a1.sources.r1.bind = localhost
a1.sources.r1.port = 1111
a1.channels.c1.type = memory
# sink type is avro
a1.sinks.k1.type = avro
a1.sinks.k1.hostname=bigdata13
a1.sinks.k1.port=2222
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
Start:
flume-ng agent \
--name a1 \
--conf ${FLUME_HOME}/conf \
--conf-file /home/hadoop/project/flume/avro/nc-mem-avro.conf \
-Dflume.root.logger=info,console
agent3: avro-mem-logger.conf
a1.sources = r1
a1.sinks = k1
a1.channels = c1
a1.sources.r1.type = avro
a1.sources.r1.bind = bigdata13
a1.sources.r1.port = 2222
a1.channels.c1.type = memory
a1.sinks.k1.type = logger
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
Start:
flume-ng agent \
--name a1 \
--conf ${FLUME_HOME}/conf \
--conf-file /home/hadoop/project/flume/avro/avro-mem-logger.conf \
-Dflume.root.logger=info,console
Start order: agent3 -> agent2 -> agent1 (downstream agents first)
6. Sink processors: failover and load balancing
1. Failover:
agent1:
agent1.sources = r1
agent1.sinks = k1 k2
agent1.channels = c1
agent1.sources.r1.type = netcat
agent1.sources.r1.bind = bigdata13
agent1.sources.r1.port = 1111
agent1.channels.c1.type = memory
# define sink k1 => port 2222
agent1.sinks.k1.type = avro
agent1.sinks.k1.hostname = bigdata13
agent1.sinks.k1.port = 2222
# define sink k2 => port 3333
agent1.sinks.k2.type = avro
agent1.sinks.k2.hostname = bigdata13
agent1.sinks.k2.port = 3333
# define the sink processor
agent1.sinkgroups = g1
agent1.sinkgroups.g1.sinks = k1 k2
# failover: when the higher-priority sink fails, traffic moves to the lower-priority one
agent1.sinkgroups.g1.processor.type = failover
# priority: the larger the absolute value, the higher the priority
agent1.sinkgroups.g1.processor.priority.k1 = 5
agent1.sinkgroups.g1.processor.priority.k2 = 10
# max backoff, in milliseconds (2000 = 2 s)
agent1.sinkgroups.g1.processor.maxpenalty = 2000
agent1.sources.r1.channels = c1
agent1.sinks.k1.channel = c1
agent1.sinks.k2.channel = c1
flume-ng agent \
--name agent1 \
--conf ${FLUME_HOME}/conf \
--conf-file /home/hadoop/project/flume/sink/agent1_failover.conf \
-Dflume.root.logger=info,console
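The failover rule above can be modeled in a few lines (illustrative Python, not Flume internals): always use the highest-priority live sink, falling back only when it is down.

```python
def pick_sink(priorities, down=()):
    """Failover processor in miniature: route everything to the
    highest-priority sink that is still alive."""
    live = {k: p for k, p in priorities.items() if k not in down}
    return max(live, key=live.get)

prios = {"k1": 5, "k2": 10}
print(pick_sink(prios))               # k2 (higher priority)
print(pick_sink(prios, down={"k2"}))  # k1, after k2 fails
```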
2. Load balancing (load_balance):
1. spreads the data out, adds parallelism, and reduces the pressure on each sink
2. if the second or third agent dies, all data is sent to the sinks whose agents are still up
Example: read data from port 1111, send it to ports 2222 and 3333, and finally print it to the console
3 agents:
agent1:
source: netcat
channel: mem
sink: two avro sinks => 2222, 3333
agent2: port 2222
source: avro 2222
channel: mem
sink: logger
agent3: port 3333
source: avro 3333
channel: mem
sink: logger
Configs:
agent1:
agent1.sources = r1
# two sinks, one per port
agent1.sinks = k1 k2
agent1.channels = c1
agent1.sources.r1.type = netcat
agent1.sources.r1.bind = bigdata13
agent1.sources.r1.port = 1111
agent1.channels.c1.type = memory
# define sink k1 => port 2222
agent1.sinks.k1.type = avro
agent1.sinks.k1.hostname = bigdata13
agent1.sinks.k1.port = 2222
# define sink k2 => port 3333
agent1.sinks.k2.type = avro
agent1.sinks.k2.hostname = bigdata13
agent1.sinks.k2.port = 3333
# define the sink processor
agent1.sinkgroups = g1
agent1.sinkgroups.g1.sinks = k1 k2
# load_balance: spread events across the sinks
agent1.sinkgroups.g1.processor.type = load_balance
# temporarily back off from failed sinks
agent1.sinkgroups.g1.processor.backoff = true
# round_robin: take turns; random: pick at random
agent1.sinkgroups.g1.processor.selector = round_robin
agent1.sources.r1.channels = c1
agent1.sinks.k1.channel = c1
agent1.sinks.k2.channel = c1
agent2: port 2222
agent2.sources = r1
agent2.sinks = k1
agent2.channels = c1
agent2.sources.r1.type = avro
agent2.sources.r1.bind = bigdata13
agent2.sources.r1.port = 2222
agent2.channels.c1.type = memory
agent2.sinks.k1.type = logger
agent2.sources.r1.channels = c1
agent2.sinks.k1.channel = c1
agent3: port 3333
agent3.sources = r1
agent3.sinks = k1
agent3.channels = c1
agent3.sources.r1.type = avro
agent3.sources.r1.bind = bigdata13
agent3.sources.r1.port = 3333
agent3.channels.c1.type = memory
agent3.sinks.k1.type = logger
agent3.sources.r1.channels = c1
agent3.sinks.k1.channel = c1
Start:
Start agent3:
flume-ng agent \
--name agent3 \
--conf ${FLUME_HOME}/conf \
--conf-file /home/hadoop/project/flume/sink/agent3.conf \
-Dflume.root.logger=info,console
Start agent2:
flume-ng agent \
--name agent2 \
--conf ${FLUME_HOME}/conf \
--conf-file /home/hadoop/project/flume/sink/agent2.conf \
-Dflume.root.logger=info,console
agent1:
flume-ng agent \
--name agent1 \
--conf ${FLUME_HOME}/conf \
--conf-file /home/hadoop/project/flume/sink/agent1.conf \
-Dflume.root.logger=info,console
Open the source port: telnet bigdata13 1111
3. Default sink processor: what you get with no sink group configured (a single sink)
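The round_robin selector of the load_balance processor above can be sketched as cycling through the healthy sinks, skipping any that are backing off (a simplification of the documented behavior):

```python
import itertools

def assign(sinks, events, down=()):
    """Hand each event to the next healthy sink in turn (round robin),
    skipping sinks that are currently backing off after a failure."""
    healthy = [s for s in sinks if s not in down]
    rr = itertools.cycle(healthy)
    return [(e, next(rr)) for e in events]

print(assign(["k1", "k2"], ["e1", "e2", "e3"]))
# [('e1', 'k1'), ('e2', 'k2'), ('e3', 'k1')]
print(assign(["k1", "k2"], ["e1", "e2"], down={"k2"}))
# [('e1', 'k1'), ('e2', 'k1')]
```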
7. Data distribution: channel selectors
Requirement:
one agent collects data from port 1111; one copy is sent to hdfs,
the other is sent to a logger
1. three agents to do this:
agent1: receives on 1111, sends to ports 2222 and 3333
agent2: receives on 2222, sends to a logger
agent3: receives on 3333, sends to a logger
agent1:
agent1.sources = r1
agent1.sinks = k1 k2
agent1.channels = c1 c2
agent1.sources.r1.type = netcat
agent1.sources.r1.bind = bigdata13
agent1.sources.r1.port = 1111
# 0. source-side channel selector: replicating copies every event to all channels
agent1.sources.r1.selector.type = replicating
agent1.sources.r1.channels = c1 c2
# 1. configure the two channels
agent1.channels.c1.type = memory
agent1.channels.c2.type = memory
# define sink k1 => port 2222
agent1.sinks.k1.type = avro
agent1.sinks.k1.hostname = bigdata13
agent1.sinks.k1.port = 2222
# define sink k2 => port 3333
agent1.sinks.k2.type = avro
agent1.sinks.k2.hostname = bigdata13
agent1.sinks.k2.port = 3333
# wiring
agent1.sources.r1.channels = c1 c2
agent1.sinks.k1.channel = c1
agent1.sinks.k2.channel = c2
Start the agents:
flume-ng agent \
--name agent3 \
--conf ${FLUME_HOME}/conf \
--conf-file /home/hadoop/project/flume/one2many/agent3.conf \
-Dflume.root.logger=info,console
flume-ng agent \
--name agent2 \
--conf ${FLUME_HOME}/conf \
--conf-file /home/hadoop/project/flume/one2many/agent2.conf \
-Dflume.root.logger=info,console
flume-ng agent \
--name agent1 \
--conf ${FLUME_HOME}/conf \
--conf-file /home/hadoop/project/flume/one2many/agent1.conf \
-Dflume.root.logger=info,console
telnet bigdata13 1111
8. Data cleaning: interceptors
1. several log streams are collected into one agent, which then distributes the data according to its tags
agent1:
agent1.sources = r1
agent1.sinks = k1
agent1.channels = c1
agent1.sources.r1.type = netcat
agent1.sources.r1.bind = bigdata13
# agent1's input port
agent1.sources.r1.port = 1111
# add an interceptor => data cleaning + tagging each event
agent1.sources.r1.interceptors = i1
# static interceptor: stamps a fixed header on every event
agent1.sources.r1.interceptors.i1.type = static
# header key
agent1.sources.r1.interceptors.i1.key = dl2262
# header value = boy
agent1.sources.r1.interceptors.i1.value = boy
# 0. wire the source to its channel
agent1.sources.r1.channels = c1
# 1. configure the channel
agent1.channels.c1.type = memory
# define the avro sink => port 2222
agent1.sinks.k1.type = avro
agent1.sinks.k1.hostname = bigdata13
agent1.sinks.k1.port = 2222
# wiring
agent1.sources.r1.channels = c1
agent1.sinks.k1.channel = c1
agent2:
agent2.sources = r1
agent2.sinks = k1
agent2.channels = c1
agent2.sources.r1.type = netcat
agent2.sources.r1.bind = bigdata13
agent2.sources.r1.port = 1112
# add an interceptor => data cleaning + tagging each event
agent2.sources.r1.interceptors = i1
agent2.sources.r1.interceptors.i1.type = static
agent2.sources.r1.interceptors.i1.key = dl2262
agent2.sources.r1.interceptors.i1.value = girl
# 0. wire the source to its channel
agent2.sources.r1.channels = c1
# 1. configure the channel
agent2.channels.c1.type = memory
# define the avro sink => port 2222
agent2.sinks.k1.type = avro
agent2.sinks.k1.hostname = bigdata13
agent2.sinks.k1.port = 2222
# wiring
agent2.sources.r1.channels = c1
agent2.sinks.k1.channel = c1
agent3:
agent3.sources = r1
agent3.sinks = k1
agent3.channels = c1
agent3.sources.r1.type = netcat
agent3.sources.r1.bind = bigdata13
agent3.sources.r1.port = 1113
# add an interceptor => data cleaning + tagging each event
agent3.sources.r1.interceptors = i1
agent3.sources.r1.interceptors.i1.type = static
agent3.sources.r1.interceptors.i1.key = dl2262
agent3.sources.r1.interceptors.i1.value = tea
# 0. wire the source to its channel
agent3.sources.r1.channels = c1
# 1. configure the channel
agent3.channels.c1.type = memory
# define the avro sink => port 2222
agent3.sinks.k1.type = avro
agent3.sinks.k1.hostname = bigdata13
agent3.sinks.k1.port = 2222
# wiring
agent3.sources.r1.channels = c1
agent3.sinks.k1.channel = c1
agent4:
agent4.sources = r1
agent4.sinks = k1 k2 k3
agent4.channels = c1 c2 c3
agent4.sources.r1.type = avro
agent4.sources.r1.bind = bigdata13
agent4.sources.r1.port = 2222
# 0. source-side channel selector
# multiplexing: route by the header's value
agent4.sources.r1.selector.type = multiplexing
agent4.sources.r1.selector.header = dl2262
agent4.sources.r1.selector.mapping.boy = c1
agent4.sources.r1.selector.mapping.girl = c2
agent4.sources.r1.selector.default = c3
agent4.sources.r1.channels = c1 c2 c3
# 1. configure the three channels
agent4.channels.c1.type = memory
agent4.channels.c2.type = memory
agent4.channels.c3.type = memory
# define the logger sinks
agent4.sinks.k1.type = logger
agent4.sinks.k2.type = logger
agent4.sinks.k3.type = logger
# wiring
agent4.sources.r1.channels = c1 c2 c3
agent4.sinks.k1.channel = c1
agent4.sinks.k2.channel = c2
agent4.sinks.k3.channel = c3
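What agent4's multiplexing selector does, modeled in a few lines (illustrative, not Flume internals): read the configured header, map its value to a channel, fall back to the default:

```python
def select_channel(event, header, mapping, default):
    """Mimic the multiplexing channel selector: route by one header's value."""
    return mapping.get(event["headers"].get(header), default)

mapping = {"boy": "c1", "girl": "c2"}
print(select_channel({"headers": {"dl2262": "girl"}}, "dl2262", mapping, "c3"))  # c2
print(select_channel({"headers": {"dl2262": "tea"}}, "dl2262", mapping, "c3"))   # c3
```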
Start:
flume-ng agent \
--name agent4 \
--conf ${FLUME_HOME}/conf \
--conf-file /home/hadoop/project/flume/many2one/agent4.conf \
-Dflume.root.logger=info,console
flume-ng agent \
--name agent3 \
--conf ${FLUME_HOME}/conf \
--conf-file /home/hadoop/project/flume/many2one/agent3.conf \
-Dflume.root.logger=info,console
flume-ng agent \
--name agent2 \
--conf ${FLUME_HOME}/conf \
--conf-file /home/hadoop/project/flume/many2one/agent2.conf \
-Dflume.root.logger=info,console
flume-ng agent \
--name agent1 \
--conf ${FLUME_HOME}/conf \
--conf-file /home/hadoop/project/flume/many2one/agent1.conf \
-Dflume.root.logger=info,console
telnet bigdata13 1111
telnet bigdata13 1112
telnet bigdata13 1113
9. Channels:
capacity must be >= transactionCapacity
1. capacity (default 100): max number of events the channel can hold
2. transactionCapacity (default 100): max number of events in one transaction
source => channel
channel => sink
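Why capacity must be at least transactionCapacity: one put/take transaction moves up to transactionCapacity events, and the channel can never hold more than capacity, so a larger transaction could never fill. A toy model of the constraint:

```python
class Channel:
    """Toy memory channel: capacity bounds the buffer,
    transactionCapacity bounds one put/take batch."""
    def __init__(self, capacity=100, transaction_capacity=100):
        assert capacity >= transaction_capacity, \
            "capacity must be >= transactionCapacity"
        self.buf, self.cap, self.txn = [], capacity, transaction_capacity

    def put(self, events):  # source => channel
        assert len(events) <= self.txn and len(self.buf) + len(events) <= self.cap
        self.buf.extend(events)

    def take(self):  # channel => sink
        batch, self.buf = self.buf[:self.txn], self.buf[self.txn:]
        return batch

ch = Channel(capacity=100, transaction_capacity=10)
ch.put([1, 2, 3])
print(ch.take())  # [1, 2, 3]
```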
10. Monitoring:
1. approaches
1. the Ganglia reporting Flume ships with [requires installing Ganglia +]
2. pass a few parameters when starting the agent and fetch the metrics over HTTP [recommended]
JSON data => HTTP endpoint =>
1. front-end devs build a dashboard on it
2. scrape the HTTP endpoint => MySQL => visualization
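Approach 2 is enabled with two JVM properties added to any of the startup commands in these notes (the port number here is an example); Flume then serves the counters as JSON:

```
flume-ng agent \
--name a1 \
--conf ${FLUME_HOME}/conf \
--conf-file /home/hadoop/project/flume/nc-mem-logger.conf \
-Dflume.root.logger=info,console \
-Dflume.monitoring.type=http \
-Dflume.monitoring.port=34545
# then fetch: curl http://localhost:34545/metrics
```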