Using Flume
1. Log collection
2. Data processing
3. What is Flume
4. Deploying Flume
5. Events
6. Using Flume
1. Collecting data to a logger (console)
1. netcat
2. exec
3. spooldir
4. taildir
2. Writing files to HDFS (hdfs sink)
1. Config file contents
2. Solving the small-file problem
3. Writing files to Hive
1. Hive regular table
2. Hive partitioned table
3. hive sink
4. Hive regular table + table with transactions enabled [ACID]
4. File compression and the file channel
5. avro
6. Sink processors
1. Failover
2. Load balancing
7. Data distribution: channel selectors
8. Data cleaning: interceptors
1. Log collection
A => batchSize
data gathering: collect the data onto the server
data collection: move the data to a designated location
2. Data processing:
1. offline processing: batch processing
the data is already sitting in place
2. real-time processing:
each record is processed as soon as it is produced
3. Flume
1. official site: flume.apache.org
2. workflow:
collecting => source
aggregating => channel
moving => sink
3. streaming data flows: Flume collects data as streams, in real time
4. core concepts: a user job is just the configuration written for an agent
agent:
source channel sink
source: collects data
interceptors => mainly process the collected data: data transformation / data cleaning
channel selectors => decide which channel the collected data is sent to
channel: buffers the collected data
sink: sends the collected data onward
sink processors => decide which sink the data is sent to
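The concepts above map onto config as a minimal skeleton (a1/r1/c1/k1 are just chosen names; every example below follows this shape):

```
# skeleton shared by every agent definition in these notes
a1.sources = r1
a1.channels = c1
a1.sinks = k1
# ... set a1.sources.r1.type and a1.sinks.k1.type per example ...
a1.sources.r1.channels = c1   # source writes into the channel
a1.sinks.k1.channel = c1      # sink reads from the channel
```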
4. Deployment
1. unpack the tarball
2. set environment variables
3. configure Flume
vim /home/hadoop/app/flume/conf/flume-env.sh
export JAVA_HOME=/home/hadoop/app/java
5. Event: one record of data
headers: descriptive metadata
body: the actual data
tagging: headers carry the markers
body carries the content
goal: land the right data in the right directory
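As a mental model only (not Flume's Java API), an event is headers plus body, and tagging is just writing a marker into the headers, which is what the static interceptor does later in these notes:

```python
# Illustrative model of a Flume event: headers (metadata) + body (raw bytes).
def tag(event, key, value):
    """Stamp a marker into the headers, in the spirit of a static
    interceptor, so the right data can later land in the right directory."""
    event["headers"][key] = value
    return event

e = {"headers": {}, "body": b"2022-01-01 GET /index.html"}
print(tag(e, "source", "access-log")["headers"])  # {'source': 'access-log'}
```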
6. Using Flume
1. Collecting data to a logger (console)
1. netcat:
reads from a given port
a1.sources = r1
a1.sinks = k1
a1.channels = c1
# netcat source
a1.sources.r1.type = netcat
# bind address (local)
a1.sources.r1.bind = localhost
# port
a1.sources.r1.port = 44444
a1.channels.c1.type = memory
# sink type is logger (console)
a1.sinks.k1.type = logger
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
Start:
flume-ng agent \
--name a1 \
--conf ${FLUME_HOME}/conf \
--conf-file /home/hadoop/project/flume/nc-mem-logger.conf \
-Dflume.root.logger=info,console
Send data to the port:
telnet localhost 44444
# or: nc localhost 44444
2. exec
reads from a given file
a1.sources = r1
a1.sinks = k1
a1.channels = c1
# exec source
a1.sources.r1.type = exec
# command that tails the file continuously
a1.sources.r1.command = tail -F /home/hadoop/emp/flume/1.log
a1.channels.c1.type = memory
# sink type is logger (console)
a1.sinks.k1.type = logger
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
Start:
flume-ng agent \
--name a1 \
--conf ${FLUME_HOME}/conf \
--conf-file /home/hadoop/project/flume/exec-mem-logger.conf \
-Dflume.root.logger=info,console
Problems with exec:
1. tail -F only picks up lines written while the agent is running
2. exec keeps no read offsets, so if Flume dies and restarts, data can be re-read (duplicated) or lost
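taildir (next section) fixes this by persisting read offsets in a JSON position file. A simplified sketch of that idea (the file format here is illustrative, not Flume's exact on-disk layout):

```python
import json, os

# Simplified offset tracking in the spirit of taildir's position file.
def read_new_lines(path, pos_file):
    """Return only lines appended since the last call; persist the offset."""
    offset = 0
    if os.path.exists(pos_file):
        offset = json.load(open(pos_file)).get(path, 0)
    with open(path) as f:
        f.seek(offset)          # resume where the previous run stopped
        lines = f.readlines()
        offset = f.tell()
    json.dump({path: offset}, open(pos_file, "w"))
    return lines
```

Run it twice over the same log: the second call returns nothing until new lines are appended, which is exactly the restart behavior exec lacks.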
3. spooldir
reads the contents of a given directory
a1.sources = r1
a1.sinks = k1
a1.channels = c1
# spooldir source
a1.sources.r1.type = spooldir
# directory to watch
a1.sources.r1.spoolDir = /home/hadoop/emp/flume/test/
a1.channels.c1.type = memory
# sink type is logger (console)
a1.sinks.k1.type = logger
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
Start:
flume-ng agent \
--name a1 \
--conf ${FLUME_HOME}/conf \
--conf-file /home/hadoop/project/flume/spooldir-mem-logger.conf \
-Dflume.root.logger=info,console
4. taildir
reads from given files and directories
a1.sources = r1
a1.sinks = k1
a1.channels = c1
# taildir source
a1.sources.r1.type = TAILDIR
# file groups f1, f2, ... to collect
a1.sources.r1.filegroups = f1 f2
# path for f1
a1.sources.r1.filegroups.f1=/home/hadoop/emp/flume/1.log
# pattern for f2
a1.sources.r1.filegroups.f2=/home/hadoop/emp/flume/test/.*.log
a1.channels.c1.type = memory
a1.sinks.k1.type = logger
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
Start:
flume-ng agent \
--name a1 \
--conf ${FLUME_HOME}/conf \
--conf-file /home/hadoop/project/flume/taildir-mem-logger.conf \
-Dflume.root.logger=info,console
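Unlike exec, taildir survives restarts because it persists its read offsets. The setting below is taildir's documented positionFile option; the path chosen here is only an example:

```
# where taildir persists per-file read offsets (survives agent restarts)
a1.sources.r1.positionFile = /home/hadoop/emp/flume/taildir_position.json
```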
2. Writing files to HDFS (hdfs sink)
1. Config file contents
a1.sources = r1
a1.sinks = k1
a1.channels = c1
a1.sources.r1.type = TAILDIR
a1.sources.r1.filegroups = f1
a1.sources.r1.filegroups.f1=/home/hadoop/emp/flume/1.log
a1.channels.c1.type = memory
# sink type is hdfs
a1.sinks.k1.type = hdfs
# HDFS path
a1.sinks.k1.hdfs.path=hdfs://bigdata13:9000/flume/log/
# write a plain data stream (the default SequenceFile output looks garbled when viewed)
a1.sinks.k1.hdfs.fileType=DataStream
# output serialization format
a1.sinks.k1.hdfs.writeFormat=Text
# file prefix
a1.sinks.k1.hdfs.filePrefix=events
# file suffix
a1.sinks.k1.hdfs.fileSuffix=.log
# use the local machine's timestamp (may be wrong if the local clock is off)
a1.sinks.k1.hdfs.useLocalTimeStamp=true
# file rolling
# roll to a new file every 60 s
a1.sinks.k1.hdfs.rollInterval=60
# roll every 128 MB
a1.sinks.k1.hdfs.rollSize=134217728
# roll every 1000 events
a1.sinks.k1.hdfs.rollCount=1000
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
2. Solving the small-file problem
1. hdfs.batchSize: not for this (it only controls events flushed per batch)
2. path rounding (may help, by bucketing output directories)
hdfs.round => whether to round down the timestamp used in the HDFS path
hdfs.roundUnit => rounding unit: second, minute or hour
hdfs.roundValue => rounding value
3. the settings that actually matter: file rolling
hdfs.rollInterval => roll by time (seconds)
hdfs.rollSize => roll by file size (134217728 bytes => 128 MB)
hdfs.rollCount => roll by number of events
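A quick sanity check of the size constant, plus a sketch of the rolling rule (my reading of the docs: a file rolls when any enabled threshold is hit, and a value of 0 disables that threshold):

```python
# 134217728 bytes is 128 MB (not 128 GB)
assert 134217728 == 128 * 1024 * 1024

def should_roll(elapsed_s, size_bytes, event_count,
                roll_interval=60, roll_size=134217728, roll_count=1000):
    """Roll when ANY enabled threshold is reached; a setting of 0 disables it."""
    return bool((roll_interval and elapsed_s >= roll_interval)
                or (roll_size and size_bytes >= roll_size)
                or (roll_count and event_count >= roll_count))

print(should_roll(10, 1024, 999))  # False: no threshold reached yet
print(should_roll(61, 1024, 5))    # True: 60 s elapsed
```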
4. Config file contents
a1.sources = r1
a1.sinks = k1
a1.channels = c1
a1.sources.r1.type = TAILDIR
a1.sources.r1.filegroups = f1
a1.sources.r1.filegroups.f1=/home/hadoop/emp/flume/1.log
a1.channels.c1.type = memory
a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path=hdfs://bigdata13:9000/flume/log/
a1.sinks.k1.hdfs.fileType=DataStream
# for reference
a1.sinks.k1.hdfs.writeFormat=Text
# round the path timestamp down to 1-minute buckets
a1.sinks.k1.hdfs.round=true
a1.sinks.k1.hdfs.roundUnit=minute
a1.sinks.k1.hdfs.roundValue=1
# file rolling
a1.sinks.k1.hdfs.rollInterval=60
a1.sinks.k1.hdfs.rollSize=134217728
a1.sinks.k1.hdfs.rollCount=10
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
Start Flume:
flume-ng agent \
--name a1 \
--conf ${FLUME_HOME}/conf \
--conf-file /home/hadoop/project/flume/taildir-mem-hdfs-round.conf \
-Dflume.root.logger=info,console
3. Writing files to Hive
1. Hive regular table
a1.sources = r1
a1.sinks = k1
a1.channels = c1
a1.sources.r1.type = TAILDIR
a1.sources.r1.filegroups = f1
# local source file
a1.sources.r1.filegroups.f1=/home/hadoop/emp/1.txt
a1.channels.c1.type = memory
# sink type is hdfs
a1.sinks.k1.type = hdfs
# the Hive table's warehouse path
a1.sinks.k1.hdfs.path = hdfs://bigdata13:9000/user/hive/warehouse/bigdata_hive.db/emp
a1.sinks.k1.hdfs.fileType=DataStream
a1.sinks.k1.hdfs.writeFormat=Text
a1.sinks.k1.hdfs.filePrefix=events
a1.sinks.k1.hdfs.fileSuffix=.log
a1.sinks.k1.hdfs.rollInterval=60
a1.sinks.k1.hdfs.rollSize=134217728
a1.sinks.k1.hdfs.rollCount=100
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
Start Flume:
flume-ng agent \
--name a1 \
--conf ${FLUME_HOME}/conf \
--conf-file /home/hadoop/project/flume/hive/taildir-mem--hdfs-emp.conf \
-Dflume.root.logger=info,console
2. Hive partitioned table
a1.sources = r1
a1.sinks = k1
a1.channels = c1
a1.sources.r1.type = TAILDIR
a1.sources.r1.filegroups = f1
a1.sources.r1.filegroups.f1=/home/hadoop/tmp/000000_0
a1.channels.c1.type = memory
a1.sinks.k1.type = hdfs
# table warehouse path + partition directory
a1.sinks.k1.hdfs.path = hdfs://bigdata13:9000/user/hive/warehouse/bigdata_hive.db/emp_p/deptno=10
a1.sinks.k1.hdfs.fileType=DataStream
a1.sinks.k1.hdfs.writeFormat=Text
a1.sinks.k1.hdfs.filePrefix=events
a1.sinks.k1.hdfs.fileSuffix=.log
a1.sinks.k1.hdfs.rollInterval=60
a1.sinks.k1.hdfs.rollSize=134217728
a1.sinks.k1.hdfs.rollCount=100
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
Start Flume:
flume-ng agent \
--name a1 \
--conf ${FLUME_HOME}/conf \
--conf-file /home/hadoop/project/flume/hive/taildir-mem-hdfs-emp_p.conf \
-Dflume.root.logger=info,console
3. hive sink
1. emp.txt
2. Hive emp regular table
source: taildir
channel: mem
sink: hive sink
3. Config file contents
a1.sources = r1
a1.sinks = k1
a1.channels = c1
a1.sources.r1.type = TAILDIR
a1.sources.r1.filegroups = f1
a1.sources.r1.filegroups.f1=/home/hadoop/tmp/emp.txt
a1.channels.c1.type = memory
a1.sinks.k1.type = hive
a1.sinks.k1.hive.metastore= => requires Hive's metastore service to be running
a1.sinks.k1.hive.database=bigdata_hive
a1.sinks.k1.hive.table=emp
a1.sinks.k1.serializer=DELIMITED ==> the field delimiter used in the table
a1.sinks.k1.serializer.delimiter=','
a1.sinks.k1.serializer.fieldnames=empno,ename,job,mgr,hiredate,sal,comm,deptno
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
---------------
a1.sources = r1
a1.sinks = k1
a1.channels = c1
a1.sources.r1.type = TAILDIR
a1.sources.r1.filegroups = f1
a1.sources.r1.filegroups.f1=/home/hadoop/project/flume/hive/bucket_00000
a1.channels.c1.type = memory
a1.channels.c1.transactionCapacity=15000
a1.sinks.k1.type = hive
a1.sinks.k1.hive.metastore= thrift://127.0.0.1:9083
a1.sinks.k1.hive.database=bigdata_hive
a1.sinks.k1.hive.table=emp
a1.sinks.k1.serializer=DELIMITED
a1.sinks.k1.serializer.delimiter=','
a1.sinks.k1.serializer.fieldnames=empno,ename,job,mgr,hiredate,sal,comm,deptno
a1.sinks.k1.batchSize=100
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
Errors hit: 1. set the channel's transactionCapacity to 15000, or the sink's batchSize to 100 (the hive sink's batchSize defaults to 15000);
keep the channel's transactionCapacity >= the sink's batchSize
2. add hive-hcatalog-streaming-3.1.3.jar to Flume's lib directory
Start:
flume-ng agent \
--name a1 \
--conf ${FLUME_HOME}/conf \
--conf-file /home/hadoop/project/flume/hive/taildir-mem-hive-emp.conf \
-Dflume.root.logger=info,console
4. Hive regular table + table with transactions enabled [ACID]
1. differences:
1. source: emp.txt => row-oriented storage
2. table: Hive ACID requires ORC => column-oriented storage
loading data: insert into table table_name select * from a staging table that holds emp.txt
2. sink types seen so far:
hdfs
hive => hdfs
logger (console)
avro => serialization, used to chain agents
3. a two-tier Flume setup is usually not needed
4. log => flume => hdfs
             => real-time processing
     => kafka => real-time processing
4. Compression and the file channel
source: exec / taildir
channel: mem / file
sink: hdfs => bzip2
agent:
agent1.sources = r1
agent1.sinks = k1
agent1.channels = c1
agent1.sources.r1.type = TAILDIR
agent1.sources.r1.filegroups = f1
agent1.sources.r1.filegroups.f1=/home/hadoop/tmp/codec01.log
# channel type is file
agent1.channels.c1.type = file
# checkpoint directory
agent1.channels.c1.checkpointDir = /home/hadoop/project/flume/codec
agent1.channels.c1.dataDirs = /home/hadoop/project/flume/data/codec
agent1.sinks.k1.type = hdfs
agent1.sinks.k1.hdfs.path = hdfs://bigdata13:9000/flume/bzip2/
# sink writes a compressed stream
agent1.sinks.k1.hdfs.fileType=CompressedStream
agent1.sinks.k1.hdfs.writeFormat=Text
# compression codec: bzip2
agent1.sinks.k1.hdfs.codeC=bzip2
# file prefix and suffix
agent1.sinks.k1.hdfs.filePrefix=events
agent1.sinks.k1.hdfs.fileSuffix=.bz2
# file rolling
agent1.sinks.k1.hdfs.rollInterval=60
agent1.sinks.k1.hdfs.rollSize=134217728
agent1.sinks.k1.hdfs.rollCount=100
agent1.sources.r1.channels = c1
agent1.sinks.k1.channel = c1
Start:
flume-ng agent \
--name agent1 \
--conf ${FLUME_HOME}/conf \
--conf-file /home/hadoop/project/flume/taildir-file-hdfs-bzip2.conf \
-Dflume.root.logger=info,console
5. avro: the first agent's sink becomes the second agent's source
Requirement: read data from port 1111, send it to port 2222, and finally write the data from port 2222 out (hdfs in the requirement; the example below prints to the console instead)
agents:
nc-mem-avro (opens port 1111, sinks to avro on 2222)
avro-mem-hdfs (would write the data from 2222 into hdfs)
avro-mem-logger (prints the data from 2222 to the console)
agent1: telnet localhost 1111
agent2: nc-mem-avro.conf
a1.sources = r1
a1.sinks = k1
a1.channels = c1
a1.sources.r1.type = netcat
a1.sources.r1.bind = localhost
a1.sources.r1.port = 1111
a1.channels.c1.type = memory
# sink type is avro
a1.sinks.k1.type = avro
a1.sinks.k1.hostname=bigdata13
a1.sinks.k1.port=2222
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
Start:
flume-ng agent \
--name a1 \
--conf ${FLUME_HOME}/conf \
--conf-file /home/hadoop/project/flume/avro/nc-mem-avro.conf \
-Dflume.root.logger=info,console
agent3: avro-mem-logger.conf
a1.sources = r1
a1.sinks = k1
a1.channels = c1
a1.sources.r1.type = avro
a1.sources.r1.bind = bigdata13
a1.sources.r1.port = 2222
a1.channels.c1.type = memory
a1.sinks.k1.type = logger
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
Start:
flume-ng agent \
--name a1 \
--conf ${FLUME_HOME}/conf \
--conf-file /home/hadoop/project/flume/avro/avro-mem-logger.conf \
-Dflume.root.logger=info,console
Start order: agent3 -> agent2 -> agent1 (downstream agents first)
6. Sink processors: failover and load balancing
1. Failover:
agent1:
agent1.sources = r1
agent1.sinks = k1 k2
agent1.channels = c1
agent1.sources.r1.type = netcat
agent1.sources.r1.bind = bigdata13
agent1.sources.r1.port = 1111
agent1.channels.c1.type = memory
# define sink k1 => port 2222
agent1.sinks.k1.type = avro
agent1.sinks.k1.hostname = bigdata13
agent1.sinks.k1.port = 2222
# define sink k2 => port 3333
agent1.sinks.k2.type = avro
agent1.sinks.k2.hostname = bigdata13
agent1.sinks.k2.port = 3333
# define the sink processor
agent1.sinkgroups = g1
agent1.sinkgroups.g1.sinks = k1 k2
# failover: when the higher-priority sink fails, traffic moves to the lower-priority one
agent1.sinkgroups.g1.processor.type = failover
# priority: the larger the absolute value, the higher the priority
agent1.sinkgroups.g1.processor.priority.k1 = 5
agent1.sinkgroups.g1.processor.priority.k2 = 10
# max backoff, in milliseconds (2000 = 2 s)
agent1.sinkgroups.g1.processor.maxpenalty = 2000
agent1.sources.r1.channels = c1
agent1.sinks.k1.channel = c1
agent1.sinks.k2.channel = c1
flume-ng agent \
--name agent1 \
--conf ${FLUME_HOME}/conf \
--conf-file /home/hadoop/project/flume/sink/agent1_failover.conf \
-Dflume.root.logger=info,console
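The failover rule above can be modeled in a few lines (illustrative Python, not Flume internals): always use the highest-priority live sink, falling back only when it is down.

```python
def pick_sink(priorities, down=()):
    """Failover processor in miniature: route everything to the
    highest-priority sink that is still alive."""
    live = {k: p for k, p in priorities.items() if k not in down}
    return max(live, key=live.get)

prios = {"k1": 5, "k2": 10}
print(pick_sink(prios))               # k2 (higher priority)
print(pick_sink(prios, down={"k2"}))  # k1, after k2 fails
```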
2. Load balancing (load_balance):
1. spreads the data out, adds parallelism, and reduces the pressure on each sink
2. if the second or third agent dies, all data is sent to the sinks whose agents are still up
Example: read data from port 1111, send it to ports 2222 and 3333, and finally print it to the console
3 agents:
agent1:
source: netcat
channel: mem
sink: two avro sinks => 2222, 3333
agent2: port 2222
source: avro 2222
channel: mem
sink: logger
agent3: port 3333
source: avro 3333
channel: mem
sink: logger
Configs:
agent1:
agent1.sources = r1
# two sinks, one per port
agent1.sinks = k1 k2
agent1.channels = c1
agent1.sources.r1.type = netcat
agent1.sources.r1.bind = bigdata13
agent1.sources.r1.port = 1111
agent1.channels.c1.type = memory
# define sink k1 => port 2222
agent1.sinks.k1.type = avro
agent1.sinks.k1.hostname = bigdata13
agent1.sinks.k1.port = 2222
# define sink k2 => port 3333
agent1.sinks.k2.type = avro
agent1.sinks.k2.hostname = bigdata13
agent1.sinks.k2.port = 3333
# define the sink processor
agent1.sinkgroups = g1
agent1.sinkgroups.g1.sinks = k1 k2
# load_balance: spread events across the sinks
agent1.sinkgroups.g1.processor.type = load_balance
# temporarily back off from failed sinks
agent1.sinkgroups.g1.processor.backoff = true
# round_robin: take turns; random: pick at random
agent1.sinkgroups.g1.processor.selector = round_robin
agent1.sources.r1.channels = c1
agent1.sinks.k1.channel = c1
agent1.sinks.k2.channel = c1
agent2: port 2222
agent2.sources = r1
agent2.sinks = k1
agent2.channels = c1
agent2.sources.r1.type = avro
agent2.sources.r1.bind = bigdata13
agent2.sources.r1.port = 2222
agent2.channels.c1.type = memory
agent2.sinks.k1.type = logger
agent2.sources.r1.channels = c1
agent2.sinks.k1.channel = c1
agent3: port 3333
agent3.sources = r1
agent3.sinks = k1
agent3.channels = c1
agent3.sources.r1.type = avro
agent3.sources.r1.bind = bigdata13
agent3.sources.r1.port = 3333
agent3.channels.c1.type = memory
agent3.sinks.k1.type = logger
agent3.sources.r1.channels = c1
agent3.sinks.k1.channel = c1
Start:
Start agent3:
flume-ng agent \
--name agent3 \
--conf ${FLUME_HOME}/conf \
--conf-file /home/hadoop/project/flume/sink/agent3.conf \
-Dflume.root.logger=info,console
Start agent2:
flume-ng agent \
--name agent2 \
--conf ${FLUME_HOME}/conf \
--conf-file /home/hadoop/project/flume/sink/agent2.conf \
-Dflume.root.logger=info,console
agent1:
flume-ng agent \
--name agent1 \
--conf ${FLUME_HOME}/conf \
--conf-file /home/hadoop/project/flume/sink/agent1.conf \
-Dflume.root.logger=info,console
Open the source port: telnet bigdata13 1111
3. Default sink processor: what you get with no sink group configured (a single sink)
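The round_robin selector of the load_balance processor above can be sketched as cycling through the healthy sinks, skipping any that are backing off (a simplification of the documented behavior):

```python
import itertools

def assign(sinks, events, down=()):
    """Hand each event to the next healthy sink in turn (round robin),
    skipping sinks that are currently backing off after a failure."""
    healthy = [s for s in sinks if s not in down]
    rr = itertools.cycle(healthy)
    return [(e, next(rr)) for e in events]

print(assign(["k1", "k2"], ["e1", "e2", "e3"]))
# [('e1', 'k1'), ('e2', 'k2'), ('e3', 'k1')]
print(assign(["k1", "k2"], ["e1", "e2"], down={"k2"}))
# [('e1', 'k1'), ('e2', 'k1')]
```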
7. Data distribution: channel selectors
Requirement:
one agent collects data from port 1111; one copy is sent to hdfs,
the other is sent to a logger
1. three agents to do this:
agent1: receives on 1111, sends to ports 2222 and 3333
agent2: receives on 2222, sends to a logger
agent3: receives on 3333, sends to a logger
agent1:
agent1.sources = r1
agent1.sinks = k1 k2
agent1.channels = c1 c2
agent1.sources.r1.type = netcat
agent1.sources.r1.bind = bigdata13
agent1.sources.r1.port = 1111
# 0. source-side channel selector: replicating copies every event to all channels
agent1.sources.r1.selector.type = replicating
agent1.sources.r1.channels = c1 c2
# 1. configure the two channels
agent1.channels.c1.type = memory
agent1.channels.c2.type = memory
# define sink k1 => port 2222
agent1.sinks.k1.type = avro
agent1.sinks.k1.hostname = bigdata13
agent1.sinks.k1.port = 2222
# define sink k2 => port 3333
agent1.sinks.k2.type = avro
agent1.sinks.k2.hostname = bigdata13
agent1.sinks.k2.port = 3333
# wiring
agent1.sources.r1.channels = c1 c2
agent1.sinks.k1.channel = c1
agent1.sinks.k2.channel = c2
Start the agents:
flume-ng agent \
--name agent3 \
--conf ${FLUME_HOME}/conf \
--conf-file /home/hadoop/project/flume/one2many/agent3.conf \
-Dflume.root.logger=info,console
flume-ng agent \
--name agent2 \
--conf ${FLUME_HOME}/conf \
--conf-file /home/hadoop/project/flume/one2many/agent2.conf \
-Dflume.root.logger=info,console
flume-ng agent \
--name agent1 \
--conf ${FLUME_HOME}/conf \
--conf-file /home/hadoop/project/flume/one2many/agent1.conf \
-Dflume.root.logger=info,console
telnet bigdata13 1111
8. Data cleaning: interceptors
1. several log streams are collected into one agent, which then distributes the data according to its tags
agent1:
agent1.sources = r1
agent1.sinks = k1
agent1.channels = c1
agent1.sources.r1.type = netcat
agent1.sources.r1.bind = bigdata13
# agent1's input port
agent1.sources.r1.port = 1111
# add an interceptor => data cleaning + tagging each event
agent1.sources.r1.interceptors = i1
# static interceptor: stamps a fixed header on every event
agent1.sources.r1.interceptors.i1.type = static
# header key
agent1.sources.r1.interceptors.i1.key = dl2262
# header value = boy
agent1.sources.r1.interceptors.i1.value = boy
# 0. wire the source to its channel
agent1.sources.r1.channels = c1
# 1. configure the channel
agent1.channels.c1.type = memory
# define the avro sink => port 2222
agent1.sinks.k1.type = avro
agent1.sinks.k1.hostname = bigdata13
agent1.sinks.k1.port = 2222
# wiring
agent1.sources.r1.channels = c1
agent1.sinks.k1.channel = c1
agent2:
agent2.sources = r1
agent2.sinks = k1
agent2.channels = c1
agent2.sources.r1.type = netcat
agent2.sources.r1.bind = bigdata13
agent2.sources.r1.port = 1112
# add an interceptor => data cleaning + tagging each event
agent2.sources.r1.interceptors = i1
agent2.sources.r1.interceptors.i1.type = static
agent2.sources.r1.interceptors.i1.key = dl2262
agent2.sources.r1.interceptors.i1.value = girl
# 0. wire the source to its channel
agent2.sources.r1.channels = c1
# 1. configure the channel
agent2.channels.c1.type = memory
# define the avro sink => port 2222
agent2.sinks.k1.type = avro
agent2.sinks.k1.hostname = bigdata13
agent2.sinks.k1.port = 2222
# wiring
agent2.sources.r1.channels = c1
agent2.sinks.k1.channel = c1
agent3:
agent3.sources = r1
agent3.sinks = k1
agent3.channels = c1
agent3.sources.r1.type = netcat
agent3.sources.r1.bind = bigdata13
agent3.sources.r1.port = 1113
# add an interceptor => data cleaning + tagging each event
agent3.sources.r1.interceptors = i1
agent3.sources.r1.interceptors.i1.type = static
agent3.sources.r1.interceptors.i1.key = dl2262
agent3.sources.r1.interceptors.i1.value = tea
# 0. wire the source to its channel
agent3.sources.r1.channels = c1
# 1. configure the channel
agent3.channels.c1.type = memory
# define the avro sink => port 2222
agent3.sinks.k1.type = avro
agent3.sinks.k1.hostname = bigdata13
agent3.sinks.k1.port = 2222
# wiring
agent3.sources.r1.channels = c1
agent3.sinks.k1.channel = c1
agent4:
agent4.sources = r1
agent4.sinks = k1 k2 k3
agent4.channels = c1 c2 c3
agent4.sources.r1.type = avro
agent4.sources.r1.bind = bigdata13
agent4.sources.r1.port = 2222
# 0. source-side channel selector
# multiplexing: route by the header's value
agent4.sources.r1.selector.type = multiplexing
agent4.sources.r1.selector.header = dl2262
agent4.sources.r1.selector.mapping.boy = c1
agent4.sources.r1.selector.mapping.girl = c2
agent4.sources.r1.selector.default = c3
agent4.sources.r1.channels = c1 c2 c3
# 1. configure the three channels
agent4.channels.c1.type = memory
agent4.channels.c2.type = memory
agent4.channels.c3.type = memory
# define the logger sinks
agent4.sinks.k1.type = logger
agent4.sinks.k2.type = logger
agent4.sinks.k3.type = logger
# wiring
agent4.sources.r1.channels = c1 c2 c3
agent4.sinks.k1.channel = c1
agent4.sinks.k2.channel = c2
agent4.sinks.k3.channel = c3
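What agent4's multiplexing selector does, modeled in a few lines (illustrative, not Flume internals): read the configured header, map its value to a channel, fall back to the default:

```python
def select_channel(event, header, mapping, default):
    """Mimic the multiplexing channel selector: route by one header's value."""
    return mapping.get(event["headers"].get(header), default)

mapping = {"boy": "c1", "girl": "c2"}
print(select_channel({"headers": {"dl2262": "girl"}}, "dl2262", mapping, "c3"))  # c2
print(select_channel({"headers": {"dl2262": "tea"}}, "dl2262", mapping, "c3"))   # c3
```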
Start:
flume-ng agent \
--name agent4 \
--conf ${FLUME_HOME}/conf \
--conf-file /home/hadoop/project/flume/many2one/agent4.conf \
-Dflume.root.logger=info,console
flume-ng agent \
--name agent3 \
--conf ${FLUME_HOME}/conf \
--conf-file /home/hadoop/project/flume/many2one/agent3.conf \
-Dflume.root.logger=info,console
flume-ng agent \
--name agent2 \
--conf ${FLUME_HOME}/conf \
--conf-file /home/hadoop/project/flume/many2one/agent2.conf \
-Dflume.root.logger=info,console
flume-ng agent \
--name agent1 \
--conf ${FLUME_HOME}/conf \
--conf-file /home/hadoop/project/flume/many2one/agent1.conf \
-Dflume.root.logger=info,console
telnet bigdata13 1111
telnet bigdata13 1112
telnet bigdata13 1113
9. Channels:
capacity must be >= transactionCapacity
1. capacity (default 100): max number of events the channel can hold
2. transactionCapacity (default 100): max number of events in one transaction
source => channel
channel => sink
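Why capacity must be at least transactionCapacity: one put/take transaction moves up to transactionCapacity events, and the channel can never hold more than capacity, so a larger transaction could never fill. A toy model of the constraint:

```python
class Channel:
    """Toy memory channel: capacity bounds the buffer,
    transactionCapacity bounds one put/take batch."""
    def __init__(self, capacity=100, transaction_capacity=100):
        assert capacity >= transaction_capacity, \
            "capacity must be >= transactionCapacity"
        self.buf, self.cap, self.txn = [], capacity, transaction_capacity

    def put(self, events):  # source => channel
        assert len(events) <= self.txn and len(self.buf) + len(events) <= self.cap
        self.buf.extend(events)

    def take(self):  # channel => sink
        batch, self.buf = self.buf[:self.txn], self.buf[self.txn:]
        return batch

ch = Channel(capacity=100, transaction_capacity=10)
ch.put([1, 2, 3])
print(ch.take())  # [1, 2, 3]
```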
10. Monitoring:
1. approaches
1. the Ganglia reporting Flume ships with [requires installing Ganglia +]
2. pass a few parameters when starting the agent and fetch the metrics over HTTP [recommended]
JSON data => HTTP endpoint =>
1. front-end devs build a dashboard on it
2. scrape the HTTP endpoint => MySQL => visualization
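Approach 2 is enabled with two JVM properties added to any of the startup commands in these notes (the port number here is an example); Flume then serves the counters as JSON:

```
flume-ng agent \
--name a1 \
--conf ${FLUME_HOME}/conf \
--conf-file /home/hadoop/project/flume/nc-mem-logger.conf \
-Dflume.root.logger=info,console \
-Dflume.monitoring.type=http \
-Dflume.monitoring.port=34545
# then fetch: curl http://localhost:34545/metrics
```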