Flume Notes

Flume

Official site: flume.apache.org
Purpose: collect log data and move it from A => B.
Data acquisition: get the data onto the server where it is produced.
Data collection: move the data to a designated target location.
Data processing:
1. Offline processing: batch processing
   the data is already sitting there => process it in batches
2. Real-time processing:
   process each record as soon as it is produced
Components:
collecting  => source  : acquires the data
aggregating => channel : buffers the data the source acquired
moving      => sink    : ships the buffered data out to its destination
streaming data flows: Flume collects data as a continuous, real-time stream
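
Putting the three together, the flow through one agent looks like this (just a sketch of the pipeline described above):

log file / port  =>  source  =>  channel  =>  sink  =>  HDFS / console / next agent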
source:
avro          (serialization framework) source      ****
exec          log files                             **
spooling dir  log files                             **
taildir       log files                             ****
Kafka Source                                        **
NetCat TCP    reads data from a TCP port            **
Custom        developed by the user                 *
channel:
Memory   ****
File     ****
JDBC     *
Kafka    *
Custom   developed by the user   *
sink:
HDFS Sink     ****
Hive Sink     ****
Logger Sink   prints to the console   **
Avro Sink     ****
HBaseSinks    *
Kafka Sink    ***
Custom Sink   developed by the user

Requirements

event: one record of data
  headers: metadata describing the record
  body:    the actual payload
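
For reference, this is roughly what the logger sink used below prints for one event (body shown as hex plus text; the exact formatting can differ between Flume versions):

Event: { headers:{} body: 68 65 6C 6C 6F 20 66 6C 75 6D 65       hello flume }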
1. Use Flume to read data from a given port and print it to the console. source: netcat
a1.sources.r1.type = netcat
a1.sources.r1.bind = localhost
a1.sources.r1.port = 44444

agent:
source :  NetCat TCP
channel : Memory
sink: Logger
a1.sources = r1
a1.sinks = k1
a1.channels = c1

a1.sources.r1.type = netcat
a1.sources.r1.bind = localhost
a1.sources.r1.port = 44444

a1.channels.c1.type = memory

a1.sinks.k1.type = logger

a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1

Start Flume:

flume-ng agent \
--name a1 \
--conf ${FLUME_HOME}/conf \
--conf-file /home/hadoop/project/flume/nc-mem-logger.conf \
-Dflume.root.logger=info,console

telnet ip port
nc ip port        either command sends data to the port the netcat source is listening on

2. Collecting log files
2.1 Collect log data and print it to the console. source: exec
a1.sources.r1.type = exec
a1.sources.r1.command = tail -F /home/hadoop/1.log

agent:
source : exec
channel : mem
sink: logger
exec: collects data by running a Linux command; in practice only tail -F 1.log is used  ****
Problem with exec: it keeps no record of its position, so if Flume dies and is restarted, log content that was already collected is collected again (duplicates).

a1.sources = r1
a1.sinks = k1
a1.channels = c1

a1.sources.r1.type = exec
a1.sources.r1.command = tail -F /home/hadoop/1.log

a1.channels.c1.type = memory

a1.sinks.k1.type = logger

a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1

Start Flume:

flume-ng agent \
--name a1 \
--conf ${FLUME_HOME}/conf \
--conf-file /home/hadoop/project/flume/exec-mem-logger.conf \
-Dflume.root.logger=info,console

2.2 spooldir: collects the files under a directory (cannot be used in production)
source:spooldir
channel : mem
sink:logger
agent :

a1.sources = r1
a1.sinks = k1
a1.channels = c1
a1.sources.r1.type = spooldir
a1.sources.r1.spoolDir = /home/hadoop/data/flume_data/
a1.channels.c1.type = memory
a1.sinks.k1.type = logger
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1

Start Flume:

flume-ng agent \
--name a1 \
--conf ${FLUME_HOME}/conf \
--conf-file /home/hadoop/project/flume/spooldir-mem-logger.conf \
-Dflume.root.logger=info,console

spooldir collects from one directory and file names must not repeat: each ingested file is marked as completed, and if a file with an already-used name shows up again, collection stops and the Flume agent dies.
2.3 Collect the contents of specified files and directories and print them to the console. source: taildir
a1.sources.r1.type = TAILDIR
a1.sources.r1.filegroups = f1 f2
a1.sources.r1.filegroups.f1=/home/hadoop/tmp/user_click.txt
a1.sources.r1.filegroups.f2=/home/hadoop/data/flume_data/.*.log
source: taildir
channel : mem
sink: logger
taildir: *****
  collects the files under a directory
  collects a single file
  supports "resume from where it left off" after a restart (see the note just below)
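
The "resume" behaviour comes from a JSON position file that the taildir source maintains (by default under ~/.flume/); its location can also be set explicitly, e.g. (the path below is only an example):

a1.sources.r1.positionFile = /home/hadoop/project/flume/taildir_position.json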
agent:

a1.sources = r1
a1.sinks = k1
a1.channels = c1

a1.sources.r1.type = TAILDIR
a1.sources.r1.filegroups = f1 f2
a1.sources.r1.filegroups.f1=/home/hadoop/tmp/user_click.txt
a1.sources.r1.filegroups.f2=/home/hadoop/data/flume_data/.*.log

a1.channels.c1.type = memory

a1.sinks.k1.type = logger

a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1

Start Flume:

flume-ng agent \
--name a1 \
--conf ${FLUME_HOME}/conf \
--conf-file /home/hadoop/project/flume/taildir-mem-logger.conf \
-Dflume.root.logger=info,console

3. Collect a log file and write it to HDFS. sink: hdfs
a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = hdfs://bigdata12:9000/flume/log/
a1.sinks.k1.hdfs.fileType=DataStream
a1.sinks.k1.hdfs.writeFormat=Text
a1.sinks.k1.hdfs.filePrefix=events
a1.sinks.k1.hdfs.fileSuffix=.log
a1.sinks.k1.hdfs.useLocalTimeStamp=true
a1.sinks.k1.hdfs.rollInterval=60
a1.sinks.k1.hdfs.rollSize=134217728
a1.sinks.k1.hdfs.rollCount=1000

agent:
source : exec or taildir
channel: mem
sink: hdfs
agent:

a1.sources = r1
a1.sinks = k1
a1.channels = c1

a1.sources.r1.type = TAILDIR
a1.sources.r1.filegroups = f1
a1.sources.r1.filegroups.f1=/home/hadoop/1.log

a1.channels.c1.type = memory

a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path=hdfs://bigdata12:9000/flume/log/

a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1

Start Flume:

flume-ng agent \
--name a1 \
--conf ${FLUME_HOME}/conf \
--conf-file /home/hadoop/project/flume/taildir-mem-hdfs.conf \
-Dflume.root.logger=info,console

3.1 Why can't the files Flume writes to HDFS be viewed directly?
It is a matter of the Flume HDFS sink configuration: the default file format is a SequenceFile, not plain text.
Fix: 1. hdfs.fileType    => DataStream
     2. hdfs.writeFormat => Text

a1.sources = r1
a1.sinks = k1
a1.channels = c1

a1.sources.r1.type = TAILDIR
a1.sources.r1.filegroups = f1
a1.sources.r1.filegroups.f1=/home/hadoop/1.log

a1.channels.c1.type = memory

a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path=hdfs://bigdata12:9000/flume/log/
a1.sinks.k1.hdfs.fileType=DataStream
a1.sinks.k1.hdfs.writeFormat=Text

a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1

3.2 Turn many small files into large ones => file rolling. rollInterval (seconds), rollSize (bytes) and rollCount (number of events) each trigger a roll; the file rolls when the first configured condition is reached, and setting a value to 0 disables that condition.

a1.sources = r1
a1.sinks = k1
a1.channels = c1

a1.sources.r1.type = TAILDIR
a1.sources.r1.filegroups = f1
a1.sources.r1.filegroups.f1=/home/hadoop/1.log

a1.channels.c1.type = memory

a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path=hdfs://bigdata12:9000/flume/log/
a1.sinks.k1.hdfs.fileType=DataStream
a1.sinks.k1.hdfs.writeFormat=Text
a1.sinks.k1.hdfs.rollInterval=60
a1.sinks.k1.hdfs.rollSize=134217728
a1.sinks.k1.hdfs.rollCount=100

a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
flume-ng agent \
--name a1 \
--conf ${FLUME_HOME}/conf \
--conf-file /home/hadoop/project/flume/taildir-mem-hdfs-round.conf \
-Dflume.root.logger=info,console

3.3 Change the file prefix and suffix

a1.sources = r1
a1.sinks = k1
a1.channels = c1

a1.sources.r1.type = TAILDIR
a1.sources.r1.filegroups = f1
a1.sources.r1.filegroups.f1=/home/hadoop/2.log

a1.channels.c1.type = memory

a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = hdfs://bigdata12:9000/flume/log/
a1.sinks.k1.hdfs.fileType=DataStream
a1.sinks.k1.hdfs.writeFormat=Text
a1.sinks.k1.hdfs.filePrefix=events
a1.sinks.k1.hdfs.fileSuffix=.log
a1.sinks.k1.hdfs.useLocalTimeStamp=true
a1.sinks.k1.hdfs.rollInterval=60
a1.sinks.k1.hdfs.rollSize=134217728
a1.sinks.k1.hdfs.rollCount=1000

a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1

useLocalTimeStamp means the time used when the data is written out is the local machine's clock, not the time carried in the data itself.

flume-ng agent \
--name a1 \
--conf ${FLUME_HOME}/conf \
--conf-file /home/hadoop/project/flume/taildir-mem-hdfs-name.conf \
-Dflume.root.logger=info,console

The data-latency problem. What is data latency? Data produced earlier arrives later, and data produced later arrives earlier, so downstream business statistics become inaccurate. Flume itself introduces latency while collecting data.
How to deal with late data?
1. A Hive UDF
   rewrite the late records into the correct partitions ["data cleansing"]
2. At the Flume end, i.e. at the source of the problem
   log => flume => hive => hdfs path
   a1.sinks.k1.hdfs.useLocalTimeStamp=true =>
     the time used when the data is written out is the local machine's clock,
     not the time carried in the data itself
   event => lands under an hdfs partition
   1. useLocalTimeStamp => cannot solve data latency
   2. put the event's own time into the header => guarantees each record lands in the correct partition;
      this needs custom development on Flume, e.g. a custom interceptor (see the sketch below)
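
A minimal sketch of the header-based approach: assuming a custom interceptor has already put the event's own time into the "timestamp" header as epoch milliseconds, the HDFS sink resolves the time escapes in hdfs.path from that header rather than from the local clock, so each record lands in the partition matching when it was produced (the path layout below is only an example):

a1.sinks.k1.hdfs.path = hdfs://bigdata12:9000/flume/log/dt=%Y%m%d
a1.sinks.k1.hdfs.useLocalTimeStamp = false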
4. Loading log data into Hive
4.1 Load a log file into an ordinary (non-partitioned) Hive table
a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = hdfs://bigdata12:9000/user/hive/warehouse/bigdata_hive.db/emp
a1.sinks.k1.hdfs.fileType=DataStream
a1.sinks.k1.hdfs.writeFormat=Text
a1.sinks.k1.hdfs.filePrefix=events
a1.sinks.k1.hdfs.fileSuffix=.log
a1.sinks.k1.hdfs.rollInterval=60
a1.sinks.k1.hdfs.rollSize=134217728
a1.sinks.k1.hdfs.rollCount=100

hadoop fs -rm -r /user/hive/warehouse/bigdata_hive.db/emp/emp.txt

a1.sources = r1
a1.sinks = k1
a1.channels = c1

a1.sources.r1.type = TAILDIR
a1.sources.r1.filegroups = f1
a1.sources.r1.filegroups.f1=/home/hadoop/tmp/emp.txt

a1.channels.c1.type = memory

a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = hdfs://bigdata12:9000/user/hive/warehouse/bigdata_hive.db/emp
a1.sinks.k1.hdfs.fileType=DataStream
a1.sinks.k1.hdfs.writeFormat=Text
a1.sinks.k1.hdfs.filePrefix=events
a1.sinks.k1.hdfs.fileSuffix=.log
a1.sinks.k1.hdfs.rollInterval=60
a1.sinks.k1.hdfs.rollSize=134217728
a1.sinks.k1.hdfs.rollCount=100

a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
flume-ng agent \
--name a1 \
--conf ${FLUME_HOME}/conf \
--conf-file /home/hadoop/project/flume/hive/taildir-mem--hdfs-emp.conf \
-Dflume.root.logger=info,console

4.2 Load log data into a partitioned Hive table
a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = hdfs://bigdata12:9000/user/hive/warehouse/bigdata_hive.db/emp_p/deptno=10
a1.sinks.k1.hdfs.fileType=DataStream
a1.sinks.k1.hdfs.writeFormat=Text
a1.sinks.k1.hdfs.filePrefix=events
a1.sinks.k1.hdfs.fileSuffix=.log
a1.sinks.k1.hdfs.rollInterval=60
a1.sinks.k1.hdfs.rollSize=134217728
a1.sinks.k1.hdfs.rollCount=100

a1.sources = r1
a1.sinks = k1
a1.channels = c1

a1.sources.r1.type = TAILDIR
a1.sources.r1.filegroups = f1
a1.sources.r1.filegroups.f1=/home/hadoop/tmp/000000_0

a1.channels.c1.type = memory

a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = hdfs://bigdata12:9000/user/hive/warehouse/bigdata_hive.db/emp_p/deptno=10
a1.sinks.k1.hdfs.fileType=DataStream
a1.sinks.k1.hdfs.writeFormat=Text
a1.sinks.k1.hdfs.filePrefix=events
a1.sinks.k1.hdfs.fileSuffix=.log
a1.sinks.k1.hdfs.rollInterval=60
a1.sinks.k1.hdfs.rollSize=134217728
a1.sinks.k1.hdfs.rollCount=100

a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
flume-ng agent \
--name a1 \
--conf ${FLUME_HOME}/conf \
--conf-file /home/hadoop/project/flume/hive/taildir-mem-hdfs-emp_p.conf \
-Dflume.root.logger=info,console

Drop the partition: alter table xxx drop partition(deptno=10);
1. the data is still on HDFS
2. but the metastore has no entry for the partition
=> when the HDFS data and the partition metadata do not both exist, the Hive table used to be unable to query that data
   (Hive 3.1.2 can query it in this situation)
alter table emp_p add partition(deptno=10);   re-associates the metastore with the data already on HDFS (an alternative is shown below)
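
An alternative to adding each partition by hand is standard Hive's repair command (not from the notes above), which scans the table's HDFS directory and registers every partition it finds:

msck repair table emp_p;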
5. Requirement: read data from port 1111 and forward it to port 2222; the agent on port 2222 finally writes the data to HDFS.
sink: avro (an avro sink typically follows a netcat source in this setup)
a1.sinks.k1.type = avro
a1.sinks.k1.hostname=bigdata12
a1.sinks.k1.port=2222

source:avro
a1.sources.r1.type = avro
a1.sources.r1.bind = bigdata12
a1.sources.r1.port = 2222

telnet bigdata12 1111    typically used together with a netcat source, to send data into that port
agent:
nc-mem-avro
avro-mem-hdfs
avro-mem-logger

agent1:
a1.sources = r1
a1.sinks = k1
a1.channels = c1

a1.sources.r1.type = netcat
a1.sources.r1.bind = bigdata12
a1.sources.r1.port = 1111

a1.channels.c1.type = memory

a1.sinks.k1.type = avro
a1.sinks.k1.hostname=bigdata12
a1.sinks.k1.port=2222

a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
agent2:avro-mem-hdfs
		avro-mem-logger

a1.sources = r1
a1.sinks = k1
a1.channels = c1

a1.sources.r1.type = avro
a1.sources.r1.bind = bigdata12
a1.sources.r1.port = 2222

a1.channels.c1.type = memory

a1.sinks.k1.type = logger

a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
flume-ng agent \
--name a1 \
--conf ${FLUME_HOME}/conf \
--conf-file /home/hadoop/project/flume/avro/avro-mem-logger.conf \
-Dflume.root.logger=info,console
flume-ng agent \
--name a1 \
--conf ${FLUME_HOME}/conf \
--conf-file /home/hadoop/project/flume/avro/nc-mem-avro.conf \
-Dflume.root.logger=info,console
telnet bigdata12 1111

6. Requirement: read log data and upload it to HDFS compressed (bzip2 here; other Hadoop codecs can be configured through hdfs.codeC the same way).
a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path=hdfs://bigdata12:9000/flume/log/
a1.sinks.k1.hdfs.filePrefix=events
a1.sinks.k1.hdfs.fileSuffix=.bzip2
a1.sinks.k1.hdfs.fileType=CompressedStream
a1.sinks.k1.hdfs.codeC=bzip2
a1.sinks.k1.hdfs.writeFormat=Text
a1.sinks.k1.hdfs.rollInterval=90
a1.sinks.k1.hdfs.rollSize=134217728
a1.sinks.k1.hdfs.rollCount=1000

a1.sources = r1
a1.sinks = k1
a1.channels = c1

a1.sources.r1.type = TAILDIR
a1.sources.r1.filegroups = f1
a1.sources.r1.filegroups.f1=/home/hadoop/tmp/compress.log

a1.channels.c1.type = memory

a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path=hdfs://bigdata12:9000/flume/log/
a1.sinks.k1.hdfs.filePrefix=events
a1.sinks.k1.hdfs.fileSuffix=.bzip2
a1.sinks.k1.hdfs.fileType=CompressedStream
a1.sinks.k1.hdfs.codeC=bzip2
a1.sinks.k1.hdfs.writeFormat=Text
a1.sinks.k1.hdfs.rollInterval=90
a1.sinks.k1.hdfs.rollSize=134217728
a1.sinks.k1.hdfs.rollCount=1000

a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
flume-ng agent \
--name a1 \
--conf ${FLUME_HOME}/conf \
--conf-file /home/hadoop/project/flume/taildir-mem-hdfs-compress.conf \
-Dflume.root.logger=info,console

7. Requirement: read log data, use a file channel (channel = file), and upload the data to HDFS compressed.
a1.channels.c1.type = file
a1.channels.c1.checkpointDir = /home/hadoop/project/flume/checkpoint/codeC
a1.channels.c1.dataDirs = /home/hadoop/project/flume/data/codeC

a1.sources = r1
a1.sinks = k1
a1.channels = c1

a1.sources.r1.type = TAILDIR
a1.sources.r1.filegroups = f1
a1.sources.r1.filegroups.f1=/home/hadoop/tmp/compress.log

a1.channels.c1.type = file
a1.channels.c1.checkpointDir = /home/hadoop/project/flume/checkpoint/codeC
a1.channels.c1.dataDirs = /home/hadoop/project/flume/data/codeC

a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path=hdfs://bigdata12:9000/flume/log01/
a1.sinks.k1.hdfs.filePrefix=events
a1.sinks.k1.hdfs.fileSuffix=.bzip2
a1.sinks.k1.hdfs.fileType=CompressedStream
a1.sinks.k1.hdfs.codeC=bzip2
a1.sinks.k1.hdfs.writeFormat=Text
a1.sinks.k1.hdfs.rollInterval=90
a1.sinks.k1.hdfs.rollSize=134217728
a1.sinks.k1.hdfs.rollCount=1000

a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
flume-ng agent \
--name a1 \
--conf ${FLUME_HOME}/conf \
--conf-file /home/hadoop/project/flume/taildir-file-hdfs-compress.conf \
-Dflume.root.logger=info,console

Load balancing: load_balance

agent1.sinkgroups = g1
agent1.sinkgroups.g1.sinks = k1 k2
agent1.sinkgroups.g1.processor.type = load_balance
agent1.sinkgroups.g1.processor.backoff = true
agent1.sinkgroups.g1.processor.selector = round_robin    (or: random, choose one)
agent1.sinkgroups.g1.processor.selector.maxTimeOut=2000

Requirement: read data from port 1111, send it to port 2222 and port 3333, and finally print the data to the console.
3 agents:
agent1:
source: nc
channel : mem
sink : avro (two sinks)
agent2: port 2222
source: avro
channel : mem
sink : logger
agent3: port 3333
source: avro
channel : mem
sink : logger
Load balancing (load_balance):
1. splits the data across sinks, adds parallelism, and takes pressure off a single sink
2. if the second or the third agent dies, all data is sent to the sink whose agent is still up
Selector options:
1. send data randomly:     random
2. send data round-robin:  round_robin

agent1.sources = r1
agent1.sinks = k1 k2
agent1.channels = c1

agent1.sources.r1.type = netcat
agent1.sources.r1.bind = bigdata12
agent1.sources.r1.port = 1111

agent1.channels.c1.type = memory

agent1.sinks.k1.type = avro
agent1.sinks.k1.hostname = bigdata12
agent1.sinks.k1.port = 2222
agent1.sinks.k2.type = avro
agent1.sinks.k2.hostname = bigdata12
agent1.sinks.k2.port = 3333

agent1.sinkgroups = g1
agent1.sinkgroups.g1.sinks = k1 k2
agent1.sinkgroups.g1.processor.type = load_balance
agent1.sinkgroups.g1.processor.backoff = true
agent1.sinkgroups.g1.processor.selector = round_robin
agent1.sinkgroups.g1.processor.selector.maxTimeOut=2000

agent1.sources.r1.channels = c1
agent1.sinks.k1.channel = c1
agent1.sinks.k2.channel = c1
agent2.sources = r1
agent2.sinks = k1
agent2.channels = c1

agent2.sources.r1.type = avro
agent2.sources.r1.bind = bigdata12
agent2.sources.r1.port = 2222

agent2.channels.c1.type = memory

agent2.sinks.k1.type = logger

agent2.sources.r1.channels = c1
agent2.sinks.k1.channel = c1
agent3.sources = r1
agent3.sinks = k1
agent3.channels = c1

agent3.sources.r1.type = avro
agent3.sources.r1.bind = bigdata12
agent3.sources.r1.port = 3333

agent3.channels.c1.type = memory

agent3.sinks.k1.type = logger

agent3.sources.r1.channels = c1
agent3.sinks.k1.channel = c1

Start the agents from back to front (downstream agents first):

flume-ng agent \
--name agent3 \
--conf ${FLUME_HOME}/conf \
--conf-file /home/hadoop/project/flume/sink/agent3.conf \
-Dflume.root.logger=info,console
flume-ng agent \
--name agent2 \
--conf ${FLUME_HOME}/conf \
--conf-file /home/hadoop/project/flume/sink/agent2.conf \
-Dflume.root.logger=info,console
flume-ng agent \
--name agent1 \
--conf ${FLUME_HOME}/conf \
--conf-file /home/hadoop/project/flume/sink/agent1.conf \
-Dflume.root.logger=info,console
telnet bigdata12 1111

Failover

agent1.sinkgroups = g1
agent1.sinkgroups.g1.sinks = k1 k2
agent1.sinkgroups.g1.processor.type = failover
agent1.sinkgroups.g1.processor.priority.k1 = 5
agent1.sinkgroups.g1.processor.priority.k2 = 10
agent1.sinkgroups.g1.processor.maxpenalty = 2000

Failover handles the case where a sink fails (disaster recovery):
failover      => the higher-priority sink is used first; with the config above k2 (priority 10) is active and k1 (priority 5) takes over only if k2 fails
load_balance  => load balancing across sinks

agent1.sources = r1
agent1.sinks = k1 k2
agent1.channels = c1

agent1.sources.r1.type = netcat
agent1.sources.r1.bind = bigdata12
agent1.sources.r1.port = 1111

agent1.channels.c1.type = memory

agent1.sinks.k1.type = avro
agent1.sinks.k1.hostname = bigdata12
agent1.sinks.k1.port = 2222
agent1.sinks.k2.type = avro
agent1.sinks.k2.hostname = bigdata12
agent1.sinks.k2.port = 3333

agent1.sinkgroups = g1
agent1.sinkgroups.g1.sinks = k1 k2
agent1.sinkgroups.g1.processor.type = failover
agent1.sinkgroups.g1.processor.priority.k1 = 5
agent1.sinkgroups.g1.processor.priority.k2 = 10
agent1.sinkgroups.g1.processor.maxpenalty = 2000

agent1.sources.r1.channels = c1
agent1.sinks.k1.channel = c1
agent1.sinks.k2.channel = c1
agent2.sources = r1
agent2.sinks = k1
agent2.channels = c1

agent2.sources.r1.type = avro
agent2.sources.r1.bind = bigdata12
agent2.sources.r1.port = 2222

agent2.channels.c1.type = memory

agent2.sinks.k1.type = logger

agent2.sources.r1.channels = c1
agent2.sinks.k1.channel = c1
agent3.sources = r1
agent3.sinks = k1
agent3.channels = c1

agent3.sources.r1.type = avro
agent3.sources.r1.bind = bigdata12
agent3.sources.r1.port = 3333

agent3.channels.c1.type = memory

agent3.sinks.k1.type = logger

agent3.sources.r1.channels = c1
agent3.sinks.k1.channel = c1
flume-ng agent \
--name agent3 \
--conf ${FLUME_HOME}/conf \
--conf-file /home/hadoop/project/flume/sink/agent3.conf \
-Dflume.root.logger=info,console
flume-ng agent \
--name agent2 \
--conf ${FLUME_HOME}/conf \
--conf-file /home/hadoop/project/flume/sink/agent2.conf \
-Dflume.root.logger=info,console
flume-ng agent \
--name agent1 \
--conf ${FLUME_HOME}/conf \
--conf-file /home/hadoop/project/flume/sink/agent1_failover.conf \
-Dflume.root.logger=info,console
telnet bigdata12 1111

Flume core components

sources
interceptors      => mainly transform / cleanse the collected data
channel selectors => decide which channel the collected data goes to
agent1.sources.r1.selector.type = replicating    the same data is replicated to both channels

channels
sinks
sink processors   => decide which sink the data goes to
Requirement: define one agent that collects data from port 1111 and sends one copy to HDFS and another copy to a logger sink.

agent1.sources = r1
agent1.sinks = k1 k2
agent1.channels = c1 c2

agent1.sources.r1.type = netcat
agent1.sources.r1.bind = bigdata12
agent1.sources.r1.port = 1111

agent1.sources.r1.selector.type = replicating
agent1.channels.c1.type = memory
agent1.channels.c2.type = memory

agent1.sinks.k1.type = hdfs
agent1.sinks.k1.hdfs.path = hdfs://bigdata12:9000/flume/channel_selector/
agent1.sinks.k1.hdfs.fileType=DataStream
agent1.sinks.k1.hdfs.writeFormat=Text
agent1.sinks.k1.hdfs.filePrefix=events
agent1.sinks.k1.hdfs.fileSuffix=.log
agent1.sinks.k1.hdfs.useLocalTimeStamp=true
agent1.sinks.k1.hdfs.rollInterval=60
agent1.sinks.k1.hdfs.rollSize=134217728
agent1.sinks.k1.hdfs.rollCount=1000

agent1.sinks.k2.type = logger

agent1.sources.r1.channels = c1 c2
agent1.sinks.k1.channel = c1
agent1.sinks.k2.channel = c2
flume-ng agent \
--name agent1 \
--conf ${FLUME_HOME}/conf \
--conf-file /home/hadoop/project/flume/channle/agent_logger_hdfs.conf \
-Dflume.root.logger=info,console
telnet bigdata12 1111

The same thing done with three agents:
agent1: receives data on port 1111 and sends it to ports 2222 and 3333
agent2: receives on port 2222 and sends the data to a logger sink
agent3: receives on port 3333 and sends the data to a logger sink

agent1.sources = r1
agent1.sinks = k1 k2
agent1.channels = c1 c2

agent1.sources.r1.type = netcat
agent1.sources.r1.bind = bigdata12
agent1.sources.r1.port = 1111

agent1.sources.r1.selector.type = replicating
agent1.channels.c1.type = memory
agent1.channels.c2.type = memory

agent1.sinks.k1.type = avro
agent1.sinks.k1.hostname = bigdata12
agent1.sinks.k1.port = 2222
agent1.sinks.k2.type = avro
agent1.sinks.k2.hostname = bigdata12
agent1.sinks.k2.port = 3333

agent1.sources.r1.channels = c1 c2
agent1.sinks.k1.channel = c1
agent1.sinks.k2.channel = c2
agent2.sources = r1
agent2.sinks = k1
agent2.channels = c1

agent2.sources.r1.type = avro
agent2.sources.r1.bind = bigdata12
agent2.sources.r1.port = 2222

agent2.channels.c1.type = memory

agent2.sinks.k1.type = logger

agent2.sources.r1.channels = c1
agent2.sinks.k1.channel = c1
agent3.sources = r1
agent3.sinks = k1
agent3.channels = c1

agent3.sources.r1.type = avro
agent3.sources.r1.bind = bigdata12
agent3.sources.r1.port = 3333

agent3.channels.c1.type = memory

agent3.sinks.k1.type = logger

agent3.sources.r1.channels = c1
agent3.sinks.k1.channel = c1
flume-ng agent \
--name agent3 \
--conf ${FLUME_HOME}/conf \
--conf-file /home/hadoop/project/flume/one2many/agent3.conf \
-Dflume.root.logger=info,console
flume-ng agent \
--name agent2 \
--conf ${FLUME_HOME}/conf \
--conf-file /home/hadoop/project/flume/one2many/agent2.conf \
-Dflume.root.logger=info,console
flume-ng agent \
--name agent1 \
--conf ${FLUME_HOME}/conf \
--conf-file /home/hadoop/project/flume/one2many/agent1.conf \
-Dflume.root.logger=info,console
telnet bigdata12 1111

Adding interceptors

agent1.sources.r1.interceptors = i1
agent1.sources.r1.interceptors.i1.type = static
agent1.sources.r1.interceptors.i1.key = dl2262
agent1.sources.r1.interceptors.i1.value = boy

agent4.sources.r1.selector.type = multiplexing
agent4.sources.r1.selector.header = dl2262
agent4.sources.r1.selector.mapping.boy = c1
agent4.sources.r1.selector.mapping.girl = c2
agent4.sources.r1.selector.default = c3

Requirement: several kinds of logs are collected into one agent, which then distributes each kind of data to its designated channel.

agent1.sources = r1
agent1.sinks = k1
agent1.channels = c1

agent1.sources.r1.type = netcat
agent1.sources.r1.bind = bigdata12
agent1.sources.r1.port = 1111

agent1.sources.r1.interceptors = i1
agent1.sources.r1.interceptors.i1.type = static
agent1.sources.r1.interceptors.i1.key = dl2262
agent1.sources.r1.interceptors.i1.value = boy

agent1.channels.c1.type = memory

agent1.sinks.k1.type = avro
agent1.sinks.k1.hostname = bigdata12
agent1.sinks.k1.port = 2222

agent1.sources.r1.channels = c1
agent1.sinks.k1.channel = c1
agent2.sources = r1
agent2.sinks = k1
agent2.channels = c1

agent2.sources.r1.type = netcat
agent2.sources.r1.bind = bigdata12
agent2.sources.r1.port = 1112

agent2.sources.r1.interceptors = i1
agent2.sources.r1.interceptors.i1.type = static
agent2.sources.r1.interceptors.i1.key = dl2262
agent2.sources.r1.interceptors.i1.value = girl

agent2.channels.c1.type = memory

agent2.sinks.k1.type = avro
agent2.sinks.k1.hostname = bigdata12
agent2.sinks.k1.port = 2222

agent2.sources.r1.channels = c1
agent2.sinks.k1.channel = c1
agent3.sources = r1
agent3.sinks = k1
agent3.channels = c1

agent3.sources.r1.type = netcat
agent3.sources.r1.bind = bigdata12
agent3.sources.r1.port = 1113

agent3.sources.r1.interceptors = i1
agent3.sources.r1.interceptors.i1.type = static
agent3.sources.r1.interceptors.i1.key = dl2262
agent3.sources.r1.interceptors.i1.value = tea

agent3.channels.c1.type = memory

agent3.sinks.k1.type = avro
agent3.sinks.k1.hostname = bigdata12
agent3.sinks.k1.port = 2222

agent3.sources.r1.channels = c1
agent3.sinks.k1.channel = c1
agent4.sources = r1
agent4.sinks = k1 k2 k3
agent4.channels = c1 c2 c3

agent4.sources.r1.type = avro
agent4.sources.r1.bind = bigdata12
agent4.sources.r1.port = 2222

agent4.sources.r1.selector.type = multiplexing
agent4.sources.r1.selector.header = dl2262
agent4.sources.r1.selector.mapping.boy = c1
agent4.sources.r1.selector.mapping.girl = c2
agent4.sources.r1.selector.default = c3

agent4.channels.c1.type = memory
agent4.channels.c2.type = memory
agent4.channels.c3.type = memory

agent4.sinks.k1.type =logger
agent4.sinks.k2.type =logger
agent4.sinks.k3.type =logger

agent4.sources.r1.channels = c1 c2 c3
agent4.sinks.k1.channel = c1
agent4.sinks.k2.channel = c2
agent4.sinks.k3.channel = c3
flume-ng agent \
--name agent4 \
--conf ${FLUME_HOME}/conf \
--conf-file /home/hadoop/project/flume/many2one/agent4.conf \
-Dflume.root.logger=info,console

flume-ng agent \
--name agent3 \
--conf ${FLUME_HOME}/conf \
--conf-file /home/hadoop/project/flume/many2one/agent3.conf \
-Dflume.root.logger=info,console

flume-ng agent \
--name agent2 \
--conf ${FLUME_HOME}/conf \
--conf-file /home/hadoop/project/flume/many2one/agent2.conf \
-Dflume.root.logger=info,console

flume-ng agent \
--name agent1 \
--conf ${FLUME_HOME}/conf \
--conf-file /home/hadoop/project/flume/many2one/agent1.conf \
-Dflume.root.logger=info,console
telnet bigdata12 1111
telnet bigdata12 1112
telnet bigdata12 1113

Monitoring the channel: JSON metrics exposed over an HTTP interface

Add these flags when starting the agent:
-Dflume.monitoring.type=http
-Dflume.monitoring.port=9527
then open http://bigdata12:9527/metrics
channel:
1. default capacity
   capacity 100
2. transaction capacity
   transactionCapacity 100
   source  => channel
   channel => sink
Ways to monitor:
1. Flume's Ganglia integration [requires installing Ganglia]
2. pass a few parameters when starting the agent and pull the metrics over HTTP [recommended, easy]
   JSON data => HTTP endpoint => 1. front-end people build a visual dashboard on it
                                 2. scrape the HTTP endpoint => mysql => visualization
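
Once the agent is running with those two -D flags, the metrics can also be pulled from the command line, e.g.:

curl http://bigdata12:9527/metrics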
Parameter reference:
SOURCE:
OpenConnectionCount        (number of open connections)
Type                       (component type)
AppendBatchAcceptedCount   (number of batches appended to the channel)
AppendBatchReceivedCount   (number of batches just received at the source)
EventAcceptedCount         (number of events successfully put into the channel)
AppendReceivedCount        (number of appends received so far at the source)
StartTime                  (component start time)
StopTime                   (component stop time)
EventReceivedCount         (number of events successfully received at the source)
AppendAcceptedCount        (number of appends put into the channel)
CHANNEL:
EventPutSuccessCount       (number of events successfully put into the channel)
ChannelFillPercentage      (how full the channel is, in percent)
Type                       (component type)
EventPutAttemptCount       (number of attempts to put events into the channel)
ChannelSize                (number of events currently in the channel)
StartTime                  (component start time)
StopTime                   (component stop time)
EventTakeSuccessCount      (number of events successfully taken from the channel)
ChannelCapacity            (channel capacity)
EventTakeAttemptCount      (number of attempts to take events from the channel)
SINK:
BatchCompleteCount         (number of completed batches)
ConnectionFailedCount      (number of failed connections)
EventDrainAttemptCount     (number of events the sink attempted to commit)
ConnectionCreatedCount     (number of connections created)
Type                       (component type)
BatchEmptyCount            (number of empty batches taken)
ConnectionClosedCount      (number of connections closed)
EventDrainSuccessCount     (number of events successfully sent)
StartTime                  (component start time)
StopTime                   (component stop time)
BatchUnderflowCount        (number of batches smaller than the configured batch size)

a1.sources = r1
a1.sinks = k1
a1.channels = c1

a1.sources.r1.type = TAILDIR
a1.sources.r1.filegroups = f1
a1.sources.r1.filegroups.f1=/home/hadoop/tmp/dt01.log

a1.channels.c1.type = memory

a1.sinks.k1.type = logger

a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
flume-ng agent \
--name a1 \
--conf ${FLUME_HOME}/conf \
--conf-file /home/hadoop/project/flume/monitor/agent.conf \
-Dflume.root.logger=info,console \
-Dflume.monitoring.type=http \
-Dflume.monitoring.port=9527
http://bigdata12:9527/metrics
{
	"CHANNEL.c1": {
		"ChannelCapacity": "100",
		"ChannelFillPercentage": "0.0",
		"Type": "CHANNEL",
		"EventTakeSuccessCount": "10000",
		"ChannelSize": "0",
		"EventTakeAttemptCount": "10007",
		"StartTime": "1671019000048",
		"EventPutSuccessCount": "10000",
		"EventPutAttemptCount": "10000",
		"StopTime": "0"
	},
	"SOURCE.r1": {
		"AppendBatchAcceptedCount": "100",
		"GenericProcessingFail": "0",
		"EventAcceptedCount": "10000",
		"AppendReceivedCount": "0",
		"StartTime": "1671019000147",
		"AppendBatchReceivedCount": "100",
		"ChannelWriteFail": "0",
		"EventReceivedCount": "10000",
		"EventReadFail": "0",
		"Type": "SOURCE",
		"AppendAcceptedCount": "0",
		"OpenConnectionCount": "0",
		"StopTime": "0"
	}
}

sink: file_roll    writes events to files in a local directory (sink.directory); by default it rolls to a new file every 30 seconds (sink.rollInterval)
a1.sinks.k1.type = file_roll
a1.sinks.k1.sink.directory = /home/hadoop/tmp/0.txt

a1.sources = r1
a1.sinks = k1
a1.channels = c1

a1.sources.r1.type = TAILDIR
a1.sources.r1.filegroups = f1
a1.sources.r1.filegroups.f1=/home/hadoop/tmp/000000_0

a1.channels.c1.type = memory

a1.sinks.k1.type = file_roll
a1.sinks.k1.sink.directory = /home/hadoop/tmp/0.txt

a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
flume-ng agent \
--name a1 \
--conf ${FLUME_HOME}/conf \
--conf-file /home/hadoop/project/flume/taildir-mem-file_roll.conf \
-Dflume.root.logger=info,console