Flume Notes

Flume

Official site: flume.apache.org
Purpose: collect log data and move it from A => B.
Data acquisition: get the data onto the server where it is produced.
Data collection: move the data to a designated target location.
Data processing:
1. Offline processing: batch processing
   the data is already sitting there => process it in batches
2. Real-time processing:
   process each record as soon as it is produced
Components:
collecting  => source  : acquires the data
aggregating => channel : buffers the data the source acquired
moving      => sink    : ships the buffered data out to its destination
streaming data flows: Flume collects data as a continuous, real-time stream
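
Putting the three together, the flow through one agent looks like this (just a sketch of the pipeline described above):

log file / port  =>  source  =>  channel  =>  sink  =>  HDFS / console / next agent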
source:
avro          (serialization framework) source      ****
exec          log files                             **
spooling dir  log files                             **
taildir       log files                             ****
Kafka Source                                        **
NetCat TCP    reads data from a TCP port            **
Custom        developed by the user                 *
channel:
Memory   ****
File     ****
JDBC     *
Kafka    *
Custom   developed by the user   *
sink:
HDFS Sink     ****
Hive Sink     ****
Logger Sink   prints to the console   **
Avro Sink     ****
HBaseSinks    *
Kafka Sink    ***
Custom Sink   developed by the user

Requirements

event: one record of data
  headers: metadata describing the record
  body:    the actual payload
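
For reference, this is roughly what the logger sink used below prints for one event (body shown as hex plus text; the exact formatting can differ between Flume versions):

Event: { headers:{} body: 68 65 6C 6C 6F 20 66 6C 75 6D 65       hello flume }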
1. Use Flume to read data from a given port and print it to the console. source: netcat
a1.sources.r1.type = netcat
a1.sources.r1.bind = localhost
a1.sources.r1.port = 44444

agent:
source :  NetCat TCP
channel : Memory
sink: Logger
a1.sources = r1
a1.sinks = k1
a1.channels = c1

a1.sources.r1.type = netcat
a1.sources.r1.bind = localhost
a1.sources.r1.port = 44444

a1.channels.c1.type = memory

a1.sinks.k1.type = logger

a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1

Start Flume:

flume-ng agent \
--name a1 \
--conf ${FLUME_HOME}/conf \
--conf-file /home/hadoop/project/flume/nc-mem-logger.conf \
-Dflume.root.logger=info,console

telnet ip port
nc ip port        either command sends data to the port the netcat source is listening on

2. Collecting log files
2.1 Collect log data and print it to the console. source: exec
a1.sources.r1.type = exec
a1.sources.r1.command = tail -F /home/hadoop/1.log

agent:
source : exec
channel : mem
sink: logger
exec: collects data by running a Linux command; in practice only tail -F 1.log is used  ****
Problem with exec: it keeps no record of its position, so if Flume dies and is restarted, log content that was already collected is collected again (duplicates).

a1.sources = r1
a1.sinks = k1
a1.channels = c1

a1.sources.r1.type = exec
a1.sources.r1.command = tail -F /home/hadoop/1.log

a1.channels.c1.type = memory

a1.sinks.k1.type = logger

a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1

Start Flume:

flume-ng agent \
--name a1 \
--conf ${FLUME_HOME}/conf \
--conf-file /home/hadoop/project/flume/exec-mem-logger.conf \
-Dflume.root.logger=info,console

2.2 spooldir: collects the files under a directory (cannot be used in production)
source:spooldir
channel : mem
sink:logger
agent :

a1.sources = r1
a1.sinks = k1
a1.channels = c1
a1.sources.r1.type = spooldir
a1.sources.r1.spoolDir = /home/hadoop/data/flume_data/
a1.channels.c1.type = memory
a1.sinks.k1.type = logger
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1

Start Flume:

flume-ng agent \
--name a1 \
--conf ${FLUME_HOME}/conf \
--conf-file /home/hadoop/project/flume/spooldir-mem-logger.conf \
-Dflume.root.logger=info,console

spooldir collects from one directory and file names must not repeat: each ingested file is marked as completed, and if a file with an already-used name shows up again, collection stops and the Flume agent dies.
2.3 Collect the contents of specified files and directories and print them to the console. source: taildir
a1.sources.r1.type = TAILDIR
a1.sources.r1.filegroups = f1 f2
a1.sources.r1.filegroups.f1=/home/hadoop/tmp/user_click.txt
a1.sources.r1.filegroups.f2=/home/hadoop/data/flume_data/.*.log
source: taildir
channel : mem
sink: logger
taildir: *****
  collects the files under a directory
  collects a single file
  supports "resume from where it left off" after a restart (see the note just below)
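
The "resume" behaviour comes from a JSON position file that the taildir source maintains (by default under ~/.flume/); its location can also be set explicitly, e.g. (the path below is only an example):

a1.sources.r1.positionFile = /home/hadoop/project/flume/taildir_position.json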
agent:

a1.sources = r1
a1.sinks = k1
a1.channels = c1

a1.sources.r1.type = TAILDIR
a1.sources.r1.filegroups = f1 f2
a1.sources.r1.filegroups.f1=/home/hadoop/tmp/user_click.txt
a1.sources.r1.filegroups.f2=/home/hadoop/data/flume_data/.*.log

a1.channels.c1.type = memory

a1.sinks.k1.type = logger

a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1

Start Flume:

flume-ng agent \
--name a1 \
--conf ${FLUME_HOME}/conf \
--conf-file /home/hadoop/project/flume/taildir-mem-logger.conf \
-Dflume.root.logger=info,console

3. Collect a log file and write it to HDFS. sink: hdfs
a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = hdfs://bigdata12:9000/flume/log/
a1.sinks.k1.hdfs.fileType=DataStream
a1.sinks.k1.hdfs.writeFormat=Text
a1.sinks.k1.hdfs.filePrefix=events
a1.sinks.k1.hdfs.fileSuffix=.log
a1.sinks.k1.hdfs.useLocalTimeStamp=true
a1.sinks.k1.hdfs.rollInterval=60
a1.sinks.k1.hdfs.rollSize=134217728
a1.sinks.k1.hdfs.rollCount=1000

agent:
source : exec or taildir
channel: mem
sink: hdfs
agent:

a1.sources = r1
a1.sinks = k1
a1.channels = c1

a1.sources.r1.type = TAILDIR
a1.sources.r1.filegroups = f1
a1.sources.r1.filegroups.f1=/home/hadoop/1.log

a1.channels.c1.type = memory

a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path=hdfs://bigdata12:9000/flume/log/

a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1

Start Flume:

flume-ng agent \
--name a1 \
--conf ${FLUME_HOME}/conf \
--conf-file /home/hadoop/project/flume/taildir-mem-hdfs.conf \
-Dflume.root.logger=info,console

3.1 Why can't the files Flume writes to HDFS be viewed directly?
It is a matter of the Flume HDFS sink configuration: the default file format is a SequenceFile, not plain text.
Fix: 1. hdfs.fileType    => DataStream
     2. hdfs.writeFormat => Text

a1.sources = r1
a1.sinks = k1
a1.channels = c1

a1.sources.r1.type = TAILDIR
a1.sources.r1.filegroups = f1
a1.sources.r1.filegroups.f1=/home/hadoop/1.log

a1.channels.c1.type = memory

a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path=hdfs://bigdata12:9000/flume/log/
a1.sinks.k1.hdfs.fileType=DataStream
a1.sinks.k1.hdfs.writeFormat=Text

a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1

3.2 Turn many small files into large ones => file rolling. rollInterval (seconds), rollSize (bytes) and rollCount (number of events) each trigger a roll; the file rolls when the first configured condition is reached, and setting a value to 0 disables that condition.

a1.sources = r1
a1.sinks = k1
a1.channels = c1

a1.sources.r1.type = TAILDIR
a1.sources.r1.filegroups = f1
a1.sources.r1.filegroups.f1=/home/hadoop/1.log

a1.channels.c1.type = memory

a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path=hdfs://bigdata12:9000/flume/log/
a1.sinks.k1.hdfs.fileType=DataStream
a1.sinks.k1.hdfs.writeFormat=Text
a1.sinks.k1.hdfs.rollInterval=60
a1.sinks.k1.hdfs.rollSize=134217728
a1.sinks.k1.hdfs.rollCount=100

a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
flume-ng agent \
--name a1 \
--conf ${FLUME_HOME}/conf \
--conf-file /home/hadoop/project/flume/taildir-mem-hdfs-round.conf \
-Dflume.root.logger=info,console

3.3 Change the file prefix and suffix

a1.sources = r1
a1.sinks = k1
a1.channels = c1

a1.sources.r1.type = TAILDIR
a1.sources.r1.filegroups = f1
a1.sources.r1.filegroups.f1=/home/hadoop/2.log

a1.channels.c1.type = memory

a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = hdfs://bigdata12:9000/flume/log/
a1.sinks.k1.hdfs.fileType=DataStream
a1.sinks.k1.hdfs.writeFormat=Text
a1.sinks.k1.hdfs.filePrefix=events
a1.sinks.k1.hdfs.fileSuffix=.log
a1.sinks.k1.hdfs.useLocalTimeStamp=true
a1.sinks.k1.hdfs.rollInterval=60
a1.sinks.k1.hdfs.rollSize=134217728
a1.sinks.k1.hdfs.rollCount=1000

a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1

useLocalTimeStamp means the time used when the data is written out is the local machine's clock, not the time carried in the data itself.

flume-ng agent \
--name a1 \
--conf ${FLUME_HOME}/conf \
--conf-file /home/hadoop/project/flume/taildir-mem-hdfs-name.conf \
-Dflume.root.logger=info,console

The data-latency problem. What is data latency? Data produced earlier arrives later, and data produced later arrives earlier, so downstream business statistics become inaccurate. Flume itself introduces latency while collecting data.
How to deal with late data?
1. A Hive UDF
   rewrite the late records into the correct partitions ["data cleansing"]
2. At the Flume end, i.e. at the source of the problem
   log => flume => hive => hdfs path
   a1.sinks.k1.hdfs.useLocalTimeStamp=true =>
     the time used when the data is written out is the local machine's clock,
     not the time carried in the data itself
   event => lands under an hdfs partition
   1. useLocalTimeStamp => cannot solve data latency
   2. put the event's own time into the header => guarantees each record lands in the correct partition;
      this needs custom development on Flume, e.g. a custom interceptor (see the sketch below)
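
A minimal sketch of the header-based approach: assuming a custom interceptor has already put the event's own time into the "timestamp" header as epoch milliseconds, the HDFS sink resolves the time escapes in hdfs.path from that header rather than from the local clock, so each record lands in the partition matching when it was produced (the path layout below is only an example):

a1.sinks.k1.hdfs.path = hdfs://bigdata12:9000/flume/log/dt=%Y%m%d
a1.sinks.k1.hdfs.useLocalTimeStamp = false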
4. Loading log data into Hive
4.1 Load a log file into an ordinary (non-partitioned) Hive table
a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = hdfs://bigdata12:9000/user/hive/warehouse/bigdata_hive.db/emp
a1.sinks.k1.hdfs.fileType=DataStream
a1.sinks.k1.hdfs.writeFormat=Text
a1.sinks.k1.hdfs.filePrefix=events
a1.sinks.k1.hdfs.fileSuffix=.log
a1.sinks.k1.hdfs.rollInterval=60
a1.sinks.k1.hdfs.rollSize=134217728
a1.sinks.k1.hdfs.rollCount=100

hadoop fs -rm -r /user/hive/warehouse/bigdata_hive.db/emp/emp.txt

a1.sources = r1
a1.sinks = k1
a1.channels = c1

a1.sources.r1.type = TAILDIR
a1.sources.r1.filegroups = f1
a1.sources.r1.filegroups.f1=/home/hadoop/tmp/emp.txt

a1.channels.c1.type = memory

a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = hdfs://bigdata12:9000/user/hive/warehouse/bigdata_hive.db/emp
a1.sinks.k1.hdfs.fileType=DataStream
a1.sinks.k1.hdfs.writeFormat=Text
a1.sinks.k1.hdfs.filePrefix=events
a1.sinks.k1.hdfs.fileSuffix=.log
a1.sinks.k1.hdfs.rollInterval=60
a1.sinks.k1.hdfs.rollSize=134217728
a1.sinks.k1.hdfs.rollCount=100

a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
flume-ng agent \
--name a1 \
--conf ${FLUME_HOME}/conf \
--conf-file /home/hadoop/project/flume/hive/taildir-mem--hdfs-emp.conf \
-Dflume.root.logger=info,console

4.2 Load log data into a partitioned Hive table
a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = hdfs://bigdata12:9000/user/hive/warehouse/bigdata_hive.db/emp_p/deptno=10
a1.sinks.k1.hdfs.fileType=DataStream
a1.sinks.k1.hdfs.writeFormat=Text
a1.sinks.k1.hdfs.filePrefix=events
a1.sinks.k1.hdfs.fileSuffix=.log
a1.sinks.k1.hdfs.rollInterval=60
a1.sinks.k1.hdfs.rollSize=134217728
a1.sinks.k1.hdfs.rollCount=100

a1.sources = r1
a1.sinks = k1
a1.channels = c1

a1.sources.r1.type = TAILDIR
a1.sources.r1.filegroups = f1
a1.sources.r1.filegroups.f1=/home/hadoop/tmp/000000_0

a1.channels.c1.type = memory

a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = hdfs://bigdata12:9000/user/hive/warehouse/bigdata_hive.db/emp_p/deptno=10
a1.sinks.k1.hdfs.fileType=DataStream
a1.sinks.k1.hdfs.writeFormat=Text
a1.sinks.k1.hdfs.filePrefix=events
a1.sinks.k1.hdfs.fileSuffix=.log
a1.sinks.k1.hdfs.rollInterval=60
a1.sinks.k1.hdfs.rollSize=134217728
a1.sinks.k1.hdfs.rollCount=100

a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
flume-ng agent \
--name a1 \
--conf ${FLUME_HOME}/conf \
--conf-file /home/hadoop/project/flume/hive/taildir-mem-hdfs-emp_p.conf \
-Dflume.root.logger=info,console

Drop the partition: alter table xxx drop partition(deptno=10);
1. the data is still on HDFS
2. but the metastore has no entry for the partition
=> when the HDFS data and the partition metadata do not both exist, the Hive table used to be unable to query that data
   (Hive 3.1.2 can query it in this situation)
alter table emp_p add partition(deptno=10);   re-associates the metastore with the data already on HDFS (an alternative is shown below)
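
An alternative to adding each partition by hand is standard Hive's repair command (not from the notes above), which scans the table's HDFS directory and registers every partition it finds:

msck repair table emp_p;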
5. Requirement: read data from port 1111 and forward it to port 2222; the agent on port 2222 finally writes the data to HDFS.
sink: avro (an avro sink typically follows a netcat source in this setup)
a1.sinks.k1.type = avro
a1.sinks.k1.hostname=bigdata12
a1.sinks.k1.port=2222

source:avro
a1.sources.r1.type = avro
a1.sources.r1.bind = bigdata12
a1.sources.r1.port = 2222

telnet bigdata12 1111    typically used together with a netcat source, to send data into that port
agent:
nc-mem-avro
avro-mem-hdfs
avro-mem-logger

agent1:
a1.sources = r1
a1.sinks = k1
a1.channels = c1

a1.sources.r1.type = netcat
a1.sources.r1.bind = bigdata12
a1.sources.r1.port = 1111

a1.channels.c1.type = memory

a1.sinks.k1.type = avro
a1.sinks.k1.hostname=bigdata12
a1.sinks.k1.port=2222

a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
agent2:avro-mem-hdfs
		avro-mem-logger

a1.sources = r1
a1.sinks = k1
a1.channels = c1

a1.sources.r1.type = avro
a1.sources.r1.bind = bigdata12
a1.sources.r1.port = 2222

a1.channels.c1.type = memory

a1.sinks.k1.type = logger

a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
flume-ng agent \
--name a1 \
--conf ${FLUME_HOME}/conf \
--conf-file /home/hadoop/project/flume/avro/avro-mem-logger.conf \
-Dflume.root.logger=info,console
flume-ng agent \
--name a1 \
--conf ${FLUME_HOME}/conf \
--conf-file /home/hadoop/project/flume/avro/nc-mem-avro.conf \
-Dflume.root.logger=info,console
telnet bigdata12 1111

6. Requirement: read log data and upload it to HDFS compressed (bzip2 here; other Hadoop codecs can be configured through hdfs.codeC the same way).
a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path=hdfs://bigdata12:9000/flume/log/
a1.sinks.k1.hdfs.filePrefix=events
a1.sinks.k1.hdfs.fileSuffix=.bzip2
a1.sinks.k1.hdfs.fileType=CompressedStream
a1.sinks.k1.hdfs.codeC=bzip2
a1.sinks.k1.hdfs.writeFormat=Text
a1.sinks.k1.hdfs.rollInterval=90
a1.sinks.k1.hdfs.rollSize=134217728
a1.sinks.k1.hdfs.rollCount=1000

a1.sources = r1
a1.sinks = k1
a1.channels = c1

a1.sources.r1.type = TAILDIR
a1.sources.r1.filegroups = f1
a1.sources.r1.filegroups.f1=/home/hadoop/tmp/compress.log

a1.channels.c1.type = memory

a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path=hdfs://bigdata12:9000/flume/log/
a1.sinks.k1.hdfs.filePrefix=events
a1.sinks.k1.hdfs.fileSuffix=.bzip2
a1.sinks.k1.hdfs.fileType=CompressedStream
a1.sinks.k1.hdfs.codeC=bzip2
a1.sinks.k1.hdfs.writeFormat=Text
a1.sinks.k1.hdfs.rollInterval=90
a1.sinks.k1.hdfs.rollSize=134217728
a1.sinks.k1.hdfs.rollCount=1000

a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
flume-ng agent \
--name a1 \
--conf ${FLUME_HOME}/conf \
--conf-file /home/hadoop/project/flume/taildir-mem-hdfs-compress.conf \
-Dflume.root.logger=info,console

7. Requirement: read log data, use a file channel (channel = file), and upload the data to HDFS compressed.
a1.channels.c1.type = file
a1.channels.c1.checkpointDir = /home/hadoop/project/flume/checkpoint/codeC
a1.channels.c1.dataDirs = /home/hadoop/project/flume/data/codeC

a1.sources = r1
a1.sinks = k1
a1.channels = c1

a1.sources.r1.type = TAILDIR
a1.sources.r1.filegroups = f1
a1.sources.r1.filegroups.f1=/home/hadoop/tmp/compress.log

a1.channels.c1.type = file
a1.channels.c1.checkpointDir = /home/hadoop/project/flume/checkpoint/codeC
a1.channels.c1.dataDirs = /home/hadoop/project/flume/data/codeC

a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path=hdfs://bigdata12:9000/flume/log01/
a1.sinks.k1.hdfs.filePrefix=events
a1.sinks.k1.hdfs.fileSuffix=.bzip2
a1.sinks.k1.hdfs.fileType=CompressedStream
a1.sinks.k1.hdfs.codeC=bzip2
a1.sinks.k1.hdfs.writeFormat=Text
a1.sinks.k1.hdfs.rollInterval=90
a1.sinks.k1.hdfs.rollSize=134217728
a1.sinks.k1.hdfs.rollCount=1000

a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
flume-ng agent \
--name a1 \
--conf ${FLUME_HOME}/conf \
--conf-file /home/hadoop/project/flume/taildir-file-hdfs-compress.conf \
-Dflume.root.logger=info,console

Load balancing: load_balance

agent1.sinkgroups = g1
agent1.sinkgroups.g1.sinks = k1 k2
agent1.sinkgroups.g1.processor.type = load_balance
agent1.sinkgroups.g1.processor.backoff = true
agent1.sinkgroups.g1.processor.selector = round_robin    (or: random, choose one)
agent1.sinkgroups.g1.processor.selector.maxTimeOut=2000

Requirement: read data from port 1111, send it to port 2222 and port 3333, and finally print the data to the console.
3 agents:
agent1:
source: nc
channel : mem
sink : avro (two sinks)
agent2: port 2222
source: avro
channel : mem
sink : logger
agent3: port 3333
source: avro
channel : mem
sink : logger
Load balancing (load_balance):
1. splits the data across sinks, adds parallelism, and takes pressure off a single sink
2. if the second or the third agent dies, all data is sent to the sink whose agent is still up
Selector options:
1. send data randomly:     random
2. send data round-robin:  round_robin

agent1.sources = r1
agent1.sinks = k1 k2
agent1.channels = c1

agent1.sources.r1.type = netcat
agent1.sources.r1.bind = bigdata12
agent1.sources.r1.port = 1111

agent1.channels.c1.type = memory

agent1.sinks.k1.type = avro
agent1.sinks.k1.hostname = bigdata12
agent1.sinks.k1.port = 2222
agent1.sinks.k2.type = avro
agent1.sinks.k2.hostname = bigdata12
agent1.sinks.k2.port = 3333

agent1.sinkgroups = g1
agent1.sinkgroups.g1.sinks = k1 k2
agent1.sinkgroups.g1.processor.type = load_balance
agent1.sinkgroups.g1.processor.backoff = true
agent1.sinkgroups.g1.processor.selector = round_robin
agent1.sinkgroups.g1.processor.selector.maxTimeOut=2000

agent1.sources.r1.channels = c1
agent1.sinks.k1.channel = c1
agent1.sinks.k2.channel = c1
agent2.sources = r1
agent2.sinks = k1
agent2.channels = c1

agent2.sources.r1.type = avro
agent2.sources.r1.bind = bigdata12
agent2.sources.r1.port = 2222

agent2.channels.c1.type = memory

agent2.sinks.k1.type = logger

agent2.sources.r1.channels = c1
agent2.sinks.k1.channel = c1
agent3.sources = r1
agent3.sinks = k1
agent3.channels = c1

agent3.sources.r1.type = avro
agent3.sources.r1.bind = bigdata12
agent3.sources.r1.port = 3333

agent3.channels.c1.type = memory

agent3.sinks.k1.type = logger

agent3.sources.r1.channels = c1
agent3.sinks.k1.channel = c1

Start the agents from back to front (downstream agents first):

flume-ng agent \
--name agent3 \
--conf ${FLUME_HOME}/conf \
--conf-file /home/hadoop/project/flume/sink/agent3.conf \
-Dflume.root.logger=info,console
flume-ng agent \
--name agent2 \
--conf ${FLUME_HOME}/conf \
--conf-file /home/hadoop/project/flume/sink/agent2.conf \
-Dflume.root.logger=info,console
flume-ng agent \
--name agent1 \
--conf ${FLUME_HOME}/conf \
--conf-file /home/hadoop/project/flume/sink/agent1.conf \
-Dflume.root.logger=info,console
telnet bigdata12 1111

Failover

agent1.sinkgroups = g1
agent1.sinkgroups.g1.sinks = k1 k2
agent1.sinkgroups.g1.processor.type = failover
agent1.sinkgroups.g1.processor.priority.k1 = 5
agent1.sinkgroups.g1.processor.priority.k2 = 10
agent1.sinkgroups.g1.processor.maxpenalty = 2000

Failover handles the case where a sink fails (disaster recovery):
failover      => the higher-priority sink is used first; with the config above k2 (priority 10) is active and k1 (priority 5) takes over only if k2 fails
load_balance  => load balancing across sinks

agent1.sources = r1
agent1.sinks = k1 k2
agent1.channels = c1

agent1.sources.r1.type = netcat
agent1.sources.r1.bind = bigdata12
agent1.sources.r1.port = 1111

agent1.channels.c1.type = memory

agent1.sinks.k1.type = avro
agent1.sinks.k1.hostname = bigdata12
agent1.sinks.k1.port = 2222
agent1.sinks.k2.type = avro
agent1.sinks.k2.hostname = bigdata12
agent1.sinks.k2.port = 3333

agent1.sinkgroups = g1
agent1.sinkgroups.g1.sinks = k1 k2
agent1.sinkgroups.g1.processor.type = failover
agent1.sinkgroups.g1.processor.priority.k1 = 5
agent1.sinkgroups.g1.processor.priority.k2 = 10
agent1.sinkgroups.g1.processor.maxpenalty = 2000

agent1.sources.r1.channels = c1
agent1.sinks.k1.channel = c1
agent1.sinks.k2.channel = c1
agent2.sources = r1
agent2.sinks = k1
agent2.channels = c1

agent2.sources.r1.type = avro
agent2.sources.r1.bind = bigdata12
agent2.sources.r1.port = 2222

agent2.channels.c1.type = memory

agent2.sinks.k1.type = logger

agent2.sources.r1.channels = c1
agent2.sinks.k1.channel = c1
agent3.sources = r1
agent3.sinks = k1
agent3.channels = c1

agent3.sources.r1.type = avro
agent3.sources.r1.bind = bigdata12
agent3.sources.r1.port = 3333

agent3.channels.c1.type = memory

agent3.sinks.k1.type = logger

agent3.sources.r1.channels = c1
agent3.sinks.k1.channel = c1
flume-ng agent \
--name agent3 \
--conf ${FLUME_HOME}/conf \
--conf-file /home/hadoop/project/flume/sink/agent3.conf \
-Dflume.root.logger=info,console
flume-ng agent \
--name agent2 \
--conf ${FLUME_HOME}/conf \
--conf-file /home/hadoop/project/flume/sink/agent2.conf \
-Dflume.root.logger=info,console
flume-ng agent \
--name agent1 \
--conf ${FLUME_HOME}/conf \
--conf-file /home/hadoop/project/flume/sink/agent1_failover.conf \
-Dflume.root.logger=info,console
telnet bigdata12 1111

Flume core components

sources
interceptors      => mainly transform / cleanse the collected data
channel selectors => decide which channel the collected data goes to
agent1.sources.r1.selector.type = replicating    the same data is replicated to both channels

channels
sinks
sink processors   => decide which sink the data goes to
Requirement: define one agent that collects data from port 1111 and sends one copy to HDFS and another copy to a logger sink.

agent1.sources = r1
agent1.sinks = k1 k2
agent1.channels = c1 c2

agent1.sources.r1.type = netcat
agent1.sources.r1.bind = bigdata12
agent1.sources.r1.port = 1111

agent1.sources.r1.selector.type = replicating
agent1.channels.c1.type = memory
agent1.channels.c2.type = memory

agent1.sinks.k1.type = hdfs
agent1.sinks.k1.hdfs.path = hdfs://bigdata12:9000/flume/channel_selector/
agent1.sinks.k1.hdfs.fileType=DataStream
agent1.sinks.k1.hdfs.writeFormat=Text
agent1.sinks.k1.hdfs.filePrefix=events
agent1.sinks.k1.hdfs.fileSuffix=.log
agent1.sinks.k1.hdfs.useLocalTimeStamp=true
agent1.sinks.k1.hdfs.rollInterval=60
agent1.sinks.k1.hdfs.rollSize=134217728
agent1.sinks.k1.hdfs.rollCount=1000

agent1.sinks.k2.type = logger

agent1.sources.r1.channels = c1 c2
agent1.sinks.k1.channel = c1
agent1.sinks.k2.channel = c2
flume-ng agent \
--name agent1 \
--conf ${FLUME_HOME}/conf \
--conf-file /home/hadoop/project/flume/channle/agent_logger_hdfs.conf \
-Dflume.root.logger=info,console
telnet bigdata12 1111

The same thing done with three agents:
agent1: receives data on port 1111 and sends it to ports 2222 and 3333
agent2: receives on port 2222 and sends the data to a logger sink
agent3: receives on port 3333 and sends the data to a logger sink

agent1.sources = r1
agent1.sinks = k1 k2
agent1.channels = c1 c2

agent1.sources.r1.type = netcat
agent1.sources.r1.bind = bigdata12
agent1.sources.r1.port = 1111

agent1.sources.r1.selector.type = replicating
agent1.channels.c1.type = memory
agent1.channels.c2.type = memory

agent1.sinks.k1.type = avro
agent1.sinks.k1.hostname = bigdata12
agent1.sinks.k1.port = 2222
agent1.sinks.k2.type = avro
agent1.sinks.k2.hostname = bigdata12
agent1.sinks.k2.port = 3333

agent1.sources.r1.channels = c1 c2
agent1.sinks.k1.channel = c1
agent1.sinks.k2.channel = c2
agent2.sources = r1
agent2.sinks = k1
agent2.channels = c1

agent2.sources.r1.type = avro
agent2.sources.r1.bind = bigdata12
agent2.sources.r1.port = 2222

agent2.channels.c1.type = memory

agent2.sinks.k1.type = logger

agent2.sources.r1.channels = c1
agent2.sinks.k1.channel = c1
agent3.sources = r1
agent3.sinks = k1
agent3.channels = c1

agent3.sources.r1.type = avro
agent3.sources.r1.bind = bigdata12
agent3.sources.r1.port = 3333

agent3.channels.c1.type = memory

agent3.sinks.k1.type = logger

agent3.sources.r1.channels = c1
agent3.sinks.k1.channel = c1
flume-ng agent \
--name agent3 \
--conf ${FLUME_HOME}/conf \
--conf-file /home/hadoop/project/flume/one2many/agent3.conf \
-Dflume.root.logger=info,console
flume-ng agent \
--name agent2 \
--conf ${FLUME_HOME}/conf \
--conf-file /home/hadoop/project/flume/one2many/agent2.conf \
-Dflume.root.logger=info,console
flume-ng agent \
--name agent1 \
--conf ${FLUME_HOME}/conf \
--conf-file /home/hadoop/project/flume/one2many/agent1.conf \
-Dflume.root.logger=info,console
telnet bigdata12 1111

Adding interceptors

agent1.sources.r1.interceptors = i1
agent1.sources.r1.interceptors.i1.type = static
agent1.sources.r1.interceptors.i1.key = dl2262
agent1.sources.r1.interceptors.i1.value = boy

agent4.sources.r1.selector.type = multiplexing
agent4.sources.r1.selector.header = dl2262
agent4.sources.r1.selector.mapping.boy = c1
agent4.sources.r1.selector.mapping.girl = c2
agent4.sources.r1.selector.default = c3

Requirement: several kinds of logs are collected into one agent, which then distributes each kind of data to its designated channel.

agent1.sources = r1
agent1.sinks = k1
agent1.channels = c1

agent1.sources.r1.type = netcat
agent1.sources.r1.bind = bigdata12
agent1.sources.r1.port = 1111

agent1.sources.r1.interceptors = i1
agent1.sources.r1.interceptors.i1.type = static
agent1.sources.r1.interceptors.i1.key = dl2262
agent1.sources.r1.interceptors.i1.value = boy

agent1.channels.c1.type = memory

agent1.sinks.k1.type = avro
agent1.sinks.k1.hostname = bigdata12
agent1.sinks.k1.port = 2222

agent1.sources.r1.channels = c1
agent1.sinks.k1.channel = c1
agent2.sources = r1
agent2.sinks = k1
agent2.channels = c1

agent2.sources.r1.type = netcat
agent2.sources.r1.bind = bigdata12
agent2.sources.r1.port = 1112

agent2.sources.r1.interceptors = i1
agent2.sources.r1.interceptors.i1.type = static
agent2.sources.r1.interceptors.i1.key = dl2262
agent2.sources.r1.interceptors.i1.value = girl

agent2.channels.c1.type = memory

agent2.sinks.k1.type = avro
agent2.sinks.k1.hostname = bigdata12
agent2.sinks.k1.port = 2222

agent2.sources.r1.channels = c1
agent2.sinks.k1.channel = c1
agent3.sources = r1
agent3.sinks = k1
agent3.channels = c1

agent3.sources.r1.type = netcat
agent3.sources.r1.bind = bigdata12
agent3.sources.r1.port = 1113

agent3.sources.r1.interceptors = i1
agent3.sources.r1.interceptors.i1.type = static
agent3.sources.r1.interceptors.i1.key = dl2262
agent3.sources.r1.interceptors.i1.value = tea

agent3.channels.c1.type = memory

agent3.sinks.k1.type = avro
agent3.sinks.k1.hostname = bigdata12
agent3.sinks.k1.port = 2222

agent3.sources.r1.channels = c1
agent3.sinks.k1.channel = c1
agent4.sources = r1
agent4.sinks = k1 k2 k3
agent4.channels = c1 c2 c3

agent4.sources.r1.type = avro
agent4.sources.r1.bind = bigdata12
agent4.sources.r1.port = 2222

agent4.sources.r1.selector.type = multiplexing
agent4.sources.r1.selector.header = dl2262
agent4.sources.r1.selector.mapping.boy = c1
agent4.sources.r1.selector.mapping.girl = c2
agent4.sources.r1.selector.default = c3

agent4.channels.c1.type = memory
agent4.channels.c2.type = memory
agent4.channels.c3.type = memory

agent4.sinks.k1.type =logger
agent4.sinks.k2.type =logger
agent4.sinks.k3.type =logger

agent4.sources.r1.channels = c1 c2 c3
agent4.sinks.k1.channel = c1
agent4.sinks.k2.channel = c2
agent4.sinks.k3.channel = c3
flume-ng agent \
--name agent4 \
--conf ${FLUME_HOME}/conf \
--conf-file /home/hadoop/project/flume/many2one/agent4.conf \
-Dflume.root.logger=info,console

flume-ng agent \
--name agent3 \
--conf ${FLUME_HOME}/conf \
--conf-file /home/hadoop/project/flume/many2one/agent3.conf \
-Dflume.root.logger=info,console

flume-ng agent \
--name agent2 \
--conf ${FLUME_HOME}/conf \
--conf-file /home/hadoop/project/flume/many2one/agent2.conf \
-Dflume.root.logger=info,console

flume-ng agent \
--name agent1 \
--conf ${FLUME_HOME}/conf \
--conf-file /home/hadoop/project/flume/many2one/agent1.conf \
-Dflume.root.logger=info,console
telnet bigdata12 1111
telnet bigdata12 1112
telnet bigdata12 1113

Monitoring the channel: JSON metrics exposed over an HTTP interface

Add these flags when starting the agent:
-Dflume.monitoring.type=http
-Dflume.monitoring.port=9527
then open http://bigdata12:9527/metrics
channel:
1. default capacity
   capacity 100
2. transaction capacity
   transactionCapacity 100
   source  => channel
   channel => sink
Ways to monitor:
1. Flume's Ganglia integration [requires installing Ganglia]
2. pass a few parameters when starting the agent and pull the metrics over HTTP [recommended, easy]
   JSON data => HTTP endpoint => 1. front-end people build a visual dashboard on it
                                 2. scrape the HTTP endpoint => mysql => visualization
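
Once the agent is running with those two -D flags, the metrics can also be pulled from the command line, e.g.:

curl http://bigdata12:9527/metrics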
Parameter reference:
SOURCE:
OpenConnectionCount        (number of open connections)
Type                       (component type)
AppendBatchAcceptedCount   (number of batches appended to the channel)
AppendBatchReceivedCount   (number of batches just received at the source)
EventAcceptedCount         (number of events successfully put into the channel)
AppendReceivedCount        (number of appends received so far at the source)
StartTime                  (component start time)
StopTime                   (component stop time)
EventReceivedCount         (number of events successfully received at the source)
AppendAcceptedCount        (number of appends put into the channel)
CHANNEL:
EventPutSuccessCount       (number of events successfully put into the channel)
ChannelFillPercentage      (how full the channel is, in percent)
Type                       (component type)
EventPutAttemptCount       (number of attempts to put events into the channel)
ChannelSize                (number of events currently in the channel)
StartTime                  (component start time)
StopTime                   (component stop time)
EventTakeSuccessCount      (number of events successfully taken from the channel)
ChannelCapacity            (channel capacity)
EventTakeAttemptCount      (number of attempts to take events from the channel)
SINK:
BatchCompleteCount         (number of completed batches)
ConnectionFailedCount      (number of failed connections)
EventDrainAttemptCount     (number of events the sink attempted to commit)
ConnectionCreatedCount     (number of connections created)
Type                       (component type)
BatchEmptyCount            (number of empty batches taken)
ConnectionClosedCount      (number of connections closed)
EventDrainSuccessCount     (number of events successfully sent)
StartTime                  (component start time)
StopTime                   (component stop time)
BatchUnderflowCount        (number of batches smaller than the configured batch size)

a1.sources = r1
a1.sinks = k1
a1.channels = c1

a1.sources.r1.type = TAILDIR
a1.sources.r1.filegroups = f1
a1.sources.r1.filegroups.f1=/home/hadoop/tmp/dt01.log

a1.channels.c1.type = memory

a1.sinks.k1.type = logger

a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
flume-ng agent \
--name a1 \
--conf ${FLUME_HOME}/conf \
--conf-file /home/hadoop/project/flume/monitor/agent.conf \
-Dflume.root.logger=info,console \
-Dflume.monitoring.type=http \
-Dflume.monitoring.port=9527
http://bigdata12:9527/metrics
{
	"CHANNEL.c1": {
		"ChannelCapacity": "100",
		"ChannelFillPercentage": "0.0",
		"Type": "CHANNEL",
		"EventTakeSuccessCount": "10000",
		"ChannelSize": "0",
		"EventTakeAttemptCount": "10007",
		"StartTime": "1671019000048",
		"EventPutSuccessCount": "10000",
		"EventPutAttemptCount": "10000",
		"StopTime": "0"
	},
	"SOURCE.r1": {
		"AppendBatchAcceptedCount": "100",
		"GenericProcessingFail": "0",
		"EventAcceptedCount": "10000",
		"AppendReceivedCount": "0",
		"StartTime": "1671019000147",
		"AppendBatchReceivedCount": "100",
		"ChannelWriteFail": "0",
		"EventReceivedCount": "10000",
		"EventReadFail": "0",
		"Type": "SOURCE",
		"AppendAcceptedCount": "0",
		"OpenConnectionCount": "0",
		"StopTime": "0"
	}
}

sink: file_roll    writes events to files in a local directory (sink.directory); by default it rolls to a new file every 30 seconds (sink.rollInterval)
a1.sinks.k1.type = file_roll
a1.sinks.k1.sink.directory = /home/hadoop/tmp/0.txt

a1.sources = r1
a1.sinks = k1
a1.channels = c1

a1.sources.r1.type = TAILDIR
a1.sources.r1.filegroups = f1
a1.sources.r1.filegroups.f1=/home/hadoop/tmp/000000_0

a1.channels.c1.type = memory

a1.sinks.k1.type = file_roll
a1.sinks.k1.sink.directory = /home/hadoop/tmp/0.txt

a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
flume-ng agent \
--name a1 \
--conf ${FLUME_HOME}/conf \
--conf-file /home/hadoop/project/flume/taildir-mem-file_roll.conf \
-Dflume.root.logger=info,console