Flume--source,channel,sink Configuration

Keep hunger

Category: Flume    Tags: Flume, hadoop

Article link: https://blog.csdn.net/ITgagaga/article/details/102888550

I: source

1）exec source

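Collects data from a file in real time. The data comes from the output of a Linux command, which makes this source useful for collecting the contents of files.

Common commands:

cat: collect the entire contents of a file
tail -f: watch a file and collect newly appended data in real time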

# Name the components of the current agent; a1 is the agent's name
# Source name
a1.sources = r1
# Channel name
a1.channels = c1
# Sink name
a1.sinks = k1

# Configure the source: exec runs a Linux command and collects its output
a1.sources.r1.type = exec
a1.sources.r1.command = tail -f /home/hadoop/zookeeper.out

# Configure the channel: memory
a1.channels.c1.type = memory

# Configure the sink: print to the console
a1.sinks.k1.type = logger

# Bind the source and the sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1

Start:

./flume-ng agent --conf conf --conf-file /home/hadoop/apps/apache-flume-1.8.0-bin/conf/test.conf --name a1 -Dflume.root.logger=INFO,console
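As a hedged sketch of a more robust exec source, the lines below could replace the source section of the configuration above. `tail -F` follows the file by name, so collection should continue after the log is rotated, whereas `tail -f` keeps following the original file descriptor. `shell`, `restart` and `restartThrottle` are documented exec source properties; the values chosen here are illustrative assumptions. Keep in mind that the exec source offers no delivery guarantee: lines can be lost if the channel is full or the agent stops.

```properties
# Source section only; names, channel and sink stay as in the configuration above
a1.sources.r1.type = exec
# Run the command through a shell (needed if the command uses pipes or wildcards)
a1.sources.r1.shell = /bin/bash -c
# -F follows the file by name, so it keeps working after log rotation
a1.sources.r1.command = tail -F /home/hadoop/zookeeper.out
# Restart the command if it exits, waiting 10000 ms between attempts (assumed values)
a1.sources.r1.restart = true
a1.sources.r1.restartThrottle = 10000
a1.sources.r1.channels = c1
```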

2）Spooling Directory Source

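Collects the data of all files in a directory.

The following properties are usually required:

a1.sources.r1.type - spooldir
a1.sources.r1.spoolDir - the directory whose files should be collected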

# Name the components of the current agent; a1 is the agent's name
# Source name
a1.sources = r1
# Channel name
a1.channels = c1
# Sink name
a1.sinks = k1

# Configure the source: spooldir collects the files placed in a directory
a1.sources.r1.type = spooldir
a1.sources.r1.spoolDir = /home/hadoop/flume_data

# Configure the channel: memory
a1.channels.c1.type = memory

# Configure the sink: print to the console
a1.sinks.k1.type = logger

# Bind the source and the sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1

Start:

./flume-ng agent --conf conf --conf-file /home/hadoop/apps/apache-flume-1.8.0-bin/conf/spooldir.conf --name a1 -Dflume.root.logger=INFO,console

Note:

To keep track of which files have already been collected, Flume renames each fully ingested file by appending the suffix .COMPLETED.
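A sketch of commonly used optional spooldir settings, assuming the same agent, directory and channel as above. `fileSuffix`, `deletePolicy`, `ignorePattern` and `fileHeader` are Spooling Directory Source properties; the `.tmp` naming convention for files that are still being written is a hypothetical choice. Files should be complete and uniquely named before they land in the directory: Flume logs an error and stops processing if a file is modified after being placed there or if a file name is reused.

```properties
a1.sources.r1.type = spooldir
a1.sources.r1.spoolDir = /home/hadoop/flume_data
# Suffix appended to fully ingested files (.COMPLETED is the default)
a1.sources.r1.fileSuffix = .COMPLETED
# never = keep ingested files (renamed); immediate = delete them once ingested
a1.sources.r1.deletePolicy = never
# Skip hypothetical *.tmp files; backslashes must be doubled in a properties file
a1.sources.r1.ignorePattern = ^.*\\.tmp$
# Store the absolute path of the originating file in each event's header
a1.sources.r1.fileHeader = true
a1.sources.r1.channels = c1
```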

3）avro source

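Receives data over the Avro protocol on a specified host and port.
Agents communicate with each other over the Avro protocol.

type – avro
bind – host name or IP to listen on
port – port number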
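A minimal avro source fragment, assuming it replaces the source section of an agent like the ones in this article; port 41414 is an arbitrary choice. Binding to `0.0.0.0` makes the source listen on all network interfaces of the host, which is convenient when upstream agents connect from other machines.

```properties
a1.sources.r1.type = avro
# Listen on every network interface of this host
a1.sources.r1.bind = 0.0.0.0
# Arbitrary example port; upstream avro sinks must use the same one
a1.sources.r1.port = 41414
a1.sources.r1.channels = c1
```

For a quick test without a second agent, the Avro client bundled with Flume can send a file to this source, e.g. `./flume-ng avro-client -H localhost -p 41414 -F /home/hadoop/zookeeper.out`.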

4）netcat source

Listens on a specified host and port and turns each line of text it receives into an event.
type – netcat (TCP) | netcatudp (UDP)
bind – host name or IP
port – port to bind to
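A complete netcat-to-logger agent, as a minimal sketch for testing; the host `localhost` and port 44444 are assumed values.

```properties
a1.sources = r1
a1.channels = c1
a1.sinks = k1

# netcat source: accept TCP connections on localhost:44444 (assumed values)
a1.sources.r1.type = netcat
a1.sources.r1.bind = localhost
a1.sources.r1.port = 44444

# Buffer events in memory
a1.channels.c1.type = memory

# Print events to the console
a1.sinks.k1.type = logger

# Bind the source and the sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
```

After starting the agent with a `flume-ng agent` command like the ones above, connect with a client such as `nc localhost 44444` and type a few lines; each line should appear as an event in the console output.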

5）kafka source

II: channel

1）memory channel

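Uses memory as the event buffer.

The following properties are usually set:

a1.channels.c1.type - memory
a1.channels.c1.capacity - 1000 here (default 100): maximum number of events buffered in memory
a1.channels.c1.transactionCapacity - 100: maximum number of events taken from a source or given to a sink in one transaction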
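A sketch of how the sizes relate, assuming the memory channel `c1` and the HDFS sink `k1` from section III. `transactionCapacity` may not exceed `capacity`, and a sink's batch size (here `hdfs.batchSize`, default 100) should not exceed `transactionCapacity`; otherwise a single take transaction overflows. Events buffered in a memory channel are lost if the agent process stops.

```properties
a1.channels.c1.type = memory
# Up to 1000 events buffered in memory
a1.channels.c1.capacity = 1000
# Up to 100 events per put/take transaction; must be <= capacity
a1.channels.c1.transactionCapacity = 100

# The sink takes events in batches; keep the batch <= transactionCapacity
a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = /user/flume_data/20191103
a1.sinks.k1.hdfs.batchSize = 100
```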

2）kafka channel

III: sink

1）HDFS sink

Writes the collected data to HDFS.

Configuration file:

# Name the components of the current agent; a1 is the agent's name
# Source name
a1.sources = r1
# Channel name
a1.channels = c1
# Sink name
a1.sinks = k1

# Configure the source: exec runs a Linux command and collects its output
a1.sources.r1.type = exec
a1.sources.r1.command = tail -f /home/hadoop/zookeeper.out

# Configure the channel: memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# Configure the sink: write to HDFS
a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = /user/flume_data/20191103
a1.sinks.k1.hdfs.filePrefix = event-
a1.sinks.k1.hdfs.fileSuffix = .log
# Roll to a new file every 516 bytes; time- and count-based rolling are disabled
a1.sinks.k1.hdfs.rollSize = 516
a1.sinks.k1.hdfs.rollInterval = 0
a1.sinks.k1.hdfs.rollCount = 0
a1.sinks.k1.hdfs.fileType = DataStream
a1.sinks.k1.hdfs.writeFormat = Text

# Bind the source and the sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1

Start:

./flume-ng agent --conf conf --conf-file /home/hadoop/apps/apache-flume-1.8.0-bin/conf/sink.conf --name a1 -Dflume.root.logger=INFO,console
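The path above hard-codes a date. A hedged variant, assuming one HDFS directory per day is wanted, uses the HDFS sink's date escape sequences. Because the exec source does not add a `timestamp` header to its events, `hdfs.useLocalTimeStamp` is needed so that the escapes are resolved with the agent's local time.

```properties
# One directory per day, e.g. /user/flume_data/20191103
a1.sinks.k1.hdfs.path = /user/flume_data/%Y%m%d
# Resolve %Y%m%d with the agent's clock, since events carry no timestamp header
a1.sinks.k1.hdfs.useLocalTimeStamp = true
```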

Parameters:

1. File naming

a1.sinks.k1.hdfs.filePrefix	default FlumeData	prefix added to each file name
a1.sinks.k1.hdfs.fileSuffix	no default (–)	suffix added to each file name

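2. HDFS file roll conditions

1) Time interval

a1.sinks.k1.hdfs.rollInterval	default 30	seconds between rolls; 0 disables this condition

2) File size

a1.sinks.k1.hdfs.rollSize	default 1024	roll when the file reaches 1024 bytes (1 KB); 0 disables this condition

3) Number of events

a1.sinks.k1.hdfs.rollCount	default 10	roll every 10 events; 0 disables this condition

Usually only one condition is set. If more than one is non-zero, the file rolls as soon as any one of them is met.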
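For example, the configuration above rolls a new file after about every 516 bytes, which suits a demo but produces many small files. A sketch of purely time-based rolling, with hourly files as an assumed target:

```properties
# Roll a new file every 3600 seconds (one hour)
a1.sinks.k1.hdfs.rollInterval = 3600
# Disable size- and count-based rolling
a1.sinks.k1.hdfs.rollSize = 0
a1.sinks.k1.hdfs.rollCount = 0

# To roll by size instead, e.g. 128 MB (128 * 1024 * 1024 bytes):
# a1.sinks.k1.hdfs.rollInterval = 0
# a1.sinks.k1.hdfs.rollSize = 134217728
```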

3. Format of the files written to HDFS

1) hdfs.fileType: SequenceFile (default) | DataStream (plain data stream)
2) hdfs.writeFormat: Writable (default) | Text (text format)
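Because `hdfs.fileType` defaults to `SequenceFile`, the sink writes binary SequenceFiles unless told otherwise. A sketch of the alternatives, assuming the sink `k1` from above: `hdfs.writeFormat` applies to SequenceFile records, and `hdfs.codeC` must be set when `CompressedStream` is used. The commented-out variants are illustrative assumptions.

```properties
# Plain text output, one line per event
a1.sinks.k1.hdfs.fileType = DataStream

# Keep SequenceFile but store records as Text instead of the default Writable
# a1.sinks.k1.hdfs.fileType = SequenceFile
# a1.sinks.k1.hdfs.writeFormat = Text

# Hypothetical alternative: gzip-compressed stream
# a1.sinks.k1.hdfs.fileType = CompressedStream
# a1.sinks.k1.hdfs.codeC = gzip
```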

2）logger sink

Prints events to the console. Rarely used outside of testing.

3）avro sink

Usually paired with the avro source of the next agent in a chain, to pass events from one agent to the next.

Example: chaining two agents

Plan:

hadoop01 a1
    source exec
    channel memory
    sink avro
        a1.sinks.k1.type	–	avro
        a1.sinks.k1.hostname	–	host name or IP
        a1.sinks.k1.port	–	port number
hadoop02 a1
    source avro
        a1.sources.r1.type	–	avro
        a1.sources.r1.bind	–	host name or IP
        a1.sources.r1.port	–	port number
    channel memory
    sink logger

Configuration files:

hadoop01:

# Name the components of the current agent; a1 is the agent's name
# Source name
a1.sources = r1
# Channel name
a1.channels = c1
# Sink name
a1.sinks = k1

# Configure the source: exec runs a Linux command and collects its output
a1.sources.r1.type = exec
a1.sources.r1.command = tail -f /home/hadoop/zookeeper.out

# Configure the channel: memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# Configure the sink: avro, sending events to port 45551 on hadoop02
a1.sinks.k1.type = avro
a1.sinks.k1.hostname = hadoop02
a1.sinks.k1.port = 45551

# Bind the source and the sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1

hadoop02:

# Name the components of the current agent; a1 is the agent's name
# Source name
a1.sources = r1
# Channel name
a1.channels = c1
# Sink name
a1.sinks = k1

# Configure the source: avro, listening on port 45551 of hadoop02
a1.sources.r1.type = avro
# The host here must match the hostname used by the avro sink on hadoop01
a1.sources.r1.bind = hadoop02
a1.sources.r1.port = 45551

# Configure the channel: memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# Configure the sink: print to the console
a1.sinks.k1.type = logger

# Bind the source and the sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1

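Start: from back to front, i.e. start the downstream agent on hadoop02 first, so that its avro source is already listening when the avro sink on hadoop01 connects.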

hadoop02
./flume-ng agent --conf conf --conf-file /home/hadoop/apps/apache-flume-1.8.0-bin/conf/avro_source.conf --name a1 -Dflume.root.logger=INFO,console

hadoop01 
./flume-ng agent --conf conf --conf-file /home/hadoop/apps/apache-flume-1.8.0-bin/conf/avro_sink.conf --name a1 -Dflume.root.logger=INFO,console

Result:

(screenshot not preserved)
