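I. Collect data from a socket and display it on the logger
1. Install the netcat tool on Linux; it is used to open a socket client:
- Switch to root: $>su root
- Install nc: $>yum install -y nc
- Switch back to the hyxy user and simulate a chat room:
  Start the server side: $>nc -l 55555
  Start the client side: $>nc localhost 55555
2. Create the Agent
Write the Agent configuration: in the ${FLUME_HOME}/conf directory, create a file named ncAgent.conf and add the following: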
a1.sources = s1
a1.channels = c1
a1.sinks = k1
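# Define the agent's source properties
a1.sources.s1.type = netcat
a1.sources.s1.bind = master
a1.sources.s1.port = 55555
# Configure the agent's sink properties; events land on the console
a1.sinks.k1.type = logger
# Configure the agent's channel properties; memory is used as the pipe
a1.channels.c1.type = memory
a1.sources.s1.channels = c1
# Note: a sink takes "channel" (singular, no trailing s). Keep this note on its own line: a # after a value is read as part of the value
a1.sinks.k1.channel = c1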
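The plural/singular distinction in the last two lines is easy to get wrong. As a minimal sketch, the helper below scans a config file with grep and reports a source bound with "channel" or a sink bound with "channels". The name check_channel_keys is hypothetical and not part of Flume.

```shell
#!/bin/sh
# Hypothetical helper (not part of Flume): flag the singular/plural mix-up.
# Sources take "channels" (plural); sinks take "channel" (singular).
check_channel_keys() {
  conf_file=$1
  bad=0
  # A source key that ends in ".channel" is missing its trailing s
  if grep -Eq '^[^#]*\.sources\.[^.]+\.channel[[:space:]]*=' "$conf_file"; then
    echo "source uses 'channel'; expected 'channels'"
    bad=1
  fi
  # A sink key that ends in ".channels" has an extra s
  if grep -Eq '^[^#]*\.sinks\.[^.]+\.channels[[:space:]]*=' "$conf_file"; then
    echo "sink uses 'channels'; expected 'channel'"
    bad=1
  fi
  return $bad
}
```

For example, running check_channel_keys /home/hyxy/soft/flume/conf/ncAgent.conf prints nothing and returns 0 when both keys are spelled correctly.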
3. Start the Flume process:
$>flume-ng agent --name a1 --conf /home/hyxy/soft/flume/conf/ --conf-file /home/hyxy/soft/flume/conf/ncAgent.conf -Dflume.root.logger=INFO,console
Or, with the short options:
$>flume-ng agent -n a1 -c /home/hyxy/soft/flume/conf/ -f /home/hyxy/soft/flume/conf/ncAgent.conf -Dflume.root.logger=INFO,console
4. Start the nc client and type a few lines:
$>nc master 55555
hello world
zhang san
5. Result:
The Flume session window shows the following:
Event: { headers:{} body: 68 65 6C 6C 6F 20 77 6F 72 6C 64 hello world }
Event: { headers:{} body: 7A 68 61 6E 67 20 73 61 6E zhang san }
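The logger sink prints each event body as hex bytes followed by a printable rendering. To confirm that the hex column matches the typed text, the small sketch below converts it back to characters using only POSIX printf. The name decode_body is a hypothetical helper for illustration.

```shell
#!/bin/sh
# Hypothetical helper: turn a space-separated list of hex bytes into text.
decode_body() {
  for byte in $1; do
    # Convert the hex byte to a 3-digit octal escape, which printf understands
    printf "\\$(printf '%03o' "0x$byte")"
  done
  printf '\n'
}

decode_body "68 65 6C 6C 6F 20 77 6F 72 6C 64"   # prints: hello world
```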
-------------------------------------------------------------------------------------------------------------
File Channel:
a1.channels.c1.type = file
a1.channels.c1.checkpointDir = /home/hyxy/flumeAAA/checkpoint
a1.channels.c1.dataDirs = /home/hyxy/flumeAAA/data
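Recoverability in Flume: FileChannel is recommended when events must survive a restart. Events are persisted to the local file system, at the cost of lower performance than the memory channel.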
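To try the file channel, the three properties above take the place of the memory channel line in ncAgent.conf. The sketch below writes such a variant to a file. The function name write_file_channel_conf and the output file name used in the usage note are assumptions for illustration; every property value is copied from the netcat example.

```shell
#!/bin/sh
# Sketch: write a variant of ncAgent.conf that uses the file channel.
# write_file_channel_conf is a hypothetical helper name.
write_file_channel_conf() {
  out=$1
  cat > "$out" <<'EOF'
a1.sources = s1
a1.channels = c1
a1.sinks = k1

a1.sources.s1.type = netcat
a1.sources.s1.bind = master
a1.sources.s1.port = 55555

a1.sinks.k1.type = logger

# File channel: events are persisted on disk instead of in memory
a1.channels.c1.type = file
a1.channels.c1.checkpointDir = /home/hyxy/flumeAAA/checkpoint
a1.channels.c1.dataDirs = /home/hyxy/flumeAAA/data

a1.sources.s1.channels = c1
a1.sinks.k1.channel = c1
EOF
}
```

A possible use: write_file_channel_conf /home/hyxy/soft/flume/conf/ncAgent-file.conf, then start the agent with --conf-file pointing at that file.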
II. Collect data from a socket and write it to local disk
Configuration file: netcat-fileroll.conf
1. Write the agent
a1.sources=s1
a1.channels=c1
a1.sinks=k1
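# Define the agent's source properties
a1.sources.s1.type=netcat
a1.sources.s1.bind=master
a1.sources.s1.port=55555
# Configure the agent's sink properties
# The directory /home/hyxy/tmp/flume must be created beforehand (mkdir flume)
a1.sinks.k1.type=file_roll
a1.sinks.k1.sink.directory=/home/hyxy/tmp/flume
# Roll interval in seconds (default 30). Setting 0 disables rolling, so all events are written to a single file
a1.sinks.k1.sink.rollInterval=0
# Configure the agent's channel properties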
a1.channels.c1.type=memory
a1.sources.s1.channels=c1
a1.sinks.k1.channel=c1
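Since the sink directory has to exist before the agent starts, it can help to create and check it in one step. A minimal sketch; prepare_sink_dir is a hypothetical helper, and the path in the usage note is the one from the config above.

```shell
#!/bin/sh
# Hypothetical helper: create the file_roll output directory and check it is writable.
prepare_sink_dir() {
  dir=$1
  # -p creates missing parents and does not fail if the directory already exists
  mkdir -p "$dir" || return 1
  if [ ! -w "$dir" ]; then
    echo "not writable: $dir" >&2
    return 1
  fi
  echo "ready: $dir"
}
```

For example: prepare_sink_dir /home/hyxy/tmp/flume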
2. Run:
$>flume-ng agent --name a1 --conf /home/hyxy/apps/flume/conf/ \
--conf-file /home/hyxy/apps/flume/conf/netcat-fileroll.conf \
-Dflume.root.logger=INFO,console
3. Open four nc client sessions and type one line in each:
$>nc master 55555
hello word
$>nc master 55555
hello
$>nc master 55555
hello123
$>nc master 55555
hello1234
4. Check the output directory
$>cd /home/hyxy/tmp/flume
$>ll
-rw-rw-r--. 1 hyxy hyxy 0 Sep 10 14:03 1536613249150-7
-rw-rw-r--. 1 hyxy hyxy 0 Sep 10 14:04 1536613249150-8
-rw-rw-r--. 1 hyxy hyxy 0 Sep 10 14:04 1536613249150-9
-rw-rw-r--. 1 hyxy hyxy 0 Sep 10 14:06 1536613569603-1
[hyxy@master flume0804]$ cat 1564884105308-1
hello word
hello
hello123
hello1234
### Tested with a1.sinks.k1.sink.rollInterval=10:
a new file is generated every 10 seconds, whether or not the source has delivered any data ###
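Because a timed roll creates a file even when no events arrived, the output directory can fill up with empty files. A minimal sketch for listing only the files that actually contain data; list_nonempty_rolls is a hypothetical helper.

```shell
#!/bin/sh
# Hypothetical helper: print the files in a file_roll directory that hold data.
list_nonempty_rolls() {
  dir=$1
  for f in "$dir"/*; do
    # -s is true only for files larger than zero bytes
    if [ -f "$f" ] && [ -s "$f" ]; then
      echo "$f"
    fi
  done
}
```

For example: list_nonempty_rolls /home/hyxy/tmp/flume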
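III. One agent, two flows
1. Requirements (precondition: a single Agent):
a. Listen on port 44444 and write the collected data to local disk at /home/hyxy/tmp/flume (the directory must be created beforehand)
b. Listen on port 55555 and display the collected data on the logger
2. Write the Agent:
Create the file one-agent.conf under ${FLUME_HOME}/conf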
a1.sources=s1 s2
a1.channels=c1 c2
a1.sinks=k1 k2
# Define the agent's source properties
a1.sources.s1.type=netcat
a1.sources.s1.bind=master
a1.sources.s1.port=44444
a1.sources.s2.type=netcat
a1.sources.s2.bind=master
a1.sources.s2.port=55555
# Configure the agent's sink properties
a1.sinks.k1.type=file_roll
a1.sinks.k1.sink.directory=/home/hyxy/tmp/flume
a1.sinks.k1.sink.rollInterval=0
a1.sinks.k2.type=logger
# Configure the agent's channel properties
a1.channels.c1.type=memory
a1.channels.c2.type=memory
#source-->channel-->sink
a1.sources.s1.channels=c1
a1.sinks.k1.channel=c1
#source-->channel-->sink
a1.sources.s2.channels=c2
a1.sinks.k2.channel=c2
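With two flows in one file, it is worth confirming that each source reaches the intended sink. The sketch below reads the channel bindings and prints one line per flow. It assumes each source is bound to exactly one channel, as in this config; print_flows is a hypothetical helper.

```shell
#!/bin/sh
# Hypothetical helper: print "source -> channel -> sink" for each flow in a config.
# Assumes every source is bound to a single channel.
print_flows() {
  awk -F'[ \t]*=[ \t]*' '
    { sub(/[ \t\r]+$/, "") }   # drop trailing whitespace before comparing names
    $1 ~ /^[^#.]+\.sources\.[^.]+\.channels$/ { split($1, k, "."); source_of[$2] = k[3] }
    $1 ~ /^[^#.]+\.sinks\.[^.]+\.channel$/    { split($1, k, "."); sink_of[$2] = k[3] }
    END { for (c in source_of) print source_of[c] " -> " c " -> " sink_of[c] }
  ' "$1" | sort
}
```

Run against one-agent.conf, this should list s1 -> c1 -> k1 and s2 -> c2 -> k2.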
3. Start flume-ng
$>flume-ng agent --name a1 --conf /home/hyxy/soft/flume/conf/ \
--conf-file /home/hyxy/soft/flume/conf/one-agent.conf \
-Dflume.root.logger=INFO,console
4. Start the nc clients
$>nc master 44444
$>nc master 55555
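IV. Exec source: run a given Unix or Linux command at startup, monitor a file for changes in real time, and collect the newly appended data into HDFS
1. First, run the command on its own to watch the file
$> tail -F /home/hyxy/tmp/flume/helloworld
The Linux tail command writes the last part of a file to standard output, usually the terminal. By default it prints the last 10 lines (equivalent to -n 10); with -F it keeps following the file and prints new lines as they are appended.
In /home/hyxy/tmp/flume, append some lines:
[hyxy@master flume]$ echo "Aa11111a111" >> helloworld
[hyxy@master flume]$ echo "Aa11111a222" >> helloworld
Flow: source --> Exec source, channel --> memory, sink --> hdfs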
------------------------------------------------------------------------------------------------
1. Write the Agent:
Create the file exec-hdfs.conf under ${FLUME_HOME}/conf
a1.sources=s1
a1.channels=c1
a1.sinks=k1
# Define the agent's source properties
a1.sources.s1.type=exec
a1.sources.s1.command=tail -F /home/hyxy/tmp/flume/helloworld
# Configure the agent's sink properties
a1.sinks.k1.type=hdfs
a1.sinks.k1.hdfs.path=hdfs://mycluster/flume/
a1.sinks.k1.hdfs.rollInterval=0
a1.sinks.k1.hdfs.rollCount = 0
a1.sinks.k1.hdfs.rollSize = 0
a1.sinks.k1.hdfs.filePrefix=helloworld
a1.sinks.k1.hdfs.fileSuffix=.hyxy
# Configure the agent's channel properties
a1.channels.c1.type = memory
a1.sources.s1.channels = c1
a1.sinks.k1.channel = c1
2. Start the HDFS cluster
$>zkServer.sh start
$>start-dfs.sh
3. Start flume-ng
$>flume-ng agent --name a1 --conf /home/hyxy/soft/flume/conf/ \
--conf-file /home/hyxy/soft/flume/conf/exec-hdfs.conf \
-Dflume.root.logger=INFO,console
hdfs.BucketWriter: Creating hdfs://mycluster/flume//helloworld.1564886221876.hyxy.tmp
Check HDFS: the /flume path is created by Flume
[hyxy@master soft]$ hadoop fs -lsr /flume
lsr: DEPRECATED: Please use 'ls -R' instead.
lsr: `/flume': No such file or directory
[hyxy@master soft]$ hadoop fs -lsr /flume
lsr: DEPRECATED: Please use 'ls -R' instead.
-rw-r--r-- 3 hyxy supergroup /flume/helloworld.1564886221876.hyxy.tmp
4. Append data and check it in HDFS
In /home/hyxy/tmp/flume:
[hyxy@master flume]$ echo "Aa11111a111" >> helloworld
[hyxy@master flume]$ echo "Aa11111a222" >> helloworld
[hyxy@master flume]$ echo "Aa11111333" >> helloworld
[hyxy@master flume]$ echo "Aa11111333" >> helloworld
[hyxy@master flume]$ echo "Aa11111333" >> helloworld
[hyxy@master flume]$ echo "Aa11111333" >> helloworld
[hyxy@master flume]$ echo "Aa11111333" >> helloworld
Check the web UI on port 50070
[hyxy@master Desktop]$ hadoop fs -lsr /flume
lsr: DEPRECATED: Please use 'ls -R' instead.
-rw-r--r-- 3 hyxy supergroup 179 2018-06-05 10:36 /flume/helloworld.1559701633159.hyxy
-rw-r--r-- 3 hyxy supergroup 179 2018-06-05 10:37 /flume/helloworld.1559702240134.hyxy
-rw-r--r-- 3 hyxy supergroup 187 2018-06-05 10:38 /flume/helloworld.1559702240135.hyxy
-rw-r--r-- 3 hyxy supergroup 185 2018-06-05 10:38 /flume/helloworld.1559702240136.hyxy
-rw-r--r-- 3 hyxy supergroup 125 2018-06-05 10:38 /flume/helloworld.1559702240137.hyxy.tmp
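Modify the example above so that a new file is rolled every 5 events
hdfs.rollCount (default 10): number of events written to a file before it is rolled (0 = never roll based on event count)
a1.sinks.k1.hdfs.rollCount = 5
After five echo commands, a new file is rolled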
[hyxy@master flume]$ echo "Aa11111a111" >> helloworld
-rw-r--r-- hyxy supergroup 157 B 8/6/2018, 7:23:48 PM 3 128 MB helloworld.1565090602533.hyxy
[157 B] (the size reflects the stored events: headers plus body content)
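When count-based rolling is the only active trigger (rollInterval and rollSize left at 0), the events are spread over files of at most rollCount events each. A rough sketch of the arithmetic follows; files_for_events is a hypothetical helper, and it ignores any other reason the sink may have to roll a file.

```shell
#!/bin/sh
# Hypothetical helper: estimate how many HDFS files a number of events is spread over.
files_for_events() {
  events=$1
  roll_count=$2
  # Ceiling division: the last file may hold fewer than roll_count events
  echo $(( (events + roll_count - 1) / roll_count ))
}

files_for_events 7 5   # prints: 2
```

The most recent file in that count may still carry the .tmp suffix, since it has not been closed yet.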
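V. Ingest data by placing the files to be ingested into a "spooling" directory on disk
Unlike the Exec source, this source is reliable and does not miss data, even if Flume is restarted or killed.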
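The spooling directory source expects a file to be complete and unchanged once it appears in the watched directory, and file names should not be reused. One common way to respect that is to finish writing the file elsewhere and then move it in. The sketch below assumes the staging location is on the same filesystem as the spool directory, so that mv is a plain rename. The helper name spool_file and the paths in the usage note are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: hand a finished file to a spooling directory.
# Assumes src and spool_dir are on the same filesystem, so mv is a rename.
spool_file() {
  src=$1
  spool_dir=$2
  # Append a timestamp and the shell PID so the file name is not reused
  dest="$spool_dir/$(basename "$src").$(date +%s).$$"
  mv "$src" "$dest" && echo "$dest"
}
```

For example: spool_file /home/hyxy/tmp/staging/app.log /home/hyxy/tmp/spool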
sploodir-loge