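Flume Part 4: Monitoring a Local Directory and Uploading to HDFS
The files used in this article are available at the links below so you can download them and practice: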
https://download.csdn.net/download/cai_and_luo/12441018
https://download.csdn.net/download/cai_and_luo/12440889
Place the following JAR packages in the /Flume/lib directory (they are required for uploading to HDFS)
1: Create the directories
Under /opt, create the directory that Flume will monitor: /opt/flumelog/user_friends
[root@cai flumelog]# mkdir user_friends
Create the checkpoint directory: /opt/flumelog/checkpoint/user_friends
[root@cai flumelog]# mkdir checkpoint
[root@cai checkpoint]# mkdir user_friends
Create the file channel data directory (referenced by dataDirs in the configuration): /opt/flumelog/data/user_friends
[root@cai flumelog]# mkdir data
[root@cai flumelog]# cd data/
[root@cai data]# mkdir user_friends
[root@cai data]# pwd
/opt/flumelog/data
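The three directories above can also be created in one step. The following is a minimal sketch, not part of the original walkthrough; the function name make_flume_dirs is hypothetical, and the base directory defaults to /opt/flumelog as used in this article.

```shell
# Hypothetical helper: create the spooling, checkpoint, and channel data
# directories under a base directory (default: /opt/flumelog).
make_flume_dirs() {
  base="${1:-/opt/flumelog}"
  # -p creates missing parent directories and does not fail if they exist.
  mkdir -p "$base/user_friends" \
           "$base/checkpoint/user_friends" \
           "$base/data/user_friends"
}

# Example (run as a user with write access to /opt):
#   make_flume_dirs
```

Because mkdir -p is idempotent, the helper can be run again safely if some of the directories already exist.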
2: Create the configuration file
Job configuration files are kept in the directory /opt/bigdata/flume/conf/job
Create the configuration file user_friends-flume-hdfs.conf
[root@cai job]# touch ./user_friends-flume-hdfs.conf
[root@cai job]# ls
events-flume-logger.conf netcat-flume-logger.conf user_friends-flume-hdfs.conf
Edit the configuration file and add the following properties:
vi ./user_friends-flume-hdfs.conf
user_friends.sources = userFriendsSource
user_friends.channels = userFriendsChannel
user_friends.sinks = userFriendsSink
user_friends.sources.userFriendsSource.type = spooldir
user_friends.sources.userFriendsSource.spoolDir = /opt/flumelog/user_friends
user_friends.sources.userFriendsSource.includePattern = userFriends_[0-9]{4}-[0-9]{2}-[0-9]{2}.csv
user_friends.sources.userFriendsSource.deserializer = LINE
user_friends.sources.userFriendsSource.deserializer.maxLineLength = 128000
user_friends.channels.userFriendsChannel.type = file
user_friends.channels.userFriendsChannel.checkpointDir = /opt/flumelog/checkpoint/user_friends
user_friends.channels.userFriendsChannel.dataDirs = /opt/flumelog/data/user_friends
user_friends.sinks.userFriendsSink.type = hdfs
user_friends.sinks.userFriendsSink.hdfs.fileType = DataStream
user_friends.sinks.userFriendsSink.hdfs.filePrefix = userfriend
user_friends.sinks.userFriendsSink.hdfs.fileSuffix = .csv
user_friends.sinks.userFriendsSink.hdfs.path = hdfs://192.168.101.130:9000/user/userfriend/%Y-%m-%d
user_friends.sinks.userFriendsSink.hdfs.useLocalTimeStamp = true
user_friends.sinks.userFriendsSink.hdfs.batchSize = 640
user_friends.sinks.userFriendsSink.hdfs.rollCount = 0
user_friends.sinks.userFriendsSink.hdfs.rollSize = 100000000
user_friends.sinks.userFriendsSink.hdfs.rollInterval = 30
user_friends.sinks.userFriendsSink.channel = userFriendsChannel
user_friends.sources.userFriendsSource.channels = userFriendsChannel
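Every property above is prefixed with the agent name (user_friends), and the name passed to flume-ng with -n has to match that prefix, otherwise the agent starts without any components. As a rough sketch, the name can be read from the file instead of typed by hand; the helper name agent_name_from_conf is hypothetical, and it assumes the file contains a line of the form "<agent>.sources = ...".

```shell
# Hypothetical helper: print the agent name, taken from the first
# "<agent>.sources = ..." line of a Flume properties file.
agent_name_from_conf() {
  sed -n 's/^\([A-Za-z0-9_-]*\)\.sources[[:space:]]*=.*/\1/p' "$1" | head -n 1
}

# Example (run from the Flume home directory):
#   conf=conf/job/user_friends-flume-hdfs.conf
#   ./bin/flume-ng agent -c conf/ -f "$conf" -n "$(agent_name_from_conf "$conf")" \
#       -Dflume.root.logger=INFO,console
```

Lines such as user_friends.sources.userFriendsSource.type are skipped, because the expression only accepts optional whitespace between ".sources" and the equals sign.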
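3: Start the agent (run from the Flume home directory, /opt/bigdata/flume)
./bin/flume-ng agent -c conf/ -f conf/job/user_friends-flume-hdfs.conf -n user_friends -Dflume.root.logger=INFO,console
Leave this shell running and open a new shell to copy the file into the monitored directory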
4: Put the files onto the Linux machine (drag and drop with Xftp)
Create the directory /opt/bigdata/flume/conf/events and drag the files to be used into it
[root@cai conf]# mkdir events
[root@cai conf]# cd events/
[root@cai events]# pwd
/opt/bigdata/flume/conf/events
Copy the file to be monitored into the monitored directory
[root@cai events]# pwd
/opt/bigdata/flume/conf/events
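[root@cai events]# cp user_friends.csv /opt/flumelog/user_friends/userFriends_2020-05-20.csv
Note: the file name userFriends_2020-05-20.csv must follow a fixed format, which is determined by the includePattern setting in user_friends-flume-hdfs.conf. The pattern is case-sensitive, so the name must start with userFriends_ (capital F).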
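A file name can be checked against the includePattern before copying. The sketch below approximates the check with grep -E; it assumes that Flume requires the pattern to match the entire file name, which is why the expression is anchored with ^ and $. The function name matches_include_pattern is hypothetical.

```shell
# Hypothetical helper: succeed if the file name matches the includePattern
# from user_friends-flume-hdfs.conf (approximated with an anchored grep -E).
matches_include_pattern() {
  printf '%s\n' "$1" |
    grep -Eq '^userFriends_[0-9]{4}-[0-9]{2}-[0-9]{2}.csv$'
}

for name in userFriends_2020-05-20.csv userfriends_2020-05-20.csv; do
  if matches_include_pattern "$name"; then
    echo "$name: matches, would be picked up"
  else
    echo "$name: no match, would be ignored"
  fi
done
```

The second name differs only in the lowercase f, which is enough for the match to fail.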
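After copying, go back to the shell where the agent was started; if it shows data being read, the source is working
Open the web UI to check whether the upload to HDFS succeeded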
http://192.168.101.130:50070/
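Note: downloading the file from the web UI may fail because Windows cannot resolve the virtual machine's hostname. To fix this, add the virtual machine's IP address and hostname to C:\Windows\System32\drivers\etc\hosts (for this machine: 192.168.101.130 cai)
If the uploaded file is listed, the test succeeded.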
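As an alternative to the web UI, the result can be listed from the command line. This is a sketch under the assumption that the hdfs client is on the PATH of the machine where it is run; the helper name hdfs_target_dir is hypothetical. Because hdfs.useLocalTimeStamp is true, the %Y-%m-%d part of hdfs.path is resolved from the agent's local time, so today's date is used here.

```shell
# Hypothetical helper: print the HDFS directory that hdfs.path resolves to
# for today's date (/user/userfriend/YYYY-MM-DD).
hdfs_target_dir() {
  printf '/user/userfriend/%s\n' "$(date +%Y-%m-%d)"
}

# Example (requires the hdfs client and a running NameNode):
#   hdfs dfs -ls "hdfs://192.168.101.130:9000$(hdfs_target_dir)"
```

Files that the sink is still writing normally carry a .tmp suffix; with rollInterval = 30 they should be renamed to their final userfriend*.csv names within about 30 seconds.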