Use Flume to collect monitored data into a Kafka topic and consume it
1. Create the Kafka topic
kafka-topics.sh --create --zookeeper 192.168.116.60:2181 --topic user_friends_raw --partitions 1 --replication-factor 1
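A quick way to confirm the topic was created is to list the topics on the same ZooKeeper (an optional check, same address as above):
kafka-topics.sh --zookeeper 192.168.116.60:2181 --list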
2. Create and edit the Flume configuration file
vi userFriend-flume-kafka.conf
Flume configuration:
user_friend.sources=userFriendSource
user_friend.channels=userFriendChannel
user_friend.sinks=userFriendSink
user_friend.sources.userFriendSource.type=spooldir
user_friend.sources.userFriendSource.spoolDir=/opt/flume160/conf/jobkb09/dataSourceFile/userFriend
user_friend.sources.userFriendSource.deserializer=LINE
user_friend.sources.userFriendSource.deserializer.maxLineLength=320000
user_friend.sources.userFriendSource.includePattern=userFriend_[0-9]{4}-[0-9]{2}-[0-9]{2}.csv
# Flume interceptor: filter out the CSV header line
user_friend.sources.userFriendSource.interceptors=head_filter
user_friend.sources.userFriendSource.interceptors.head_filter.type=regex_filter
user_friend.sources.userFriendSource.interceptors.head_filter.regex=^user,friends*
user_friend.sources.userFriendSource.interceptors.head_filter.excludeEvents=true
user_friend.channels.userFriendChannel.type=file
user_friend.channels.userFriendChannel.checkpointDir=/opt/flume160/conf/jobkb09/checkPointFile/userFriend
user_friend.channels.userFriendChannel.dataDirs=/opt/flume160/conf/jobkb09/dataChannelFile/userFriend
user_friend.sinks.userFriendSink.type=org.apache.flume.sink.kafka.KafkaSink
user_friend.sinks.userFriendSink.batchSize=640
user_friend.sinks.userFriendSink.brokerList=192.168.116.60:9092
user_friend.sinks.userFriendSink.topic=user_friends_raw
user_friend.sources.userFriendSource.channels=userFriendChannel
user_friend.sinks.userFriendSink.channel=userFriendChannel
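The spooling directory must already exist before the agent starts, otherwise the spooldir source will fail. A minimal preparation step, assuming the paths used in the configuration above:
mkdir -p /opt/flume160/conf/jobkb09/dataSourceFile/userFriend
mkdir -p /opt/flume160/conf/jobkb09/checkPointFile/userFriend
mkdir -p /opt/flume160/conf/jobkb09/dataChannelFile/userFriend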
3. Run the Flume agent
[root@hadoop001 flume160]# ./bin/flume-ng agent --name user_friend --conf ./conf/ --conf-file ./conf/jobkb09/userFriend-flume-kafka.conf -Dflume.root.logger=INFO,console
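If the agent should keep running after the terminal is closed, a common alternative is to start it with nohup and watch the log file instead of the console (the log file name here is just an example):
nohup ./bin/flume-ng agent --name user_friend --conf ./conf/ --conf-file ./conf/jobkb09/userFriend-flume-kafka.conf -Dflume.root.logger=INFO,console > flume-userFriend.log 2>&1 &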
4. Copy the data file into the directory monitored by Flume
The copy can also be done with the install command (see the example after the cp command below).
The difference between the two: if the target file already exists, cp truncates it and writes the new content into that same file, whereas install first deletes the old file and then writes a new one.
cp user_friends.csv /opt/flume160/conf/jobkb09/dataSourceFile/userFriend/userFriend_2020-12-08.csv
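A sketch of the equivalent install command (the target file name must still match the includePattern in the Flume config):
install user_friends.csv /opt/flume160/conf/jobkb09/dataSourceFile/userFriend/userFriend_2020-12-08.csv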
If the following error appears, do not delete the files in the monitored directory; just run the Flume agent once more and processing will continue.
2021-01-12 19:27:20,256 (pool-4-thread-1) [ERROR - org.apache.flume.source.SpoolDirectorySource$SpoolDirectoryRunnable.run(SpoolDirectorySource.java:280)]
FATAL: Spool Directory source eventsSource: { spoolDir: /kb09file/events }: Uncaught exception in SpoolDirectorySource thread.
Restart or reconfigure Flume to continue processing.
5. View the topic's partition information
kafka-topics.sh --zookeeper 127.0.0.1:2181 --describe --topic user_friends_raw
5.1 Check the number of messages in the topic
kafka-run-class.sh kafka.tools.GetOffsetShell --broker-list 192.168.116.60:9092 --topic user_friends_raw -time -1 --offsets 1
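Here -time -1 asks for the latest offset of each partition; running the same command with -time -2 returns the earliest offset, and the difference between the two is the number of messages currently in the topic:
kafka-run-class.sh kafka.tools.GetOffsetShell --broker-list 192.168.116.60:9092 --topic user_friends_raw -time -2 --offsets 1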
6. Consume the messages
kafka-console-consumer.sh --bootstrap-server 127.0.0.1:9092 --topic user_friends_raw --from-beginning
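If the topic is large, consuming everything to the console takes a while; one option is to sample only the first few records with the --max-messages flag:
kafka-console-consumer.sh --bootstrap-server 127.0.0.1:9092 --topic user_friends_raw --from-beginning --max-messages 10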
7. Delete the topic
kafka-topics.sh --zookeeper 192.168.116.60:2181 --topic user_friends_raw --delete
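Deletion only takes effect if the broker allows it; if the topic is merely marked for deletion, check that the following setting is enabled in the broker's server.properties (it defaults to true in recent Kafka versions):
# in $KAFKA_HOME/config/server.properties
delete.topic.enable=true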