A small Kafka-Flume exercise
Ingest data collected by Flume into Kafka, then read it back from Kafka.
1. Start the cluster
2. Upload the prepared topic-creation shell script (create_topic.sh) to hadoop-kafka1:/usr/local/kafka_2.11-2.4.1 (this is the KAFKA_HOME path)
Contents of create_topic.sh:
#!/bin/bash
bin/kafka-topics.sh --create --bootstrap-server hadoop-kafka1:9092 --replication-factor 2 --partitions 2 --topic users
bin/kafka-topics.sh --create --bootstrap-server hadoop-kafka1:9092 --replication-factor 2 --partitions 4 --topic user_friends_raw
bin/kafka-topics.sh --create --bootstrap-server hadoop-kafka1:9092 --replication-factor 2 --partitions 2 --topic user_friends
bin/kafka-topics.sh --create --bootstrap-server hadoop-kafka1:9092 --replication-factor 2 --partitions 2 --topic events
bin/kafka-topics.sh --create --bootstrap-server hadoop-kafka1:9092 --replication-factor 2 --partitions 6 --topic event_attendees_raw
bin/kafka-topics.sh --create --bootstrap-server hadoop-kafka1:9092 --replication-factor 2 --partitions 3 --topic event_attendees
bin/kafka-topics.sh --create --bootstrap-server hadoop-kafka1:9092 --replication-factor 2 --partitions 2 --topic train
bin/kafka-topics.sh --create --bootstrap-server hadoop-kafka1:9092 --replication-factor 2 --partitions 2 --topic test
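The eight near-identical create commands above can also be generated from a single topic:partitions table. A minimal sketch (printed as a dry run so nothing is executed; drop the leading `echo` to actually create the topics, and note the broker address and replication factor are the same as in create_topic.sh):

```shell
#!/bin/bash
# topic:partitions pairs taken from create_topic.sh above; replication factor is 2 for all
BROKER=hadoop-kafka1:9092
TOPICS="users:2 user_friends_raw:4 user_friends:2 events:2 event_attendees_raw:6 event_attendees:3 train:2 test:2"

for entry in $TOPICS; do
  topic=${entry%%:*}   # part before the colon
  parts=${entry##*:}   # part after the colon
  # dry run: print each command; remove the leading echo to really create the topics
  echo bin/kafka-topics.sh --create --bootstrap-server "$BROKER" \
       --replication-factor 2 --partitions "$parts" --topic "$topic"
done
```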
3. Enter the hadoop-kafka1 and hadoop-kafka2 containers and start Kafka on each:
bin/kafka-server-start.sh $KAFKA_HOME/config/server.properties &
Run the script, then verify that the topics were created, e.g.:
bin/kafka-topics.sh --list --bootstrap-server hadoop-kafka1:9092
4. Next, write the users.sh script:
#!/bin/bash
# ********* IMPORTANT NOTES:
# Please create the following directories in the hadoop-flume container
mkdir -p /var/flume/checkpoint/users
mkdir -p /var/flume/data/users
# change the permissions
chmod -R 777 /var/flume
# (Re)write the agent config; use > rather than >> so reruns do not append duplicate sections
cat > $FLUME_HOME/conf/users.conf << 'EOF'
# **********************************************************************************
# Deploy the following content into Flume
# -------------------------------------------------
# Initialize agent's source, channel and sink
users.sources = usersSource
users.channels = usersChannel
users.sinks = usersSink
# Use a channel which buffers events in a directory
users.channels.usersChannel.type = file
users.channels.usersChannel.checkpointDir = /var/flume/checkpoint/users
users.channels.usersChannel.dataDirs = /var/flume/data/users
# Setting the source to spool directory where the file exists
users.sources.usersSource.type = spooldir
users.sources.usersSource.deserializer = LINE
users.sources.usersSource.deserializer.maxLineLength = 6400
users.sources.usersSource.spoolDir = /events/input/intra/users
users.sources.usersSource.includePattern = users_[0-9]{4}-[0-9]{2}-[0-9]{2}\.csv
users.sources.usersSource.interceptors = head_filter
users.sources.usersSource.interceptors.head_filter.type = regex_filter
users.sources.usersSource.interceptors.head_filter.regex = ^user_id,locale,birthyear,gender,joinedAt,location,timezone$
users.sources.usersSource.interceptors.head_filter.excludeEvents = true
users.sources.usersSource.channels = usersChannel
# Define / Configure sink
users.sinks.usersSink.type = org.apache.flume.sink.kafka.KafkaSink
users.sinks.usersSink.kafka.bootstrap.servers = hadoop-kafka1:9092,hadoop-kafka2:9092
users.sinks.usersSink.kafka.topic = users
users.sinks.usersSink.channel = usersChannel
EOF
$FLUME_HOME/bin/flume-ng agent -n users -c $FLUME_HOME/conf -f $FLUME_HOME/conf/users.conf -Dflume.root.logger=INFO,console
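Both regexes in users.conf can be sanity-checked locally with grep -E before deploying. A small check (the filename pattern below escapes the dot in `.csv`, since includePattern is a regex and an unescaped `.` matches any character; the sample data row is made up for illustration):

```shell
# Filename pattern from includePattern (dot escaped)
name_pat='^users_[0-9]{4}-[0-9]{2}-[0-9]{2}\.csv$'
echo "users_2020-06-05.csv" | grep -Eq "$name_pat" && echo "filename matches"

# Header pattern from the head_filter interceptor: matching lines are dropped
header_pat='^user_id,locale,birthyear,gender,joinedAt,location,timezone$'
printf '%s\n' \
  'user_id,locale,birthyear,gender,joinedAt,location,timezone' \
  '123,en_US,1990,male,2012-10-02,US,-5' \
  | grep -Ev "$header_pat"
```

Only the data row survives the filter, which is exactly what the excludeEvents = true setting does to the CSV header inside Flume.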
5. Upload the users.sh file to the hadoop-flume:/root/ directory
In the hadoop-flume container, create the path /events/input/intra/users (this is the spoolDir that users.sh configures above) and open up permissions under /events:
mkdir -p /events/input/intra/users
chmod -R 777 /events
Then run the users.sh script:
bash /root/users.sh
6. Finally, upload the local users.csv file to hadoop-flume:/events/input/intra/users/users_2020-06-05.csv (the filename must match the includePattern above)
You can then see the files generated automatically by the run in the hadoop-flume container
In a hadoop-kafka container, check the topic offsets:
bin/kafka-run-class.sh kafka.tools.GetOffsetShell --broker-list hadoop-kafka1:9092 --topic users --time -1
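GetOffsetShell prints one `topic:partition:offset` line per partition, and with `--time -1` the offsets are the latest ones, so their sum is the total number of records delivered to the topic. A sketch over sample output (the per-partition counts here are made up for illustration):

```shell
# Sample GetOffsetShell output: topic:partition:latest-offset, one line per partition
sample='users:0:19482
users:1:18518'

# Sum the third colon-separated field to get the total record count for the topic
echo "$sample" | awk -F: '{ sum += $3 } END { print sum }'
```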
Done~