This walkthrough moves data from the source into Flume, sinks it into the corresponding Kafka topic, and later the project writes it on to HBase.
First, create the data-source directory and drop the source files into it. That takes care of the data-source side.
Next, configure Flume:
events.sources = eventsSource
events.channels = eventsChannel
events.sinks = eventsSink
# Use a channel which buffers events in a directory
events.channels.eventsChannel.type = file
events.channels.eventsChannel.checkpointDir = /var/flume/checkpoint/events
events.channels.eventsChannel.dataDirs = /var/flume/data/events
# Setting the source to spool directory where the file exists
events.sources.eventsSource.type = spooldir
events.sources.eventsSource.deserializer = LINE
events.sources.eventsSource.deserializer.maxLineLength = 6400
events.sources.eventsSource.spoolDir = /events/input/intra/events
events.sources.eventsSource.includePattern = events_[0-9]{4}_[0-9]{2}_[0-9]{2}\.csv
events.sources.eventsSource.channels = eventsChannel
# Define / Configure sink
events.sinks.eventsSink.type = org.apache.flume.sink.kafka.KafkaSink
events.sinks.eventsSink.batchSize = 640
events.sinks.eventsSink.brokerList = sandbox-hdp.hortonworks.com:6667
events.sinks.eventsSink.topic = events
events.sinks.eventsSink.channel = eventsChannel
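The `includePattern` above is a Java regex (the original `{4]`-style brackets were a typo, and the underscore-separated date matches the filenames created by the `install` commands later in this walkthrough). A quick bash sanity check of the corrected pattern:

```shell
# Verify the corrected includePattern against the dated filenames
# that the install step below actually produces.
pattern='events_[0-9]{4}_[0-9]{2}_[0-9]{2}\.csv'
[[ "events_2018_10_18.csv" =~ $pattern ]] && echo "matches"
[[ "events.csv" =~ $pattern ]] || echo "rejected"
```

If the pattern and the real filenames drift apart, the spooling directory source silently ignores the files, so this check is worth a few seconds.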
Once the config is written, save and quit.
(events is used as the example here; the remaining sources follow the same pattern.)
That completes the second step, Flume; the last step is to create the topics.
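With the config saved (the filename `events.conf` below is an assumption, not from the original notes), the agent can be launched with Flume's stock launcher; the `--name` argument must match the property prefix (`events`):

```shell
# Sketch of starting the agent for the events flow; adjust --conf and
# --conf-file to wherever the config actually lives on the sandbox.
flume-ng agent \
  --name events \
  --conf /etc/flume/conf \
  --conf-file events.conf \
  -Dflume.root.logger=INFO,console
```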
kafka-topics.sh --zookeeper sandbox-hdp.hortonworks.com:2181 --create --topic users --partitions 3 --replication-factor 1
kafka-topics.sh --zookeeper sandbox-hdp.hortonworks.com:2181 --create --topic user_friends_raw --partitions 3 --replication-factor 1
kafka-topics.sh --zookeeper sandbox-hdp.hortonworks.com:2181 --create --topic events --partitions 3 --replication-factor 1
kafka-topics.sh --zookeeper sandbox-hdp.hortonworks.com:2181 --create --topic event_attendees_raw --partitions 3 --replication-factor 1
kafka-topics.sh --zookeeper sandbox-hdp.hortonworks.com:2181 --create --topic test --partitions 3 --replication-factor 1
kafka-topics.sh --zookeeper sandbox-hdp.hortonworks.com:2181 --create --topic train --partitions 3 --replication-factor 1
kafka-topics.sh --zookeeper sandbox-hdp.hortonworks.com:2181 --create --topic user_friends --partitions 3 --replication-factor 1
kafka-topics.sh --zookeeper sandbox-hdp.hortonworks.com:2181 --create --topic event_attendees --partitions 3 --replication-factor 1
kafka-topics.sh --zookeeper sandbox-hdp.hortonworks.com:2181 --list
mkdir -p /var/flume/checkpoint/users
mkdir -p /var/flume/checkpoint/user_friends_raw
mkdir -p /var/flume/checkpoint/user_friends
mkdir -p /var/flume/checkpoint/train
mkdir -p /var/flume/checkpoint/test
mkdir -p /var/flume/checkpoint/events
mkdir -p /var/flume/checkpoint/event_attendees_raw
mkdir -p /var/flume/checkpoint/event_attendees
chmod -R 777 /var/flume/checkpoint/
ll /var/flume/checkpoint/
mkdir -p /var/flume/data/users
mkdir -p /var/flume/data/user_friends_raw
mkdir -p /var/flume/data/user_friends
mkdir -p /var/flume/data/train
mkdir -p /var/flume/data/test
mkdir -p /var/flume/data/events
mkdir -p /var/flume/data/event_attendees_raw
mkdir -p /var/flume/data/event_attendees
chmod -R 777 /var/flume/data/
ll /var/flume/data/
mkdir -p /events/input/intra/users/
cd /BDSP2/
ll
These commands create the topics and the checkpoint/data and spool directories that the Flume sources and channels need, open up their permissions, and cd into the directory holding the source files.
One problem came up along the way: a topic was created wrong and had to be deleted. A plain delete does not remove the topic outright; by default it only attaches a "marked for deletion" flag. Deleting it for real takes two steps: first enable topic deletion on the Kafka broker, then issue the delete:
kafka-topics.sh --zookeeper sandbox-hdp.hortonworks.com:2181 --delete --topic event_attendees_row
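The broker-side step is the setting below; with it unset (the default in older Kafka versions, such as the one shipped with the HDP sandbox), `--delete` only flags the topic. A minimal sketch of the relevant line in the broker's `server.properties` (the exact config path on the sandbox may differ, and the broker needs a restart after the change):

```properties
# Broker config: without this, --delete only marks the topic
# "marked for deletion" instead of actually removing it.
delete.topic.enable=true
```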
Now the dominoes start to fall: feed the source files into the Flume sources so they become a stream, then attach a console consumer to watch the data arrive.
install -m 777 users.csv /events/input/intra/users/users_2018_10_18.csv
install -m 777 user_friends.csv /events/input/intra/user_friends_raw/user_friends_raw_2018_10_18.csv
install -m 777 event_attendees.csv /events/input/intra/event_attendees/event_attendees_2018_10_18.csv
install -m 777 test.csv /events/input/intra/test/test_2018_10_18.csv
install -m 777 train.csv /events/input/intra/train/train_2018_10_18.csv
install -m 777 events.csv /events/input/intra/events/events_2018_10_18.csv
My understanding of `install -m`: it copies the file to the target path under the new name (unlike `mv`, the source stays put), and `-m` sets the permissions on the copy in one step. Giving everything 777 is reckless, but convenient for a demo.
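A small self-contained demo (in a temp directory, with a hypothetical one-line CSV) showing that `install` copies rather than moves, and applies the requested mode to the destination:

```shell
# install COPIES the source file (unlike mv) and sets the
# destination's mode in one step.
tmp=$(mktemp -d)
echo "id,name" > "$tmp/users.csv"
install -m 777 "$tmp/users.csv" "$tmp/users_2018_10_18.csv"
ls "$tmp"                                   # both files are present
stat -c '%a' "$tmp/users_2018_10_18.csv"    # the copy has mode 777
```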
Then attach a consumer to listen:
kafka-console-consumer.sh --bootstrap-server sandbox-hdp.hortonworks.com:6667 --topic events --from-beginning
and you will see the data streaming in.