Source data files: https://pan.baidu.com/s/1UiM8qmYY8MFKJaSLwIlPqQ
Extraction code: apk6
1. Create a jobkb09 directory under Flume's conf directory: mkdir /opt/flume160/conf/jobkb09
2. Go into jobkb09, create a tmp directory inside it, and put all the source data files there.
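The two directory steps above can be sketched as shell commands. The paths follow this walkthrough (/opt/flume160 is its Flume install dir); the FLUME_HOME fallback below is only so the snippet runs anywhere, and the cp path is a placeholder for wherever you unpacked the source data:

```shell
# Sketch of steps 1-2; FLUME_HOME defaults to a local dir so this runs anywhere.
BASE="${FLUME_HOME:-$PWD/flume160}"
mkdir -p "$BASE/conf/jobkb09/tmp"    # -p creates missing parent dirs
# cp /path/to/source-data/*.csv "$BASE/conf/jobkb09/tmp/"   # copy the data files in
ls -d "$BASE/conf/jobkb09/tmp"
```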
3. Create the Kafka topics:
events:
kafka-topics.sh --create --zookeeper 192.168.134.104:2181 --topic events --partitions 1 --replication-factor 1
event_attendees:
kafka-topics.sh --create --zookeeper 192.168.134.104:2181 --topic event_attendees --partitions 1 --replication-factor 1
train:
kafka-topics.sh --create --zookeeper 192.168.134.104:2181 --topic train --partitions 1 --replication-factor 1
user_friends:
kafka-topics.sh --create --zookeeper 192.168.134.104:2181 --topic user_friends --partitions 1 --replication-factor 1
users:
kafka-topics.sh --create --zookeeper 192.168.134.104:2181 --topic users --partitions 1 --replication-factor 1
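After creation you can sanity-check the topics with the same CLI (same ZooKeeper address as above; this needs the running cluster, so it is shown for illustration only):

```
kafka-topics.sh --list --zookeeper 192.168.134.104:2181
kafka-topics.sh --describe --zookeeper 192.168.134.104:2181 --topic train
```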
4. Write an agent configuration file for each requirement:
train-flume-kafka.conf:
train.sources=trainSource
train.channels=trainChannel
train.sinks=trainSink
train.sources.trainSource.type=spooldir
train.sources.trainSource.spoolDir=/opt/flume160/conf/jobkb09/dataSourceFile/train
train.sources.trainSource.deserializer=LINE
train.sources.trainSource.deserializer.maxLineLength=320000
train.sources.trainSource.includePattern=train_[0-9]{4}-[0-9]{2}-[0-9]{2}.csv
train.sources.trainSource.interceptors=head_filter
train.sources.trainSource.interceptors.head_filter.type=regex_filter
train.sources.trainSource.interceptors.head_filter.regex=^user*
train.sources.trainSource.interceptors.head_filter.excludeEvents=true
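The head_filter interceptor above drops the CSV header row: with excludeEvents=true, any event matching the regex is discarded. (Strictly, `^user*` means `use` followed by zero or more `r`, but it still matches the header line, which starts with "user".) The effect can be checked with grep as an approximation of the Java regex Flume uses:

```shell
# grep -v drops matching lines, mirroring excludeEvents=true.
printf 'user,event,invited\n123,456,0\n' | grep -vE '^user*'
# → 123,456,0   (header line removed, data line kept)
```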
train.channels.trainChannel.type=