Monitor the data in the log files under the /data_log directory and store it into Kafka topics.
Requirement: three topics, named ChangeRecord, ProduceRecord, and EnvironmentData, each with 4 partitions.
Analysis: the three kinds of log files must be stored separately, so we need three different Sources and Sinks.
1. Create the Kafka topics
# kafka-topics.sh creates one topic per invocation (--bootstrap-server requires Kafka 2.2+; older versions use --zookeeper master:2181 instead)
bin/kafka-topics.sh --create --bootstrap-server master:9092 --topic ChangeRecord --partitions 4 --replication-factor 2
bin/kafka-topics.sh --create --bootstrap-server master:9092 --topic ProduceRecord --partitions 4 --replication-factor 2
bin/kafka-topics.sh --create --bootstrap-server master:9092 --topic EnvironmentData --partitions 4 --replication-factor 2
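To confirm that the topics exist with the expected partition counts, they can be listed and described against the same broker (ProduceRecord shown as an example):
bin/kafka-topics.sh --list --bootstrap-server master:9092
bin/kafka-topics.sh --describe --bootstrap-server master:9092 --topic ProduceRecord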
2. Flume configuration files:
produceRecord:
# Define the source, sink, and channel names
a1.sources = s1
a1.sinks = k1
a1.channels = c1
# producerecord source configuration
a1.sources.s1.type = spooldir
a1.sources.s1.spoolDir = /data_log
# only pick up files in the directory whose names match the regex
a1.sources.s1.includePattern = ^producerecord.*$
a1.sources.s1.fileHeader = true
# interceptor: drop the header line(s)
a1.sources.s1.interceptors = i1
a1.sources.s1.interceptors.i1.type = regex_filter
a1.sources.s1.interceptors.i1.regex = \s*Produce.*
a1.sources.s1.interceptors.i1.excludeEvents = true
# Kafka sink: write the events to Kafka
a1.sinks.k1.type = org.apache.flume.sink.kafka.KafkaSink
a1.sinks.k1.topic = ProduceRecord
a1.sinks.k1.kafka.bootstrap.servers = master:9092
a1.sinks.k1.flumeBatchSize = 20
a1.sinks.k1.kafka.producer.acks = 1
a1.sinks.k1.kafka.producer.linger.ms = 1
a1.sinks.k1.kafka.producer.compression.type = snappy
a1.channels.c1.type = memory
a1.channels.c1.capacity = 10000
a1.channels.c1.transactionCapacity = 1000
a1.sinks.k1.channel = c1
a1.sources.s1.channels = c1
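A quick way to check which files this source will pick up (assuming the producerecord log files sit directly under /data_log, as the spoolDir and includePattern settings above imply):
ls /data_log | grep -E '^producerecord'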
changeRecord:
# Define the source, sink, and channel names
a1.sources = s1
a1.sinks = k1
a1.channels = c1
# changerecord source configuration
a1.sources.s1.type = spooldir
a1.sources.s1.spoolDir = /data_log
# only pick up files in the directory whose names match the regex
a1.sources.s1.includePattern = ^changerecord.*$
a1.sources.s1.fileHeader = true
# interceptor: drop the header line(s)
a1.sources.s1.interceptors = i1
a1.sources.s1.interceptors.i1.type = regex_filter
a1.sources.s1.interceptors.i1.regex = \s*Change.*
a1.sources.s1.interceptors.i1.excludeEvents = true
# Kafka sink: write the events to Kafka
a1.sinks.k1.type = org.apache.flume.sink.kafka.KafkaSink
a1.sinks.k1.topic = ChangeRecord
a1.sinks.k1.kafka.bootstrap.servers = master:9092
a1.sinks.k1.flumeBatchSize = 20
a1.sinks.k1.kafka.producer.acks = 1
a1.sinks.k1.kafka.producer.linger.ms = 1
a1.sinks.k1.kafka.producer.compression.type = snappy
a1.channels.c1.type = memory
a1.channels.c1.capacity = 10000
a1.channels.c1.transactionCapacity = 1000
a1.sinks.k1.channel = c1
a1.sources.s1.channels = c1
EnvironmentData:
# Define the source, sink, and channel names
a1.sources = s1
a1.sinks = k1
a1.channels = c1
# environmentdata source configuration
a1.sources.s1.type = spooldir
a1.sources.s1.spoolDir = /data_log
a1.sources.s1.includePattern = ^environment.*$
a1.sources.s1.fileHeader = true
# interceptor: drop the header line(s)
a1.sources.s1.interceptors = i1
a1.sources.s1.interceptors.i1.type = regex_filter
a1.sources.s1.interceptors.i1.regex = \s*PM25.*
a1.sources.s1.interceptors.i1.excludeEvents = true
# Kafka sink: write the events to Kafka
a1.sinks.k1.type = org.apache.flume.sink.kafka.KafkaSink
a1.sinks.k1.topic = EnvironmentData
a1.sinks.k1.kafka.bootstrap.servers = master:9092
a1.sinks.k1.flumeBatchSize = 20
a1.sinks.k1.kafka.producer.acks = 1
a1.sinks.k1.kafka.producer.linger.ms = 1
a1.sinks.k1.kafka.producer.compression.type = snappy
a1.channels.c1.type = memory
a1.channels.c1.capacity = 10000
a1.channels.c1.transactionCapacity = 1000
a1.sources.s1.channels = c1
a1.sinks.k1.channel = c1
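The three configurations above are written as three separate agents, one per topic. A single agent could also host all three flows; a minimal naming sketch of that layout (an assumption, not part of the original task; each flow is then configured exactly as in its single-flow config, only the component names change):
a1.sources = s1 s2 s3
a1.sinks = k1 k2 k3
a1.channels = c1 c2 c3
# s1/k1/c1 -> ProduceRecord, s2/k2/c2 -> ChangeRecord, s3/k3/c3 -> EnvironmentData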
3. Start Flume with the configuration file:
bin/flume-ng agent -n a1 -c conf -f <path-to-config-file>
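Once an agent is running, data arrival in its topic can be checked with the console consumer (ProduceRecord shown as an example):
bin/kafka-console-consumer.sh --bootstrap-server master:9092 --topic ProduceRecord --from-beginning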
Back up the data to HDFS
Requirement: back up the monitored data to the /user/test/flumebackup directory in HDFS.
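If desired, the target directory can be created ahead of time (the HDFS sink will normally also create it on demand):
hdfs dfs -mkdir -p /user/test/flumebackup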
a1.sources = s1
a1.sinks = k1
a1.channels = c1
a1.sources.s1.type = spooldir
a1.sources.s1.spoolDir = /data_log
a1.sources.s1.fileHeader = true
# also put the source file's base name into the event header so the sink can name the backup file after it
a1.sources.s1.basenameHeader = true
a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = hdfs://master:8020/user/test/flumebackup/
a1.sinks.k1.hdfs.round = true
a1.sinks.k1.hdfs.filePrefix = %{basename}.bak
a1.sinks.k1.hdfs.roundValue = 1
a1.sinks.k1.hdfs.fileSuffix = .log
a1.sinks.k1.hdfs.roundUnit = hour
a1.sinks.k1.hdfs.useLocalTimeStamp = true
a1.sinks.k1.hdfs.batchSize = 100
a1.sinks.k1.hdfs.fileType = DataStream
a1.sinks.k1.hdfs.rollInterval = 20
a1.sinks.k1.hdfs.rollSize = 13421700
a1.sinks.k1.hdfs.rollCount = 0
a1.channels.c1.type = memory
a1.channels.c1.capacity = 10000
a1.channels.c1.transactionCapacity = 1000
a1.sources.s1.channels = c1
a1.sinks.k1.channel = c1
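After this agent has been started (in the same way as in step 3) and some log files have been ingested, the backup files can be listed in HDFS:
hdfs dfs -ls /user/test/flumebackup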