Use Flume to import local log files into a Kafka channel, and from there store them in HDFS.

This is a revised version of the previous post: the sink has been taken out of kafka-in.conf and the source out of kafka-out.conf. A Kafka channel needs neither, since a source can write into it and a sink can read from it directly, with the Kafka topic doing the buffering in between.

Link to the previous post: https://blog.csdn.net/m0_37890482/article/details/81130840

All of the configuration files below are stored under /etc/flume-ng/conf/.

kafka-in.conf

#------ kafka-in.conf: local files into a Kafka channel ------#
#--------------Edit by cheengvho-------------#
 
# Name the agent's components
agent1.sources = file_source
agent1.channels = kafka_channel
 
#-------file_source (directory to watch)---------
agent1.sources.file_source.interceptors = i1
agent1.sources.file_source.interceptors.i1.type = regex_extractor
agent1.sources.file_source.interceptors.i1.regex = ^(?:\\n)?(\\d\\d\\d\\d-\\d\\d-\\d\\d\\s\\d\\d:\\d\\d)
agent1.sources.file_source.interceptors.i1.serializers = s1
agent1.sources.file_source.interceptors.i1.serializers.s1.type = org.apache.flume.interceptor.RegexExtractorInterceptorMillisSerializer
agent1.sources.file_source.interceptors.i1.serializers.s1.name = timestamp
agent1.sources.file_source.interceptors.i1.serializers.s1.pattern = yyyy-MM-dd HH:mm
 
agent1.sources.file_source.type = spooldir
agent1.sources.file_source.spoolDir = /flume/kafka_logs
#agent1.sources.file_source.deletePolicy = immediate

#-------------kafka_channel-------------------
agent1.channels.kafka_channel.type = org.apache.flume.channel.kafka.KafkaChannel
agent1.channels.kafka_channel.kafka.bootstrap.servers = localhost:9092
# brokerList and zookeeperConnect are the pre-Flume-1.7 names for the same
# settings; they are kept here only for compatibility with older releases
agent1.channels.kafka_channel.brokerList = localhost:9092
agent1.channels.kafka_channel.zookeeperConnect = localhost:2181
agent1.channels.kafka_channel.kafka.topic = access
agent1.channels.kafka_channel.kafka.consumer.group.id = flume-consumer
# capacity/transactionCapacity are memory-channel settings; the Kafka
# channel does not document them and should simply ignore them
agent1.channels.kafka_channel.capacity = 10000
agent1.channels.kafka_channel.transactionCapacity = 1000
 
#------- wire the source to the channel (this agent has no sink) -------
agent1.sources.file_source.channels = kafka_channel
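To smoke-test the ingest side, drop a finished file into the spool directory. Each line must start with a yyyy-MM-dd HH:mm timestamp so the regex_extractor interceptor can fill the timestamp header, which the HDFS sink later uses to expand %Y%m%d. A minimal sketch; the sample log line is invented for illustration:

# create the directory the spooldir source watches (path from kafka-in.conf)
mkdir -p /flume/kafka_logs

# write a sample line whose prefix matches the interceptor regex
echo "2018-07-20 10:15:00 GET /index.html 200" > /tmp/access.log.1

# move the finished file in; the spooldir source expects files that
# are complete and never modified after they appear
mv /tmp/access.log.1 /flume/kafka_logs/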

kafka-out.conf

#------------------kafka-out.conf-------------------#
#----------------Edit by cheengvho------------------#
#----------------- Define the data flow ------------#

agent2.channels = kafka_channel
agent2.sinks = hdfs_sink
 
#--------- hdfs_sink settings ------------------
agent2.sinks.hdfs_sink.type = hdfs
agent2.sinks.hdfs_sink.channel = kafka_channel
agent2.sinks.hdfs_sink.hdfs.path = /loudacre/kafka/%Y%m%d
agent2.sinks.hdfs_sink.hdfs.filePrefix = %Y-%m-%d
agent2.sinks.hdfs_sink.hdfs.fileSuffix = .log
agent2.sinks.hdfs_sink.hdfs.rollSize = 524288
agent2.sinks.hdfs_sink.hdfs.rollCount = 0
agent2.sinks.hdfs_sink.hdfs.rollInterval = 0
agent2.sinks.hdfs_sink.hdfs.threadsPoolSize = 30
agent2.sinks.hdfs_sink.hdfs.fileType = DataStream
agent2.sinks.hdfs_sink.hdfs.writeFormat = Text

#------- kafka_channel settings -------------------------
agent2.channels.kafka_channel.type = org.apache.flume.channel.kafka.KafkaChannel
agent2.channels.kafka_channel.kafka.bootstrap.servers = localhost:9092
# pre-Flume-1.7 property names, kept for compatibility (see kafka-in.conf)
agent2.channels.kafka_channel.brokerList = localhost:9092
agent2.channels.kafka_channel.zookeeperConnect = localhost:2181
agent2.channels.kafka_channel.kafka.topic = access
agent2.channels.kafka_channel.kafka.consumer.group.id = flume-consumer
agent2.channels.kafka_channel.capacity = 100000
agent2.channels.kafka_channel.transactionCapacity = 10000
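Once both agents are running, you can check the pipeline from both ends. A rough sketch, assuming a single-node setup where the Kafka and Hadoop client tools are on the PATH (on a vanilla Kafka tarball the consumer script is bin/kafka-console-consumer.sh):

# peek at the channel's backing topic; the Kafka channel stores
# serialized Flume events, so expect some binary framing around the text
kafka-console-consumer --bootstrap-server localhost:9092 \
    --topic access --from-beginning --max-messages 5

# confirm the HDFS sink is writing dated files
hdfs dfs -ls /loudacre/kafka/$(date +%Y%m%d)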

After that, start ZooKeeper, Kafka, the Flume agent for kafka-in.conf, and the Flume agent for kafka-out.conf, in that order, using the same procedure described in the second half of the following post.

Link: https://blog.csdn.net/m0_37890482/article/details/81126522
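For reference, the startup sequence looks roughly like this. A sketch only: the ZooKeeper/Kafka commands assume a vanilla Kafka tarball (run from the Kafka install directory), and the agent names must match the agent1/agent2 prefixes used in the two files above:

# 1. ZooKeeper, then Kafka
bin/zookeeper-server-start.sh config/zookeeper.properties &
bin/kafka-server-start.sh config/server.properties &

# 2. ingest agent: spooldir source -> Kafka channel
flume-ng agent --conf /etc/flume-ng/conf \
    --conf-file /etc/flume-ng/conf/kafka-in.conf \
    --name agent1 -Dflume.root.logger=INFO,console &

# 3. delivery agent: Kafka channel -> HDFS sink
flume-ng agent --conf /etc/flume-ng/conf \
    --conf-file /etc/flume-ng/conf/kafka-out.conf \
    --name agent2 -Dflume.root.logger=INFO,console &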
