A Two-Tier Flume + Kafka Log Collection Architecture

This post describes a general-purpose log collection and transport architecture based on Flume and Kafka, and records the component deployment, configuration, and troubleshooting along the way. I hope it serves as a reference for others; corrections are welcome.

Our requirements

We want to collect the logs from each client, aggregate them over the public internet at a core node, and then have the core node feed them to the data processing platform. The collector-to-platform output must support all the usual processing patterns: batch jobs reading the data from HDFS, real-time stream processing with Spark Streaming/Flink, ingestion into ES for real-time queries, and so on. The transport link must also be stable and high-throughput, as shown below.
[figure omitted]

Our architecture design

The end-to-end collection and transport pipeline looks like this:
[figure omitted]
Edge collection
On the edge agents, an Avro source receives the Syslog logs. Because the log servers and the Flume server sit in the same datacenter, and edge log volume is nowhere near that of the aggregation tier, UDP transport was chosen on the source side; if that makes you nervous you can use TCP instead, at the cost of throughput. The channel is currently a file channel, for two reasons: 1. edge log volume is modest, so a file channel backed by SSD performs acceptably; 2. a memory channel risks losing data, running a Kafka channel at every edge site is unrealistically expensive, and the official docs still advise against SpillableMemoryChannel, which leaves the file channel as the only sensible choice.

The sink is an Avro sink to the second-tier Flume on the core node. Since edge-to-core traffic crosses the public internet, each edge Flume is configured with several sinks pointing at different second-tier machines to keep transport stable; the sinks get different priorities according to network conditions, and the sink group runs in failover mode. Of course, if the core sits in a multi-carrier datacenter, so that network quality to the different core servers is uniform, putting HAProxy in front of the core tier is also a good way to achieve high availability.
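As a sketch of the HAProxy option mentioned above (the addresses and ports are illustrative assumptions taken from the example configs, not part of the actual deployment), a TCP passthrough in front of two core-tier Avro sources might look like:

```
# Hypothetical haproxy.cfg fragment: TCP load balancing in front of the
# core-tier Avro sources (addresses/ports are illustrative assumptions)
frontend flume_avro_in
    bind *:5150
    mode tcp
    default_backend flume_core

backend flume_core
    mode tcp
    balance roundrobin
    server core1 192.168.1.157:5150 check
    server core2 192.168.1.158:5150 check
```

With this in place, each edge Flume would need only a single Avro sink pointing at the HAProxy address instead of a failover sink group.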
Core aggregation
At the core aggregation tier we run a second-tier Flume to receive the logs from the edge Flumes. The source is again an Avro source, but the aggregated log volume is usually far higher than at any edge node: a file channel would struggle to keep up, and a memory channel still risks losing data if the process dies. We therefore chose a Kafka channel, which uses the Kafka cluster's replication to keep the logs intact while exploiting Kafka's high throughput (if you're not sure why Kafka's reads and writes are so fast, it's worth studying its sequential disk I/O and zero-copy techniques). Writing the data into the Kafka channel also makes it easy for Spark Streaming/Flink to consume it as a stream later, or to ship it from Kafka into ES for real-time search, so the topic is written once and read many times. The sink is an HDFS sink.
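One caveat worth flagging if other consumers (Spark Streaming, Flink, an ES shipper) are to read the channel's topic directly: by default the Kafka channel stores events on the topic as Avro-serialized Flume events, so plain Kafka consumers would see Avro bytes rather than raw log lines. In Flume 1.7 this is controlled by the channel's parseAsFlumeEvent setting; a sketch, to be verified against your Flume version's documentation:

```
# Hypothetical addition to the core config: store plain message bytes on the
# topic so non-Flume consumers can read it directly (Flume event headers are
# not preserved in this mode)
L2.channels.c1.parseAsFlumeEvent = false
```
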

Deployment & troubleshooting

The first-tier Flume configuration on the edge nodes is as follows:

# Components (these declarations are required or the agent ignores them)
L1.sources = r1
L1.channels = c1
L1.sinks = k1 k2

# Source
L1.sources.r1.type = avro
L1.sources.r1.port = 5150
L1.sources.r1.bind = 0.0.0.0
L1.sources.r1.channels = c1

# define interceptor
# L1.sources.r1.interceptors = i1
# L1.sources.r1.interceptors.i1.type = flume.LogAnalysis$Builder

# Channel
L1.channels.c1.type = file
L1.channels.c1.checkpointDir = /home/flumechk
L1.channels.c1.dataDirs = /home/flumedata

# SinkGroup
L1.sinkgroups=g1
L1.sinkgroups.g1.sinks=k1 k2
L1.sinkgroups.g1.processor.type=failover
L1.sinkgroups.g1.processor.priority.k1=10
L1.sinkgroups.g1.processor.priority.k2=20
L1.sinkgroups.g1.processor.maxpenalty=30000

# Sink1
L1.sinks.k1.channel=c1
L1.sinks.k1.type=avro
L1.sinks.k1.hostname=192.168.1.157
L1.sinks.k1.port=5150

# Sink2
L1.sinks.k2.channel=c1
L1.sinks.k2.type=avro
L1.sinks.k2.hostname=192.168.1.158
L1.sinks.k2.port=5150

The second-tier Flume configuration on the core node:

L2.sources = r1
L2.channels = c1
L2.sinks = k1

L2.sources.r1.type = avro
L2.sources.r1.port = 5150
L2.sources.r1.bind = 0.0.0.0
L2.sources.r1.channels = c1

L2.channels.c1.type = org.apache.flume.channel.kafka.KafkaChannel
L2.channels.c1.kafka.bootstrap.servers = 192.168.1.120:6667,192.168.1.121:6667,192.168.1.122:6667
L2.channels.c1.kafka.topic = test-newzj
L2.channels.c1.kafka.consumer.group.id = test-hdfs
L2.channels.c1.keep-alive = 30

L2.sinks.k1.type = hdfs
L2.sinks.k1.hdfs.path = hdfs://192.168.1.121:8020/data/test-newzj
L2.sinks.k1.hdfs.filePrefix = testlog
L2.sinks.k1.hdfs.fileType = DataStream
L2.sinks.k1.hdfs.useLocalTimeStamp = true
L2.sinks.k1.hdfs.writeFormat = Text
L2.sinks.k1.hdfs.rollCount = 0
L2.sinks.k1.hdfs.rollSize = 0
L2.sinks.k1.hdfs.rollInterval = 3600
L2.sinks.k1.channel = c1
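Since useLocalTimeStamp is already enabled above, the sink path can additionally use Flume's time escape sequences to partition output by day. A hypothetical variant of the path from the config above:

```
# Hypothetical variant of hdfs.path: daily partitions via escape sequences,
# resolved from the local timestamp because useLocalTimeStamp = true
L2.sinks.k1.hdfs.path = hdfs://192.168.1.121:8020/data/test-newzj/%Y%m%d
```
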

Startup

Start the core agent as follows (the edge agent is started the same way, with its own config file and --name L1):

bin/flume-ng agent --conf conf --conf-file conf/flume-conf.properties --name L2 -Dflume.root.logger=INFO,console

We initially picked Flume 1.6.0, which failed at startup with "Unable to deliver event. Exception follows", java.lang.IllegalStateException: value is absent. This turned out to be a known bug in Flume 1.6.0 triggered by combining a Kafka channel with an HDFS sink (see the bug report for the Flume-1.6.0 KafkaChannel-to-HDFS-sink issue). The report gives a fix: add a ZooKeeper dependency to the pom. To save the effort, we simply upgraded Flume to 1.7.0, which resolved the problem. The error looked like this:

2019-03-13 20:17:17,436 (SinkRunner-PollingRunner-DefaultSinkProcessor) [ERROR - org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:160)] Unable to deliver event. Exception follows.
java.lang.IllegalStateException: value is absent
	at com.google.common.base.Optional$Absent.get(Optional.java:263)
	at org.apache.flume.channel.kafka.KafkaChannel$KafkaTransaction.doRollback(KafkaChannel.java:387)
	at org.apache.flume.channel.BasicTransactionSemantics.rollback(BasicTransactionSemantics.java:168)
	at org.apache.flume.sink.hdfs.HDFSEventSink.process(HDFSEventSink.java:458)
	at org.apache.flume.sink.DefaultSinkProcessor.process(DefaultSinkProcessor.java:68)
	at org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:147)
	at java.lang.Thread.run(Thread.java:748)
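For reference, the pom-based workaround mentioned in the bug report would look roughly like this (the artifact coordinates are the standard ZooKeeper ones; the version shown is an assumption and should be matched to your Kafka/Hadoop stack):

```xml
<!-- Hypothetical workaround for the Flume 1.6.0 bug: add ZooKeeper to the
     build's pom.xml so the KafkaChannel rollback path can find its classes.
     The version is an assumption; match it to your cluster. -->
<dependency>
  <groupId>org.apache.zookeeper</groupId>
  <artifactId>zookeeper</artifactId>
  <version>3.4.6</version>
</dependency>
```
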

Starting the HDFS sink on Flume 1.7.0 then produced a series of errors, such as java.lang.ClassNotFoundException: org.apache.hadoop.io.SequenceFile$CompressionType and java.lang.ClassNotFoundException: org.apache.zookeeper.Watcher:

2019-03-13 20:27:36,124 (conf-file-poller-0) [ERROR - org.apache.flume.node.PollingPropertiesFileConfigurationProvider$FileWatcherRunnable.run(PollingPropertiesFileConfigurationProvider.java:146)] Failed to start agent because dependencies were not found in classpath. Error follows.
java.lang.NoClassDefFoundError: org/apache/hadoop/io/SequenceFile$CompressionType
	at org.apache.flume.sink.hdfs.HDFSEventSink.configure(HDFSEventSink.java:235)
	at org.apache.flume.conf.Configurables.configure(Configurables.java:41)
	at org.apache.flume.node.AbstractConfigurationProvider.loadSinks(AbstractConfigurationProvider.java:411)
	at org.apache.flume.node.AbstractConfigurationProvider.getConfiguration(AbstractConfigurationProvider.java:102)
	at org.apache.flume.node.PollingPropertiesFileConfigurationProvider$FileWatcherRunnable.run(PollingPropertiesFileConfigurationProvider.java:141)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.io.SequenceFile$CompressionType
	at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
	at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:349)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
	... 12 more
java.lang.NoClassDefFoundError: org/apache/zookeeper/Watcher
	at java.lang.ClassLoader.defineClass1(Native Method)
	at java.lang.ClassLoader.defineClass(ClassLoader.java:763)
	at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
	at java.net.URLClassLoader.defineClass(URLClassLoader.java:468)
	at java.net.URLClassLoader.access$100(URLClassLoader.java:74)
	at java.net.URLClassLoader$1.run(URLClassLoader.java:369)
	at java.net.URLClassLoader$1.run(URLClassLoader.java:363)
	at java.security.AccessController.doPrivileged(Native Method)
	at java.net.URLClassLoader.findClass(URLClassLoader.java:362)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
	at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:349)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
	at kafka.utils.ZkUtils$.createZkClientAndConnection(ZkUtils.scala:89)
	at kafka.utils.ZkUtils$.apply(ZkUtils.scala:71)
	at kafka.utils.ZkUtils.apply(ZkUtils.scala)
	at org.apache.flume.channel.kafka.KafkaChannel.migrateOffsets(KafkaChannel.java:308)
	at org.apache.flume.channel.kafka.KafkaChannel.start(KafkaChannel.java:136)
	at org.apache.flume.lifecycle.LifecycleSupervisor$MonitorRunnable.run(LifecycleSupervisor.java:249)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.ClassNotFoundException: org.apache.zookeeper.Watcher
	at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
	at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:349)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
	... 25 more

These are basically all ClassNotFoundException errors from missing jars. It's puzzling that Flume ships an HDFS sink that isn't usable out of the box, leaving you to add the various Hadoop-related jars by hand... We downloaded Hadoop 2.6.5 from the official site, unpacked it, and copied the relevant jars into Flume's lib directory, mainly the following:

commons-configuration-1.6.jar
hadoop-common-2.6.5.jar
hadoop-hdfs-2.6.5.jar
htrace-core-3.0.4.jar
zookeeper-3.4.6.jar

For any jar whose location in the Hadoop tree you don't know, just run find -name under the Hadoop directory. Once these jars were added everything started normally; start the core Flume and check whether log files are arriving in HDFS.
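The search-and-copy step above can be sketched as a small shell helper (the paths in the example invocation are illustrative assumptions, not the actual install locations):

```shell
# Sketch: find the jars the HDFS sink needs anywhere under a Hadoop
# distribution and copy them into Flume's lib directory.
copy_hadoop_jars() {
  hadoop_home="$1"
  flume_lib="$2"
  for jar in commons-configuration hadoop-common hadoop-hdfs htrace-core zookeeper; do
    # locate the versioned jar (e.g. hadoop-common-2.6.5.jar) and copy it
    find "$hadoop_home" -name "${jar}-*.jar" -exec cp {} "$flume_lib/" \;
  done
}

# Example invocation (illustrative paths -- adjust to your installs):
# copy_hadoop_jars /opt/hadoop-2.6.5 /opt/apache-flume-1.7.0-bin/lib
```
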
