Kafka logs unexpectedly cleaned up, causing the broker process to crash repeatedly

Our project uses Kafka as its message middleware, but after deployment the Kafka process crashed at irregular intervals, while ZooKeeper was completely unaffected.

1. Error log

After the most recent crash, the log file kafkaServer.out showed the following error:

[2020-06-21 17:35:04,919] ERROR Failed to clean up log for __consumer_offsets-30 in dir /tmp/kafka-logs due to IOException (kafka.server.LogDirFailureChannel)
java.nio.file.NoSuchFileException: /tmp/kafka-logs/__consumer_offsets-30/00000000000000000074.index
	at sun.nio.fs.UnixException.translateToIOException(UnixException.java:86)
	at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
	at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)
	at sun.nio.fs.UnixCopyFile.move(UnixCopyFile.java:409)
	at sun.nio.fs.UnixFileSystemProvider.move(UnixFileSystemProvider.java:262)
	at java.nio.file.Files.move(Files.java:1395)
	at org.apache.kafka.common.utils.Utils.atomicMoveWithFallback(Utils.java:815)
	at kafka.log.AbstractIndex.renameTo(AbstractIndex.scala:209)
	at kafka.log.LogSegment.changeFileSuffixes(LogSegment.scala:509)
	at kafka.log.Log.asyncDeleteSegment(Log.scala:1962)
	at kafka.log.Log.$anonfun$replaceSegments$6(Log.scala:2025)
	at kafka.log.Log.$anonfun$replaceSegments$6$adapted(Log.scala:2020)
	at scala.collection.immutable.List.foreach(List.scala:392)
	at kafka.log.Log.replaceSegments(Log.scala:2020)
	at kafka.log.Cleaner.cleanSegments(LogCleaner.scala:602)
	at kafka.log.Cleaner.$anonfun$doClean$6(LogCleaner.scala:528)
	at kafka.log.Cleaner.$anonfun$doClean$6$adapted(LogCleaner.scala:527)
	at scala.collection.immutable.List.foreach(List.scala:392)
	at kafka.log.Cleaner.doClean(LogCleaner.scala:527)
	at kafka.log.Cleaner.clean(LogCleaner.scala:501)
	at kafka.log.LogCleaner$CleanerThread.cleanLog(LogCleaner.scala:359)
	at kafka.log.LogCleaner$CleanerThread.cleanFilthiestLog(LogCleaner.scala:328)
	at kafka.log.LogCleaner$CleanerThread.doWork(LogCleaner.scala:307)
	at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:89)
	Suppressed: java.nio.file.NoSuchFileException: /tmp/kafka-logs/__consumer_offsets-30/00000000000000000074.index -> /tmp/kafka-logs/__consumer_offsets-30/00000000000000000074.index.deleted
		at sun.nio.fs.UnixException.translateToIOException(UnixException.java:86)
		at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
		at sun.nio.fs.UnixCopyFile.move(UnixCopyFile.java:396)
		at sun.nio.fs.UnixFileSystemProvider.move(UnixFileSystemProvider.java:262)
		at java.nio.file.Files.move(Files.java:1395)
		at org.apache.kafka.common.utils.Utils.atomicMoveWithFallback(Utils.java:812)
		... 17 more
[2020-06-21 17:35:04,927] INFO [ReplicaManager broker=0] Stopping serving replicas in dir /tmp/kafka-logs (kafka.server.ReplicaManager)
[2020-06-21 17:35:04,932] INFO [ReplicaFetcherManager on broker 0] Removed fetcher for partitions Set(__consumer_offsets-22, __consumer_offsets-30, __consumer_offsets-8, __consumer_offsets-21, __consumer_offsets-4, __consumer_offsets-27, __consumer_offsets-7, WeChat-0, __consumer_offsets-9, __consumer_offsets-46, __consumer_offsets-25, __consumer_offsets-35, __consumer_offsets-41, __consumer_offsets-33, __consumer_offsets-23, __consumer_offsets-49, __consumer_offsets-47, __consumer_offsets-16, __consumer_offsets-28, DingTalk-0, Storage-0, Resend-0, __consumer_offsets-31, __consumer_offsets-36, __consumer_offsets-42, __consumer_offsets-3, __consumer_offsets-18, __consumer_offsets-37, __consumer_offsets-15, __consumer_offsets-24, __consumer_offsets-38, __consumer_offsets-17, __consumer_offsets-48, Email-0, __consumer_offsets-19, Portal-0, __consumer_offsets-11, __consumer_offsets-13, __consumer_offsets-2, __consumer_offsets-43, __consumer_offsets-6, __consumer_offsets-14, SMS-0, __consumer_offsets-20, __consumer_offsets-0, __consumer_offsets-44, __consumer_offsets-39, __consumer_offsets-12, __consumer_offsets-45, __consumer_offsets-1, __consumer_offsets-5, __consumer_offsets-26, __consumer_offsets-29, __consumer_offsets-34, __consumer_offsets-10, __consumer_offsets-32, __consumer_offsets-40) (kafka.server.ReplicaFetcherManager)
[2020-06-21 17:35:04,933] INFO [ReplicaAlterLogDirsManager on broker 0] Removed fetcher for partitions Set(__consumer_offsets-22, __consumer_offsets-30, __consumer_offsets-8, __consumer_offsets-21, __consumer_offsets-4, __consumer_offsets-27, __consumer_offsets-7, WeChat-0, __consumer_offsets-9, __consumer_offsets-46, __consumer_offsets-25, __consumer_offsets-35, __consumer_offsets-41, __consumer_offsets-33, __consumer_offsets-23, __consumer_offsets-49, __consumer_offsets-47, __consumer_offsets-16, __consumer_offsets-28, DingTalk-0, Storage-0, Resend-0, __consumer_offsets-31, __consumer_offsets-36, __consumer_offsets-42, __consumer_offsets-3, __consumer_offsets-18, __consumer_offsets-37, __consumer_offsets-15, __consumer_offsets-24, __consumer_offsets-38, __consumer_offsets-17, __consumer_offsets-48, Email-0, __consumer_offsets-19, Portal-0, __consumer_offsets-11, __consumer_offsets-13, __consumer_offsets-2, __consumer_offsets-43, __consumer_offsets-6, __consumer_offsets-14, SMS-0, __consumer_offsets-20, __consumer_offsets-0, __consumer_offsets-44, __consumer_offsets-39, __consumer_offsets-12, __consumer_offsets-45, __consumer_offsets-1, __consumer_offsets-5, __consumer_offsets-26, __consumer_offsets-29, __consumer_offsets-34, __consumer_offsets-10, __consumer_offsets-32, __consumer_offsets-40) (kafka.server.ReplicaAlterLogDirsManager)
[2020-06-21 17:35:04,995] INFO [ReplicaManager broker=0] Broker 0 stopped fetcher for partitions __consumer_offsets-22,__consumer_offsets-30,__consumer_offsets-8,__consumer_offsets-21,__consumer_offsets-4,__consumer_offsets-27,__consumer_offsets-7,WeChat-0,__consumer_offsets-9,__consumer_offsets-46,__consumer_offsets-25,__consumer_offsets-35,__consumer_offsets-41,__consumer_offsets-33,__consumer_offsets-23,__consumer_offsets-49,__consumer_offsets-47,__consumer_offsets-16,__consumer_offsets-28,DingTalk-0,Storage-0,Resend-0,__consumer_offsets-31,__consumer_offsets-36,__consumer_offsets-42,__consumer_offsets-3,__consumer_offsets-18,__consumer_offsets-37,__consumer_offsets-15,__consumer_offsets-24,__consumer_offsets-38,__consumer_offsets-17,__consumer_offsets-48,Email-0,__consumer_offsets-19,Portal-0,__consumer_offsets-11,__consumer_offsets-13,__consumer_offsets-2,__consumer_offsets-43,__consumer_offsets-6,__consumer_offsets-14,SMS-0,__consumer_offsets-20,__consumer_offsets-0,__consumer_offsets-44,__consumer_offsets-39,__consumer_offsets-12,__consumer_offsets-45,__consumer_offsets-1,__consumer_offsets-5,__consumer_offsets-26,__consumer_offsets-29,__consumer_offsets-34,__consumer_offsets-10,__consumer_offsets-32,__consumer_offsets-40 and stopped moving logs for partitions  because they are in the failed log directory /tmp/kafka-logs. (kafka.server.ReplicaManager)
[2020-06-21 17:35:04,996] INFO Stopping serving logs in dir /tmp/kafka-logs (kafka.log.LogManager)
[2020-06-21 17:35:05,002] ERROR Shutdown broker because all log dirs in /tmp/kafka-logs have failed (kafka.log.LogManager)

The stack trace tells the story: the log cleaner hit a NoSuchFileException while renaming an index file, /tmp/kafka-logs was marked as a failed log directory, and because it was the broker's only log directory, the broker shut itself down (the final ERROR line). But why did the index file go missing in the first place?

2. Root cause

Linux periodically cleans up files under /tmp. The Kafka data log directory was placed at /tmp/kafka-logs, so the scheduled cleanup removed its files, and Kafka failed as soon as it tried to read or append to those logs.
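The age-based cleanup is easy to reproduce in miniature. systemd-tmpfiles removes entries under /tmp whose timestamps exceed the configured age (10 days for /tmp in the stock tmp.conf), and find(1) expresses roughly the same check (find's -mtime looks at mtime only, while the cleaner also considers atime and ctime). A minimal sketch against a scratch directory, so nothing real is touched:

```shell
#!/bin/sh
# Simulate the /tmp age check on a scratch directory.
d=$(mktemp -d)
touch "$d/fresh.index"                   # recently written segment index
touch -d '11 days ago' "$d/stale.index"  # older than the 10d threshold
# Anything last modified more than 10 days ago is a cleanup candidate:
find "$d" -type f -mtime +10 -print
```

Only stale.index is printed; a long-idle partition such as __consumer_offsets-30 falls on the wrong side of exactly this kind of threshold.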

3. Fixes

1) Option 1: move the Kafka data log directory
In the Kafka root directory, edit config/server.properties and set:

log.dirs=/opt/kafka_2.12-2.3.0/kafka-logs/
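A sketch of applying the change, assuming Kafka is unpacked at /opt/kafka_2.12-2.3.0 (the path used above). It is demonstrated against a scratch copy of server.properties so the script runs anywhere:

```shell
#!/bin/sh
# Point log.dirs at a directory that survives /tmp cleanup.
# Done here on a throwaway copy of the config file.
conf=$(mktemp)
echo 'log.dirs=/tmp/kafka-logs' > "$conf"
sed -i 's|^log.dirs=.*|log.dirs=/opt/kafka_2.12-2.3.0/kafka-logs/|' "$conf"
grep '^log.dirs' "$conf"
```

On the real broker, create the target directory first, then restart Kafka (bin/kafka-server-stop.sh, then bin/kafka-server-start.sh -daemon config/server.properties) for the new log.dirs to take effect.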

2) Option 2: exclude the Kafka log directory from the cleanup
① On CentOS 7, /tmp cleanup is handled by systemd (systemd-tmpfiles), configured under /usr/lib/tmpfiles.d. Edit tmp.conf there and add the Kafka log directory:

#  This file is part of systemd.
#
#  systemd is free software; you can redistribute it and/or modify it
#  under the terms of the GNU Lesser General Public License as published by
#  the Free Software Foundation; either version 2.1 of the License, or
#  (at your option) any later version.

# See tmpfiles.d(5) for details

# Clear tmp directories separately, to make them easier to override
v /tmp 1777 root root 10d
v /var/tmp 1777 root root 30d

# Exclude namespace mountpoints created with PrivateTmp=yes
x /tmp/systemd-private-%b-*
X /tmp/systemd-private-%b-*/tmp
x /var/tmp/systemd-private-%b-*
X /var/tmp/systemd-private-%b-*/tmp
# Prevent kafka log files from being deleted
x /tmp/kafka-logs

Note the lowercase x: per tmpfiles.d(5) it excludes the path and everything beneath it from cleanup, whereas uppercase X protects only the directory entry itself, leaving the segment and index files inside still eligible for removal.
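Alternatively, tmpfiles.d(5) also reads /etc/tmpfiles.d, which takes precedence over the vendor files under /usr/lib/tmpfiles.d, so a small drop-in avoids editing tmp.conf at all (and survives package updates). A sketch, written to a scratch directory standing in for /etc/tmpfiles.d so it runs anywhere:

```shell
#!/bin/sh
# Create a tmpfiles.d drop-in that excludes the Kafka log directory.
etc=$(mktemp -d)    # stand-in for /etc/tmpfiles.d
cat > "$etc/kafka.conf" <<'EOF'
# Prevent systemd-tmpfiles from removing /tmp/kafka-logs or its contents
x /tmp/kafka-logs
EOF
cat "$etc/kafka.conf"
```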

② On CentOS 6, /tmp is cleaned by tmpwatch instead, driven by cron; the daily job is the script /etc/cron.daily/tmpwatch. tmpwatch does not read tmpfiles.d, so the exclusion is expressed differently: add its -x/--exclude flag to the tmpwatch command line in that script.

# Prevent kafka log files from being deleted: append to the existing flags
-x /tmp/kafka-logs