Kafka had been running for some time when one day I found the service had suddenly stopped. Viewing the log with more logs/server.log.2020-04-20-11 showed the following:
./server.log.2020-04-20-11:java.nio.file.NoSuchFileException: /tmp/kafka-logs/__consumer_offsets-49/00000000000000000000.log.swap
./server.log.2020-04-20-11: Suppressed: java.nio.file.NoSuchFileException: /tmp/kafka-logs/__consumer_offsets-49/00000000000000000000.log.swap -> /tmp/kafka-logs/__consumer_offsets-49/00000000000000000000.log
[kduser@master logs]$ more server.log.2020-04-20-11
[2020-04-20 11:06:05,736] INFO [GroupMetadataManager brokerId=0] Removed 0 expired offsets in 0 milliseconds. (kafka.coordinator.group.GroupMetadataManager)
[2020-04-20 11:16:05,736] INFO [GroupMetadataManager brokerId=0] Group testGroup transitioned to Dead in generation 0 (kafka.coordinator.group.GroupMetadataManager)
[2020-04-20 11:16:05,744] INFO [ProducerStateManager partition=__consumer_offsets-49] Writing producer snapshot at offset 19 (kafka.log.ProducerStateManager)
[2020-04-20 11:16:05,745] INFO [Log partition=__consumer_offsets-49, dir=/tmp/kafka-logs] Rolled new log segment at offset 19 in 8 ms. (kafka.log.Log)
[2020-04-20 11:16:05,746] INFO [GroupMetadataManager brokerId=0] Removed 1 expired offsets in 10 milliseconds. (kafka.coordinator.group.GroupMetadataManager)
[2020-04-20 11:16:14,425] ERROR Failed to clean up log for __consumer_offsets-49 in dir /tmp/kafka-logs due to IOException (kafka.server.LogDirFailureChannel)
java.nio.file.NoSuchFileException: /tmp/kafka-logs/__consumer_offsets-49/00000000000000000000.log
at sun.nio.fs.UnixException.translateToIOException(UnixException.java:86)
at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)
at sun.nio.fs.UnixCopyFile.move(UnixCopyFile.java:409)
at sun.nio.fs.UnixFileSystemProvider.move(UnixFileSystemProvider.java:262)
at java.nio.file.Files.move(Files.java:1395)
at org.apache.kafka.common.utils.Utils.atomicMoveWithFallback(Utils.java:786)
at org.apache.kafka.common.record.FileRecords.renameTo(FileRecords.java:211)
at kafka.log.LogSegment.changeFileSuffixes(LogSegment.scala:488)
at kafka.log.Log.asyncDeleteSegment(Log.scala:1753)
at kafka.log.Log.$anonfun$replaceSegments$6(Log.scala:1816)
at kafka.log.Log.$anonfun$replaceSegments$6$adapted(Log.scala:1811)
at scala.collection.immutable.List.foreach(List.scala:389)
at kafka.log.Log.replaceSegments(Log.scala:1811)
at kafka.log.Cleaner.cleanSegments(LogCleaner.scala:533)
at kafka.log.Cleaner.$anonfun$doClean$6(LogCleaner.scala:465)
at kafka.log.Cleaner.$anonfun$doClean$6$adapted(LogCleaner.scala:464)
at scala.collection.immutable.List.foreach(List.scala:389)
at kafka.log.Cleaner.doClean(LogCleaner.scala:464)
at kafka.log.Cleaner.clean(LogCleaner.scala:442)
at kafka.log.LogCleaner$CleanerThread.cleanOrSleep(LogCleaner.scala:303)
at kafka.log.LogCleaner$CleanerThread.doWork(LogCleaner.scala:289)
at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:82)
Suppressed: java.nio.file.NoSuchFileException: /tmp/kafka-logs/__consumer_offsets-49/00000000000000000000.log -> /tmp/kafka-logs/__consumer_offsets-49/00000000000000000000.log.deleted
at sun.nio.fs.UnixException.translateToIOException(UnixException.java:86)
at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
at sun.nio.fs.UnixCopyFile.move(UnixCopyFile.java:396)
at sun.nio.fs.UnixFileSystemProvider.move(UnixFileSystemProvider.java:262)
at java.nio.file.Files.move(Files.java:1395)
at org.apache.kafka.common.utils.Utils.atomicMoveWithFallback(Utils.java:783)
... 16 more
The error shows that a missing file caused the failure. My guess was that the files under /tmp had been automatically cleaned up and deleted by the system, but that was not yet confirmed.
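If that hypothesis is right, the mechanism would be an age-based sweep of /tmp, which on many distributions is performed on a schedule by systemd-tmpfiles or tmpwatch. A minimal sketch of that mechanism, with an invented segment filename and a made-up 10-day threshold:

```shell
# Demonstration of age-based cleanup, the same kind of sweep that
# systemd-tmpfiles/tmpwatch run against /tmp (all paths here are invented).
demo=$(mktemp -d)                                        # stand-in for /tmp/kafka-logs
touch -d "30 days ago" "$demo/00000000000000000000.log"  # simulate an old Kafka segment
find "$demo" -type f -mtime +10 -delete                  # remove files untouched for >10 days
ls -A "$demo"                                            # prints nothing: the "segment" is gone
```

A broker that still holds an open handle or an internal reference to such a segment will hit NoSuchFileException the next time it tries to rename or delete it, which matches the LogCleaner stack trace above.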
I found similar reports online, where the suggested fix was to delete everything under /tmp/kafka-logs and restart the Kafka service. Before restarting, I also changed the configured Kafka data directory, then restarted. After the restart, the new directory contained the files that had previously been deleted from /tmp/kafka-logs, and Kafka ran normally. The problem seemed to have solved itself inexplicably.
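The configuration change amounts to pointing log.dirs in server.properties at a directory outside /tmp, so the OS cleaner can no longer touch the segment files. The path below is only an example:

```properties
# config/server.properties
# Move the data directory out of /tmp (any persistent location works):
log.dirs=/data/kafka-logs
```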
Open questions
1. Why did the files go missing in the first place?
Possibly because the system periodically cleans /tmp; not confirmed.
2. Why does a missing local file make the broker fail?
3. Why did the deleted files reappear after the restart?
Probably pulled from ZooKeeper. Note, though, that ZooKeeper stores only topic and partition metadata, so the broker would recreate the partition directories and fresh, empty segment files; the original message data is not actually restored.
4. Why does everything run normally after the restart?
If you know the answers, please leave a comment below. Thanks!