Flume Re-reads a File After the File Is Renamed

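Problem description:
When a regular expression is used to match the names of the monitored files, renaming a file causes Flume to read its data again.
Scenario:
In production, applications that log through log4j can have their log file renamed by the framework, which causes Flume to read the same data twice.
Steps to reproduce: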

  1. Configuration (test.conf)
# Name the components on this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1

# Describe/configure the source
a1.sources.r1.type = TAILDIR
a1.sources.r1.filegroups = f1
a1.sources.r1.filegroups.f1 = /opt/module/data/flume.*
a1.sources.r1.positionFile = /opt/module/flume/taildir/taildir_flume.json

# Describe the sink
a1.sinks.k1.type = logger

# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1

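  2. Start the agent
bin/flume-ng agent -n a1 -c conf -f conf/test.conf -Dflume.root.logger=INFO,console
  3. Test
    3.1 Create a file whose name starts with "flume." in the /opt/module/data directory
    3.2 Write data to it
    3.3 Rename the file
cd /opt/module/data
touch flume.log
echo aaa >> flume.log
echo bbb >> flume.log
mv flume.log flume.log123
  4. Check the console: the contents of the file are printed twice.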
2020-06-24 15:37:34,682 (SinkRunner-PollingRunner-DefaultSinkProcessor) [INFO - org.apache.flume.sink.LoggerSink.process(LoggerSink.java:95)] Event: { headers:{} body: 61 61 61                                        aaa }
2020-06-24 15:38:06,712 (SinkRunner-PollingRunner-DefaultSinkProcessor) [INFO - org.apache.flume.sink.LoggerSink.process(LoggerSink.java:95)] Event: { headers:{} body: 62 62 62                                        bbb }
2020-06-24 15:38:36,719 (PollableSourceRunner-TaildirSource-r1) [INFO - org.apache.flume.source.taildir.ReliableTaildirEventReader.openFile(ReliableTaildirEventReader.java:290)] Opening file: /opt/module/data/flume.log123, inode: 405574, pos: 0
2020-06-24 15:38:36,719 (SinkRunner-PollingRunner-DefaultSinkProcessor) [INFO - org.apache.flume.sink.LoggerSink.process(LoggerSink.java:95)] Event: { headers:{} body: 61 61 61                                        aaa }
2020-06-24 15:38:36,720 (SinkRunner-PollingRunner-DefaultSinkProcessor) [INFO - org.apache.flume.sink.LoggerSink.process(LoggerSink.java:95)] Event: { headers:{} body: 62 62 62                                        bbb }

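Solutions

When deciding whether a file is new, Flume records two values:
inode: the file's unique ID on Linux (unique within one file system)
file: the file path
Renaming a file changes its path but not its inode, so the renamed file no longer matches the recorded path and is opened again from position 0, as the "Opening file" log line above shows.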
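To make the inode/path distinction concrete, here is a minimal sketch that creates a file, renames it, and compares the inode before and after. It assumes a POSIX file system where Java exposes the "unix:ino" attribute; the class name InodeRenameDemo and the temporary directory are illustrative only and are not part of Flume.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical demo class, not part of Flume.
final class InodeRenameDemo {

    // Reads the inode number through the "unix:ino" attribute (POSIX file systems only).
    static long inodeOf(Path file) {
        try {
            return ((Number) Files.getAttribute(file, "unix:ino")).longValue();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Creates flume.log in a temporary directory, renames it to flume.log123,
    // and reports whether the inode stayed the same across the rename.
    static boolean inodeSurvivesRename() {
        try {
            Path dir = Files.createTempDirectory("taildir-demo");
            Path before = Files.write(dir.resolve("flume.log"),
                    "aaa\nbbb\n".getBytes(StandardCharsets.UTF_8));
            long inodeBefore = inodeOf(before);
            Path after = Files.move(before, dir.resolve("flume.log123"));
            long inodeAfter = inodeOf(after);
            // Clean up the temporary file and directory.
            Files.delete(after);
            Files.delete(dir);
            return inodeBefore == inodeAfter;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("inode unchanged after rename: " + inodeSurvivesRename());
    }
}
```

A check that requires both the inode and the path to match will therefore see a renamed file as a different file, even though the data on disk is the same.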

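Solution 1: Coordinate with the application team

Do not rename the log files.

Talk to the company's backend developers;
ask them to use a logging framework that does not rename its log files, such as logback, instead of log4j, which does.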

Solution 2: Modify the source code

Download the Flume source package and open the flume-taildir-source project.
Change 1: TailFile.java

    public boolean updatePos(String path, long inode, long pos) throws IOException {
        // Original condition, which also required the path to match:
        //if (this.inode == inode && this.path.equals(path)) {
        // Patched condition: the inode alone identifies the file.
        if (this.inode == inode) {
            setPos(pos);
            updateFilePos(pos);
            logger.info("Updated position, file: " + path + ", inode: " + inode + ", pos: " + pos);
            return true;
        }
        return false;
    }

Remove the second half of the && expression (the path comparison) from the if condition.
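The effect of this change can be sketched with a small stand-in class. It is an illustration only: PositionRestoreSketch and its two methods are hypothetical and only model the matching rule, not the real TailFile. The inode and paths are taken from the reproduction above, and the recorded position of 8 assumes the two lines "aaa" and "bbb" plus their newlines.

```java
// Hypothetical stand-in for a tailed file; it only models the matching rule.
final class PositionRestoreSketch {
    final long inode;
    final String path;
    long pos = 0;

    PositionRestoreSketch(long inode, String path) {
        this.inode = inode;
        this.path = path;
    }

    // Original rule: the recorded inode and the recorded path must both match.
    boolean updatePosStrict(String recordedPath, long recordedInode, long recordedPos) {
        if (inode == recordedInode && path.equals(recordedPath)) {
            pos = recordedPos;
            return true;
        }
        return false;
    }

    // Patched rule: the recorded inode alone decides.
    boolean updatePosByInode(long recordedInode, long recordedPos) {
        if (inode == recordedInode) {
            pos = recordedPos;
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        // The file on disk after the rename; the recorded entry still has the old path.
        PositionRestoreSketch file =
                new PositionRestoreSketch(405574L, "/opt/module/data/flume.log123");
        boolean strict = file.updatePosStrict("/opt/module/data/flume.log", 405574L, 8L);
        System.out.println("strict rule restored position: " + strict + ", pos=" + file.pos);
        boolean byInode = file.updatePosByInode(405574L, 8L);
        System.out.println("inode-only rule restored position: " + byInode + ", pos=" + file.pos);
    }
}
```

Under the strict rule the recorded position is rejected for the renamed file, so reading would start from 0 again; with the inode-only rule the position carries over.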
Change 2: ReliableTaildirEventReader.java

    public List<Long> updateTailFiles(boolean skipToEnd) throws IOException {
        updateTime = System.currentTimeMillis();
        List<Long> updatedInodes = Lists.newArrayList();

        for (TaildirMatcher taildir : taildirCache) {
            Map<String, String> headers = headerTable.row(taildir.getFileGroup());

            for (File f : taildir.getMatchingFiles()) {
                long inode;
                try {
                    inode = getInode(f);
                } catch (NoSuchFileException e) {
                    logger.info("File has been deleted in the meantime: " + e.getMessage());
                    continue;
                }
                TailFile tf = tailFiles.get(inode);
                // Original condition, which also treated a changed path as a new file:
                //if (tf == null || !tf.getPath().equals(f.getAbsolutePath())) {
                if (tf == null) {
                    long startPos = skipToEnd ? f.length() : 0;
                    tf = openFile(f, headers, inode, startPos);
                } else {
                    boolean updated = tf.getLastUpdated() < f.lastModified() || tf.getPos() != f.length();
                    if (updated) {
                        if (tf.getRaf() == null) {
                            tf = openFile(f, headers, inode, tf.getPos());
                        }
                        if (f.length() < tf.getPos()) {
                            logger.info("Pos " + tf.getPos() + " is larger than file size! "
                                    + "Restarting from pos 0, file: " + tf.getPath() + ", inode: " + inode);
                            tf.updatePos(tf.getPath(), inode, 0);
                        }
                    }
                    tf.setNeedTail(updated);
                }
                tailFiles.put(inode, tf);
                updatedInodes.add(inode);
            }
        }
        return updatedInodes;
    }

The check for a new file no longer includes the condition on whether the path is the same.
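The difference between the two conditions can be illustrated with a simplified model of the tracking map. This is a sketch under stated assumptions, not Flume's implementation: TailTrackerSketch, its poll method, and the use of line counts instead of byte offsets are all hypothetical simplifications.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified model of the new-file decision; not Flume's real classes.
final class TailTrackerSketch {

    // What the tracker remembers about one inode.
    private static final class Tracked {
        final String path;
        final int linesRead;

        Tracked(String path, int linesRead) {
            this.path = path;
            this.linesRead = linesRead;
        }
    }

    private final Map<Long, Tracked> tailFiles = new HashMap<>();
    private final boolean comparePath;

    // comparePath = true models the original condition, false models the patched one.
    TailTrackerSketch(boolean comparePath) {
        this.comparePath = comparePath;
    }

    // Returns the lines that one polling pass would emit for the given file.
    List<String> poll(long inode, String path, List<String> lines) {
        Tracked tf = tailFiles.get(inode);
        boolean isNew = tf == null || (comparePath && !tf.path.equals(path));
        int start = isNew ? 0 : Math.min(tf.linesRead, lines.size());
        List<String> emitted = new ArrayList<>(lines.subList(start, lines.size()));
        tailFiles.put(inode, new Tracked(path, lines.size()));
        return emitted;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("aaa", "bbb");
        for (boolean comparePath : new boolean[] {true, false}) {
            TailTrackerSketch tracker = new TailTrackerSketch(comparePath);
            tracker.poll(405574L, "/opt/module/data/flume.log", lines);
            List<String> afterRename =
                    tracker.poll(405574L, "/opt/module/data/flume.log123", lines);
            System.out.println("comparePath=" + comparePath
                    + ", emitted after rename: " + afterRename);
        }
    }
}
```

With the path comparison enabled, the pass after the rename emits both lines again, which corresponds to the duplicated events in the console log above; without it, the renamed file is recognized by its inode and nothing is emitted twice.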

Build the module into a jar and use it to replace flume-taildir-source-1.9.0.jar in the flume/lib directory (the version number may differ).
