Extending Flume: Making the Taildir Source Read Files in a Directory Recursively

Download the source package

flume-ng-1.6.0-cdh5.7.0-src.tar.gz

Extract it and import it into IDEA

Locate the getMatchFiles method that we need to modify:


 /**
   * Modified Flume source so that file matching is recursive.
   * @param parentDir the top-level directory to scan
   * @param fileNamePattern regex applied to each file name
   * @return matching files, sorted by last modified time
   */
  private List<File> getMatchFiles(File parentDir, final Pattern fileNamePattern) {
    // Collect every file under parentDir (recursively), then keep only
    // the files whose name matches the pattern.
    List<File> result = Lists.newArrayList();
    for (File f : getAllFiles(parentDir)) {
      String fileName = f.getName();
      if (fileNamePattern.matcher(fileName).matches()) {
        result.add(f);
      }
    }
    Collections.sort(result, new TailFile.CompareByLastModifiedTime());

    return result;
  }


  /**
   * New method: collect all files under the given directory, recursively.
   * @param parentDir the directory to scan
   * @return all regular files found under parentDir and its subdirectories
   */
  private List<File> getAllFiles(File parentDir) {
    List<File> fileList = Lists.newArrayList();
    getAllFiles(parentDir, fileList);
    return fileList;
  }

  /**
   * New method: recursive helper that walks the directory tree.
   */
  private void getAllFiles(File parentDir, List<File> fileList) {
    File[] files = parentDir.listFiles();
    if (null != files) {
      for (File file : files) {
        if (file.isDirectory()) {
          getAllFiles(file, fileList);
        } else {
          fileList.add(file);
        }
      }
    }
  }
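As a side note, on Java 8+ the same recursive collection can be expressed with NIO's `Files.walk`, avoiding the hand-written recursion. This is only an illustrative sketch (the class name `WalkDemo` is made up for the demo, and the original patch keeps the manual recursion so it stays consistent with the rest of the Flume 1.6 codebase):

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class WalkDemo {

    // Equivalent of the recursive getAllFiles: collect every regular
    // file under parentDir, descending into subdirectories.
    static List<File> getAllFiles(File parentDir) throws IOException {
        try (Stream<Path> paths = Files.walk(parentDir.toPath())) {
            return paths.filter(Files::isRegularFile)
                        .map(Path::toFile)
                        .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        // Build a small nested tree in a temp dir, mirroring the
        // input/1/2 layout used in the test later in this post.
        Path root = Files.createTempDirectory("taildir-demo");
        Path nested = Files.createDirectories(root.resolve("1").resolve("2"));
        Files.write(root.resolve("1").resolve("test.txt"), "666".getBytes());
        Files.write(nested.resolve("test.txt"), "hello hadoop".getBytes());

        // Both files are found, including the one two levels deep.
        System.out.println(getAllFiles(root.toFile()).size());
    }
}
```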

Upload to the server and compile

Upload the modified ReliableTaildirEventReader class to the following path, replacing the original:

[hadoop@hadoop001 taildir]$ ll
total 36
-rw-rw-r-- 1 hadoop hadoop 11411 Mar 24  2016 ReliableTaildirEventReader.java
-rw-rw-r-- 1 hadoop hadoop  2418 Mar 24  2016 TaildirSourceConfigurationConstants.java
-rw-rw-r-- 1 hadoop hadoop 12027 Mar 24  2016 TaildirSource.java
-rw-rw-r-- 1 hadoop hadoop  5129 Mar 24  2016 TailFile.java
[hadoop@hadoop001 taildir]$ pwd
/home/hadoop/source/flume-ng-1.6.0-cdh5.7.0/flume-ng-sources/flume-taildir-source/src/main/java/org/apache/flume/source/taildir
[hadoop@hadoop001 taildir]$ 
[hadoop@hadoop001 flume-taildir-source]$ pwd
/home/hadoop/source/flume-ng-1.6.0-cdh5.7.0/flume-ng-sources/flume-taildir-source
[hadoop@hadoop001 flume-taildir-source]$ mvn clean package
Tests run: 16, Failures: 0, Errors: 0, Skipped: 0

[INFO] 
[INFO] --- maven-jar-plugin:2.3.1:jar (default-jar) @ flume-taildir-source ---
[INFO] Building jar: /home/hadoop/source/flume-ng-1.6.0-cdh5.7.0/flume-ng-sources/flume-taildir-source/target/flume-taildir-source-1.6.0-cdh5.7.0.jar
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 03:33 min
[INFO] Finished at: 2020-05-20T19:57:29+08:00
[INFO] Final Memory: 37M/981M
[INFO] ------------------------------------------------------------------------
[hadoop@hadoop001 flume-taildir-source]$ ll
total 4
-rw-rw-r-- 1 hadoop hadoop 1970 Mar 24  2016 pom.xml
drwxrwxr-x 4 hadoop hadoop   30 Mar 24  2016 src
drwxrwxr-x 8 hadoop hadoop  212 May 20 19:57 target
[hadoop@hadoop001 flume-taildir-source]$ cd target/
[hadoop@hadoop001 target]$ ll
total 36
drwxrwxr-x 4 hadoop hadoop    33 May 20 19:55 classes
-rw-rw-r-- 1 hadoop hadoop 31327 May 20 19:57 flume-taildir-source-1.6.0-cdh5.7.0.jar
drwxrwxr-x 4 hadoop hadoop    49 May 20 19:55 generated-sources
drwxrwxr-x 2 hadoop hadoop    28 May 20 19:57 maven-archiver
drwxrwxr-x 3 hadoop hadoop    22 May 20 19:55 maven-shared-archive-resources
drwxrwxr-x 2 hadoop hadoop  4096 May 20 19:57 surefire-reports
drwxrwxr-x 4 hadoop hadoop    33 May 20 19:56 test-classes
[hadoop@hadoop001 target]$ 

Copy the flume-taildir-source-1.6.0-cdh5.7.0.jar from the target directory into the lib directory of the Flume installation:

[hadoop@hadoop001 target]$ cp flume-taildir-source-1.6.0-cdh5.7.0.jar  ~/app/apache-flume-1.6.0-cdh5.7.0-bin/lib/

Create a conf file to test the Taildir Source

Here we sink directly to HDFS.

# example.conf: A single-node Flume configuration

# Name the components on this agent
taildir-hdfs-agent.sources = taildir-source
taildir-hdfs-agent.sinks = hdfs-sink
taildir-hdfs-agent.channels = memory-channel

# Describe/configure the source
taildir-hdfs-agent.sources.taildir-source.type = TAILDIR
taildir-hdfs-agent.sources.taildir-source.filegroups = f1
taildir-hdfs-agent.sources.taildir-source.filegroups.f1 = /home/hadoop/data/flume/taildir/input/.*.txt
taildir-hdfs-agent.sources.taildir-source.positionFile = /home/hadoop/data/flume/taildir/taildir_position/taildir_position.json

# Describe the sink
taildir-hdfs-agent.sinks.hdfs-sink.type = hdfs
taildir-hdfs-agent.sinks.hdfs-sink.hdfs.path = hdfs://hadoop001:9000/flume/taildir/%Y%m%d%H%M
taildir-hdfs-agent.sinks.hdfs-sink.hdfs.useLocalTimeStamp = true
taildir-hdfs-agent.sinks.hdfs-sink.hdfs.fileType = CompressedStream
taildir-hdfs-agent.sinks.hdfs-sink.hdfs.writeFormat = Text
taildir-hdfs-agent.sinks.hdfs-sink.hdfs.codeC = gzip
taildir-hdfs-agent.sinks.hdfs-sink.hdfs.filePrefix = leo
taildir-hdfs-agent.sinks.hdfs-sink.hdfs.rollInterval = 30
taildir-hdfs-agent.sinks.hdfs-sink.hdfs.rollSize = 100000000
taildir-hdfs-agent.sinks.hdfs-sink.hdfs.rollCount = 0

# Use a channel which buffers events in memory
taildir-hdfs-agent.channels.memory-channel.type = memory
taildir-hdfs-agent.channels.memory-channel.capacity = 1000
taildir-hdfs-agent.channels.memory-channel.transactionCapacity = 100

# Bind the source and sink to the channel
taildir-hdfs-agent.sources.taildir-source.channels = memory-channel
taildir-hdfs-agent.sinks.hdfs-sink.channel = memory-channel
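One detail worth noting: the Taildir Source splits each filegroups entry into a parent directory and a file-name regex, and after the patch that regex is applied to the name of every file found recursively. A quick sketch of how this config's `.*.txt` pattern behaves (the split logic here is simplified for illustration, not the exact Flume code):

```java
import java.util.regex.Pattern;

public class PatternDemo {
    public static void main(String[] args) {
        // From the filegroups entry
        //   /home/hadoop/data/flume/taildir/input/.*.txt
        // the trailing component becomes the file-name pattern.
        Pattern fileNamePattern = Pattern.compile(".*.txt");

        // Matching is by file name only, so with the recursive patch,
        // test.txt files in nested subdirectories are matched too.
        System.out.println(fileNamePattern.matcher("test.txt").matches());
        System.out.println(fileNamePattern.matcher("access.log").matches());
    }
}
```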

Start the Flume agent for testing

flume-ng agent \
--name taildir-hdfs-agent \
--conf $FLUME_HOME/conf \
--conf-file $FLUME_HOME/conf/taildir-hdfs-agent.conf \
-Dflume.root.logger=INFO,console

Open a second terminal window and write test data into nested subdirectories:

[hadoop@hadoop001 input]$ mkdir -p /home/hadoop/data/flume/taildir/input/1/2
[hadoop@hadoop001 input]$ echo "hello hadoop" >> /home/hadoop/data/flume/taildir/input/1/2/test.txt
[hadoop@hadoop001 input]$ echo "666" >> /home/hadoop/data/flume/taildir/input/1/test.txt

The test succeeds: files from the nested subdirectories are picked up and written to HDFS.

[hadoop@hadoop001 input]$ hdfs dfs -ls /flume/taildir/
Found 1 items
drwxr-xr-x   - hadoop supergroup          0 2020-05-20 20:14 /flume/taildir/202005202013
[hadoop@hadoop001 input]$ hdfs dfs -ls /flume/taildir/202005202013
Found 1 items
-rw-r--r--   1 hadoop supergroup         57 2020-05-20 20:14 /flume/taildir/202005202013/leo.1589976812778.gz
[hadoop@hadoop001 input]$ hdfs dfs -text /flume/taildir/202005202013/leo.1589976812778.gz
hello hadoop
666
[hadoop@hadoop001 input]$ 