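I want to process files with Flink streaming in which every two lines belong together: the first line is the headline and the second line is the corresponding text.
The files are located on the local file system. I am using the readFile(fileInputFormat, path, watchType, interval, pathFilter, typeInfo) method together with a custom FileInputFormat.
My streaming job class looks like this: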
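To make the intended record layout concrete, here is a minimal sketch in plain Java, without any Flink dependency, of how two consecutive lines could be combined into one record. The names HeadlinePairReader, Article, nextRecord and readAll are hypothetical and only serve as illustration. In a custom FileInputFormat, the same pairing logic would presumably sit in its nextRecord method, with reachedEnd reporting when no complete pair is left.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Flink: pairs consecutive lines into records.
final class HeadlinePairReader {

    // Hypothetical record type: one headline and the text line that follows it.
    static final class Article {
        final String headline;
        final String text;

        Article(String headline, String text) {
            this.headline = headline;
            this.text = text;
        }

        @Override
        public String toString() {
            return headline + " -> " + text;
        }
    }

    // Reads the next two lines as one record.
    // Returns null at end of input, or when a headline has no text line after it.
    static Article nextRecord(BufferedReader reader) throws IOException {
        String headline = reader.readLine();
        if (headline == null) {
            return null; // no more input
        }
        String text = reader.readLine();
        if (text == null) {
            return null; // dangling headline without text: treated as incomplete
        }
        return new Article(headline, text);
    }

    // Collects all complete headline/text pairs from the given source.
    static List<Article> readAll(Reader source) {
        List<Article> records = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(source)) {
            Article record;
            while ((record = nextRecord(reader)) != null) {
                records.add(record);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return records;
    }
}
```

Calling HeadlinePairReader.readAll(new java.io.StringReader(...)) on a file's content returns one Article per complete headline/text pair. Dropping a trailing headline that has no text line is a design choice of this sketch, not something the requirements above prescribe.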
final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
DataStream inputStream = env.readFile(new ReadInputFormatTest("path/to/monitored/folder"), "path/to/monitored/folder", FileProcessingMode.PROCESS_CONTINUOUSLY, 100);
inputStream.print();
env.execute("Flink Streaming Java API Skeleton");
My ReadInputFormatTest looks like this:
public class ReadInputFormatTest extends FileInputFormat {
private transient FileSystem fileSystem;
private transient BufferedReader reader;
private final String inputPath;
private String headerLine;
private