Background
Recently I have been using the filesystem connector to write to HDFS; under the hood it is implemented on top of StreamingFileSink. While reading the official documentation I noticed several important notes, the first of which reads:
When using Hadoop < 2.7, please use the OnCheckpointRollingPolicy which rolls part files on every checkpoint. The reason is that if part files “traverse” the checkpoint interval, then, upon recovery from a failure the StreamingFileSink may use the truncate() method of the filesystem to discard uncommitted data from the in-progress file. This method is not supported by pre-2.7 Hadoop versions and Flink will throw an exception.
In other words: when using a Hadoop version below 2.7, use the OnCheckpointRollingPolicy to roll part files. The reason is that a part file may span multiple checkpoints; when recovering from a failure, StreamingFileSink uses the filesystem's truncate() method to discard the uncommitted portion of the in-progress file, and truncate() is only supported by Hadoop 2.7+.
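To make that recovery step concrete, here is a minimal, self-contained sketch using plain java.nio rather than Flink's or HDFS's API (the file name and contents are made up for illustration): the length that was durable at checkpoint time is restored by truncating the in-progress file back to that offset, which discards everything written after the checkpoint. This is exactly the operation that pre-2.7 HDFS cannot perform.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class TruncateRecoverySketch {
    public static void main(String[] args) throws IOException {
        Path inProgress = Files.createTempFile("part-0-0-inprogress", null);

        // Bytes that were already durable when the checkpoint completed.
        Files.write(inProgress, "committed\n".getBytes(StandardCharsets.UTF_8));
        long checkpointedLength = Files.size(inProgress);

        // Bytes written after the last checkpoint; after a failure these are uncommitted.
        Files.write(inProgress, "uncommitted\n".getBytes(StandardCharsets.UTF_8),
                StandardOpenOption.APPEND);

        // Recovery: cut the file back to the checkpointed length.
        // On HDFS < 2.7 there is no truncate(), which is why Flink throws instead.
        try (FileChannel ch = FileChannel.open(inProgress, StandardOpenOption.WRITE)) {
            ch.truncate(checkpointedLength);
        }

        System.out.println(Files.size(inProgress) == checkpointedLength);
        System.out.print(new String(Files.readAllBytes(inProgress), StandardCharsets.UTF_8));
    }
}
```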
In exactly which scenarios does a pre-2.7 Hadoop version break? I ran some experiments to find out.
Verification
SQL job
I tested by building the flink-hadoop-shaded jar against different Hadoop versions; how exactly to build it deserves a separate post, which I'll write when I have time.
Testing showed that the same SQL job, running against both Hadoop 2.6 and 2.7, recovered from checkpoints normally in both cases.
That was a bit odd: isn't this exactly the scenario the official docs warn about? Why is a SQL job unaffected? The reason is explained further below.
Streaming job
I wrote a demo job; the code is as follows:
import java.util.Properties;

import org.apache.flink.api.common.serialization.SimpleStringEncoder;
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.core.fs.Path;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.CheckpointConfig;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.sink.filesystem.StreamingFileSink;
import org.apache.flink.streaming.api.functions.sink.filesystem.rollingpolicies.DefaultRollingPolicy;
import org.apache.flink.streaming.api.functions.sink.filesystem.rollingpolicies.OnCheckpointRollingPolicy;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer;

public static void main(String[] args) throws Exception {
    StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
    env.setParallelism(1);
    env.enableCheckpointing(60000);
    env.getCheckpointConfig().enableExternalizedCheckpoints(
            CheckpointConfig.ExternalizedCheckpointCleanup.RETAIN_ON_CANCELLATION);

    Properties properties = new Properties();
    properties.setProperty("bootstrap.servers", "xxx:9092");
    properties.setProperty("group.id", "test");
    DataStream<String> src = env
            .addSource(new FlinkKafkaConsumer<>("topic", new SimpleStringSchema(), properties));

    // DefaultRollingPolicy variant
    src.addSink(StreamingFileSink
            .forRowFormat(
                    new Path("hdfs://xxx/zs_test"),
                    new SimpleStringEncoder<String>("UTF-8"))
            .withRollingPolicy(DefaultRollingPolicy.builder().build()).build());

    /* OnCheckpointRollingPolicy variant
    src.addSink(StreamingFileSink
            .forRowFormat(
                    new Path("hdfs://xxx/zs_test"),
                    new SimpleStringEncoder<String>("UTF-8"))
            .withRollingPolicy(OnCheckpointRollingPolicy.build()).build());
    */
    env.execute("sink to hdfs");
}
The rolling policy decides when a part file is promoted from a temporary to a final file (in-progress → finished); there are two built-in implementations, DefaultRollingPolicy and OnCheckpointRollingPolicy.
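The difference between the two policies can be sketched with a toy model (my own simplification, not Flink's actual interfaces, and it only models the size condition, not DefaultRollingPolicy's rollover/inactivity timers): the Default policy does not roll on checkpoints, so an in-progress file can span several of them and recovery needs truncate(); the OnCheckpoint policy rolls at every checkpoint, so no in-progress file ever survives one.

```java
// Toy model of the two rolling policies; not Flink's real classes.
public class RollingPolicySketch {
    interface Policy {
        boolean shouldRollOnEvent(long partSizeBytes);
        boolean shouldRollOnCheckpoint();
    }

    // Mirrors DefaultRollingPolicy's size condition (128 MB default):
    // rolls when the part file is big enough, but NOT on checkpoints,
    // so a part file may span several checkpoints.
    static final class DefaultLike implements Policy {
        static final long MAX_PART_SIZE = 128L * 1024 * 1024;
        public boolean shouldRollOnEvent(long partSizeBytes) {
            return partSizeBytes >= MAX_PART_SIZE;
        }
        public boolean shouldRollOnCheckpoint() {
            return false; // in-progress file survives -> truncate() needed on recovery
        }
    }

    // Mirrors OnCheckpointRollingPolicy: every checkpoint closes the part file,
    // so recovery never needs truncate() -- safe on Hadoop < 2.7.
    static final class OnCheckpointLike implements Policy {
        public boolean shouldRollOnEvent(long partSizeBytes) { return false; }
        public boolean shouldRollOnCheckpoint() { return true; }
    }

    public static void main(String[] args) {
        Policy def = new DefaultLike();
        Policy onCp = new OnCheckpointLike();
        System.out.println(def.shouldRollOnCheckpoint());
        System.out.println(onCp.shouldRollOnCheckpoint());
    }
}
```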
StreamingFileSink also supports two output formats: RowFormat and BulkFormat.
I first tested the RowFormat under both policies against both Hadoop versions. *The result: with OnCheckpointRollingPolicy both 2.6 and 2.7 recover normally; with DefaultRollingPolicy 2.7 recovers, but 2.6 does not.* The error is as follows:
2020-10-22 16:59:11
java.io.IOException: Problem while truncating file: hdfs://xxxx/zs_test/2020-10-22–16/.part-2-5.inprogress.2848fb32-b428-45ab-8b85-f44f41f56e5d
at org.apache.flink.runtime.fs.hdfs.HadoopRecoverableFsDataOutputStream.safelyTruncateFile(HadoopRecoverableFsDataOutputStream.java:167)
at org.apache.flink.runtime.fs.hdfs.HadoopRecoverableFsDataOutputStream.<init>(HadoopRecoverableFsDataOutputStream.java:90)
at org.apache.flink.runtime.fs.hdfs.HadoopRecoverableWriter.recover(HadoopRecoverableWriter.java:83)
at org.apache.flink.streaming.api.functions.sink.filesystem.OutputStreamBasedPartFileWriter$OutputStreamBasedBucketWriter.resumeInProgressFileFrom(OutputStreamBasedPartFileWriter.java:91)
at org.apache.flink.streaming.api.functions.sink.filesystem.Bucket.restoreInProgressFile(Bucket.java:134)
at org.apache.flink.streaming.api.functions.sink.filesystem.Bucket.<init>(Bucket.java:121)
at org.apache.flink.streaming.api.functions.sink.filesystem.Bucket.restore(Bucket.java:379)
at org.apache.flink.streaming.api.functions.sink.filesystem.DefaultBucketFactoryImpl.restoreBucket(DefaultBucketFactoryImpl.java:63)
at org.apache.flink.streaming.api.functions.sink.filesystem.Buckets.handleRestoredBucketState(Buckets.java:176)
at org.apache.flink.streaming.api.functions.sink.filesystem.Buckets.initializeActiveBuckets(Buckets.java:164)
at org.apache.flink.streaming.api.functions.sink.filesystem.Buckets.initializeState(Buckets.java:148)
at org.apache.flink.streaming.api.functions.sink.filesystem.StreamingFileSinkHelper.<init>(StreamingFileSinkHelper.java:74)
at org.apache.flink.streaming.api.functions.sink.filesystem.StreamingFileSink.initializeState(StreamingFileSink.java:427)
at org.apache.flink.streaming.util.functions.StreamingFunctionUtils.tryRe...