The official Iceberg documentation gives the following introduction:
The example program sets startSnapshotId, which the docs describe as reading incremental data starting from the specified snapshot ID. That raises a question:
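As a reference point for the question below, a minimal sketch of how such an incremental read is wired up with the FlinkSource builder from the iceberg-flink module; the table location and snapshot ID are placeholders, and this is a usage sketch rather than a runnable job (it needs a Flink runtime and the Iceberg connector on the classpath):

```java
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.table.data.RowData;
import org.apache.iceberg.flink.TableLoader;
import org.apache.iceberg.flink.source.FlinkSource;

public class IncrementalReadSketch {
  public static void main(String[] args) throws Exception {
    StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

    // Placeholder warehouse path -- point this at a real Iceberg table.
    TableLoader tableLoader = TableLoader.fromHadoopTable("hdfs://nn:8020/warehouse/db/tbl");

    DataStream<RowData> stream = FlinkSource.forRowData()
        .env(env)
        .tableLoader(tableLoader)
        .streaming(true)           // keep monitoring the table for new snapshots
        .startSnapshotId(1234L)    // placeholder: read incrementally from this snapshot
        .build();

    stream.print();
    env.execute("iceberg-incremental-read");
  }
}
```

With streaming(true), the job does not terminate after the initial scan; it keeps planning new splits as snapshots are committed, which is exactly the behavior the rest of this article traces through the source.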
How does flink-stream incrementally read an Iceberg table?
Flink itself certainly has no built-in ability to read Iceberg incrementally; that capability is provided at the connector level. In the source tree, under iceberg/flink/src/main/java/org/apache/iceberg/flink/source/, we find StreamingReaderOperator.java, which extends Flink's AbstractStreamOperator, so let's try starting our read of the source code there.
/**
* The operator that reads the {@link FlinkInputSplit splits} received from the preceding {@link
* StreamingMonitorFunction}. Contrary to the {@link StreamingMonitorFunction} which has a parallelism of 1,
* this operator can have multiple parallelism.
*
* <p>As soon as a split descriptor is received, it is put in a queue, and use {@link MailboxExecutor}
* read the actual data of the split. This architecture allows the separation of the reading thread from the one split
* processing the checkpoint barriers, thus removing any potential back-pressure.
*/
public class StreamingReaderOperator extends AbstractStreamOperator<RowData>
    implements OneInputStreamOperator<FlinkInputSplit, RowData> {
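To make the architecture in the javadoc concrete, here is a tiny self-contained sketch (plain Java, no Flink dependency; all names are illustrative, not Iceberg's actual implementation) of the same idea: the element-processing path only enqueues a split and schedules a "mail", while a single-threaded mailbox loop drains the mails and reads one split per mail. The thread that accepts splits (and, in Flink, would process checkpoint barriers) is therefore never blocked by actual reading:

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

// Illustrative stand-in for the mailbox pattern used by StreamingReaderOperator:
// processElement() only enqueues the split and schedules a mail; the actual
// reading happens later, one split per mail, inside the mailbox loop.
public class MailboxReaderSketch {
  private final Queue<String> splits = new ArrayDeque<>();
  private final Queue<Runnable> mailbox = new ArrayDeque<>(); // MailboxExecutor stand-in
  private final List<String> emitted = new ArrayList<>();

  // Called when a split descriptor arrives from the monitor; must return quickly.
  public void processElement(String split) {
    splits.add(split);
    mailbox.add(this::readOneSplit); // schedule the read instead of doing it inline
  }

  // One "mail": read exactly one split, then yield back to the mailbox loop,
  // giving checkpoint barriers a chance to be handled between splits.
  private void readOneSplit() {
    String split = splits.poll();
    if (split != null) {
      emitted.add("record-from-" + split);
    }
  }

  // Drain the mailbox, as Flink's mailbox loop would.
  public List<String> runMailboxLoop() {
    Runnable mail;
    while ((mail = mailbox.poll()) != null) {
      mail.run();
    }
    return emitted;
  }

  public static void main(String[] args) {
    MailboxReaderSketch op = new MailboxReaderSketch();
    op.processElement("split-0");
    op.processElement("split-1");
    System.out.println(op.runMailboxLoop()); // [record-from-split-0, record-from-split-1]
  }
}
```

The key design point this models: because reading is broken into one-split-sized mails on a single thread, there is no separate reader thread to back-pressure, which is exactly the property the javadoc above claims for the real operator.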