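QoS Categories
- At Most Once: each record is processed at most once, so this mechanism risks data loss;
- At Least Once: each record is processed at least once, so some records may be computed more than once;
- Exactly Once: each record is processed exactly once, so results are accurate, at the cost of higher latency;
Apache Flink currently supports two of these QoS levels: At Least Once and Exactly Once.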
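The difference between the three guarantees can be illustrated with a toy Python model. This is not Flink code. The hypothetical function `run` sums a stream of numbers, takes one checkpoint (source offset plus running total), and crashes once. In the model, at-most-once drops the in-flight record; at-least-once rewinds the source but keeps the partial total, so replayed records are counted again; exactly-once rewinds the source and restores the total together.

```python
def run(records, checkpoint_at, fail_at, mode):
    """Toy model (not Flink): sum `records`, checkpoint once `checkpoint_at`
    records have been processed, and crash once just before record `fail_at`."""
    if mode not in ("at_most_once", "at_least_once", "exactly_once"):
        raise ValueError(f"unknown mode: {mode}")
    total = 0
    saved_offset, saved_total = 0, 0  # checkpoint = source offset + operator state
    crashed = False
    i = 0
    while i < len(records):
        if i == fail_at and not crashed:
            crashed = True
            if mode == "at_most_once":
                i += 1  # the in-flight record is never retried: data loss
            elif mode == "at_least_once":
                i = saved_offset  # source rewinds but state does not: duplicates
            else:
                i, total = saved_offset, saved_total  # rewind source and state together
            continue
        total += records[i]
        i += 1
        if i == checkpoint_at:
            saved_offset, saved_total = i, total


    return total


data = list(range(1, 11))  # the correct sum is 55
for mode in ("at_most_once", "at_least_once", "exactly_once"):
    print(mode, run(data, checkpoint_at=4, fail_at=7, mode=mode))
# at_most_once 47   (record 8 is lost)
# at_least_once 73  (records 5, 6, 7 are counted twice)
# exactly_once 55
```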
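Use Cases
- At Least Once: scenarios with strict real-time requirements that can tolerate some error in the results (results may be overcounted);
- Exactly Once: scenarios that require accurate results and can tolerate some latency;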
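How It Works
- At Least Once: the operator skips barrier alignment. It starts computing as soon as data arrives and sends the results downstream when done, so data belonging to checkpoint n+1 may end up in the snapshot of checkpoint n. If a failure occurs along the way, recovery may therefore compute some records twice. Because there is no alignment, latency is low.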
When the alignment is skipped, an operator keeps processing all inputs, even after some checkpoint barriers for checkpoint n arrived. That way, the operator also processes elements that belong to checkpoint n+1 before the state snapshot for checkpoint n was taken.
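The effect of skipping alignment can be reproduced with a small Python simulation. It is a simplified model under stated assumptions, not Flink's implementation. Each arrival is a hypothetical `(channel, item)` pair fed to a two-input operator that keeps a running sum, and the hypothetical marker `BARRIER` stands for the barrier of checkpoint n. With `align=True`, records that arrive on a channel after its barrier are held back until every channel has delivered its barrier. With `align=False` they are processed immediately. The snapshot is taken when the last barrier arrives, and recovery is modelled as restoring the snapshot and replaying every post-barrier record.

```python
BARRIER = "BARRIER"  # hypothetical marker for the barrier of checkpoint n


def snapshot_state(arrivals, num_channels, align):
    """Running-sum state at the moment the snapshot for checkpoint n is taken."""
    state = 0
    barrier_seen = set()
    for channel, item in arrivals:
        if item == BARRIER:
            barrier_seen.add(channel)
            if len(barrier_seen) == num_channels:
                return state  # last barrier arrived: snapshot now
        elif align and channel in barrier_seen:
            continue  # aligned: post-barrier record is buffered, not yet in state
        else:
            state += item  # pre-barrier record, or unaligned mode: processed at once
    raise ValueError("barrier did not arrive on every channel")


def recovered_total(arrivals, num_channels, align):
    """Restore the snapshot, then replay every record that followed its channel's barrier."""
    total = snapshot_state(arrivals, num_channels, align)
    barrier_seen = set()
    for channel, item in arrivals:
        if item == BARRIER:
            barrier_seen.add(channel)
        elif channel in barrier_seen:
            total += item  # sources rewind to their checkpoint-n position and re-emit
    return total


# Channel 0 emits 1, 2, BARRIER, 10; channel 1 emits 3, BARRIER, 4.
arrivals = [(0, 1), (1, 3), (0, 2), (0, BARRIER), (0, 10), (1, BARRIER), (1, 4)]
print(snapshot_state(arrivals, 2, align=True))    # 6
print(snapshot_state(arrivals, 2, align=False))   # 16: record 10 belongs to checkpoint n+1
print(recovered_total(arrivals, 2, align=True))   # 20: every record counted once
print(recovered_total(arrivals, 2, align=False))  # 30: record 10 counted twice
```

In this model the duplicate appears exactly as described above: record 10 is both inside the snapshot of checkpoint n and part of the data replayed after checkpoint n. With a single input channel (`num_channels=1`) the snapshot is taken as soon as the only barrier arrives, so both modes produce the same result.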