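I've been very busy lately and haven't had time to pick a topic carefully, so let me write about a small point that everyone already knows.
Checkpoints and savepoints are the job snapshot mechanisms that Flink provides; both contain a persisted copy of the job's state. The official documentation describes checkpoints as follows: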
Checkpoints make state in Flink fault tolerant by allowing state and the corresponding stream positions to be recovered, thereby giving the application the same semantics as a failure-free execution.
And it describes savepoints like this:
A Savepoint is a consistent image of the execution state of a streaming job, created via Flink’s checkpointing mechanism. You can use Savepoints to stop-and-resume, fork, or update your Flink jobs.
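The figure below, taken from the Flink 1.1 documentation (it no longer appears in later versions), illustrates the relationship between checkpoints and savepoints.
To summarize in a few sentences:
A checkpoint is about fault tolerance: after a Flink job fails unexpectedly and restarts, it can resume directly from a previously taken checkpoint without compromising the correctness of the job's logic. A savepoint is about maintenance: when a Flink job has to be manually restarted, upgraded, migrated, or A/B tested under human intervention, its state is first written in full to reliable storage, and once the maintenance is done the job resumes from that savepoint.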
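To make the split concrete, here is a minimal configuration sketch, not taken from the original post. It assumes a Flink 1.x release where these keys live in flink-conf.yaml (the key names have changed across versions, so check the documentation for your release), and the HDFS paths are purely hypothetical.

```yaml
# Fault tolerance: Flink takes checkpoints automatically at this interval
# and restores from the latest one after an unexpected failure.
execution.checkpointing.interval: 60s
# Hypothetical location for checkpoint data.
state.checkpoints.dir: hdfs:///flink/checkpoints

# Maintenance: default target directory for savepoints, which are
# triggered manually by an operator rather than on a schedule.
# Hypothetical location.
state.savepoints.dir: hdfs:///flink/savepoints
```

With a setup like this, checkpoints need no further action, whereas a savepoint is typically taken by hand with `bin/flink savepoint <jobId> [targetDirectory]` and the job is later resumed with `bin/flink run -s <savepointPath> ...`.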
A savepoint is created "via the checkpointing mechanism", so a savepoint