Flink Official Documentation Notes 15: Fault Tolerance via State Snapshots

Fault Tolerance via State Snapshots

State Backends

The keyed state managed by Flink is a kind of sharded key/value store, and the working copy of each item of keyed state is kept somewhere local to the TaskManager responsible for that key.
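To make the "sharded key/value store" idea concrete, here is a minimal sketch in plain Python. It is only a toy: `MAX_PARALLELISM`, `key_group`, `owning_subtask`, and `shards` are hypothetical names, and `zlib.crc32` is a stand-in hash chosen for determinism, not the hash Flink uses.

```python
import zlib

MAX_PARALLELISM = 128  # number of key groups (assumed value for this sketch)


def key_group(key: str) -> int:
    """Map a key to a key group using a deterministic stand-in hash."""
    return zlib.crc32(key.encode("utf-8")) % MAX_PARALLELISM


def owning_subtask(key: str, parallelism: int) -> int:
    """Map the key's key group to one of `parallelism` parallel subtasks."""
    return key_group(key) * parallelism // MAX_PARALLELISM


# Each subtask keeps the working copy of its own shard in a local dict.
parallelism = 4
shards = [dict() for _ in range(parallelism)]


def update_count(key: str) -> None:
    """Route the update to the one shard responsible for this key."""
    shard = shards[owning_subtask(key, parallelism)]
    shard[key] = shard.get(key, 0) + 1


for user in ["alice", "bob", "alice", "carol", "alice"]:
    update_count(user)
```

Every update for a given key lands in the same shard, so no shard ever needs to look at another shard's data.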

Operator state is also local to the machine(s) that need(s) it.

Flink periodically takes persistent snapshots of all the state and copies these snapshots somewhere more durable, such as a distributed file system.

In the event of a failure, Flink can restore the complete state of your application and resume processing as though nothing had gone wrong.

This state that Flink manages is stored in a state backend.

Two implementations of state backends are available: one based on RocksDB, an embedded key/value store that keeps its working state on disk, and a heap-based state backend that keeps its working state in memory, on the Java heap.

This heap-based state backend comes in two flavors: the FsStateBackend, which persists its state snapshots to a distributed file system, and the MemoryStateBackend, which uses the JobManager's heap.


When working with state kept in a heap-based state backend, accesses and updates involve reading and writing objects on the heap.

But for objects kept in the RocksDBStateBackend, accesses and updates involve serialization and deserialization, and so are much more expensive.
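The difference between the two access paths can be sketched with two toy backends. This is an illustration only, assuming `pickle` as the serializer; `HeapBackend` and `SerializedBackend` are hypothetical classes, not Flink APIs.

```python
import pickle


class HeapBackend:
    """Keeps live objects; a read returns the stored object itself."""

    def __init__(self):
        self._objects = {}

    def get(self, key):
        return self._objects.get(key)

    def put(self, key, value):
        self._objects[key] = value


class SerializedBackend:
    """Keeps bytes only; every read and write pays for (de)serialization."""

    def __init__(self):
        self._bytes = {}

    def get(self, key):
        raw = self._bytes.get(key)
        return None if raw is None else pickle.loads(raw)

    def put(self, key, value):
        self._bytes[key] = pickle.dumps(value)


heap, serialized = HeapBackend(), SerializedBackend()
heap.put("k", [1, 2])
serialized.put("k", [1, 2])

heap.get("k").append(3)        # mutates the stored object in place
serialized.get("k").append(3)  # mutates a throwaway deserialized copy

heap_value = heap.get("k")              # [1, 2, 3]
serialized_value = serialized.get("k")  # [1, 2]
```

In the serialized backend each `get` builds a fresh object from bytes, which is where the extra cost comes from, and a change only sticks if it is written back with `put`.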

But the amount of state you can have with RocksDB is limited only by the size of the local disk (not by the amount of memory).

Note also that only the RocksDBStateBackend is able to do incremental snapshotting, which is a significant benefit for applications with large amounts of slowly changing state.
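Why incremental snapshots help with large, slowly changing state can be shown with a toy key-level diff. Note the assumption: this sketch diffs individual keys in a dict, which is not how the RocksDB backend does it internally; all function names here are hypothetical.

```python
_MISSING = object()  # sentinel for "key was not in the previous snapshot"


def incremental_snapshot(state: dict, previous: dict) -> dict:
    """Record only entries that changed, plus keys deleted since `previous`."""
    changed = {k: v for k, v in state.items() if previous.get(k, _MISSING) != v}
    deleted = [k for k in previous if k not in state]
    return {"changed": changed, "deleted": deleted}


def apply_increment(base: dict, delta: dict) -> dict:
    """Rebuild the full state from a base snapshot plus one increment."""
    restored = dict(base)
    restored.update(delta["changed"])
    for key in delta["deleted"]:
        restored.pop(key, None)
    return restored


base = {f"user-{i}": 0 for i in range(1000)}  # first, full snapshot
state = dict(base)                            # working state keeps evolving
state["user-7"] = 5
state["user-8"] = 1
del state["user-9"]

delta = incremental_snapshot(state, base)
restored = apply_increment(base, delta)
```

Here the increment carries 3 entries instead of 999, and the base plus the increment still reproduces the full state.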

All of these state backends are able to do asynchronous snapshotting, meaning that they can take a snapshot without impeding the ongoing stream processing.

State Snapshots

Definitions

  • Snapshot – a generic term referring to a global, consistent image of the state of a Flink job.
    A snapshot includes a pointer into each of the data sources (e.g., an offset into a file or Kafka partition), as well as a copy of the state from each of the job's stateful operators that resulted from having processed all of the events up to those positions in the sources.

  • Checkpoint – a snapshot taken automatically by Flink for the purpose of being able to recover from faults. Checkpoints can be incremental, and are optimized for being restored quickly.

  • Externalized Checkpoint – normally checkpoints are not intended to be manipulated by users. Flink retains only the n most recent checkpoints (n being configurable) while a job is running, and deletes them when a job is cancelled.
    But you can configure them to be retained instead, in which case you can manually resume from them.

  • Savepoint – a snapshot triggered manually by a user (or an API call) for some operational purpose, such as a stateful redeploy/upgrade/rescaling operation.

Savepoints are always complete, and are optimized for operational flexibility.
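The definitions above can be read as a small data structure: a snapshot is source positions plus operator state, and the job keeps only the n most recent checkpoints. A minimal sketch, where `Snapshot`, `CheckpointStore`, and the field names are all hypothetical:

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Snapshot:
    checkpoint_id: int
    source_offsets: dict   # e.g. {"partition-0": 42}
    operator_state: dict   # e.g. {"counter": {"a": 3}}


class CheckpointStore:
    """Retains only the `max_retained` most recent snapshots."""

    def __init__(self, max_retained: int):
        self._retained = deque(maxlen=max_retained)

    def add(self, snapshot: Snapshot) -> None:
        # When the deque is full, appending drops the oldest snapshot.
        self._retained.append(snapshot)

    def latest(self):
        return self._retained[-1] if self._retained else None

    def retained_ids(self):
        return [s.checkpoint_id for s in self._retained]


store = CheckpointStore(max_retained=2)
for n in range(1, 5):
    store.add(Snapshot(n, {"partition-0": n * 10}, {"counter": {"a": n}}))
```

After four checkpoints with `max_retained=2`, only checkpoints 3 and 4 remain, and recovery would start from checkpoint 4.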

How does State Snapshotting Work?

Flink uses a variant of the Chandy-Lamport algorithm known as asynchronous barrier snapshotting.

When a task manager is instructed by the checkpoint coordinator (part of the job manager) to begin a checkpoint, it has all of the sources record their offsets and insert numbered checkpoint barriers into their streams.

These barriers flow through the job graph, indicating the part of the stream before and after each checkpoint.

Checkpoint n will contain the state of each operator that resulted from having consumed every event before checkpoint barrier n, and none of the events after it.

As each operator in the job graph receives one of these barriers, it records its state.

Operators with two input streams (such as a CoProcessFunction) perform barrier alignment so that the snapshot will reflect the state resulting from consuming events from both input streams up to (but not past) both barriers.
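Barrier alignment for a two-input operator can be sketched as a small simulation. This is a toy under stated assumptions: the operator just sums integers, inputs are read round-robin, and `run_two_input_sum` and `BARRIER` are hypothetical names.

```python
BARRIER = "barrier"


def run_two_input_sum(channel_a, channel_b):
    """Sum events from two inputs; return (state_at_snapshot, final_state)."""
    inputs = [list(channel_a), list(channel_b)]
    positions = [0, 0]
    blocked = [False, False]
    total = 0
    snapshot = None
    while any(positions[i] < len(inputs[i]) for i in range(2)):
        progressed = False
        for i in range(2):
            if blocked[i] or positions[i] >= len(inputs[i]):
                continue
            item = inputs[i][positions[i]]
            positions[i] += 1
            progressed = True
            if item == BARRIER:
                blocked[i] = True      # stop reading this input for now
                if all(blocked):       # barrier has arrived on both inputs
                    snapshot = total   # record state, then resume both
                    blocked = [False, False]
            else:
                total += item
        if not progressed:
            break  # one input ended without a barrier; nothing left to read
    return snapshot, total


a = [1, 2, BARRIER, 100]
b = [10, 20, 30, BARRIER, 200]
snapshot, final_total = run_two_input_sum(a, b)
```

The barrier on input `a` arrives first, so `a` is paused while `b` catches up. The snapshot is 63 (1 + 2 + 10 + 20 + 30): everything before both barriers and nothing after either.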
Flink's state backends use a copy-on-write mechanism to allow stream processing to continue unimpeded while older versions of the state are being asynchronously snapshotted.

Only when the snapshots have been durably persisted will these older versions of the state be garbage collected.
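The copy-on-write idea can be sketched as follows. For simplicity this toy copies the whole dict on the first write after a snapshot; a real implementation would presumably copy at a finer granularity. `CopyOnWriteState` is a hypothetical class.

```python
class CopyOnWriteState:
    def __init__(self):
        self._table = {}
        self._shared_with_snapshot = False

    def put(self, key, value):
        if self._shared_with_snapshot:
            # A snapshot still references the current table, so the
            # writer switches to a private copy before changing anything.
            self._table = dict(self._table)
            self._shared_with_snapshot = False
        self._table[key] = value

    def get(self, key):
        return self._table.get(key)

    def snapshot(self):
        """Hand out the current table without copying it."""
        self._shared_with_snapshot = True
        return self._table


state = CopyOnWriteState()
state.put("a", 1)

old_version = state.snapshot()  # can be persisted in the background
state.put("a", 2)               # processing continues on a new copy
state.put("b", 3)
```

Taking the snapshot is cheap because nothing is copied up front, and `old_version` stays frozen at `{"a": 1}` while the working state moves on. Once the background writer drops its reference to `old_version`, the old table becomes eligible for garbage collection.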

Exactly Once Guarantees

When things go wrong in a stream processing application, it is possible to have either lost or duplicated results.
With Flink, depending on the choices you make for your application and the cluster you run it on, any of these outcomes is possible:

  • Flink makes no effort to recover from failures (at most once)
  • Nothing is lost, but you may experience duplicated results (at least once)
  • Nothing is lost or duplicated (exactly once)

Given that Flink recovers from faults by rewinding and replaying the source data streams, describing the ideal situation as exactly once does not mean that every event will be processed exactly once.

Instead, it means that every event will affect the state being managed by Flink exactly once.
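The distinction between "processed exactly once" and "affects the state exactly once" shows up in a small failure simulation. This is a sketch with hypothetical names; it assumes a single source, a single counting operator, and a checkpoint taken before the failure.

```python
def run_with_failure(events, checkpoint_at, fail_at):
    """Count events; checkpoint once, crash once, then restore and replay.

    Assumes checkpoint_at <= fail_at so a checkpoint exists at failure time.
    """
    count = 0
    times_processed = [0] * len(events)
    checkpoint = None
    offset = 0
    failed = False
    while offset < len(events):
        if offset == checkpoint_at and checkpoint is None:
            # Snapshot = source position + operator state at that position.
            checkpoint = {"offset": offset, "count": count}
        if offset == fail_at and not failed:
            failed = True
            # Restore the state and rewind the source to the same point.
            offset, count = checkpoint["offset"], checkpoint["count"]
            continue
        count += 1
        times_processed[offset] += 1
        offset += 1
    return count, times_processed


final_count, times_processed = run_with_failure(
    events=list(range(10)), checkpoint_at=4, fail_at=7
)
```

Events 4, 5, and 6 are processed twice, but because the state is rolled back together with the source offset, the final count is still 10.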

Barrier alignment is only needed for providing exactly once guarantees.

If you don't need this, you can gain some performance by configuring Flink to use CheckpointingMode.AT_LEAST_ONCE, which has the effect of disabling barrier alignment.
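What skipping alignment costs can be sketched with a toy two-input sum. Assumptions: the operator keeps reading both inputs round-robin and records its state when the last barrier arrives; `snapshot_without_alignment` and `events_after_barrier` are hypothetical helpers.

```python
BARRIER = "barrier"


def snapshot_without_alignment(channel_a, channel_b):
    """Keep reading both inputs; return the state when the last barrier arrives."""
    inputs = [list(channel_a), list(channel_b)]
    positions = [0, 0]
    barriers_seen = 0
    total = 0
    while any(positions[i] < len(inputs[i]) for i in range(2)):
        for i in range(2):
            if positions[i] >= len(inputs[i]):
                continue
            item = inputs[i][positions[i]]
            positions[i] += 1
            if item == BARRIER:
                barriers_seen += 1
                if barriers_seen == 2:
                    return total
            else:
                total += item
    return total


def events_after_barrier(channel):
    """Events a rewound source would replay after a restore."""
    return channel[channel.index(BARRIER) + 1:]


a = [1, BARRIER, 100, 100]
b = [10, 20, 30, BARRIER]

snapshot = snapshot_without_alignment(a, b)
replayed = events_after_barrier(a) + events_after_barrier(b)
recovered_total = snapshot + sum(replayed)
true_total = sum(x for x in a + b if x != BARRIER)
```

The snapshot (261) already includes the two events that followed the barrier on input `a`. After a restore those events are replayed, so the recovered total is 461 instead of 261: nothing is lost, but some events are counted twice.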

Exactly Once End-to-end

To achieve exactly once end-to-end, so that every event from the sources affects the sinks exactly once, the following must be true:

  • your sources must be replayable, and
  • your sinks must be transactional (or idempotent)
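Why the sink requirement matters can be sketched by sending the same replayed records to two toy sinks. `AppendSink` and `IdempotentSink` are hypothetical classes, and the sketch assumes each record carries a stable id.

```python
class AppendSink:
    """Appends blindly, so a replayed record shows up twice."""

    def __init__(self):
        self.rows = []

    def write(self, event_id, value):
        self.rows.append((event_id, value))


class IdempotentSink:
    """Upserts by event id: writing the same record twice leaves one row."""

    def __init__(self):
        self.rows = {}

    def write(self, event_id, value):
        self.rows[event_id] = value


records = [(1, "a"), (2, "b"), (3, "c")]
# After a failure, the replayable source re-emits records 2 and 3.
emitted = records + records[1:]

append_sink, idempotent_sink = AppendSink(), IdempotentSink()
for event_id, value in emitted:
    append_sink.write(event_id, value)
    idempotent_sink.write(event_id, value)
```

The append-only sink ends up with 5 rows, while the idempotent sink holds exactly the 3 distinct records. A transactional sink would reach the same result differently, by committing output only when the corresponding checkpoint completes.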

Hands-on

The Flink Operations Playground includes a section on Observing Failure & Recovery.
