Flink Learning - 9. How to Use Checkpoints
Enabling checkpoints
Checkpointing is disabled by default; it must be enabled before use.
How to enable it:
StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
// start a checkpoint every 1000 ms
env.enableCheckpointing(1000);
Checkpoint mode
The default checkpointing mode is EXACTLY_ONCE; it can be changed to AT_LEAST_ONCE. These are the two available modes.
EXACTLY_ONCE is the best fit for most applications. AT_LEAST_ONCE is for certain ultra-low-latency applications where strict data accuracy is not required.
checkpointConfig.setCheckpointingMode(CheckpointingMode.AT_LEAST_ONCE);
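Alternatively, the mode can be passed directly when enabling checkpointing. A minimal sketch (env as above; the 1000 ms interval here is just for illustration):
// besides setting it on the CheckpointConfig as above, the mode can also be
// passed together with the interval when enabling checkpointing:
env.enableCheckpointing(1000, CheckpointingMode.AT_LEAST_ONCE);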
The modes as defined in Flink's source code:
/**
 * The checkpointing mode defines what consistency guarantees the system gives in the presence of
 * failures.
 *
 * <p>When checkpointing is activated, the data streams are replayed such that lost parts of the
 * processing are repeated. For stateful operations and functions, the checkpointing mode defines
 * whether the system draws checkpoints such that a recovery behaves as if the operators/functions
 * see each record "exactly once" ({@link #EXACTLY_ONCE}), or whether the checkpoints are drawn
 * in a simpler fashion that typically encounters some duplicates upon recovery
 * ({@link #AT_LEAST_ONCE})</p>
 */
@Public
public enum CheckpointingMode {

    /**
     * Sets the checkpointing mode to "exactly once". This mode means that the system will
     * checkpoint the operator and user function state in such a way that, upon recovery,
     * every record will be reflected exactly once in the operator state.
     *
     * <p>For example, if a user function counts the number of elements in a stream,
     * this number will consistently be equal to the number of actual elements in the stream,
     * regardless of failures and recovery.</p>
     *
     * <p>Note that this does not mean that each record flows through the streaming data flow
     * only once. It means that upon recovery, the state of operators/functions is restored such
     * that the resumed data streams pick up exactly at after the last modification to the state.</p>
     *
     * <p>Note that this mode does not guarantee exactly-once behavior in the interaction with
     * external systems (only state in Flink's operators and user functions). The reason for that
     * is that a certain level of "collaboration" is required between two systems to achieve
     * exactly-once guarantees. However, for certain systems, connectors can be written that facilitate
     * this collaboration.</p>
     *
     * <p>This mode sustains high throughput. Depending on the data flow graph and operations,
     * this mode may increase the record latency, because operators need to align their input
     * streams, in order to create a consistent snapshot point. The latency increase for simple
     * dataflows (no repartitioning) is negligible. For simple dataflows with repartitioning, the average
     * latency remains small, but the slowest records typically have an increased latency.</p>
     */
    EXACTLY_ONCE,

    /**
     * Sets the checkpointing mode to "at least once". This mode means that the system will
     * checkpoint the operator and user function state in a simpler way. Upon failure and recovery,
     * some records may be reflected multiple times in the operator state.
     *
     * <p>For example, if a user function counts the number of elements in a stream,
     * this number will equal to, or larger, than the actual number of elements in the stream,
     * in the presence of failure and recovery.</p>
     *
     * <p>This mode has minimal impact on latency and may be preferable in very-low latency
     * scenarios, where a sustained very-low latency (such as few milliseconds) is needed,
     * and where occasional duplicate messages (on recovery) do not matter.</p>
     */
    AT_LEAST_ONCE
}
Checkpoint storage location (state backend)
- FsStateBackend
- MemoryStateBackend
- RocksDBStateBackend
Per-job setting:
// set the state backend, i.e. where checkpoints are stored
env.setStateBackend(new FsStateBackend("hdfs://localhost:9000/jerome/flink/checkpoint"));
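The other two backends are set the same way. A minimal sketch, assuming illustrative paths (env as above; RocksDBStateBackend additionally requires the flink-statebackend-rocksdb dependency, and its constructor declares IOException):
import org.apache.flink.contrib.streaming.state.RocksDBStateBackend;
import org.apache.flink.runtime.state.memory.MemoryStateBackend;

// MemoryStateBackend: working state on the TaskManager heap,
// checkpoints held in the JobManager's memory (mainly for local testing)
env.setStateBackend(new MemoryStateBackend());

// RocksDBStateBackend: working state in embedded RocksDB on local disk,
// checkpoints written to a filesystem such as HDFS
env.setStateBackend(new RocksDBStateBackend("hdfs://localhost:9000/jerome/flink/checkpoint"));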
Global setting:
Edit flink-conf.yaml:
state.backend: filesystem
state.checkpoints.dir: hdfs://namenode:9000/flink/checkpoints
Note: state.backend accepts the following values: jobmanager (MemoryStateBackend), filesystem (FsStateBackend), rocksdb (RocksDBStateBackend).
Example code
package com.jerome.flink.checkpoint;
import org.apache.flink.api.common.functions.RichFlatMapFunction;
import org.apache.flink.api.java.utils.ParameterTool;
import org.apache.flink.runtime.state.filesystem.FsStateBackend;
import org.apache.flink.streaming.api.CheckpointingMode;
import org.apache.flink.streaming.api.datastream.DataStreamSource;
import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
import org.apache.flink.streaming.api.environment.CheckpointConfig;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.windowing.time.Time;
import org.apache.flink.util.Collector;
/**
 * Checkpoint example: a socket word count with checkpointing enabled.
 *
 * @author Jerome丶子木
 * @date 2020/01/10
 */
public class CheckpointTest {

    public static void main(String[] args) throws Exception {
        String hostname;
        int port;
        try {
            ParameterTool parameterTool = ParameterTool.fromArgs(args);
            hostname = parameterTool.get("hostname");
            port = parameterTool.getInt("port");
        } catch (Exception e) {
            e.printStackTrace();
            hostname = "localhost";
            port = 9010;
        }

        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        // start a checkpoint every 1000 ms
        env.enableCheckpointing(1000);
        // set the state backend, i.e. where checkpoints are stored
        env.setStateBackend(new FsStateBackend("hdfs://localhost:9000/jerome/flink/checkpoint"));
        // get the checkpoint configuration
        CheckpointConfig checkpointConfig = env.getCheckpointConfig();
        // make sure at least 500 ms pass between two checkpoints, i.e. the minimum pause between them
        checkpointConfig.setMinPauseBetweenCheckpoints(500);
        // set the checkpoint timeout; a checkpoint that takes longer is discarded
        checkpointConfig.setCheckpointTimeout(60000);
        // allow only one checkpoint at a time
        checkpointConfig.setMaxConcurrentCheckpoints(1);
        // keep checkpoint data after the job is cancelled:
        // ExternalizedCheckpointCleanup.RETAIN_ON_CANCELLATION: checkpoint data is retained after
        //   cancellation, so the job can later be restored from a chosen checkpoint
        // ExternalizedCheckpointCleanup.DELETE_ON_CANCELLATION: checkpoint data is deleted on
        //   cancellation and only kept if the job fails
        checkpointConfig.enableExternalizedCheckpoints(CheckpointConfig.ExternalizedCheckpointCleanup.RETAIN_ON_CANCELLATION);
        checkpointConfig.setCheckpointingMode(CheckpointingMode.EXACTLY_ONCE);

        String delimiter = "\n";
        DataStreamSource<String> stringDataStreamSource = env.socketTextStream(hostname, port, delimiter);
        SingleOutputStreamOperator<WordWithCount> windowCount = stringDataStreamSource
                .flatMap(new RichFlatMapFunction<String, WordWithCount>() {
                    @Override
                    public void flatMap(String value, Collector<WordWithCount> out) throws Exception {
                        String[] splits = value.split("\\s");
                        for (String word : splits) {
                            out.collect(new WordWithCount(word, 1));
                        }
                    }
                })
                .keyBy("word")
                // a 2-second sliding window that slides every 1 second
                .timeWindow(Time.seconds(2), Time.seconds(1))
                .sum("count");
                // alternative to sum():
                /*.reduce(new ReduceFunction<WordWithCount>() {
                    @Override
                    public WordWithCount reduce(WordWithCount value1, WordWithCount value2) throws Exception {
                        return new WordWithCount(value1.word, value1.count + value2.count);
                    }
                })*/

        // print the result to the console with a parallelism of 1
        windowCount.print().setParallelism(1);

        env.execute("Socket window count");
    }

    public static class WordWithCount {
        public String word;
        public long count;

        public WordWithCount() {}

        public WordWithCount(String word, long count) {
            this.word = word;
            this.count = count;
        }

        @Override
        public String toString() {
            return "WordWithCount{ word = '" + word + "' , count= " + count + "} ;";
        }
    }
}
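To try the example locally, open a socket source first and then start the job with matching arguments (port 9010 matches the fallback default in the code above):
nc -lk 9010
and pass the program arguments: --hostname localhost --port 9010
Because externalized checkpoints are retained on cancellation, a cancelled job can later be resumed from a retained checkpoint, along the lines of (the <job-id> and <n> placeholders must be read from the actual checkpoint directory):
bin/flink run -s hdfs://localhost:9000/jerome/flink/checkpoint/<job-id>/chk-<n> ...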
Problems encountered when starting the program
When debugging locally in IDEA, running the program directly fails with the following error:
Exception in thread "main" org.apache.flink.runtime.client.JobExecutionException: Could not retrieve JobResult.
at org.apache.flink.runtime.minicluster.MiniCluster.executeJobBlocking(MiniCluster.java:622)
at org.apache.flink.streaming.api.environment.LocalStreamEnvironment.execute(LocalStreamEnvironment.java:117)
at org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.execute(StreamExecutionEnvironment.java:1507)
at com.jerome.flink.checkpoint.CheckpointTest.main(CheckpointTest.java:84)
Caused by: org.apache.flink.runtime.client.JobSubmissionException: Failed to submit job.
at org.apache.flink.runtime.dispatcher.Dispatcher.lambda$internalSubmitJob$2(Dispatcher.java:333)
at java.util.concurrent.CompletableFuture.uniHandle(CompletableFuture.java:822)
at java.util.concurrent.CompletableFuture$UniHandle.tryFire(CompletableFuture.java:797)
at java.util.concurrent.CompletableFuture$Completion.run(CompletableFuture.java:442)
at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:40)
at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(ForkJoinExecutorConfigurator.scala:44)
at akka.dispatch.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at akka.dispatch.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
at akka.dispatch.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at akka.dispatch.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
Caused by: java.lang.RuntimeException: org.apache.flink.runtime.client.JobExecutionException: Could not set up JobManager
at org.apache.flink.util.function.CheckedSupplier.lambda$unchecked$0(CheckedSupplier.java:36)
at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1590)
... 6 more
Caused by: org.apache.flink.runtime.client.JobExecutionException: Could not set up JobManager
at org.apache.flink.runtime.jobmaster.JobManagerRunner.<init>(JobManagerRunner.java:152)
at org.apache.flink.runtime.dispatcher.DefaultJobManagerRunnerFactory.createJobManagerRunner(DefaultJobManagerRunnerFactory.java:83)
at org.apache.flink.runtime.dispatcher.Dispatcher.lambda$createJobManagerRunner$5(Dispatcher.java:375)
at org.apache.flink.util.function.CheckedSupplier.lambda$unchecked$0(CheckedSupplier.java:34)
... 7 more
Caused by: org.apache.flink.util.FlinkRuntimeException: Failed to create checkpoint storage at checkpoint coordinator side.
at org.apache.flink.runtime.checkpoint.CheckpointCoordinator.<init>(CheckpointCoordinator.java:255)
at org.apache.flink.runtime.executiongraph.ExecutionGraph.enableCheckpointing(ExecutionGraph.java:594)
at org.apache.flink.runtime.executiongraph.ExecutionGraphBuilder.buildGraph(ExecutionGraphBuilder.java:340)
at org.apache.flink.runtime.executiongraph.ExecutionGraphBuilder.buildGraph(ExecutionGraphBuilder.java:106)
at org.apache.flink.runtime.scheduler.LegacyScheduler.createExecutionGraph(LegacyScheduler.java:207)
at org.apache.flink.runtime.scheduler.LegacyScheduler.createAndRestoreExecutionGraph(LegacyScheduler.java:184)
at org.apache.flink.runtime.scheduler.LegacyScheduler.<init>(LegacyScheduler.java:176)
at org.apache.flink.runtime.scheduler.LegacySchedulerFactory.createInstance(LegacySchedulerFactory.java:70)
at org.apache.flink.runtime.jobmaster.JobMaster.createScheduler(JobMaster.java:275)
at org.apache.flink.runtime.jobmaster.JobMaster.<init>(JobMaster.java:265)
at org.apache.flink.runtime.jobmaster.factories.DefaultJobMasterServiceFactory.createJobMasterService(DefaultJobMasterServiceFactory.java:98)
at org.apache.flink.runtime.jobmaster.factories.DefaultJobMasterServiceFactory.createJobMasterService(DefaultJobMasterServiceFactory.java:40)
at org.apache.flink.runtime.jobmaster.JobManagerRunner.<init>(JobManagerRunner.java:146)
... 10 more
Caused by: org.apache.flink.core.fs.UnsupportedFileSystemSchemeException: Could not find a file system implementation for scheme 'hdfs'. The scheme is not directly supported by Flink and no Hadoop file system to support this scheme could be loaded.
at org.apache.flink.core.fs.FileSystem.getUnguardedFileSystem(FileSystem.java:447)
at org.apache.flink.core.fs.FileSystem.get(FileSystem.java:359)
at org.apache.flink.core.fs.Path.getFileSystem(Path.java:298)
at org.apache.flink.runtime.state.filesystem.FsCheckpointStorage.<init>(FsCheckpointStorage.java:61)
at org.apache.flink.runtime.state.filesystem.FsStateBackend.createCheckpointStorage(FsStateBackend.java:490)
at org.apache.flink.runtime.checkpoint.CheckpointCoordinator.<init>(CheckpointCoordinator.java:253)
... 22 more
Caused by: org.apache.flink.core.fs.UnsupportedFileSystemSchemeException: Hadoop is not in the classpath/dependencies.
at org.apache.flink.core.fs.UnsupportedSchemeFactory.create(UnsupportedSchemeFactory.java:58)
at org.apache.flink.core.fs.FileSystem.getUnguardedFileSystem(FileSystem.java:443)
... 27 more
The root cause (see the last "Caused by" above) is the missing dependency for connecting to Hadoop; add it to the pom:
<!-- dependency for Flink's Hadoop filesystem support -->
<dependency>
<groupId>org.apache.flink</groupId>
<artifactId>flink-shaded-hadoop2</artifactId>
<version>1.2.0</version>
</dependency>
The above fixes local testing. If the same error appears when the packaged job runs on a cluster, download the matching pre-bundled Hadoop jar from the Flink website, put it into Flink's dependency directory (lib), and run the job again.
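After the job has run for a while, you can check that checkpoint data is actually being written, for example with (assuming the HDFS path configured above):
hdfs dfs -ls /jerome/flink/checkpoint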