Flink Learning - 9. Checkpoint Usage

Enabling checkpoints

Checkpointing is disabled by default; if you want to use it, you must enable it first.

How to enable it:

StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

// Trigger a checkpoint every 1000 ms
env.enableCheckpointing(1000);

Checkpoint mode

The default checkpointing mode is EXACTLY_ONCE; it can be changed to AT_LEAST_ONCE. These are the only two modes.

EXACTLY_ONCE is the best fit for most applications. AT_LEAST_ONCE suits applications that need ultra-low latency and can tolerate occasional duplicates (lower data accuracy) on recovery.

checkpointConfig.setCheckpointingMode(CheckpointingMode.AT_LEAST_ONCE);
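
As a minimal sketch (using the same Flink 1.9-era API as the example code further below), the interval and mode can also be passed together through an overload of enableCheckpointing:

import org.apache.flink.streaming.api.CheckpointingMode;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class CheckpointModeSketch {

    public static void main(String[] args) {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        // Trigger a checkpoint every 5000 ms with at-least-once guarantees
        env.enableCheckpointing(5000, CheckpointingMode.AT_LEAST_ONCE);
        // Equivalent two-step form:
        // env.enableCheckpointing(5000);
        // env.getCheckpointConfig().setCheckpointingMode(CheckpointingMode.AT_LEAST_ONCE);
    }
}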

The two modes as defined in Flink's source:


/**
 * The checkpointing mode defines what consistency guarantees the system gives in the presence of
 * failures.
 *
 * <p>When checkpointing is activated, the data streams are replayed such that lost parts of the
 * processing are repeated. For stateful operations and functions, the checkpointing mode defines
 * whether the system draws checkpoints such that a recovery behaves as if the operators/functions
 * see each record "exactly once" ({@link #EXACTLY_ONCE}), or whether the checkpoints are drawn
 * in a simpler fashion that typically encounters some duplicates upon recovery
 * ({@link #AT_LEAST_ONCE})</p>
 */
@Public
public enum CheckpointingMode {

	/**
	 * Sets the checkpointing mode to "exactly once". This mode means that the system will
	 * checkpoint the operator and user function state in such a way that, upon recovery,
	 * every record will be reflected exactly once in the operator state.
	 *
	 * <p>For example, if a user function counts the number of elements in a stream,
	 * this number will consistently be equal to the number of actual elements in the stream,
	 * regardless of failures and recovery.</p>
	 *
	 * <p>Note that this does not mean that each record flows through the streaming data flow
	 * only once. It means that upon recovery, the state of operators/functions is restored such
	 * that the resumed data streams pick up exactly at after the last modification to the state.</p>
	 *
	 * <p>Note that this mode does not guarantee exactly-once behavior in the interaction with
	 * external systems (only state in Flink's operators and user functions). The reason for that
	 * is that a certain level of "collaboration" is required between two systems to achieve
	 * exactly-once guarantees. However, for certain systems, connectors can be written that facilitate
	 * this collaboration.</p>
	 *
	 * <p>This mode sustains high throughput. Depending on the data flow graph and operations,
	 * this mode may increase the record latency, because operators need to align their input
	 * streams, in order to create a consistent snapshot point. The latency increase for simple
	 * dataflows (no repartitioning) is negligible. For simple dataflows with repartitioning, the average
	 * latency remains small, but the slowest records typically have an increased latency.</p>
	 */
	EXACTLY_ONCE,

	/**
	 * Sets the checkpointing mode to "at least once". This mode means that the system will
	 * checkpoint the operator and user function state in a simpler way. Upon failure and recovery,
	 * some records may be reflected multiple times in the operator state.
	 *
	 * <p>For example, if a user function counts the number of elements in a stream,
	 * this number will equal to, or larger, than the actual number of elements in the stream,
	 * in the presence of failure and recovery.</p>
	 *
	 * <p>This mode has minimal impact on latency and may be preferable in very-low latency
	 * scenarios, where a sustained very-low latency (such as few milliseconds) is needed,
	 * and where occasional duplicate messages (on recovery) do not matter.</p>
	 */
	AT_LEAST_ONCE
}

Checkpoint storage location (state backend)

  • FsStateBackend
  • MemoryStateBackend
  • RocksDBStateBackend

Per-job setting:

// Set the state backend, i.e. where checkpoints are stored
env.setStateBackend(new FsStateBackend("hdfs://localhost:9000/jerome/flink/checkpoint"));

Global setting:

Edit flink-conf.yaml:

state.backend: filesystem
state.checkpoints.dir: hdfs://namenode:9000/flink/checkpoints

Note: state.backend can take one of the following values: jobmanager (MemoryStateBackend), filesystem (FsStateBackend), rocksdb (RocksDBStateBackend)
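
For completeness, a minimal per-job sketch covering all three backends (class names follow the Flink 1.9-era API used in this article; the RocksDB line additionally assumes the optional flink-statebackend-rocksdb dependency is on the classpath):

import org.apache.flink.contrib.streaming.state.RocksDBStateBackend;
import org.apache.flink.runtime.state.filesystem.FsStateBackend;
import org.apache.flink.runtime.state.memory.MemoryStateBackend;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class StateBackendSketch {

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // A real job would pick exactly one of the calls below; the last call wins.

        // MemoryStateBackend: state and checkpoints live on the JobManager heap (dev/testing only)
        env.setStateBackend(new MemoryStateBackend());

        // FsStateBackend: working state on the TaskManager heap, checkpoints in a filesystem
        env.setStateBackend(new FsStateBackend("hdfs://namenode:9000/flink/checkpoints"));

        // RocksDBStateBackend: working state in embedded RocksDB on local disk, checkpoints in a
        // filesystem; the boolean flag enables incremental checkpoints
        env.setStateBackend(new RocksDBStateBackend("hdfs://namenode:9000/flink/checkpoints", true));
    }
}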

Example code


package com.jerome.flink.checkpoint;

import org.apache.flink.api.common.functions.RichFlatMapFunction;
import org.apache.flink.api.java.utils.ParameterTool;
import org.apache.flink.runtime.state.filesystem.FsStateBackend;
import org.apache.flink.streaming.api.CheckpointingMode;
import org.apache.flink.streaming.api.datastream.DataStreamSource;
import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
import org.apache.flink.streaming.api.environment.CheckpointConfig;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.windowing.time.Time;
import org.apache.flink.util.Collector;

/**
 * Word count over a socket stream with checkpointing enabled.
 *
 * @author Jerome丶子木
 * @date 2020/01/10
 */
public class CheckpointTest {

    public static void main(String[] args) throws Exception {

        String hostname;
        int port;

        try{
            ParameterTool parameterTool = ParameterTool.fromArgs(args);
            hostname = parameterTool.get("hostname");
            port = parameterTool.getInt("port");
        }catch (Exception e){
            e.printStackTrace();
            hostname = "localhost";
            port = 9010;
        }


        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Trigger a checkpoint every 1000 ms
        env.enableCheckpointing(1000);
        // Set the state backend, i.e. where checkpoints are stored
        env.setStateBackend(new FsStateBackend("hdfs://localhost:9000/jerome/flink/checkpoint"));
        // Get the checkpoint configuration
        CheckpointConfig checkpointConfig = env.getCheckpointConfig();
        // Require at least 500 ms between checkpoints (minimum pause between checkpoints)
        checkpointConfig.setMinPauseBetweenCheckpoints(500);
        // Checkpoint timeout: checkpoints taking longer than this are discarded
        checkpointConfig.setCheckpointTimeout(60000);
        // Allow at most one checkpoint in flight at a time
        checkpointConfig.setMaxConcurrentCheckpoints(1);
        // Retain checkpoint data after the job is cancelled:
        // ExternalizedCheckpointCleanup.RETAIN_ON_CANCELLATION: keep checkpoint data when the job is cancelled, so the job can later be restored from a chosen checkpoint
        // ExternalizedCheckpointCleanup.DELETE_ON_CANCELLATION: delete checkpoint data when the job is cancelled; checkpoints are kept only when the job fails
        checkpointConfig.enableExternalizedCheckpoints(CheckpointConfig.ExternalizedCheckpointCleanup.RETAIN_ON_CANCELLATION);
        checkpointConfig.setCheckpointingMode(CheckpointingMode.EXACTLY_ONCE);

        String delimiter = "\n";

        DataStreamSource<String> stringDataStreamSource = env.socketTextStream(hostname, port, delimiter);

        SingleOutputStreamOperator<WordWithCount> windowCount = stringDataStreamSource.flatMap(new RichFlatMapFunction<String, WordWithCount>() {
            @Override
            public void flatMap(String value, Collector<WordWithCount> out) throws Exception {
                String[] splits = value.split("\\s");
                for (String word : splits) {
                    out.collect(new WordWithCount(word, 1));
                }
            }
        }).keyBy("word")
                // Sliding window of 2 seconds, sliding every 1 second
                .timeWindow(Time.seconds(2), Time.seconds(1))
                .sum("count");
                /*.reduce(new ReduceFunction<WordWithCount>() {
                    @Override
                    public WordWithCount reduce(WordWithCount value1, WordWithCount value2) throws Exception {
                        return new WordWithCount(value1.word,value1.count + value2.count);
                    }
                })*/

        // Print the results to the console with parallelism 1
        windowCount.print().setParallelism(1);

        env.execute("Socket window count");


    }

    public static class WordWithCount{
        public String word;
        public long count;
        public WordWithCount(){}

        public WordWithCount(String word, long count){
            this.word = word;
            this.count = count;
        }

        @Override
        public String toString(){
            StringBuilder res = new StringBuilder();
            res.append("WordWithCount{ word = '");
            res.append(word);
            res.append("' , count= ");
            res.append(count);
            res.append("} ;");
            return res.toString();
        }
    }

}
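
To try the example locally, start a socket server on the chosen port first (for example with nc -lk 9010), then run the job with the program arguments --hostname localhost --port 9010, which match the ParameterTool keys read in main; if the arguments are missing, the code falls back to localhost:9010.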

Problems encountered when starting the program

While debugging locally in IDEA, running the program directly produced this error:

Exception in thread "main" org.apache.flink.runtime.client.JobExecutionException: Could not retrieve JobResult.
	at org.apache.flink.runtime.minicluster.MiniCluster.executeJobBlocking(MiniCluster.java:622)
	at org.apache.flink.streaming.api.environment.LocalStreamEnvironment.execute(LocalStreamEnvironment.java:117)
	at org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.execute(StreamExecutionEnvironment.java:1507)
	at com.jerome.flink.checkpoint.CheckpointTest.main(CheckpointTest.java:84)
Caused by: org.apache.flink.runtime.client.JobSubmissionException: Failed to submit job.
	at org.apache.flink.runtime.dispatcher.Dispatcher.lambda$internalSubmitJob$2(Dispatcher.java:333)
	at java.util.concurrent.CompletableFuture.uniHandle(CompletableFuture.java:822)
	at java.util.concurrent.CompletableFuture$UniHandle.tryFire(CompletableFuture.java:797)
	at java.util.concurrent.CompletableFuture$Completion.run(CompletableFuture.java:442)
	at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:40)
	at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(ForkJoinExecutorConfigurator.scala:44)
	at akka.dispatch.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
	at akka.dispatch.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
	at akka.dispatch.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
	at akka.dispatch.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
Caused by: java.lang.RuntimeException: org.apache.flink.runtime.client.JobExecutionException: Could not set up JobManager
	at org.apache.flink.util.function.CheckedSupplier.lambda$unchecked$0(CheckedSupplier.java:36)
	at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1590)
	... 6 more
Caused by: org.apache.flink.runtime.client.JobExecutionException: Could not set up JobManager
	at org.apache.flink.runtime.jobmaster.JobManagerRunner.<init>(JobManagerRunner.java:152)
	at org.apache.flink.runtime.dispatcher.DefaultJobManagerRunnerFactory.createJobManagerRunner(DefaultJobManagerRunnerFactory.java:83)
	at org.apache.flink.runtime.dispatcher.Dispatcher.lambda$createJobManagerRunner$5(Dispatcher.java:375)
	at org.apache.flink.util.function.CheckedSupplier.lambda$unchecked$0(CheckedSupplier.java:34)
	... 7 more
Caused by: org.apache.flink.util.FlinkRuntimeException: Failed to create checkpoint storage at checkpoint coordinator side.
	at org.apache.flink.runtime.checkpoint.CheckpointCoordinator.<init>(CheckpointCoordinator.java:255)
	at org.apache.flink.runtime.executiongraph.ExecutionGraph.enableCheckpointing(ExecutionGraph.java:594)
	at org.apache.flink.runtime.executiongraph.ExecutionGraphBuilder.buildGraph(ExecutionGraphBuilder.java:340)
	at org.apache.flink.runtime.executiongraph.ExecutionGraphBuilder.buildGraph(ExecutionGraphBuilder.java:106)
	at org.apache.flink.runtime.scheduler.LegacyScheduler.createExecutionGraph(LegacyScheduler.java:207)
	at org.apache.flink.runtime.scheduler.LegacyScheduler.createAndRestoreExecutionGraph(LegacyScheduler.java:184)
	at org.apache.flink.runtime.scheduler.LegacyScheduler.<init>(LegacyScheduler.java:176)
	at org.apache.flink.runtime.scheduler.LegacySchedulerFactory.createInstance(LegacySchedulerFactory.java:70)
	at org.apache.flink.runtime.jobmaster.JobMaster.createScheduler(JobMaster.java:275)
	at org.apache.flink.runtime.jobmaster.JobMaster.<init>(JobMaster.java:265)
	at org.apache.flink.runtime.jobmaster.factories.DefaultJobMasterServiceFactory.createJobMasterService(DefaultJobMasterServiceFactory.java:98)
	at org.apache.flink.runtime.jobmaster.factories.DefaultJobMasterServiceFactory.createJobMasterService(DefaultJobMasterServiceFactory.java:40)
	at org.apache.flink.runtime.jobmaster.JobManagerRunner.<init>(JobManagerRunner.java:146)
	... 10 more
Caused by: org.apache.flink.core.fs.UnsupportedFileSystemSchemeException: Could not find a file system implementation for scheme 'hdfs'. The scheme is not directly supported by Flink and no Hadoop file system to support this scheme could be loaded.
	at org.apache.flink.core.fs.FileSystem.getUnguardedFileSystem(FileSystem.java:447)
	at org.apache.flink.core.fs.FileSystem.get(FileSystem.java:359)
	at org.apache.flink.core.fs.Path.getFileSystem(Path.java:298)
	at org.apache.flink.runtime.state.filesystem.FsCheckpointStorage.<init>(FsCheckpointStorage.java:61)
	at org.apache.flink.runtime.state.filesystem.FsStateBackend.createCheckpointStorage(FsStateBackend.java:490)
	at org.apache.flink.runtime.checkpoint.CheckpointCoordinator.<init>(CheckpointCoordinator.java:253)
	... 22 more
Caused by: org.apache.flink.core.fs.UnsupportedFileSystemSchemeException: Hadoop is not in the classpath/dependencies.
	at org.apache.flink.core.fs.UnsupportedSchemeFactory.create(UnsupportedSchemeFactory.java:58)
	at org.apache.flink.core.fs.FileSystem.getUnguardedFileSystem(FileSystem.java:443)
	... 27 more

The root cause is a missing Hadoop dependency; add the following to the pom:

<!-- Dependency for Flink's Hadoop filesystem support -->
<dependency>
	<groupId>org.apache.flink</groupId>
	<artifactId>flink-shaded-hadoop2</artifactId>
	<version>1.2.0</version>
</dependency>

The above fixes local testing. If the packaged job reports the same error when running on a cluster, download the corresponding Hadoop jar from the Flink website, place it in Flink's lib directory, and run the job again.
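
A side note on the RETAIN_ON_CANCELLATION setting used above: because checkpoint data is retained after a cancel, the job can later be resumed from a retained checkpoint. Assuming a standard Flink CLI setup, this would look roughly like the following (the path and jar name are illustrative; the actual path is the chk-<n> directory Flink wrote under the configured checkpoint directory):

bin/flink run -s hdfs://localhost:9000/jerome/flink/checkpoint/<job-id>/chk-<n> -c com.jerome.flink.checkpoint.CheckpointTest CheckpointTest.jar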

