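Flink Restart Strategies
1. Fixed Delay Restart Strategy
The fixed delay restart strategy attempts to restart the job a given number of times. If the maximum number of attempts is exceeded, the job eventually fails. Between two consecutive restart attempts, the strategy waits a fixed amount of time.
Setting the following configuration parameter in flink-conf.yaml makes this strategy the cluster default.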
restart-strategy: fixed-delay
Key | Default | Type | Description |
---|---|---|---|
restart-strategy.fixed-delay.attempts | 1 | Integer | The number of times that Flink retries the execution before the job is declared as failed if restart-strategy has been set to fixed-delay. |
restart-strategy.fixed-delay.delay | 1 s | Duration | Delay between two consecutive restart attempts if restart-strategy has been set to fixed-delay. Delaying the retries can be helpful when the program interacts with external systems where, for example, connections or pending transactions should reach a timeout before re-execution is attempted. It can be specified using notation such as "1 min" or "20 s". |
For example:
restart-strategy.fixed-delay.attempts: 3
restart-strategy.fixed-delay.delay: 10 s
The fixed delay restart strategy can also be set programmatically:
ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
env.setRestartStrategy(RestartStrategies.fixedDelayRestart(
    3, // number of restart attempts
    Time.of(10, TimeUnit.SECONDS) // delay between attempts
));
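The decision rule described above can be sketched in plain Java without any Flink dependency. This is only an illustration of the "count attempts, then give up" logic, not Flink's implementation; the class name `FixedDelaySketch` and its methods are hypothetical, and the values mirror the `attempts: 3` and `delay: 10 s` example.

```java
import java.time.Duration;

/** Hypothetical stand-in for a fixed-delay policy; not Flink's actual class. */
final class FixedDelaySketch {
    private final int maxAttempts;  // mirrors restart-strategy.fixed-delay.attempts
    private final Duration delay;   // mirrors restart-strategy.fixed-delay.delay
    private int attemptsSoFar = 0;

    FixedDelaySketch(int maxAttempts, Duration delay) {
        this.maxAttempts = maxAttempts;
        this.delay = delay;
    }

    /** Returns true and counts the attempt if another restart is still allowed. */
    boolean tryRestart() {
        if (attemptsSoFar >= maxAttempts) {
            return false; // attempts exhausted: the job is declared failed
        }
        attemptsSoFar++;
        return true;
    }

    /** The fixed wait a caller would apply before each restart. */
    Duration getDelay() {
        return delay;
    }

    public static void main(String[] args) {
        FixedDelaySketch strategy = new FixedDelaySketch(3, Duration.ofSeconds(10));
        for (int failure = 1; failure <= 4; failure++) {
            System.out.println("failure " + failure
                    + " -> restart allowed: " + strategy.tryRestart());
        }
    }
}
```

With three attempts configured, the first three failures lead to a restart and the fourth one ends the job.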
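2. Failure Rate Restart Strategy
The failure rate restart strategy restarts the job after a failure, but once the failure rate (the number of failures per time interval) exceeds the configured limit, the job eventually fails. Between two consecutive restart attempts, the strategy waits a fixed amount of time.
Setting the following configuration parameter in flink-conf.yaml makes this strategy the cluster default.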
restart-strategy: failure-rate
The parameters are described below:
Key | Default | Type | Description |
---|---|---|---|
restart-strategy.failure-rate.delay | 1 s | Duration | Delay between two consecutive restart attempts if restart-strategy has been set to failure-rate. It can be specified using notation such as "1 min" or "20 s". |
restart-strategy.failure-rate.failure-rate-interval | 1 min | Duration | Time interval for measuring failure rate if restart-strategy has been set to failure-rate. It can be specified using notation such as "1 min" or "20 s". |
restart-strategy.failure-rate.max-failures-per-interval | 1 | Integer | Maximum number of restarts in given time interval before failing a job if restart-strategy has been set to failure-rate. |
For example:
restart-strategy.failure-rate.max-failures-per-interval: 3
restart-strategy.failure-rate.failure-rate-interval: 5 min
restart-strategy.failure-rate.delay: 10 s
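The failure rate restart strategy can also be set programmatically:
ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
env.setRestartStrategy(RestartStrategies.failureRateRestart(
    3, // maximum number of failures per interval
    Time.of(5, TimeUnit.MINUTES), // time interval for measuring the failure rate
    Time.of(10, TimeUnit.SECONDS) // delay between attempts
));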
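The failure-rate rule can be sketched as a sliding window over failure timestamps. The following is a plain-Java illustration, not Flink's implementation: `FailureRateSketch` and `onFailure` are hypothetical names, and the exact boundary behavior (here a restart is allowed while the window holds at most `maxFailuresPerInterval` failures) is an assumption that may differ from the Flink version you run.

```java
import java.time.Duration;
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical sliding-window sketch of a failure-rate policy; not Flink's class. */
final class FailureRateSketch {
    private final int maxFailuresPerInterval; // mirrors ...max-failures-per-interval
    private final long intervalMillis;        // mirrors ...failure-rate-interval
    private final Deque<Long> failureTimestamps = new ArrayDeque<>();

    FailureRateSketch(int maxFailuresPerInterval, Duration interval) {
        this.maxFailuresPerInterval = maxFailuresPerInterval;
        this.intervalMillis = interval.toMillis();
    }

    /**
     * Records a failure at nowMillis and reports whether a restart is allowed.
     * Assumption: a failure exactly one interval old no longer counts.
     */
    boolean onFailure(long nowMillis) {
        // Forget failures that have fallen out of the measurement interval.
        while (!failureTimestamps.isEmpty()
                && nowMillis - failureTimestamps.peekFirst() >= intervalMillis) {
            failureTimestamps.pollFirst();
        }
        failureTimestamps.addLast(nowMillis);
        // Restart only while the rate stays within the configured limit.
        return failureTimestamps.size() <= maxFailuresPerInterval;
    }

    public static void main(String[] args) {
        FailureRateSketch strategy = new FailureRateSketch(3, Duration.ofMinutes(5));
        long[] failureTimes = {0L, 60_000L, 120_000L, 180_000L};
        for (long t : failureTimes) {
            System.out.println("failure at " + t + " ms -> restart allowed: "
                    + strategy.onFailure(t));
        }
    }
}
```

In this sketch, four failures within three minutes exceed the limit of three per five minutes, so the fourth call returns false. If the fourth failure instead arrived after older failures had aged out of the window, a restart would still be allowed.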
3. No Restart Strategy
The job fails directly and no restart is attempted.
restart-strategy: none
The no restart strategy can also be set programmatically:
ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
env.setRestartStrategy(RestartStrategies.noRestart());
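4. Fallback Restart Strategy
The restart strategy defined by the cluster is used. This is helpful for streaming programs that enable checkpointing. If no other restart strategy is defined, the fixed delay restart strategy is chosen by default.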
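The fallback order described above can be sketched as a simple lookup chain. This is a simplified plain-Java illustration under stated assumptions, not Flink's resolution code: `FallbackSketch` and `resolve` are hypothetical names, strategies are represented by their `restart-strategy` config values, and the final default follows the text above.

```java
import java.util.Optional;

/** Hypothetical sketch of the fallback order; not Flink's actual resolution logic. */
final class FallbackSketch {
    /**
     * Picks the job-level strategy if the program set one, otherwise the
     * cluster-level value from flink-conf.yaml, otherwise fixed-delay.
     */
    static String resolve(Optional<String> jobLevel, Optional<String> clusterLevel) {
        return jobLevel.orElseGet(() -> clusterLevel.orElse("fixed-delay"));
    }

    public static void main(String[] args) {
        // Job sets nothing, cluster config says failure-rate.
        System.out.println(resolve(Optional.empty(), Optional.of("failure-rate")));
        // Neither level defines a strategy.
        System.out.println(resolve(Optional.empty(), Optional.empty()));
    }
}
```

A strategy set in the program takes precedence in this sketch, and the cluster configuration is consulted only when the job leaves the choice open.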