Apache Flink State Management (Lecture Notes)

State & Fault Tolerance

Flink is a stream-processing engine built around stateful computation. Flink divides all state into two broad categories: keyed state and operator state. Keyed state is state that Flink binds to individual keys; it is only available for operations on a KeyedStream. Operator state is state used by operations on non-keyed streams; every piece of operator state is bound to a particular operator instance. For both keyed state and operator state, Flink manages the underlying storage in one of two forms: Managed State and Raw State.

Managed State - state whose storage structure (data structures, data types, and so on) is controlled by Flink itself. Because Flink manages this state, it can optimize memory usage for it and recover it after failures.

Raw State - state about whose content and structure Flink knows nothing: Flink only sees an opaque array of bytes, and the user has to serialize and deserialize the state. Consequently Flink cannot apply targeted memory optimizations to Raw State, nor does it support recovering that state after a failure, so Raw State is almost never used in real-world Flink projects.

All datastream functions can use managed state, but the raw state interfaces can only be used when implementing operators. Using managed state (rather than raw state) is recommended, since with managed state Flink is able to automatically redistribute state when the parallelism is changed, and also do better memory management.

Reference: https://ci.apache.org/projects/flink/flink-docs-release-1.10/dev/stream/state/state.html

Managed Keyed State

The managed keyed state interfaces in Flink give access to state of several different data types, all of which are bound to the current key. This means this kind of state can only be used on a KeyedStream. Flink has the following six built-in state types:

Type / use case / methods:

  • ValueState - holds a single value per key. Methods: T value(), update(T), clear()
  • ListState - holds a list of values per key. Methods: add(T), addAll(List), update(List), Iterable get(), clear()
  • MapState<UK, UV> - holds a map of key/value pairs per key. Methods: put(UK, UV), putAll(Map), get(UK), entries(), keys(), values(), clear()
  • ReducingState - holds a single value that is the aggregate of all elements added so far; each add(T) combines the new element with the current state using the user-supplied ReduceFunction. Methods: add(T), T get(), clear()
  • AggregatingState<IN, OUT> - holds a single value aggregated from all added elements using the user-supplied AggregateFunction; unlike ReducingState, the input and output types may differ. Methods: add(IN), OUT get(), clear()
  • FoldingState<T, ACC> - holds a single value aggregated from all added elements using the user-supplied FoldFunction; unlike ReducingState, the input type and the accumulator type may differ. Methods: add(T), ACC get(), clear()

It is important to keep in mind that these state objects are only used for interfacing with state. The state is not necessarily stored inside but might reside on disk or somewhere else. The second thing to keep in mind is that the value you get from the state depends on the key of the input element. So the value you get in one invocation of your user function can differ from the value in another invocation if the keys involved are different.

To get a state handle, you have to create a StateDescriptor. This holds the name of the state (as we will see later, you can create several states, and they have to have unique names so that you can reference them), the type of the values that the state holds, and possibly a user-specified function, such as a ReduceFunction. Depending on what type of state you want to retrieve, you create either a ValueStateDescriptor, a ListStateDescriptor, a ReducingStateDescriptor, a FoldingStateDescriptor or a MapStateDescriptor.

State is accessed using the RuntimeContext, so it is only possible in rich functions. Please see here for information about that, but we will also see an example shortly. The RuntimeContext that is available in a RichFunction has these methods for accessing state:

  • ValueState getState(ValueStateDescriptor)
  • ReducingState getReducingState(ReducingStateDescriptor)
  • ListState getListState(ListStateDescriptor)
  • AggregatingState getAggregatingState(AggregatingStateDescriptor)
  • FoldingState getFoldingState(FoldingStateDescriptor)
  • MapState getMapState(MapStateDescriptor)
ValueState
class WordCountMapFunction extends RichMapFunction[(String,Int),(String,Int)]{
  var vs:ValueState[Int]=_

  override def open(parameters: Configuration): Unit = {
    //1.创建对应状态描述符
    val vsd = new ValueStateDescriptor[Int]("wordcount", createTypeInformation[Int])
    //2.获取RuntimeContext
    var context: RuntimeContext = getRuntimeContext
    //3.获取指定类型状态
    vs=context.getState(vsd)
  }

  override def map(value: (String, Int)): (String, Int) = {
    //获取历史值
    val historyData = vs.value()
    //更新状态
    vs.update(historyData+value._2)
    //返回最新值
    (value._1,vs.value())
  }
}

object FlinkWordCountValueState {
  def main(args: Array[String]): Unit = {
    //1.创建流计算执行环境
    val env = StreamExecutionEnvironment.getExecutionEnvironment

    //2.创建DataStream - 细化
    val text = env.socketTextStream("CentOS", 9999)

    //3.执行DataStream的转换算子
    val counts = text.flatMap(line=>line.split("\\s+"))
      .map(word=>(word,1))
      .keyBy(0)
      .map(new WordCountMapFunction)

    //4.将计算的结果在控制打印
    counts.print()

    //5.执行流计算任务
    env.execute("Stream WordCount")
  }
}
ListState
class UserVisitedMapFunction extends RichMapFunction[(String,String),(String,String)]{
  var userVisited:ListState[String]=_


  override def open(parameters: Configuration): Unit = {
    //1.创建对应状态描述符
    val lsd = new ListStateDescriptor[String]("userVisited", createTypeInformation[String])
    //2.获取RuntimeContext
    var context: RuntimeContext = getRuntimeContext
    //3.获取指定类型状态
    userVisited=context.getListState(lsd)
  }

  override def map(value: (String, String)): (String, String) = {
    //获取历史值
    var historyData = userVisited.get().asScala.toList
    //更新状态
    historyData = historyData.::(value._2).distinct
    userVisited.update(historyData.asJava)

    //返回最新值
    (value._1,historyData.mkString(" | "))
  }
}
object FlinkUserVisitedListState {
  def main(args: Array[String]): Unit = {
    //1.创建流计算执行环境
    val env = StreamExecutionEnvironment.getExecutionEnvironment

    //2.创建DataStream - 细化 001 zhangsan 电子类 xxxx  001 zhangsan 手机类 xxxx 001 zhangsan 母婴类 xxxx
    val text = env.socketTextStream("CentOS", 9999)

    //3.执行DataStream的转换算子
    val counts = text.map(line=>line.split("\\s+"))
      .map(ts=>(ts(0)+":"+ts(1),ts(2)))
      .keyBy(0)
      .map(new UserVisitedMapFunction)

    //4.将计算的结果在控制打印
    counts.print()

    //5.执行流计算任务
    env.execute("Stream WordCount")
  }
}
MapState
class UserVisitedMapMapFunction extends RichMapFunction[(String,String),(String,String)]{
  var userVisitedMap:MapState[String,Int]=_

  override def open(parameters: Configuration): Unit = {
    //1.创建对应状态描述符
    val msd = new MapStateDescriptor[String,Int]("UserVisitedMap", createTypeInformation[String],createTypeInformation[Int])
    //2.获取RuntimeContext
    var context: RuntimeContext = getRuntimeContext
    //3.获取指定类型状态
    userVisitedMap=context.getMapState(msd)
  }

  override def map(value: (String, String)): (String, String) = {
    var count=0
    if(userVisitedMap.contains(value._2)){
      count=userVisitedMap.get(value._2)
    }
    userVisitedMap.put(value._2,count+1)

   var historyList= userVisitedMap.entries()
                  .asScala
                  .map(entry=> entry.getKey+":"+entry.getValue)
                  .toList
    //返回最新值
    (value._1,historyList.mkString(" | "))
  }
}
object FlinkUserVisitedMapState {
  def main(args: Array[String]): Unit = {
    //1.创建流计算执行环境
    val env = StreamExecutionEnvironment.getExecutionEnvironment

    //2.创建DataStream - 细化 001 zhangsan 电子类 xxxx  001 zhangsan 手机类 xxxx 001 zhangsan 母婴类 xxxx
    val text = env.socketTextStream("CentOS", 9999)

    //3.执行DataStream的转换算子
    val counts = text.map(line=>line.split("\\s+"))
      .map(ts=>(ts(0)+":"+ts(1),ts(2)))
      .keyBy(0)
      .map(new UserVisitedMapMapFunction)

    //4.将计算的结果在控制打印
    counts.print()

    //5.执行流计算任务
    env.execute("Stream WordCount")
  }
}
ReducingState
class WordCountReduceStateMapFunction extends RichMapFunction[(String,Int),(String,Int)]{
  var rs:ReducingState[Int]=_


  override def open(parameters: Configuration): Unit = {
    //1.创建对应状态描述符
    val rsd = new ReducingStateDescriptor[Int]("wordcountReducingStateDescriptor",
      new ReduceFunction[Int](){
        override def reduce(v1: Int, v2: Int): Int = v1+v2
      },createTypeInformation[Int])
    
    //2.获取RuntimeContext
    var context: RuntimeContext = getRuntimeContext
    //3.获取指定类型状态
    rs=context.getReducingState(rsd)
  }

  override def map(value: (String, Int)): (String, Int) = {
    rs.add(value._2)
    //返回最新值
    (value._1,rs.get())
  }
}

object FlinkWordCountReduceState {
  def main(args: Array[String]): Unit = {
    //1.创建流计算执行环境
    val env = StreamExecutionEnvironment.getExecutionEnvironment

    //2.创建DataStream - 细化
    val text = env.socketTextStream("CentOS", 9999)

    //3.执行DataStream的转换算子
    val counts = text.flatMap(line=>line.split("\\s+"))
      .map(word=>(word,1))
      .keyBy(0)
      .map(new WordCountReduceStateMapFunction)

    //4.将计算的结果在控制打印
    counts.print()

    //5.执行流计算任务
    env.execute("Stream WordCount")
  }
}
AggregatingState
class UserOrderAggregatingStateMapFunction extends RichMapFunction[(String,Double),(String,Double)]{
  var as:AggregatingState[Double,Double]=_


  override def open(parameters: Configuration): Unit = {
    //1.创建对应状态描述符
    val asd = new AggregatingStateDescriptor[Double,(Int,Double),Double]("userOrderAggregatingStateMapFunction",
      new AggregateFunction[Double,(Int,Double),Double](){
        override def createAccumulator(): (Int, Double) = (0,0.0)

        override def add(value: Double, accumulator: (Int, Double)): (Int, Double) = {
          (accumulator._1+1,accumulator._2+value)
        }

        override def getResult(accumulator: (Int, Double)): Double = {
          accumulator._2/accumulator._1
        }

        override def merge(a: (Int, Double), b: (Int, Double)): (Int, Double) = {
          (a._1+b._1,a._2+b._2)
        }
      },createTypeInformation[(Int,Double)])

    //2.获取RuntimeContext
    var context: RuntimeContext = getRuntimeContext
    //3.获取指定类型状态
    as=context.getAggregatingState(asd)
  }

  override def map(value: (String, Double)): (String, Double) = {
     as.add(value._2)
    //返回最新值
    (value._1,as.get())
  }
}
object FlinkUserOrderAggregatingState {
  def main(args: Array[String]): Unit = {
    //1.创建流计算执行环境
    val env = StreamExecutionEnvironment.getExecutionEnvironment

    //2.创建DataStream - 细化 001 zhangsan 1000
    val text = env.socketTextStream("CentOS", 9999)

    //3.执行DataStream的转换算子
    val counts = text.map(line=>line.split("\\s+"))
      .map(ts=>(ts(0)+":"+ts(1),ts(2).toDouble))
      .keyBy(0)
      .map(new UserOrderAggregatingStateMapFunction)

    //4.将计算的结果在控制打印
    counts.print()

    //5.执行流计算任务
    env.execute("Stream WordCount")
  }
}
FoldingState
class UserOrderAvgMapFunction extends RichMapFunction[(String,Double),(String,Double)]{
  var rs:ReducingState[Int]=_
  var fs:FoldingState[Double,Double]=_

  override def open(parameters: Configuration): Unit = {
    //1.创建对应状态描述符
    val rsd = new ReducingStateDescriptor[Int]("wordcountReducingStateDescriptor",
      new ReduceFunction[Int](){
        override def reduce(v1: Int, v2: Int): Int = v1+v2
      },createTypeInformation[Int])

    val fsd=new FoldingStateDescriptor[Double,Double]("foldstate",0,new FoldFunction[Double,Double](){
      override def fold(accumulator: Double, value: Double): Double = {
        accumulator+value
      }
    },createTypeInformation[Double])

    //2.获取RuntimeContext
    var context: RuntimeContext = getRuntimeContext
    //3.获取指定类型状态
    rs=context.getReducingState(rsd)
    fs=context.getFoldingState(fsd)
  }

  override def map(value: (String, Double)): (String, Double) = {
    rs.add(1)
    fs.add(value._2)
    //返回最新值
    (value._1,fs.get()/rs.get())
  }
}
object FlinkUserOrderFoldState {
  def main(args: Array[String]): Unit = {
    //1.创建流计算执行环境
    val env = StreamExecutionEnvironment.getExecutionEnvironment

    //2.创建DataStream - 细化 001 zhangsan 1000
    val text = env.socketTextStream("CentOS", 9999)

    //3.执行DataStream的转换算子
    val counts = text.map(line=>line.split("\\s+"))
      .map(ts=>(ts(0)+":"+ts(1),ts(2).toDouble))
      .keyBy(0)
      .map(new UserOrderAvgMapFunction)

    //4.将计算的结果在控制打印
    counts.print()

    //5.执行流计算任务
    env.execute("Stream WordCount")
  }
}

State Time-To-Live (TTL)

Flink supports assigning a time-to-live (TTL) to every kind of keyed state, which makes state entries expire after a configurable lifetime. The feature is disabled by default; once it is enabled, Flink removes expired state on a best-effort basis. TTL applies to single-value state as well as to collection-type state: for MapState and ListState, every element carries its own expiration time.

Basic usage
//1.创建对应状态描述符
val xsd = new XxxxStateDescriptor[Int]("wordcount", createTypeInformation[Int])

//设置TTL实效性
val stateTtlConfig = StateTtlConfig.newBuilder(Time.seconds(5)) //设置存活时间5s ①
      .setUpdateType(UpdateType.OnCreateAndWrite) //创建、修改重新更新时间         ②
      .setStateVisibility(StateVisibility.NeverReturnExpired) //永不返回过期数据  ③
      .build() 
  
//启用TTL特性
xsd.enableTimeToLive(stateTtlConfig) 

①: the state's time-to-live; this parameter is mandatory.

②: when the expiration timestamp is refreshed; the default is OnCreateAndWrite.

  - OnCreateAndWrite: the timestamp is refreshed only when the state is created or written.
  - OnReadAndWrite: the timestamp is refreshed on read access as well as on create/write.

③: the visibility of expired state; the default is NeverReturnExpired.

  • NeverReturnExpired: expired state is never returned.
  • ReturnExpiredIfNotCleanedUp: if Flink has not yet physically removed the expired entry, the expired value is still returned.

Note: once TTL is enabled, every stored state entry carries an extra 8 bytes (a Long) for the expiration timestamp. TTL currently only works with processing time (the wall clock of the compute node). Also, if a job was originally started without TTL and is later restarted from its checkpoint with TTL enabled (or vice versa), the state restore will fail with a compatibility error.
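
For example, here is a minimal sketch of the word-count ValueState from earlier with a 5-second TTL enabled on its descriptor (the class name and the concrete TTL settings are illustrative additions, not part of the original listing):

import org.apache.flink.api.common.functions.RichMapFunction
import org.apache.flink.api.common.state.{StateTtlConfig, ValueState, ValueStateDescriptor}
import org.apache.flink.api.common.time.Time
import org.apache.flink.configuration.Configuration
import org.apache.flink.streaming.api.scala._

class WordCountMapFunctionWithTtl extends RichMapFunction[(String, Int), (String, Int)] {
  var vs: ValueState[Int] = _

  override def open(parameters: Configuration): Unit = {
    val vsd = new ValueStateDescriptor[Int]("wordcount", createTypeInformation[Int])

    val ttlConfig = StateTtlConfig.newBuilder(Time.seconds(5))               // entries expire 5s after the last update
      .setUpdateType(StateTtlConfig.UpdateType.OnCreateAndWrite)              // refresh the timestamp on create/write
      .setStateVisibility(StateTtlConfig.StateVisibility.NeverReturnExpired)  // never hand back expired values
      .build()
    vsd.enableTimeToLive(ttlConfig)                                           // configure TTL before obtaining the handle

    vs = getRuntimeContext.getState(vsd)
  }

  override def map(value: (String, Int)): (String, Int) = {
    val historyData = vs.value()   // 0 when the key has no state yet or the entry has expired
    vs.update(historyData + value._2)
    (value._1, vs.value())
  }
}

The only change compared to the plain version is that enableTimeToLive is called on the descriptor before getState obtains the state handle.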

Cleanup Of Expired State

By default, Flink checks whether a state entry has expired only when that entry is read, and removes it at that point if it has. In a long-running job this means that state which has expired but is never read again simply stays in the backend and keeps occupying memory.

This means that by default if expired state is not read, it won't be removed, possibly leading to ever growing state. This might change in future releases. (This is how the behavior was documented before the 1.9.x releases.)

Since Flink 1.10, expired state can additionally be collected and removed periodically in the background, depending on the configured state backend. The automatic background cleanup can be switched off explicitly:

val stateTtlConfig = StateTtlConfig.newBuilder(Time.seconds(5)) //设置存活时间5s
  .setUpdateType(UpdateType.OnCreateAndWrite) //创建、修改重新更新时间
  .setStateVisibility(StateVisibility.NeverReturnExpired) //永不返回过期数据
  .disableCleanupInBackground()
  .build()

In earlier releases the user had to call cleanupInBackground explicitly to enable background cleanup; since Flink 1.10 this feature is enabled by default.

Cleanup in full snapshot

With the cleanup-in-full-snapshot strategy, expired entries are filtered out whenever a full state snapshot is taken, so when the job is started or recovered from that snapshot the expired data is no longer loaded. In other words, the local state size is only reduced when the job is restarted or recovered from such a snapshot.

val stateTtlConfig = StateTtlConfig.newBuilder(Time.seconds(5)) //设置存活时间5s
  .setUpdateType(UpdateType.OnCreateAndWrite) //创建、修改重新更新时间
  .setStateVisibility(StateVisibility.NeverReturnExpired) //永不返回过期数据
  .cleanupFullSnapshot()
  .build()

Drawback: the job has to be stopped and restarted from a snapshot periodically in order to actually release the memory.

Incremental cleanup

Another option is to trigger cleanup of some state entries incrementally. The trigger can be a callback from each state access or/and each record processing. If this cleanup strategy is active for certain state, the storage backend keeps a lazy global iterator for this state over all its entries. Every time incremental cleanup is triggered, the iterator is advanced. The traversed state entries are checked and expired ones are cleaned up.

With the incremental cleanup strategy, cleanup runs every time state is read or written (and, optionally, for every processed record). The state backend keeps a lazy global iterator over all entries of the state; each time the cleanup is triggered, the iterator advances over a batch of entries, checks them for expiration, and removes the ones that have expired.

//设置TTL实效性
val stateTtlConfig = StateTtlConfig.newBuilder(Time.seconds(5)) //设置存活时间5s
  .setUpdateType(UpdateType.OnCreateAndWrite) //创建、修改重新更新时间
  .setStateVisibility(StateVisibility.NeverReturnExpired) //永不返回过期数据
  .cleanupIncrementally(100,true)
  .build()
  • cleanupSize: the number of state entries checked each time the cleanup is triggered.
  • runCleanupForEveryRecord: whether to additionally trigger the check for every processed record; if false, the check only runs when state is accessed or modified.

The first one is number of checked state entries per each cleanup triggering. It is always triggered per each state access. The second parameter defines whether to trigger cleanup additionally per each record processing. The default background cleanup for heap backend checks 5 entries without cleanup per record processing.

Notes:

  • If there is no state access and no records are processed, expired state is still not removed and keeps being persisted.
  • The incremental check adds latency to record processing.
  • Incremental cleanup is currently only implemented for the heap state backend; with RocksDB this setting has no effect.
Cleanup during RocksDB compaction

If RocksDB is used as the state backend, a compaction filter can be added so that, while RocksDB compacts its data files, the entries are checked for expiration and expired entries are dropped.

RocksDB periodically runs asynchronous compactions to merge state updates and reduce storage. Flink compaction filter checks expiration timestamp of state entries with TTL and excludes expired values.

More details on RocksDB: https://rocksdb.org.cn/doc.html

val stateTtlConfig = StateTtlConfig.newBuilder(Time.seconds(5)) //设置存活时间5s
     .setUpdateType(UpdateType.OnCreateAndWrite) //创建、修改重新更新时间
     .setStateVisibility(StateVisibility.NeverReturnExpired) //永不返回过期数据
     .cleanupInRocksdbCompactFilter(1000)
     .build()
  • queryTimeAfterNumEntries: the number of entries RocksDB processes during compaction before the filter queries the current timestamp again and removes the entries that have expired.

Updating the timestamp more often can improve cleanup speed but it decreases compaction performance because it uses JNI call from native code. The default background cleanup for RocksDB backend queries the current timestamp each time 1000 entries have been processed.

Note

Before Flink 1.10 the RocksDB compaction filter feature was disabled by default and had to be enabled explicitly by adding the following to flink-conf.yaml:

state.backend.rocksdb.ttl.compaction.filter.enabled: true

This feature is disabled by default. It has to be firstly activated for the RocksDB backend by setting Flink configuration option state.backend.rocksdb.ttl.compaction.filter.enabled or by calling RocksDBStateBackend::enableTtlCompactionFilter if a custom RocksDB state backend is created for a job.

Checkpoint & Savepoint

Because Flink is a stateful stream-processing service, state management and fault tolerance are essential. To keep programs robust, Flink provides the checkpoint mechanism, which periodically persists the state of the computing nodes so that a job can be recovered after a failure. A checkpoint persists the state data to a remote file system (depending on the configured state backend), for example HDFS. Checkpoints are coordinated and initiated by the JobManager: the JobManager periodically injects barriers into the stream; when a downstream task receives a barrier it pre-commits its own state, acknowledges to the JobManager, and forwards the barrier to its own downstream tasks, and so on, so that every downstream task pre-commits its state when the barrier reaches it. Only after all tasks have pre-committed their state and the JobManager has collected all of their acknowledgements does the JobManager consider the checkpoint complete, at which point the data of the previous checkpoint is automatically discarded.

A Savepoint is a manually triggered checkpoint: it takes a snapshot of the program and writes it to the state backend, relying on the regular checkpointing mechanism. While a program runs, it is periodically snapshotted on the worker nodes to produce checkpoints; for recovery only the latest completed checkpoint is needed, and older checkpoints can be safely discarded as soon as a new one has completed.

Savepoints are similar to these periodic checkpoints except that they are triggered by the user and do not automatically expire when newer checkpoints complete. A savepoint can be created from the command line, or via the REST API when cancelling a job; a typical CLI sequence is sketched below.
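
For illustration only (the job id, paths, entry class and jar name are placeholders rather than values from this lesson), taking a savepoint, cancelling with a savepoint, and resuming from it looks roughly like this:

# Trigger a savepoint for a running job (target directory is optional if a default is configured)
bin/flink savepoint <jobId> hdfs:///flink-savepoints

# Cancel the job and take a savepoint in the same step
bin/flink cancel -s hdfs:///flink-savepoints <jobId>

# Resume the job from the savepoint it produced
bin/flink run -s <savepointPath> -c com.example.MainClass myjob.jar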

Checkpointing is disabled by default in Flink and has to be enabled by calling the following method:

env.enableCheckpointing(1000);

To control the details of checkpoint execution, Flink lets users customize a number of checkpointing behaviors:

  //间隔5s执行一次checkpoint 精准一次
  env.enableCheckpointing(5000,CheckpointingMode.EXACTLY_ONCE)
  //设置检查点超时 4s
  env.getCheckpointConfig.setCheckpointTimeout(4000)
  //开启本次检查点 与上一次完成的检查点时间间隔不得小于 2s 优先级高于 checkpoint interval
  env.getCheckpointConfig.setMinPauseBetweenCheckpoints(2000)
  //如果检查点失败,任务宣告退出 setFailOnCheckpointingErrors(true)
  env.getCheckpointConfig.setTolerableCheckpointFailureNumber(0)
  //设置如果任务取消,系统该如何处理检查点数据
  //RETAIN_ON_CANCELLATION:如果取消任务的时候,没有加--savepoint,系统会保留检查点数据
  //DELETE_ON_CANCELLATION:取消任务,自动是删除检查点(不建议使用)
  env.getCheckpointConfig.enableExternalizedCheckpoints(ExternalizedCheckpointCleanup.RETAIN_ON_CANCELLATION)

State Backend

Flink ships with several State Backend implementations; the state backend determines where state data (and checkpoint data) is stored. There are two ways to configure the state backend:

  • Per-job state backend, set directly on the execution environment:
val env = StreamExecutionEnvironment.getExecutionEnvironment()
env.setStateBackend(...)
  • Cluster-wide default state backend, configured in flink-conf.yaml:
#==============================================================================
# Fault tolerance and checkpointing
#==============================================================================
# The backend that will be used to store operator state checkpoints if
# checkpointing is enabled.
#
# Supported backends are 'jobmanager', 'filesystem', 'rocksdb', or the
# <class-name-of-factory>.
#

 state.backend: rocksdb
 
# Directory for checkpoints filesystem, when using any of the default bundled
# state backends.
#

 state.checkpoints.dir: hdfs:///flink-checkpoints
 
# Default target directory for savepoints, optional.
#
 state.savepoints.dir: hdfs:///flink-savepoints
 
# Flag to enable/disable incremental checkpoints for backends that
# support incremental checkpoints (like the RocksDB state backend).
#
 state.backend.incremental: false

Note

Because the state backend writes its data to HDFS, Flink must be able to connect to HDFS, so HADOOP_CLASSPATH has to be configured, for example in ~/.bashrc:

JAVA_HOME=/usr/java/latest
HADOOP_HOME=/usr/hadoop-2.9.2
PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
CLASSPATH=.
export JAVA_HOME
export CLASSPATH
export PATH
export HADOOP_HOME
export HADOOP_CLASSPATH=`hadoop classpath`
MemoryStateBackend(jobmanager)

The MemoryStateBackend keeps the internal state data in memory, on the Java heap of the TaskManager. When a checkpoint is taken, this backend snapshots the state and ships it as part of the checkpoint acknowledgement message to the JobManager, which stores it on its heap as well.

val env = StreamExecutionEnvironment.getExecutionEnvironment()
env.setStateBackend(new MemoryStateBackend(MAX_MEM_STATE_SIZE, true))

Limitations:

  • The size of each individual state is by default limited to 5 MB. This value can be increased in the constructor of the MemoryStateBackend.
  • Irrespective of the configured maximal state size, the state cannot be larger than the akka frame size (see Configuration).
  • The aggregate state must fit into the JobManager memory.

Use cases: 1) local deployments and debugging; 2) jobs that do not involve much state.

FsStateBackend(filesystem)

This state backend keeps the working state in the memory of the TaskManager (the compute node). When a checkpoint is executed, the TaskManager's in-memory state is written to a remote file system; only a very small amount of metadata is kept in the JobManager's memory.

val env = StreamExecutionEnvironment.getExecutionEnvironment()
env.setStateBackend(new FsStateBackend("hdfs:///flink-checkpoints",true))

Use cases: 1) jobs with very large state to manage; 2) any production environment.

RocksDBStateBackend(rocksdb)

This state backend keeps the working state in a local RocksDB database on the TaskManager (the compute node). When a checkpoint is executed, the TaskManager's local RocksDB database files are written to a remote file system; only a very small amount of metadata is kept in the JobManager's memory.

<dependency>
  <groupId>org.apache.flink</groupId>
  <artifactId>flink-statebackend-rocksdb_2.11</artifactId>
  <version>1.10.0</version>
</dependency>
//1.创建流计算执行环境
val env = StreamExecutionEnvironment.getExecutionEnvironment
env.setStateBackend(new RocksDBStateBackend("hdfs:///flink-rocksdb-checkpoints",true))

Limitations

  • As RocksDB’s JNI bridge API is based on byte[], the maximum supported size per key and per value is 2^31 bytes each. IMPORTANT: states that use merge operations in RocksDB (e.g. ListState) can silently accumulate value sizes > 2^31 bytes and will then fail on their next retrieval. This is currently a limitation of RocksDB JNI.

Use cases: 1) jobs with extremely large state to manage; 2) any production environment.

Note that the amount of state that you can keep is only limited by the amount of disk space available. This allows keeping very large state, compared to the FsStateBackend that keeps state in memory. This also means, however, that the maximum throughput that can be achieved will be lower with this state backend. All reads/writes from/to this backend have to go through de/serialization to retrieve/store the state objects, which is also more expensive than always working with the on-heap representation as the heap-based backends are doing.

Managed Operator State

State bound to a keyed stream is called keyed state; state used by operations on non-keyed streams is collectively called Operator State. To use operator state, a function has to implement either the generic CheckpointedFunction interface or the ListCheckpointed interface.

CheckpointedFunction

The CheckpointedFunction interface provides access to non-keyed state with different redistribution schemes. Implementations need to provide the following two methods:

public interface CheckpointedFunction {
	void snapshotState(FunctionSnapshotContext context) throws Exception;
	void initializeState(FunctionInitializationContext context) throws Exception;
}
  • snapshotState: called whenever a checkpoint is taken; this is typically where the data that must survive a failure is written into the state.
  • initializeState: called when the function is initialized for the first time, and also whenever the function is recovering from an earlier checkpoint; both state initialization and recovery logic belong here.

Whenever a checkpoint has to be performed, snapshotState() is called. The counterpart, initializeState(), is called every time the user-defined function is initialized, be that when the function is first initialized or be that when the function is actually recovering from an earlier checkpoint. Given this, initializeState() is not only the place where different types of state are initialized, but also where state recovery logic is included.

Currently, list-style managed state is supported for operator state. The state is expected to be a list of serializable objects that are independent of each other, so that they can be redistributed when the job recovers or is rescaled. Flink currently offers two redistribution schemes for operator state:

  • Even-split redistribution - every operator instance keeps a list of state elements; logically, the operator state is the concatenation of the lists of all parallel instances. On restore/redistribution the combined list is split evenly across the new parallelism. For example, if the checkpointed state of an operator with parallelism 1 contains the elements element1 and element2, then after increasing the parallelism to 2, element1 may end up in operator instance 0 while element2 goes to operator instance 1.

  • Union redistribution - every operator instance keeps a list of state elements as above, but on restore/redistribution each operator instance receives the complete list of all state elements.

class UserDefineBufferSinkEvenSplit(threshold: Int = 0) extends SinkFunction[(String, Int)] with CheckpointedFunction{

  @transient
  private var checkpointedState: ListState[(String, Int)] = _

  private val bufferedElements = ListBuffer[(String, Int)]()

  //复写写出逻辑
  override def invoke(value: (String, Int), context: SinkFunction.Context[_]): Unit = {
     bufferedElements += value
     if(bufferedElements.size >= threshold){
       for(e <- bufferedElements){
         println("元素:"+e)
       }
       bufferedElements.clear()
     }
  }

  //需要将状态数据存储起来
  override def snapshotState(context: FunctionSnapshotContext): Unit = {
      checkpointedState.clear()
      checkpointedState.update(bufferedElements.asJava)//直接将状态数据存储起来
  }
  //初始化状态逻辑、状态恢复逻辑
  override def initializeState(context: FunctionInitializationContext): Unit = {
    //初始化状态、也有可能是故障恢复
    val lsd=new ListStateDescriptor[(String, Int)]("list-state",createTypeInformation[(String,Int)])
    checkpointedState = context.getOperatorStateStore.getListState(lsd) //默认均分方式恢复
                       //context.getOperatorStateStore.getUnionListState(lsd) //默认广播方式恢复
    if(context.isRestored){ //实现故障恢复逻辑
      bufferedElements.appendAll(checkpointedState.get().asScala.toList)
    }
  }
}
object FlinkWordCountValueStateCheckpoint {
  def main(args: Array[String]): Unit = {
    //1.创建流计算执行环境
    val env = StreamExecutionEnvironment.getExecutionEnvironment
    env.setStateBackend(new RocksDBStateBackend("hdfs:///flink-rocksdb-checkpoints",true))

    //间隔5s执行一次checkpoint 精准一次
    env.enableCheckpointing(5000,CheckpointingMode.EXACTLY_ONCE)
    //设置检查点超时 4s
    env.getCheckpointConfig.setCheckpointTimeout(4000)
    //开启本次检查点 与上一次完成的检查点时间间隔不得小于 2s 优先级高于 checkpoint interval
    env.getCheckpointConfig.setMinPauseBetweenCheckpoints(2000)
    //如果检查点失败,任务宣告退出 setFailOnCheckpointingErrors(true)
    env.getCheckpointConfig.setTolerableCheckpointFailureNumber(0)
    //设置如果任务取消,系统该如何处理检查点数据
    //RETAIN_ON_CANCELLATION:如果取消任务的时候,没有加--savepoint,系统会保留检查点数据
    //DELETE_ON_CANCELLATION:取消任务,自动是删除检查点(不建议使用)
    env.getCheckpointConfig.enableExternalizedCheckpoints(ExternalizedCheckpointCleanup.RETAIN_ON_CANCELLATION)

    //2.创建DataStream - 细化
    val text = env.socketTextStream("CentOS", 9999)

    //3.执行DataStream的转换算子
    val counts = text.flatMap(line=>line.split("\\s+"))
      .map(word=>(word,1))
      .keyBy(0)
      .map(new WordCountMapFunction)
      .uid("wc-map")

     //4.将计算的结果在控制打印
     counts.addSink(new UserDefineBufferSinkEvenSplit(3))
           .uid("buffer-sink")

    //5.执行流计算任务
    env.execute("Stream WordCount")
  }
}
class WordCountMapFunction extends RichMapFunction[(String,Int),(String,Int)]{
  var vs:ValueState[Int]=_

  override def open(parameters: Configuration): Unit = {
    //1.创建对应状态描述符
    val vsd = new ValueStateDescriptor[Int]("wordcount", createTypeInformation[Int])
    //2.获取RuntimeContext
    var context: RuntimeContext = getRuntimeContext
    //3.获取指定类型状态
    vs=context.getState(vsd)
  }

  override def map(value: (String, Int)): (String, Int) = {
    //获取历史值
    val historyData = vs.value()
    //更新状态
    vs.update(historyData+value._2)
    //返回最新值
    (value._1,vs.value())
  }
}
ListCheckpointed

The ListCheckpointed interface is a more restricted variant of CheckpointedFunction: it only supports list-style state with the even-split redistribution scheme.

public interface ListCheckpointed<T extends Serializable> {
	List<T> snapshotState(long checkpointId, long timestamp) throws Exception;
	void restoreState(List<T> state) throws Exception;
}
  • snapshotState: when a checkpoint is taken, simply return the list of objects that should be stored.
  • restoreState: receives the list of state objects to restore during recovery.

On snapshotState() the operator should return a list of objects to checkpoint and restoreState has to handle such a list upon recovery. If the state is not re-partitionable, you can always return a Collections.singletonList(MY_STATE) in the snapshotState().

object FlinkCounterSource {
  def main(args: Array[String]): Unit = {
    //1.创建流计算执行环境
    val env = StreamExecutionEnvironment.getExecutionEnvironment
    env.setStateBackend(new RocksDBStateBackend("hdfs:///flink-rocksdb-checkpoints",true))

    //间隔5s执行一次checkpoint 精准一次
    env.enableCheckpointing(5000,CheckpointingMode.EXACTLY_ONCE)
    //设置检查点超时 4s
    env.getCheckpointConfig.setCheckpointTimeout(4000)
    //开启本次检查点 与上一次完成的检查点时间间隔不得小于 2s 优先级高于 checkpoint interval
    env.getCheckpointConfig.setMinPauseBetweenCheckpoints(2000)
    //如果检查点失败,任务宣告退出 setFailOnCheckpointingErrors(true)
    env.getCheckpointConfig.setTolerableCheckpointFailureNumber(0)
    //设置如果任务取消,系统该如何处理检查点数据
    //RETAIN_ON_CANCELLATION:如果取消任务的时候,没有加--savepoint,系统会保留检查点数据
    //DELETE_ON_CANCELLATION:取消任务,自动是删除检查点(不建议使用)
    env.getCheckpointConfig.enableExternalizedCheckpoints(ExternalizedCheckpointCleanup.RETAIN_ON_CANCELLATION)

    val text = env.addSource(new UserDefineCounterSource)
      .uid("UserDefineCounterSource")

    text.print("offset")

    //5.执行流计算任务
    env.execute("Stream WordCount")
  }
}
class UserDefineCounterSource  extends RichParallelSourceFunction[Long] with ListCheckpointed[JLong]{
  @volatile
  private var isRunning = true
  private var offset = 0L

  //存储状态值
  override def snapshotState(checkpointId: Long, timestamp: Long): util.List[JLong] = {
    println("snapshotState:"+offset)
    Collections.singletonList(offset)//返回一个不可拆分集合
  }

  override def restoreState(state: util.List[JLong]): Unit = {
     println("restoreState:"+state.asScala)
     offset=state.asScala.head //取第一个元素
  }

  override def run(ctx: SourceFunction.SourceContext[Long]): Unit = {
    val lock = ctx.getCheckpointLock
    while (isRunning) {
      Thread.sleep(1000)
      lock.synchronized({
        ctx.collect(offset) //往下游输出当前offset
        offset += 1
      })
    }
  }

  override def cancel(): Unit = isRunning=false
}