Flink Stateful Computation
1. Overview
https://ci.apache.org/projects/flink/flink-docs-release-1.10/dev/stream/state/state.html
Flink is a stream-processing service built on stateful computation.

Flink divides all state into two broad categories: keyed state and operator state. Keyed state is state that Flink binds, per key, to operations on a KeyedStream. Operator state is state used in non-keyed streams; every piece of operator state is bound to a particular operator. Whether keyed or operator state, Flink manages the underlying storage in one of two forms: Managed State and Raw State.

- Managed State: Flink controls the state's storage structure, i.e. its data structure, data types, and so on. Because Flink manages the state itself, it can optimize memory usage for it and recover it after failures.
- Raw State: Flink knows nothing about the state's content or structure; it sees only an array of binary bytes, and the user must implement serialization and deserialization. Flink therefore cannot apply memory optimizations to Raw State and does not support restoring it after failures, so Raw State is almost never used in real-world Flink projects.
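All of the examples that follow use keyed state. To make operator state concrete as well, here is a minimal sketch modeled on the BufferingSink example from the documentation page linked above: a sink buffers records in memory and implements the CheckpointedFunction interface so that the buffer is kept in operator list state across failures (the class name and the threshold parameter follow that documentation example):

```scala
package com.baizhi.state

import org.apache.flink.api.common.state.{ListState, ListStateDescriptor}
import org.apache.flink.runtime.state.{FunctionInitializationContext, FunctionSnapshotContext}
import org.apache.flink.streaming.api.checkpoint.CheckpointedFunction
import org.apache.flink.streaming.api.functions.sink.SinkFunction
import org.apache.flink.streaming.api.scala._

import scala.collection.JavaConverters._
import scala.collection.mutable.ListBuffer

// A sink that buffers records before emitting them; the buffer lives in
// operator state, so it survives failures and restarts.
class BufferingSink(threshold: Int) extends SinkFunction[(String, Int)] with CheckpointedFunction {

  @transient private var checkpointedState: ListState[(String, Int)] = _
  private val bufferedElements = ListBuffer[(String, Int)]()

  override def invoke(value: (String, Int), context: SinkFunction.Context[_]): Unit = {
    bufferedElements += value
    if (bufferedElements.size == threshold) {
      // deliver bufferedElements to the external system here, then clear the buffer
      bufferedElements.clear()
    }
  }

  // Called on every checkpoint: copy the in-memory buffer into operator state
  override def snapshotState(context: FunctionSnapshotContext): Unit = {
    checkpointedState.clear()
    bufferedElements.foreach(checkpointedState.add)
  }

  // Called at start-up or on restore: bind the state and repopulate the buffer
  override def initializeState(context: FunctionInitializationContext): Unit = {
    val descriptor = new ListStateDescriptor[(String, Int)](
      "buffered-elements", createTypeInformation[(String, Int)])
    checkpointedState = context.getOperatorStateStore.getListState(descriptor)
    if (context.isRestored) {
      checkpointedState.get().asScala.foreach(bufferedElements += _)
    }
  }
}
```

Because operator state is bound to an operator instance rather than to a key, each parallel instance of the sink snapshots its own buffer, and the list entries are redistributed among instances when the job is rescaled.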
2. Managed Keyed State
| Type | Purpose | Methods |
|---|---|---|
| ValueState<T> | Stores a single state value. | T value(), update(T), clear() |
| ListState<T> | Stores a list of state values. | add(T), addAll(List<T>), update(List<T>), Iterable<T> get(), clear() |
| MapState<UK, UV> | Stores a map of state entries. | put(UK, UV), putAll(Map<UK, UV>), get(UK), entries(), keys(), values(), clear() |
| ReducingState<T> | Stores a single state value; each added element is automatically combined with the previous state using a user-provided ReduceFunction. | add(T), T get(), clear() |
| AggregatingState<IN, OUT> | Stores a single state value; each added element is automatically combined with the previous state using a user-provided AggregateFunction. Unlike ReducingState, the input and output types may differ. | add(IN), OUT get(), clear() |
| FoldingState<T, ACC> | Stores a single state value; each added element is automatically combined with the previous state using a user-provided FoldFunction. Unlike ReducingState, the input and intermediate result types may differ. (Deprecated since Flink 1.4; prefer AggregatingState.) | add(T), T get(), clear() |
①.ValueState
```scala
package com.baizhi.state

import org.apache.flink.api.common.functions.RichMapFunction
import org.apache.flink.api.common.state.{ValueState, ValueStateDescriptor}
import org.apache.flink.configuration.Configuration
import org.apache.flink.streaming.api.scala._

// The object is named ValueStateDemo so it does not clash with the imported ValueState type.
object ValueStateDemo {
  def main(args: Array[String]): Unit = {
    //1. Obtain the stream execution environment
    val env = StreamExecutionEnvironment.getExecutionEnvironment
    //2. Set the parallelism
    env.setParallelism(4)
    //3. Acquire the input source
    val in = env.socketTextStream("hbase", 9999)
    //4. Process the stream
    in.flatMap(_.split("\\s+"))
      .map(x => (x, 1))
      .keyBy(0)
      .map(new ValueMapFunction)
      .print()
    //5. Execute
    env.execute("word count")
  }
}

class ValueMapFunction extends RichMapFunction[(String, Int), (String, Int)] {
  // Handle to the per-key state
  var vs: ValueState[Int] = _

  override def open(parameters: Configuration): Unit = {
    // Create a state descriptor
    val valueStateDescriptor = new ValueStateDescriptor[Int]("valueWordCount", createTypeInformation[Int])
    // Obtain the runtime context
    val context = getRuntimeContext
    // Retrieve the state from the context
    vs = context.getState(valueStateDescriptor)
  }

  override def map(in: (String, Int)): (String, Int) = {
    // Add the incoming count to the stored count
    vs.update(in._2 + vs.value())
    (in._1, vs.value())
  }
}
```
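With the parallelism set to 4, print() prefixes each record with the printing subtask's index. Feeding words into the socket accumulates a count per key: sending `this is this`, for example, prints (this,1), (is,1) and then (this,2), because each key's running total lives in its own ValueState.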
②.ListState
```scala
package com.baizhi.state

import org.apache.flink.api.common.functions.RichMapFunction
import org.apache.flink.api.common.state.{ListState, ListStateDescriptor}
import org.apache.flink.configuration.Configuration
import org.apache.flink.streaming.api.scala._

import scala.collection.JavaConverters._

// Named ListStateDemo to avoid clashing with the imported ListState type.
object ListStateDemo {
  def main(args: Array[String]): Unit = {
    //1. Obtain the stream execution environment
    val env = StreamExecutionEnvironment.getExecutionEnvironment
    //2. Set the parallelism
    env.setParallelism(4)
    //3. Acquire the input source
    val in = env.socketTextStream("hbase", 9999)
    //4. Process the stream; sample input: "01 zs apple"
    in.map(_.split("\\s+"))
      .map(t => (t(0) + ":" + t(1), t(2)))
      .keyBy(0)
      .map(new ListMapFunction)
      .print()
    //5. Execute
    env.execute("word count")
  }
}

class ListMapFunction extends RichMapFunction[(String, String), (String, String)] {
  // Handle to the per-key state
  var vs: ListState[String] = _

  override def open(parameters: Configuration): Unit = {
    // Create a state descriptor
    val listDescriptor = new ListStateDescriptor[String]("ListWordCount", createTypeInformation[String])
    // Obtain the runtime context
    val context = getRuntimeContext
    // Retrieve the state from the context
    vs = context.getListState(listDescriptor)
  }

  override def map(in: (String, String)): (String, String) = {
    // Read the current state
    val list = vs.get().asScala.toList
    // Prepend the new element and de-duplicate
    val distinct = (in._2 :: list).distinct
    vs.update(distinct.asJava)
    // Return the key with the accumulated, de-duplicated list
    (in._1, distinct.mkString("|"))
  }
}
```
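The key here is `id:name` and the state holds a de-duplicated list of the items seen for that key: sending `01 zs apple` and then `01 zs pear` prints (01:zs,apple) and (01:zs,pear|apple); repeating `01 zs apple` prints (01:zs,apple|pear), since distinct keeps only the freshly prepended occurrence.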
③.MapState
```scala
package com.baizhi.state

import org.apache.flink.api.common.functions.RichMapFunction
import org.apache.flink.api.common.state.{MapState, MapStateDescriptor}
import org.apache.flink.configuration.Configuration
import org.apache.flink.streaming.api.scala._

import scala.collection.JavaConverters._

// Named MapStateDemo to avoid clashing with the imported MapState type.
object MapStateDemo {
  def main(args: Array[String]): Unit = {
    //1. Obtain the stream execution environment
    val env = StreamExecutionEnvironment.getExecutionEnvironment
    //2. Set the parallelism
    env.setParallelism(4)
    //3. Acquire the input source
    val in = env.socketTextStream("hbase", 9999)
    //4. Process the stream; sample input: "01 zs apple"
    in.map(_.split("\\s+"))
      .map(t => (t(0) + ":" + t(1), t(2)))
      .keyBy(0)
      .map(new MapMapFunction)
      .print()
    //5. Execute
    env.execute("word count")
  }
}

class MapMapFunction extends RichMapFunction[(String, String), (String, String)] {
  // Handle to the per-key state
  var state: MapState[String, Int] = _

  override def open(parameters: Configuration): Unit = {
    // Create a state descriptor
    val mapStateDescriptor = new MapStateDescriptor[String, Int]("MapWordCount", createTypeInformation[String], createTypeInformation[Int])
    // Obtain the runtime context
    val context = getRuntimeContext
    // Retrieve the state from the context
    state = context.getMapState(mapStateDescriptor)
  }

  override def map(in: (String, String)): (String, String) = {
    // Default the count to 0
    var count = 0
    // If the key already exists in state, read its current count
    if (state.contains(in._2)) {
      count = state.get(in._2)
    }
    // Update the state
    state.put(in._2, count + 1)
    // Snapshot the current key-value entries
    val list = state.entries().asScala.map(entry => (entry.getKey, entry.getValue)).toList
    (in._1, list.mkString("|"))
  }
}
```
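Instead of de-duplicating, this version counts how often each item occurs per key: sending `01 zs apple` twice and `01 zs pear` once ends with (01:zs,(apple,2)|(pear,1)) for the last record, though the iteration order of entries() is not guaranteed.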
④.ReducingState
```scala
package com.baizhi.state

import org.apache.flink.api.common.functions.{ReduceFunction, RichMapFunction}
import org.apache.flink.api.common.state.StateTtlConfig.{StateVisibility, UpdateType}
import org.apache.flink.api.common.state.{ReducingState, ReducingStateDescriptor, StateTtlConfig}
import org.apache.flink.api.common.time.Time
import org.apache.flink.configuration.Configuration
import org.apache.flink.streaming.api.scala._

object ReduceState {
  def main(args: Array[String]): Unit = {
    //1. Obtain the stream execution environment
    val env = StreamExecutionEnvironment.getExecutionEnvironment
    //2. Set the parallelism
    env.setParallelism(4)
    //3. Acquire the input source
    val in = env.socketTextStream("hbase", 9999)
    //4. Process the stream
    in.flatMap(_.split("\\s+"))
      .map(t => (t, 1))
      .keyBy(0)
      .map(new ReduceMapFunction)
      .print()
    //5. Execute
    env.execute("word count")
  }
}

class ReduceMapFunction extends RichMapFunction[(String, Int), (String, Int)] {
  // Handle to the per-key state
  var state: ReducingState[Int] = _

  override def open(parameters: Configuration): Unit = {
    // Create a state descriptor with a ReduceFunction that sums the values
    val reducingStateDescriptor = new ReducingStateDescriptor[Int]("reducingWordCount", new ReduceFunction[Int] {
      override def reduce(t: Int, t1: Int): Int = {
        t + t1
      }
    }, createTypeInformation[Int])
    // Configure state expiration: a 5-second time-to-live
    val stateTtlConfig = StateTtlConfig.newBuilder(Time.seconds(5))
      .setUpdateType(UpdateType.OnCreateAndWrite)             // reset the TTL on creation and on every write
      .setStateVisibility(StateVisibility.NeverReturnExpired) // never return expired values
      .build()
    // Enable the TTL configuration on the descriptor
    reducingStateDescriptor.enableTimeToLive(stateTtlConfig)
    // Obtain the runtime context
    val context = getRuntimeContext
    // Retrieve the state from the context
    state = context.getReducingState(reducingStateDescriptor)
  }

  override def map(in: (String, Int)): (String, Int) = {
    // Merge the incoming value into the state
    state.add(in._2)
    (in._1, state.get())
  }
}
```
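The TTL configuration gives every key's count a five-second lifetime: UpdateType.OnCreateAndWrite restarts the clock each time the state is created or written, and StateVisibility.NeverReturnExpired ensures an expired value is never returned even before it has been physically cleaned up. A word that is not seen for five seconds therefore starts counting from scratch. Note that StateTtlConfig currently supports only processing-time TTLs.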
⑤.AggregatingState
```scala
package com.baizhi.state

import org.apache.flink.api.common.functions.{AggregateFunction, ReduceFunction, RichMapFunction}
import org.apache.flink.api.common.state.{AggregatingState, AggregatingStateDescriptor, ListState, ListStateDescriptor, MapState, MapStateDescriptor, ReducingState, ReducingStateDescriptor, ValueState, ValueStateDescriptor}
import org.
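```

The snippet above is cut off in the source mid-import. A minimal, self-contained sketch of an AggregatingState example in the same style, assuming a per-key running average over inputs like `01 zs 1000` (the names AggregatingStateDemo and AvgMapFunction are illustrative, not from the source):

```scala
package com.baizhi.state

import org.apache.flink.api.common.functions.{AggregateFunction, RichMapFunction}
import org.apache.flink.api.common.state.{AggregatingState, AggregatingStateDescriptor}
import org.apache.flink.configuration.Configuration
import org.apache.flink.streaming.api.scala._

object AggregatingStateDemo {
  def main(args: Array[String]): Unit = {
    //1. Obtain the stream execution environment
    val env = StreamExecutionEnvironment.getExecutionEnvironment
    //2. Set the parallelism
    env.setParallelism(4)
    //3. Acquire the input source
    val in = env.socketTextStream("hbase", 9999)
    //4. Process the stream; sample input: "01 zs 1000"
    in.map(_.split("\\s+"))
      .map(t => (t(0) + ":" + t(1), t(2).toDouble))
      .keyBy(0)
      .map(new AvgMapFunction)
      .print()
    //5. Execute
    env.execute("running average")
  }
}

class AvgMapFunction extends RichMapFunction[(String, Double), (String, Double)] {
  // Handle to the per-key state: input Double, output Double, accumulator (sum, count)
  var state: AggregatingState[Double, Double] = _

  override def open(parameters: Configuration): Unit = {
    // Create a state descriptor with an AggregateFunction that tracks sum and count
    val descriptor = new AggregatingStateDescriptor[Double, (Double, Int), Double](
      "runningAvg",
      new AggregateFunction[Double, (Double, Int), Double] {
        override def createAccumulator(): (Double, Int) = (0.0, 0)
        override def add(value: Double, acc: (Double, Int)): (Double, Int) = (acc._1 + value, acc._2 + 1)
        override def getResult(acc: (Double, Int)): Double = acc._1 / acc._2
        override def merge(a: (Double, Int), b: (Double, Int)): (Double, Int) = (a._1 + b._1, a._2 + b._2)
      },
      createTypeInformation[(Double, Int)])
    // Retrieve the state from the runtime context
    state = getRuntimeContext.getAggregatingState(descriptor)
  }

  override def map(in: (String, Double)): (String, Double) = {
    // Merge the new value into the aggregate and emit the current average
    state.add(in._2)
    (in._1, state.get())
  }
}
```

Unlike ReducingState, the accumulator type (Double, Int) differs from both the input and the output type, which is exactly the flexibility the table above attributes to AggregatingState.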