Flink Stateful Computation

This article walks through stateful computation in Flink: the various types of Managed Keyed State, state time-to-live (TTL) and its cleanup strategies, and the checkpointing mechanism. It also covers the characteristics and use cases of the different state backends (MemoryStateBackend, FsStateBackend, and RocksDBStateBackend), as well as the implementation and configuration of Managed Operator State, broadcast state, and queryable state.


1. Overview

https://ci.apache.org/projects/flink/flink-docs-release-1.10/dev/stream/state/state.html

Flink is a stream-computing service built on stateful computation. Flink divides all state into two broad categories: keyed state and operator state. Keyed state is state that Flink binds, per key, to one or more state values; it refers specifically to state used in operations on a KeyedStream. Operator state is state used in non-keyed streams; every piece of operator state is bound to a particular operator. For both keyed state and operator state, Flink manages the underlying storage in one of two forms: Managed State and Raw State.

  • Managed State - state whose storage layout (data structures, data types, and so on) is controlled by Flink. Because Flink manages the state itself, it can better optimize memory usage and recover the state after failures.
  • Raw State - state whose content and structure Flink knows nothing about; Flink sees only an array of bytes, and the user must implement serialization and deserialization. Consequently, Flink can neither apply targeted memory optimizations to Raw State nor restore it after a failure, so Raw State is almost never used in real-world Flink projects.

2. Managed Keyed State

| Type | Purpose | Methods |
|---|---|---|
| `ValueState<T>` | Stores a single state value. | `T value()`, `update(T)`, `clear()` |
| `ListState<T>` | Stores a list of state values. | `add(T)`, `addAll(List<T>)`, `update(List<T>)`, `Iterable<T> get()`, `clear()` |
| `MapState<UK, UV>` | Stores a map of key/value pairs. | `put(UK, UV)`, `putAll(Map<UK, UV>)`, `get(UK)`, `entries()`, `keys()`, `values()`, `clear()` |
| `ReducingState<T>` | Stores a single value. Each added element is automatically combined with the existing state using a user-supplied `ReduceFunction`. | `add(T)`, `T get()`, `clear()` |
| `AggregatingState<IN, OUT>` | Stores a single value. Each added element is automatically combined with the existing state using a user-supplied `AggregateFunction`. Unlike `ReducingState`, the input and output types may differ. | `add(IN)`, `OUT get()`, `clear()` |
| `FoldingState<T, ACC>` | Stores a single value. Each added element is automatically combined with the existing state using a user-supplied `FoldFunction`. Unlike `ReducingState`, the input and intermediate (accumulator) types may differ. Deprecated since Flink 1.4 in favor of `AggregatingState`. | `add(T)`, `T get()`, `clear()` |
① ValueState
package com.baizhi.state

import org.apache.flink.api.common.functions.RichMapFunction
import org.apache.flink.api.common.state.{ValueState, ValueStateDescriptor}
import org.apache.flink.configuration.Configuration
import org.apache.flink.streaming.api.scala._

object ValueState {

  def main(args: Array[String]): Unit = {

    //1. Get the stream execution environment
    val env = StreamExecutionEnvironment.getExecutionEnvironment
    //2. Set the parallelism
    env.setParallelism(4)
    //3. Get the input source
    val in = env.socketTextStream("hbase", 9999)
    //4. Process the stream
    in.flatMap(_.split("\\s+"))
      .map(x => (x, 1))
      .keyBy(0)
      .map(new ValueMapFunction)
      .print()

    //5. Execute
    env.execute("word count")
  }
}

class ValueMapFunction extends RichMapFunction[(String, Int), (String, Int)] {

  // Holds the state handle
  var vs: ValueState[Int] = _

  override def open(parameters: Configuration): Unit = {
    // Create a state descriptor
    val valueStateDescriptor = new ValueStateDescriptor[Int]("valueWordCount", createTypeInformation[Int])
    // Get the runtime context
    val context = getRuntimeContext
    // Obtain the state handle from the context
    vs = context.getState(valueStateDescriptor)
  }

  override def map(in: (String, Int)): (String, Int) = {
    // Add the incoming count to the stored count
    vs.update(in._2 + vs.value())
    (in._1, vs.value())
  }
}
② ListState
package com.baizhi.state

import org.apache.flink.api.common.functions.RichMapFunction
import org.apache.flink.api.common.state.{ListState, ListStateDescriptor}
import org.apache.flink.configuration.Configuration
import org.apache.flink.streaming.api.scala._
import scala.collection.JavaConverters._

object ListState {

  def main(args: Array[String]): Unit = {

    //1. Get the stream execution environment
    val env = StreamExecutionEnvironment.getExecutionEnvironment
    //2. Set the parallelism
    env.setParallelism(4)
    //3. Get the input source
    val in = env.socketTextStream("hbase", 9999)
    //4. Process the stream; input lines look like: 01 zs apple
    in.map(_.split("\\s+"))
      .map(t => (t(0) + ":" + t(1), t(2)))
      .keyBy(0)
      .map(new ListMapFunction)
      .print()

    //5. Execute
    env.execute("word count")
  }
}

class ListMapFunction extends RichMapFunction[(String, String), (String, String)] {

  // Holds the state handle
  var vs: ListState[String] = _

  override def open(parameters: Configuration): Unit = {
    // Create a state descriptor
    val listDescriptor = new ListStateDescriptor[String]("ListWordCount", createTypeInformation[String])
    // Get the runtime context
    val context = getRuntimeContext
    // Obtain the state handle from the context
    vs = context.getListState(listDescriptor)
  }

  override def map(in: (String, String)): (String, String) = {
    // Read the current state
    val list = vs.get().asScala.toList
    // Prepend the new element and drop duplicates
    val distinct = (in._2 :: list).distinct
    vs.update(distinct.asJava)
    // Return the key and the deduplicated list
    (in._1, distinct.mkString("|"))
  }
}
③ MapState
package com.baizhi.state

import org.apache.flink.api.common.functions.RichMapFunction
import org.apache.flink.api.common.state.{MapState, MapStateDescriptor}
import org.apache.flink.configuration.Configuration
import org.apache.flink.streaming.api.scala._

import scala.collection.JavaConverters._

object MapState {

  def main(args: Array[String]): Unit = {

    //1. Get the stream execution environment
    val env = StreamExecutionEnvironment.getExecutionEnvironment
    //2. Set the parallelism
    env.setParallelism(4)
    //3. Get the input source
    val in = env.socketTextStream("hbase", 9999)
    //4. Process the stream; input lines look like: 01 zs apple
    in.map(_.split("\\s+"))
      .map(t => (t(0) + ":" + t(1), t(2)))
      .keyBy(0)
      .map(new MapMapFunction)
      .print()

    //5. Execute
    env.execute("word count")
  }
}

class MapMapFunction extends RichMapFunction[(String, String), (String, String)] {

  // Holds the state handle
  var state: MapState[String, Int] = _

  override def open(parameters: Configuration): Unit = {
    // Create a state descriptor
    val mapStateDescriptor = new MapStateDescriptor[String, Int]("MapWordCount", createTypeInformation[String], createTypeInformation[Int])
    // Get the runtime context
    val context = getRuntimeContext
    // Obtain the state handle from the context
    state = context.getMapState(mapStateDescriptor)
  }

  override def map(in: (String, String)): (String, String) = {
    // Start from 0 if this key has not been seen yet
    var count = 0
    if (state.contains(in._2)) {
      // Key already present: read the previous count
      count = state.get(in._2)
    }
    // Update the state
    state.put(in._2, count + 1)
    // Collect the current entries (a key-value collection)
    val list = state.entries().asScala.map(entry => (entry.getKey, entry.getValue)).toList
    (in._1, list.mkString("|"))
  }
}
④ ReducingState
package com.baizhi.state

import org.apache.flink.api.common.functions.{ReduceFunction, RichMapFunction}
import org.apache.flink.api.common.state.StateTtlConfig.{StateVisibility, UpdateType}
import org.apache.flink.api.common.state.{ReducingState, ReducingStateDescriptor, StateTtlConfig}
import org.apache.flink.api.common.time.Time
import org.apache.flink.configuration.Configuration
import org.apache.flink.streaming.api.scala._

object ReduceState {

  def main(args: Array[String]): Unit = {

    //1. Get the stream execution environment
    val env = StreamExecutionEnvironment.getExecutionEnvironment
    //2. Set the parallelism
    env.setParallelism(4)
    //3. Get the input source
    val in = env.socketTextStream("hbase", 9999)
    //4. Process the stream
    in.flatMap(_.split("\\s+"))
      .map(t => (t, 1))
      .keyBy(0)
      .map(new ReduceMapFunction)
      .print()

    //5. Execute
    env.execute("word count")
  }
}

class ReduceMapFunction extends RichMapFunction[(String, Int), (String, Int)] {

  // Holds the state handle
  var state: ReducingState[Int] = _

  override def open(parameters: Configuration): Unit = {
    // Create a state descriptor; the ReduceFunction sums each new element into the state
    val reducingStateDescriptor = new ReducingStateDescriptor[Int]("reducingWordCount", new ReduceFunction[Int] {
      override def reduce(t: Int, t1: Int): Int = t + t1
    }, createTypeInformation[Int])

    // Create a TTL configuration for the state
    val stateTtlConfig = StateTtlConfig.newBuilder(Time.seconds(5)) // state expires 5 seconds after the last relevant update
      .setUpdateType(UpdateType.OnCreateAndWrite) // the TTL timer is reset on creation and on every write
      .setStateVisibility(StateVisibility.NeverReturnExpired) // expired values are never returned
      .build()
    // Enable TTL on the descriptor
    reducingStateDescriptor.enableTimeToLive(stateTtlConfig)

    // Get the runtime context
    val context = getRuntimeContext
    // Obtain the state handle from the context
    state = context.getReducingState(reducingStateDescriptor)
  }

  override def map(in: (String, Int)): (String, Int) = {
    // Merge the new count into the state
    state.add(in._2)
    (in._1, state.get())
  }
}
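The TTL configuration in the ReducingState example above only defines when state expires; separate cleanup strategies control when expired entries are physically removed. By default, expired entries are removed lazily, when they are next read. A sketch of the additional cleanup strategies exposed by the `StateTtlConfig` builder in Flink 1.10 (the numeric parameters are illustrative values, not recommendations):

```scala
import org.apache.flink.api.common.state.StateTtlConfig
import org.apache.flink.api.common.time.Time

val ttlConfig = StateTtlConfig.newBuilder(Time.seconds(5))
  // Drop expired entries when a full snapshot (savepoint, or full checkpoint)
  // is taken; this shrinks the snapshot, not the live state on the TaskManager.
  .cleanupFullSnapshot()
  // For heap state backends: on each state access, also check 10 more entries
  // for expiration (pass true to additionally run cleanup per processed record).
  .cleanupIncrementally(10, false)
  // For RocksDB: filter out expired entries during compaction, refreshing the
  // current timestamp once per 1000 processed entries.
  .cleanupInRocksdbCompactFilter(1000)
  .build()
```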
⑤ AggregatingState
package com.baizhi.state

import org.apache.flink.api.common.functions.{AggregateFunction, RichMapFunction}
import org.apache.flink.api.common.state.{AggregatingState, AggregatingStateDescriptor}
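The source example breaks off after its import block. Below is a minimal, self-contained sketch of how such an example could continue, following the same socket word-stream pattern as the previous sections: an `AggregateFunction` keeps a `(sum, count)` accumulator per key and emits a running average, illustrating that `AggregatingState` (unlike `ReducingState`) can have different input and output types. Class names such as `AvgAggregateFunction` and `AggMapFunction` are placeholders, not from the original.

```scala
import org.apache.flink.api.common.functions.{AggregateFunction, RichMapFunction}
import org.apache.flink.api.common.state.{AggregatingState, AggregatingStateDescriptor}
import org.apache.flink.configuration.Configuration
import org.apache.flink.streaming.api.scala._

// Input: (word, count); accumulator: (sum, number of elements); output: average
class AvgAggregateFunction extends AggregateFunction[(String, Int), (Int, Int), Double] {
  override def createAccumulator(): (Int, Int) = (0, 0)
  override def add(value: (String, Int), acc: (Int, Int)): (Int, Int) =
    (acc._1 + value._2, acc._2 + 1)
  override def getResult(acc: (Int, Int)): Double =
    if (acc._2 == 0) 0.0 else acc._1.toDouble / acc._2
  override def merge(a: (Int, Int), b: (Int, Int)): (Int, Int) =
    (a._1 + b._1, a._2 + b._2)
}

class AggMapFunction extends RichMapFunction[(String, Int), (String, Double)] {

  // Holds the state handle
  var state: AggregatingState[(String, Int), Double] = _

  override def open(parameters: Configuration): Unit = {
    // The descriptor declares the accumulator type, not the input/output types
    val descriptor = new AggregatingStateDescriptor[(String, Int), (Int, Int), Double](
      "aggregatingWordCount", new AvgAggregateFunction, createTypeInformation[(Int, Int)])
    state = getRuntimeContext.getAggregatingState(descriptor)
  }

  override def map(in: (String, Int)): (String, Double) = {
    state.add(in)        // folds the element into the accumulator
    (in._1, state.get()) // returns the current average for this key
  }
}

object AggregatingStateJob {
  def main(args: Array[String]): Unit = {
    val env = StreamExecutionEnvironment.getExecutionEnvironment
    env.setParallelism(4)
    env.socketTextStream("hbase", 9999)
      .flatMap(_.split("\\s+"))
      .map(t => (t, 1))
      .keyBy(0)
      .map(new AggMapFunction)
      .print()
    env.execute("aggregating state")
  }
}
```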