Flink State and Examples
What is state?
Most streaming applications are stateful: many operators continuously read and update state as they process records.
The WordCount program, for example, has to read the current count, compute, and then write the updated count back. Put simply, state saves the result of the previous computation so the next computation can use it.
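To make "saving the previous result for the next computation" concrete, here is a plain-Scala sketch (no Flink involved, names are illustrative) where the running count map plays the role of the state that a stateful WordCount operator threads through every record:

```scala
object WordCountStateSketch {
  // foldLeft threads the "state" (the count map) through every element,
  // just as a stateful operator reads and updates its state per record
  def countWords(words: Seq[String]): Map[String, Int] =
    words.foldLeft(Map.empty[String, Int]) { (state, w) =>
      state.updated(w, state.getOrElse(w, 0) + 1)
    }

  def main(args: Array[String]): Unit =
    println(countWords(Seq("flink", "state", "flink")))
}
```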
Kinds of state
Operator State
- BroadcastState: operator state used for broadcast streams.
- ListState: represents the state as a list of entries.
- UnionListState: also stores a list; it differs from ListState in how state is redistributed. When the parallelism changes, ListState pools the state instances of all parallel subtasks of the operator and splits them evenly across the new tasks, whereas UnionListState hands every new task the full pooled list and leaves the actual split to the user.
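The two redistribution schemes can be sketched in plain Scala (a hypothetical model, not Flink's actual implementation): even-split hands each new task a share of the pooled entries, while union hands each task everything.

```scala
object RedistributionSketch {
  // ListState on rescale: pool all entries, deal them out round-robin
  def evenSplit[T](entries: List[T], newParallelism: Int): Vector[List[T]] = {
    val buckets = Vector.fill(newParallelism)(List.newBuilder[T])
    entries.zipWithIndex.foreach { case (e, i) => buckets(i % newParallelism) += e }
    buckets.map(_.result())
  }

  // UnionListState on rescale: every new task sees the full pooled list
  // and decides itself which entries to keep
  def union[T](entries: List[T], newParallelism: Int): Vector[List[T]] =
    Vector.fill(newParallelism)(entries)
}
```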
Keyed State
Maintained and accessed per key, where the key is defined on the input stream (via keyBy).
- ValueState[T]: holds a single value of type T.
  get: ValueState.value()
  set: ValueState.update(value: T)
- ListState[T]: holds a list whose elements are of type T. Basic operations:
  ListState.add(value: T)
  ListState.addAll(values: java.util.List[T])
  ListState.get() returns Iterable[T]
  ListState.update(values: java.util.List[T])
- MapState[K, V]: holds key-value pairs.
  MapState.get(key: K)
  MapState.put(key: K, value: V)
  MapState.contains(key: K)
  MapState.remove(key: K)
- ReducingState[T]: stores the result of a ReduceFunction. Its interface is the same as ListState: elements are added with add(T), but add(T) actually folds the new element into the aggregate using the specified ReduceFunction.
- AggregatingState[I, O]: holds the aggregate of all values added to the state. Unlike ReducingState, the aggregate type may differ from the type of the added elements. The interface is again the same as ListState, but elements added with add(IN) are aggregated through the specified AggregateFunction.
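The point of AggregatingState[I, O] — input and output types may differ — can be illustrated with a plain-Scala analogue (names are illustrative; this is not the Flink API): Double readings go in, their average comes out, and the accumulator plays the role of the ACC type of an AggregateFunction.

```scala
object AggregatingSketch {
  // accumulator (sum, count), like the ACC type of an AggregateFunction
  def addAll(values: Seq[Double]): (Double, Long) =
    values.foldLeft((0.0, 0L)) { case ((sum, cnt), v) => (sum + v, cnt + 1) }

  // output type Double differs from the accumulator type (Double, Long)
  def getResult(acc: (Double, Long)): Double =
    if (acc._2 == 0) 0.0 else acc._1 / acc._2
}
```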
To use keyed state, you must first create a StateDescriptor:
// state that stores the previous temperature reading
lazy val last_Temp: ValueState[Double] = getRuntimeContext.getState(new ValueStateDescriptor[Double]("last_temp", classOf[Double]))
Example: emit a record when two consecutive temperature readings differ by more than 10
package FlinkProject.runMain.status
import FlinkProject.utils.sensor
import org.apache.flink.api.common.functions.RichFlatMapFunction
import org.apache.flink.api.common.serialization.SimpleStringSchema
import org.apache.flink.api.common.state.{ValueState, ValueStateDescriptor}
import org.apache.flink.streaming.api.scala._
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer
import org.apache.flink.util.Collector
import java.util.Properties
object status01 {
  def main(args: Array[String]): Unit = {
    val env: StreamExecutionEnvironment = StreamExecutionEnvironment.getExecutionEnvironment
    val properties = new Properties()
    properties.setProperty("bootstrap.servers", "localhost:9092")
    properties.setProperty("group.id", "flink_group")
    val stream: DataStream[String] = env.addSource(new FlinkKafkaConsumer[String]("flink", new SimpleStringSchema(), properties))
    val result: DataStream[(String, Double, Double)] = stream.map((x: String) => {
      val data: Array[String] = x.split(",")
      sensor(data(0), data(1).toLong, data(2).toDouble)
    }).keyBy((_: sensor).id)
      .flatMap(new myRichFlatMapFunction(10D))
    result.print()
    env.execute()
  }
}
class myRichFlatMapFunction(threshold: Double) extends RichFlatMapFunction[sensor, (String, Double, Double)] {
  // state that stores the previous temperature reading
  lazy val last_Temp: ValueState[Double] = getRuntimeContext.getState(new ValueStateDescriptor[Double]("last_temp", classOf[Double]))
  override def flatMap(value: sensor, out: Collector[(String, Double, Double)]): Unit = {
    // previous temperature from state (before the first update this yields the
    // default 0.0 for Double, so a key's very first reading may trigger a spurious emit)
    val last_temp: Double = last_Temp.value()
    val diff: Double = (last_temp - value.temperature).abs
    if (diff > threshold) {
      out.collect((value.id, value.temperature, last_temp))
    }
    last_Temp.update(value.temperature)
  }
}
The Scala API also offers flatMapWithState, which manages a single piece of state for you:
import FlinkProject.utils.sensor
import org.apache.flink.api.common.functions.RichFlatMapFunction
import org.apache.flink.api.common.serialization.SimpleStringSchema
import org.apache.flink.api.common.state.{ValueState, ValueStateDescriptor}
import org.apache.flink.streaming.api.scala._
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer
import org.apache.flink.util.Collector
import java.util.Properties
object status01 {
  def main(args: Array[String]): Unit = {
    val env: StreamExecutionEnvironment = StreamExecutionEnvironment.getExecutionEnvironment
    val properties = new Properties()
    properties.setProperty("bootstrap.servers", "localhost:9092")
    properties.setProperty("group.id", "flink_group")
    val stream: DataStream[String] = env.addSource(new FlinkKafkaConsumer[String]("flink", new SimpleStringSchema(), properties))
    val result: DataStream[(String, Double, Double)] = stream.map((x: String) => {
      val data: Array[String] = x.split(",")
      sensor(data(0), data(1).toLong, data(2).toDouble)
    }).keyBy((_: sensor).id)
      .flatMapWithState[(String, Double, Double), Double] {
        // first record for this key: emit nothing, store its temperature
        case (data: sensor, None) => (List.empty, Some(data.temperature))
        // later records: compare against the stored temperature
        case (data: sensor, last_temp: Some[Double]) =>
          val diff: Double = (data.temperature - last_temp.get).abs
          if (diff >= 10) {
            (List((data.id, data.temperature, last_temp.get)), Some(data.temperature))
          }
          else {
            (List.empty, Some(data.temperature))
          }
      }
    result.print()
    env.execute()
  }
}
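The state-threading logic of flatMapWithState can be checked without a Flink cluster: the sketch below (illustrative names, single key only, ignoring keyBy) folds an Option[Double] — the last temperature — through a sequence of readings and collects an output tuple whenever the jump is at least 10, mirroring the two cases above.

```scala
object FlatMapWithStateSketch {
  // readings: (sensor id, temperature); returns the emitted (id, temp, lastTemp) tuples
  def run(readings: Seq[(String, Double)]): Vector[(String, Double, Double)] =
    readings.foldLeft((Option.empty[Double], Vector.empty[(String, Double, Double)])) {
      // no previous temperature yet: emit nothing, store this one
      case ((None, out), (_, temp)) => (Some(temp), out)
      // previous temperature known: emit a tuple when the jump is >= 10
      case ((Some(last), out), (id, temp)) =>
        if ((temp - last).abs >= 10) (Some(temp), out :+ ((id, temp, last)))
        else (Some(temp), out)
    }._2
}
```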