Flink Series, Part 20 - Advanced Concepts - State Management

linmoo1986

Published 2025-05-15 08:32:08


Article link: https://blog.csdn.net/linwu_2006_2006/article/details/147170575


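I previously worked on a data platform and used Flink for real-time data collection. Looking back, Flink shows up almost everywhere in data development platforms. Because I learned it while using it, my understanding was somewhat disorganized, so I am using this gap to organize what I know about Flink, both as a personal knowledge base and to share with others.

Note: usage differs between framework versions, so this series is based on Flink 1.19.x. Flink supports several languages; all code here is Java, running on JDK 19.
Code: https://github.com/forever1986/flink-study.git

"Part 1 - Introduction" mentioned that Flink is a stateful stream processing engine. This chapter covers Flink's state management.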

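1 Types of State

Users often need to store intermediate state during a computation. For example, the simulated table join in section 1.2.3 of "Part 10 - DataStream API Intermediate Operators: Merging and Splitting Streams" kept stream data in a hand-written map, and it was noted there that such storage is neither persisted nor cleaned up. Later, the chapter on windows introduced join and intervalJoin, where Flink itself stores the intermediate state on the user's behalf. For users who want to define their own state, Flink also exposes state interfaces.

Flink has two kinds of state: Managed State and Raw State.

Managed State: managed entirely by Flink. Storage, access, failure recovery, and redistribution on rescaling are all handled by Flink; the user only calls the state interfaces.
Raw State: defined by the user. It is essentially a region of memory that the user manages alone, including serializing the state and restoring it after a failure. (The hand-written map in the simulated table join example resembled raw state, although that example did not yet handle serialization or recovery.)

Managed State is further divided into Operator State and Keyed State:

Operator State: available to any operator; it is shared by all records processed in the same subtask.
Keyed State: available only after keyBy; within a subtask it is stored separately per key, and different keys never share it.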
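To make the scoping difference concrete, here is a small plain-Java model. It is only an analogy and does not use the Flink API: one object stands for a single subtask, `operatorCount` plays the role of operator state, and `keyedSum` plays the role of keyed state. The class, field, and method names are all hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

/** Plain-Java model of state scoping inside ONE subtask; not the Flink API. */
public class StateScopeModel {

    // Models operator state: a single value shared by every record of this subtask.
    private long operatorCount = 0;

    // Models keyed state: one independent value per key.
    private final Map<String, Double> keyedSum = new HashMap<>();

    /** Processes one record and reports the per-key sum and the subtask-wide record count. */
    public String process(String key, double cpu) {
        operatorCount++;                                     // shared across all keys
        double sum = keyedSum.merge(key, cpu, Double::sum);  // scoped to this key only
        return key + ": sum=" + sum + ", seen=" + operatorCount;
    }

    public long getOperatorCount() {
        return operatorCount;
    }

    public double getKeyedSum(String key) {
        return keyedSum.getOrDefault(key, 0.0);
    }

    public static void main(String[] args) {
        StateScopeModel subtask = new StateScopeModel();
        System.out.println(subtask.process("server1", 1.0)); // server1: sum=1.0, seen=1
        System.out.println(subtask.process("server2", 2.0)); // server2: sum=2.0, seen=2
        System.out.println(subtask.process("server1", 3.0)); // server1: sum=4.0, seen=3
    }
}
```

The per-key sums never mix, while the counter keeps growing regardless of the key. In real Flink the "current key" is set by the runtime after keyBy, so user code never passes the key explicitly as this model does.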


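The Managed State that Flink provides covers nearly all practical needs, so Raw State is not discussed further. The rest of this chapter covers the two kinds of Managed State, Operator State and Keyed State, in turn.

2 Keyed State

2.1 Common Keyed State Types

As noted above, Keyed State can only be used after keyBy. The user does not have to implement persistence, because Flink manages it automatically. The commonly used keyed state types are:

ValueState: stores a single value
ListState: stores a list of values
MapState: stores key-value pairs
ReducingState: stores a single value; every added element is combined into it by a ReduceFunction
AggregatingState: stores a single aggregate; every added element is folded into it by an AggregateFunction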

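2.2 Steps for Using Keyed State

The steps for using keyed state are much the same across stream types and function types; only the initialization and the Context used to obtain the state differ slightly. Two common cases are listed below, and the sample code demonstrates both.

2.2.1 Using Keyed State in a RichFunction

A function that extends a RichFunction inherits its lifecycle methods, so the steps are:

1) Declare a keyed state variable
2) Initialize it in the RichFunction's open method
3) Operate on the state through the methods of the chosen keyed state type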


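Keyed state is initialized through a StateDescriptor, and each keyed state type has its own StateDescriptor subclass. A StateDescriptor usually takes two arguments:

The first argument is a name for the state (it only needs to be unique within the operator)
The second argument is the data type (see "Part 14 - Custom Data Types in the DataStream API")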
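Why the name must not repeat can be illustrated with a plain-Java sketch. It assumes, as I understand Flink's behavior, that a state is looked up by its descriptor name within an operator. The registry below is a hypothetical stand-in, not Flink code.

```java
import java.util.HashMap;
import java.util.Map;

/** Plain-Java model of looking up state by name; not the Flink API. */
public class DescriptorNameModel {

    /** Hypothetical stand-in for a state handle that holds one value. */
    public static class Cell {
        public Object value;
    }

    private final Map<String, Cell> registry = new HashMap<>();

    /** Returns the cell registered under the name, creating it on first use. */
    public Cell getState(String name) {
        return registry.computeIfAbsent(name, n -> new Cell());
    }

    public static void main(String[] args) {
        DescriptorNameModel store = new DescriptorNameModel();
        Cell sum = store.getState("currentValue");
        Cell count = store.getState("currentValue"); // same name reused by mistake
        sum.value = 4.8;
        System.out.println(count.value);  // 4.8: the "count" handle sees the sum
        System.out.println(sum == count); // true: both handles are the same cell
    }
}
```

Two variables that were meant to be independent end up reading and writing the same storage, which is the kind of bug a unique name per state avoids.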


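2.2.2 Using Keyed State in a Non-Rich Function

The open method above is available because the function goes through the RichFunction interface. How can a function that does not extend RichFunction, such as a plain MapFunction, use keyed state? The steps are:

1) Implement the CheckpointedFunction interface when defining the function
2) Declare a keyed state variable
3) Initialize the keyed state in the initializeState method
4) Operate on the state through the methods of the chosen keyed state type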


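Note: CheckpointedFunction belongs to checkpointing, which has not been covered yet. For now, simply think of initializeState as reading state back from a checkpoint and snapshotState as writing state into a checkpoint.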
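The two hooks can be pictured with the following plain-Java model. It is a rough sketch of the lifecycle only, not the Flink API: the checkpoint is just a copied map, and every name in it is hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

/** Plain-Java model of the initializeState / snapshotState lifecycle; not the Flink API. */
public class CheckpointLifecycleModel {

    private Map<String, Double> state;

    /** Models initializeState: start empty, or from the last snapshot after a failure. */
    public void initializeState(Map<String, Double> restored) {
        state = (restored == null) ? new HashMap<>() : new HashMap<>(restored);
    }

    /** Models per-record processing: accumulate a value for the key. */
    public double add(String key, double value) {
        return state.merge(key, value, Double::sum);
    }

    /** Models snapshotState: hand a copy of the current state to the checkpoint. */
    public Map<String, Double> snapshotState() {
        return new HashMap<>(state);
    }

    public static void main(String[] args) {
        CheckpointLifecycleModel task = new CheckpointLifecycleModel();
        task.initializeState(null);
        task.add("server1", 1.0);
        Map<String, Double> checkpoint = task.snapshotState();
        task.add("server1", 2.0); // processed after the checkpoint, lost on failure

        // Simulate a failure: a fresh instance restores from the checkpoint.
        CheckpointLifecycleModel recovered = new CheckpointLifecycleModel();
        recovered.initializeState(checkpoint);
        System.out.println(recovered.add("server1", 2.0)); // 3.0 after the record is replayed
    }
}
```

With managed keyed state, Flink takes the snapshot itself, so a real snapshotState method can stay almost empty; the model copies the map by hand only to make the round trip visible.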

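With the types and steps covered, the following sections demonstrate the common keyed state types in code.

2.3 Code Examples

2.3.1 ValueState

Method	Description
value()	Returns the value held in the ValueState (null if nothing has been stored yet)
update()	Updates the value held in the ValueState

Example: CPU readings arrive from servers, and a ValueState keeps the running sum of the CPU values per server.

The ValueStateDemo class: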

import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.api.common.functions.OpenContext;
import org.apache.flink.api.common.state.ValueState;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.api.java.functions.KeySelector;
import org.apache.flink.api.java.tuple.Tuple3;
import org.apache.flink.streaming.api.datastream.DataStreamSource;
import org.apache.flink.streaming.api.datastream.KeyedStream;
import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.KeyedProcessFunction;
import org.apache.flink.util.Collector;

public class ValueStateDemo {

    public static void main(String[] args) throws Exception {
        // 1. Create the execution environment
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.setParallelism(1);
        // 2. Read the input data
        DataStreamSource<String> text = env.socketTextStream("127.0.0.1", 9999);
        // 3. Convert each line with map
        SingleOutputStreamOperator<Tuple3<String,Double,Long>> map = text.map(new Tuple3MapFunction());
        // 4. Define a monotonically increasing watermark strategy and a TimestampAssigner
        WatermarkStrategy<Tuple3<String,Double,Long>> watermarkStrategy = WatermarkStrategy
                // Timestamps are monotonically increasing
                .<Tuple3<String,Double,Long>>forMonotonousTimestamps()
                // Extract the event time (seconds converted to milliseconds)
                .withTimestampAssigner((element, recordTimestamp) ->{
                    return element.f2 * 1000L;
                } );
        SingleOutputStreamOperator<Tuple3<String,Double,Long>> mapWithWatermark = map.assignTimestampsAndWatermarks(watermarkStrategy);
        // 5. Apply keyBy
        KeyedStream<Tuple3<String,Double,Long>, String> kyStream = mapWithWatermark.keyBy(new KeySelectorFunction());
        SingleOutputStreamOperator<String> process = kyStream.process(new KeyedProcessFunction<>() {

            // 1) Declare the ValueState
            ValueState<Tuple3<String, Double, Long>> currentValue;

            @Override
            public void open(OpenContext openContext) throws Exception {
                super.open(openContext);
                // 2) Initialize the ValueState
                ValueStateDescriptor<Tuple3<String, Double, Long>> descriptor = new ValueStateDescriptor<>("currentValue", Types.TUPLE(Types.STRING, Types.DOUBLE, Types.LONG));
                currentValue = getRuntimeContext().getState(descriptor);
            }

            @Override
            public void processElement(Tuple3<String, Double, Long> value, KeyedProcessFunction<String, Tuple3<String, Double, Long>, String>.Context ctx, Collector<String> out) throws Exception {
                // 3) Read the ValueState (null for the first record of a key)
                Tuple3<String, Double, Long> curValue = currentValue.value();
                if(curValue==null){
                    curValue = value;
                }else{
                    curValue.f1 = curValue.f1 + value.f1;
                }
                // 4) Update the ValueState
                currentValue.update(curValue);
                out.collect(curValue.toString());
            }

        });
        // 6. Print
        process.print();
        // Execute
        env.execute();
    }


    public static class Tuple3MapFunction implements MapFunction<String, Tuple3<String,Double,Long>> {

        @Override
        public Tuple3<String, Double, Long> map(String value) throws Exception {
            String[] values = value.split(",");
            String value1 = values[0];
            double value2 = Double.parseDouble("0");
            long value3 = 0;
            if(values.length >= 2){
                try {
                    value2 = Double.parseDouble(values[1]);
                }catch (Exception e){
                    value2 = Double.parseDouble("0");
                }
            }
            if(values.length >= 3){
                try {
                    value3 = Long.parseLong(values[2]);
                }catch (Exception ignored){
                }
            }
            return new Tuple3<>(value1,value2,value3);
        }
    }

    public static class KeySelectorFunction implements KeySelector<Tuple3<String,Double,Long>, String> {

        @Override
        public String getKey(Tuple3<String,Double,Long> value) throws Exception {
            // Use the first field as the key for keyBy
            return value.f0;
        }

    }
}

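Input:
[Figure: input screenshot]

Output:
[Figure: output screenshot]

Key points: three records were entered in total.
1) The CPU value for server1 is the accumulated sum
2) server1 and server2 are stored separately by key, so the CPU value for server2 is 2.1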

2.3.2 ListState

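Method	Description
get()	Returns the contents of the ListState as an Iterable
update()	Replaces the contents of the ListState with the given List
add()	Appends one element to the ListState
addAll()	Appends all elements of a List to the ListState

Example: CPU readings arrive from servers; a ListState stores them and the average CPU value is computed.

The ListStateDemo class: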

import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.api.common.functions.OpenContext;
import org.apache.flink.api.common.state.ListState;
import org.apache.flink.api.common.state.ListStateDescriptor;
import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.api.java.functions.KeySelector;
import org.apache.flink.api.java.tuple.Tuple3;
import org.apache.flink.streaming.api.datastream.DataStreamSource;
import org.apache.flink.streaming.api.datastream.KeyedStream;
import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.KeyedProcessFunction;
import org.apache.flink.util.Collector;

import java.util.Iterator;

public class ListStateDemo {

    public static void main(String[] args) throws Exception {
        // 1. Create the execution environment
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.setParallelism(1);
        // 2. Read the input data
        DataStreamSource<String> text = env.socketTextStream("127.0.0.1", 9999);
        // 3. Convert each line with map
        SingleOutputStreamOperator<Tuple3<String,Double,Long>> map = text.map(new Tuple3MapFunction());
        // 4. Define a monotonically increasing watermark strategy and a TimestampAssigner
        WatermarkStrategy<Tuple3<String,Double,Long>> watermarkStrategy = WatermarkStrategy
                // Timestamps are monotonically increasing
                .<Tuple3<String,Double,Long>>forMonotonousTimestamps()
                // Extract the event time (seconds converted to milliseconds)
                .withTimestampAssigner((element, recordTimestamp) ->{
                    return element.f2 * 1000L;
                } );
        SingleOutputStreamOperator<Tuple3<String,Double,Long>> mapWithWatermark = map.assignTimestampsAndWatermarks(watermarkStrategy);
        // 5. Apply keyBy
        KeyedStream<Tuple3<String,Double,Long>, String> kyStream = mapWithWatermark.keyBy(new KeySelectorFunction());
        SingleOutputStreamOperator<String> process = kyStream.process(new KeyedProcessFunction<>() {

            // 1) Declare the ListState
            ListState<Tuple3<String, Double, Long>> currentValue;

            @Override
            public void open(OpenContext openContext) throws Exception {
                super.open(openContext);
                // 2) Initialize the ListState
                ListStateDescriptor<Tuple3<String, Double, Long>> descriptor = new ListStateDescriptor<>("currentValue", Types.TUPLE(Types.STRING, Types.DOUBLE, Types.LONG));
                currentValue = getRuntimeContext().getListState(descriptor);
            }

            @Override
            public void processElement(Tuple3<String, Double, Long> value, KeyedProcessFunction<String, Tuple3<String, Double, Long>, String>.Context ctx, Collector<String> out) throws Exception {
                // 3) Append the record to the ListState
                currentValue.add(value);
                // 4) Read back every record stored for the current key
                Iterator<Tuple3<String, Double, Long>> iterator = currentValue.get().iterator();
                double sum = 0;
                int num = 0;
                while (iterator.hasNext()){
                    Tuple3<String, Double, Long> tmpValue = iterator.next();
                    sum = sum + tmpValue.f1;
                    num++;
                }
                out.collect(value.f0 + " average cpu=" + (sum/num));
            }

        });
        // 6. Print
        process.print();
        // Execute
        env.execute();
    }


    public static class Tuple3MapFunction implements MapFunction<String, Tuple3<String,Double,Long>> {

        @Override
        public Tuple3<String, Double, Long> map(String value) throws Exception {
            String[] values = value.split(",");
            String value1 = values[0];
            double value2 = Double.parseDouble("0");
            long value3 = 0;
            if(values.length >= 2){
                try {
                    value2 = Double.parseDouble(values[1]);
                }catch (Exception e){
                    value2 = Double.parseDouble("0");
                }
            }
            if(values.length >= 3){
                try {
                    value3 = Long.parseLong(values[2]);
                }catch (Exception ignored){
                }
            }
            return new Tuple3<>(value1,value2,value3);
        }
    }

    public static class KeySelectorFunction implements KeySelector<Tuple3<String,Double,Long>, String> {

        @Override
        public String getKey(Tuple3<String,Double,Long> value) throws Exception {
            // Use the first field as the key for keyBy
            return value.f0;
        }

    }
}

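Input:
[Figure: input screenshot]

Output:
[Figure: output screenshot]

Key points:
1) After the first server1 record, key server1 has one record, so the average is 2.4
2) After the second server1 record, key server1 has two records, so the average is 2.5
3) After the first server2 record, key server2 has one record, so the average is 2.1
4) After the second server2 record, key server2 has two records, so the average is 2.2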
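The arithmetic behind these averages can be reproduced without Flink. The plain-Java sketch below mirrors the per-key list logic; the input values are assumptions inferred from the averages quoted above (2.4 followed by an average of 2.5 implies a second reading of 2.6, and 2.1 followed by 2.2 implies 2.3), since the original input screenshot is not available. The class name is hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Plain-Java model of "append to a per-key list, then average it"; not the Flink API. */
public class ListAverageModel {

    // One list per key, standing in for a keyed ListState.
    private final Map<String, List<Double>> perKey = new HashMap<>();

    /** Stores the reading under its key and returns the average of that key's readings. */
    public double addAndAverage(String key, double cpu) {
        List<Double> values = perKey.computeIfAbsent(key, k -> new ArrayList<>());
        values.add(cpu);
        double sum = 0;
        for (double v : values) {
            sum += v;
        }
        return sum / values.size();
    }

    public static void main(String[] args) {
        ListAverageModel model = new ListAverageModel();
        // Assumed inputs, inferred from the averages in the key points.
        System.out.println(model.addAndAverage("server1", 2.4));
        System.out.println(model.addAndAverage("server1", 2.6));
        System.out.println(model.addAndAverage("server2", 2.1));
        System.out.println(model.addAndAverage("server2", 2.3));
    }
}
```

Because every reading is kept, the list grows with the stream. When only the average is needed, keeping a sum and a count is enough, which is what the AggregatingState section later in this chapter is suited for.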

2.3.3 MapState

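Method	Description
get()	Returns the value stored under a given key
put()	Sets the value stored under a given key
iterator()	Returns an iterator over the entries of the map
values()	Returns an Iterable over all values of the map
keys()	Returns an Iterable over all keys of the map
contains()	Checks whether the map contains a given key
putAll()	Copies all entries of a given Map into the MapState
entries()	Returns an Iterable over the entries of the map
isEmpty()	Checks whether the MapState is empty
remove()	Removes the entry stored under a given key

Example: sales records arrive from different supermarkets; count the quantity sold per product in each supermarket.

The MapStateDemo class: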

import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.api.common.functions.OpenContext;
import org.apache.flink.api.common.state.MapState;
import org.apache.flink.api.common.state.MapStateDescriptor;
import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.api.java.functions.KeySelector;
import org.apache.flink.api.java.tuple.Tuple3;
import org.apache.flink.streaming.api.datastream.DataStreamSource;
import org.apache.flink.streaming.api.datastream.KeyedStream;
import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.KeyedProcessFunction;
import org.apache.flink.util.Collector;

import java.util.Iterator;
import java.util.Map;

public class MapStateDemo {

    public static void main(String[] args) throws Exception {
        // 1. Create the execution environment
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.setParallelism(1);
        // 2. Read the input data
        DataStreamSource<String> text = env.socketTextStream("127.0.0.1", 9999);
        // 3. Convert each line with map
        SingleOutputStreamOperator<Tuple3<String,String,Long>> map = text.map(new Tuple3MapFunction());
        // 4. Define a monotonically increasing watermark strategy and a TimestampAssigner
        WatermarkStrategy<Tuple3<String,String,Long>> watermarkStrategy = WatermarkStrategy
                // Timestamps are monotonically increasing
                .<Tuple3<String,String,Long>>forMonotonousTimestamps()
                // Extract the event time (seconds converted to milliseconds)
                .withTimestampAssigner((element, recordTimestamp) ->{
                    return element.f2 * 1000L;
                } );
        SingleOutputStreamOperator<Tuple3<String,String,Long>> mapWithWatermark = map.assignTimestampsAndWatermarks(watermarkStrategy);
        // 5. Apply keyBy
        KeyedStream<Tuple3<String,String,Long>, String> kyStream = mapWithWatermark.keyBy(new KeySelectorFunction());
        SingleOutputStreamOperator<String> process = kyStream.process(new KeyedProcessFunction<>() {

            // 1) Declare the MapState
            MapState<String,Long> currentValue;

            @Override
            public void open(OpenContext openContext) throws Exception {
                super.open(openContext);
                // 2) Initialize the MapState
                MapStateDescriptor<String,Long> descriptor = new MapStateDescriptor<>("currentValue",Types.STRING, Types.LONG);
                currentValue = getRuntimeContext().getMapState(descriptor);
            }

            @Override
            public void processElement(Tuple3<String, String, Long> value, KeyedProcessFunction<String, Tuple3<String, String, Long>, String>.Context ctx, Collector<String> out) throws Exception {
                // 3) Read the MapState entry for this map key (null if absent)
                Long num = currentValue.get(value.f1);
                if(num==null){
                    num = value.f2;
                }else {
                    num = num + value.f2;
                }
                // 4) Update the MapState entry
                currentValue.put(value.f1, num);
                // 5) Iterate over all MapState entries of the current key
                Iterator<Map.Entry<String, Long>> iterator = currentValue.iterator();
                StringBuilder sb = new StringBuilder();
                sb.append("==== key=").append(value.f0).append(" start ======\n");
                while (iterator.hasNext()){
                    Map.Entry<String, Long> next = iterator.next();
                    sb.append("key=").append(next.getKey()).append(", value=").append(next.getValue()).append("\n");
                }
                sb.append("==== key=").append(value.f0).append(" end ======\n");
                out.collect(sb.toString());
            }

        });
        // 6. Print
        process.print();
        // Execute
        env.execute();
    }


    public static class Tuple3MapFunction implements MapFunction<String, Tuple3<String,String,Long>> {

        @Override
        public Tuple3<String, String, Long> map(String value) throws Exception {
            String[] values = value.split(",");
            String value1 = values[0];
            String value2 = "";
            long value3 = 0;
            if(values.length >= 2){
                value2 = values[1];
            }
            if(values.length >= 3){
                try {
                    value3 = Long.parseLong(values[2]);
                }catch (Exception ignored){
                }
            }
            return new Tuple3<>(value1,value2,value3);
        }
    }

    public static class KeySelectorFunction implements KeySelector<Tuple3<String,String,Long>, String> {

        @Override
        public String getKey(Tuple3<String,String,Long> value) throws Exception {
            // Use the first field as the key for keyBy
            return value.f0;
        }

    }
}

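Input:
[Figure: input screenshot]

Output:
[Figure: output screenshot]

Key points:
1) Records 1 and 2 come from different supermarkets, so their counts for goo1 are kept separately
2) Records 3 and 4 also come from different supermarkets, and each is merged into the counts already stored for its own supermarket by records 1 and 2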
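One way to picture keyed MapState is as a map nested inside a map: the outer key is the keyBy key and the inner map is the MapState itself. The plain-Java sketch below models that shape. It is not the Flink API, and the supermarket and product names are made up for illustration.

```java
import java.util.HashMap;
import java.util.Map;

/** Plain-Java model of keyed MapState as a map nested per key; not the Flink API. */
public class MapStateModel {

    // Outer key: the keyBy key (supermarket). Inner map: product -> quantity sold.
    private final Map<String, Map<String, Long>> perKey = new HashMap<>();

    /** Records a sale and returns the running quantity of that product in that supermarket. */
    public long sell(String supermarket, String product, long quantity) {
        Map<String, Long> products = perKey.computeIfAbsent(supermarket, k -> new HashMap<>());
        return products.merge(product, quantity, Long::sum);
    }

    public static void main(String[] args) {
        MapStateModel model = new MapStateModel();
        // Hypothetical records: the same product sold in two different supermarkets.
        System.out.println(model.sell("market1", "goods1", 3)); // 3
        System.out.println(model.sell("market2", "goods1", 5)); // 5, kept apart from market1
        System.out.println(model.sell("market1", "goods1", 4)); // 7, merged with the first record
    }
}
```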

2.3.4 ReducingState

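Reduce was covered earlier: it keeps folding incoming elements into a single accumulated record. ReducingState is simply a convenience state that Flink provides for this reduce operation.

Method	Description
add()	Adds an element to the ReducingState, which invokes the ReduceFunction given in the ReducingStateDescriptor
get()	Returns the result accumulated so far in the ReducingState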
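The add and get behavior described in the table can be modeled in a few lines of plain Java. This is a sketch of the semantics only, with a hypothetical class name; it is not how Flink implements ReducingState.

```java
import java.util.function.BinaryOperator;

/** Plain-Java model of ReducingState semantics; not the Flink API. */
public class ReducingCell<T> {

    private final BinaryOperator<T> reducer;
    private T current; // null until the first element is added

    public ReducingCell(BinaryOperator<T> reducer) {
        this.reducer = reducer;
    }

    /** The first element is stored as-is; later elements are reduced into the stored one. */
    public void add(T value) {
        current = (current == null) ? value : reducer.apply(current, value);
    }

    /** Returns the reduced result so far, or null if nothing was added. */
    public T get() {
        return current;
    }

    public static void main(String[] args) {
        ReducingCell<Double> sum = new ReducingCell<>(Double::sum);
        sum.add(1.5);
        sum.add(2.5);
        System.out.println(sum.get()); // 4.0
    }
}
```

Since the reducer maps two values of type T to one value of type T, the input, the stored value, and the result always share one type.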

Example: CPU readings arrive from servers, and a ReducingState keeps the running sum of the CPU values, the same scenario as the ValueState example.

The ReducingStateDemo class:

import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.api.common.functions.OpenContext;
import org.apache.flink.api.common.functions.ReduceFunction;
import org.apache.flink.api.common.state.ReducingState;
import org.apache.flink.api.common.state.ReducingStateDescriptor;
import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.api.java.functions.KeySelector;
import org.apache.flink.api.java.tuple.Tuple3;
import org.apache.flink.streaming.api.datastream.DataStreamSource;
import org.apache.flink.streaming.api.datastream.KeyedStream;
import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.KeyedProcessFunction;
import org.apache.flink.util.Collector;

public class ReducingStateDemo {

    public static void main(String[] args) throws Exception {
        // 1. Create the execution environment
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.setParallelism(1);
        // 2. Read the input data
        DataStreamSource<String> text = env.socketTextStream("127.0.0.1", 9999);
        // 3. Convert each line with map
        SingleOutputStreamOperator<Tuple3<String,Double,Long>> map = text.map(new Tuple3MapFunction());
        // 4. Define a monotonically increasing watermark strategy and a TimestampAssigner
        WatermarkStrategy<Tuple3<String,Double,Long>> watermarkStrategy = WatermarkStrategy
                // Timestamps are monotonically increasing
                .<Tuple3<String,Double,Long>>forMonotonousTimestamps()
                // Extract the event time (seconds converted to milliseconds)
                .withTimestampAssigner((element, recordTimestamp) ->{
                    return element.f2 * 1000L;
                } );
        SingleOutputStreamOperator<Tuple3<String,Double,Long>> mapWithWatermark = map.assignTimestampsAndWatermarks(watermarkStrategy);
        // 5. Apply keyBy
        KeyedStream<Tuple3<String,Double,Long>, String> kyStream = mapWithWatermark.keyBy(new KeySelectorFunction());
        SingleOutputStreamOperator<String> process = kyStream.process(new KeyedProcessFunction<>() {

            // 1) Declare the ReducingState
            ReducingState<Tuple3<String, Double, Long>> currentValue;

            @Override
            public void open(OpenContext openContext) throws Exception {
                super.open(openContext);
                // 2) Initialize the ReducingState
                ReducingStateDescriptor<Tuple3<String, Double, Long>> descriptor = new ReducingStateDescriptor<>(
                        "currentValue",
                        (ReduceFunction<Tuple3<String, Double, Long>>) (value1, value2) -> {
                            // Add the cpu value onto the first record and return the first record
                            value1.f1 = value1.f1 + value2.f1;
                            return value1;
                        },
                        Types.TUPLE(Types.STRING, Types.DOUBLE, Types.LONG));
                currentValue = getRuntimeContext().getReducingState(descriptor);
            }

            @Override
            public void processElement(Tuple3<String, Double, Long> value, KeyedProcessFunction<String, Tuple3<String, Double, Long>, String>.Context ctx, Collector<String> out) throws Exception {
                // 3) Add the record; this triggers the ReduceFunction
                currentValue.add(value);
                // 4) Read the reduced result
                out.collect(currentValue.get().toString());
            }

        });
        // 6. Print
        process.print();
        // Execute
        env.execute();
    }


    public static class Tuple3MapFunction implements MapFunction<String, Tuple3<String,Double,Long>> {

        @Override
        public Tuple3<String, Double, Long> map(String value) throws Exception {
            String[] values = value.split(",");
            String value1 = values[0];
            double value2 = Double.parseDouble("0");
            long value3 = 0;
            if(values.length >= 2){
                try {
                    value2 = Double.parseDouble(values[1]);
                }catch (Exception e){
                    value2 = Double.parseDouble("0");
                }
            }
            if(values.length >= 3){
                try {
                    value3 = Long.parseLong(values[2]);
                }catch (Exception ignored){
                }
            }
            return new Tuple3<>(value1,value2,value3);
        }
    }

    public static class KeySelectorFunction implements KeySelector<Tuple3<String,Double,Long>, String> {

        @Override
        public String getKey(Tuple3<String,Double,Long> value) throws Exception {
            // Use the first field as the key for keyBy
            return value.f0;
        }

    }
}

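Input:
[Figure: input screenshot]

Output:
[Figure: output screenshot]

Key points:
This uses the same example and input as the ValueState demo, and the output is the same. ReducingState therefore behaves like the reduce operation described earlier.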

2.3.5 AggregatingState

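The only difference between Aggregate and Reduce is that Aggregate allows the input, accumulator, and output types to differ; otherwise they work the same way.

Method	Description
add()	Adds an element to the AggregatingState, which invokes the AggregateFunction given in the AggregatingStateDescriptor
get()	Returns the result accumulated so far in the AggregatingState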
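A case where the three types really differ is an average: the input is a single reading, the accumulator is a sum with a count, and the output is a floating-point mean. The plain-Java sketch below follows the four-method shape of AggregateFunction (createAccumulator, add, getResult, merge) but does not depend on Flink; the class names and the integer percentage input are assumptions chosen for illustration.

```java
/** Plain-Java sketch of an averaging aggregate with three distinct types; not the Flink API. */
public class AverageAggregator {

    /** Accumulator type: differs from both the Integer input and the Double output. */
    public static class Acc {
        long sum;
        long count;
    }

    public Acc createAccumulator() {
        return new Acc();
    }

    /** Folds one input value into the accumulator. */
    public Acc add(Integer value, Acc acc) {
        acc.sum += value;
        acc.count++;
        return acc;
    }

    /** Converts the accumulator into the output value. */
    public Double getResult(Acc acc) {
        return acc.count == 0 ? 0.0 : (double) acc.sum / acc.count;
    }

    /** Combines two partial accumulators by adding their sums and counts. */
    public Acc merge(Acc a, Acc b) {
        a.sum += b.sum;
        a.count += b.count;
        return a;
    }

    public static void main(String[] args) {
        AverageAggregator agg = new AverageAggregator();
        Acc acc = agg.createAccumulator();
        agg.add(10, acc);
        agg.add(20, acc);
        System.out.println(agg.getResult(acc)); // 15.0
    }
}
```

A ReduceFunction could not express this directly, because its result would have to be the same type as each incoming reading.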

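Example: CPU readings arrive from servers, and an AggregatingState keeps the running sum of the CPU values, the same scenario as the ReducingState example.

The AggregatingStateDemo class: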

import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.api.common.functions.AggregateFunction;
import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.api.common.functions.OpenContext;
import org.apache.flink.api.common.state.AggregatingState;
import org.apache.flink.api.common.state.AggregatingStateDescriptor;
import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.api.java.functions.KeySelector;
import org.apache.flink.api.java.tuple.Tuple3;
import org.apache.flink.streaming.api.datastream.DataStreamSource;
import org.apache.flink.streaming.api.datastream.KeyedStream;
import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.KeyedProcessFunction;
import org.apache.flink.util.Collector;

public class AggregatingStateDemo {

    public static void main(String[] args) throws Exception {
        // 1. Create the execution environment
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.setParallelism(1);
        // 2. Read the input data
        DataStreamSource<String> text = env.socketTextStream("127.0.0.1", 9999);
        // 3. Convert each line with map
        SingleOutputStreamOperator<Tuple3<String,Double,Long>> map = text.map(new Tuple3MapFunction());
        // 4. Define a monotonically increasing watermark strategy and a TimestampAssigner
        WatermarkStrategy<Tuple3<String,Double,Long>> watermarkStrategy = WatermarkStrategy
                // Timestamps are monotonically increasing
                .<Tuple3<String,Double,Long>>forMonotonousTimestamps()
                // Extract the event time (seconds converted to milliseconds)
                .withTimestampAssigner((element, recordTimestamp) ->{
                    return element.f2 * 1000L;
                } );
        SingleOutputStreamOperator<Tuple3<String,Double,Long>> mapWithWatermark = map.assignTimestampsAndWatermarks(watermarkStrategy);
        // 5. Apply keyBy
        KeyedStream<Tuple3<String,Double,Long>, String> kyStream = mapWithWatermark.keyBy(new KeySelectorFunction());
        SingleOutputStreamOperator<String> process = kyStream.process(new KeyedProcessFunction<>() {

            // 1) Declare the AggregatingState (input: Tuple3, output: Double)
            AggregatingState<Tuple3<String, Double, Long>, Double> currentValue;

            @Override
            public void open(OpenContext openContext) throws Exception {
                super.open(openContext);
                // 2) Initialize the AggregatingState
                AggregatingStateDescriptor<Tuple3<String, Double, Long>, Double, Double> descriptor = new AggregatingStateDescriptor<>(
                        "currentValue",
                        new AggregateFunction<>() {
                            @Override
                            public Double createAccumulator() {
                                // The accumulator starts at zero
                                return 0.0;
                            }

                            @Override
                            public Double add(Tuple3<String, Double, Long> value, Double accumulator) {
                                // Add the incoming cpu value to the accumulator
                                accumulator = accumulator + value.f1;
                                return accumulator;
                            }

                            @Override
                            public Double getResult(Double accumulator) {
                                return accumulator;
                            }

                            @Override
                            public Double merge(Double a, Double b) {
                                // Two partial sums are merged by adding them
                                return a + b;
                            }
                        },
                        Types.DOUBLE);
                currentValue = getRuntimeContext().getAggregatingState(descriptor);
            }

            @Override
            public void processElement(Tuple3<String, Double, Long> value, KeyedProcessFunction<String, Tuple3<String, Double, Long>, String>.Context ctx, Collector<String> out) throws Exception {
                // 3) Add the record; this triggers the AggregateFunction
                currentValue.add(value);
                // 4) Read the aggregated result
                out.collect(currentValue.get().toString());
            }

        });
        // 6. Print
        process.print();
        // Execute
        env.execute();
    }


    public static class Tuple3MapFunction implements MapFunction<String, Tuple3<String,Double,Long>> {

        @Override
        public Tuple3<String, Double, Long> map(String value) throws Exception {
            String[] values = value.split(",");
            String value1 = values[0];
            double value2 = Double.parseDouble("0");
            long value3 = 0;
            if(values.length >= 2){
                try {
                    value2 = Double.parseDouble(values[1]);
                }catch (Exception e){
                    value2 = Double.parseDouble("0");
                }
            }
            if(values.length >= 3){
                try {
                    value3 = Long.parseLong(values[2]);
                }catch (Exception ignored){
                }
            }
            return new Tuple3<>(value1,value2,value3);
        }
    }

    public static class KeySelectorFunction implements KeySelector<Tuple3<String,Double,Long>, String> {

        @Override
        public String getKey(Tuple3<String,Double,Long> value) throws Exception {
            // Use the first field as the key for keyBy
            return value.f0;
        }

    }
}

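Input:
[Figure: input screenshot]

Output:
[Figure: output screenshot]

Key point: the effect matches the ReducingState example; the only difference is that the output is defined as a Double here, so only the CPU total is emitted.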

2.3.6 Non-Rich Function Demo

Example: the same scenario as the ValueState example above, accumulating CPU values.

import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.api.common.state.ValueState;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.api.java.functions.KeySelector;
import org.apache.flink.api.java.tuple.Tuple3;
import org.apache.flink.runtime.state.FunctionInitializationContext;
import org.apache.flink.runtime.state.FunctionSnapshotContext;
import org.apache.flink.streaming.api.checkpoint.CheckpointedFunction;
import org.apache.flink.streaming.api.datastream.DataStreamSource;
import org.apache.flink.streaming.api.datastream.KeyedStream;
import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class ValueState2Demo {

    public static void main(String[] args) throws Exception {
        // 1. Create the execution environment
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.setParallelism(1);
        // 2. Read the input data
        DataStreamSource<String> text = env.socketTextStream("127.0.0.1", 9999);
        // 3. Convert each line with map
        SingleOutputStreamOperator<Tuple3<String,Double,Long>> map = text.map(new Tuple3MapFunction());
        // 4. Define a monotonically increasing watermark strategy and a TimestampAssigner
        WatermarkStrategy<Tuple3<String,Double,Long>> watermarkStrategy = WatermarkStrategy
                // Timestamps are monotonically increasing
                .<Tuple3<String,Double,Long>>forMonotonousTimestamps()
                // Extract the event time (seconds converted to milliseconds)
                .withTimestampAssigner((element, recordTimestamp) ->{
                    return element.f2 * 1000L;
                } );
        SingleOutputStreamOperator<Tuple3<String,Double,Long>> mapWithWatermark = map.assignTimestampsAndWatermarks(watermarkStrategy);
        // 5. Apply keyBy
        KeyedStream<Tuple3<String,Double,Long>, String> kyStream = mapWithWatermark.keyBy(new KeySelectorFunction());
        // 6. Accumulate the values with map
        SingleOutputStreamOperator<String> process = kyStream.map(new CustomMapFunction());
        // 7. Print
        process.print();
        // Execute
        env.execute();
    }

    public static class CustomMapFunction implements MapFunction<Tuple3<String,Double,Long>, String>, CheckpointedFunction {

        // 1) Declare the ValueState
        ValueState<Double> currentValue;

        /**
         * Initializes the keyed state.
         */
        @Override
        public void initializeState(FunctionInitializationContext context) throws Exception {
            // 2) Initialize the ValueState
            System.out.println("initializeState...");
            ValueStateDescriptor<Double> descriptor = new ValueStateDescriptor<>("currentValue", Types.DOUBLE);
            currentValue = context.getKeyedStateStore().getState(descriptor);
        }

        @Override
        public String map(Tuple3<String,Double,Long> value) throws Exception {
            double curValue = 0.0;
            // 3) Read the ValueState (null for the first record of a key)
            if(currentValue.value()==null){
                curValue = value.f1;
            }else{
                curValue = currentValue.value() + value.f1;
            }
            // 4) Update the ValueState
            currentValue.update(curValue);
            value.f1 = curValue;
            return value.toString();
        }

        /**
         * Called when a checkpoint is triggered.
         */
        @Override
        public void snapshotState(FunctionSnapshotContext context) throws Exception {
            System.out.println("snapshotState...");
        }

    }

    public static class Tuple3MapFunction implements MapFunction<String, Tuple3<String,Double,Long>> {

        @Override
        public Tuple3<String, Double, Long> map(String value) throws Exception {
            String[] values = value.split(",");
            String value1 = values[0];
            double value2 = Double.parseDouble("0");
            long value3 = 0;
            if(values.length >= 2){
                try {
                    value2 = Double.parseDouble(values[1]);
                }catch (Exception e){
                    value2 = Double.parseDouble("0");
                }
            }
            if(values.length >= 3){
                try {
                    value3 = Long.parseLong(values[2]);
                }catch (Exception ignored){
                }
            }
            return new Tuple3<>(value1,value2,value3);
        }
    }

    public static class KeySelectorFunction implements KeySelector<Tuple3<String,Double,Long>, String> {

        @Override
        public String getKey(Tuple3<String,Double,Long> value) throws Exception {
            // Return the first field as the keyBy key
            return value.f0;
        }

    }
}

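Input:
[Image: input data]

Output:
[Image: console output]

Key point: the result matches the ValueState example. The only difference is that the state is initialized in the initializeState method. The other keyed state types work the same way, so they are not demonstrated one by one here.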

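3 Operator State

3.1 Common operator state types

The difference between operator state and keyed state is whether the stream has gone through keyBy. Without keyBy, only operator state is available, which means the state is shared by everything a subtask processes. The figure below shows the two state types that can be used as operator state.

[Image: state types available as operator state]

ListState and BroadcastState are demonstrated below. There is also UnionListState, which is really a ListState as well and is covered below too.

3.2 Code demonstration

3.2.1 ListState

Example: CPU values arrive from servers, and a ListState stores the accumulated CPU value.

OperatorListStateDemo class: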

import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.api.common.state.ListState;
import org.apache.flink.api.common.state.ListStateDescriptor;
import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.runtime.state.FunctionInitializationContext;
import org.apache.flink.runtime.state.FunctionSnapshotContext;
import org.apache.flink.streaming.api.checkpoint.CheckpointedFunction;
import org.apache.flink.streaming.api.datastream.DataStreamSource;
import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class OperatorListStateDemo {

    public static void main(String[] args) throws Exception {
        // 1. Create the execution environment
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        // Run the map operator with two parallel subtasks
        env.setParallelism(2);
        // 2. Read the data
        DataStreamSource<String> text = env.socketTextStream("127.0.0.1", 9999);
        // 3. Convert each line with map
        SingleOutputStreamOperator<Double> map = text.map(new DoubleMapFunction());
        // 4. Print
        map.print();
        // Execute
        env.execute();
    }


    public static class DoubleMapFunction implements MapFunction<String, Double>, CheckpointedFunction {

        private Double cpu;
        private ListState<Double> cpuState;

        @Override
        public Double map(String value) throws Exception {
            String[] values = value.split(",");
            double value2 = Double.parseDouble("0");
            if(values.length >= 2){
                try {
                    value2 = Double.parseDouble(values[1]);
                }catch (Exception e){
                    value2 = Double.parseDouble("0");
                }
            }
            cpu = cpu + value2;
            return cpu;
        }

        /**
         * Initializes the operator state
         */
        @Override
        public void initializeState(FunctionInitializationContext context) throws Exception {
            System.out.println("initializeState...");
            ListStateDescriptor<Double> descriptor = new ListStateDescriptor<>("cpu", Types.DOUBLE);
            cpuState = context.getOperatorStateStore().getListState(descriptor);
            // Start from 0 and add whatever was restored: after rescaling, a subtask
            // may receive no entries or several entries
            cpu = 0.0;
            if(context.isRestored()){
                for (Double restored : cpuState.get()) {
                    cpu = cpu + restored;
                }
            }
        }

        /**
         * Stores the operator state when a checkpoint is triggered
         */
        @Override
        public void snapshotState(FunctionSnapshotContext context) throws Exception {
            System.out.println("snapshotState...");
            cpuState.clear();
            cpuState.add(cpu);
        }
    }
}

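Input:
[Image: input data]

Output:
[Image: console output]

Key points: four records are entered, two from server1 and two from server2. The parallelism is 2.
1) initializeState is called twice, which shows that each subtask initializes its own state
2) Record 1 is handled by subtask 1, so the result is 2.4
3) Record 2 is handled by subtask 2, so the result is 2.6
4) Record 3 is handled by subtask 1, which already holds 2.4; adding 2.1 gives 4.5
5) Record 4 is handled by subtask 2, which already holds 2.6; adding 2.3 gives 4.9
6) So operator state is scoped to the subtask: every record a subtask handles updates the same value, whichever server it came from, and the two subtasks do not share state with each other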
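
The walkthrough above can be reproduced without Flink. The following is a minimal plain-Java sketch, not Flink code: the class name `OperatorStateScopeSketch`, the `Subtask` class, and the round-robin routing are assumptions made for illustration, with each `Subtask` standing in for one parallel instance of `DoubleMapFunction` and its `cpu` field.

```java
import java.util.ArrayList;
import java.util.List;

public class OperatorStateScopeSketch {

    /** Running sum held by one parallel subtask; stands in for the cpu field. */
    static final class Subtask {
        double cpu = 0.0;

        double map(double value) {
            cpu += value;
            return cpu;
        }
    }

    /** Sends records to subtasks in round-robin order and collects each emitted result. */
    static List<Double> run(int parallelism, double[] records) {
        List<Subtask> subtasks = new ArrayList<>();
        for (int i = 0; i < parallelism; i++) {
            subtasks.add(new Subtask());
        }
        List<Double> out = new ArrayList<>();
        for (int i = 0; i < records.length; i++) {
            out.add(subtasks.get(i % parallelism).map(records[i]));
        }
        return out;
    }

    public static void main(String[] args) {
        // The four CPU readings from the walkthrough, in arrival order.
        double[] records = {2.4, 2.6, 2.1, 2.3};
        System.out.println(run(2, records)); // two independent running sums
        System.out.println(run(1, records)); // one running sum over all records
    }
}
```

With two subtasks the third and fourth results come out close to 4.5 and 4.9, as in the walkthrough. With a single subtask all four readings land in one running sum, because the server id plays no role in operator state.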

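3.2.2 UnionListState

UnionListState is also a ListState under the hood. It is obtained with the getUnionListState method, as shown below, but the returned type is still ListState.

cpuState = context.getOperatorStateStore().getUnionListState(descriptor);

UnionListState and ListState differ mainly in how state is redistributed when the job is restarted with a different parallelism.

ListState: after the parallelism changes, entries are redistributed round-robin. In the example below, the job starts with parallelism 2 and 6 state entries; after a restart with parallelism 3, each subtask receives 2 entries.

[Image: ListState redistribution from parallelism 2 to parallelism 3]

UnionListState: after the parallelism changes, every subtask receives the full set of entries. In the example below, the job starts with parallelism 2 and 6 state entries; after a restart with parallelism 3, each subtask receives all 6 entries.

[Image: UnionListState redistribution from parallelism 2 to parallelism 3]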
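
The two redistribution schemes can be modelled in a few lines of plain Java. This is a simplified sketch that follows the round-robin description above, not Flink's actual implementation; Flink may assign individual entries to subtasks differently. The class `RedistributionSketch` and its methods are hypothetical names introduced for illustration.

```java
import java.util.ArrayList;
import java.util.List;

public class RedistributionSketch {

    /** Flattens the per-subtask lists of the old job into one list of state entries. */
    static <T> List<T> flatten(List<List<T>> oldSubtasks) {
        List<T> all = new ArrayList<>();
        for (List<T> entries : oldSubtasks) {
            all.addAll(entries);
        }
        return all;
    }

    /** ListState model: entries are dealt out round-robin across the new subtasks. */
    static <T> List<List<T>> evenSplit(List<List<T>> oldSubtasks, int newParallelism) {
        List<List<T>> result = new ArrayList<>();
        for (int i = 0; i < newParallelism; i++) {
            result.add(new ArrayList<>());
        }
        List<T> all = flatten(oldSubtasks);
        for (int i = 0; i < all.size(); i++) {
            result.get(i % newParallelism).add(all.get(i));
        }
        return result;
    }

    /** UnionListState model: every new subtask receives the complete list. */
    static <T> List<List<T>> union(List<List<T>> oldSubtasks, int newParallelism) {
        List<T> all = flatten(oldSubtasks);
        List<List<T>> result = new ArrayList<>();
        for (int i = 0; i < newParallelism; i++) {
            result.add(new ArrayList<>(all));
        }
        return result;
    }

    public static void main(String[] args) {
        // Two old subtasks holding three entries each (six entries in total).
        List<List<Integer>> old = List.of(List.of(1, 2, 3), List.of(4, 5, 6));
        System.out.println(evenSplit(old, 3)); // three subtasks, two entries each
        System.out.println(union(old, 3));     // three subtasks, six entries each
    }
}
```

The practical consequence is that code restoring from a union list must decide for itself which of the entries belong to the current subtask, while code restoring from a plain ListState only has to handle the entries it was given.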

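3.2.3 BroadcastState

BroadcastState is used with a BroadcastStream, which is a stream that carries broadcast state. A BroadcastStream can be created from any stream by calling DataStream.broadcast(MapStateDescriptor[]), which also implicitly creates the states in which the user can store elements of the BroadcastStream. Note that no further operations can be applied to a BroadcastStream itself. The only available option is to connect it with a keyed or non-keyed stream using connect. Put simply, a BroadcastStream only becomes useful once another stream is connected with it; that is how its data and state are broadcast to the operators of the other stream.

Example: there are two streams. One carries server CPU information (server id, CPU value, time); the other carries alert threshold settings, which set the CPU alert threshold dynamically.

OperatorBroadcastDemo class: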

import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.api.common.state.BroadcastState;
import org.apache.flink.api.common.state.MapStateDescriptor;
import org.apache.flink.api.common.state.ReadOnlyBroadcastState;
import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.api.java.tuple.Tuple3;
import org.apache.flink.streaming.api.datastream.BroadcastStream;
import org.apache.flink.streaming.api.datastream.DataStreamSource;
import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.co.BroadcastProcessFunction;
import org.apache.flink.util.Collector;

public class OperatorBroadcastDemo {

    public static void main(String[] args) throws Exception {
        // 1. Create the execution environment
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.setParallelism(2);
        // 2. Read the data
        DataStreamSource<String> dataSource = env.socketTextStream("127.0.0.1", 8888);// server CPU data stream
        DataStreamSource<String> configSource = env.socketTextStream("127.0.0.1", 9999);// CPU alert threshold stream
        // 3. Convert each line with map
        SingleOutputStreamOperator<Tuple3<String,Double,Long>> dataMap = dataSource.map(new Tuple3MapFunction());

        MapStateDescriptor<String, Double> descriptor = new MapStateDescriptor<>("broadcast", Types.STRING, Types.DOUBLE);
        BroadcastStream<String> broadcast = configSource.broadcast(descriptor);
        SingleOutputStreamOperator<String> process = dataMap.connect(broadcast).process(new BroadcastProcessFunction<Tuple3<String, Double, Long>, String, String>() {

            private final String WARN_VALUE_KEY = "warnValue";
            private final double DEFAULT_WARN_VALUE = 80;

            @Override
            public void processElement(Tuple3<String, Double, Long> value, BroadcastProcessFunction<Tuple3<String, Double, Long>, String, String>.ReadOnlyContext ctx, Collector<String> out) throws Exception {
                ReadOnlyBroadcastState<String, Double> broadcastState = ctx.getBroadcastState(descriptor);
                Double warnValue = broadcastState.get(WARN_VALUE_KEY);
                if (warnValue==null){
                    warnValue = DEFAULT_WARN_VALUE;
                }
                if(warnValue.compareTo(value.f1)<=0){
                    out.collect("server id=" + value.f0 + ", time="+ value.f2+", cpu="+ value.f1 + " reached or exceeded the alert threshold " + warnValue + ", alert raised!");
                }

            }

            @Override
            public void processBroadcastElement(String value, BroadcastProcessFunction<Tuple3<String, Double, Long>, String, String>.Context ctx, Collector<String> out) throws Exception {
                BroadcastState<String, Double> broadcastState = ctx.getBroadcastState(descriptor);
                double warnValue = DEFAULT_WARN_VALUE;
                try {
                    warnValue = Double.parseDouble(value);
                }catch (Exception ignored){
                    // Ignore parse errors and keep the default of 80
                }
                // Put the value into the broadcast state
                broadcastState.put(WARN_VALUE_KEY, warnValue);
            }
        });
        // 5. Print
        process.print();
        // Execute
        env.execute();
    }


    public static class Tuple3MapFunction implements MapFunction<String, Tuple3<String,Double,Long>> {

        @Override
        public Tuple3<String, Double, Long> map(String value) throws Exception {
            String[] values = value.split(",");
            String value1 = values[0];
            double value2 = Double.parseDouble("0");
            long value3 = 0;
            if(values.length >= 2){
                try {
                    value2 = Double.parseDouble(values[1]);
                }catch (Exception e){
                    value2 = Double.parseDouble("0");
                }
            }
            if(values.length >= 3){
                try {
                    value3 = Long.parseLong(values[2]);
                }catch (Exception ignored){
                }
            }
            return new Tuple3<>(value1,value2,value3);
        }
    }
}

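Input 1 (port 8888): enter the first two records, wait until the first record has been entered on port 9999, then enter the third record
[Image: input on port 8888]

Input 2 (port 9999): enter the first record only after the first two records have been entered on port 8888
[Image: input on port 9999]

Output:
[Image: console output]

Key points:
1) After the first two records are entered on port 8888, only one alert is shown, because the default threshold is 80 and only one of the two records reaches it
2) The first record entered on port 9999 sets the threshold to 50
3) The third record entered on port 8888 triggers an alert, and the alert shows that the threshold is now 50
4) The parallelism is 2 here: the broadcast record is delivered to every parallel subtask of the operator, so all subtasks see the same threshold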
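
Point 4 can be pictured with a small plain-Java model. This is a sketch under assumptions, not Flink code: the class `BroadcastStateSketch` is a hypothetical name, and each `Map` in `copies` stands in for the broadcast state held by one parallel subtask. The constants mirror `WARN_VALUE_KEY` and `DEFAULT_WARN_VALUE` from the demo.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class BroadcastStateSketch {

    static final String WARN_VALUE_KEY = "warnValue";
    static final double DEFAULT_WARN_VALUE = 80;

    /** One map per parallel subtask; stands in for that subtask's broadcast state. */
    final List<Map<String, Double>> copies = new ArrayList<>();

    BroadcastStateSketch(int parallelism) {
        for (int i = 0; i < parallelism; i++) {
            copies.add(new HashMap<>());
        }
    }

    /** A broadcast element reaches every subtask, so every copy gets the same update. */
    void processBroadcastElement(double warnValue) {
        for (Map<String, Double> copy : copies) {
            copy.put(WARN_VALUE_KEY, warnValue);
        }
    }

    /** A data element reaches one subtask and only reads that subtask's copy. */
    boolean processElement(int subtask, double cpu) {
        double warnValue = copies.get(subtask).getOrDefault(WARN_VALUE_KEY, DEFAULT_WARN_VALUE);
        return cpu >= warnValue;
    }

    public static void main(String[] args) {
        BroadcastStateSketch job = new BroadcastStateSketch(2);
        System.out.println(job.processElement(0, 60.0)); // false: default threshold 80
        job.processBroadcastElement(50.0);
        System.out.println(job.processElement(0, 60.0)); // true: threshold is now 50
        System.out.println(job.processElement(1, 60.0)); // true: the other subtask sees 50 too
    }
}
```

The read-only access in `processElement` reflects the demo, where the data side receives a ReadOnlyBroadcastState and only the broadcast side writes.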

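4 TTL

Flink's managed state is convenient, but it raises a question: what if the user wants the data cleared after some time so that the calculation starts over? In the earlier ValueState example, the ValueState lives forever. Does the user have to register a custom timer to clear it periodically? No. Flink provides a TTL (Time To Live) mechanism: enable TTL when initializing the managed state, and expired data is cleared automatically.

Example: the same scenario as the earlier ValueState example, accumulating CPU values, but with a TTL of 10 seconds. With the update type used here, a key's state expires once 10 seconds (of processing time) pass without a write to it.

TTLDemo class: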

import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.api.common.functions.OpenContext;
import org.apache.flink.api.common.state.StateTtlConfig;
import org.apache.flink.api.common.state.ValueState;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.api.java.functions.KeySelector;
import org.apache.flink.api.java.tuple.Tuple3;
import org.apache.flink.streaming.api.datastream.DataStreamSource;
import org.apache.flink.streaming.api.datastream.KeyedStream;
import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.KeyedProcessFunction;
import org.apache.flink.util.Collector;

import java.time.Duration;

public class TTLDemo {

    public static void main(String[] args) throws Exception {
        // 1. Create the execution environment
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.setParallelism(1);
        // 2. Read the data
        DataStreamSource<String> text = env.socketTextStream("127.0.0.1", 9999);
        // 3. Convert each line with map
        SingleOutputStreamOperator<Tuple3<String,Double,Long>> map = text.map(new Tuple3MapFunction());
        // 4. Define a monotonically increasing watermark strategy and a TimestampAssigner
        WatermarkStrategy<Tuple3<String,Double,Long>> watermarkStrategy = WatermarkStrategy
                // Timestamps are monotonically increasing
                .<Tuple3<String,Double,Long>>forMonotonousTimestamps()
                // Set the event-time timestamp assigner
                .withTimestampAssigner((element, recordTimestamp) ->{
                    return element.f2 * 1000L;
                } );
        SingleOutputStreamOperator<Tuple3<String,Double,Long>> mapWithWatermark = map.assignTimestampsAndWatermarks(watermarkStrategy);
        // 5. keyBy
        KeyedStream<Tuple3<String,Double,Long>, String> kyStream = mapWithWatermark.keyBy(new KeySelectorFunction());
        SingleOutputStreamOperator<String> process = kyStream.process(new KeyedProcessFunction<>() {

            // 1) Declare the ValueState
            ValueState<Tuple3<String, Double, Long>> currentValue;

            @Override
            public void open(OpenContext openContext) throws Exception {
                super.open(openContext);
                // 2) Configure the TTL policy
                StateTtlConfig stateTtlConfig = StateTtlConfig
                        // Expire after 10 seconds
                        .newBuilder(Duration.ofSeconds(10))
                        // When the expiration time is refreshed. Three options: on creation and write (OnCreateAndWrite), on read and write (OnReadAndWrite), or TTL disabled so state never expires (Disabled)
                        .setUpdateType(StateTtlConfig.UpdateType.OnCreateAndWrite)
                        // Whether expired data that has not been cleaned up yet may be returned (Flink does not remove an entry at the moment it expires; cleanup happens later, so an expired entry can still be physically present)
                        // Two options: never return it (NeverReturnExpired), or return it if not cleaned up yet (ReturnExpiredIfNotCleanedUp)
                        .setStateVisibility(StateTtlConfig.StateVisibility.NeverReturnExpired)
                        .build();
                // 3) Initialize the ValueState
                ValueStateDescriptor<Tuple3<String, Double, Long>> descriptor = new ValueStateDescriptor<>("currentValue", Types.TUPLE(Types.STRING, Types.DOUBLE, Types.LONG));
                descriptor.enableTimeToLive(stateTtlConfig);
                currentValue = getRuntimeContext().getState(descriptor);
            }

            @Override
            public void processElement(Tuple3<String, Double, Long> value, KeyedProcessFunction<String, Tuple3<String, Double, Long>, String>.Context ctx, Collector<String> out) throws Exception {
                // 4) Read the ValueState value
                Tuple3<String, Double, Long> curValue = currentValue.value();
                if(curValue==null){
                    curValue = value;
                }else{
                    curValue.f1 = curValue.f1 + value.f1;
                }
                // 5) Update the ValueState value
                currentValue.update(curValue);
                out.collect(curValue.toString());
            }

        });
        // 6. Print
        process.print();
        // Execute
        env.execute();
    }


    public static class Tuple3MapFunction implements MapFunction<String, Tuple3<String,Double,Long>> {

        @Override
        public Tuple3<String, Double, Long> map(String value) throws Exception {
            String[] values = value.split(",");
            String value1 = values[0];
            double value2 = Double.parseDouble("0");
            long value3 = 0;
            if(values.length >= 2){
                try {
                    value2 = Double.parseDouble(values[1]);
                }catch (Exception e){
                    value2 = Double.parseDouble("0");
                }
            }
            if(values.length >= 3){
                try {
                    value3 = Long.parseLong(values[2]);
                }catch (Exception ignored){
                }
            }
            return new Tuple3<>(value1,value2,value3);
        }
    }

    public static class KeySelectorFunction implements KeySelector<Tuple3<String,Double,Long>, String> {

        @Override
        public String getKey(Tuple3<String,Double,Long> value) throws Exception {
            // Return the first field as the keyBy key
            return value.f0;
        }

    }
}

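Input: enter the first and second records, wait 10 seconds, then enter the third record
[Image: input data]

Output:
[Image: console output]

Key points: three records are entered in total, and the third is entered more than 10 seconds after the second
1) The console shows that the second output includes the first record's value, so cpu = 5
2) The third output shows cpu = 2.8, so the earlier data was not included: with TTL = 10 seconds, the state is cleared automatically once more than 10 seconds have passed since the last write.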
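
To make the effect of the update type concrete, here is a minimal plain-Java model of a value with TTL. It is a sketch under assumptions and does not use Flink's classes: `TtlSketch`, `TtlValue`, the `UpdateType` enum, and the injected clock are hypothetical names, and the value is dropped lazily when an expired entry is read, which corresponds to the NeverReturnExpired visibility in the demo.

```java
import java.util.function.LongSupplier;

public class TtlSketch {

    enum UpdateType { ON_CREATE_AND_WRITE, ON_READ_AND_WRITE }

    /** A single value that expires ttlMillis after its last qualifying access. */
    static final class TtlValue<T> {
        private final long ttlMillis;
        private final UpdateType updateType;
        private final LongSupplier clock; // processing-time clock, injected so it can be controlled
        private T value;
        private long lastAccess;

        TtlValue(long ttlMillis, UpdateType updateType, LongSupplier clock) {
            this.ttlMillis = ttlMillis;
            this.updateType = updateType;
            this.clock = clock;
        }

        /** Writing always refreshes the expiration time. */
        void update(T newValue) {
            value = newValue;
            lastAccess = clock.getAsLong();
        }

        /** Returns null once the value has expired; reading refreshes only for ON_READ_AND_WRITE. */
        T value() {
            if (value == null) {
                return null;
            }
            long now = clock.getAsLong();
            if (now - lastAccess >= ttlMillis) {
                value = null; // drop the expired value when it is read
                return null;
            }
            if (updateType == UpdateType.ON_READ_AND_WRITE) {
                lastAccess = now;
            }
            return value;
        }
    }

    public static void main(String[] args) {
        long[] now = {0L}; // manual clock in milliseconds
        TtlValue<Double> cpu = new TtlValue<>(10_000L, UpdateType.ON_CREATE_AND_WRITE, () -> now[0]);
        cpu.update(2.0);
        now[0] = 3_000L;
        System.out.println(cpu.value()); // 2.0: written 3 seconds ago
        now[0] = 15_000L;
        System.out.println(cpu.value()); // null: more than 10 seconds since the last write
    }
}
```

In the demo, the third record finds no state for its key for the same reason, so `processElement` takes the `curValue==null` branch and accumulation starts over.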

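5 State Backends

5.1 Types

Defining and accessing state is only one side of Flink's stateful processing. State also has to be stored, so that a restarted job can resume from a saved state instead of starting from scratch; only then is Flink's stateful model complete. With managed state, Flink takes care of storage internally. So where is the state actually kept?

According to the official documentation, Flink currently supports two state backends:

HashMapStateBackend: stores state in a hash map in the JVM memory of the subtask. This is the default. Because state is kept in memory, access is fast, but it consumes JVM memory, so it is not suitable for very large state. (Note: keeping state in memory does not mean it is never persisted. Persistence is the job of checkpoints, covered in the next chapter.)
EmbeddedRocksDBStateBackend: stores state in RocksDB, a database that persists to local disk. State has to be serialized before it is written. Because of the serialization and the local disk access, reads and writes are slower, but since the data lives on disk this backend suits large state.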

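5.2 Usage

Flink offers three ways to set the state backend, listed here by priority from highest to lowest:

Configuration in job code

Configuration config = new Configuration();
// Choose one of the following two lines
config.set(StateBackendOptions.STATE_BACKEND, "hashmap"); // HashMapStateBackend
config.set(StateBackendOptions.STATE_BACKEND, "rocksdb"); // EmbeddedRocksDBStateBackend; when running locally, add the flink-statebackend-rocksdb dependency
// Pass the configuration when creating the execution environment
StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment(config);

Specified at job submission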

flink run-application -t yarn-application -p 3 -Dstate.backend.type=rocksdb -c <fully-qualified main class> <jar file>

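Global configuration in flink-conf.yaml

state.backend.type: hashmap  # backend type, either hashmap or rocksdb
state.backend.incremental: false # whether to use incremental checkpoints

Conclusion: this chapter covered Flink's state management. State management is only one side of Flink's stateful processing. State also has to be persisted, so that a restarted job can resume from a saved state instead of starting from scratch; only then is Flink's stateful model complete. The next chapter covers how Flink persists state: checkpoints.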
