Flink Keyed State

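For aggregations such as sum and reduce, we usually rely on Flink's built-in sum and reduce operators. Some requirements, however, call for custom aggregation code. Both approaches are built on keyed state. For details, see: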

https://ci.apache.org/projects/flink/flink-docs-release-1.10/zh/dev/stream/state/state.html

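Keyed state is state that is scoped to a key. The stream is partitioned by key, and for each key Flink stores the intermediate result, such as the running count. The next record with the same key is then added on top of that stored value. Without state, an operator sees every record in isolation and has no record of what it computed for earlier records, so a running aggregate cannot be computed.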
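The idea of "partition by key, then keep state per key" can be illustrated without Flink. The sketch below is a plain-Java simulation, not the Flink API. The class and method names are hypothetical.

Records are routed to parallel instances by hash code modulo parallelism. This is a simplification: Flink actually assigns keys through key groups. Each instance keeps a running count only for the keys it owns.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Plain-Java simulation of keyed state (NOT the Flink API): records are routed to
// parallel instances by key, and each instance keeps a running count per key it owns.
public class KeyPartitionSketch {

    // Simplified routing: hashCode modulo parallelism. Flink's real assignment goes
    // through key groups, but the property that matters is the same:
    // a given key always lands on the same instance.
    static int route(String key, int parallelism) {
        return Math.floorMod(key.hashCode(), parallelism);
    }

    // Returns the state held by each instance after all words are processed.
    static List<Map<String, Integer>> run(List<String> words, int parallelism) {
        List<Map<String, Integer>> statePerInstance = new ArrayList<>();
        for (int i = 0; i < parallelism; i++) {
            statePerInstance.add(new HashMap<>());
        }
        for (String w : words) {
            Map<String, Integer> state = statePerInstance.get(route(w, parallelism));
            // Read the previous value for this key (0 if none), add 1, write it back.
            state.put(w, state.getOrDefault(w, 0) + 1);
        }
        return statePerInstance;
    }

    public static void main(String[] args) {
        List<Map<String, Integer>> states = run(Arrays.asList("a", "b", "a", "c", "a"), 2);
        for (int i = 0; i < states.size(); i++) {
            System.out.println("instance " + i + " state: " + states.get(i));
        }
    }
}
```

Every occurrence of a key reaches the same instance, so that instance's map always holds the correct total for the key. This is why the stream must be partitioned by key before keyed state can be used.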

Next, we implement word count twice: once with the built-in sum, and once with our own code.

1) Built-in sum


import org.apache.flink.api.common.functions.FlatMapFunction;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.util.Collector;

public class SumWordCount {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Read comma-separated words from a socket (host/port as in the original example)
        DataStream<String> ds = env.socketTextStream("10.203.0.53", 8000);

        ds.flatMap(new FlatMapFunction<String, Tuple2<String, Integer>>() {
            @Override
            public void flatMap(String s, Collector<Tuple2<String, Integer>> collector) throws Exception {
                // Emit (word, 1) for every comma-separated word
                for (String ss : s.split(",")) {
                    collector.collect(new Tuple2<>(ss, 1));
                }
            }
        })
        .keyBy(0)   // partition by field 0 (the word)
        .sum(1)     // rolling sum of field 1 per key; the running total is kept in keyed state
        .print();

        env.execute("sum");
    }
}

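The code above is very simple. We split each line on commas and turn each word into a Tuple2 (word, 1). We then partition by the word with keyBy and sum the second field.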
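A keyed sum is a rolling aggregation. It emits an updated total for every incoming record rather than a single final result.

The plain-Java simulation below is not Flink code. It mimics what flatMap, keyBy(0), and sum(1) would emit for two input lines. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Plain-Java simulation (not Flink code) of flatMap -> keyBy(0) -> sum(1):
// the keyed sum emits the updated total for every incoming record.
public class RollingSumSketch {

    static List<String> process(List<String> lines) {
        Map<String, Integer> totals = new HashMap<>(); // per-key running totals (the "state")
        List<String> emitted = new ArrayList<>();
        for (String line : lines) {
            for (String word : line.split(",")) {                 // flatMap: one (word, 1) per word
                int total = totals.merge(word, 1, Integer::sum);  // add field 1 to the key's total
                emitted.add("(" + word + "," + total + ")");      // one output per input record
            }
        }
        return emitted;
    }

    public static void main(String[] args) {
        // Two lines as they might arrive over the socket.
        process(Arrays.asList("hello,flink", "hello,world")).forEach(System.out::println);
    }
}
```

For these two lines the simulation prints (hello,1), (flink,1), (hello,2), (world,1). The second "hello" produces a new record carrying the updated total, rather than modifying the earlier output.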

2) Writing the aggregation yourself


import org.apache.flink.api.common.functions.FlatMapFunction;
import org.apache.flink.api.common.functions.RichMapFunction;
import org.apache.flink.api.common.state.ValueState;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.api.java.tuple.Tuple;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.datastream.KeyedStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.util.Collector;

public class KeyedStateWordCount {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        // Checkpoint every 300 ms so the keyed state is periodically snapshotted
        env.enableCheckpointing(300);

        DataStream<String> ds = env.socketTextStream("10.203.0.53", 8000);

        // Split each line on commas and emit (word, 1)
        DataStream<Tuple2<String, Integer>> ds2 = ds.flatMap(new FlatMapFunction<String, Tuple2<String, Integer>>() {
            @Override
            public void flatMap(String s, Collector<Tuple2<String, Integer>> collector) throws Exception {
                for (String ss : s.split(",")) {
                    collector.collect(new Tuple2<>(ss, 1));
                }
            }
        });

        // Partition by key (field 0, the word)
        KeyedStream<Tuple2<String, Integer>, Tuple> keyedStream = ds2.keyBy(0);

        DataStream<Tuple2<String, Integer>> sum = keyedStream.map(new RichMapFunction<Tuple2<String, Integer>, Tuple2<String, Integer>>() {
            // Keyed state: Flink keeps one value per key, scoped to the key of the current record
            private transient ValueState<Integer> valueState;

            @Override
            public void open(Configuration parameters) throws Exception {
                super.open(parameters);
                // Register the keyed state. The default-value constructor is deprecated,
                // so value() returns null for a key that has no state yet.
                ValueStateDescriptor<Integer> stateDescriptor =
                        new ValueStateDescriptor<>("sum-key-state", Types.INT);
                valueState = getRuntimeContext().getState(stateDescriptor);
            }

            @Override
            public Tuple2<String, Integer> map(Tuple2<String, Integer> value) throws Exception {
                // Read the running total for this key (null on the first occurrence)
                Integer current = valueState.value();
                // Accumulate the count for the same key
                int newSum = (current == null ? 0 : current) + value.f1;
                // Write the new total back so the next record with this key continues from it
                valueState.update(newSum);
                return Tuple2.of(value.f0, newSum);
            }
        });
        sum.print();

        env.execute("keyed Stream");
    }
}

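We build a ValueState and write each new result back with valueState.update. The next record with the same key can then continue accumulating from the stored value.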
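A single valueState field can serve every key because the runtime sets the current key before each call to map(). The value() and update() calls then touch only that key's slot.

The mock below is a plain-Java sketch of that behavior, not Flink's implementation. MockValueState, setCurrentKey, and processRecord are hypothetical names.

```java
import java.util.HashMap;
import java.util.Map;

// Plain-Java mock (NOT Flink's implementation) of a key-scoped ValueState:
// the runtime sets the "current key", and value()/update() use that key's slot only.
public class ValueStateSketch {

    // Hypothetical stand-in for ValueState<Integer>.
    static final class MockValueState {
        private final Map<String, Integer> slots = new HashMap<>();
        private String currentKey;

        void setCurrentKey(String key) { currentKey = key; }  // done by the runtime in Flink
        Integer value() { return slots.get(currentKey); }      // null until the first update
        void update(Integer v) { slots.put(currentKey, v); }
    }

    private final MockValueState valueState = new MockValueState();

    // Mirrors the body of map() above: read, accumulate, write back.
    int map(String word, int one) {
        Integer current = valueState.value();
        int sum = (current == null ? 0 : current) + one;
        valueState.update(sum);
        return sum;
    }

    // Plays the role of the runtime: set the record's key on the state, then call map().
    int processRecord(String word) {
        valueState.setCurrentKey(word);
        return map(word, 1);
    }

    public static void main(String[] args) {
        ValueStateSketch op = new ValueStateSketch();
        for (String w : new String[] {"a", "b", "a"}) {
            System.out.println("(" + w + "," + op.processRecord(w) + ")");
        }
    }
}
```

Processing "a", "b", "a" yields totals 1, 1, 2. The second "a" sees the value stored by the first "a", while "b" starts from null.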

Author: tom_fans