Flink word count on the same data source: the DataSet API prints words with a count of 1, but the DataStream API in batch mode does not

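1. Problem: when counting words from the same data source, the DataSet API counts and prints the words that occur only once, but the DataStream API running in batch mode (RuntimeExecutionMode.BATCH) does not print the words whose count is 1.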

Input data: "word world hello", "word world world hello", "kuan hello word"
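For reference, the expected result for this input can be worked out by hand: hello, word and world each appear 3 times, and kuan appears exactly once. The plain-Java sketch below (no Flink involved; the class name `ExpectedWordCount` is made up for illustration) computes the same thing with a map, so it can be compared against what each Flink job prints.

```java
import java.util.Map;
import java.util.TreeMap;

public class ExpectedWordCount {

    // Counts words in the given lines, splitting on a single space like the Flink jobs do.
    static Map<String, Integer> count(String... lines) {
        Map<String, Integer> counts = new TreeMap<>(); // TreeMap keeps keys sorted for stable output
        for (String line : lines) {
            for (String word : line.split(" ")) {
                counts.merge(word, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = count("word world hello", "word world world hello", "kuan hello word");
        System.out.println(counts); // prints {hello=3, kuan=1, word=3, world=3}
    }
}
```

Any correct job should therefore print four results, including (kuan,1).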

2. Screenshots of the problem (batch / stream):

3. Code (DataSet batch job / DataStream job):

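import org.apache.flink.api.common.functions.FlatMapFunction;
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.util.Collector;

// Word count with the DataSet API: "kuan" (count 1) is printed.
public class DataSetWordCount {

    public static void main(String[] args) throws Exception {

        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

        DataSet<String> source = env.fromElements("word world hello", "word world world hello", "kuan hello word");

        // Split each line into words, emit (word, 1), then group by the word and sum the counts.
        DataSet<Tuple2<String, Integer>> wordCount = source.flatMap(new FlatMapFunction<String, Tuple2<String, Integer>>() {
            @Override
            public void flatMap(String value, Collector<Tuple2<String, Integer>> out) throws Exception {
                String[] words = value.split(" ");
                for (String word : words) {
                    out.collect(new Tuple2<>(word, 1));
                }
            }
        }).groupBy(0).sum(1);

        // In the DataSet API, print() triggers execution, so no env.execute() call is needed.
        wordCount.print();
    }
}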
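
import org.apache.flink.api.common.RuntimeExecutionMode;
import org.apache.flink.api.common.functions.FlatMapFunction;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.util.Collector;

// The same word count with the DataStream API in BATCH mode: on Flink 1.12.0, "kuan" (count 1) is not printed.
public class DataStreamWordCount1 {

    public static void main(String[] args) throws Exception {

        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.setParallelism(1);
        // Execute this bounded DataStream program with batch semantics.
        env.setRuntimeMode(RuntimeExecutionMode.BATCH);

        DataStream<String> lines = env.fromElements("word world hello", "word world world hello", "kuan hello word");

        // Split each line into words, emit (word, 1), then key by the word and sum the counts.
        DataStream<Tuple2<String, Integer>> result = lines.flatMap(new FlatMapFunction<String, Tuple2<String, Integer>>() {
            @Override
            public void flatMap(String value, Collector<Tuple2<String, Integer>> out) throws Exception {
                String[] words = value.split(" ");
                for (String word : words) {
                    out.collect(new Tuple2<>(word, 1));
                }
            }
        }).keyBy(t -> t.f0).sum(1);

        result.print();

        // Unlike the DataSet API, a DataStream program only runs when execute() is called.
        env.execute();
    }
}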

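4. Cause: the slot was shut down before the data had finished being computed. (This is the author's diagnosis. The symptom, where exactly the keys that received a single record are missing from a keyed sum in BATCH mode, also appears to match a bug reported against Flink 1.12.0 and fixed in later 1.12.x patch releases, which would explain why upgrading helps.)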
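One way to picture how a keyed batch sum can lose exactly the keys that occur once is the toy model below, in plain Java. It assumes the operator keeps one running value per key and, at end of input, emits only the keys that were "scheduled" for output; in the hypothetical `buggy` variant, a key is scheduled only when a second record for it is reduced. The class, method names and the `buggy` flag are all made up for illustration; this is not Flink's operator code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Toy model (NOT Flink source code) of a keyed sum that emits final per-key values at end of input.
public class SingletonDropModel {

    // Splits lines into words on a single space.
    static List<String> tokenize(String... lines) {
        List<String> words = new ArrayList<>();
        for (String line : lines) {
            words.addAll(Arrays.asList(line.split(" ")));
        }
        return words;
    }

    // Hypothetical flag: if buggy is true, a key is scheduled for emission only when
    // a second record for it arrives, so keys seen exactly once are never emitted.
    static Map<String, Integer> keyedSum(List<String> words, boolean buggy) {
        Map<String, Integer> state = new HashMap<>(); // running sum per key
        Set<String> scheduled = new HashSet<>();      // keys to emit at end of input
        for (String word : words) {
            Integer current = state.get(word);
            if (current == null) {
                state.put(word, 1);
                if (!buggy) {
                    scheduled.add(word); // correct variant: schedule on the first record
                }
            } else {
                state.put(word, current + 1);
                scheduled.add(word);     // both variants schedule once a reduce happens
            }
        }
        // End of input: emit the final value of every scheduled key (sorted for stable output).
        Map<String, Integer> out = new TreeMap<>();
        for (String key : scheduled) {
            out.put(key, state.get(key));
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> words = tokenize("word world hello", "word world world hello", "kuan hello word");
        System.out.println("correct: " + keyedSum(words, false)); // {hello=3, kuan=1, word=3, world=3}
        System.out.println("buggy:   " + keyedSum(words, true));  // {hello=3, word=3, world=3}
    }
}
```

In the model, the "buggy" run reproduces the reported symptom exactly: hello, word and world are printed with count 3, and kuan disappears.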

5. Solutions:

(1) Workaround for Flink 1.12.0: add a window. With it, "kuan", the word that occurs only once, is counted as well.

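import org.apache.flink.api.common.RuntimeExecutionMode;
import org.apache.flink.api.common.functions.FlatMapFunction;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.windowing.assigners.TumblingProcessingTimeWindows;
import org.apache.flink.streaming.api.windowing.time.Time;
import org.apache.flink.util.Collector;

public class DataStreamWordCount1 {

    public static void main(String[] args) throws Exception {

        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.setRuntimeMode(RuntimeExecutionMode.BATCH);

        DataStream<String> lines = env.fromElements("word world hello", "word world world hello", "kuan hello word");

        DataStream<Tuple2<String, Integer>> result = lines.flatMap(new FlatMapFunction<String, Tuple2<String, Integer>>() {
            @Override
            public void flatMap(String value, Collector<Tuple2<String, Integer>> out) throws Exception {
                String[] words = value.split(" ");
                for (String word : words) {
                    out.collect(new Tuple2<>(word, 1));
                }
            }
        }).keyBy(t -> t.f0)
                // Workaround on 1.12.0: sum per key inside a 1-second tumbling processing-time window.
                .window(TumblingProcessingTimeWindows.of(Time.seconds(1)))
                .sum(1)
                .setParallelism(1);

        result.print();

        env.execute();
    }
}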
Output:
1> (world,3)
7> (word,3)
6> (kuan,1)
8> (hello,3)

Process finished with exit code 0

(2) Alternatively, upgrade Flink to version 1.12.4.
