1. Problem: For the same data source, the DataSet API counts and prints the word whose count is 1, but the DataStream API running in batch execution mode does not print any word whose count is 1.
"word world hello", "word world world hello", "kuan hello word"
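For reference, the counts we should expect from these three lines can be checked with a plain-Java sketch (no Flink involved; the class name `ExpectedCounts` is just for illustration). "kuan" is the only word that appears exactly once:

```java
import java.util.HashMap;
import java.util.Map;

public class ExpectedCounts {
    // Split each line on spaces and tally every word.
    public static Map<String, Integer> count(String... lines) {
        Map<String, Integer> counts = new HashMap<>();
        for (String line : lines)
            for (String word : line.split(" "))
                counts.merge(word, 1, Integer::sum);
        return counts;
    }

    public static void main(String[] args) {
        Map<String, Integer> c =
            count("word world hello", "word world world hello", "kuan hello word");
        System.out.println(c); // word=3, world=3, hello=3, kuan=1
    }
}
```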
2. Problem screenshots (batch / streaming):
3. Code (batch / streaming):
import org.apache.flink.api.common.functions.FlatMapFunction;
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.util.Collector;

public class DataSetWordCount {
    public static void main(String[] args) throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
        DataSet<String> source = env.fromElements("word world hello", "word world world hello", "kuan hello word");
        DataSet<Tuple2<String, Integer>> wordCount = source.flatMap(new FlatMapFunction<String, Tuple2<String, Integer>>() {
            @Override
            public void flatMap(String value, Collector<Tuple2<String, Integer>> out) throws Exception {
                // Emit (word, 1) for every space-separated token
                String[] words = value.split(" ");
                for (String word : words) {
                    out.collect(new Tuple2<>(word, 1));
                }
            }
        }).groupBy(0).sum(1); // group by the word (field 0), sum the counts (field 1)
        wordCount.print();
    }
}
import org.apache.flink.api.common.RuntimeExecutionMode;
import org.apache.flink.api.common.functions.FlatMapFunction;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.util.Collector;

public class DataStreamWordCount1 {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.setParallelism(1);
        env.setRuntimeMode(RuntimeExecutionMode.BATCH); // run the stream job in batch execution mode
        DataStream<String> lines = env.fromElements("word world hello", "word world world hello", "kuan hello word");
        DataStream<Tuple2<String, Integer>> result = lines.flatMap(new FlatMapFunction<String, Tuple2<String, Integer>>() {
            @Override
            public void flatMap(String value, Collector<Tuple2<String, Integer>> out) throws Exception {
                String[] words = value.split(" ");
                for (String word : words) {
                    out.collect(new Tuple2<>(word, 1));
                }
            }
        }).keyBy(t -> t.f0).sum(1); // key by the word, sum the counts
        result.print();
        env.execute();
    }
}
4. Cause: the slot is shut down before the data has finished being processed, so the last result is never printed.
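The stated cause can be illustrated with a toy model (plain Java, not Flink internals; the class name and the simulated early-shutdown flag are assumptions for illustration only). In batch-style keyed aggregation, results are flushed only after all input is consumed, so if emission is cut off before the final flush completes, a trailing key such as "kuan" never reaches the printer:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class BatchEmitModel {
    // Count all words first, then flush results at end-of-input.
    // If slotClosesEarly is true, the flush is interrupted one entry
    // before the end, dropping the last key.
    public static List<String> run(String[] lines, boolean slotClosesEarly) {
        Map<String, Integer> counts = new LinkedHashMap<>(); // insertion order
        for (String line : lines)
            for (String w : line.split(" "))
                counts.merge(w, 1, Integer::sum);
        List<String> emitted = new ArrayList<>();
        int flushed = 0, total = counts.size();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            if (slotClosesEarly && flushed == total - 1) break; // flush cut short
            emitted.add("(" + e.getKey() + "," + e.getValue() + ")");
            flushed++;
        }
        return emitted;
    }

    public static void main(String[] args) {
        String[] lines = {"word world hello", "word world world hello", "kuan hello word"};
        System.out.println(run(lines, true));  // "(kuan,1)" is missing
        System.out.println(run(lines, false)); // all four words printed
    }
}
```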
5. Solutions:
(1) On Flink 1.12.0, add a window; the word "kuan", whose count is 1, is then included in the printed result.
import org.apache.flink.api.common.RuntimeExecutionMode;
import org.apache.flink.api.common.functions.FlatMapFunction;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.windowing.assigners.TumblingProcessingTimeWindows;
import org.apache.flink.streaming.api.windowing.time.Time;
import org.apache.flink.util.Collector;

public class DataStreamWordCount1 {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.setRuntimeMode(RuntimeExecutionMode.BATCH);
        DataStream<String> lines = env.fromElements("word world hello", "word world world hello", "kuan hello word");
        DataStream<Tuple2<String, Integer>> result = lines.flatMap(new FlatMapFunction<String, Tuple2<String, Integer>>() {
            @Override
            public void flatMap(String value, Collector<Tuple2<String, Integer>> out) throws Exception {
                String[] words = value.split(" ");
                for (String word : words) {
                    out.collect(new Tuple2<>(word, 1));
                }
            }
        }).keyBy(t -> t.f0)
          // aggregate inside 1-second tumbling processing-time windows
          .window(TumblingProcessingTimeWindows.of(Time.seconds(1)))
          .sum(1).setParallelism(1);
        result.print();
        env.execute();
    }
}
Console output:
1> (world,3)
7> (word,3)
6> (kuan,1)
8> (hello,3)
Process finished with exit code 0
(2) Alternatively, upgrade the Flink version to 1.12.4.