Flink Word Count batch processing code example
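package org.chenxilin.flink.wordcount;

import org.apache.flink.api.java.ExecutionEnvironment;

public class WordCountBatch {
    public static void main(String[] args) throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
        env
                // read the input file line by line
                .readTextFile("Java/src/main/resources/flink/word_count_batch.txt")
                // split each line into (word, 1) pairs
                .flatMap(new WordCountFlatMapFunction())
                // group the pairs by word (field f0)
                .groupBy(x -> x.f0)
                // add up the counts within each group
                .reduce(new WordCountReduceFunction())
                // in batch mode print() is eager: it also triggers job execution
                .print();
        // env.execute("WordCountBatch");  // not needed here, see the notes below
    }
}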
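Notes:
- When implementing operators such as flatMap, groupBy, and reduce, prefer creating a separate class that implements the corresponding interface instead of putting all of the code in a single file.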
flatMap step
Create a new class, WordCountFlatMapFunction, and then pass it in with .flatMap(new WordCountFlatMapFunction()).
package org.chenxilin.flink.wordcount;

import org.apache.flink.api.common.functions.FlatMapFunction;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.util.Collector;

// Turns each input line into (word, 1) pairs
public class WordCountFlatMapFunction implements FlatMapFunction<String, Tuple2<String, Integer>> {
    @Override
    public void flatMap(String value, Collector<Tuple2<String, Integer>> collector) {
        // lowercase the line and split it on runs of non-word characters
        String[] splits = value.toLowerCase().split("\\W+");
        for (String split : splits) {
            // split() yields an empty first token when the line starts with a separator
            if (split.length() > 0) {
                collector.collect(new Tuple2<>(split, 1));
            }
        }
    }
}
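To see why the length check in flatMap is needed, here is a small plain-Java sketch (no Flink dependency) that applies the same lowercase + split("\\W+") logic to a sample line. The class name TokenizeDemo and its two methods are hypothetical, chosen only for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical demo class: reproduces the tokenizing logic of WordCountFlatMapFunction
class TokenizeDemo {

    // Raw result of split("\\W+"): may start with an empty string
    static List<String> rawSplit(String line) {
        return Arrays.asList(line.toLowerCase().split("\\W+"));
    }

    // Same filtering as the flatMap: drop empty tokens before emitting them
    static List<String> tokens(String line) {
        List<String> out = new ArrayList<>();
        for (String s : line.toLowerCase().split("\\W+")) {
            if (s.length() > 0) {
                out.add(s);
            }
        }
        return out;
    }
}
```

For the line "  --Hello, world!  hello", rawSplit returns ["", "hello", "world", "hello"] (the leading separator produces an empty first element, while trailing empty strings are discarded by split), and tokens returns ["hello", "world", "hello"]. Note that by default Java's \W treats every character outside [a-zA-Z_0-9] as a separator, so accented or CJK text is broken apart or dropped; the (?U) flag (Pattern.UNICODE_CHARACTER_CLASS) makes \W Unicode-aware.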
reduce step
package org.chenxilin.flink.wordcount;

import org.apache.flink.api.common.functions.ReduceFunction;
import org.apache.flink.api.java.tuple.Tuple2;

// Merges two (word, count) pairs of the same word into one by adding the counts
public class WordCountReduceFunction implements ReduceFunction<Tuple2<String, Integer>> {
    @Override
    public Tuple2<String, Integer> reduce(Tuple2<String, Integer> t1, Tuple2<String, Integer> t2) {
        // both tuples come from the same group, so t1.f0 equals t2.f0
        return new Tuple2<>(t1.f0, t1.f1 + t2.f1);
    }
}
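To make the groupBy(...).reduce(...) step concrete, here is a plain-Java sketch of what it computes: pairs are grouped by word, and each group is folded pairwise with the same add-the-counts logic as WordCountReduceFunction. This is a local model, not how Flink executes the job internally; Map.Entry stands in for Flink's Tuple2, and the class GroupReduceDemo is hypothetical.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BinaryOperator;

// Hypothetical local model of groupBy(word).reduce(sum of counts); not Flink's implementation
class GroupReduceDemo {

    // Same logic as WordCountReduceFunction.reduce: keep the word, add the counts
    static final BinaryOperator<Map.Entry<String, Integer>> SUM_COUNTS =
            (t1, t2) -> new SimpleEntry<>(t1.getKey(), t1.getValue() + t2.getValue());

    // Groups the pairs by word and reduces each group to a single pair
    static Map<String, Integer> groupAndReduce(List<Map.Entry<String, Integer>> pairs) {
        Map<String, Map.Entry<String, Integer>> partial = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            // merge() only calls SUM_COUNTS when the word has been seen before
            partial.merge(pair.getKey(), pair, SUM_COUNTS);
        }
        Map<String, Integer> result = new LinkedHashMap<>();
        partial.forEach((word, tuple) -> result.put(word, tuple.getValue()));
        return result;
    }
}
```

Given the pairs (hello, 1), (world, 1), (hello, 1), groupAndReduce returns hello=2 and world=1. In the real job the pairs of one group may be combined in any order and on different machines, which is harmless here because addition is associative and commutative.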
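- The commented-out line at the end of main,
env.execute("WordCountBatch");
is not needed in batch mode, because print() in batch mode already triggers execution. Adding the line back causes the following error: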
Exception in thread "main" java.lang.RuntimeException: No new data sinks have been defined since the last execution. The last execution refers to the latest call to 'execute()', 'count()', 'collect()', or 'print()'.
Cause: in offline batch processing, operators such as count(), collect(), and print() act as a sink and also trigger execution of the job.
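Solution:
Because the code above already calls print(), which triggers execution automatically, the final execute() call has no new sinks left to run. Simply remove that last line.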
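The error can be pictured with a toy model of the sink bookkeeping. ToySinkTracker below is hypothetical and is not Flink code: a lazy sink (in the DataSet API, for example writeAsText(...)) only registers itself, while an eager operator like print() registers a sink and runs the job immediately, so a later execute() finds nothing new to run.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model, NOT Flink code: mimics how a batch environment tracks pending sinks
class ToySinkTracker {
    private final List<String> pendingSinks = new ArrayList<>();
    final List<String> executedSinks = new ArrayList<>();

    // Lazy sink (like writeAsText): only registers the sink
    void addSink(String name) {
        pendingSinks.add(name);
    }

    // Eager sink (like DataSet.print): registers a sink and runs the job at once
    void print(String name) {
        addSink(name);
        execute();
    }

    // Runs all pending sinks; fails if none were defined since the last run
    void execute() {
        if (pendingSinks.isEmpty()) {
            throw new RuntimeException(
                    "No new data sinks have been defined since the last execution.");
        }
        executedSinks.addAll(pendingSinks);
        pendingSinks.clear();
    }
}
```

Calling print("counts") followed by execute() on the same tracker throws, just like the job above, while addSink("file") followed by execute() succeeds. In the real job, the matching alternative is to replace print() with a lazy sink such as writeAsText(...) if you want to keep an explicit env.execute(...) call.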
Full source code: https://github.com/xiligey/Notes/tree/master/Java/src/main/java/org/chenxilin/flink/wordcount