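When learning Flink, the first starter program is WordCount. The official example implements it with anonymous classes, which makes the code look verbose. So I tried rewriting it with lambdas, ran into quite a few pitfalls along the way, and am recording them here.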
Flink version: 1.9
Official version
    public class SocketWindowWordCount {

        public static void main(String[] args) throws Exception {

            // the port to connect to
            final int port;
            try {
                final ParameterTool params = ParameterTool.fromArgs(args);
                port = params.getInt("port");
            } catch (Exception e) {
                System.err.println("No port specified. Please run 'SocketWindowWordCount --port <port>'");
                return;
            }

            // get the execution environment
            final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

            // get input data by connecting to the socket
            DataStream<String> text = env.socketTextStream("localhost", port, "\n");

            // parse the data, group it, window it, and aggregate the counts
            DataStream<WordWithCount> windowCounts = text
                .flatMap(new FlatMapFunction<String, WordWithCount>() {
                    @Override
                    public void flatMap(String value, Collector<WordWithCount> out) {
                        for (String word : value.split("\\s")) {
                            out.collect(new WordWithCount(word, 1L));
                        }
                    }
                })
                .keyBy("word")
                .timeWindow(Time.seconds(5), Time.seconds(1))
                .reduce(new ReduceFunction<WordWithCount>() {
                    @Override
                    public WordWithCount reduce(WordWithCount a, WordWithCount b) {
                        return new WordWithCount(a.word, a.count + b.count);
                    }
                });

            // print the results with a single thread, rather than in parallel
            windowCounts.print().setParallelism(1);

            env.execute("Socket Window WordCount");
        }

        // Data type for words with count
        public static class WordWithCount {

            public String word;
            public long count;

            public WordWithCount() {}

            public WordWithCount(String word, long count) {
                this.word = word;
                this.count = count;
            }

            @Override
            public String toString() {
                return word + " : " + count;
            }
        }
    }
Lambda version 1: POJO
    package com.my.study.flink;

    import org.apache.flink.api.java.utils.ParameterTool;
    import org.apache.flink.streaming.api.datastream.DataStream;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
    import org.apache.flink.util.Collector;

    /**
     * Description:
     *
     * @author adore.chen
     * @date 2019-11-19
     */
    public class SocketStreamWordCount {

        public static void main(String[] args) throws Exception {
            ParameterTool tool = ParameterTool.fromArgs(args);
            int port = tool.getInt("port");

            StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
            DataStream<String> dataStream = env.socketTextStream("localhost", port, "\n");

            dataStream.flatMap((String value, Collector<WordCount> out) -> {
                    for (String word : value.split("\\s")) {
                        if (word.trim().length() > 0) {
                            out.collect(new WordCount(word, 1));
                        }
                    }
                })
                .returns(WordCount.class)
                .keyBy((WordCount wc) -> wc.word)
                .reduce((WordCount wc1, WordCount wc2) -> new WordCount(wc1.word, wc1.count + wc2.count))
                .print();

            env.execute("socket word count");
        }

        public static class WordCount {

            private String word;
            private int count;

            public WordCount(String word, int count) {
                this.word = word;
                this.count = count;
            }

            @Override
            public String toString() {
                return word + ":" + count;
            }
        }
    }
Error 1: the generic type parameters of Collector are missing
InvalidTypesException: The generic type parameters of 'Collector' are missing. In many cases lambda methods don't provide enough information for automatic type extraction when Java generics are involved. An easy workaround is to use an (anonymous) class instead that implements the 'org.apache.flink.api.common.functions.FlatMapFunction' interface. Otherwise the type has to be specified explicitly using type information.
at org.apache.flink.api.java.typeutils.TypeExtractionUtils.validateLambdaType(TypeExtractionUtils.java:350)
at org.apache.flink.api.java.typeutils.TypeExtractionUtils.extractTypeFromLambda(TypeExtractionUtils.java:176)
at org.apache.flink.api.java.typeutils.TypeExtractor.getUnaryOperatorReturnType(TypeExtractor.java:571)
at org.apache.flink.api.java.typeutils.TypeExtractor.getFlatMapReturnTypes(TypeExtractor.java:196)
at org.apache.flink.streaming.api.datastream.DataStream.flatMap(DataStream.java:611)
at com.coupang.ecfds.flink.SocketStreamWordCount.main(SocketStreamWordCount.java:24)
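Once a lambda expression is compiled, its generic type information is erased, so Flink cannot work out the output type of flatMap and it has to be specified explicitly. This is done by calling returns(...) on the result of flatMap, for example returns(WordCount.class) or returns(TypeInformation).
For details, see: Flink TypeInformation https://www.cnblogs.com/qcloud1001/p/9626462.html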
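To make the erasure concrete, here is a small plain-Java sketch that needs no Flink dependency. It is only an illustration of the underlying JVM behavior, not Flink's actual type extraction logic, and the class name ErasureDemo and its helper methods are made up for this example. It asks the runtime class of an anonymous class and of an equivalent lambda which type arguments it recorded for the interface it implements.

```java
import java.lang.reflect.ParameterizedType;
import java.lang.reflect.Type;
import java.util.Arrays;
import java.util.function.Function;

// Hypothetical demo class; not part of Flink or of the original program.
final class ErasureDemo {

    /** The function written as an anonymous class, like the official example. */
    static Function<String, Integer> anonymousFunction() {
        return new Function<String, Integer>() {
            @Override
            public Integer apply(String s) {
                return s.length();
            }
        };
    }

    /** The same function written as a lambda. */
    static Function<String, Integer> lambdaFunction() {
        return s -> s.length();
    }

    /**
     * Returns the type arguments recorded for the first parameterized
     * interface of the object's runtime class, or an empty array if the
     * class only records raw interfaces.
     */
    static Type[] typeArguments(Object function) {
        for (Type iface : function.getClass().getGenericInterfaces()) {
            if (iface instanceof ParameterizedType) {
                return ((ParameterizedType) iface).getActualTypeArguments();
            }
        }
        return new Type[0];
    }

    public static void main(String[] args) {
        // The anonymous class keeps String and Integer in its class metadata.
        System.out.println("anonymous class: "
                + Arrays.toString(typeArguments(anonymousFunction())));
        // The lambda's runtime class only implements the raw Function interface.
        System.out.println("lambda: "
                + Arrays.toString(typeArguments(lambdaFunction())));
    }
}
```

The anonymous class reports two type arguments while the lambda reports none, which is the same kind of gap that returns(...) fills in by hand for Flink.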
Error 2: .keyBy("word") fails because the type cannot be used as a key
InvalidProgramException: This type (GenericType<com.coupang.ecfds.flink.SocketStreamWordCount.WordCount>) cannot be used as key.
at org.apache.flink.api.common.operators.Keys$ExpressionKeys.<init>(Keys.java:330)
at org.apache.flink.streaming.api.datastream.DataStream.keyBy(DataStream.java:337)
at com.coupang.ecfds.flink.SocketStreamWordCount.main(SocketStreamWordCount.java:32)
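At first I took this for a bug in Flink, but the GenericType<...> in the message points to the real cause: Flink did not recognize WordCount as a POJO. The class has private fields without getters and setters and no public no-argument constructor, so Flink falls back to treating it as a generic type, and field-expression keys such as keyBy("word") cannot be applied to it. The official WordWithCount has public fields and a public no-argument constructor, which is why keyBy("word") works there. Rather than reshaping the class, I implemented the KeySelector functional interface with a lambda.
Solution
    .keyBy((WordCount wc) -> wc.word)
References: KeySelector https://www.jianshu.com/p/3763854d609b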
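The POJO conditions behind Error 2 can be checked roughly with plain reflection. The sketch below is only an approximation written for this post and is not Flink's own analysis: it checks that the class is public, has a public no-argument constructor, and that every declared field is either public or has a getter and setter. It ignores superclass fields, boolean isXxx getters, and whether field types are serializable. The names PojoRuleSketch, PojoWordCount and looksLikePojo are hypothetical.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;

// Hypothetical helper; an approximation of the POJO rules, not Flink code.
final class PojoRuleSketch {

    /** Same shape as WordCount in version 1: private fields, no no-arg constructor. */
    public static class WordCount {
        private String word;
        private int count;

        public WordCount(String word, int count) {
            this.word = word;
            this.count = count;
        }
    }

    /** Assumed POJO-friendly variant: public fields plus a public no-arg constructor. */
    public static class PojoWordCount {
        public String word;
        public int count;

        public PojoWordCount() {}

        public PojoWordCount(String word, int count) {
            this.word = word;
            this.count = count;
        }
    }

    /** Rough check of the three structural POJO rules described above. */
    static boolean looksLikePojo(Class<?> type) {
        if (!Modifier.isPublic(type.getModifiers())) {
            return false;
        }
        try {
            type.getConstructor(); // only finds a public no-arg constructor
        } catch (NoSuchMethodException e) {
            return false;
        }
        for (Field field : type.getDeclaredFields()) {
            int mods = field.getModifiers();
            if (field.isSynthetic() || Modifier.isStatic(mods) || Modifier.isTransient(mods)) {
                continue; // not part of the data layout checked here
            }
            if (!Modifier.isPublic(mods) && !hasGetterAndSetter(type, field)) {
                return false;
            }
        }
        return true;
    }

    /** Looks for public getXxx() and setXxx(fieldType) methods for the field. */
    static boolean hasGetterAndSetter(Class<?> type, Field field) {
        String name = field.getName();
        String suffix = Character.toUpperCase(name.charAt(0)) + name.substring(1);
        try {
            type.getMethod("get" + suffix);
            type.getMethod("set" + suffix, field.getType());
            return true;
        } catch (NoSuchMethodException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("WordCount looks like a POJO: " + looksLikePojo(WordCount.class));
        System.out.println("PojoWordCount looks like a POJO: " + looksLikePojo(PojoWordCount.class));
    }
}
```

Under these assumptions the version 1 class fails the check and the variant passes, so reshaping WordCount along the lines of PojoWordCount would be the alternative to the KeySelector lambda if field-expression keys are preferred.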
Lambda version 2: Tuple2
    package com.coupang.ecfds.flink;

    import org.apache.flink.api.common.typeinfo.Types;
    import org.apache.flink.api.java.tuple.Tuple2;
    import org.apache.flink.api.java.utils.ParameterTool;
    import org.apache.flink.streaming.api.datastream.DataStream;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
    import org.apache.flink.util.Collector;

    /**
     * Description:
     *
     * @author adore.chen
     * @date 2019-11-19
     */
    public class SocketStreamWordCount {

        public static void main(String[] args) throws Exception {
            ParameterTool tool = ParameterTool.fromArgs(args);
            int port = tool.getInt("port");

            StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
            DataStream<String> dataStream = env.socketTextStream("localhost", port, "\n");

            dataStream.flatMap((String value, Collector<Tuple2<String, Integer>> out) -> {
                    for (String word : value.split("\\s")) {
                        if (word.trim().length() > 0) {
                            out.collect(new Tuple2<>(word, 1));
                        }
                    }
                })
                .returns(Types.TUPLE(Types.STRING, Types.INT))
                .keyBy(0)
                .reduce((Tuple2<String, Integer> wc1, Tuple2<String, Integer> wc2) ->
                        new Tuple2<>(wc1.f0, wc1.f1 + wc2.f1))
                .print();

            env.execute("socket word count");
        }
    }
This feels a lot more concise.