While practicing a word count with Flink, I wrote the following code:
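import org.apache.flink.api.common.functions.FlatMapFunction;
import org.apache.flink.api.java.tuple.Tuple;
import org.apache.flink.streaming.api.datastream.DataStreamSource;
import org.apache.flink.streaming.api.datastream.KeyedStream;
import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
import org.apache.flink.streaming.api.datastream.WindowedStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.windowing.time.Time;
import org.apache.flink.streaming.api.windowing.windows.TimeWindow;
import org.apache.flink.util.Collector;

public class WordCount_Java {
    public static void main(String[] args) throws Exception {
        // Host name and port of the data source. Running "nc -lk 6666" on master
        // turns that port into the input for this job.
        int port = 6666;
        String hostName = "master";
        // Initialize the env object, which plays a role similar to Spark's context object
        StreamExecutionEnvironment env =
                StreamExecutionEnvironment.getExecutionEnvironment();
        // Connect to the host and port to read the data
        DataStreamSource<String> data = env.socketTextStream(hostName, port);
        // Start the computation
        // Turn every word into a (word, 1) record
        SingleOutputStreamOperator<WordWithCount> wordWithNum = data.flatMap(
                new FlatMapFunction<String, WordWithCount>() {
                    @Override
                    public void flatMap(String s,
                                        Collector<WordWithCount> out) throws Exception {
                        // The Collector here is org.apache.flink.util.Collector
                        String[] words = s.split("\\s");
                        for (String word : words) {
                            out.collect(new WordWithCount(word, 1L));
                        }
                    }
                }
        );
        // Group the records by key
        KeyedStream<WordWithCount, Tuple> grouped = wordWithNum.keyBy("word");
        // Apply a window to the keyed stream
        // Set the window size and the slide interval
        WindowedStream<WordWithCount, Tuple, TimeWindow> window =
                grouped.timeWindow(Time.seconds(2), Time.seconds(1));
        SingleOutputStreamOperator<WordWithCount> countsRes =
                window.sum("count");
        countsRes.print().setParallelism(1);
        env.execute();
    }

    // Declared private in this first version; this is what triggers the error shown below
    private static class WordWithCount {
        public String word;
        public long count;

        public WordWithCount() {
        }

        public WordWithCount(String word, long count) {
            this.word = word;
            this.count = count;
        }

        @Override
        public String toString() {
            return "WordWithCount{" +
                    "word='" + word + '\'' +
                    ", count=" + count +
                    '}';
        }
    }
}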
Running it fails with this error:
Exception in thread "main" org.apache.flink.api.common.InvalidProgramException:
This type (GenericType<com.cbq.flink_test_na1906.WordCount_Java.WordWithCount>) cannot be used as key.
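I was baffled by this for a while. After several attempts I found that the custom class must be declared public; once I changed private to public, the job ran normally. In addition, the custom class must provide a no-argument constructor. Both requirements come from Flink's POJO rules: a class is only recognized as a POJO if it is public, has a public no-argument constructor, and its fields are either public or reachable through getters and setters. A class that fails this check is treated as a GenericType, as the error message shows, and a field-name key such as keyBy("word") cannot be resolved on it.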
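
To make the two conditions above concrete, here is a minimal sketch that checks them with plain Java reflection, so it runs without Flink on the classpath. The names PojoCheck, looksLikePojo, PrivateWordWithCount and privateExample are made up for this illustration, and the check is only a rough approximation of Flink's POJO rules (for example, it ignores is-style boolean getters); it is not Flink's own type extraction code.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;

public class PojoCheck {

    // Fixed version of the class from the post: public, with a public no-arg constructor.
    public static class WordWithCount {
        public String word;
        public long count;

        public WordWithCount() {
        }

        public WordWithCount(String word, long count) {
            this.word = word;
            this.count = count;
        }
    }

    // Same shape as the failing class: the only difference is the private modifier.
    private static class PrivateWordWithCount {
        public String word;
        public long count;

        public PrivateWordWithCount() {
        }
    }

    // Hands out the private class so that code outside PojoCheck can test it (illustration only).
    public static Class<?> privateExample() {
        return PrivateWordWithCount.class;
    }

    // Hypothetical helper: approximates Flink's POJO conditions, it is not Flink code.
    public static boolean looksLikePojo(Class<?> type) {
        if (!Modifier.isPublic(type.getModifiers())) {
            return false; // condition 1: the class itself must be public
        }
        try {
            type.getConstructor(); // condition 2: a public no-argument constructor must exist
        } catch (NoSuchMethodException e) {
            return false;
        }
        for (Field field : type.getDeclaredFields()) {
            int mods = field.getModifiers();
            if (Modifier.isStatic(mods) || Modifier.isTransient(mods)) {
                continue; // static and transient fields are not treated as data fields here
            }
            if (!Modifier.isPublic(mods) && !hasGetterAndSetter(type, field)) {
                return false; // condition 3: fields must be public or have a getter and a setter
            }
        }
        return true;
    }

    // Looks for public getX() and setX(fieldType) methods matching the field name.
    private static boolean hasGetterAndSetter(Class<?> type, Field field) {
        String name = field.getName();
        String suffix = Character.toUpperCase(name.charAt(0)) + name.substring(1);
        try {
            type.getMethod("get" + suffix);
            type.getMethod("set" + suffix, field.getType());
            return true;
        } catch (NoSuchMethodException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(looksLikePojo(WordWithCount.class));        // prints true
        System.out.println(looksLikePojo(PrivateWordWithCount.class)); // prints false
    }
}
```

The two nested classes have identical fields and constructors, so the differing results come from the access modifier alone, which matches what fixed the job above.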