Here is an example from the official Flink documentation:
DataSet<WC> words = // [...]
DataSet<WC> wordCounts = words
    // DataSet grouping on field "word"
    .groupBy("word")
    // apply ReduceFunction on grouped DataSet
    .reduce(new WordCounter());
First approach: define WordCounter in its own file, WordCounter.java:
import org.apache.flink.api.common.functions.ReduceFunction;

public class WordCounter implements ReduceFunction<WC> {
    @Override
    public WC reduce(WC in1, WC in2) {
        // merge two partial counts for the same word
        return new WC(in1.word, in1.count + in2.count);
    }
}
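The WC type itself is elided in the docs snippet. A minimal sketch of what it could look like, inferred from the fields WordCounter reads (the field names word and count are assumptions; Flink's POJO rules require a public no-arg constructor plus public fields or getters/setters):

```java
// Hypothetical WC POJO, inferred from the fields used in WordCounter above.
class WC {
    public String word;
    public int count;

    // Flink's POJO serialization requires a public no-arg constructor
    public WC() {}

    public WC(String word, int count) {
        this.word = word;
        this.count = count;
    }
}

public class Main {
    public static void main(String[] args) {
        // the same merge that WordCounter.reduce performs, applied by hand
        WC a = new WC("flink", 2);
        WC b = new WC("flink", 3);
        WC merged = new WC(a.word, a.count + b.count);
        System.out.println(merged.word + "=" + merged.count); // prints flink=5
    }
}
```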
Second approach: hoist the anonymous inner class into a static field (this example happens to use Spark's PairFunction, but the same pattern applies).
Original code:
JavaPairRDD<Long, Long> removedRandomPrefixRdd = localAggrRdd.mapToPair(
    new PairFunction<Tuple2<String, Long>, Long, Long>() {
        private static final long serialVersionUID = 1L;
        @Override
        public Tuple2<Long, Long> call(Tuple2<String, Long> tuple)
                throws Exception {
            // keys look like "randomPrefix_originalKey"; keep only the original key
            long originalKey = Long.valueOf(tuple._1.split("_")[1]);
            return new Tuple2<Long, Long>(originalKey, tuple._2);
        }
    });
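The call body above does one thing: strip a random prefix off the key. Isolated from Spark, the logic looks like this (the "randomPrefix_originalKey" key format is implied by the split("_")[1] call):

```java
public class Main {
    // The prefix-stripping step from the PairFunction above:
    // a key like "3_42" (randomPrefix_originalKey) maps back to 42.
    static long stripRandomPrefix(String prefixedKey) {
        return Long.valueOf(prefixedKey.split("_")[1]);
    }

    public static void main(String[] args) {
        System.out.println(stripRandomPrefix("3_42")); // prints 42
    }
}
```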
Modified code:
private static PairFunction<Tuple2<String, Long>, Long, Long> func3 =
    new PairFunction<Tuple2<String, Long>, Long, Long>() {
        private static final long serialVersionUID = 1L;
        @Override
        public Tuple2<Long, Long> call(Tuple2<String, Long> tuple) throws Exception {
            long originalKey = Long.valueOf(tuple._1.split("_")[1]);
            return new Tuple2<Long, Long>(originalKey, tuple._2);
        }
    };
// Step 3: strip the random prefix from every key in the RDD.
JavaPairRDD<Long, Long> removedRandomPrefixRdd = localAggrRdd.mapToPair(func3);
In short, the modification is to change:
new PairFunction<Tuple2<String, Long>, Long, Long>()
into:
private static PairFunction<Tuple2<String, Long>, Long, Long> func3 = new PairFunction<Tuple2<String, Long>, Long, Long>()
so the function object gets a name and can be passed to mapToPair by reference instead of being written inline.
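The whole refactor can be exercised without a Spark cluster. The sketch below uses a stand-in functional interface (hypothetical, standing in for Spark's PairFunction) purely to show the hoisted static field being defined in one place and invoked in another:

```java
import java.io.Serializable;

public class Main {
    // Stand-in for Spark's PairFunction, for illustration only.
    interface PairFunction<T, R> extends Serializable {
        R call(T t) throws Exception;
    }

    // The function object hoisted into a static field, mirroring func3 above.
    private static PairFunction<String, Long> func3 =
        new PairFunction<String, Long>() {
            private static final long serialVersionUID = 1L;
            @Override
            public Long call(String prefixedKey) throws Exception {
                // "3_42" -> 42: drop the random prefix, keep the original key
                return Long.valueOf(prefixedKey.split("_")[1]);
            }
        };

    public static void main(String[] args) throws Exception {
        System.out.println(func3.call("3_42")); // prints 42
    }
}
```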