移动平均:对时序序列按周期取其值的平均值,这种运算被称为移动平均。典型例子是求股票的n天内的平均值。
移动平均的关键是如何求这个平均值,可以使用Queue来实现。
public class MovingAverageDriver {
public static void main(String[] args){
SparkConf conf = new SparkConf().setAppName("MovingAverageDriver");
JavaSparkContext sc = new JavaSparkContext(conf);
JavaRDD<String> lines = sc.textFile("hdfs://hadoop000:8020/dataalgorithminput/MovingAverage.txt");
List<String> list =
lines.mapToPair(new PairFunction<String, CodeDatePair, DatePricePair>() {
@Override
public Tuple2<CodeDatePair, DatePricePair> call(String line) throws Exception {
String[] array = line.split(",");
return new Tuple2<CodeDatePair, DatePricePair>(new CodeDatePair(array[0],array[1]),new DatePricePair(array[1],array[2]));
}
}).repartitionAndSortWithinPartitions(new CodeDatePartitioner())
.mapToPair(new PairFunction<Tuple2<CodeDatePair,DatePricePair>, String, DatePricePair>() {
@Override
public Tuple2<String, DatePricePair> call(Tuple2<CodeDatePair, DatePricePair> tuple2) throws Exception {
return new Tuple2<String, DatePricePair>(tuple2._1().getCode(), tuple2._2());
}
})
.groupByKey()
.flatMapValues(new Function<Iterable<DatePricePair>, Iterable<Tuple2<DatePricePair, Double>>>(){
@Override
public Iterable<Tuple2<DatePricePair, Double>> call(Iterable<DatePricePair> v1) throws Exception {
int size = 3;
List<Tuple2<DatePricePair, Double>> list = new ArrayList<Tuple2<DatePricePair, Double>>();
Queue<Integer> queue= new LinkedList<Integer>();
for(DatePricePair item : v1){
queue.offer(item.getPrice());
if(queue.size() > size){
queue.remove();
}
int sum = 0;
for(Integer price : queue){
sum += price;
}
list.add(new Tuple2<DatePricePair, Double>(item, Double.valueOf(sum)/Double.valueOf(queue.size())));
}
return list;
}
}).map(new Function<Tuple2<String,Tuple2<DatePricePair,Double>>, String>() {
@Override
public String call(Tuple2<String, Tuple2<DatePricePair, Double>> v1) throws Exception {
return v1._1() + " " + v1._2()._1().getDate() + " " + v1._2()._1().getPrice() + " " + v1._2()._2();
}
}).collect();
for(String item : list){
System.out.println(item);
}
}
}
①对一只股票按照date来排序
②对数据集根据股票的Code进行groupByKey操作,生成的value是按照时间增大顺序排列的date-price对
③在flatMapValues()里使用Queue实现对连续3天的股价求平均值,并展开。