To solve the moving-average problem, here are two plain-Java (POJO) solutions.

Solution 1: using java.util.Queue
package simpleMoving;

import java.util.LinkedList;
import java.util.Queue;

/**
 * SimpleMovingAverage
 * A POJO moving average implemented with a queue.
 */
public class SimpleMovingAverage {
    private double sum = 0.0;
    private final int period;
    private final Queue<Double> window = new LinkedList<Double>();

    public SimpleMovingAverage(int period) {
        if (period < 1) {
            throw new IllegalArgumentException("period must be > 0");
        }
        this.period = period;
    }

    public void addNewNumber(double number) {
        sum += number;
        window.add(number);
        if (window.size() > period) {
            sum -= window.remove();  // evict the oldest value
        }
    }

    public double getMovingAverage() {
        if (window.isEmpty()) {
            // no arguments are involved, so this is a state error, not an argument error
            throw new IllegalStateException("average is undefined");
        }
        return sum / window.size();
    }
}
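As a quick sanity check, the windowed-sum idea behind the class above can be inlined into a standalone sketch (the class name and test values here are illustrative, not from the original):

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Standalone check of the windowed-sum logic behind SimpleMovingAverage.
public class QueueSmaCheck {
    // Feeds `data` through a window of size `period` and returns the final SMA.
    static double lastSma(double[] data, int period) {
        Deque<Double> window = new ArrayDeque<Double>();
        double sum = 0.0;
        for (double x : data) {
            sum += x;
            window.add(x);
            if (window.size() > period) {
                sum -= window.remove();  // evict the oldest value
            }
        }
        return sum / window.size();
    }

    public static void main(String[] args) {
        // window 3 over {10, 18, 20, 30}: the last average is (18+20+30)/3
        System.out.println(lastSma(new double[]{10, 18, 20, 30}, 3));
    }
}
```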
Solution 2: using an array to simulate the queue
package simpleMoving;

/**
 * SimpleMovingAverageUsingArray
 * A POJO moving average implemented with a circular array.
 */
public class SimpleMovingAverageUsingArray {
    private double sum = 0.0;
    private final int period;
    private final double[] window;
    private int pointer = 0;
    private int size = 0;

    public SimpleMovingAverageUsingArray(int period) {
        if (period < 1) {
            throw new IllegalArgumentException("period must be > 0");
        }
        this.period = period;
        this.window = new double[period];
    }

    public void addNewNumber(double number) {
        sum += number;
        if (size < period) {
            window[pointer++] = number;
            size++;
        } else {
            pointer = pointer % period;  // wrap around to the slot holding the oldest value
            sum -= window[pointer];      // remove the oldest value from the running sum
            window[pointer++] = number;
        }
    }

    public double getMovingAverage() {
        if (size == 0) {
            throw new IllegalStateException("average is undefined");
        }
        return sum / size;
    }
}
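The pointer arithmetic above is just modular indexing over a fixed buffer. A condensed standalone sketch of the same update rule, returning the whole series of averages (names and values here are illustrative):

```java
// Standalone check of the circular-buffer moving average.
public class RingSmaCheck {
    // Returns the SMA after each element of `data`, window of size `period`.
    static double[] smaSeries(double[] data, int period) {
        double[] out = new double[data.length];
        double[] window = new double[period];
        double sum = 0.0;
        int pointer = 0, size = 0;
        for (int i = 0; i < data.length; i++) {
            double number = data[i];
            sum += number;
            if (size < period) {
                window[pointer++] = number;
                size++;
            } else {
                pointer = pointer % period;  // wrap to the slot holding the oldest value
                sum -= window[pointer];
                window[pointer++] = number;
            }
            out[i] = sum / size;
        }
        return out;
    }

    public static void main(String[] args) {
        double[] s = smaSeries(new double[]{10, 18, 20, 30, 24, 33, 27}, 3);
        System.out.println(s[s.length - 1]);  // average of {24, 33, 27} = 28.0
    }
}
```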
Testing the moving-average implementations:
package simpleMoving;

import java.util.logging.Logger;

public class TestSimpleMovingAverage {
    private static final Logger THE_LOGGER =
            Logger.getLogger("TestSimpleMovingAverage");

    public static void main(String[] args) {
        double[] testData = {10, 18, 20, 30, 24, 33, 27};
        int[] allWindowSizes = {3, 4};
        // queue-based implementation
        for (int windowSize : allWindowSizes) {
            SimpleMovingAverage sma = new SimpleMovingAverage(windowSize);
            THE_LOGGER.info("winSize = " + windowSize);
            for (double x : testData) {
                sma.addNewNumber(x);
                THE_LOGGER.info("Next number = " + x + ", SMA = " + sma.getMovingAverage());
            }
            THE_LOGGER.info("---");
        }
        THE_LOGGER.info("----------------------------------------------");
        // array-based implementation
        for (int windowSize : allWindowSizes) {
            SimpleMovingAverageUsingArray sma1 = new SimpleMovingAverageUsingArray(windowSize);
            THE_LOGGER.info("winSize = " + windowSize);
            for (double x : testData) {
                sma1.addNewNumber(x);
                THE_LOGGER.info("Next number = " + x + ", SMA = " + sma1.getMovingAverage());
            }
            THE_LOGGER.info("---");
        }
    }
}
Implementing the moving average with MapReduce/Hadoop:

Option 1: sort in memory.

Hadoop implementation classes:

| Class | Description |
|---|---|
| SortInMemory_MovingAverageDriver | Driver that configures and submits the Hadoop job |
| SortInMemory_MovingAverageMapper | Defines map() |
| SortInMemory_MovingAverageReducer | Defines reduce() |
| TimeSeriesData | Represents a time-series data point as a (timestamp, double) pair |
| DateUtil | Provides basic date-conversion utilities |

The implementation code follows.
TimeSeriesData:
package simpleMoving.MR;

import org.apache.hadoop.io.WritableComparable;
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

/**
 * TimeSeriesData
 * @author rk
 * @date 2018/9/17 11:36
 * Represents one record of the form: <name-as-string><,><timestamp><,><value-as-double>
 */
public class TimeSeriesData implements WritableComparable<TimeSeriesData> {
    private long timestamp;
    private double value;

    public TimeSeriesData(long timestamp, double value) {
        this.timestamp = timestamp;
        this.value = value;
    }

    /** Hadoop needs a no-argument constructor to deserialize the Writable. */
    public TimeSeriesData() {
    }

    public long getTimestamp() {
        return timestamp;
    }

    public double getValue() {
        return value;
    }

    public void setTimestamp(long timestamp) {
        this.timestamp = timestamp;
    }

    public void setValue(double value) {
        this.value = value;
    }

    @Override
    public String toString() {
        return "TimeSeriesData{" +
                "timestamp=" + timestamp +
                ", value=" + value +
                '}';
    }

    @Override
    public int compareTo(TimeSeriesData o) {
        // sort ascending by timestamp (overflow-safe)
        return Long.compare(this.timestamp, o.timestamp);
    }

    @Override
    public void write(DataOutput out) throws IOException {
        out.writeLong(timestamp);
        out.writeDouble(value);
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        this.timestamp = in.readLong();
        this.value = in.readDouble();
    }
}
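One Hadoop-specific pitfall is worth demonstrating: while a reducer iterates over its values, the framework reuses a single Writable instance, mutating it in place via readFields(). That is why DateUtil.sort below deep-copies each TimeSeriesData before storing it. A framework-free sketch of the pitfall (the Holder class and values are made up for illustration):

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Simulates Hadoop's value iterator, which mutates and hands back ONE shared object.
public class ReusePitfall {
    static class Holder {
        long timestamp;
        Holder(long t) { timestamp = t; }
    }

    static Iterable<Holder> reusingIterable(final long[] data) {
        return new Iterable<Holder>() {
            public Iterator<Holder> iterator() {
                return new Iterator<Holder>() {
                    private final Holder shared = new Holder(0);  // one instance, reused
                    private int i = 0;
                    public boolean hasNext() { return i < data.length; }
                    public Holder next() { shared.timestamp = data[i++]; return shared; }
                    public void remove() { throw new UnsupportedOperationException(); }
                };
            }
        };
    }

    // Returns {first timestamp collected naively, first timestamp collected via copies}.
    static long[] collectFirsts() {
        List<Holder> naive = new ArrayList<Holder>();
        List<Holder> copied = new ArrayList<Holder>();
        for (Holder h : reusingIterable(new long[]{1, 2, 3})) {
            naive.add(h);                          // stores the shared reference
            copied.add(new Holder(h.timestamp));   // deep copy, as DateUtil.sort does
        }
        return new long[]{naive.get(0).timestamp, copied.get(0).timestamp};
    }

    public static void main(String[] args) {
        long[] firsts = collectFirsts();
        System.out.println(firsts[0]);  // 3 -- every naive entry points at the last value
        System.out.println(firsts[1]);  // 1 -- the copy preserved the original
    }
}
```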
DateUtil:
package simpleMoving.MR;

import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Date;
import java.util.List;

/**
 * @author rk
 * @date 2018/9/17 12:05
 * Basic date-conversion and sorting utilities.
 */
public class DateUtil {
    public static TimeSeriesData getTimeSeriesData(String value, String separator) {
        String[] splits = value.split(separator);
        long ts = dateToStamp(splits[1]);
        return new TimeSeriesData(ts, Double.parseDouble(splits[2]));
    }

    public static List<TimeSeriesData> sort(Iterable<TimeSeriesData> values) {
        List<TimeSeriesData> list = new ArrayList<TimeSeriesData>();
        for (TimeSeriesData t : values) {
            // A new TimeSeriesData must be created here: Hadoop reuses the same
            // Writable instance while iterating over the reducer's values.
            list.add(new TimeSeriesData(t.getTimestamp(), t.getValue()));
        }
        Collections.sort(list);
        return list;
    }

    public static long dateToStamp(String s) {
        SimpleDateFormat simpleDateFormat = new SimpleDateFormat("yyyy-MM-dd");
        try {
            return simpleDateFormat.parse(s).getTime();
        } catch (ParseException e) {
            // fail fast instead of returning from a null Date
            throw new IllegalArgumentException("cannot parse date: " + s, e);
        }
    }

    public static String stampToDate(long stamp) {
        SimpleDateFormat simpleDateFormat = new SimpleDateFormat("yyyy-MM-dd");
        return simpleDateFormat.format(new Date(stamp));
    }
}
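A quick round-trip check of the two conversions, standalone and using the JVM's default time zone just as DateUtil does (class and method names are illustrative):

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;

// Round-trips yyyy-MM-dd <-> epoch milliseconds the same way DateUtil does.
public class DateRoundTrip {
    static long dateToStamp(String s) throws ParseException {
        return new SimpleDateFormat("yyyy-MM-dd").parse(s).getTime();
    }

    static String stampToDate(long stamp) {
        return new SimpleDateFormat("yyyy-MM-dd").format(new Date(stamp));
    }

    public static void main(String[] args) throws ParseException {
        long ts = dateToStamp("2013-10-04");
        System.out.println(stampToDate(ts));  // prints 2013-10-04 again
        // ordering by epoch timestamp matches ordering by date for this format,
        // which is what makes sorting TimeSeriesData by timestamp correct
        System.out.println(dateToStamp("2004-11-01") < dateToStamp("2013-07-17"));
    }
}
```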
SortInMemory_MovingAverageDriver:
package simpleMoving.MR;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

/**
 * @author rk
 * @date 2018/9/17 14:25
 * Driver that configures and submits the Hadoop job.
 */
public class SortInMemory_MovingAverageDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // conf.set("fs.defaultFS", "hdfs://ran:9000");
        // System.setProperty("HADOOP_USER_NAME", "ran");
        Job job = Job.getInstance(conf);
        job.setJarByClass(SortInMemory_MovingAverageDriver.class);
        job.setMapperClass(SortInMemory_MovingAverageMapper.class);
        job.setReducerClass(SortInMemory_MovingAverageReducer.class);
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(TimeSeriesData.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);
        Path inputPath = new Path(args[0]);
        Path outputPath = new Path(args[1]);
        FileInputFormat.setInputPaths(job, inputPath);
        // delete the output directory if it already exists
        FileSystem fs = FileSystem.get(conf);
        if (fs.exists(outputPath)) {
            fs.delete(outputPath, true);
        }
        FileOutputFormat.setOutputPath(job, outputPath);
        boolean isDone = job.waitForCompletion(true);
        System.exit(isDone ? 0 : 1);
    }
}
SortInMemory_MovingAverageMapper:
package simpleMoving.MR;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import java.io.IOException;

/**
 * @author rk
 * @date 2018/9/17 11:50
 */
public class SortInMemory_MovingAverageMapper extends Mapper<LongWritable, Text, Text, TimeSeriesData> {
    /**
     * @param value one input line: <name-as-string><,><timestamp><,><value-as-double>
     */
    @Override
    protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
        String line = value.toString();
        String name = line.split(",")[0];
        TimeSeriesData timeSeries = DateUtil.getTimeSeriesData(line, ",");
        context.write(new Text(name), timeSeries);
    }
}
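Because map() emits the stock symbol as the key, the shuffle phase delivers all of one symbol's data points to a single reduce() call. A framework-free sketch of that grouping, using plain strings instead of Writables (the helper name is hypothetical):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Simulates map -> shuffle grouping: records grouped by their leading symbol.
public class ShuffleSketch {
    static Map<String, List<String>> groupBySymbol(String[] lines) {
        Map<String, List<String>> groups = new HashMap<String, List<String>>();
        for (String line : lines) {
            String name = line.split(",")[0];  // same key the mapper emits
            if (!groups.containsKey(name)) {
                groups.put(name, new ArrayList<String>());
            }
            groups.get(name).add(line);        // the record becomes the value
        }
        return groups;
    }

    public static void main(String[] args) {
        String[] lines = {
            "GOOG,2004-11-04,184.70",
            "AAPL,2013-10-09,486.59",
            "GOOG,2013-07-19,896.60",
        };
        // both GOOG records land in the same group, as in one reduce() call
        System.out.println(groupBySymbol(lines).get("GOOG").size());
    }
}
```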
SortInMemory_MovingAverageReducer:
package simpleMoving.MR;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;
import java.io.IOException;
import java.util.List;

/**
 * @author rk
 * @date 2018/9/17 12:12
 */
public class SortInMemory_MovingAverageReducer extends Reducer<Text, TimeSeriesData, Text, Text> {
    private int windowSize = 2; // default

    /**
     * Called once when the task starts; reads the window size from the job configuration.
     */
    @Override
    protected void setup(Context context) throws IOException, InterruptedException {
        // use the job's configuration, not a fresh Configuration object
        Configuration conf = context.getConfiguration();
        windowSize = conf.getInt("moving.average.window.size", windowSize);
    }

    /**
     * @param key    <name-as-string>
     * @param values TimeSeriesData (timestamp, value) pairs for that name
     */
    @Override
    protected void reduce(Text key, Iterable<TimeSeriesData> values, Context context) throws IOException, InterruptedException {
        // sort the time series by timestamp in memory
        List<TimeSeriesData> sortedTimeSeries = DateUtil.sort(values);
        // apply the moving-average algorithm to sortedTimeSeries
        double sum = 0.0;
        // while fewer than windowSize points are available, emit the average of
        // what we have; the "---" marker flags these partial-window rows
        for (int i = 0; i < windowSize - 1 && i < sortedTimeSeries.size(); i++) {
            sum += sortedTimeSeries.get(i).getValue();
            String date = DateUtil.stampToDate(sortedTimeSeries.get(i).getTimestamp());
            context.write(key, new Text(date + "---" + sum / (i + 1)));
        }
        // enough data is now available to compute a full moving average
        for (int i = windowSize - 1; i < sortedTimeSeries.size(); i++) {
            sum += sortedTimeSeries.get(i).getValue();
            double movingAverage = sum / windowSize;
            String date = DateUtil.stampToDate(sortedTimeSeries.get(i).getTimestamp());
            context.write(key, new Text(date + "," + movingAverage));
            // drop the oldest value before the next iteration
            sum -= sortedTimeSeries.get(i - windowSize + 1).getValue();
        }
    }
}
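The reducer's sliding-window arithmetic can be extracted into a plain method so it can be checked by hand against the AAPL rows in the results (the class and method names are illustrative):

```java
import java.util.Arrays;
import java.util.List;

// The reducer's sliding-window arithmetic on an already-sorted series.
public class WindowedAverage {
    static double[] movingAverage(List<Double> sorted, int windowSize) {
        double[] out = new double[sorted.size()];
        double sum = 0.0;
        // partial windows: average of the points seen so far
        for (int i = 0; i < windowSize - 1 && i < sorted.size(); i++) {
            sum += sorted.get(i);
            out[i] = sum / (i + 1);
        }
        // full windows
        for (int i = windowSize - 1; i < sorted.size(); i++) {
            sum += sorted.get(i);
            out[i] = sum / windowSize;
            sum -= sorted.get(i - windowSize + 1);  // drop the oldest value
        }
        return out;
    }

    public static void main(String[] args) {
        // AAPL closing prices sorted by date, window = 2
        List<Double> aapl = Arrays.asList(483.41, 483.03, 487.75, 480.94, 486.59);
        // e.g. the second entry is (483.41 + 483.03) / 2 = 483.22
        System.out.println(Arrays.toString(movingAverage(aapl, 2)));
    }
}
```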
Input data:
GOOG,2004-11-04,184.70
GOOG,2004-11-03,191.67
GOOG,2004-11-01,194.87
AAPL,2013-10-09,486.59
AAPL,2013-10-08,480.94
AAPL,2013-10-07,487.75
AAPL,2013-10-04,483.03
AAPL,2013-10-03,483.41
GOOG,2013-07-19,896.60
GOOG,2013-07-18,910.68
GOOG,2013-07-17,918.55
Results (rows marked with "---" were computed before a full window was available; note that the GOOG average for 2013-07-17 mixes the 2004 and 2013 rows, because the algorithm slides over consecutive records and is unaware of calendar gaps):
AAPL 2013-10-03---483.41
AAPL 2013-10-04,483.22
AAPL 2013-10-07,485.39
AAPL 2013-10-08,484.345
AAPL 2013-10-09,483.765
GOOG 2004-11-01---194.87
GOOG 2004-11-03,193.26999999999998
GOOG 2004-11-04,188.18499999999997
GOOG 2013-07-17,551.625
GOOG 2013-07-18,914.615
GOOG 2013-07-19,903.6400000000001
Option 2: sort using the MapReduce framework

To be continued...