Implementing the Moving Average Algorithm

To solve the moving average problem, here are two simple plain-Java-object (POJO) solutions:

Solution 1: Using java.util.Queue

package simpleMoving;
/**
 * SimpleMovingAverage
 * POJO moving average implemented with a queue
 */

import java.util.LinkedList;
import java.util.Queue;

public class SimpleMovingAverage {

    private double sum = 0.0;
    private int period;
    private final Queue<Double> window = new LinkedList<Double>();

    public SimpleMovingAverage(int period) {
        if(period < 1){
            throw new IllegalArgumentException("period must be > 0");
        }
        this.period = period;
    }

    public void addNewNumber(double number){
        sum += number;
        window.add(number);
        if(window.size() > period){
            sum -= window.remove();
        }
    }

    public double getMovingAverage(){
        if(window.isEmpty()){
            throw new IllegalArgumentException("average is undefined");
        }
        return sum / window.size();
    }


}
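
To make the add-then-evict step concrete, the following is a minimal sketch that traces the window contents as values arrive. The class name QueueWindowTrace and the method averages() are illustrative names, not part of the original code, and the sketch uses ArrayDeque in place of LinkedList purely as a convenience; the logic mirrors addNewNumber() above.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

class QueueWindowTrace {

    // Returns the moving average after each value is added, using the same
    // add-then-evict logic as SimpleMovingAverage.addNewNumber().
    static List<Double> averages(double[] data, int period) {
        Queue<Double> window = new ArrayDeque<>();
        List<Double> result = new ArrayList<>();
        double sum = 0.0;
        for (double x : data) {
            sum += x;
            window.add(x);
            if (window.size() > period) {
                sum -= window.remove(); // evict the oldest value
            }
            double average = sum / window.size();
            result.add(average);
            System.out.println("added " + x + ", window = " + window + ", SMA = " + average);
        }
        return result;
    }

    public static void main(String[] args) {
        averages(new double[] {10, 18, 20, 30}, 3);
    }
}
```

With a period of 3, the fourth value pushes 10 out of the window, so the last average covers 18, 20 and 30 only.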

Solution 2: Using an array to simulate a queue

package simpleMoving;

/**
 * SimpleMovingAverageUsingArray
 * POJO moving average implemented with an array
 */

public class SimpleMovingAverageUsingArray {

    private double sum = 0.0;
    private int period;
    private double[] window = null;
    private int pointer = 0;
    private int size = 0;

    public SimpleMovingAverageUsingArray(int period){
        if(period < 1){
            throw new IllegalArgumentException("period must be > 0");
        }
        this.period = period;
        window = new double[period];
    }

    public void addNewNumber(double number){
        sum += number;
        if(size < period){
            window[pointer++] = number;
            size ++;
        }else{
            pointer = pointer % period;
            sum -= window[pointer];
            window[pointer++] = number;
        }
    }

    public double getMovingAverage(){
        if (size == 0){
            throw new IllegalArgumentException("average is undefined");
        }
        return sum / size;
    }


}
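
The array version works because the write position wraps around once it reaches the end of the array, so the slot being overwritten always holds the oldest value. The sketch below isolates just that index arithmetic; RingBufferTrace and slots() are hypothetical names introduced for illustration.

```java
import java.util.Arrays;

class RingBufferTrace {

    // Returns the slot index that each incoming value is written to
    // when the buffer holds `period` values.
    static int[] slots(int count, int period) {
        int[] result = new int[count];
        int pointer = 0;
        for (int i = 0; i < count; i++) {
            pointer = pointer % period; // wrap back to slot 0 after the last slot
            result[i] = pointer++;
        }
        return result;
    }

    public static void main(String[] args) {
        // Seven values written into a buffer of size 3
        System.out.println(Arrays.toString(slots(7, 3))); // prints [0, 1, 2, 0, 1, 2, 0]
    }
}
```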

Testing the moving average algorithm:

package simpleMoving;

import java.util.logging.Logger;

public class TestSimpleMovingAverage {
    private static final Logger THE_LOGGER =
            Logger.getLogger("TestSimpleMovingAverage");

    public static void main(String[] args) {
        double[] testData = {10, 18, 20, 30, 24, 33, 27};
        int[] allWindowSizes = {3,4};

        // Using the queue
        for (int windowSize : allWindowSizes){
            SimpleMovingAverage sma = new SimpleMovingAverage(windowSize);
            THE_LOGGER.info("winSize = " + windowSize);
            for(double x : testData){
                sma.addNewNumber(x);
                THE_LOGGER.info("Next number = " + x + ", SMA = "+ sma.getMovingAverage());
            }
            THE_LOGGER.info("---");
        }

        THE_LOGGER.info("----------------------------------------------");

        // Using the array
        for (int windowSize : allWindowSizes){
            SimpleMovingAverageUsingArray sma1 = new SimpleMovingAverageUsingArray(windowSize);
            THE_LOGGER.info("winSize = " + windowSize);
            for(double x : testData){
                sma1.addNewNumber(x);
                THE_LOGGER.info("Next number = " + x + ", SMA = "+ sma1.getMovingAverage());
            }
            THE_LOGGER.info("---");

        }


    }

}
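
To check the logged values independently, one option is a brute-force version that re-sums the last few values at every step. It is slower than the running-sum classes above but easy to verify by hand. BruteForceSma and averageAt() are names made up for this sketch; the test data is the same as in TestSimpleMovingAverage.

```java
class BruteForceSma {

    // Average of the last `period` values ending at index i
    // (fewer values while the window is still filling up).
    static double averageAt(double[] data, int i, int period) {
        int start = Math.max(0, i - period + 1);
        double sum = 0.0;
        for (int j = start; j <= i; j++) {
            sum += data[j];
        }
        return sum / (i - start + 1);
    }

    public static void main(String[] args) {
        double[] testData = {10, 18, 20, 30, 24, 33, 27};
        for (int period : new int[] {3, 4}) {
            System.out.println("winSize = " + period);
            for (int i = 0; i < testData.length; i++) {
                System.out.println("Next number = " + testData[i]
                        + ", SMA = " + averageAt(testData, i, period));
            }
        }
    }
}
```

For a window of 4, for example, the fourth value gives (10 + 18 + 20 + 30) / 4 = 19.5 and the last gives (30 + 24 + 33 + 27) / 4 = 28.5; both implementations above should log the same numbers.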

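Implementing the moving average with MapReduce/Hadoop:

Solution 1: Sorting in memory

Hadoop implementation classes:

Classes in the Hadoop moving average implementation

Class name                          Description
SortInMemory_MovingAverageDriver    Driver that submits the Hadoop job
SortInMemory_MovingAverageMapper    Defines map()
SortInMemory_MovingAverageReducer   Defines reduce()
TimeSeriesData                      Represents a time series data point as a (timestamp, double) pair
DateUtil                            Provides basic date conversion utilities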

The implementation code is as follows:

TimeSeriesData:

package simpleMoving.MR;

import org.apache.hadoop.io.WritableComparable;

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

/**
 * @class_name TimeSeriesData
 * @author rk
 * @date 2018/9/17 11:36
 * @Description: a time series data point parsed from a record of the form
 *               <name-as-string><,><timestamp><,><value-as-double>
 */
public class TimeSeriesData implements WritableComparable<TimeSeriesData> {
    private long timestamp;
    private double value;
/*    public static TimeSeriesData copy(TimeSeriesData tsd){
        return new TimeSeriesData(tsd.timestamp, tsd.value);
    }*/
    public TimeSeriesData(long timestamp, double value){
        this.timestamp = timestamp;
        this.value = value;
    }

    public TimeSeriesData() {
    }

    public long getTimestamp() {
        return timestamp;
    }

    public double getValue() {
        return value;
    }

    @Override
    public String toString() {
        return "TimeSeriesData{" +
                "timestamp=" + timestamp +
                ", value=" + value +
                '}';
    }

    public void setTimestamp(long timestamp) {
        this.timestamp = timestamp;
    }

    public void setValue(double value) {
        this.value = value;
    }

    @Override
    public int compareTo(TimeSeriesData o) {
        // Ascending order by timestamp; Long.compare avoids the overflow risk of subtracting longs
        return Long.compare(this.timestamp, o.timestamp);
    }

    public void write(DataOutput out) throws IOException {
        out.writeLong(timestamp);
        out.writeDouble(value);
    }

    public void readFields(DataInput in) throws IOException {
        this.timestamp = in.readLong();
        this.value = in.readDouble();
    }


}
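
The write() and readFields() methods must handle the fields in the same order, because Hadoop uses them to serialize the value between the map and reduce phases. The sketch below shows that round trip with plain java.io streams so it runs without Hadoop on the classpath. WritableRoundTrip and its nested Point class are hypothetical stand-ins for TimeSeriesData, and the timestamp is an arbitrary sample value.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

class WritableRoundTrip {

    // Hypothetical stand-in for TimeSeriesData, without the Hadoop dependency.
    static final class Point {
        final long timestamp;
        final double value;

        Point(long timestamp, double value) {
            this.timestamp = timestamp;
            this.value = value;
        }
    }

    // Writes the fields in the same order as TimeSeriesData.write().
    static byte[] serialize(Point p) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeLong(p.timestamp);
            out.writeDouble(p.value);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Reads the fields back in the same order, as readFields() does.
    static Point deserialize(byte[] data) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            return new Point(in.readLong(), in.readDouble());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] data = serialize(new Point(1380758400000L, 483.41));
        Point copy = deserialize(data);
        System.out.println(data.length + " bytes, timestamp = " + copy.timestamp
                + ", value = " + copy.value);
    }
}
```

A long and a double take 8 bytes each, so every serialized point occupies 16 bytes.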

DateUtil:

package simpleMoving.MR;

import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.*;

/**
 * @Author rk
 * @Date 2018/9/17 12:05
 * @Description:
 **/
public class DateUtil {

    public static TimeSeriesData  getTimeSeriesData(String value, String field) {
        String[] splits = value.split(field);
        Long ts = dateToStamp(splits[1]);
        return new TimeSeriesData(ts, Double.parseDouble(splits[2]));

    }

    public static List<TimeSeriesData>  sort(Iterable<TimeSeriesData> value){
        List<TimeSeriesData> list = new ArrayList<TimeSeriesData>();
        for (TimeSeriesData t : value){
            // A new TimeSeriesData must be created for each element: Hadoop reuses
            // the same object instance while iterating over the reducer's values
            TimeSeriesData series = new TimeSeriesData(t.getTimestamp(),t.getValue());
            list.add(series);
        }
        Collections.sort(list);
        return list;
    }

    public static Long dateToStamp(String s){
        SimpleDateFormat simpleDateFormat = new SimpleDateFormat("yyyy-MM-dd");
        try {
            return simpleDateFormat.parse(s).getTime();
        } catch (ParseException e) {
            // Fail with a clear message instead of a NullPointerException on a null Date
            throw new IllegalArgumentException("invalid date (expected yyyy-MM-dd): " + s, e);
        }
    }

    public static String stampToDate(Long stamp){
        SimpleDateFormat simpleDateFormat = new SimpleDateFormat("yyyy-MM-dd");
        Date date = new Date(stamp);
        return simpleDateFormat.format(date);
    }


}
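
As a possible alternative to SimpleDateFormat, the same two conversions can be sketched with java.time, whose classes are immutable and thread-safe. Note the assumptions: this version pins the time zone to UTC, whereas DateUtil above uses the JVM's default zone, and DateConversionSketch is a name invented for this example.

```java
import java.time.Instant;
import java.time.LocalDate;
import java.time.ZoneOffset;

class DateConversionSketch {

    // Parses an ISO date such as "2013-10-03" into epoch milliseconds at UTC midnight.
    // LocalDate.parse throws DateTimeParseException for malformed input.
    static long dateToStamp(String s) {
        return LocalDate.parse(s).atStartOfDay(ZoneOffset.UTC).toInstant().toEpochMilli();
    }

    // Converts epoch milliseconds back to a yyyy-MM-dd string in UTC.
    static String stampToDate(long stamp) {
        return Instant.ofEpochMilli(stamp).atZone(ZoneOffset.UTC).toLocalDate().toString();
    }

    public static void main(String[] args) {
        long stamp = dateToStamp("2013-10-03");
        System.out.println(stamp + " -> " + stampToDate(stamp));
    }
}
```

Fixing the zone keeps the round trip stable even when the mapper and reducer run on machines with different default time zones.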

SortInMemory_MovingAverageDriver:

package simpleMoving.MR;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

/**
 * @Author rk
 * @Date 2018/9/17 14:25
 * @Description:
 **/
public class SortInMemory_MovingAverageDriver {
    public static void main(String[] args) throws Exception  {
        Configuration conf = new Configuration();

//    conf.set("fs.defaultFS","hdfs://ran:9000")

//    System.setProperty("HADOOP_USER_NAME","ran")

        Job job = Job.getInstance(conf);

        job.setJarByClass(SortInMemory_MovingAverageDriver.class);

        job.setMapperClass(SortInMemory_MovingAverageMapper.class);
        job.setReducerClass(SortInMemory_MovingAverageReducer.class);

        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(TimeSeriesData.class);

        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);


        Path inputPath = new Path(args[0]);
        Path outputPath = new Path(args[1]);
        FileInputFormat.setInputPaths(job,inputPath);
        FileSystem fs = FileSystem.get(conf);

        if(fs.exists(outputPath)){
            fs.delete(outputPath,true);
        }
        FileOutputFormat.setOutputPath(job,outputPath);

        boolean isDone = job.waitForCompletion(true);

        System.exit(isDone ? 0 : 1);


    }

}

SortInMemory_MovingAverageMapper:

package simpleMoving.MR;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

import java.io.IOException;

/**
 * @Author rk
 * @Date 2018/9/17 11:50
 * @Description:
 **/

public class SortInMemory_MovingAverageMapper extends Mapper<LongWritable, Text,Text,TimeSeriesData> {
    /**
     *
     * @param key
     * @param value  <name-as-string><,><timestamp><,><value-as-double>
     * @param context
     * @throws IOException
     * @throws InterruptedException
     */
    @Override
    protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
        TimeSeriesData timeSeries = DateUtil.getTimeSeriesData(value.toString(),",");
        String name = value.toString().split(",")[0];
        context.write(new Text(name),timeSeries);
    }
}

SortInMemory_MovingAverageReducer:

package simpleMoving.MR;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

import java.io.IOException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;

/**
 * @Author rk
 * @Date 2018/9/17 12:12
 * @Description:
 **/
public class  SortInMemory_MovingAverageReducer extends Reducer<Text,TimeSeriesData,Text,Text> {
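    private int windowSize = 2; // default

    /**
     * Called once at the start of the task
     * @param context
     * @throws IOException
     * @throws InterruptedException
     */
    @Override
    protected void setup(Context context) throws IOException, InterruptedException {
        // Read the job's configuration from the context; a new Configuration() would not
        // contain properties set by the driver. Falls back to the default when unset.
        windowSize = context.getConfiguration().getInt("moving.average.window.size", windowSize);
    }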

    /**
     *
     * @param key <name-as-string>
     * @param values List<TimeSeriesData> TimeSeriesData (timestamp, value)
     * @param context
     * @throws IOException
     * @throws InterruptedException
     */
    @Override
    protected void reduce(Text key, Iterable<TimeSeriesData> values, Context context) throws IOException, InterruptedException {
        List<TimeSeriesData> sortedTimeSeries = DateUtil.sort(values);
        // Apply the moving average algorithm to sortedTimeSeries and emit the output
        double sum = 0.0;
        // Accumulate the prefix sum; stop early if the series is shorter than the window
        int prefixLength = Math.min(windowSize - 1, sortedTimeSeries.size());
        for(int i = 0; i < prefixLength; i++){
            sum += sortedTimeSeries.get(i).getValue();
            long timestamp = sortedTimeSeries.get(i).getTimestamp();
            String date = DateUtil.stampToDate(timestamp);
            // Not enough data for a full window yet: average over the values seen so far
            Text outputValue = new Text(date + "---" + sum / (i + 1));
            context.write(key,outputValue);
        }
        // From here on there is enough time series data for a full window
        for(int i = windowSize-1; i < sortedTimeSeries.size(); i++){
            sum += sortedTimeSeries.get(i).getValue();
            double movingAverage = sum / windowSize;
            long timestamp = sortedTimeSeries.get(i).getTimestamp();
            String date = DateUtil.stampToDate(timestamp);
            Text outputValue = new Text(date + "," + movingAverage);

            // Prepare for the next iteration
            sum -= sortedTimeSeries.get(i-windowSize+1).getValue();

            context.write(key,outputValue);
        }
    }


}
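
The windowing logic inside reduce() can be tried out without a Hadoop cluster. The following is a minimal sketch under that assumption: it takes values already sorted by timestamp and returns one average per point, averaging over fewer values while the window is still filling. WindowedAverageSketch and movingAverage() are illustrative names, and the input is the AAPL series from the sample data in date order.

```java
import java.util.ArrayList;
import java.util.List;

class WindowedAverageSketch {

    // Moving average over values that are already sorted by timestamp.
    static List<Double> movingAverage(double[] sorted, int windowSize) {
        List<Double> result = new ArrayList<>();
        double sum = 0.0;
        for (int i = 0; i < sorted.length; i++) {
            sum += sorted[i];
            if (i >= windowSize) {
                sum -= sorted[i - windowSize]; // drop the value that left the window
            }
            int count = Math.min(i + 1, windowSize); // window may not be full yet
            result.add(sum / count);
        }
        return result;
    }

    public static void main(String[] args) {
        // AAPL closing prices from 2013-10-03 to 2013-10-09, in date order
        double[] aapl = {483.41, 483.03, 487.75, 480.94, 486.59};
        for (double average : movingAverage(aapl, 2)) {
            System.out.println(average);
        }
    }
}
```

With a window size of 2, the second value is (483.41 + 483.03) / 2 = 483.22, up to floating-point rounding in the printed digits.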

Data:

GOOG,2004-11-04,184.70
GOOG,2004-11-03,191.67
GOOG,2004-11-01,194.87
AAPL,2013-10-09,486.59
AAPL,2013-10-08,480.94
AAPL,2013-10-07,487.75
AAPL,2013-10-04,483.03
AAPL,2013-10-03,483.41
GOOG,2013-07-19,896.60
GOOG,2013-07-18,910.68
GOOG,2013-07-17,918.55

Results:

AAPL	2013-10-03---483.41
AAPL	2013-10-04,483.22
AAPL	2013-10-07,485.39
AAPL	2013-10-08,484.345
AAPL	2013-10-09,483.765
GOOG	2004-11-01---194.87
GOOG	2004-11-03,193.26999999999998
GOOG	2004-11-04,188.18499999999997
GOOG	2013-07-17,551.625
GOOG	2013-07-18,914.615
GOOG	2013-07-19,903.6400000000001
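
One value in this output looks out of place: GOOG on 2013-07-17 shows 551.625, far from the neighbouring prices. This follows from the window counting data points rather than calendar days. With the reducer's default window size of 2, that day's price is paired with the previous available GOOG record, which is 2004-11-04:

```latex
\[
\frac{184.70 + 918.55}{2} = \frac{1103.25}{2} = 551.625
\]
```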

Solution 2: Sorting with the MapReduce framework

To be continued...

 
