MapReduce Moving Average (Stock Prices as an Example)

Basic Concepts

Time Series Data

Time series data represents the values of a variable over a period of time.

Moving Average

Let A be an ordered sequence of N values:

$$A = (a_1, a_2, a_3, \ldots, a_N)$$

which can also be written as $\{a_i\}_{i=1}^{N}$.

The n-moving-average of A is a new sequence $\{S_i\}_{i=1}^{N-n+1}$, obtained from the $a_i$ by taking the arithmetic mean of each n-term subsequence:

$$S_i = \frac{1}{n}\sum_{j=i}^{i+n-1} a_j$$

Basic Example

Stock closing-price time series:

#   Date         Close
1   2013-10-01   10
2   2013-10-02   18
3   2013-10-03   20
4   2013-10-04   30

3-day moving average of the closing prices. Note that, like the code below, this emits a value for every point, averaging over a partially filled window for the first n-1 days (the strict definition above would yield only N-n+1 values):

#   Date         Moving average   Calculation
1   2013-10-01   10.00            (10)/1
2   2013-10-02   14.00            (10+18)/2
3   2013-10-03   16.00            (10+18+20)/3
4   2013-10-04   22.67            (18+20+30)/3

The MapReduce Moving Average Solution
Sample input
GOOG,2004-11-04,184.70
GOOG,2004-11-03,191.67
GOOG,2004-11-02,194.87
AAPL,2013-10-9,486.59
AAPL,2013-10-8,480.94
AAPL,2013-10-7,487.75
AAPL,2013-10-4,483.03
AAPL,2013-10-3,483.41
IBM,2013-09-30,185.18
IBM,2013-09-30,186.92
IBM,2013-09-30,190.22
IBM,2013-09-30,189.47
GOOG,2013-07-19,896.60
GOOG,2013-07-18,910.68
GOOG,2013-07-17,918.55
Sample output
AAPL	2013-10-03,483.41
AAPL	2013-10-04,483.22
AAPL	2013-10-07,484.73
AAPL	2013-10-08,483.7825
AAPL	2013-10-09,484.34400000000005
GOOG	2004-11-02,194.87
GOOG	2004-11-03,193.26999999999998
GOOG	2004-11-04,190.41333333333333
GOOG	2013-07-17,372.4475
GOOG	2013-07-18,480.09399999999994
GOOG	2013-07-19,620.4399999999999
IBM	2013-09-30,186.92
IBM	2013-09-30,188.57
IBM	2013-09-30,188.87
IBM	2013-09-30,187.9475

Once the moving average algorithm is understood, the solution is straightforward: group the data by stock symbol, sort each group's values by timestamp, and finally apply the moving average algorithm.

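Before wiring this into MapReduce, a minimal single-machine sketch of the same idea may help (illustrative only; the class and variable names here are made up): group rows by symbol, sort within each group by date, then slide a window over the sorted values. On the three 2004 GOOG rows above it reproduces the corresponding lines of the sample output.

import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Map;
import java.util.TreeMap;

public class InMemoryMovingAverage {
    public static void main(String[] args) {
        String[] rows = {
                "GOOG,2004-11-04,184.70",
                "GOOG,2004-11-03,191.67",
                "GOOG,2004-11-02,194.87",
        };
        int window = 3;
        // Group by symbol; a yyyy-MM-dd date string sorts correctly as plain text.
        Map<String, TreeMap<String, Double>> bySymbol = new TreeMap<>();
        for (String row : rows) {
            String[] t = row.split(",");
            bySymbol.computeIfAbsent(t[0], k -> new TreeMap<>())
                    .put(t[1], Double.parseDouble(t[2]));
        }
        // Walk each symbol's points in date order, keeping a sliding window.
        for (Map.Entry<String, TreeMap<String, Double>> e : bySymbol.entrySet()) {
            Deque<Double> w = new ArrayDeque<>();
            double sum = 0.0;
            for (Map.Entry<String, Double> p : e.getValue().entrySet()) {
                w.addLast(p.getValue());
                sum += p.getValue();
                if (w.size() > window) {
                    sum -= w.removeFirst();
                }
                System.out.println(e.getKey() + "\t" + p.getKey() + "," + sum / w.size());
            }
        }
    }
}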
An array-based moving-average implementation that simulates a fixed-size queue:
public class MovingAverage {

        private double sum = 0.0;
        private final int period;
        private double[] window = null;
        private int pointer = 0;
        private int size = 0;

        public MovingAverage(int period) {
            if (period < 1) {
                throw new IllegalArgumentException("period must be > 0");
            }
            this.period = period;
            window = new double[period];
        }

        public void addNewNumber(double number) {
            sum += number;
            if (size < period) {
                window[pointer++] = number;
                size++;
            }
            else {
                // size = period (size cannot be > period)
                pointer = pointer % period;
                sum -= window[pointer];
                window[pointer++] = number;
            }
        }

        public double getMovingAverage() {
            if (size == 0) {
                throw new IllegalStateException("average is undefined");
            }
            return sum / size;
        }
    }
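As a quick sanity check (a throwaway main, not part of the job), replaying the closing prices from the 3-day example above reproduces the table:

MovingAverage ma = new MovingAverage(3);
for (double close : new double[]{10, 18, 20, 30}) {
    ma.addNewNumber(close);
    System.out.println(ma.getMovingAverage()); // 10.0, 14.0, 16.0, 22.666...
}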
Implementation

To apply the moving average we implement a secondary sort: the mapper's output key is a composite of the natural key (the stock symbol, a string) and a secondary key (the time-series timestamp). For example, (AAPL, 2013-10-03) sorts before (AAPL, 2013-10-04), while both still reach the same reduce() call.

Represent each time-series data point as a (timestamp, double) pair:
public static class TimeSeriesData implements WritableComparable<TimeSeriesData>{
        private long timestamp;
        private double value;
        public TimeSeriesData(){

        }

        public long getTimestamp() {
            return timestamp;
        }

        public void setTimestamp(long timestamp) {
            this.timestamp = timestamp;
        }

        public double getValue() {
            return value;
        }

        public void setValue(double value) {
            this.value = value;
        }

        public void set(long timestamp,double value){
            this.timestamp=timestamp;
            this.value=value;
        }

        @Override
        public String toString() {
            return "TimeSeriesData{" +
                    "timestamp=" + timestamp +
                    ", value=" + value +
                    '}';
        }

        public int compareTo(TimeSeriesData o) {
            if (this.timestamp < o.timestamp) {
                return -1;
            } else if (this.timestamp > o.timestamp) {
                return 1;
            } else {
                return 0;
            }
        }

        public void write(DataOutput dataOutput) throws IOException {
            dataOutput.writeLong(timestamp);
            dataOutput.writeDouble(value);
        }

        public void readFields(DataInput dataInput) throws IOException {
            this.timestamp = dataInput.readLong();
            this.value = dataInput.readDouble();
        }
    }
Define a custom composite key of (string, timestamp):
public static class CompositeKey implements WritableComparable<CompositeKey>{
        private String name;
        private long timestamp;
        public CompositeKey(){

        }
        public void set(String name,long timestamp){
            this.name=name;
            this.timestamp=timestamp;
        }

        public String getName() {
            return name;
        }

        public void setName(String name) {
            this.name = name;
        }

        public long getTimestamp() {
            return timestamp;
        }

        public void setTimestamp(long timestamp) {
            this.timestamp = timestamp;
        }

        public int compareTo(CompositeKey o) {
            if(this.name.compareTo(o.name)!=0){
                return this.name.compareTo(o.name);
            }else if(this.timestamp!=o.timestamp){
                return timestamp>o.timestamp?1:-1;
            }else{
                return 0;
            }
        }

        public void write(DataOutput dataOutput) throws IOException {
            dataOutput.writeUTF(this.name);
            dataOutput.writeLong(this.timestamp);
        }

        public void readFields(DataInput dataInput) throws IOException {
            this.name=dataInput.readUTF();
            this.timestamp=dataInput.readLong();
        }
    }

The CompositeKey class must be sorted during the shuffle phase on both the "name" and "timestamp" fields, so we next provide a class that compares composite-key objects; its job is essentially to implement the compare() method.

Define the sort order for CompositeKey:
public static class CompositeKeyComparator extends WritableComparator{
        protected CompositeKeyComparator(){
            super(CompositeKey.class,true);
        }
        public int compare(WritableComparable w1,WritableComparable w2){
            CompositeKey key1=(CompositeKey) w1;
            CompositeKey key2=(CompositeKey) w2;
            int comparison = key1.getName().compareTo(key2.getName());
            if (comparison == 0) {
                // Same symbol: order by timestamp, ascending.
                return Long.compare(key1.getTimestamp(), key2.getTimestamp());
            } else {
                return comparison;
            }
        }
    }

With the composite-key sort order defined, the NaturalKeyPartitioner class, which extends the Partitioner base class, partitions the key space produced by the mappers so that all records for a given stock symbol reach the same reducer.

Partitioner code:
public class NaturalKeyPartitioner extends Partitioner<CompositeKey, TimeSeriesData> {
        @Override
        public int getPartition(CompositeKey key, TimeSeriesData value,
                                int numberOfPartitions) {
            return Math.abs((int) (hash(key.getName()) % numberOfPartitions));
        }

        static long hash(String str) {
            long h = 1125899906842597L; // prime
            int length = str.length();
            for (int i = 0; i < length; i++) {
                h = 31 * h + str.charAt(i);
            }
            return h;
        }
    }
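A quick way to convince yourself the partitioner behaves (a hypothetical standalone check, not part of the job): two keys that share a name but differ in timestamp must land in the same partition, otherwise the secondary sort falls apart.

NaturalKeyPartitioner partitioner = new NaturalKeyPartitioner();
CompositeKey k1 = new CompositeKey();
k1.set("GOOG", 1L); // timestamps intentionally differ
CompositeKey k2 = new CompositeKey();
k2.set("GOOG", 2L);
// getPartition only looks at the name, so both calls return the same index.
System.out.println(partitioner.getPartition(k1, null, 4)
        == partitioner.getPartition(k2, null, 4)); // true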

Next, the plug-in class NaturalKeyGroupingComparator is used during Hadoop's shuffle phase to group composite keys by their natural-key part (the name) alone. For example, the composite keys sort as (AAPL, 2013-10-03) < (AAPL, 2013-10-04) < (GOOG, 2004-11-02), but this comparator treats the two AAPL keys as equal, so one reduce() call receives both AAPL values, already in timestamp order.

GroupingComparator code:
public static class NaturalKeyGroupingComparator extends WritableComparator {
        protected NaturalKeyGroupingComparator() {
            super(CompositeKey.class, true);
        }

        @Override
        public int compare(WritableComparable w1, WritableComparable w2) {
            CompositeKey key1 = (CompositeKey) w1;
            CompositeKey key2 = (CompositeKey) w2;
            return key1.getName().compareTo(key2.getName());
        }
    }
A basic date-conversion utility:
import java.text.SimpleDateFormat;
import java.util.Date;

public class DateUtil {

    static final String DATE_FORMAT = "yyyy-MM-dd";
    static final SimpleDateFormat SIMPLE_DATE_FORMAT =
            new SimpleDateFormat(DATE_FORMAT);

    public static Date getDate(String dateAsString)  {
        try {
            return SIMPLE_DATE_FORMAT.parse(dateAsString);
        }
        catch(Exception e) {
            return null;
        }
    }

    public static long getDateAsMilliSeconds(Date date) throws Exception {
        return date.getTime();
    }

    public static long getDateAsMilliSeconds(String dateAsString) throws Exception {
        Date date = getDate(dateAsString);
        return date.getTime();
    }

    public static String getDateAsString(long timestamp) {
        return SIMPLE_DATE_FORMAT.format(timestamp);
    }

}
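A quick round-trip check (illustrative only): parse a date string to epoch milliseconds and format it back. One caveat worth knowing: the shared static SimpleDateFormat is not thread-safe, which is acceptable here because each task parses dates on its own single thread.

long ts = DateUtil.getDateAsMilliSeconds("2013-10-04");
System.out.println(DateUtil.getDateAsString(ts)); // 2013-10-04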
Mapper code

The mapper splits each input line, extracts a CompositeKey and a TimeSeriesData, and emits <CompositeKey, TimeSeriesData> pairs. For example, the line GOOG,2004-11-04,184.70 becomes a (CompositeKey(GOOG, t), TimeSeriesData(t, 184.70)) pair, where t is the parsed timestamp.

public static class MovingAverageMapper extends
            Mapper<LongWritable, Text, CompositeKey, TimeSeriesData> {
        private final CompositeKey reducerKey = new CompositeKey();
        private final TimeSeriesData reducerValue = new TimeSeriesData();

        public void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String line = value.toString();
            if ((line == null) || (line.length() == 0)) {
                return;
            }
            String[] tokens = line.split(",");
            if (tokens.length == 3) {
                Date date = DateUtil.getDate(tokens[1]);
                if (date == null) {
                    return;
                }
                long timestamp = date.getTime();
                reducerKey.set(tokens[0], timestamp);
                reducerValue.set(timestamp, Double.parseDouble(tokens[2]));
                context.write(reducerKey, reducerValue);
            }
        }
    }
Reducer code:
public static class MovingAverageReducer extends Reducer<CompositeKey, TimeSeriesData, Text, Text> {
        int windowSize = 5;

        protected void reduce(CompositeKey key, Iterable<TimeSeriesData> values,
                              Context context) throws IOException, InterruptedException {
            Text outputKey = new Text();
            Text outputValue = new Text();
            MovingAverage ma = new MovingAverage(this.windowSize);
            for (TimeSeriesData data : values) {
                ma.addNewNumber(data.getValue());
                Double movingAverage = ma.getMovingAverage();
                long timestamp = data.getTimestamp();
                String dateAsString = DateUtil.getDateAsString(timestamp);
                outputValue.set(dateAsString + "," + movingAverage);
                outputKey.set(key.getName());
                context.write(outputKey, outputValue);
            }
        }
    }
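The window size is hardcoded to 5 here. A minimal sketch of making it configurable, assuming a made-up configuration key "moving.average.window.size" that the driver would set via conf.setInt(...):

public static class ConfigurableMovingAverageReducer
        extends Reducer<CompositeKey, TimeSeriesData, Text, Text> {
    private int windowSize = 5; // fallback when the key is not set

    @Override
    protected void setup(Context context) {
        // "moving.average.window.size" is a hypothetical key for this sketch.
        windowSize = context.getConfiguration()
                .getInt("moving.average.window.size", 5);
    }
    // reduce() is identical to MovingAverageReducer above, using this.windowSize.
}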
The complete code:
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Partitioner;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import java.util.Date;

public class SimpleMovingAverage {
    public static class MovingAverage {

        private double sum = 0.0;
        private final int period;
        private double[] window = null;
        private int pointer = 0;
        private int size = 0;

        public MovingAverage(int period) {
            if (period < 1) {
                throw new IllegalArgumentException("period must be > 0");
            }
            this.period = period;
            window = new double[period];
        }

        public void addNewNumber(double number) {
            sum += number;
            if (size < period) {
                window[pointer++] = number;
                size++;
            }
            else {
                pointer = pointer % period;
                sum -= window[pointer];
                window[pointer++] = number;
            }
        }

        public double getMovingAverage() {
            if (size == 0) {
                throw new IllegalStateException("average is undefined");
            }
            return sum / size;
        }
    }
    public static class CompositeKey implements WritableComparable<CompositeKey>{
        private String name;
        private long timestamp;
        public CompositeKey(){

        }
        public void set(String name,long timestamp){
            this.name=name;
            this.timestamp=timestamp;
        }

        public String getName() {
            return name;
        }

        public void setName(String name) {
            this.name = name;
        }

        public long getTimestamp() {
            return timestamp;
        }

        public void setTimestamp(long timestamp) {
            this.timestamp = timestamp;
        }

        public int compareTo(CompositeKey o) {
            if(this.name.compareTo(o.name)!=0){
                return this.name.compareTo(o.name);
            }else if(this.timestamp!=o.timestamp){
                return timestamp>o.timestamp?1:-1;
            }else{
                return 0;
            }
        }

        public void write(DataOutput dataOutput) throws IOException {
            dataOutput.writeUTF(this.name);
            dataOutput.writeLong(this.timestamp);
        }

        public void readFields(DataInput dataInput) throws IOException {
            this.name=dataInput.readUTF();
            this.timestamp=dataInput.readLong();
        }
    }

    public static class TimeSeriesData implements WritableComparable<TimeSeriesData>{
        private long timestamp;
        private double value;
        public TimeSeriesData(){

        }

        public long getTimestamp() {
            return timestamp;
        }

        public void setTimestamp(long timestamp) {
            this.timestamp = timestamp;
        }

        public double getValue() {
            return value;
        }

        public void setValue(double value) {
            this.value = value;
        }

        public void set(long timestamp,double value){
            this.timestamp=timestamp;
            this.value=value;
        }

        @Override
        public String toString() {
            return "TimeSeriesData{" +
                    "timestamp=" + timestamp +
                    ", value=" + value +
                    '}';
        }

        public int compareTo(TimeSeriesData o) {
            if(this.timestamp<o.timestamp){
                return -1;
            }else if(this.timestamp>o.timestamp){
                return 1;
            }else{
                return 0;
            }
        }

        public void write(DataOutput dataOutput) throws IOException {
            dataOutput.writeLong(timestamp);
            dataOutput.writeDouble(value);
        }

        public void readFields(DataInput dataInput) throws IOException {
            this.timestamp=dataInput.readLong();
            this.value=dataInput.readDouble();
        }
    }

    public static class CompositeKeyComparator extends WritableComparator{
        protected CompositeKeyComparator(){
            super(CompositeKey.class,true);
        }
        public int compare(WritableComparable w1,WritableComparable w2){
            CompositeKey key1=(CompositeKey) w1;
            CompositeKey key2=(CompositeKey) w2;
            int comparison = key1.getName().compareTo(key2.getName());
            if (comparison == 0) {
                // Same symbol: order by timestamp, ascending.
                return Long.compare(key1.getTimestamp(), key2.getTimestamp());
            } else {
                return comparison;
            }
        }
    }
    public static class NaturalKeyPartitioner extends Partitioner<CompositeKey, TimeSeriesData> {
        @Override
        public int getPartition(CompositeKey key, TimeSeriesData value,
                                int numberOfPartitions) {
            return Math.abs((int) (hash(key.getName()) % numberOfPartitions));
        }

      
        static long hash(String str) {
            long h = 1125899906842597L; // prime
            int length = str.length();
            for (int i = 0; i < length; i++) {
                h = 31 * h + str.charAt(i);
            }
            return h;
        }
    }
    public static class NaturalKeyGroupingComparator extends WritableComparator {
        protected NaturalKeyGroupingComparator() {
            super(CompositeKey.class, true);
        }

        @Override
        public int compare(WritableComparable w1, WritableComparable w2) {
            CompositeKey key1 = (CompositeKey) w1;
            CompositeKey key2 = (CompositeKey) w2;
            return key1.getName().compareTo(key2.getName());
        }
    }

    public static class MovingAverageMapper extends
            Mapper<LongWritable, Text, CompositeKey, TimeSeriesData> {
        private final CompositeKey reducerKey = new CompositeKey();
        private final TimeSeriesData reducerValue = new TimeSeriesData();

        public void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String line = value.toString();
            if ((line == null) || (line.length() == 0)) {
                return;
            }
            String[] tokens = line.split(",");
            if (tokens.length == 3) {
                Date date = DateUtil.getDate(tokens[1]);
                if (date == null) {
                    return;
                }
                long timestamp = date.getTime();
                reducerKey.set(tokens[0], timestamp);
                reducerValue.set(timestamp, Double.parseDouble(tokens[2]));
                context.write(reducerKey, reducerValue);
            }
        }
    }
    public static class MovingAverageReducer extends Reducer<CompositeKey, TimeSeriesData, Text, Text> {
        int windowSize = 5;

        protected void reduce(CompositeKey key, Iterable<TimeSeriesData> values,
                              Context context) throws IOException, InterruptedException {
            Text outputKey = new Text();
            Text outputValue = new Text();
            MovingAverage ma = new MovingAverage(this.windowSize);
            for (TimeSeriesData data : values) {
                ma.addNewNumber(data.getValue());
                Double movingAverage = ma.getMovingAverage();
                long timestamp = data.getTimestamp();
                String dateAsString = DateUtil.getDateAsString(timestamp);
                outputValue.set(dateAsString + "," + movingAverage);
                outputKey.set(key.getName());
                context.write(outputKey, outputValue);
            }
        }
    }

    public static void main(String[] args) throws Exception {
        // FileUtil is a project-local helper that clears the local output directory.
        FileUtil.deleteDir("output");
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "SimpleMovingAverage");
        job.setJarByClass(SimpleMovingAverage.class);
        job.setMapperClass(MovingAverageMapper.class);
        job.setReducerClass(MovingAverageReducer.class);
        job.setMapOutputKeyClass(CompositeKey.class);
        job.setMapOutputValueClass(TimeSeriesData.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);
        job.setPartitionerClass(NaturalKeyPartitioner.class);
        job.setGroupingComparatorClass(NaturalKeyGroupingComparator.class);
        job.setSortComparatorClass(CompositeKeyComparator.class);
        job.setNumReduceTasks(1);
        FileInputFormat.setInputPaths(job, new Path("input/file.txt"));
        FileOutputFormat.setOutputPath(job, new Path("output"));
        System.exit(job.waitForCompletion(true)?0:1);
    }
}