Linear Regression with MapReduce

路人张的鱼生

Article link: https://blog.csdn.net/zhangdy12307/article/details/106892421

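Linear regression describes a possible relationship between variables. Its most common form is the least-squares fit, and the program below follows the least-squares approach.

Sample data

The goal of linear regression analysis is to find a linear equation that fits the data. Once the equation is found, it can be used to make predictions to a limited extent. The two variables used here are age and blood glucose level, listed below, and the final fit is a linear equation of the form y=ax+b.

Patient ID	Age (x)	Blood glucose level (y)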
1	41	90
2	42	93
3	43	98
4	20	64
5	25	78
6	40	71
7	58	88
8	60	86

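Next, compute the coefficients with the following formulas, where a is the slope and b is the intercept of y=ax+b:
$a=\frac{n(\sum(xy))-(\sum(x))(\sum(y))}{n(\sum(x^{2}))-(\sum(x))^{2}}$
$b=\frac{(\sum(y))(\sum(x^{2}))-(\sum(x))(\sum(xy))}{n(\sum(x^{2}))-(\sum(x))^{2}}$
where n is the number of samples in the regression.
In the rest of this article, the regression is computed directly with the SimpleRegression class from Apache Commons Math, which simplifies the work considerably.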
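
As a worked check on the eight sample rows, taking a as the slope and b as the intercept of y=ax+b, the sums and the resulting coefficients are:
$n=8,\quad \sum(x)=329,\quad \sum(y)=668,\quad \sum(x^{2})=14883,\quad \sum(xy)=28144$
$a=\frac{8\cdot 28144-329\cdot 668}{8\cdot 14883-329^{2}}=\frac{5380}{10823}\approx 0.4971$
$b=\frac{668\cdot 14883-329\cdot 28144}{8\cdot 14883-329^{2}}=\frac{682468}{10823}\approx 63.06$
The fitted line for the sample data is therefore approximately y=0.4971x+63.06.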

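Mapper stage task

This stage extracts the values needed for the calculation and passes them to the reducer stage. Because the reducer must compute over the complete set of values, the mapper emits a null key (NullWritable) so that every value lands in the same reducer.

Mapper stage code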


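import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Emits "age,bloodGlucose" under a single NullWritable key so that
// every record reaches the same reducer.
public class linearMapper extends Mapper<LongWritable, Text, NullWritable, Text> {
    // Reused output value; avoids allocating a new Text per record.
    private final Text ageAndSugarValue = new Text();

    @Override
    public void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // Expected input line (comma-separated): patientId,age,bloodGlucose
        String[] line = value.toString().split(",");
        if (line.length < 3) {
            return; // skip blank or malformed lines
        }
        ageAndSugarValue.set(line[1].trim() + "," + line[2].trim());
        context.write(NullWritable.get(), ageAndSugarValue);
    }
}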
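
The parsing step of the mapper can be tried without a Hadoop cluster. The following is a minimal sketch, assuming comma-separated input lines of the form `patientId,age,bloodGlucose`. The class `RecordParser` and the method `toAgeAndSugar` are hypothetical names introduced for illustration and are not part of the original program.

```java
import java.util.Optional;

// Hypothetical helper (not in the original program): isolates the
// mapper's parsing logic so it can be tested without Hadoop.
public class RecordParser {

    // Returns "age,bloodGlucose" for a well-formed line,
    // or Optional.empty() for null, blank, or malformed lines.
    public static Optional<String> toAgeAndSugar(String line) {
        if (line == null) {
            return Optional.empty();
        }
        String[] fields = line.split(",");
        if (fields.length < 3) {
            return Optional.empty();
        }
        return Optional.of(fields[1].trim() + "," + fields[2].trim());
    }

    public static void main(String[] args) {
        // A well-formed record and a blank line.
        System.out.println(toAgeAndSugar("1,41,90").orElse("<skipped>"));
        System.out.println(toAgeAndSugar("").orElse("<skipped>"));
    }
}
```

Keeping the parsing in a pure function like this makes it easy to see which lines the mapper would skip before running the job.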

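Reducer stage task

This stage iterates over all the values and uses SimpleRegression to compute the regression parameters, which yields the regression equation.

Reducer stage code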


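import java.io.IOException;

// SimpleRegression from Apache Commons Math 3
import org.apache.commons.math3.stat.regression.SimpleRegression;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

// Fits y = ax + b over all "age,bloodGlucose" pairs with SimpleRegression.
public class linearReducer extends Reducer<NullWritable, Text, NullWritable, Text> {

    @Override
    public void reduce(NullWritable key, Iterable<Text> values, Context context)
            throws IOException, InterruptedException {
        SimpleRegression sr = new SimpleRegression();
        for (Text value : values) {
            // tokens[0] = age (x), tokens[1] = blood glucose level (y)
            String[] tokens = value.toString().split(",");
            sr.addData(Double.parseDouble(tokens[0]),
                    Double.parseDouble(tokens[1]));
        }
        // getSignificance() is the significance level (p-value) of the slope.
        context.write(NullWritable.get(), new Text("y = "
                + sr.getSlope() + "x + " + sr.getIntercept()
                + ",  p = " + sr.getSignificance()));
    }
}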

The driver program is as follows


import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class Driver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Input and output paths are hard-coded here.
        String[] otherArgs = new String[]{"input/linear_regression.txt", "output"};
        // Job.getInstance replaces the deprecated Job constructor.
        Job job = Job.getInstance(conf, "linear_regression");
        FileInputFormat.addInputPath(job, new Path(otherArgs[0]));
        FileOutputFormat.setOutputPath(job, new Path(otherArgs[1]));
        job.setJarByClass(Driver.class);
        job.setMapperClass(linearMapper.class);
        job.setReducerClass(linearReducer.class);
        // Mapper and reducer share the same output key/value types.
        job.setOutputKeyClass(NullWritable.class);
        job.setOutputValueClass(Text.class);
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

Results

[Image: screenshot of the job output]

Let's check the result:

The fit is acceptable.
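
One way to check the reducer output independently is to evaluate the closed-form least-squares sums in plain Java. The sketch below assumes the convention y = ax + b with a as the slope and b as the intercept. `LeastSquaresCheck` and `fit` are hypothetical names, and this code is not taken from the original post.

```java
// Hypothetical stand-alone check (not part of the MapReduce job):
// computes slope and intercept from the least-squares sums.
public class LeastSquaresCheck {

    // Returns {a, b} for the fitted line y = ax + b.
    public static double[] fit(double[] x, double[] y) {
        if (x.length != y.length || x.length < 2) {
            throw new IllegalArgumentException("need two or more (x, y) pairs of equal length");
        }
        int n = x.length;
        double sumX = 0, sumY = 0, sumXX = 0, sumXY = 0;
        for (int i = 0; i < n; i++) {
            sumX += x[i];
            sumY += y[i];
            sumXX += x[i] * x[i];
            sumXY += x[i] * y[i];
        }
        // Shared denominator: n * sum(x^2) - (sum(x))^2
        double denom = n * sumXX - sumX * sumX;
        double a = (n * sumXY - sumX * sumY) / denom;      // slope
        double b = (sumY * sumXX - sumX * sumXY) / denom;  // intercept
        return new double[]{a, b};
    }

    public static void main(String[] args) {
        // Sample data from the table: age (x) and blood glucose level (y).
        double[] age = {41, 42, 43, 20, 25, 40, 58, 60};
        double[] sugar = {90, 93, 98, 64, 78, 71, 88, 86};
        double[] ab = fit(age, sugar);
        System.out.println("y = " + ab[0] + "x + " + ab[1]);
    }
}
```

For the eight sample rows the slope comes out at about 0.497 and the intercept at about 63.06, which can be compared with the slope and intercept written by the reducer.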
