[Hadoop] MapReduce

kaikaihit

Tags: hadoop

Article link: https://blog.csdn.net/zkzbhh/article/details/78997589


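1. Introduction to MapReduce
  1) MapReduce is a distributed computing model proposed by Google. It was originally used mainly in the search domain to solve computation problems over massive amounts of data.
  2) MapReduce has two phases, map and reduce. To get a distributed computation, you only need to implement two functions, map() and reduce().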

2. MapReduce execution flow

[Figure: MapReduce execution flow]

3. MapReduce execution principle

[Figure: MapReduce execution principle]

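4. Execution steps
  1) Map task processing
    ① Read the file from HDFS. Each line is parsed into one (key, value) pair, and the map function is called once per pair. For example, the input is split into (0, hadoop yarn), (12, hdfs mapreduce), and so on, where the key is the byte offset at which the line starts (assuming a one-byte line terminator) and the value is the text of the line.
    ② map() receives the (key, value) pairs produced in ①, processes them, and emits new (key, value) pairs, for example (hadoop, 1), (yarn, 1), (hdfs, 1), (mapreduce, 1), (hadoop, 1).
    ③ The shuffle sorts and groups the map output. Sorting orders the pairs alphabetically by key; grouping puts all values that share a key into one collection. For example, sorting gives (hadoop, 1), (hadoop, 1), (hdfs, 1), (mapreduce, 1), (yarn, 1), and grouping then gives (hadoop, {1,1}), (hdfs, {1}), (mapreduce, {1}), (yarn, {1}).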
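The three map-side steps can be imitated in plain Java without a cluster. The following is only an in-memory sketch, not Hadoop code: the class name MapSideSketch and its helper methods are hypothetical, the third input line ("hadoop") is an assumption made so that "hadoop" occurs twice, and the offsets assume a one-byte '\n' line terminator.

```java
import java.nio.charset.StandardCharsets;
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

// Hypothetical in-memory imitation of the map-side steps; it does not use Hadoop.
public class MapSideSketch {

    // Step 1: turn raw text into (byte offset, line) records, one record per line.
    static Map<Long, String> readRecords(String fileContent) {
        Map<Long, String> records = new LinkedHashMap<>();
        long offset = 0;
        for (String line : fileContent.split("\n")) {
            records.put(offset, line);
            // +1 accounts for the '\n' terminator that split() removed.
            offset += line.getBytes(StandardCharsets.UTF_8).length + 1;
        }
        return records;
    }

    // Step 2: the map function emits (word, 1) for every token; the offset key is ignored.
    static List<Map.Entry<String, Integer>> map(long offset, String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        StringTokenizer tokens = new StringTokenizer(line);
        while (tokens.hasMoreTokens()) {
            out.add(new AbstractMap.SimpleEntry<String, Integer>(tokens.nextToken(), 1));
        }
        return out;
    }

    // Step 3: sort by key and collect the values of equal keys into one list.
    static TreeMap<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        TreeMap<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
        }
        return grouped;
    }

    public static void main(String[] args) {
        String input = "hadoop yarn\nhdfs mapreduce\nhadoop"; // assumed sample input
        Map<Long, String> records = readRecords(input);
        List<Map.Entry<String, Integer>> mapOutput = new ArrayList<>();
        for (Map.Entry<Long, String> record : records.entrySet()) {
            mapOutput.addAll(map(record.getKey(), record.getValue()));
        }
        System.out.println(records);            // {0=hadoop yarn, 12=hdfs mapreduce, 27=hadoop}
        System.out.println(shuffle(mapOutput)); // {hadoop=[1, 1], hdfs=[1], mapreduce=[1], yarn=[1]}
    }
}
```

A TreeMap stands in for the shuffle here because it keeps keys sorted and holds one value list per key. In a real job the framework does the sorting and grouping itself.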

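  2) Reduce task processing
    ① The outputs of the map tasks are copied over the network to different reduce nodes according to their partition.
    ② The outputs from the multiple map tasks are merged and sorted. Override the reduce function, which receives the grouped data, to implement your own business logic and produce a new value for each key. For example, the result after reduce is (hadoop, 2), (hdfs, 1), (mapreduce, 1), (yarn, 1).
    ③ The (key, value) pairs output by reduce are written to HDFS.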
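The reduce-side steps can be sketched in the same in-memory style. This is a simplified illustration under stated assumptions, not Hadoop code: the class ReduceSideSketch and its methods are hypothetical, each map output is reduced to a list of words (each word standing for a (word, 1) pair), and the partition formula mirrors Hadoop's default HashPartitioner but is applied to String.hashCode(), so the partition numbers need not match what a real job assigns to Text keys.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory imitation of the reduce-side steps; it does not use Hadoop.
public class ReduceSideSketch {

    // Step 1: pick the reducer for a key. Every occurrence of a key lands in the same partition.
    static int partition(String key, int numReducers) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReducers;
    }

    // Step 2a: one reducer collects its partition from every map output, sorted and grouped by key.
    static TreeMap<String, List<Integer>> mergeForReducer(
            List<List<String>> mapOutputs, int reducerId, int numReducers) {
        TreeMap<String, List<Integer>> merged = new TreeMap<>();
        for (List<String> oneMapOutput : mapOutputs) {
            for (String word : oneMapOutput) {
                if (partition(word, numReducers) == reducerId) {
                    merged.computeIfAbsent(word, k -> new ArrayList<>()).add(1);
                }
            }
        }
        return merged;
    }

    // Step 2b: the reduce function sums the grouped values of one key.
    static int reduce(String key, Iterable<Integer> values) {
        int sum = 0;
        for (int value : values) {
            sum += value;
        }
        return sum;
    }

    // Step 3: gather the output of all reducers (a stand-in for writing the results to HDFS).
    static TreeMap<String, Integer> runReducers(List<List<String>> mapOutputs, int numReducers) {
        TreeMap<String, Integer> result = new TreeMap<>();
        for (int r = 0; r < numReducers; r++) {
            for (Map.Entry<String, List<Integer>> group
                    : mergeForReducer(mapOutputs, r, numReducers).entrySet()) {
                result.put(group.getKey(), reduce(group.getKey(), group.getValue()));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Assumed output of three map tasks.
        List<List<String>> mapOutputs = Arrays.asList(
                Arrays.asList("hadoop", "yarn"),
                Arrays.asList("hdfs", "mapreduce"),
                Arrays.asList("hadoop"));
        System.out.println(runReducers(mapOutputs, 2)); // {hadoop=2, hdfs=1, mapreduce=1, yarn=1}
    }
}
```

The final counts do not depend on the number of reducers, because all values for a given key always go to the same partition.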

5. Code implementation

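import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class MapReduceOper {
    // Static nested class ModelMapper, extending Mapper.
    // It must be static so that Hadoop can instantiate it by reflection.
    public static class ModelMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
        // Override map(): emit (word, 1) for every token in the input line.
        @Override
        public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
            String strValue = value.toString();
            StringTokenizer stringTokenizer = new StringTokenizer(strValue);
            while (stringTokenizer.hasMoreTokens()) {
                String str = stringTokenizer.nextToken();
                Text outKeyValue = new Text(str);
                // The value type must match the IntWritable declared in the class signature.
                context.write(outKeyValue, new IntWritable(1));
            }
        }
    }

    // Static nested class ModelReducer, extending Reducer.
    public static class ModelReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        // Override reduce(): sum all counts grouped under one key.
        @Override
        public void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable value : values) {
                sum = sum + value.get();
            }
            context.write(key, new IntWritable(sum));
        }
    }

    // run() describes the job: input -> map -> reduce -> output.
    public int run(String[] args) throws Exception {
        // Connect to the HDFS file system.
        Configuration configuration = new Configuration();
        configuration.set("fs.defaultFS", "hdfs://hadoop:9000");
        // Create the Job object.
        Job job = Job.getInstance(configuration, this.getClass().getSimpleName());
        job.setJarByClass(this.getClass());

        // Configure the input: args[0] is the input path.
        Path path = new Path(args[0]);
        FileInputFormat.addInputPath(job, path);

        // Configure the map phase.
        job.setMapperClass(ModelMapper.class);
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(IntWritable.class);

        // Configure the reduce phase.
        job.setReducerClass(ModelReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        // Configure the output: args[1] is the output path, which must not exist yet.
        Path outPath = new Path(args[1]);
        FileOutputFormat.setOutputPath(job, outPath);

        // Submit the job and wait for it to finish.
        boolean isSucc = job.waitForCompletion(true);
        return isSucc ? 0 : 1;
    }

    // main() for testing: prints 0 on success, 1 on failure.
    public static void main(String[] args) throws Exception {
        System.out.println(new MapReduceOper().run(args));
    }
}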
