Developing a standard Hadoop MapReduce program for big data

Mr_FutureFinal

Article link: https://blog.csdn.net/qq_42482484/article/details/81140317

How MapReduce runs, and how to write the standard code:

The execution mechanism has three parts:
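The post does not spell the three parts out. Assuming they are the usual map, shuffle, and reduce phases, the following plain-Java sketch mimics each phase in memory for word count. It uses no Hadoop classes, and `WordCountSketch` and its method names are made up for illustration; a real job distributes these phases across a cluster.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// In-memory imitation of map -> shuffle -> reduce (an assumption about the
// "three parts"; this is not how Hadoop itself is implemented).
public class WordCountSketch {

    // Map phase: turn one input line into (word, 1) pairs.
    public static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String word : line.split("\\s+")) {
            if (!word.isEmpty()) {
                pairs.add(new AbstractMap.SimpleEntry<>(word, 1));
            }
        }
        return pairs;
    }

    // Shuffle phase: group every value under its key, with keys kept sorted.
    public static Map<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
        }
        return grouped;
    }

    // Reduce phase: collapse all values of one key into a single sum.
    public static int reduce(String key, Iterable<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }

    // Run the three phases in order over a list of input lines.
    public static Map<String, Integer> wordCount(List<String> lines) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String line : lines) {
            pairs.addAll(map(line));
        }
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : shuffle(pairs).entrySet()) {
            result.put(e.getKey(), reduce(e.getKey(), e.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        // prints {hadoop=1, hello=2, map=1, reduce=1}
        System.out.println(wordCount(Arrays.asList("hello hadoop", "hello map reduce")));
    }
}
```

The `map` and `reduce` methods here have the same shape as the Mapper and Reducer in the Hadoop code: map emits a 1 for each word, and reduce sums the values that were grouped under one key.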

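Steps for developing a MapReduce program:

To begin: extends Configured implements Tool, and override the run method

1. Write a class that extends Mapper and override its map method

2. Write a class that extends Reducer and override its reduce method

The remaining steps all happen inside the run method

3. Declare a job and get an instance of it

4. Point the job at its class or at its jar: use the class when the jar is packaged and run on Linux; use the jar path when submitting from the local machine to a remote Linux cluster

5. Register the Mapper subclass and its output key and value types

6. Register the Reducer subclass and its output key and value types

7. Start the job

The detailed code follows: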

package Day03_Hadoop;

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class Demo02_WordCountMR extends Configured implements Tool {

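    @Override
    public int run(String[] args) throws Exception {

        // Step 3: load the Hadoop configuration and check that both paths were given

        Configuration conf = getConf();
        if (args.length != 2) {
            System.err.println("Usage: Demo02_WordCountMR <input path> <output path>");
            return -1;
        }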

        
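        // Step 3 (continued): create the job and give it a name

        Job job = Job.getInstance(conf, "word count");

        // Step 4: tell Hadoop which jar holds the job classes. Two options:
        //   1. setJar: run from the local machine; the jar at this path is shipped to the server at submit time
        //   2. setJarByClass: package the jar, upload it to Linux, and run it there with `hadoop jar`
        //job.setJarByClass(Demo02_WordCountMR.class);


        job.setJar("./target/hadoop-2.7.6-0.0.1-SNAPSHOT.jar");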
       
        // Step 5: set the mapper class and its output key/value types (the pairs sent on to reduce)
        job.setMapperClass(WordCountMapper.class);

        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(IntWritable.class);

        // Step 6: set the reducer class and the job's final output key/value types

        job.setReducerClass(WordCountReduce.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
    
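        // Set the input and output paths

        // FileInputFormat does the reading; its subclass TextInputFormat (the default) handles plain-text files.
        // It hands each line to map with a LongWritable key, the byte offset of that line in the file.
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));



        // Step 7: submit the job and wait for it to finish

        boolean success = job.waitForCompletion(true);
        return success ? 0 : 1;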
    
        
    }
         
    public static void main(String[] args) throws Exception {
        int code = ToolRunner.run(new Demo02_WordCountMR(), args);
        System.exit(code);
    }
    
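    // Step 1: write the mapper class

    public static class WordCountMapper extends Mapper<LongWritable,Text,Text,IntWritable>{

        // Override the map method

        @Override
        public void map(LongWritable key, Text value, Mapper<LongWritable, Text, Text, IntWritable>.Context context)
                throws IOException, InterruptedException {
            // The first two type parameters of Mapper (key, value) are its input; the last two are its output.
            // Each call receives one line of the file. The key is the line's byte offset in the file
            // (0 for the first line), not a line number; word count ignores it
            // and only processes the value.

            // Split the line on whitespace

            String str = value.toString();
            String[] strs = str.split("\\s+");  // regex: one or more whitespace characters

            // Emit every word
            for (String s : strs) {

                // Leading whitespace or a blank line produces an empty token; skip it
                if (s.isEmpty()) {
                    continue;
                }

                // context.write sends a (Text, IntWritable) pair, the mapper's output types, on to reduce

                context.write(new Text(s),new IntWritable(1));

            }

        }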
        
    }
    
    // Step 2: write the reducer class

    public static class WordCountReduce extends Reducer<Text,IntWritable,Text, IntWritable>{

        // Override the reduce method
        @Override
        public void reduce(Text key3, Iterable<IntWritable> value3,
                Reducer<Text, IntWritable, Text, IntWritable>.Context context) throws IOException, InterruptedException {

            // Sum all the counts collected for this word
            int sum = 0;
            for (IntWritable iw : value3) {
                sum += iw.get();
            }

            context.write(key3, new IntWritable(sum));
        }
        }
        
        
    }    
    
}
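One detail of the map method is worth checking in isolation: `String.split("\\s+")` returns an empty first token when the line starts with whitespace, and a single empty token for an empty line, so a mapper that emits every token will also count an empty "word". The sketch below shows the behavior in plain Java and one way to filter it; `TokenizeSketch` and `tokens` are hypothetical names, not part of the original post.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class TokenizeSketch {

    // Hypothetical helper: split on runs of whitespace and drop empty tokens.
    public static List<String> tokens(String line) {
        List<String> out = new ArrayList<>();
        for (String s : line.split("\\s+")) {
            if (!s.isEmpty()) {
                out.add(s);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Raw split keeps an empty leading token when the line starts with whitespace.
        System.out.println(Arrays.toString("  a b".split("\\s+"))); // prints [, a, b]
        // An empty line still yields one (empty) token.
        System.out.println("".split("\\s+").length);                // prints 1
        // The filtered version returns only real words.
        System.out.println(tokens("  a b"));                        // prints [a, b]
        System.out.println(tokens(""));                             // prints []
    }
}
```

Inside a Hadoop mapper the same check is a one-line `isEmpty()` test before `context.write`.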
