Hadoop Journey (5): MapReduce Java API in Practice

1. The MapReduce Example

        This article implements the classic word-count example with the MapReduce Java API, covering the code, a small optimization, and the standard MapReduce program layout; word count is also a classic interview coding question. The code is essentially MapReduce "boilerplate": for a new job we only need to modify a few parts.

Related posts: Hadoop Journey (4): MapReduce and YARN Explained; Hadoop Journey (1): Standalone and Pseudo-Cluster Installation with Simple Classic Examples

Prepare the environment: install IDEA, create a Maven project, and add the dependency below.


A: Dependency

<!-- Hadoop client dependency -->
<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-client</artifactId>
    <version>${hadoop.version}</version>
</dependency>
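The `${hadoop.version}` placeholder must be defined in the POM's `<properties>` section; a sketch assuming the 2.5.0 release that this article runs against later:

```xml
<properties>
    <!-- assumed: matches the hadoop-2.5.0 installation used in section 2.6 -->
    <hadoop.version>2.5.0</hadoop.version>
</properties>
```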


B: Start MapReduce



2. Code Implementation: WordCountMapReduce


2.1. The map phase: the code follows a fixed template, and only a few parts need changing

/**
 * step 1: Map class
 *
 * map input -> map output
 * public class Mapper<KEYIN, VALUEIN, KEYOUT, VALUEOUT>
 */
public static class WordCountMapper extends
        Mapper<LongWritable, Text, Text, IntWritable> {

    private Text mapOutputKey = new Text();
    private final static IntWritable mapOutputValue = new IntWritable(1);

    /**
     * Called once for each key/value pair in the input split. Most applications
     * should override this, but the default is the identity function.
     */
    @Override
    public void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // get the line as a plain String
        String lineValue = value.toString();

        // String.split() allocates a new array on every call;
        // StringTokenizer splits the line with less overhead
        // String[] strs = lineValue.split(" ");
        StringTokenizer stringTokenizer = new StringTokenizer(lineValue);

        // iterate over the tokens
        while (stringTokenizer.hasMoreTokens()) {
            // get the current word
            String wordValue = stringTokenizer.nextToken();
            // set the output key
            mapOutputKey.set(wordValue);
            // emit (word, 1) through the context
            context.write(mapOutputKey, mapOutputValue);
        }
    }
}
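The map logic above can be tried outside Hadoop. A minimal sketch using only the plain JDK (no Hadoop types; the sample input line is invented) that prints the (word, 1) pairs the mapper would emit:

```java
import java.util.StringTokenizer;

public class MapSketch {
    public static void main(String[] args) {
        // invented sample line, standing in for one Text value from the input split
        String lineValue = "hadoop mapreduce hadoop";
        StringTokenizer st = new StringTokenizer(lineValue);
        while (st.hasMoreTokens()) {
            // each token becomes a (word, 1) pair, mirroring context.write(...)
            System.out.println(st.nextToken() + "\t1");
        }
    }
}
```

Note that the same word can be emitted more than once here; collapsing duplicates is the reducer's job, not the mapper's.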


2.2. The Reduce phase

/**
 * step 2: Reduce class
 * reduce input -> reduce output
 * public class Reducer<KEYIN, VALUEIN, KEYOUT, VALUEOUT>
 */
public static class WordCountReducer extends
        Reducer<Text, IntWritable, Text, IntWritable> {

    private IntWritable outputValue = new IntWritable();

    /**
     * This method is called once for each key. Most applications will define
     * their reduce class by overriding this method. The default implementation
     * is an identity function.
     */
    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        // running total for this key
        int sum = 0;
        for (IntWritable value : values) {
            sum += value.get();
        }
        // set and emit the final count
        outputValue.set(sum);
        context.write(key, outputValue);
    }
}
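The reduce logic can likewise be exercised on its own. A sketch with the plain JDK (the key and its grouped values are invented, standing in for what the shuffle phase would deliver) showing the summation the reducer performs:

```java
import java.util.Arrays;
import java.util.List;

public class ReduceSketch {
    public static void main(String[] args) {
        // after shuffle/sort, the framework hands reduce one key plus all its values;
        // this grouped list is invented for illustration
        String key = "hadoop";
        List<Integer> values = Arrays.asList(1, 1, 1);

        // same accumulation as the reducer above
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        System.out.println(key + "\t" + sum);
    }
}
```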


2.3. Wiring map and reduce together (the driver)

// step 3: driver - assembles the job
// wires the map and reduce classes together
public int run(String[] args) throws Exception {
    // 1: get the Configuration (provided via getConf() because we extend Configured)
    Configuration configuration = getConf();

    // 2: create the Job
    Job job = Job.getInstance(configuration, this.getClass().getSimpleName());

    // set the jar to run by class
    job.setJarByClass(this.getClass());

    // 3: configure the job
    // input -> map -> reduce -> output
    // 3.1: input path (first command-line argument, the data source)
    Path inPath = new Path(args[0]);
    FileInputFormat.addInputPath(job, inPath);

    // 3.2: map
    job.setMapperClass(WordCountMapper.class);
    // map output types
    job.setMapOutputKeyClass(Text.class);
    job.setMapOutputValueClass(IntWritable.class);

    // 3.3: reduce
    job.setReducerClass(WordCountReducer.class);
    // job output types
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);

    // 3.4: output path (second command-line argument)
    Path outPath = new Path(args[1]);
    FileOutputFormat.setOutputPath(job, outPath);

    // 4: submit the job and wait for completion
    boolean isSuccess = job.waitForCompletion(true);
    return isSuccess ? 0 : 1;
}
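The input -> map -> reduce -> output chain that run() wires up can be simulated end to end in plain Java. A sketch (the sorted in-memory map stands in for Hadoop's shuffle/sort step, and the input lines are invented):

```java
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

public class PipelineSketch {
    public static void main(String[] args) {
        // invented input, standing in for the lines of an HDFS input file
        String[] input = { "hadoop yarn", "hadoop mapreduce" };

        // map + shuffle: a sorted map stands in for the framework's sort/group step;
        // merge() folds the per-key summation (the reduce step) in as we go
        Map<String, Integer> grouped = new TreeMap<>();
        for (String line : input) {
            StringTokenizer st = new StringTokenizer(line);
            while (st.hasMoreTokens()) {
                grouped.merge(st.nextToken(), 1, Integer::sum);
            }
        }

        // output: one (word, count) line per key, like a part-r-00000 file
        for (Map.Entry<String, Integer> e : grouped.entrySet()) {
            System.out.println(e.getKey() + "\t" + e.getValue());
        }
    }
}
```

In a real job the map, shuffle, and reduce steps run on different machines; this collapses them into one process purely to make the data flow visible.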


2.4. Running the job

// step 4: run the program
public static void main(String[] args) throws Exception {
    // 1: get a Configuration
    Configuration configuration = new Configuration();

    // int status = new WordCountMapReduce().run(args);

    // ToolRunner injects the configuration and calls run(args);
    // this requires: extends Configured implements Tool
    int status = ToolRunner.run(configuration,
            new WordCountMapReduce(),
            args);

    System.exit(status);
}

Note that because ToolRunner is used here, the WordCountMapReduce class must be declared like this:

public class WordCountMapReduce extends Configured implements Tool{


2.5. Packaging the program



2.6. Running the jar

Input path: /chenzhengyou/mapreduce/wordcount/input/idea.input

Output path: /chenzhengyou/mapreduce/wordcount/output/test01


Run the MapReduce job (since no main-class argument is passed after the jar path, the jar's manifest must name the main class):

[root@czy-1 hadoop-2.5.0]# bin/hadoop jar /usr/local/chenzhengyou/hadoop/standalone/hadoop-2.5.0/jars/hadoop-mapreduce.jar /chenzhengyou/mapreduce/wordcount/input/idea.input /chenzhengyou/mapreduce/wordcount/output/test01



View the result: bin/hdfs dfs -text /chenzhengyou/mapreduce/wordcount/output/test01/par*





Author: chenzhengyou
Copyright notice: this is an original post by the author; reposting is permitted with attribution. https://blog.csdn.net/JavaWebRookie/article/details/73655753
Column: Hadoop Journey (Hadoop之旅)