MapReduce Application Development

1. Environment Setup

Create a new Maven project and add the required dependencies:

<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-common</artifactId>
    <version>2.6.0</version>
</dependency>
<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-hdfs</artifactId>
    <version>2.6.0</version>
</dependency>
<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-mapreduce-client-common</artifactId>
    <version>2.6.0</version>
</dependency>
<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-mapreduce-client-core</artifactId>
    <version>2.6.0</version>
</dependency>
<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-mapreduce-client-jobclient</artifactId>
    <version>2.6.0</version>
</dependency>
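
All five artifacts share the same version, so it can be worth centralizing it in a Maven property. A minimal sketch, assuming a standard `pom.xml` (the `hadoop.version` property name is my own choice, not from the original):

```xml
<properties>
    <hadoop.version>2.6.0</hadoop.version>
</properties>

<!-- then each dependency references the property instead of a literal -->
<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-common</artifactId>
    <version>${hadoop.version}</version>
</dependency>
```

This way, upgrading Hadoop later means changing a single line.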

2. Developing a MapReduce Application

1. Word Count Program

A MapReduce application has two phases:
1. Mapper: splits the large task into smaller tasks, mapping unstructured data into key-value (KV) pairs
2. Reducer: aggregates the KV pairs and computes the final statistics
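
The two phases can be sketched in plain Java to show the semantics before looking at the Hadoop API. This is an illustration only, not Hadoop code; the class and method names below are my own:

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Illustration only: simulates locally what the framework does between the phases.
public class WordCountSketch {

    // "map" phase: turn one line into (word, 1) pairs
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String word : line.toLowerCase().split(" ")) {
            pairs.add(new AbstractMap.SimpleEntry<>(word, 1));
        }
        return pairs;
    }

    // shuffle + "reduce" phase: group the pairs by key and sum their values
    static Map<String, Integer> reduce(List<Map.Entry<String, Integer>> pairs) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, Integer> p : pairs) {
            counts.merge(p.getKey(), p.getValue(), Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>(map("I have a dream"));
        pairs.addAll(map("a dream deeply rooted"));
        System.out.println(reduce(pairs).get("dream")); // prints 2
    }
}
```

In the real framework the grouping step (the shuffle) happens across machines, but the logic per key is exactly this sum.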

1. Prepare a sample file

I say to you today, my friend.
And so even though we face the difficulties of today and tomorrow, I still have a dream. It is a dream deeply rooted in the American dream.
I have a dream that one day this nation will rise up and live out the true meaning of its creed: “We hold these truths to be self-evident, that all men are created equal.”
I have a dream that one day on the red hills of Georgia, the sons of former slaves and the sons of former slave owners will be able to sit down together at the table of brotherhood.
I have a dream that one day even the state of Mississippi, a state sweltering with the heat of injustice, sweltering with the heat of oppression, will be transformed into an oasis of freedom and justice.
I have a dream that my four little children will one day live in a nation where they will not be judged by the color of their skin but by the content of their character.
I have a dream today!
I have a dream that one day, down in Alabama, with its vicious racists, with its governor having his lips dripping with the words of “interposition” and “nullification” – one day right there in Alabama little black boys and black girls will be able to join hands with little white boys and white girls as sisters and brothers.
I have a dream today!
I have a dream that one day every valley shall be exalted, and every hill and mountain shall be made low, the rough places will be made plain, and the crooked places will be made straight; “and the glory of the Lord shall be revealed and all flesh shall see it together.”
This is our hope, and this is the faith that I go back to the South with.
With this faith, we will be able to hew out of the mountain of despair a stone of hope. With this faith, we will be able to transform the jarring discords of our nation into a beautiful symphony of brotherhood. With this faith, we will be able to work together, to pray together, to struggle together, to go to jail together, to stand up for freedom together, knowing that we will be free one day.
And this will be the day – this will be the day when all of God’s children will be able to sing with new meaning:
My country 'tis of thee, sweet land of liberty, of thee I sing.
Land where my fathers died, land of the Pilgrim’s pride,
From every mountainside, let freedom ring!
And if America is to be a great nation, this must become true.

2. Upload the sample data to HDFS

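The exact commands depend on your cluster, but assuming the sample text above was saved as `dream.txt` and using the paths the driver class expects later, the upload might look like:

```shell
# create the target directory and upload the sample file (paths assumed)
hdfs dfs -mkdir -p /dream
hdfs dfs -put dream.txt /dream/dream.txt
# verify the file landed
hdfs dfs -ls /dream
```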

3. Define the Mapper task

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

import java.io.IOException;

/**
 * @author fql
 * @date 2019/8/14 15:30
 */
/**
 * The Writable classes are Hadoop's serializable wrapper types.
 * KeyIn:    LongWritable - byte offset of the first character of each line
 * ValueIn:  Text         - one line of the input
 * KeyOut:   Text         - a single word
 * ValueOut: IntWritable  - the initial count, 1
 */
public class MyMaper extends Mapper<LongWritable, Text,Text, IntWritable> {
    /**
     * @param key     KeyIn: the byte offset of the current line
     * @param value   ValueIn: the text of the current line
     * @param context context used to emit the output KV pairs
     * @throws IOException
     * @throws InterruptedException
     */
    @Override
    protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
        String line = value.toString();
        String[] words = line.toLowerCase().split(" "); // split on spaces (punctuation stays attached to words)
        for (String word : words) {
            // emit one (word, 1) KV pair per word
            context.write(new Text(word), new IntWritable(1));
        }
    }
}

4. Define the Reducer task

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.mapreduce.Reducer;

import org.apache.hadoop.io.Text;
import java.io.IOException;
import java.util.Iterator;

/**
 * @author fql
 * @date 2019/8/14 15:30
 */
public class MyReduce extends Reducer<Text, IntWritable,Text,IntWritable> {

    /**
     * @param key     the word
     * @param values  the collection of 1s emitted for this key
     * @param context context object used to emit the result
     * @throws IOException
     * @throws InterruptedException
     */

    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException {
        int count =0;
        Iterator<IntWritable> iterator = values.iterator();
        while (iterator.hasNext()){
            int num =iterator.next().get();
            count+=num;
        }
        // summation done: emit the final count for this word
        context.write(key,new IntWritable(count));
    }
}

5. Define the driver (job initialization) class

package com.al.mapreduces;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
import org.apache.hadoop.io.Text;
import java.io.IOException;

/**
 * @author fql
 * @date 2019/8/14 15:30
 * Driver (initialization) class for the word count job
 */
public class CountWordApplication {
    public static void main(String[] args) throws IOException, ClassNotFoundException, InterruptedException {
        //1. Create the MapReduce job object
        Configuration conf = new Configuration();
        String jobName = "wordcount";
        Job job = Job.getInstance(conf, jobName);
        job.setJarByClass(CountWordApplication.class);

        //2. Set the input format and output format (plain text)
        job.setInputFormatClass(TextInputFormat.class);
        job.setOutputFormatClass(TextOutputFormat.class);
        //3. Specify where the input data lives and where the result is written
        TextInputFormat.addInputPath(job, new Path("/dream/dream.txt"));
        // the output directory must not already exist
        TextOutputFormat.setOutputPath(job, new Path("/dream/result"));
        //4. Register the Mapper and Reducer implementations
        job.setMapperClass(MyMaper.class);
        job.setReducerClass(MyReduce.class);
        //5. Declare the KeyOut/ValueOut types of the Mapper and Reducer
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(IntWritable.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        //6. Submit the job and wait; true prints progress logs
        job.waitForCompletion(true);
    }
}

6. Package the application as a jar

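With a standard Maven project the packaging step is typically just the following; the jar name matches the one used in the test-run step below, which assumes the project's `artifactId` is `hadoop-mapreduces`:

```shell
mvn clean package
# the jar appears under target/, e.g. target/hadoop-mapreduces-1.0-SNAPSHOT.jar
```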

7. Test run

1. Upload the application jar to the Linux server
2. Submit the MapReduce application from the command line

Syntax: hadoop jar xxx.jar <fully qualified name of the entry class>
[root@hadoop ~]# hadoop jar hadoop-mapreduces-1.0-SNAPSHOT.jar com.al.mapreduces.CountWordApplication

3. Check the result

The output is written to the location specified in the driver class, /dream/result. Each line of the result file holds one word and its count, separated by a tab.
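The result can also be inspected directly from the command line; `part-r-00000` is the default name of the first reducer's output file:

```shell
hdfs dfs -ls /dream/result
hdfs dfs -cat /dream/result/part-r-00000
```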
