java mapreduce编程_MapReduce编程实战 PDF 下载

爱吃饭的小曾

于 2021-02-24 05:18:23 发布

阅读量154

点赞数

文章标签： java mapreduce编程

本文链接：https://blog.csdn.net/weixin_29174141/article/details/114541305

版权

本文档详细介绍了使用Java MapReduce计算学生平均成绩的编程实践。Map阶段处理输入文本文件，将每行学生的姓名和成绩拆分为键值对；Reduce阶段接收Map的输出，计算每个学生所有成绩的平均值。程序代码展示了如何实现Mapper和Reducer类，以及相关配置和输入输出格式的设置。

摘要由CSDN通过智能技术生成

主要内容：

3.2 设计思路

计算学生平均成绩是一个仿"WordCount"例子，用来重温一下开发MapReduce程序的流程。程序包括两部分的内容：Map部分和Reduce部分，分别实现了map和reduce的功能。

Map处理的是一个纯文本文件，文件中存放的数据时每一行表示一个学生的姓名和他相应一科成绩。Mapper处理的数据是由InputFormat分解过的数据集，其中InputFormat的作用是将数据集切割成小数据集InputSplit，每一个InputSlit将由一个Mapper负责处理。此外，InputFormat中还提供了一个RecordReader的实现，并将一个InputSplit解析成对提供给了map函数。InputFormat的默认值是TextInputFormat，它针对文本文件，按行将文本切割成InputSlit，并用LineRecordReader将InputSplit解析成对，key是行在文本中的位置，value是文件中的一行。

Map的结果会通过partion分发到Reducer，Reducer做完Reduce操作后，将通过以格式OutputFormat输出。

Mapper最终处理的结果对，会送到Reducer中进行合并，合并的时候，有相同key的键/值对则送到同一个Reducer上。Reducer是所有用户定制Reducer类地基础，它的输入是key和这个key对应的所有value的一个迭代器，同时还有Reducer的上下文。Reduce的结果由Reducer.Context的write方法输出到文件中。

3.3 程序代码

程序代码如下所示：

package com.hebut.mr;

import java.io.IOException;

import java.util.Iterator;

import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;

import org.apache.hadoop.fs.Path;

import org.apache.hadoop.io.IntWritable;

import org.apache.hadoop.io.LongWritable;

import org.apache.hadoop.io.Text;

import org.apache.hadoop.mapreduce.Job;

import org.apache.hadoop.mapreduce.Mapper;

import org.apache.hadoop.mapreduce.Reducer;

import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;

import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;

import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

import org.apache.hadoop.util.GenericOptionsParser;

public class Score {

public static class Map extends Mapper {

// 实现map函数

public void map(LongWritable key, Text value, Context context)

throws IOException, InterruptedException {

// 将输入的纯文本文件的数据转化成String

String line = value.toString();

// 将输入的数据首先按行进行分割

StringTokenizer tokenizerArticle = new StringTokenizer(line, "\n");

// 分别对每一行进行处理

while (tokenizerArticle.hasMoreElements()) {

// 每行按空格划分

StringTokenizer tokenizerLine = new StringTokenizer(tokenizerArticle.nextToken());

String strName = tokenizerLine.nextToken();// 学生姓名部分

String strScore = tokenizerLine.nextToken();// 成绩部分

Text name = new Text(strName);

int scoreInt = Integer.parseInt(strScore);

// 输出姓名和成绩

context.write(name, new IntWritable(scoreInt));

}

public static class Reduce extends

Reducer {

// 实现reduce函数

public void reduce(Text key, Iterable values,

Context context) throws IOException, InterruptedException {

int sum = 0;

int count = 0;

Iterator iterator = values.iterator();

while (iterator.hasNext()) {

sum += iterator.next().get();// 计算总分

count++;// 统计总的科目数

}

int average = (int) sum / count;// 计算平均成绩

context.write(key, new IntWritable(average));

}

public static void main(String[] args) throws Exception {

Configuration conf = new Configuration();

// 这句话很关键

conf.set("mapred.job.tracker", "192.168.1.2:9001");

String[] ioArgs = new String[] { "score_in", "score_out" };

String[] otherArgs = new GenericOptionsParser(conf, ioArgs).getRemainingArgs();

if (otherArgs.length != 2) {

System.err.println("Usage: Score Average ");

System.exit(2);

}

Job job = new Job(conf, "Score Average");

爱吃饭的小曾

关注

0
点赞
踩
1

收藏

觉得还不错? 一键收藏
0
评论
复制链接

分享到 QQ

分享到新浪微博

扫一扫