My First MapReduce Program: Running It and Walking Through It

不爱吃红萝卜

Original link: https://blog.csdn.net/bingyu0046/article/details/46009683

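This article uses a simple MapReduce program to show how to compute averages in a Hadoop environment. First, create a Java project in Eclipse, write the source code, and export a JAR. Then run the JAR from the Hadoop command line, specifying the input and output paths. After the program runs, the result is stored in the part-r-00000 file under output/out.txt. Take care to avoid common command-line mistakes, such as passing the wrong arguments. This tutorial is intended as a reference for beginners with Hadoop MapReduce.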

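This is the article I used as a reference, mainly for its code. The program computes averages and is easier to understand than the word-count program.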

http://www.linuxidc.com/Linux/2014-03/98262.htm

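In Eclipse, create a new Java project and a new Java class. The packages to import are the ones named in the import statements at the top of the source code.

The source code is: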

package mapreduce;

import java.io.IOException;
import java.util.Iterator;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.FloatWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;

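/*
 * Computes each student's average course score (the student's total score / number of courses).
 * Input format: name course score
 *
 * 小明 语文 92
 * 小明 数学 88
 * 小明 英语 90
 * 小强 语文 76
 * 小强 数学 66
 * 小强 英语 80
 * 小木 语文 60
 * 小木 数学 65
 * 小木 英语 61
 *
 * Output: name average
 *
 * 小明 90.0
 * 小强 74.0
 * 小木 62.0
 */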
public class Average {
	public static class AverMapper extends Mapper<Object, Text, Text, Text> {
		@Override
		protected void map(Object key, Text value, Context context)	throws IOException, InterruptedException {
			// With the default TextInputFormat, each call receives one line: "name course score"
			StringTokenizer st = new StringTokenizer(value.toString());
			if (st.countTokens() < 3) {
				return; // skip blank or malformed lines
			}
			String name = st.nextToken();
			String course = st.nextToken();
			String score = st.nextToken();
			// The default hash partitioner sends all records with the same name to the same reducer
			context.write(new Text(name), new Text(course + "_" + score));
		}

	}

	public static class AverReducer extends	Reducer<Text, Text, Text, FloatWritable> {
		@Override
		protected void reduce(Text key, Iterable<Text> values, Context context)	throws IOException, InterruptedException {
			Iterator<Text> it = values.iterator();
			// Count the records for this key and add up their scores
			int count = 0;
			int sum = 0;
			while (it.hasNext()) {
				String value = it.next().toString();
				String[] strs = value.split("_");
				if (strs.length < 2) {
					continue;
				}
				try {
					sum += Integer.parseInt(strs[1]);
					count++; // count only the scores that parsed successfully
				} catch (NumberFormatException e) {
					System.err.println(e.getMessage());
				}
			}
			if (count == 0) {
				return; // no valid scores for this key, so there is nothing to average
			}
			// Cast before dividing; sum / count on two ints would truncate the result
			FloatWritable average = new FloatWritable((float) sum / count);
			context.write(key, average);
		}
	}

	
	public static void main(String[] args) throws IOException,InterruptedException,ClassNotFoundException {
		// Load the Hadoop configuration (core-default.xml, core-site.xml)
		Configuration conf = new Configuration();
		// Parse generic Hadoop options; the remaining arguments are the input and output paths, e.g.
		// [hdfs://localhost:9000/user/dat/input, hdfs://localhost:9000/user/dat/output]
		String[] arguments = new GenericOptionsParser(conf, args).getRemainingArgs();
		if (arguments.length < 2) {
			System.out.println("Usage:mapreduce.Average in out");
			System.exit(1);
		}
		Job job = Job.getInstance(conf, "Average");
		job.setJarByClass(Average.class);
		job.setMapperClass(AverMapper.class);
		job.setReducerClass(AverReducer.class);
		job.setMapOutputValueClass(Text.class);
		job.setMapOutputKeyClass(Text.class);
		job.setOutputKeyClass(Text.class);
		job.setOutputValueClass(FloatWritable.class);
		// Input path
		FileInputFormat.addInputPath(job, new Path(arguments[0]));
		// Output path; it must not already exist, otherwise the job fails with
		// org.apache.hadoop.mapred.FileAlreadyExistsException
		FileOutputFormat.setOutputPath(job, new Path(arguments[1]));
		// Exit with 0 if the job succeeded, 1 otherwise
		System.exit(job.waitForCompletion(true) ? 0 : 1);
	}

}
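To make the data flow easier to follow without a cluster, here is a minimal local sketch of the same three stages (map, group by key, reduce) in plain Java. It is not how Hadoop executes the job, only an illustration: the class and method names (`AverageLocalSketch`, `mapLine`, `shuffle`, `reduce`) are hypothetical, the sample names are romanized stand-ins for the input data, and the grouping keeps insertion order whereas Hadoop sorts keys before reducing.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Local, Hadoop-free sketch of the data flow in Average; all names are hypothetical.
class AverageLocalSketch {

    // "Map" step: turn one line into a (name, "course_score") pair, or null if malformed.
    static String[] mapLine(String line) {
        String[] parts = line.trim().split("\\s+");
        if (parts.length < 3) {
            return null;
        }
        return new String[] { parts[0], parts[1] + "_" + parts[2] };
    }

    // "Shuffle" step: group mapped values by key, as the framework does between map and reduce.
    static Map<String, List<String>> shuffle(List<String> lines) {
        Map<String, List<String>> grouped = new LinkedHashMap<>();
        for (String line : lines) {
            String[] pair = mapLine(line);
            if (pair != null) {
                grouped.computeIfAbsent(pair[0], k -> new ArrayList<>()).add(pair[1]);
            }
        }
        return grouped;
    }

    // "Reduce" step: average the scores collected for one name.
    static float reduce(List<String> values) {
        int sum = 0;
        for (String v : values) {
            sum += Integer.parseInt(v.split("_")[1]);
        }
        return (float) sum / values.size();
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "XiaoMing Chinese 92", "XiaoMing Math 88", "XiaoMing English 90",
                "XiaoQiang Chinese 76", "XiaoQiang Math 66", "XiaoQiang English 80");
        // Print each name, the values grouped under it, and the resulting average.
        shuffle(lines).forEach((name, values) ->
                System.out.println(name + "\t" + values + "\t" + reduce(values)));
    }
}
```

Running `main` prints one line per student with the grouped `course_score` values next to the average, which is the same intermediate shape the reducer in `Average` receives for each key.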

Once the program is written, export it as a JAR, then enter the following on the command line of a machine where Hadoop is installed:

hadoop jar average.jar mapreduce.Average input/read.txt output/out.txt

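Here is what each part means:

hadoop jar is the command that runs a JAR

average.jar is the JAR you exported

mapreduce.Average is the class inside the JAR, given with its package name

input/read.txt is the input file. You choose this path yourself; as long as the path is correct, nothing else depends on it. The file contains: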

小明 语文 92
小明 数学 88
小明 英语 90
小强 语文 76
小强 数学 66
小强 英语 80
小木 语文 60
小木 数学 65
小木 英语 61

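output/out.txt is where the output is stored. You do not need to create it; just give the path. Despite the .txt name, Hadoop creates it as a directory.

When the command finishes, two files are generated under out.txt (with Hadoop 2 these are normally _SUCCESS and part-r-00000).

The data in part-r-00000 is: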

小明 90.0
小强 74.0
小木 62.0
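The sample scores happen to divide evenly, so every average above is a whole number. For other inputs it matters whether the division is carried out on `int` or on `float` values, because dividing two `int` values in Java discards the fractional part. The sketch below illustrates the difference; the class and method names are hypothetical and are not part of the original program.

```java
// Illustrates int division versus float division when computing an average.
class AverageDivisionSketch {

    // Divides two ints first and only then widens to float: the fraction is already lost.
    static float truncatedAverage(int sum, int count) {
        return sum / count;
    }

    // Casts to float before dividing, so the fractional part is kept.
    static float exactAverage(int sum, int count) {
        return (float) sum / count;
    }

    public static void main(String[] args) {
        // 92 + 88 + 90 = 270 divides evenly by 3, so both methods agree.
        System.out.println(truncatedAverage(270, 3) + " vs " + exactAverage(270, 3));
        // 92 + 88 + 91 = 271 does not divide evenly, so the two results differ.
        System.out.println(truncatedAverage(271, 3) + " vs " + exactAverage(271, 3));
    }
}
```

If fractional averages matter for your data, the cast in `exactAverage` is the form to use.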

The input and output paths here correspond to these lines in the program:

FileInputFormat.addInputPath(job, new Path(arguments[0]));
FileOutputFormat.setOutputPath(job, new Path(arguments[1]));

It took me a lot of trial and error to work out how to write this command line. Before that, the output was always:

Begin.....

Usage:mapreduce.Average in out
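The Usage line comes from the length check at the start of `main`: it is printed whenever fewer than two arguments are left for the program. A minimal sketch of that check in isolation is shown below. It is a simplification that skips `GenericOptionsParser`, and the class and method names (`UsageCheckSketch`, `describe`) are hypothetical.

```java
// Isolated sketch of the argument check in Average.main; does not use Hadoop.
class UsageCheckSketch {

    // Returns the usage message when fewer than two arguments are present,
    // otherwise a description of which argument is treated as input and which as output.
    static String describe(String[] arguments) {
        if (arguments.length < 2) {
            return "Usage:mapreduce.Average in out";
        }
        return "in=" + arguments[0] + " out=" + arguments[1];
    }

    public static void main(String[] args) {
        // Only one path given: the usage message is returned.
        System.out.println(describe(new String[] { "input/read.txt" }));
        // Both paths given: the first is the input, the second the output.
        System.out.println(describe(new String[] { "input/read.txt", "output/out.txt" }));
    }
}
```

So if you see the Usage line, a reasonable first thing to check is that both the input path and the output path actually reach the program as separate arguments.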

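As for the Hadoop plugin for Eclipse that many people online mention, it is only a convenience and you do not have to set it up. Keeping the command-line habit is a good thing.

This article may help in understanding MapReduce: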

http://www.jiacheo.org/blog/233
