My First MapReduce Program

嗯Jeffrey

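Category: hadoop

Copyright notice: This is an original article by the author, licensed under CC 4.0 BY-SA. When reposting, please include a link to the original source and this notice.

Article link: https://blog.csdn.net/u011917643/article/details/10825139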

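With the Hadoop environment set up, it is time to write a first MapReduce program. The example here is the one from Hadoop: The Definitive Guide that finds the maximum temperature for each year. See the book for the exact format of the temperature records.

The book's author provides a sample.txt file containing 5 temperature records for testing; upload it to the server.

The code is essentially the same as in the book. The book's example uses the older JobClient API; here it is replaced with Job.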

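1. Create the project

In Eclipse, create a new Java project and add hadoop-core.jar to its build path.

Create three classes: the mapper MaxTemperatureMapper, the reducer MaxTemperatureReducer, and the main program MaxTemperature. Then package them into a jar and upload it to the Hadoop machine.

Run it on Hadoop: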

hadoop jar ~/jars/maxTemperature.jar MaxTemperature ~/data/test/sample.txt output
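Once the job finishes, the results end up in the output directory given as the second argument. Assuming the job keeps Hadoop's default text output (each line is the key, a tab, then the value), the result lines can be read back with a few lines of plain Java. The following is only a minimal sketch: the class OutputLineSketch and the sample values in main are made up for illustration and are not actual job output.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of the original job.
class OutputLineSketch {

    // Parses "year<TAB>maxTemperature" lines into a map, preserving line order.
    static Map<String, Integer> parse(String contents) {
        Map<String, Integer> result = new LinkedHashMap<>();
        for (String line : contents.split("\n")) {
            if (line.isEmpty()) {
                continue; // skip blank lines
            }
            String[] fields = line.split("\t");
            result.put(fields[0], Integer.parseInt(fields[1]));
        }
        return result;
    }

    public static void main(String[] args) {
        // Made-up sample text standing in for the contents of a result file.
        String contents = "1901\t317\n1902\t244\n";
        for (Map.Entry<String, Integer> e : parse(contents).entrySet()) {
            System.out.println("year=" + e.getKey() + " max=" + e.getValue());
        }
    }
}
```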

Appendix:

MaxTemperatureMapper.java

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class MaxTemperatureMapper extends Mapper<LongWritable, Text, Text, IntWritable> {

    // Value used in the data to mark a missing temperature reading
    private static final int MISSING = 9999;

    @Override
    public void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        String line = value.toString();
        String year = line.substring(15, 19);
        int temperature;
        if (line.charAt(87) == '+') {
            // Integer.parseInt before Java 7 rejects a leading plus sign, so skip it
            temperature = Integer.parseInt(line.substring(88, 92));
        } else {
            temperature = Integer.parseInt(line.substring(87, 92));
        }
        String quality = line.substring(92, 93);
        // Regex match: keep only readings whose quality code is 0, 1, 4, 5 or 9
        if (temperature != MISSING && quality.matches("[01459]")) {
            context.write(new Text(year), new IntWritable(temperature));
        }
    }
}
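The field extraction in the mapper can be tried without a cluster. The sketch below uses the same column offsets as the mapper (year at 15 to 19, signed temperature at 87 to 92, quality code at 92) and skips a leading '+' before parsing, since Integer.parseInt only accepts it from Java 7 on. The class RecordParseSketch and the fakeRecord helper are hypothetical; the synthetic line fills every other column with '0' and is not a real weather record.

```java
import java.util.Arrays;

// Hypothetical helper class, not part of the original post.
class RecordParseSketch {

    static final int MISSING = 9999;

    // Builds a synthetic 93-character line; only the columns the mapper reads are meaningful.
    static String fakeRecord(String year, String signedTemp, String quality) {
        char[] chars = new char[93];
        Arrays.fill(chars, '0');
        year.getChars(0, 4, chars, 15);        // year occupies columns 15..18
        signedTemp.getChars(0, 5, chars, 87);  // sign plus four digits occupy columns 87..91
        chars[92] = quality.charAt(0);         // quality code is column 92
        return new String(chars);
    }

    static String year(String line) {
        return line.substring(15, 19);
    }

    static int temperature(String line) {
        // Skip a leading '+' so the parse also works on Java versions before 7.
        int start = line.charAt(87) == '+' ? 88 : 87;
        return Integer.parseInt(line.substring(start, 92));
    }

    // Same filter as the mapper: not the MISSING marker and an accepted quality code.
    static boolean isValid(String line) {
        return temperature(line) != MISSING && line.substring(92, 93).matches("[01459]");
    }

    public static void main(String[] args) {
        String line = fakeRecord("1950", "+0022", "1");
        System.out.println(year(line) + "\t" + temperature(line) + "\t" + isValid(line));
    }
}
```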

MaxTemperatureReducer.java

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class MaxTemperatureReducer extends Reducer<Text, IntWritable, Text, IntWritable> {

    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        // Scan all temperatures emitted for this year and keep the largest
        int maxTemp = Integer.MIN_VALUE;
        for (IntWritable temp : values) {
            maxTemp = Math.max(maxTemp, temp.get());
        }
        context.write(key, new IntWritable(maxTemp));
    }
}
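To see what the reducer receives, it can help to imitate the grouping step in plain Java. The sketch below is a rough stand-in, not how Hadoop implements it: a TreeMap plays the role of the shuffle (grouping map output values by year, sorted by key), and reduce repeats the reducer's loop on plain ints. The class LocalMaxTemperatureSketch, its helper methods, and the (year, temperature) pairs in main are all made up for illustration.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical local simulation, not part of the original job.
class LocalMaxTemperatureSketch {

    // Convenience factory for one (year, temperature) map output pair.
    static SimpleEntry<String, Integer> pair(String year, int temperature) {
        return new SimpleEntry<String, Integer>(year, temperature);
    }

    // Stand-in for the shuffle: group values by key, with keys in sorted order.
    static Map<String, List<Integer>> shuffle(List<SimpleEntry<String, Integer>> mapOutput) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (SimpleEntry<String, Integer> p : mapOutput) {
            grouped.computeIfAbsent(p.getKey(), k -> new ArrayList<>()).add(p.getValue());
        }
        return grouped;
    }

    // Same loop as the reducer, on plain ints instead of IntWritable.
    static int reduce(Iterable<Integer> values) {
        int maxTemp = Integer.MIN_VALUE;
        for (int temp : values) {
            maxTemp = Math.max(maxTemp, temp);
        }
        return maxTemp;
    }

    // Runs shuffle and reduce together and returns year -> maximum temperature.
    static Map<String, Integer> maxByYear(List<SimpleEntry<String, Integer>> mapOutput) {
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : shuffle(mapOutput).entrySet()) {
            result.put(e.getKey(), reduce(e.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        // Made-up map output pairs, in arbitrary order.
        List<SimpleEntry<String, Integer>> mapOutput = Arrays.asList(
                pair("1950", 0), pair("1950", 22), pair("1949", 111),
                pair("1949", 78), pair("1950", -11));
        for (Map.Entry<String, Integer> e : maxByYear(mapOutput).entrySet()) {
            System.out.println(e.getKey() + "\t" + e.getValue());
        }
    }
}
```

Starting from Integer.MIN_VALUE, as the reducer does, means negative temperatures are handled correctly even when every reading for a year is below zero.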

MaxTemperature.java

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class MaxTemperature {

    public static void main(String[] args) throws Exception {
        if (args.length != 2) {
            System.err.println("usage: MaxTemperature <input path> <output path>");
            System.exit(-1);
        }
        long beginTime = System.currentTimeMillis();

        Job job = new Job();
        // Lets Hadoop locate the jar that contains this class
        job.setJarByClass(MaxTemperature.class);

        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        job.setMapperClass(MaxTemperatureMapper.class);
        job.setReducerClass(MaxTemperatureReducer.class);

        // Output types of the job: (year, max temperature)
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        // Submit the job and block until it finishes, printing progress
        boolean status = job.waitForCompletion(true);
        System.out.printf("running time (ms): %d%n", System.currentTimeMillis() - beginTime);
        System.exit(status ? 0 : 1);
    }
}
