MapReduce Programming Example 4

メイ

Tags: big data, java

Original link: http://www.cnblogs.com/gaochsh/p/7802590.html

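MapReduce programming examples:

MapReduce Programming Example (1): running a first MapReduce program, WordCount, in an integrated development environment, with a detailed code walkthrough

MapReduce Programming Example (2): computing students' average scores

MapReduce Programming Example (3): data deduplication

MapReduce Programming Example (4): sorting

MapReduce Programming Example (5): single-table join with MapReduce

MapReduce Programming Example (6): multi-table join with MapReduce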

Sorting is fairly simple, so here is the code. It is commented, and feedback is welcome.

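The approach relies on two things: MapReduce sorts records by key on its own, and a custom partitioner assigns keys to partitions in key order. By default, MapReduce sorts the keys delivered to each reducer: Text keys in lexicographic order and IntWritable keys in numeric order. Because partition i only receives keys smaller than those of partition i+1, concatenating the reducer outputs in partition order gives a globally sorted result.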
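The difference between the two default orderings is easy to see outside Hadoop. The following is only a minimal sketch: it uses plain Java `String` and `Integer` as stand-ins for `Text` and `IntWritable` keys (an assumption that holds for ASCII digits), and the class and method names `KeyOrderDemo`, `sortAsText`, and `sortAsInt` are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Plain-Java stand-in for the shuffle sort: String plays the role of a Text key,
// Integer the role of an IntWritable key. No Hadoop classes are used.
class KeyOrderDemo {

    // Sorts the values as strings, i.e. character by character.
    static List<String> sortAsText(List<String> raw) {
        List<String> keys = new ArrayList<>(raw);
        Collections.sort(keys);
        return keys;
    }

    // Parses the values and sorts them by numeric value.
    static List<Integer> sortAsInt(List<String> raw) {
        List<Integer> keys = new ArrayList<>();
        for (String s : raw) {
            keys.add(Integer.valueOf(s.trim()));
        }
        Collections.sort(keys);
        return keys;
    }

    public static void main(String[] args) {
        List<String> raw = Arrays.asList("9", "10", "2", "100");
        System.out.println(sortAsText(raw)); // [10, 100, 2, 9]
        System.out.println(sortAsInt(raw));  // [2, 9, 10, 100]
    }
}
```

This is why the mapper in this example parses each line into an IntWritable key rather than emitting the Text value directly: sorting numbers as text would place 10 before 2.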

    package com.t.hadoop;

    import java.io.IOException;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Partitioner;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
    import org.apache.hadoop.util.GenericOptionsParser;

    /**
     * Sorting.
     * Relies on MapReduce's default sorting of keys.
     * Extends Partitioner and overrides getPartition so that the mapper output is
     * distributed to the partitions in overall key order; each reducer then sorts
     * its own partition.
     * A counter field tracks the position of each value.
     * @author daT dev.tao@gmail.com
     */
    public class Sort {

        public static class SortMapper extends Mapper<Object, Text, IntWritable, IntWritable> {

            // Emit (key, value) directly: the key is the number to sort, the value is arbitrary.
            @Override
            protected void map(Object key, Text value, Context context)
                    throws IOException, InterruptedException {
                System.out.println("Key: " + key + " Value: " + value);
                String line = value.toString().trim();
                if (line.isEmpty()) {
                    return; // skip blank lines instead of failing with NumberFormatException
                }
                context.write(new IntWritable(Integer.parseInt(line)), new IntWritable(1));
            }
        }

        public static class SortReducer extends Reducer<IntWritable, IntWritable, IntWritable, IntWritable> {

            // Position of the current key in this reducer's output. Each reduce task
            // has its own counter, so with several reducers the numbering restarts
            // at 1 in every output file.
            private final IntWritable lineNum = new IntWritable(1);

            // Count the values: the key is written once for every value received.
            @Override
            protected void reduce(IntWritable key, Iterable<IntWritable> values, Context context)
                    throws IOException, InterruptedException {
                System.out.println("lineNum: " + lineNum);
                for (IntWritable ignored : values) {
                    context.write(lineNum, key);
                }
                // Advance once per distinct key, so duplicate values share a position.
                lineNum.set(lineNum.get() + 1);
            }
        }

        public static class SortPartitioner extends Partitioner<IntWritable, IntWritable> {

            // Dispatch records to partitions by key range.
            @Override
            public int getPartition(IntWritable key, IntWritable value, int partitionNum) {
                System.out.println("partitionNum: " + partitionNum);
                // Maximum input value, hard-coded for this example. MapReduce ships with
                // a sampling algorithm and a partitioner implementation that could be
                // used instead; this example does not use them.
                int maxnum = 23492;
                int bound = Math.max(1, maxnum / partitionNum); // width of each key range
                int keyNum = key.get();
                for (int i = 0; i < partitionNum; i++) {
                    if (keyNum > bound * i && keyNum <= bound * (i + 1)) {
                        return i;
                    }
                }
                // The original returned -1 here, which is not a valid partition index.
                // Integer division can leave the largest keys (and keys <= 0) outside
                // every range, so clamp them to the first or last partition.
                return keyNum <= 0 ? 0 : partitionNum - 1;
            }
        }

        public static void main(String[] args)
                throws IOException, ClassNotFoundException, InterruptedException {
            Configuration conf = new Configuration();
            String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();

            if (otherArgs.length < 2) {
                System.err.println("Usage: Sort <input path> <output path>");
                System.exit(2);
            }

            Job job = new Job(conf); // deprecated in Hadoop 2.x in favour of Job.getInstance(conf)
            job.setJarByClass(Sort.class);
            job.setMapperClass(SortMapper.class);
            // No combiner is needed for this example, but the partitioner must be set.
            job.setPartitionerClass(SortPartitioner.class);
            job.setReducerClass(SortReducer.class);
            job.setOutputKeyClass(IntWritable.class);
            job.setOutputValueClass(IntWritable.class);

            FileInputFormat.addInputPath(job, new Path(otherArgs[0]));
            FileOutputFormat.setOutputPath(job, new Path(otherArgs[1]));

            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }
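The range logic in getPartition can be checked without a cluster. The sketch below is a plain-Java simulation, not Hadoop code: `RangePartitionDemo`, `partitionOf`, and `totalOrder` are hypothetical names, and the clamping of out-of-range keys is an assumption about how such keys should be handled. It uses the same ranges as the listing, where partition i covers keys in (bound*i, bound*(i+1)], and always returns an index in [0, numPartitions). With maxnum = 23492 and 3 partitions, bound is 7830, so the keys 23491 and 23492 lie above the last boundary (23490) and need the clamp.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Plain-Java sketch of range partitioning followed by a per-partition sort.
// All names here are hypothetical; no Hadoop classes are used.
class RangePartitionDemo {

    // Maps a key to a partition index in [0, numPartitions).
    // maxKey is the assumed upper bound of the input, as in the listing.
    static int partitionOf(int key, int maxKey, int numPartitions) {
        int bound = Math.max(1, maxKey / numPartitions); // width of each key range
        if (key <= 0) {
            return 0; // keys at or below zero go to the first partition
        }
        // Partition i covers (bound*i, bound*(i+1)]; keys beyond the last
        // boundary are clamped into the last partition.
        return Math.min((key - 1) / bound, numPartitions - 1);
    }

    // Simulates the job: distribute the keys, sort each partition, concatenate.
    static List<Integer> totalOrder(List<Integer> keys, int maxKey, int numPartitions) {
        List<List<Integer>> partitions = new ArrayList<>();
        for (int i = 0; i < numPartitions; i++) {
            partitions.add(new ArrayList<>());
        }
        for (int k : keys) {
            partitions.get(partitionOf(k, maxKey, numPartitions)).add(k);
        }
        List<Integer> result = new ArrayList<>();
        for (List<Integer> p : partitions) {
            Collections.sort(p); // what the framework does for each reducer
            result.addAll(p);
        }
        return result;
    }

    public static void main(String[] args) {
        List<Integer> keys = Arrays.asList(23492, 1, 7831, 7830, 0, 15660, 15661, 42);
        System.out.println(totalOrder(keys, 23492, 3));
        // [0, 1, 42, 7830, 7831, 15660, 15661, 23492]
    }
}
```

Since partitionOf never decreases as the key grows, every key in partition i is no larger than any key in partition i+1, which is what makes the concatenated reducer outputs globally sorted.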

Reposted from: https://www.cnblogs.com/gaochsh/p/7802590.html
