In Hadoop's MapReduce flow, every (key, value) pair a map task emits is handed to the Partitioner, which decides which reducer node the record should go to; if a custom Combiner class is configured, a local reduce pass also runs on each partition's data before it is shipped across the network. The default is HashPartitioner, whose core code is:
public class HashPartitioner<K, V> extends Partitioner<K, V> {

  /** Use {@link Object#hashCode()} to partition. */
  public int getPartition(K key, V value,
                          int numReduceTasks) {
    return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
  }
}
What getPartition does:
1. Compute the key's hash code.
2. Mask it with Integer.MAX_VALUE (so a negative hashCode still yields a non-negative result), then take it modulo the number of reduce tasks.
3. This spreads the (key, value) pairs roughly evenly across the numbered reduce tasks, balancing their load.
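The mask in step 2 matters: String.hashCode() can be negative, and a negative modulo result would be an invalid partition number. A standalone sketch of the same formula (the demo class and sample keys are mine, not from Hadoop):

```java
public class HashPartitionDemo {
    // Same formula as HashPartitioner.getPartition
    static int getPartition(String key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    public static void main(String[] args) {
        // "polygenelubricants".hashCode() is Integer.MIN_VALUE; the mask
        // clears the sign bit, so the result stays in [0, numReduceTasks)
        System.out.println(getPartition("polygenelubricants", 4)); // 0
        System.out.println(getPartition("销售部门", 4));
    }
}
```

Without the & Integer.MAX_VALUE mask, a negative hash would yield a negative partition index and the job would fail.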
This even distribution alone, however, cannot satisfy a requirement such as:
1. Suppose there are 4 input files, each containing every department's sales figures for one quarter.
2. A MapReduce job must total each department's annual sales, producing one output file per department.
Custom partitioning takes three steps:
1) Extend the abstract class Partitioner and override getPartition.
2) Register the class on the job: job.setPartitionerClass().
3) Keep the number of partitions equal to the number of reduce tasks.
Implementation
1. Prepare the data
[hadoop@hadoop1 ~]$ cat jidu1.txt
研发部门 100
测试部门 90
硬件部门 92
销售部门 200
[hadoop@hadoop1 ~]$ cat jidu2.txt
研发部门 200
测试部门 93
硬件部门 95
销售部门 230
[hadoop@hadoop1 ~]$ cat jidu3.txt
研发部门 202
测试部门 92
硬件部门 94
销售部门 231
[hadoop@hadoop1 ~]$ cat jidu4.txt
研发部门 209
测试部门 98
硬件部门 99
销售部门 251
[hadoop@hadoop1 ~]$
2. Upload the files to HDFS
[hadoop@hadoop1 ~]$ hdfs dfs -put jidu1.txt /jidu/input
18/06/08 19:45:45 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
[hadoop@hadoop1 ~]$ hdfs dfs -put jidu2.txt /jidu/input
[hadoop@hadoop1 ~]$ hdfs dfs -put jidu3.txt /jidu/input
[hadoop@hadoop1 ~]$ hdfs dfs -put jidu4.txt /jidu/input
[hadoop@hadoop1 ~]$ hdfs dfs -ls /jidu/input
Found 4 items
-rw-r--r-- 1 hadoop supergroup 66 2018-06-08 19:45 /jidu/input/jidu1.txt
-rw-r--r-- 1 hadoop supergroup 66 2018-06-08 19:45 /jidu/input/jidu2.txt
-rw-r--r-- 1 hadoop supergroup 66 2018-06-08 19:45 /jidu/input/jidu3.txt
-rw-r--r-- 1 hadoop supergroup 66 2018-06-08 19:46 /jidu/input/jidu4.txt
3. Write the MapReduce program
JiduMapper.java:
package com.demo;
import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
public class JiduMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // Each input line is "<department>\t<amount>"
        String line = value.toString();
        String[] ss = line.split("\t");
        context.write(new Text(ss[0]), new IntWritable(Integer.parseInt(ss[1])));
    }
}
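The mapper boils down to splitting each tab-separated line; that parsing step can be checked outside Hadoop (the sample line below is invented to match the data format):

```java
public class SplitCheck {
    public static void main(String[] args) {
        String line = "研发部门\t100";   // one line in the jidu*.txt format
        String[] ss = line.split("\t"); // [department, amount]
        System.out.println(ss[0]);                   // 研发部门
        System.out.println(Integer.parseInt(ss[1])); // 100
    }
}
```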
JiduReducer.java:
package com.demo;
import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;
public class JiduReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        // Sum all quarterly amounts for this department
        int sum = 0;
        for (IntWritable value : values) {
            sum += value.get();
        }
        context.write(key, new IntWritable(sum));
    }
}
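As a sanity check on the reducer's sum, a department's annual total can be computed directly from the four input files, e.g. for 研发部门 (the helper class is illustrative only):

```java
import java.util.stream.IntStream;

public class SumCheck {
    public static void main(String[] args) {
        // 研发部门's quarterly figures from jidu1.txt through jidu4.txt
        System.out.println(IntStream.of(100, 200, 202, 209).sum()); // 711
    }
}
```

This matches the part-r-00000 output shown later.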
JiduPartitioner.java:
package com.demo;
import org.apache.hadoop.mapreduce.Partitioner;
public class JiduPartitioner<K, V> extends Partitioner<K, V> {
    // The number of partitions returned here must match the number of reduce tasks
    @Override
    public int getPartition(K key, V value, int numPartitions) {
        String dname = key.toString();
        switch (dname) {
            case "研发部门": return 0;
            case "测试部门": return 1;
            case "硬件部门": return 2;
            case "销售部门": return 3;
        }
        // A partition number must lie in [0, numPartitions); returning 4 here
        // with only 4 reduce tasks would raise an "Illegal partition" error
        // at runtime for any unexpected key
        return 0;
    }
}
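The routing can be exercised without a cluster; the sketch below duplicates the switch (class and method names are mine, and unknown names are sent to partition 0 here so the result is always a valid index):

```java
public class PartitionCheck {
    // Mirrors JiduPartitioner's switch on the department name
    static int partitionFor(String dname) {
        switch (dname) {
            case "研发部门": return 0;
            case "测试部门": return 1;
            case "硬件部门": return 2;
            case "销售部门": return 3;
            default:        return 0; // keep unknown keys inside [0, numPartitions)
        }
    }

    public static void main(String[] args) {
        System.out.println(partitionFor("销售部门")); // 3
        System.out.println(partitionFor("未知部门")); // 0
    }
}
```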
JiduRunner.java:
package com.demo;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
public class JiduRunner {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf);
        job.setJarByClass(JiduRunner.class);
        job.setMapperClass(JiduMapper.class);
        job.setReducerClass(JiduReducer.class);
        job.setCombinerClass(JiduReducer.class); // summing is associative, so the reducer doubles as combiner
        job.setPartitionerClass(JiduPartitioner.class);
        job.setNumReduceTasks(4); // must match the number of partitions
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(IntWritable.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.setInputPaths(job, new Path("hdfs://192.168.16.2:9000/jidu/input"));
        FileOutputFormat.setOutputPath(job, new Path("hdfs://192.168.16.2:9000/jidu/output"));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
Output:
[hadoop@hadoop1 ~]$ hdfs dfs -ls /jidu/output
Found 5 items
-rw-r--r-- 3 hadoop supergroup 0 2018-06-08 20:56 /jidu/output/_SUCCESS
-rw-r--r-- 3 hadoop supergroup 17 2018-06-08 20:56 /jidu/output/part-r-00000
-rw-r--r-- 3 hadoop supergroup 17 2018-06-08 20:56 /jidu/output/part-r-00001
-rw-r--r-- 3 hadoop supergroup 17 2018-06-08 20:56 /jidu/output/part-r-00002
-rw-r--r-- 3 hadoop supergroup 17 2018-06-08 20:56 /jidu/output/part-r-00003
[hadoop@hadoop1 ~]$ hdfs dfs -cat /jidu/output/part-r-00000
研发部门 711
[hadoop@hadoop1 ~]$ hdfs dfs -cat /jidu/output/part-r-00002
硬件部门 380
[hadoop@hadoop1 ~]$ hdfs dfs -cat /jidu/output/part-r-00001
测试部门 373
[hadoop@hadoop1 ~]$ hdfs dfs -cat /jidu/output/part-r-00003
销售部门 912
[hadoop@hadoop1 ~]$
That completes the custom Partitioner.
Source: https://blog.csdn.net/wo198711203217/article/details/80621738