Learning how to override the Partitioner class through simple examples
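Partitioning happens inside the MapReduce process and belongs to the Shuffle phase. On the map side, every output key/value pair is assigned a partition number, and each partition is later fetched by exactly one reduce task.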
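Don't forget to register the custom class in the Job driver with job.setPartitionerClass(xxx.class). Also set the number of reduce tasks with job.setNumReduceTasks(n): n must be greater than the largest partition number getPartition can return, otherwise the map task fails with an "Illegal partition" error.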
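To make "each record gets a partition number, each partition goes to one reducer" concrete, here is a minimal plain-Java simulation with no Hadoop dependency. The formula in getPartition mirrors Hadoop's default HashPartitioner, which is what a job uses when no custom partitioner is set; the class name HashPartitionDemo and the sample keys are hypothetical and only for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java simulation (no Hadoop dependency) of how map output keys are
// routed to reduce tasks. HashPartitionDemo is a hypothetical name.
class HashPartitionDemo {

    // Same formula as Hadoop's default HashPartitioner: clear the sign bit so
    // negative hash codes still give a result in [0, numReduceTasks).
    static int getPartition(String key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    // One bucket per reduce task; every occurrence of the same key lands in
    // the same bucket, which is what lets one reducer see all its values.
    static List<List<String>> route(List<String> keys, int numReduceTasks) {
        List<List<String>> buckets = new ArrayList<>();
        for (int i = 0; i < numReduceTasks; i++) {
            buckets.add(new ArrayList<>());
        }
        for (String key : keys) {
            buckets.get(getPartition(key, numReduceTasks)).add(key);
        }
        return buckets;
    }

    public static void main(String[] args) {
        List<String> keys = List.of("hadoop", "spark", "hive", "hadoop", "flink");
        List<List<String>> buckets = route(keys, 3);
        for (int i = 0; i < buckets.size(); i++) {
            System.out.println("reducer " + i + " <- " + buckets.get(i));
        }
    }
}
```

A custom Partitioner simply replaces getPartition with your own rule; the routing step around it stays the same.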
Partitioning by age:
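import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.mapreduce.Partitioner;

// MyComparable is this job's custom key type; it is assumed to expose an int field named age.
// Partitions 0 to 3 are used, so the job needs at least 4 reduce tasks.
class AgePartitioner extends Partitioner<MyComparable, NullWritable> {
    @Override
    public int getPartition(MyComparable key, NullWritable value, int numPartitions) {
        int partition = 0; // any age other than 22, 23 or 24 stays in partition 0
        switch (key.age) {
            case 22:
                partition = 1;
                break;
            case 23:
                partition = 2;
                break;
            case 24:
                partition = 3;
                break;
        }
        return partition;
    }
}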
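The age mapping above only works if enough reduce tasks exist. The following plain-Java sketch (no Hadoop dependency) reproduces the same switch and adds a range check modeled on the one Hadoop's map task applies to every partition number. The class name AgeRoutingDemo and the exception type are hypothetical stand-ins; in a real job the failure surfaces as an "Illegal partition" error from the map task.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Plain-Java sketch of AgePartitioner's routing plus the partition range check.
// AgeRoutingDemo is a hypothetical name used only for this illustration.
class AgeRoutingDemo {

    // Same mapping as AgePartitioner: 22 -> 1, 23 -> 2, 24 -> 3, anything else -> 0.
    static int getPartition(int age) {
        switch (age) {
            case 22: return 1;
            case 23: return 2;
            case 24: return 3;
            default: return 0;
        }
    }

    // Models the check Hadoop performs: a partition outside [0, numReduceTasks)
    // makes the job fail instead of silently dropping the record.
    static int checkedPartition(int age, int numReduceTasks) {
        int p = getPartition(age);
        if (p < 0 || p >= numReduceTasks) {
            throw new IllegalArgumentException("Illegal partition for age " + age + " (" + p + ")");
        }
        return p;
    }

    public static void main(String[] args) {
        int[] ages = {21, 22, 23, 24, 30};
        Map<Integer, Integer> routing = new LinkedHashMap<>();
        for (int age : ages) {
            routing.put(age, checkedPartition(age, 4)); // 4 reduce tasks cover partitions 0..3
        }
        System.out.println(routing); // {21=0, 22=1, 23=2, 24=3, 30=0}
        try {
            checkedPartition(24, 3); // only partitions 0..2 exist, so this fails
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Note that partition 0 collects every "other" age, so with skewed age data that reducer may end up doing most of the work.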
Partitioning to mitigate data skew:
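import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Partitioner;

// Custom partitioner: in the Map phase a random suffix (_0, _1, _2) is appended to the skewed key,
// and the partition number returned here is chosen from that suffix.
// Partitions 0 to 2 are used, so the job needs at least 3 reduce tasks.
class SkewPartitioner extends Partitioner<Text, IntWritable> {
    @Override
    public int getPartition(Text text, IntWritable intWritable, int numPartitions) {
        String key = text.toString();
        int partition = 0;
        // Only the skewed key gets special handling
        if ("hadoop".equals(key.split("_")[0])) {
            switch (key) {
                // "hadoop_0" needs no case: it keeps the default partition 0
                case "hadoop_1":
                    partition = 1;
                    break;
                case "hadoop_2":
                    partition = 2;
                    break;
            }
        } else {
            // Normal keys still use the default hash-modulo partitioning
            partition = (key.hashCode() & Integer.MAX_VALUE) % numPartitions;
        }
        return partition;
    }
}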
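The partitioner above only spreads the hot key across reducers; the map side still has to add the random suffix, and the per-suffix results have to be merged back afterwards. One common way to do this is two-stage aggregation ("salting"). The sketch below simulates both stages in plain Java for a word-count workload, which is an assumption, as are the hypothetical names SaltingDemo, HOT_KEY and SALTS. The suffix range 0 to 2 is chosen to match the "hadoop_0/1/2" cases in SkewPartitioner.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

// Plain-Java sketch of salting a skewed key and merging the partial results.
// Word count, SaltingDemo, HOT_KEY and SALTS are hypothetical choices.
class SaltingDemo {

    static final String HOT_KEY = "hadoop"; // the skewed key, as in SkewPartitioner
    static final int SALTS = 3;             // suffixes _0, _1, _2 (assumed to match the partitioner)

    // Map side: append a random suffix to the hot key only; other keys are unchanged.
    static String salt(String word, Random rnd) {
        return HOT_KEY.equals(word) ? word + "_" + rnd.nextInt(SALTS) : word;
    }

    // Stage 1: count per (possibly salted) key. Each salted copy of the hot key
    // can be sent to a different reducer by SkewPartitioner.
    static Map<String, Integer> stageOne(List<String> words, Random rnd) {
        Map<String, Integer> counts = new HashMap<>();
        for (String w : words) {
            counts.merge(salt(w, rnd), 1, Integer::sum);
        }
        return counts;
    }

    // Stage 2: strip the suffix and add up the partial counts.
    // Assumes no ordinary word starts with "hadoop_".
    static Map<String, Integer> stageTwo(Map<String, Integer> partial) {
        Map<String, Integer> totals = new HashMap<>();
        for (Map.Entry<String, Integer> e : partial.entrySet()) {
            String key = e.getKey();
            String original = key.startsWith(HOT_KEY + "_") ? HOT_KEY : key;
            totals.merge(original, e.getValue(), Integer::sum);
        }
        return totals;
    }

    public static void main(String[] args) {
        List<String> words = List.of("hadoop", "hadoop", "hadoop", "hadoop", "spark", "hadoop", "hive");
        Map<String, Integer> partial = stageOne(words, new Random(42));
        System.out.println("stage 1: " + partial);
        System.out.println("stage 2: " + stageTwo(partial));
    }
}
```

In a real Hadoop setup the two stages would typically be two chained jobs: the first uses SkewPartitioner on salted keys, the second reads its output and aggregates by the original key. The price of spreading the hot key is this extra merge pass.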