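Requirement: in the output file for each province (identified by phone-number prefix), records must be sorted by total traffic.
1. Analysis
Building on case 3 of the MapReduce traffic-summary program, the only change needed is a custom partitioner class. FlowBean is already the map output key there, and the framework sorts keys within each partition, so every output file stays ordered by total traffic.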
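The idea of "partition by prefix, then sort inside each partition" can be sketched in plain Java without Hadoop. This is only an in-memory illustration: `PartitionedSortSketch`, `FlowRecord`, and the sample numbers are hypothetical, and descending order is an assumption about how `FlowBean.compareTo` is written in case 3.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java simulation of "partition by prefix, then sort inside each partition".
// Nothing here uses Hadoop; FlowRecord and the sample data are hypothetical.
class PartitionedSortSketch {

    // Hypothetical stand-in for one (FlowBean, phone) pair.
    static final class FlowRecord {
        final String phone;
        final long totalFlow;

        FlowRecord(String phone, long totalFlow) {
            this.phone = phone;
            this.totalFlow = totalFlow;
        }
    }

    // Same prefix rule as the partitioner: 136..139 map to 1..4, the rest to 0.
    static int partitionFor(String phone) {
        if (phone.length() < 3) {
            return 0;
        }
        switch (phone.substring(0, 3)) {
            case "136": return 1;
            case "137": return 2;
            case "138": return 3;
            case "139": return 4;
            default:    return 0;
        }
    }

    // Groups records by partition, then sorts each group by total flow.
    // Descending order is an assumption about FlowBean.compareTo in case 3.
    static Map<Integer, List<FlowRecord>> partitionAndSort(List<FlowRecord> records) {
        Map<Integer, List<FlowRecord>> files = new TreeMap<>();
        for (FlowRecord r : records) {
            files.computeIfAbsent(partitionFor(r.phone), p -> new ArrayList<>()).add(r);
        }
        Comparator<FlowRecord> byTotalFlowDesc =
                Comparator.comparingLong((FlowRecord r) -> r.totalFlow).reversed();
        for (List<FlowRecord> group : files.values()) {
            group.sort(byTotalFlowDesc);
        }
        return files;
    }

    public static void main(String[] args) {
        List<FlowRecord> input = new ArrayList<>();
        input.add(new FlowRecord("13612345678", 300));
        input.add(new FlowRecord("13698765432", 900));
        input.add(new FlowRecord("13700000000", 500));
        input.add(new FlowRecord("15012345678", 100));

        // Each map entry plays the role of one reducer output file.
        partitionAndSort(input).forEach((partition, group) -> {
            for (FlowRecord r : group) {
                System.out.println(partition + "\t" + r.phone + "\t" + r.totalFlow);
            }
        });
    }
}
```

In the real job the grouping is done by the partitioner and the ordering by the shuffle sort on the FlowBean key, so no explicit sort call appears in the MapReduce code.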
2. Hands-on example
(1) Add a custom partitioner class
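import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Partitioner;

// Routes each record to a reducer by the first three digits of the phone number.
// FlowBean (the sort key from case 3) is assumed to be in the same package.
public class FlowSortPartitioner extends Partitioner<FlowBean, Text> {

    @Override
    public int getPartition(FlowBean key, Text value, int numPartitions) {
        // The value carries the phone number; the key is the FlowBean used for sorting.
        String phone = value.toString();

        // Partition 0 collects every number that matches no known prefix,
        // including values too short to have a three-digit prefix.
        if (phone.length() < 3) {
            return 0;
        }

        // Returned indices must stay in 0..4 because the driver runs 5 reduce tasks.
        String preNum = phone.substring(0, 3);
        if ("136".equals(preNum)) {
            return 1;
        } else if ("137".equals(preNum)) {
            return 2;
        } else if ("138".equals(preNum)) {
            return 3;
        } else if ("139".equals(preNum)) {
            return 4;
        }
        return 0;
    }
}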
(2) Register the partitioner in the driver class
job.setPartitionerClass(FlowSortPartitioner.class);
job.setNumReduceTasks(5);
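The reduce-task count has to cover every index the partitioner can return, which here is 0 through 4. The following is a plain-Java sketch of that relationship, not Hadoop's own code: `ReduceTaskCountSketch`, `isLegal`, and `describe` are hypothetical names, and the outcomes are a summary of commonly observed behaviour rather than a specification.

```java
// Plain-Java sketch, not Hadoop code: models how the reduce-task count
// relates to the partition indices 0..4 returned by FlowSortPartitioner.
class ReduceTaskCountSketch {

    // Highest index the partitioner in this section can return.
    static final int MAX_PARTITION = 4;

    // Range rule: a partition index is usable only if it lies in
    // [0, numReduceTasks).
    static boolean isLegal(int partition, int numReduceTasks) {
        return partition >= 0 && partition < numReduceTasks;
    }

    // Hypothetical helper summarising the expected outcome of a job.
    static String describe(int numReduceTasks) {
        if (numReduceTasks == 0) {
            return "map-only job; no partitioning or sorting takes place";
        }
        if (numReduceTasks == 1) {
            return "single output file; the custom partitioner is not consulted";
        }
        if (!isLegal(MAX_PARTITION, numReduceTasks)) {
            return "job fails: some records map to a partition with no reducer";
        }
        int extra = numReduceTasks - (MAX_PARTITION + 1);
        return "one file per partition, plus " + extra + " extra empty file(s)";
    }

    public static void main(String[] args) {
        for (int n : new int[] {1, 3, 5, 6}) {
            System.out.println(n + " -> " + describe(n));
        }
    }
}
```

With five partitions, `job.setNumReduceTasks(5)` is the smallest setting above 1 that keeps every index legal.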
3. Code -> GitHub