Partitioning (grouping) output in Hadoop

Anald

Article link: https://blog.csdn.net/u010503822/article/details/78347901

1 Create the partitioner class

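import java.util.HashMap;
import java.util.Map;

// Assumption: the original does not show which StringUtils it imports; commons-lang provides isNotBlank.
import org.apache.commons.lang.StringUtils;
import org.apache.hadoop.mapreduce.Partitioner;

public class AreaPartitioner<KEY, VALUE> extends Partitioner<KEY, VALUE> {

    // Maps a phone-number prefix to its partition index.
    private static final Map<String, Integer> areaMap = new HashMap<String, Integer>();
    static {
        areaMap.put("136", 0);
        areaMap.put("137", 1);
        areaMap.put("138", 2);
        areaMap.put("139", 3);
    }

    // Called for every (key, value) pair the Mapper emits via context.write(key, value).
    // The returned index selects the reduce task, and therefore the output file, for that pair.
    @Override
    public int getPartition(KEY key, VALUE value, int numPartitions) {
        FlowSortBean bean = (FlowSortBean) key;

        // Prefixes that are not in areaMap fall into partition 4.
        int partitionNum = 4;
        if (bean != null && StringUtils.isNotBlank(bean.getPhoneNB())) {
            String phoneNB = bean.getPhoneNB();
            // Guard against numbers shorter than three characters.
            if (phoneNB.length() >= 3) {
                Integer prefix = areaMap.get(phoneNB.substring(0, 3));
                partitionNum = prefix != null ? prefix : partitionNum;
            }
        }
        return partitionNum;
    }

}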
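
The prefix lookup in AreaPartitioner can be tried without a cluster. The following is a minimal sketch in plain Java that assumes the same prefix table. `PrefixPartitionDemo` and `partitionFor` are hypothetical names used only for illustration; they are not part of the original class or of Hadoop, and the phone number is passed as a plain String instead of a FlowSortBean.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-alone demo that mirrors the prefix lookup in AreaPartitioner.
final class PrefixPartitionDemo {
    // Partition used for any number whose prefix is not in the table.
    private static final int DEFAULT_PARTITION = 4;

    private static final Map<String, Integer> AREA_MAP = new HashMap<>();
    static {
        AREA_MAP.put("136", 0);
        AREA_MAP.put("137", 1);
        AREA_MAP.put("138", 2);
        AREA_MAP.put("139", 3);
    }

    // Returns the partition index for a phone number, or the default
    // when the number is missing, too short, or has an unknown prefix.
    static int partitionFor(String phoneNB) {
        if (phoneNB == null || phoneNB.length() < 3) {
            return DEFAULT_PARTITION;
        }
        Integer index = AREA_MAP.get(phoneNB.substring(0, 3));
        return index != null ? index : DEFAULT_PARTITION;
    }

    public static void main(String[] args) {
        String[] samples = {"13612345678", "13900000000", "15012345678"};
        for (String number : samples) {
            System.out.println(number + " -> partition " + partitionFor(number));
        }
    }
}
```

Five distinct indexes (0 to 4) can come out of this lookup, which is why the driver in the next step asks for five reduce tasks.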

2 Register the partitioner in the driver class

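    // Register the partitioner; it receives each pair the Mapper writes and picks the reduce task for it.
    job.setPartitionerClass(AreaPartitioner.class);
    // Set the number of reduce tasks; with 5, five YarnChild instances (processes) run.
    // The output is split into 5 files, indexed 0-4.
    job.setNumReduceTasks(5);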

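Notes:
1. The number of reduce tasks should match the number of distinct values that AreaPartitioner.getPartition() can return (five here: 0-4).
2. If there are more reduce tasks than partitions, the extra reduce tasks receive no records and produce empty output files.
For example, only the files for partitions 0-4 contain data; all the others are empty.
3. If there are fewer reduce tasks than partitions, the job fails with an IOException,
because some keys map to a partition index that no reduce task handles.
A "reduce task" or "map task" is a running instance of the reducer or mapper on the cluster.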
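
The two mismatch cases in the notes above can be illustrated with a small plain-Java sketch. It is a simplified stand-in for the range check applied to each partition index, not Hadoop's actual code; `PartitionRangeDemo`, `checkPartition`, and `emptyOutputs` are hypothetical names, and the exception message is invented for the example.

```java
import java.io.IOException;

// Hypothetical demo of what happens when the reduce task count and the
// number of partition indexes disagree. Not Hadoop's real implementation.
final class PartitionRangeDemo {

    // A partition index is only usable if a reduce task with that index exists.
    static int checkPartition(int partition, int numReduceTasks) throws IOException {
        if (partition < 0 || partition >= numReduceTasks) {
            throw new IOException("Illegal partition " + partition
                    + " for " + numReduceTasks + " reduce tasks");
        }
        return partition;
    }

    // Reduce tasks beyond the distinct partition indexes receive no records,
    // so each of them writes an empty output file.
    static int emptyOutputs(int distinctPartitions, int numReduceTasks) {
        return Math.max(0, numReduceTasks - distinctPartitions);
    }

    public static void main(String[] args) {
        // Too many reduce tasks: 5 partitions, 8 reduce tasks.
        System.out.println("Empty output files: " + emptyOutputs(5, 8));

        // Too few reduce tasks: index 4 has no reduce task when only 3 exist.
        try {
            checkPartition(4, 3);
        } catch (IOException e) {
            System.out.println("Job would fail: " + e.getMessage());
        }
    }
}
```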
