How Hive determines the number of reducers for a MapReduce job

       Version: Hive 1.2.1

       See the source: the estimateReducers method in the org.apache.hadoop.hive.ql.exec.Utilities class

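       Parameter 1: totalInputFileSize     total number of bytes across all inputs of the job
       Parameter 2: bytesPerReducer        amount of data per reducer, set by hive.exec.reducers.bytes.per.reducer; the default in this version is 256 MB
       Parameter 3: maxReducers            maximum number of reducers allowed for one MapReduce job, set by hive.exec.reducers.max; the default is 1009
       Parameter 4: powersOfTwo            bucket-related flag that rounds the result to a power of two; false by default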

       


  public static int estimateReducers(long totalInputFileSize, long bytesPerReducer,
      int maxReducers, boolean powersOfTwo) {
    double bytes = Math.max(totalInputFileSize, bytesPerReducer);
    int reducers = (int) Math.ceil(bytes / bytesPerReducer);
    reducers = Math.max(1, reducers);
    reducers = Math.min(maxReducers, reducers);

    int reducersLog = (int)(Math.log(reducers) / Math.log(2)) + 1;
    int reducersPowerTwo = (int)Math.pow(2, reducersLog);

    if (powersOfTwo) {
      // If the original number of reducers was a power of two, use that
      if (reducersPowerTwo / 2 == reducers) {
        // nothing to do
      } else if (reducersPowerTwo > maxReducers) {
        // If the next power of two greater than the original number of reducers is greater
        // than the max number of reducers, use the preceding power of two, which is strictly
        // less than the original number of reducers and hence the max
        reducers = reducersPowerTwo / 2;
      } else {
        // Otherwise use the smallest power of two greater than the original number of reducers
        reducers = reducersPowerTwo;
      }
    }
    return reducers;
  }
       From this code, when powersOfTwo is false, the number of reducers is min(max(ceil(totalInputFileSize / bytesPerReducer), 1), maxReducers).
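
       To make the formula concrete, here is a minimal sketch that repeats only the arithmetic of the powersOfTwo == false path in a standalone class. The class name ReducerEstimateDemo and the input sizes are hypothetical, and 256,000,000 bytes and 1009 are sample values passed in by hand, not read from a Hive configuration.

```java
public class ReducerEstimateDemo {

    // Same arithmetic as the powersOfTwo == false path of estimateReducers.
    // This is an illustrative copy, not a call into Hive itself.
    static int estimate(long totalInputFileSize, long bytesPerReducer, int maxReducers) {
        // Never divide less than one reducer's worth of data.
        double bytes = Math.max(totalInputFileSize, bytesPerReducer);
        // Round up so that a partial chunk still gets its own reducer.
        int reducers = (int) Math.ceil(bytes / bytesPerReducer);
        reducers = Math.max(1, reducers);
        // Cap at the configured maximum.
        return Math.min(maxReducers, reducers);
    }

    public static void main(String[] args) {
        long bytesPerReducer = 256_000_000L; // sample value (assumption)
        int maxReducers = 1009;              // sample value (assumption)

        // Hypothetical input sizes: smaller than one chunk, a few chunks, far above the cap.
        long[] inputs = {100_000_000L, 1_000_000_000L, 1_000_000_000_000L};
        for (long input : inputs) {
            System.out.println(input + " bytes -> "
                    + estimate(input, bytesPerReducer, maxReducers) + " reducer(s)");
        }
    }
}
```

       With these sample values, the 100,000,000-byte input gets 1 reducer, the 1,000,000,000-byte input gets 4 (3.9 rounded up), and the 1,000,000,000,000-byte input is capped at 1009.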

     Not every MapReduce job goes through this estimate, though. Some SQL, such as a query with ORDER BY, forces the number of reducers to 1.
