How empty partitions affect Hive query execution

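If a partitioned table contains many empty partitions (each one appears on HDFS as nothing more than an empty directory), those empty partitions hurt performance whenever the table is queried.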

I created a table and ran the same query twice, once without empty partitions and once with them, to see what the difference is.

1. Without empty partitions

hive> SELECT count(*)
> FROM sunwg_02
> WHERE status='enabled'
> and hp_dw_end_date > '2012-09-03'
> and dw_begin_date <= '2012-09-03'
> and dw_end_date > '2012-09-03';
Total MapReduce jobs = 1
Launching Job 1 out of 1
Cannot run job locally: Input Size (= 3213698494) is larger than hive.exec.mode.local.auto.inputbytes.max (= -1)
Hadoop job information for Stage-1: number of mappers: 56; number of reducers: 1
2012-09-10 13:27:05,945 Stage-1 map = 26%, reduce = 0%
2012-09-10 13:27:21,142 Stage-1 map = 87%, reduce = 0%
2012-09-10 13:27:39,132 Stage-1 map = 100%, reduce = 32%
Ended Job = job_201208241319_2390879
OK
14738812

2. With empty partitions (55 empty partitions)

hive> SELECT count(*)
> FROM sunwg_02
> WHERE status='enabled'
> and hp_dw_end_date > '2012-09-03'
> and dw_begin_date <= '2012-09-03'
> and dw_end_date > '2012-09-03'
> ;
Total MapReduce jobs = 1
Launching Job 1 out of 1
Cannot run job locally: Input Size (= 3213698494) is larger than hive.exec.mode.local.auto.inputbytes.max (= -1)
Hadoop job information for Stage-1: number of mappers: 111; number of reducers: 1
2012-09-10 13:31:44,510 Stage-1 map = 0%, reduce = 0%
2012-09-10 13:32:01,597 Stage-1 map = 81%, reduce = 0%
2012-09-10 13:32:19,292 Stage-1 map = 100%, reduce = 0%
Ended Job = job_201208241319_2391240
OK
14738812

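Notes:
1. With empty partitions present, the job launches more map tasks, and the number of extra map tasks (111 - 56 = 55) is exactly the number of empty partitions. The input size and the query result are the same in both runs.
2. A map task that runs on an empty partition looks like this:
-mr-10002/49/emptyFile:0+87 > sort 10-Sep-2012 13:31:34 10-Sep-2012 13:31:43 (9sec)
Even though the partition is empty, its map task still occupies cluster resources while it runs (9 seconds in this case).
3. Requesting the extra map tasks takes time, and running them takes resources.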
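
Since each empty partition costs one extra map task, it can help to know which partitions are empty before querying. The following is only a minimal sketch, not part of the original test. It assumes you have captured the output of `hdfs dfs -count` on the partition directories, whose columns are DIR_COUNT, FILE_COUNT, CONTENT_SIZE and PATHNAME. The function name `find_empty_partitions`, the warehouse path, and the `dt=` partition names in the sample are hypothetical and chosen for illustration.

```python
from typing import List


def find_empty_partitions(count_output: str) -> List[str]:
    """Return the paths that contain no files, given `hdfs dfs -count` output.

    Each line is expected to look like:
        DIR_COUNT FILE_COUNT CONTENT_SIZE PATHNAME
    Lines that do not match this shape are skipped.
    """
    empty = []
    for line in count_output.splitlines():
        fields = line.split(None, 3)  # at most 4 fields; path keeps any spaces
        if len(fields) != 4:
            continue
        _dir_count, file_count, content_size, path = fields
        if not (file_count.isdigit() and content_size.isdigit()):
            continue
        # An empty partition is a directory holding zero files.
        if int(file_count) == 0:
            empty.append(path.strip())
    return empty


# Hypothetical sample output (paths and sizes are made up for illustration).
SAMPLE_COUNT_OUTPUT = """\
           1            3          104857600 /user/hive/warehouse/sunwg_02/dt=2012-09-01
           1            0                  0 /user/hive/warehouse/sunwg_02/dt=2012-09-02
           1            0                  0 /user/hive/warehouse/sunwg_02/dt=2012-09-03
"""

if __name__ == "__main__":
    for partition_path in find_empty_partitions(SAMPLE_COUNT_OUTPUT):
        print(partition_path)
```

The paths it reports are candidates for cleanup; whether to drop such partitions from the table depends on how the table is loaded and is left to the reader.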
