hive中count(*)、count(1)、count(某字段)的区别

最近看到某公司面试题有这个,顺便查了一下。

count(*):所有行进行统计,包括NULL行
count(1):所有行进行统计,包括NULL行
count(column):对column中非Null进行统计

我在集群找了一个表试了一下,结果差距不是很大,因为执行时间会受集群资源的影响,所以看下具体的执行步骤

先看结果,count(*)执行时间26s

select count(*)
from ods.baseorganize
MR程序:

STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-0 depends on stages: Stage-1

STAGE PLANS:
  Stage: Stage-1
    Map Reduce
      Map Operator Tree:
          TableScan
            alias: baseorganize
            Statistics: Num rows: 1 Data size: 18670704 Basic stats: COMPLETE Column stats: COMPLETE
            Select Operator
              Statistics: Num rows: 1 Data size: 18670704 Basic stats: COMPLETE Column stats: COMPLETE
              Group By Operator
                aggregations: count()  <span style="color:#ff0000;">此处不同</span>
                mode: hash
                outputColumnNames: _col0
                Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: COMPLETE
                Reduce Output Operator
                  sort order: 
                  Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: COMPLETE
                  value expressions: _col0 (type: bigint)
      Reduce Operator Tree:
        Group By Operator
          aggregations: count(VALUE._col0)
          mode: mergepartial
          outputColumnNames: _col0
          Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: COMPLETE
          File Output Operator
            compressed: false
            Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: COMPLETE
            table:
                input format: org.apache.hadoop.mapred.TextInputFormat
                output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
                serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe

  Stage: Stage-0
    Fetch Operator
      limit: -1
      Processor Tree:
        ListSink

count(1)执行时间18.8s

select count(1) from ods.baseorganize 
MR程序

STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-0 depends on stages: Stage-1
 
STAGE PLANS:
  Stage: Stage-1
    Map Reduce
      Map Operator Tree:
          TableScan
            alias: baseorganize
            Statistics: Num rows: 1 Data size: 18670704 Basic stats: COMPLETE Column stats: COMPLETE
            Select Operator
              Statistics: Num rows: 1 Data size: 18670704 Basic stats: COMPLETE Column stats: COMPLETE
              Group By Operator
                aggregations: count(1)  此处不同
                mode: hash
                outputColumnNames: _col0
                Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: COMPLETE
                Reduce Output Operator
                  sort order: 
                  Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: COMPLETE
                  value expressions: _col0 (type: bigint)
      Reduce Operator Tree:
        Group By Operator
          aggregations: count(VALUE._col0)
          mode: mergepartial
          outputColumnNames: _col0
          Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: COMPLETE
          File Output Operator
            compressed: false
            Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: COMPLETE
            table:
                input format: org.apache.hadoop.mapred.TextInputFormat
                output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
                serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
 
  Stage: Stage-0
    Fetch Operator
      limit: -1
      Processor Tree:
        ListSink

count(id)执行时间19.25s

select count(id) from ods.baseorganize
MR程序

STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-0 depends on stages: Stage-1
 
STAGE PLANS:
  Stage: Stage-1
    Map Reduce
      Map Operator Tree:
          TableScan
            alias: baseorganize
            Statistics: Num rows: 186707 Data size: 18670704 Basic stats: COMPLETE Column stats: NONE
            Select Operator
              expressions: id (type: string)
              outputColumnNames: id    这里多一行代码
              Statistics: Num rows: 186707 Data size: 18670704 Basic stats: COMPLETE Column stats: NONE
              Group By Operator
                aggregations: count(id)  这里不同
                mode: hash
                outputColumnNames: _col0
                Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: NONE
                Reduce Output Operator
                  sort order: 
                  Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: NONE
                  value expressions: _col0 (type: bigint)
      Reduce Operator Tree:
        Group By Operator
          aggregations: count(VALUE._col0)
          mode: mergepartial
          outputColumnNames: _col0
          Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: NONE
          File Output Operator
            compressed: false
            Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: NONE
            table:
                input format: org.apache.hadoop.mapred.TextInputFormat
                output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
                serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
 
  Stage: Stage-0
    Fetch Operator
      limit: -1
      Processor Tree:
        ListSink

从上面三个执行计划来看,小量不同:

1、count(id)的时候多了一个Select operator;
2、count(id)的Group By Operator 的 aggregations过程略有不同
--------------------- 
作者:huobumingbai1234 
来源:CSDN 
原文:https://blog.csdn.net/huobumingbai1234/article/details/80706011 
版权声明:本文为博主原创文章,转载请附上博文链接!

  • 0
    点赞
  • 3
    收藏
    觉得还不错? 一键收藏
  • 0
    评论

“相关推荐”对你有帮助么?

  • 非常没帮助
  • 没帮助
  • 一般
  • 有帮助
  • 非常有帮助
提交
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值