Hive: the difference between count(distinct) and group by when counting

Let's go straight to explain and compare the execution plans.
explain select count(distinct remote_addr) uv from ods_weblog_visit where datastr = '20181101';
+-----------------------------------------------------+--+
| Explain |
+-----------------------------------------------------+--+
| STAGE DEPENDENCIES: |
| Stage-1 is a root stage |
| Stage-0 depends on stages: Stage-1 |
| |
| STAGE PLANS: |
| Stage: Stage-1 |
| Map Reduce |
| Map Operator Tree: |
| TableScan |
| alias: ods_weblog_visit // table |
| Statistics: Num rows: 1727 Data size: 172771 Basic stats: COMPLETE Column stats: NONE |
| Select Operator // select operation |
| expressions: remote_addr (type: string) // selected column |
| outputColumnNames: remote_addr // output column |
| Statistics: Num rows: 1727 Data size: 172771 Basic stats: COMPLETE Column stats: NONE |
| Group By Operator // group by operation |
| aggregations: count(DISTINCT remote_addr) // aggregate with count(DISTINCT) |
| keys: remote_addr (type: string) // grouping key |
| mode: hash |
| outputColumnNames: _col0, _col1 // output columns |
| Statistics: Num rows: 1727 Data size: 172771 Basic stats: COMPLETE Column stats: NONE |
| Reduce Output Operator // emits rows to the reducer |
| key expressions: _col0 (type: string) |
| sort order: + |
| Statistics: Num rows: 1727 Data size: 172771 Basic stats: COMPLETE Column stats: NONE |
| Reduce Operator Tree: |
| Group By Operator // the reducer runs group by again on its input |
| aggregations: count(DISTINCT KEY._col0:0._col0) // apply count(DISTINCT) to the input |
| mode: mergepartial |
| outputColumnNames: _col0 // output |
| Statistics: Num rows: 1 Data size: 16 Basic stats: COMPLETE Column stats: NONE |
| File Output Operator |
| compressed: false |
| Statistics: Num rows: 1 Data size: 16 Basic stats: COMPLETE Column stats: NONE |
| table: |
| input format: org.apache.hadoop.mapred.TextInputFormat |
| output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat |
| serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe |
| |
| Stage: Stage-0 |
| Fetch Operator |
| limit: -1 // no limit |
| Processor Tree: |
| ListSink |
| |
+-----------------------------------------------------+--+
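To make the single-stage plan concrete, here is a toy Python simulation of its data flow. It is not Hive code: the input splits, the IP values and the function names (`map_side_hash_distinct`, `plan_count_distinct`) are hypothetical. Each mapper keeps only its own distinct keys (`mode: hash`), and the reducer deduplicates again across mappers (`mode: mergepartial`) and counts. Because the query has no group-by key, Hive runs this stage with a single reducer (the Reduce Output Operator lists no `Map-reduce partition columns`), so the model sends every key to reducer 0.

```python
def map_side_hash_distinct(split):
    """Mapper with mode=hash: keeps each key once per input split."""
    return set(split)


def plan_count_distinct(splits):
    """Toy model of the single-stage count(DISTINCT) plan.

    Returns (uv, reducer_loads): the distinct count and the number of keys
    each reducer received. This plan has exactly one reducer.
    """
    reducer_input = []
    for split in splits:
        # Reduce Output Operator without partition columns: all keys go to reducer 0
        reducer_input.extend(map_side_hash_distinct(split))
    # Reducer with mode=mergepartial: deduplicate across mappers, then count
    uv = len(set(reducer_input))
    return uv, [len(reducer_input)]


# Hypothetical input: three map splits of remote_addr values
splits = [
    ["1.1.1.1", "2.2.2.2", "1.1.1.1"],
    ["2.2.2.2", "3.3.3.3"],
    ["4.4.4.4", "1.1.1.1"],
]
uv, loads = plan_count_distinct(splits)
print(uv, loads)  # 4 [6]
```

In this model the one reducer has to receive and deduplicate every key the mappers emit, which is the part that does not scale as the number of distinct values grows.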

explain select count(1) from (select remote_addr from ods_weblog_visit group by remote_addr) a;
+-----------------------------------------------------+--+
| Explain |
+-----------------------------------------------------+--+
| STAGE DEPENDENCIES: |
| Stage-1 is a root stage |
| Stage-2 depends on stages: Stage-1 |
| Stage-0 depends on stages: Stage-2 |
| |
| STAGE PLANS: |
| Stage: Stage-1 |
| Map Reduce |
| Map Operator Tree: |
| TableScan |
| alias: ods_weblog_visit // table |
| Statistics: Num rows: 1727 Data size: 172771 Basic stats: COMPLETE Column stats: NONE |
| Select Operator // select operation |
| expressions: remote_addr (type: string) // selected column |
| outputColumnNames: remote_addr // output column |
| Statistics: Num rows: 1727 Data size: 172771 Basic stats: COMPLETE Column stats: NONE |
| Group By Operator // group by operation |
| keys: remote_addr (type: string) // group by remote_addr |
| mode: hash // map-side hash aggregation |
| outputColumnNames: _col0 // output; _col0 is an intermediate column |
| Statistics: Num rows: 1727 Data size: 172771 Basic stats: COMPLETE Column stats: NONE |
| Reduce Output Operator |
| key expressions: _col0 (type: string) |
| sort order: + |
| Map-reduce partition columns: _col0 (type: string) |
| Statistics: Num rows: 1727 Data size: 172771 Basic stats: COMPLETE Column stats: NONE |
| Reduce Operator Tree: // reduce phase |
| Group By Operator // group by again |
| keys: KEY._col0 (type: string) |
| mode: mergepartial // merge the partial results by key |
| outputColumnNames: _col0 |
| Statistics: Num rows: 863 Data size: 86335 Basic stats: COMPLETE Column stats: NONE |
| Select Operator // select operation |
| Statistics: Num rows: 863 Data size: 86335 Basic stats: COMPLETE Column stats: NONE |
| Group By Operator // group by operation |
| aggregations: count(1) // one aggregation pass over the grouped keys |
| mode: hash // hash mode |
| outputColumnNames: _col0 // output |
| Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: NONE |
| File Output Operator |
| compressed: false |
| table: |
| input format: org.apache.hadoop.mapred.SequenceFileInputFormat |
| output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat |
| serde: org.apache.hadoop.hive.serde2.lazybinary.LazyBinarySerDe |
| |
| Stage: Stage-2 |
| Map Reduce |
| Map Operator Tree: |
| TableScan |
| Reduce Output Operator |
| sort order: |
| Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: NONE |
| value expressions: _col0 (type: bigint) |
| Reduce Operator Tree: |
| Group By Operator // group by operation |
| aggregations: count(VALUE._col0) // apply the aggregation function |
| mode: mergepartial // merge the partial counts |
| outputColumnNames: _col0 // output |
| Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: NONE |
| File Output Operator |
| compressed: false |
| Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: NONE |
| table: |
| input format: org.apache.hadoop.mapred.TextInputFormat |
| output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat |
| serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe |
| |
| Stage: Stage-0 |
| Fetch Operator |
| limit: -1 |
| Processor Tree: |
| ListSink |
| |
+-----------------------------------------------------+--+
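The two-stage plan can be sketched as a toy Python model as well. The names `partition` and `plan_group_by_then_count` are hypothetical, and `zlib.crc32` stands in for Hive's own hash partitioner, which is an assumption. In Stage-1 the keys are routed by the partition column `_col0`, so several reducers can share the deduplication, and each one emits a partial `count(1)`. Stage-2 has no keys (empty sort order), so a single reducer merges those partial counts by adding them up.

```python
import zlib


def partition(key, num_reducers):
    """Assumed stand-in for Hive's hash partitioner on _col0."""
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def plan_group_by_then_count(splits, num_reducers=3):
    """Toy model of the two-stage plan: group by remote_addr, then count(1).

    Returns (uv, stage1_loads): the distinct count and the number of keys
    each Stage-1 reducer received.
    """
    # Stage-1 map: mode=hash group by per split, then route keys by _col0
    buckets = [[] for _ in range(num_reducers)]
    for split in splits:
        for key in set(split):
            buckets[partition(key, num_reducers)].append(key)

    # Stage-1 reduce: mergepartial dedupes each bucket, count(1) gives a partial count
    partial_counts = [len(set(bucket)) for bucket in buckets]

    # Stage-2: a single reducer (no keys) merges the partial counts by summing them
    uv = sum(partial_counts)
    return uv, [len(bucket) for bucket in buckets]


# Hypothetical input: three map splits of remote_addr values
splits = [
    ["1.1.1.1", "2.2.2.2", "1.1.1.1"],
    ["2.2.2.2", "3.3.3.3"],
    ["4.4.4.4", "1.1.1.1"],
]
uv, loads = plan_group_by_then_count(splits, num_reducers=3)
print(uv, sum(loads), len(loads))  # 4 6 3
```

Because identical keys always land on the same Stage-1 reducer, the partial counts never overlap and their sum equals the distinct count. The price is the extra MapReduce stage (Stage-2), which in this model only handles roughly one small row per Stage-1 reducer.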

For more details, see https://www.cnblogs.com/cxzdy/p/5116222.html
