count及leftjoin中的谓语下推

最新推荐文章于 2023-02-21 16:06:54 发布

平平无奇小码农

最新推荐文章于 2023-02-21 16:06:54 发布

阅读量676

点赞数

分类专栏：笔记文章标签： sql 数据库

本文链接：https://blog.csdn.net/weixin_46669856/article/details/122928159

版权

笔记专栏收录该内容

31 篇文章 5 订阅

订阅专栏

-- 时间相关
标准时间格式字符：yyyy 表示年 必须小写
MM 表示月 必须大写，小写表示月份
dd 表示天 必须小写
HH/hh 表示小时 大小写均可
mm 表示分钟，必须小写，大写表示月份
ss 表示秒 必须小写 大写表示毫秒
SS 表示毫秒 必须大写 小写表示秒

一、count相关

1、count(1) and count(*)

当表的数据量大些时，对表作分析之后，使用count(1)还要比使用count(*)用时多了！

从执行计划来看，count(1)和count()的效果是一样的。但是在表做过分析之后，count(1)会比count()的用时少些（1w以内数据量），不过差不了多少。

如果count(1)是聚索引,id,那肯定是count(1)快。但是差的很小的。

因为count(),自动会优化指定到那一个字段。所以没必要去count(1)，用count()，sql会帮你完成优化的因此：count(1)和count(*)基本没有差别！

2、count(1) and count(字段)

两者的主要区别是

count(1) 会统计表中的所有的记录数，包含字段为null 的记录。
count(字段) 会统计该字段在表中出现的次数，忽略字段为null 的情况。即不统计字段为null 的记录。

转自：http://www.cnblogs.com/Dhouse/p/6734837.html

3、count(*) 和 count(1)和count(列名)区别

执行效果上：

count(*)包括了所有的列，相当于行数，在统计结果的时候，不会忽略列值为NULL
count(1)包括了忽略所有列，用1代表代码行，在统计结果的时候，不会忽略列值为NULL
count(列名)只包括列名那一列，在统计结果的时候，会忽略列值为空（这里的空不是只空字符串或者0，而是表示null）的计数，即某个字段值为NULL时，不统计。

执行效率上：

列名为主键，count(列名)会比count(1)快
列名不为主键，count(1)会比count(列名)快
如果表多个列并且没有主键，则 count（1）的执行效率优于 count（*）
如果有主键，则 select count（主键）的执行效率是最优的
如果表只有一个字段，则 select count（*）最优。

转自：http://eeeewwwqq.iteye.com/blog/1972576

4、实例分析

mysql> create table counttest(name char(1), age char(2));
Query OK, 0 rows affected (0.03 sec)
 
mysql> insert into counttest values
    -> ('a', '14'),('a', '15'), ('a', '15'),
    -> ('b', NULL), ('b', '16'),
    -> ('c', '17'),
    -> ('d', null),
    ->('e', '');
Query OK, 8 rows affected (0.01 sec)
Records: 8  Duplicates: 0  Warnings: 0
 
mysql> select * from counttest;
+------+------+
| name | age  |
+------+------+
| a    | 14   |
| a    | 15   |
| a    | 15   |
| b    | NULL |
| b    | 16   |
| c    | 17   |
| d    | NULL |
| e    |      |
+------+------+
8 rows in set (0.00 sec)
mysql> select name, count(name), count(1), count(*), count(age), count(distinct(age))
    -> from counttest
    -> group by name;
+------+-------------+----------+----------+------------+----------------------+
| name | count(name) | count(1) | count(*) | count(age) | count(distinct(age)) |
+------+-------------+----------+----------+------------+----------------------+
| a    |           3 |        3 |        3 |          3 |                    2 |
| b    |           2 |        2 |        2 |          1 |                    1 |
| c    |           1 |        1 |        1 |          1 |                    1 |
| d    |           1 |        1 |        1 |          0 |                    0 |
| e    |           1 |        1 |        1 |          1 |                    1 |
+------+-------------+----------+----------+------------+----------------------+
5 rows in set (0.00 sec)

额外参考资料：

http://blog.csdn.net/lihuarongaini/article/details/68485838

来源：https://blog.csdn.net/iFuMI/article/details/77920767

二、谓词下推

官网参考：https://cwiki.apache.org/confluence/display/Hive/OuterJoinBehavior
cbo参考：https://cwiki.apache.org/confluence/display/Hive/Cost-based+optimization+in+Hive

谓词下推:就是在join之前的 mr 任务的 map 阶段提前对表进行过滤优化，使得最后参与 join 的表的数据量更小

2.1、首先定义了join的四种类型的表

1、Preserved Row table：
    保留表，通俗地说就是 
        a left (outer) join b   中的 a 表;
        a right (outer) join b  中的 b 表;
        a full outer join b     a 和 b 表都是 保留表。
2、Null Supplying table：
    提供null值的表，也就是 非保留表，在join中如果不能匹配上的使用null填充的表
        a left (outer) join b   中的 b 表; 
        a right (outer) join b  中的 a 表，
        a full outer join b     a 和 b 表都是 null值保留表
3、During Join predicate：
    join中谓词，就是on后面的条件:  R1 join R2 on R1.x = 5   --> R1.x = 5就是join中谓词
4、After Join predicate：
    join后谓词，就是where后面的条件： a left join b where a.t=1  --> a.t=1 就是join后谓词

！！！测试时，关闭 cbo 优化：set hive.cbo.enable=false

2.2、Left outer Join & Right outer Join

案例一：过滤条件写在 where, 且是 保留表的字段     --> 可以谓词下推（能在 map 阶段提前过滤）
explain
select o.id from bigtable b
left join bigtable o 
where b.id <= 10;


案例二：过滤条件写在 where, 且是 非保留表的字段   --> 不可以谓词下推（能在 map 阶段提前过滤）
！！！注意： 关闭cbo去测 set hive.cbo.enable=false，如果开启了cbo，会自动优化成可以谓词下推;
explain
select b.id,o.id from bigtable b
left join bigtable o
where o.id <= 10;

案例三：过滤条件写在 on, 且是 保留表的字段        --> 不可以谓词下推（不能在 map 阶段提前过滤）
explain
select o.id from bigtable b
left join bigtable o
on b.id <= 10;

案例四：过滤条件写在 on, 且是 非保留表的字段      --> 可以谓词下推（能在 map 阶段提前过滤）
explain
select o.id from bigtable b
left join bigtable o
on o.id <= 10;

-----------------------------------------------------------------------------
#总结：
在关闭cbo的情况下：

        保留表字段（left的左表）     非保留表字段 （left的右表）
on         不可以                      可以
where       可以                      不可以（开启cbo，可以）

！！！注意： 
1、对于 Left outer Join ，右侧的表写在 on后面、左侧的表写在 where后面，性能上有提高；
2、对于 Right outer Join，左侧的表写在 on后面、右侧的表写在 where后面，性能上有提高；

2.3、Full outer Join

案例一：过滤条件写在 where, 且是 保留表的字段     --> 可以谓词下推（能在 map 阶段提前过滤）
explain
select o.id from bigtable b
full join bigtable o 
where b.id <= 10;

案例二：过滤条件写在 where, 且是 非保留表的字段   --> 可以谓词下推（能在 map 阶段提前过滤）
explain
select b.id,o.id from bigtable b
full join bigtable o
where o.id <= 10;

案例三：过滤条件写在 on, 且是 保留表的字段        --> 不可以谓词下推（不能在 map 阶段提前过滤）
explain
select o.id from bigtable b
full join bigtable o
on b.id <= 10;

案例四：过滤条件写在 on, 且是 非保留表的字段      --> 不可以谓词下推（不能在 map 阶段提前过滤）
explain
select o.id from bigtable b
full join bigtable o
on o.id <= 10;

-----------------------------------------------------------------------------
#总结：
Full outer Join
        如果不开启 cbo，写在 on后面，还是 where后面，都不会谓词下推
        如果开启了 cbo，写在 where 可以 谓词下推， 写在 on 不可以 谓词下推

2.4、Inner Join

案例一：过滤条件写在 where, 且是 保留表的字段     --> 可以谓词下推（能在 map 阶段提前过滤）
explain
select o.id from bigtable b
join bigtable o 
where b.id <= 10;

案例二：过滤条件写在 where, 且是 非保留表的字段   --> 可以谓词下推（能在 map 阶段提前过滤）
explain
select b.id,o.id from bigtable b
join bigtable o
where o.id <= 10;

案例三：过滤条件写在 on, 且是 保留表的字段        --> 可以谓词下推（能在 map 阶段提前过滤）
explain
select o.id from bigtable b
join bigtable o
on b.id <= 10;

案例四：过滤条件写在 on, 且是 非保留表的字段      --> 可以谓词下推（能在 map 阶段提前过滤）
explain
select o.id from bigtable b
join bigtable o
on o.id <= 10;
-----------------------------------------------------------------------------
#总结：
Inner Join 
	不管有没有开启cbo，不管写在 on后面 还是 where后面，都会进行谓词下推

平平无奇小码农

关注

0
点赞
踩
0

收藏

觉得还不错? 一键收藏
0
评论
count及leftjoin中的谓语下推

-- 时间相关标准时间格式字符：yyyy 表示年必须小写MM 表示月必须大写，小写表示月份dd 表示天必须小写HH/hh 表示小时大小写均可mm 表示分钟，必须小写，大写表示月份ss 表示秒必须小写大写表示毫秒SS 表示毫秒必须大写小写表示秒一、count相关1、count(1) and count(*)当表的数据量大些时，对表作分析之后，使用count(1)还要比使用count(*)用时多了！从执行计划来看，count(1)和count()的效果是一样的。但是在表做
复制链接

扫一扫