oracle 取两个值同时不为0_PostgreSQL简单验证，似是而非的union all后order by取limit...

最新推荐文章于 2023-05-08 07:44:14 发布

weixin_39634898

最新推荐文章于 2023-05-08 07:44:14 发布

阅读量257

点赞数

文章标签： oracle 取两个值同时不为0

本文链接：https://blog.csdn.net/weixin_39634898/article/details/111605809

版权

开篇之前先来看一个前几天在MySQL的一个优化案例，一个比较容易犯错的例子：

开发同学期望将两个表的数据汇集之后，取满足条件的top N，于是使用union all联合两个结果集，最后使用limit取top 30。两个表的数据量接近500w，执行耗时12.66秒：

整体来看，性能比较差，于是我改写优化了一些，执行耗时在0.03秒，效果还不错。

具体的优化思路就不介绍了，虽然说这里用到的优化思路没有问题，但这里的逻辑实现是有问题的，因为优化之前的原SQL实际上隐藏了的一个bug，具体是什么问题，看完下面的实验之后，相信也就自然明白了。

下面我们通过PostgreSQL来演示一下这里埋的是什么坑。MySQL和Oracle应该也是一样的效果。时间有限，就不做过多对比了。

先来创建两个测试表：

akendb=# \d+ tab01                                     Table "public.tab01"  Column   |  Type   | Collation | Nullable | Default | Storage | Stats target | Description -----------+---------+-----------+----------+---------+---------+--------------+------------- id        | integer |           | not null | 0       | plain   |              |  num_col01 | integer |           | not null | 0       | plain   |              | akendb=# \d+ tab02                                    Table "public.tab02"  Column   |  Type   | Collation | Nullable | Default | Storage | Stats target | Description -----------+---------+-----------+----------+---------+---------+--------------+------------- id        | integer |           | not null | 0       | plain   |              |  num_col01 | integer |           | not null | 0       | plain   |              | akendb=#

然后每张表简单插入3行数据即可：

select * from tab01;

akendb=# select * from tab01; id | num_col01 ----+-----------  1 |       100  3 |        45  3 |        11(3 rows)akendb=# select * from tab02; id | num_col01 ----+-----------  1 |        73  2 |        58  2 |        22(3 rows)akendb=#

--两表数据union all：

(select * from tab01 )

union all

(select * from tab02)

akendb=# (select * from tab01 ) akendb-# union all akendb-# (select * from tab02) ; id | num_col01 ----+-----------  1 |       100  3 |        45  3 |        11  1 |        73  2 |        58  2 |        22(6 rows)akendb=#

--测试语句：业务期望是想得出两表合并后根据id排序前面的两条记录,即tab01中(id=1,num_col01=100)和tab02中的(id=1,num_col01=73)。

(select * from tab01 )

union all

(select * from tab02)

order by id asc limit 2;

akendb=# (select * from tab01 ) akendb-# union all akendb-# (select * from tab02) akendb-# order by  id asc  limit  2; id | num_col01 ----+-----------  1 |       100  1 |        73(2 rows)akendb=#

结果中limit 2，排序就不是顺序的了，接着我们对调一下量表的先后顺序，感觉上确实如愿以偿了。

(select * from tab02 )

union all

(select * from tab01)

order by id asc limit 2;

akendb=# (select * from tab02 ) akendb-# union all akendb-# (select * from tab01) akendb-# order by  id asc  limit  2; id | num_col01 ----+-----------  1 |        73  1 |       100(2 rows)akendb=#

依样画葫芦，我们将order by和limit内推到union all各个字句中，查询结果和上面的一致。

(select * from tab01 order by id limit 2)

union all

(select * from tab02 order by id limit 2)

order by id asc limit 2;

(select * from tab02 order by id limit 2)

union all

(select * from tab01 order by id limit 2)

order by id asc limit 2;

akendb=# (select * from tab01 order by id limit 2) akendb-# union all akendb-# (select * from tab02 order by id limit 2)   akendb-# order by  id asc  limit  2; id | num_col01 ----+-----------  1 |       100  1 |        73(2 rows)akendb=# (select * from tab02 order by id limit 2) akendb-# union all akendb-# (select * from tab01 order by id limit 2)   akendb-# order by  id asc  limit  2; id | num_col01 ----+-----------  1 |        73  1 |       100(2 rows)akendb=#

至此，也许有同学已经看出了其中的算法问题。为了让问题更加明显，我这里将order by字段的值改动一下：让每个表的order by字段重复值的数量大于limit n中的n：

update tab01 set id=1;

update tab02 set id=2;

akendb=# update tab01 set id=1;   --进行update之后id=3的行数为3，大于limit 2取数范围。UPDATE 3akendb=# update tab02 set id=1;UPDATE 3akendb=#

接着，我们先看看“优化”改写后的效果：

(select * from tab01 order by id limit 2)

union all

(select * from tab02 order by id limit 2)

order by id asc limit 2;

(select * from tab02 order by id limit 2)

union all

(select * from tab01 order by id limit 2)

order by id asc limit 2;

akendb=# (select * from tab01 order by id limit 2) akendb-# union all akendb-# (select * from tab02 order by id limit 2)   akendb-# order by  id asc  limit  2; id | num_col01 ----+-----------  1 |       100  1 |        45(2 rows) akendb=# (select * from tab02 order by id limit 2) akendb-# union all akendb-# (select * from tab01 order by id limit 2)   akendb-# order by  id asc  limit  2; id | num_col01 ----+-----------  1 |        73  1 |        58(2 rows)akendb=#

发现查询结果和预期的一样了。看看执行计划，这里只对其中一个table做了排序后取limit：

akendb=# explain analyze (select * from tab02 order by id limit 2) union all (select * from tab01 order by id limit 2) order by  id asc  limit  2;                                                        QUERY PLAN                                                         --------------------------------------------------------------------------------------------------------------------------- Limit  (cost=110.41..110.45 rows=2 width=8) (actual time=0.031..0.033 rows=2 loops=1)   ->  Merge Append  (cost=110.41..110.49 rows=4 width=8) (actual time=0.029..0.030 rows=2 loops=1)         Sort Key: tab02.id         ->  Limit  (cost=55.20..55.21 rows=2 width=8) (actual time=0.019..0.020 rows=2 loops=1)               ->  Sort  (cost=55.20..60.85 rows=2260 width=8) (actual time=0.019..0.019 rows=2 loops=1)                     Sort Key: tab02.id                     Sort Method: quicksort  Memory: 25kB                     ->  Seq Scan on tab02  (cost=0.00..32.60 rows=2260 width=8) (actual time=0.011..0.012 rows=3 loops=1)         ->  Limit  (cost=55.20..55.21 rows=2 width=8) (actual time=0.008..0.008 rows=1 loops=1)               ->  Sort  (cost=55.20..60.85 rows=2260 width=8) (actual time=0.007..0.007 rows=1 loops=1)                     Sort Key: tab01.id                     Sort Method: quicksort  Memory: 25kB                     ->  Seq Scan on tab01  (cost=0.00..32.60 rows=2260 width=8) (actual time=0.003..0.004 rows=3 loops=1) Planning time: 0.143 ms Execution time: 0.071 ms(15 rows)akendb=# explain analyze (select * from tab01 order by id limit 2) union all (select * from tab02 order by id limit 2) order by  id asc  limit  2;                                                        QUERY PLAN                                                         --------------------------------------------------------------------------------------------------------------------------- Limit  (cost=110.41..110.45 rows=2 width=8) (actual time=0.029..0.032 rows=2 loops=1)   ->  Merge Append  (cost=110.41..110.49 rows=4 width=8) (actual time=0.028..0.029 rows=2 loops=1)         Sort Key: tab01.id         ->  Limit  (cost=55.20..55.21 rows=2 width=8) (actual time=0.018..0.019 rows=2 loops=1)               ->  Sort  (cost=55.20..60.85 rows=2260 width=8) (actual time=0.018..0.018 rows=2 loops=1)                     Sort Key: tab01.id                     Sort Method: quicksort  Memory: 25kB                     ->  Seq Scan on tab01  (cost=0.00..32.60 rows=2260 width=8) (actual time=0.010..0.011 rows=3 loops=1)         ->  Limit  (cost=55.20..55.21 rows=2 width=8) (actual time=0.008..0.008 rows=1 loops=1)               ->  Sort  (cost=55.20..60.85 rows=2260 width=8) (actual time=0.008..0.008 rows=1 loops=1)                     Sort Key: tab02.id                     Sort Method: quicksort  Memory: 25kB                     ->  Seq Scan on tab02  (cost=0.00..32.60 rows=2260 width=8) (actual time=0.004..0.004 rows=3 loops=1) Planning time: 0.140 ms Execution time: 0.070 ms(15 rows)akendb=#

是因为语句改写的问题吗？接下来再直接执行原来的查询语句：

(select * from tab01 )

union all

(select * from tab02)

order by id asc limit 2;

(select * from tab02 )

union all

(select * from tab01)

order by id asc limit 2;

akendb=# (select * from tab01 ) akendb-# union all akendb-# (select * from tab02) akendb-# order by  id asc  limit  2; id | num_col01 ----+-----------  1 |        45  1 |       100(2 rows)akendb=# (select * from tab02 ) akendb-# union all akendb-# (select * from tab01) akendb-# order by  id asc  limit  2; id | num_col01 ----+-----------  1 |        58  1 |        73(2 rows)akendb=#

上面的输出尽管执行语句相同，但和之前的查询结果完全不一样了。原因是union all中做order by实际上知会根据第一个表的字段在排序，然后limit的取值实际上最多也只能取第一个结果集的limit n：

akendb=#  explain analyze (select * from tab01 ) union all (select * from tab02) order by  id asc  limit  2;                                                     QUERY PLAN                                                      --------------------------------------------------------------------------------------------------------------------- Limit  (cost=110.40..110.41 rows=2 width=8) (actual time=0.031..0.033 rows=2 loops=1)   ->  Sort  (cost=110.40..121.70 rows=4520 width=8) (actual time=0.030..0.030 rows=2 loops=1)         Sort Key: tab01.id         Sort Method: top-N heapsort  Memory: 25kB         ->  Append  (cost=0.00..65.20 rows=4520 width=8) (actual time=0.012..0.018 rows=6 loops=1)               ->  Seq Scan on tab01  (cost=0.00..32.60 rows=2260 width=8) (actual time=0.012..0.013 rows=3 loops=1)               ->  Seq Scan on tab02  (cost=0.00..32.60 rows=2260 width=8) (actual time=0.004..0.004 rows=3 loops=1) Planning time: 0.118 ms Execution time: 0.062 ms(9 rows)akendb=#  explain analyze (select * from tab02 ) union all (select * from tab01) order by  id asc  limit  2;                                                     QUERY PLAN                                                      --------------------------------------------------------------------------------------------------------------------- Limit  (cost=110.40..110.41 rows=2 width=8) (actual time=0.029..0.031 rows=2 loops=1)   ->  Sort  (cost=110.40..121.70 rows=4520 width=8) (actual time=0.028..0.028 rows=2 loops=1)         Sort Key: tab02.id         Sort Method: top-N heapsort  Memory: 25kB         ->  Append  (cost=0.00..65.20 rows=4520 width=8) (actual time=0.010..0.016 rows=6 loops=1)               ->  Seq Scan on tab02  (cost=0.00..32.60 rows=2260 width=8) (actual time=0.010..0.011 rows=3 loops=1)               ->  Seq Scan on tab01  (cost=0.00..32.60 rows=2260 width=8) (actual time=0.003..0.004 rows=3 loops=1) Planning time: 0.110 ms Execution time: 0.058 ms(9 rows)akendb=#

所以，在union all之后执行order by取top n看似逻辑的理解上正确了，但在实际的算法实现上却偏差了，但不少开发同学可能并没有觉察，以前遇到过，今儿再次遇到了，简单记录一下。

weixin_39634898

关注

0
点赞
踩
0

收藏

觉得还不错? 一键收藏
0
评论
oracle 取两个值同时不为0_PostgreSQL简单验证，似是而非的union all后order by取limit...

开篇之前先来看一个前几天在MySQL的一个优化案例，一个比较容易犯错的例子：开发同学期望将两个表的数据汇集之后，取满足条件的top N，于是使用union all联合两个结果集，最后使用limit取top 30。两个表的数据量接近500w，执行耗时12.66秒：整体来看，性能比较差，于是我改写优化了一些，执行耗时在0.03秒，效果还不错。具体的优化思路就不介绍了，虽然说这里用到的优化思路...
复制链接

扫一扫

oracle 取两个值同时不为0_PostgreSQL简单验证，似是而非的union all后order by取limit...

“相关推荐”对你有帮助么？