执行计划中的nestloop join 对比hash join
两种join 方式的定义
NESTE LOOP:
在嵌套循环中,内表被外表驱动,外表返回的每一行都要在内表中检索找到与它匹配的行。两个概念:驱动表(外部表)和内部表,这里用表这个次其实不是很准确,外部表和内部表可以是某张表的结果集。
在执行计划中如何区分外部表和内部表。我的通常习惯是安装前后顺序,nestloop join 里面位于上面的是外部表,位于下面的是内部表。一般情况下外部表的loop=1,内部表的loop=外部表的行数。
如下:
根据定义我们可以得到 nest loop 适合于一下两个场景
1、外部表结果集较小,循环次数就少。内部表如果是大表一定要保证大表对应的join字段有索引。
2、外部表结果集较大,内部表结果集较少的场景,同样可以保证nest loop 性能较优,但是如果外部表结果集非常大,循环次数就会很多,就会导致查询缓慢,这时候用hash join 可能效果会更好。
所以总结下来 如果要优化nest loop join 有两个方向
1、减少驱动表的结果集,降低循环次数
2、加快内部表的遍历时间,通过加索引等方式提高内部表的查询效率。
但是具体外部表的记录集多大之后就不建议使用nest loop join 了,有说1万的有说10万的,本人觉得还是要根据实际sql,这里应该没有具体限定值。
nest loop 相关内核参数
enable_nestloop 是否使用nestloop join 默认是on
HASH JOIN :
hash join 做大数据集连接时的常用方式,优化器使用两个表中较小的表(或数据源)利用连接键在内存中建立散列表,然后扫描较大的表并探测散列表,找出与散列表匹配的行。
这种方式适用于较小的表完全可以放于内存中的情况,
这样总成本就是访问两个表的成本之和。但是在表很大的情况下并不能完全放入内存,这时优化器会将它分割成若干不同的分区,不能放入内存的部分就把该分区写入磁盘的临时段,此时要有较大的临时段从而尽量提高I/O 的性能。
从这里就可以看到他的特点和hash索引有点相似,只能用于等值连接。另外他还受限于内存设置,而KingBase 对应的内存就是work_mem。需要保证hash数据全部存放在内存中才能保证hash join 性能最优,否则如果用到过多的 disk 就会严重影响性能。
下图是一个hash join的执行计划截图
注意一点就是hash join 不走索引。
不同场景中hash join 和nest loop join 对比。
场景1: 小表做join
1、构建测试表:
drop table app_family;
CREATE TABLE app_family (
"family_id" character varying(32 char) NOT NULL,
"application_id" character varying(32 char) NULL,
"family_number" character varying(50 char) ,
"household_register_number" character varying(50 char),
"poverty_reason" character varying(32 char),
CONSTRAINT "pk_app_family_idpk" PRIMARY KEY (family_id));
insert into app_family select generate_series(1,10000),generate_series(1,10000),'aaaa','aaa','bbb' from dual ;
INSERT INTO app_family SELECT * FROM app_family
CREATE TABLE app_family2 AS SELECT * FROM app_family
UPDATE app_family2 SET application_id ='a' WHERE family_id:
:int > 5000;
2、查看nestloop join
SET enable_hashjoin TO off ;
SET enable_mergejoin TO OFF;
EXPLAIN ANALYZE select t.*
from app_family t LEFT JOIN app_family2 p
on t.family_id=p.application_id;
QUERY PLAN |
---------------------------------------------------------------------------------------------------------------------------+
Nested Loop Left Join (cost=0.00..1500387.00 rows=10000 width=21) (actual time=0.018..11663.381 rows=10000 loops=1) |
Join Filter: ((t.family_id)::text = (p.application_id)::text) |
Rows Removed by Join Filter: 99995000 |
-> Seq Scan on app_family t (cost=0.00..165.00 rows=10000 width=21) (actual time=0.007..10.015 rows=10000 loops=1) |
-> Materialize (cost=0.00..247.00 rows=10000 width=3) (actual time=0.000..0.405 rows=10000 loops=10000) |
-> Seq Scan on app_family2 p (cost=0.00..197.00 rows=10000 width=3) (actual time=0.004..1.035 rows=10000 loops=1)|
Planning Time: 0.331 ms |
Execution Time: 11667.704 ms
可以看到由于没有索引, 内部表loop 一次 耗时1ms左右 10000次循环就是1万ms。
我们创建索引然后再看看执行计划
CREATE INDEX idx_family2 ON app_family2(application_id);
QUERY PLAN |
----------------------------------------------------------------------------------------------------------------------------------------+
Nested Loop Left Join (cost=0.29..3685.00 rows=10000 width=21) (actual time=0.038..35.024 rows=10000 loops=1) |
-> Seq Scan on app_family t (cost=0.00..165.00 rows=10000 width=21) (actual time=0.009..1.083 rows=10000 loops=1) |
-> Index Only Scan using idx_family2 on app_family2 p (cost=0.29..0.33 rows=2 width=3) (actual time=0.003..0.003 rows=0 loops=10000)|
Index Cond: (application_id = (t.family_id)::text) |
Heap Fetches: 0 |
Planning Time: 0.228 ms |
Execution Time: 35.682 ms
创建索引之后用时35ms。有明显提升。
2、收集hash join 执行计划
QUERY PLAN |
---------------------------------------------------------------------------------------------------------------------------+
Hash Right Join (cost=290.00..513.26 rows=10000 width=21) (actual time=1.779..5.117 rows=10000 loops=1) |
Hash Cond: ((p.application_id)::text = (t.family_id)::text) |
-> Seq Scan on app_family2 p (cost=0.00..197.00 rows=10000 width=3) (actual time=0.013..0.638 rows=10000 loops=1) |
-> Hash (cost=165.00..165.00 rows=10000 width=21) (actual time=1.743..1.744 rows=10000 loops=1) |
Buckets: 16384 Batches: 1 Memory Usage: 663kB |
-> Seq Scan on app_family t (cost=0.00..165.00 rows=10000 width=21) (actual time=0.012..0.602 rows=10000 loops=1)|
Planning Time: 0.105 ms |
Execution Time: 5.495 ms |
再这种情况下可以看到hash join 用时5ms。所以hash join 优势还是很明显的。 | |
场景2: 大表+小表
1、我们重建app_family,将数据量增加到1000万,然后先收集 nest loop的执行计划。
insert into app_family select generate_series(1,10000000),generate_series(1,10000000),'aaaa','aaa','bbb' from dual ;
QUERY PLAN |
-------------------------------------------------------------------------------------------------------------------------------------------+
Nested Loop Left Join (cost=0.29..3574032.75 rows=9999565 width=27) (actual time=0.063..33128.886 rows=10000000 loops=1) |
-> Seq Scan on app_family t (cost=0.00..174060.65 rows=9999565 width=27) (actual time=0.042..1447.904 rows=10000000 loops=1) |
-> Index Only Scan using idx_family2 on app_family2 p (cost=0.29..0.32 rows=2 width=3) (actual time=0.003..0.003 rows=0 loops=10000000)|
Index Cond: (application_id = (t.family_id)::text) |
Heap Fetches: 0 |
Planning Time: 0.224 ms |
Execution Time: 33724.901 ms
但是如果我们更换两个表的join顺序
EXPLAIN ANALYZE SELECT t.*
from app_family2 p LEFT JOIN app_family t
on t.family_id=p.application_id
可以看到小表做驱动表,大表走索引情况下执行效率明显提升用时84ms。
QUERY PLAN |
------------------------------------------------------------------------------------------------------------------------------------------+
Nested Loop Left Join (cost=0.43..78386.00 rows=10000 width=27) (actual time=0.058..83.462 rows=10000 loops=1) |
-> Seq Scan on app_family2 p (cost=0.00..197.00 rows=10000 width=3) (actual time=0.031..1.349 rows=10000 loops=1) |
-> Index Scan using pk_app_family_idpk on app_family t (cost=0.43..7.82 rows=1 width=27) (actual time=0.008..0.008 rows=0 loops=10000)|
Index Cond: ((family_id)::text = (p.application_id)::text) |
Planning Time: 0.091 ms |
Execution Time: 84.238 ms
同样的情况下我们看一下hash join 的执行情况
QUERY PLAN |
-------------------------------------------------------------------------------------------------------------------------------------+
Hash Right Join (cost=367412.21..436072.46 rows=9999565 width=27) (actual time=3414.711..6855.453 rows=10000000 loops=1) |
Hash Cond: ((p.application_id)::text = (t.family_id)::text) |
-> Seq Scan on app_family2 p (cost=0.00..197.00 rows=10000 width=3) (actual time=0.010..0.885 rows=10000 loops=1) |
-> Hash (cost=174060.65..174060.65 rows=9999565 width=27) (actual time=3413.135..3413.136 rows=10000000 loops=1) |
Buckets: 65536 Batches: 256 Memory Usage: 2826kB |
-> Seq Scan on app_family t (cost=0.00..174060.65 rows=9999565 width=27) (actual time=0.018..947.863 rows=10000000 loops=1)|
Planning Time: 0.145 ms |
Execution Time: 7154.517 ms |
QUERY PLAN |
-------------------------------------------------------------------------------------------------------------------------------------+
Hash Left Join (cost=367412.21..436072.46 rows=10000 width=27) (actual time=3363.830..4873.281 rows=10000 loops=1) |
Hash Cond: ((p.application_id)::text = (t.family_id)::text) |
-> Seq Scan on app_family2 p (cost=0.00..197.00 rows=10000 width=3) (actual time=0.017..0.970 rows=10000 loops=1) |
-> Hash (cost=174060.65..174060.65 rows=9999565 width=27) (actual time=3362.324..3362.325 rows=10000000 loops=1) |
Buckets: 65536 Batches: 256 Memory Usage: 2826kB |
-> Seq Scan on app_family t (cost=0.00..174060.65 rows=9999565 width=27) (actual time=0.011..946.358 rows=10000000 loops=1)|
Planning Time: 0.158 ms |
Execution Time: 4873.893 ms |
可以看到hash join 在两种不同join顺序的情况下执行时间有差距,但是不大。
但是相比较于最优情况下的nest loop join 是有明显差距。
| |
场景3:两个大表
构造两个1000万的大表,然后对比nest loop 和 hash join
DROP TABLE app_family2;
CREATE TABLE app_family2 AS SELECT * FROM app_family;
UPDATE app_family2 SET application_id ='a' WHERE family_id::int > 5000;
CREATE INDEX idx_family2 ON app_family2(application_id);
首先看一下nest loop
QUERY PLAN |
---------------------------------------------------------------------------------------------------------------------------------------------+
Nested Loop Left Join (cost=0.43..5249466.93 rows=10000000 width=27) (actual time=0.024..37066.301 rows=10000000 loops=1) |
-> Seq Scan on app_family2 p (cost=0.00..238135.00 rows=10000000 width=7) (actual time=0.008..1733.272 rows=10000000 loops=1) |
-> Index Scan using pk_app_family_idpk on app_family t (cost=0.43..0.50 rows=1 width=27) (actual time=0.003..0.003 rows=0 loops=10000000)|
Index Cond: ((family_id)::text = (p.application_id)::text) |
Planning Time: 0.241 ms |
Execution Time: 37644.050 ms
可以看到因为有索引,所以全部的执行时间就是集中在loop 上
和场景1 执行计划基本一样唯一差异的就是loop次数 总计用时37s
然后看一下 hash join
QUERY PLAN |
--------------------------------------------------------------------------------------------------------------------------------------+
Hash Left Join (cost=367412.21..778841.36 rows=10027496 width=27) (actual time=4408.948..9869.481 rows=10000000 loops=1) |
Hash Cond: ((p.application_id)::text = (t.family_id)::text) |
-> Seq Scan on app_family2 p (cost=0.00..238409.96 rows=10027496 width=2) (actual time=0.039..2061.101 rows=10000000 loops=1) |
-> Hash (cost=174060.65..174060.65 rows=9999565 width=27) (actual time=4408.762..4408.763 rows=10000000 loops=1) |
Buckets: 65536 Batches: 256 Memory Usage: 2826kB |
-> Seq Scan on app_family t (cost=0.00..174060.65 rows=9999565 width=27) (actual time=0.013..1281.885 rows=10000000 loops=1)|
Planning Time: 0.156 ms |
Execution Time: 10187.486 ms
可以看到这种情况下hash join 明显快很多, 他的耗时基本上就是
hash表生成时间+另一张表扫描时间 。
然后我们在模拟一下work_mem 偏小的情况下 hash join
SET work_mem=1024 --设置work_mem =1M
QUERY PLAN |
--------------------------------------------------------------------------------------------------------------------------------------+
Hash Left Join (cost=367412.21..778841.36 rows=10027496 width=27) (actual time=4983.689..9894.142 rows=10000000 loops=1) |
Hash Cond: ((p.application_id)::text = (t.family_id)::text) |
-> Seq Scan on app_family2 p (cost=0.00..238409.96 rows=10027496 width=2) (actual time=0.044..1709.922 rows=10000000 loops=1) |
-> Hash (cost=174060.65..174060.65 rows=9999565 width=27) (actual time=4983.615..4983.615 rows=10000000 loops=1) |
Buckets: 16384 Batches: 1024 Memory Usage: 703kB |
-> Seq Scan on app_family t (cost=0.00..174060.65 rows=9999565 width=27) (actual time=0.008..1278.113 rows=10000000 loops=1)|
Planning Time: 0.165 ms |
Execution Time: 10204.286 ms |
可以看到 work_mem 减小后, Buckets 少了, 但是Batches 增加了。
Buckets 就是用于存放hash值的内存空间,因为work_mem 小了 所以 Buckets 减少。
batchs代表为了执行这个hash join Buckets 被服用了几次。
如果我们增大SET work_mem=1024000 到1G 看一下执行计划
QUERY PLAN |
--------------------------------------------------------------------------------------------------------------------------------------+
Hash Left Join (cost=299055.21..563787.36 rows=10027496 width=27) (actual time=5407.391..8326.606 rows=10000000 loops=1) |
Hash Cond: ((p.application_id)::text = (t.family_id)::text) |
-> Seq Scan on app_family2 p (cost=0.00..238409.96 rows=10027496 width=2) (actual time=0.040..1301.358 rows=10000000 loops=1) |
-> Hash (cost=174060.65..174060.65 rows=9999565 width=27) (actual time=5315.162..5315.162 rows=10000000 loops=1) |
Buckets: 16777216 Batches: 1 Memory Usage: 724606kB |
-> Seq Scan on app_family t (cost=0.00..174060.65 rows=9999565 width=27) (actual time=0.037..1143.247 rows=10000000 loops=1)|
Planning Time: 0.138 ms |
Execution Time: 8730.006 ms |
可以看到 work_mem 到1G后,内存中 Buckets 增加了很多,但是 Batches 就1次
这说明所有的hash值都存在了内存中, 整体时间耗时8.7s 提升1.3s左右 | |
总结
通过实验我们进一步验证了nest loop 和hash join 各自适应场景和注意事项总结下来如下
1、nest loop join 顺序很重要,驱动表要尽可能晓。 因为loop循环次数直接影响执行时间
2、两个大表关联场景hash join 最优,但是hash join 对work_mem 要求高,尤其是IO比较慢的情况下。
3、一般小表的join,hash join 快,但都是ms级别响应,差别不大。