PostgreSQL 联结方式--嵌套循环联结

最新推荐文章于 2024-05-09 21:38:55 发布

一白.

最新推荐文章于 2024-05-09 21:38:55 发布

阅读量6.2k

点赞数

分类专栏： PostgreSQL 基础文章标签： postgresql 联结方式嵌套

本文链接：https://blog.csdn.net/luojinbai/article/details/44033729

版权

PostgreSQL 基础专栏收录该内容

65 篇文章 19 订阅

订阅专栏

嵌套循环联结使用一次访问运算所得到的结果集中的每一行来与另一个表进行碰撞。如果结果集的大小是有限的并且在用来联结的列上建有索引的话，这种联结的效率通常是最高的。嵌套循环联结的运算成本主要是读取外层表中的每一行并将其与锁匹配的内层表中的行联结所需的成本。
顾名思义，签到循环联结就是一个循环嵌套在另一个循环当中。外层循环来说基本就是一个只使用WHERE子句中属于驱动表的条件对它进行的查询。当数据行经过了外层条件筛选并被确认匹配条件后，这些行就会逐个进入到内层循环中。然后再基于联结列进行逐行检查看是否与被联结的表中的某一行相匹配。如果这一行与第二次的检查相匹配，就将会被传递到检查计划的下一步或者如果没有更多步骤的话直接被包含在最终的结果集中。
这种类型的联结的强大之处在于所使用的内存是非常少的。因为数据行集一次只加工一行，所需的开支也是非常小的。由于这个原因，除了使用一次加工一行这种方式来建立一个很大的数据集需要较长的时间这一点以外，它也是适合进行行大数据集加工的。这就是为什么前面提过的嵌套循环联结在结果集较小的时候是最好的。嵌套循环联结的基本度量就是为了准备最终结果集所需要访问的数据块数目。

1. 建表

postgres=# create table class(id serial primary key,class_name character varying);
CREATE TABLE
postgres=# 
postgres=# create table student(id serial primary key,name character varying,class_id integer);
CREATE TABLE

在student的class_id上创建索引

postgres=# create index idx_student_class_id on student(class_id);
CREATE INDEX

postgres=# \d student
                               Table "public.student"
  Column  |       Type        |                      Modifiers                       
----------+-------------------+------------------------------------------------------
 id       | integer           | not null default nextval('student_id_seq'::regclass)
 name     | character varying | 
 class_id | integer           | 
Indexes:
    "student_pkey" PRIMARY KEY, btree (id)
    "idx_student_class_id" btree (class_id)
postgres=# 
postgres=# \d class
                                Table "public.class"
   Column   |       Type        |                     Modifiers                      
------------+-------------------+----------------------------------------------------
 id         | integer           | not null default nextval('class_id_seq'::regclass)
 class_name | character varying | 
Indexes:
    "class_pkey" PRIMARY KEY, btree (id)

2. 插入测试数据：

postgres=# insert into class values(1,'aa');
INSERT 0 1
postgres=# insert into class values(2,'bb');
INSERT 0 1
postgres=# insert into class values(3,'cc');
INSERT 0 1

postgres=# insert into student values(1,'tom',1);
INSERT 0 1
postgres=# insert into student values(2,'jim',1);
INSERT 0 1
postgres=# insert into student values(3,'jack',2);
INSERT 0 1
postgres=# insert into student(name,class_id) select name,class_id from student;
INSERT 0 3
postgres=# insert into student(name,class_id) select name,class_id from student;
INSERT 0 6
postgres=# insert into student(name,class_id) select name,class_id from student;
INSERT 0 12
postgres=# insert into student(name,class_id) select name,class_id from student;
INSERT 0 24
postgres=# insert into student(name,class_id) select name,class_id from student;
INSERT 0 48
postgres=# insert into student(name,class_id) select name,class_id from student;
INSERT 0 96
postgres=# insert into student(name,class_id) select name,class_id from student;
INSERT 0 192

postgres=# select count(0) from student;
 count 
-------
   384
(1 row)

3. select s.*,c.class_name from student s,class c where s.class_id=c.id;

这个查询将会向下面的伪代码一样来处理结果：
for each row in(select * from student) loop
  for(select id,class_name from class where id=outer.class_id) loop
      if match then pass the row on to the next step
      if inner join and no match then discard the row
      if outer join and no match then set inner column values to null and pass the row on to the next step
  end loop;
end loop;
4. 查看执行计划

①、

postgres=# explain(analyze,verbose,buffers) select s.*,c.class_name from student s,class c where s.class_id=c.id and c.id<=2;
                                                             QUERY PLAN                                                             
------------------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=0.15..125.88 rows=128 width=44) (actual time=0.021..1.076 rows=384 loops=1)
   Output: s.id, s.name, s.class_id, c.class_name
   Buffers: shared hit=771
   —>  Seq Scan on public.student s  (cost=0.00..6.84 rows=384 width=12) (actual time=0.010..0.076 rows=384 loops=1)
         Output: s.id, s.name, s.class_id
         Buffers: shared hit=3
   —>  Index Scan using class_pkey on public.class c  (cost=0.15..0.30 rows=1 width=36) (actual time=0.001..0.002 rows=1 <span style="color:#ff0000;">loops=384</span>)
         Output: c.id, c.class_name
         Index Cond: ((c.id = s.class_id) AND (c.id <= 2))
         Buffers: shared hit=768
 Total runtime: 1.168 ms
(11 rows)

可以看到内表循环了384次，相当于执行了384次 select c.id,c.class_name from class where c.id=outer.class_id，每次都是索引扫描，在计算的成本的时候当然该句sql的成本就要乘以384，所以总成本125.88=6.64+384*0.30+联结的成本=122.04+联结的成本。

②、

postgres=# explain(analyze,verbose,buffers) select s.*,c.class_name from student s,class c where s.class_id=c.id and c.id in(1,2);
                                                          QUERY PLAN                                                           
-------------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=8.32..32.03 rows=1 width=44) (actual time=0.027..0.362 rows=384 loops=1)
   Output: s.id, s.name, s.class_id, c.class_name
   Join Filter: (s.class_id = c.id)
   Rows Removed by Join Filter: 384
   Buffers: shared hit=6
   —>  Seq Scan on public.student s  (cost=0.00..6.84 rows=384 width=12) (actual time=0.008..0.051 rows=384 loops=1)
         Output: s.id, s.name, s.class_id
         Buffers: shared hit=3
   —>  Materialize  (cost=8.32..13.67 rows=2 width=36) (actual time=0.000..0.000 rows=2 <span style="color:#ff0000;">loops=384</span>)
         Output: c.class_name, c.id
         Buffers: shared hit=3
         —>  Bitmap Heap Scan on public.class c  (cost=8.32..13.66 rows=2 width=36) (actual time=0.012..0.013 rows=2 loops=1)
               Output: c.class_name, c.id
               Recheck Cond: (c.id = ANY ('{1,2}'::integer[]))
               Buffers: shared hit=3
                —>  Bitmap Index Scan on class_pkey  (cost=0.00..8.32 rows=2 width=0) (actual time=0.008..0.008 rows=2 loops=1)
                     Index Cond: (c.id = ANY ('{1,2}'::integer[]))
                     Buffers: shared hit=2
 Total runtime: 0.415 ms
(19 rows)

Materialize: 物化节点, 小于work_mem时全部塞内存, 大时需要用到磁盘. 这里虽然是嵌套循环,但是下面的索引扫描只有一次, 循环384次是在物化节点, 物化节点计算循环成本时, 只计算内存操作的成本，故这里的总成本32.03=13.66+6.84+联结的成本=20.50+联结的成本。
对Materialize物化节点的解释：

What does materialize do? I'm joining two tables, not views or anything like that.

A materialize node means the output of whatever is below it in the tree (which can be a scan, or a full set of joins or something like that) is materalized into memory before the upper node is executed. This is usually done when the outer node needs a source that it can re-scan for some reason or other.

So in your case, the planner is determining that the result of a scan on one of your tables will fit in memory, and it till make it possible to choose an upper join operation that requires rescans while still being cheaper.

以上解释来自： http://blog.csdn.net/wanghai__/article/details/6660299