postgresql explain query中的materialize

今天在GREENPLUM的执行计划中看到了materialize,不知道这个步骤是做什么的,查了下文档,没有相关解释。没事,GREENPLUM不行还有POSTGRESQL呢,GOOGLE了一把,找到了答案。

Q:

What does materialize do? I'm joining two tables, not views or anything like that. 

A:

A materialize node means the output of whatever is below it in the tree (which can be a scan, or a full set of joins or something like that) is materalized into memory before the upper node is executed. This is usually done when the outer node needs a source that it can re-scan for some reason or other.

So in your case, the planner is determining that the result of a scan on one of your tables will fit in memory, and it till make it possible to choose an upper join operation that requires rescans while still being cheaper.


具体的SQL如下

aligputf8=# select count(1) from ttt1;
 count
-------
 10000
(1 row)

aligputf8=#
aligputf8=# select count(1) from ttt2;
 count
-------
 10000
(1 row)


aligputf8=# explain select * from ttt1 as a,ttt2 as b where a.id=3;
                                         QUERY PLAN                                          
---------------------------------------------------------------------------------------------
 Gather Motion 6:1  (slice2; segments: 6)  (cost=0.00..1409.07 rows=1667 width=8)
   ->  Nested Loop  (cost=0.00..1409.07 rows=1667 width=8)
         ->  Broadcast Motion 1:6  (slice1; segments: 1)  (cost=0.00..137.07 rows=6 width=4)
               ->  Seq Scan on ttt1 a  (cost=0.00..137.00 rows=1 width=4)
                     Filter: id = 3
         ->  Seq Scan on ttt2 b  (cost=0.00..112.00 rows=1667 width=4)
(6 rows)

aligputf8=# explain select * from ttt1 as a,ttt2 as b where a.id=3;
aligputf8=#
aligputf8=# explain select * from ttt1 as a,ttt2 as b where a.id<3000;
                                           QUERY PLAN                                           
------------------------------------------------------------------------------------------------
 Gather Motion 6:1  (slice2; segments: 6)  (cost=122.00..3599268.93 rows=4998334 width=8)
   ->  Nested Loop  (cost=122.00..3599268.93 rows=4998334 width=8)
         ->  Broadcast Motion 6:6  (slice1; segments: 6)  (cost=0.00..346.93 rows=2999 width=4)
               ->  Seq Scan on ttt1 a  (cost=0.00..137.00 rows=500 width=4)
                     Filter: id < 3000
         ->  Materialize  (cost=122.00..222.00 rows=1667 width=4)
               ->  Seq Scan on ttt2 b  (cost=0.00..112.00 rows=1667 width=4)
(7 rows)

我的理解就是PG为了加快nested loop循环的速度,把b表的数据缓存在了内存中。我们可以看到前面一个SQL,只对b表数据进行1次扫描,因此并不需要进行缓存;后者需要进行2999次扫描。




  • 2
    点赞
  • 1
    收藏
    觉得还不错? 一键收藏
  • 0
    评论
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值