A materialize node means the output of whatever is below it in the tree (which can be a scan, or a full set of joins or something like that) is materalized into memory before the upper node is executed. This is usually done when the outer node needs a source that it can re-scan for some reason or other.
So in your case, the planner is determining that the result of a scan on one of your tables will fit in memory, and it till make it possible to choose an upper join operation that requires rescans while still being cheaper.
具体的SQL如下
aligputf8=# select count(1) from ttt1;
count
-------
10000
(1 row)
aligputf8=#
aligputf8=# select count(1) from ttt2;
count
-------
10000
(1 row)
aligputf8=# explain select * from ttt1 as a,ttt2 as b where a.id=3;
QUERY PLAN
---------------------------------------------------------------------------------------------
Gather Motion 6:1 (slice2; segments: 6) (cost=0.00..1409.07 rows=1667 width=8)
-> Nested Loop (cost=0.00..1409.07 rows=1667 width=8)
-> Broadcast Motion 1:6 (slice1; segments: 1) (cost=0.00..137.07 rows=6 width=4)
-> Seq Scan on ttt1 a (cost=0.00..137.00 rows=1 width=4)
Filter: id = 3
-> Seq Scan on ttt2 b (cost=0.00..112.00 rows=1667 width=4)
(6 rows)
aligputf8=# explain select * from ttt1 as a,ttt2 as b where a.id=3;
aligputf8=#
aligputf8=# explain select * from ttt1 as a,ttt2 as b where a.id<3000;
QUERY PLAN
------------------------------------------------------------------------------------------------
Gather Motion 6:1 (slice2; segments: 6) (cost=122.00..3599268.93 rows=4998334 width=8)
-> Nested Loop (cost=122.00..3599268.93 rows=4998334 width=8)
-> Broadcast Motion 6:6 (slice1; segments: 6) (cost=0.00..346.93 rows=2999 width=4)
-> Seq Scan on ttt1 a (cost=0.00..137.00 rows=500 width=4)
Filter: id < 3000
-> Materialize (cost=122.00..222.00 rows=1667 width=4)
-> Seq Scan on ttt2 b (cost=0.00..112.00 rows=1667 width=4)
(7 rows)
我的理解就是PG为了加快nested loop循环的速度,把b表的数据缓存在了内存中。我们可以看到前面一个SQL,只对b表数据进行1次扫描,因此并不需要进行缓存;后者需要进行2999次扫描。