前言:熟悉Oracle hash join 的用户都知道,如果进行连接的两张表采用的相同的hash 分区算法(分区数量,分区列类型),则在进行hash连接时,优化器会逐个分区进行连接,减少大量IO操作。KingbaseES 优化器也提供同样的优化算法。有两个参数 enable_partitionwise_join , enable_partitionwise_aggregate 用于分区智能连接和聚集,默认这两个参数是 off 的。开启这两个参数,对于采用hash join 或 hash aggregate 的执行方式有性能提升。
我们来看以下例子:
1、准备数据
drop table t1;
create table t1(id1 integer,name1 text) partition by hash(id1);
create table t1_p1 partition of t1 for values with (MODULUS 4, REMAINDER 0);
create table t1_p2 partition of t1 for values with (MODULUS 4, REMAINDER 1);
create table t1_p3 partition of t1 for values with (MODULUS 4, REMAINDER 2);
create table t1_p4 partition of t1 for values with (MODULUS 4, REMAINDER 3);
insert into t1 select generate_series(1,10000000),'abc'||generate_series(1,10000000);
drop table t2;
create table t2(id2 integer,name1 text) partition by hash(id2);
create table t2_p1 partition of t2 for values with (MODULUS 4, REMAINDER 0);
create table t2_p2 partition of t2 for values with (MODULUS 4, REMAINDER 1);
create table t2_p3 partition of t2 for values with (MODULUS 4, REMAINDER 2);
create table t2_p4 partition of t2 for values with (MODULUS 4, REMAINDER 3);
insert into t2 select generate_series(1,3000000),'abc'||generate_series(1,3000000);
2、查看执行计划
不使用智能连接:把所有分区的数据取回,以整个表为单位进行hash 及 hash join。因为hash操作是整表,需要的内存更大。以下例子可以看到 batches 8,表示需要分8次与hash 表进行逐个连接。这会产生大量IO.
test=# show enable_partitionwise_join ;
enable_partitionwise_join
---------------------------
off
(1 row)
test=# explain analyze select count(*) from t1,t2 where id1=id2;
QUERY PLAN
----------------------------------