一、测试环境
名称 | 值 |
---|---|
cpu | 12th Gen Intel® Core™ i7-12700H |
操作系统 | CentOS Linux release 7.9.2009 (Core) |
内存 | 3G |
逻辑核数 | 2 |
Gbase-8a数据库版本 | 9.5.3.27 |
二、生成测试数据
1、随机分布表
表名 | 条数 | 备注 |
---|---|---|
czg | 100000 | |
czg_1 | 100 |
gbase> CREATE TABLE "czg" ("a" int(11), name varchar(100));
Query OK, 0 rows affected, 1 warning (Elapsed: 00:00:00.04)
gbase> DELIMITER //
gbase> CREATE PROCEDURE "GenerateTestData"(num int)
-> begin
-> declare tempval int;
->
-> set tempval = 1;
-> set autocommit = off;
-> label: loop
-> insert into czg values(tempval,'SUN'||tempval);
-> if tempval >= num then
-> leave label;
-> else
-> set tempval = tempval + 1;
-> end if;
-> end loop label;
-> commit;
-> end;//
Query OK, 0 rows affected, 1 warning (Elapsed: 00:00:00.05)
gbase> call "GenerateTestData"(100000);//
Query OK, 0 rows affected (Elapsed: 00:00:02.02)
gbase> select * from czg limit 10;
-> //
+------+-------+
| a | name |
+------+-------+
| 1 | SUN1 |
| 2 | SUN2 |
| 3 | SUN3 |
| 4 | SUN4 |
| 5 | SUN5 |
| 6 | SUN6 |
| 7 | SUN7 |
| 8 | SUN8 |
| 9 | SUN9 |
| 10 | SUN10 |
+------+-------+
10 rows in set (Elapsed: 00:00:00.04)
gbase> create table czg_1 like czg;
-> //
Query OK, 0 rows affected, 1 warning (Elapsed: 00:00:00.09)
gbase> insert into czg_1 select * from czg limit 100;
-> //
Query OK, 100 rows affected (Elapsed: 00:00:00.07)
Records: 100 Duplicates: 0 Warnings: 0
gbase> commit;
-> //
Query OK, 0 rows affected (Elapsed: 00:00:00.04)
gbase> select count(*) from czg.czg_1;//
+----------+
| count(*) |
+----------+
| 100 |
+----------+
1 row in set (Elapsed: 00:00:00.03)
gbase> select * from czg_1 limit 10;//
+------+-------+
| a | name |
+------+-------+
| 1 | SUN1 |
| 2 | SUN2 |
| 3 | SUN3 |
| 4 | SUN4 |
| 5 | SUN5 |
| 6 | SUN6 |
| 7 | SUN7 |
| 8 | SUN8 |
| 9 | SUN9 |
| 10 | SUN10 |
+------+-------+
10 rows in set (Elapsed: 00:00:00.03)
2、HASH分布表
表名 | 条数 | 备注 |
---|---|---|
czg_hash | 100000 | 分布键a |
czg_hash_1 | 100 | 分布键a |
czg_hash_2 | 100000 | 分布键name |
gbase> CREATE TABLE "czg_hash" ("a" int(11), name varchar(100))distributed by('a');
-> //
Query OK, 0 rows affected, 1 warning (Elapsed: 00:00:00.10)
gbase> CREATE TABLE "czg_hash_1" ("a" int(11), name varchar(100))distributed by('a');//
Query OK, 0 rows affected, 1 warning (Elapsed: 00:00:00.05)
gbase> insert into czg_hash select * from czg;
-> //
Query OK, 100000 rows affected (Elapsed: 00:00:00.11)
Records: 100000 Duplicates: 0 Warnings: 0
gbase> insert into czg_hash_1 select * from czg limit 100;//
Query OK, 100 rows affected (Elapsed: 00:00:00.12)
Records: 100 Duplicates: 0 Warnings: 0
gbase> select count(*) from czg_hash;//
+----------+
| count(*) |
+----------+
| 100000 |
+----------+
1 row in set (Elapsed: 00:00:00.02)
gbase> CREATE TABLE "czg_hash_2" ("a" int(11), name varchar(100))distributed by('name');//
Query OK, 0 rows affected, 1 warning (Elapsed: 00:00:00.09)
gbase> insert into czg_hash_2 select * from czg;//
Query OK, 100000 rows affected (Elapsed: 00:00:00.14)
Records: 100000 Duplicates: 0 Warnings: 0
gbase> commit;
-> //
Query OK, 0 rows affected (Elapsed: 00:00:00.05)
gbase> select count(*) from czg_hash_2;//
+----------+
| count(*) |
+----------+
| 100000 |
+----------+
1 row in set (Elapsed: 00:00:00.02)
3、复制表
表名 | 条数 | 备注 |
---|---|---|
czg_copy | 100000 |
gbase> CREATE TABLE "czg_copy" ("a" int(11), name varchar(100)) replicated;//
Query OK, 0 rows affected, 1 warning (Elapsed: 00:00:00.09)
gbase> insert into czg_copy select * from czg;//
Query OK, 100000 rows affected (Elapsed: 00:00:00.17)
Records: 100000 Duplicates: 0 Warnings: 0
gbase> commit;//
Query OK, 0 rows affected (Elapsed: 00:00:00.05)
gbase> select count(*) from czg_copy;//
+----------+
| count(*) |
+----------+
| 100000 |
+----------+
1 row in set (Elapsed: 00:00:00.03)
三、执行计划-操作符含义
gbase> explain select * from czg.czg_hash union select * from czg.czg_hash;
+----+----------+-----------+-------------+-----------+
| ID | MOTION | OPERATION | TABLE | CONDITION |
+----+----------+-----------+-------------+-----------+
| 00 | [RESULT] | Table | czg_hash[a] | |
| | | UNION | | |
| | | Table | czg_hash[a] | |
+----+----------+-----------+-------------+-----------+
3 rows in set (Elapsed: 00:00:00.02)
列名 | 含义 |
---|---|
ID | 执行计划的步骤,从00开始,从显示界面的下方向上执行。 |
MOTION | RESULT:结果发送到客户端,一般为执行计划的最后一步; |
GATHER:结果发送到汇总节点,一般在sort或聚集函数操作前; | |
REDIST(…):结果HASH重分布,括号中为计算HASH的列,如果超长则截断为两个点; | |
NO REDIST:结果直接保存到对应的数据分片,不进行重分布; | |
BROADCAST:结果拉复制表; | |
RAND REDIST:结果随机分布到所有节点; | |
SCALAR N:结果为标量,N为标量子查询的编号,如果条件中有引用,则使用&xNx&方式引用。 | |
OPERATION | SCAN:单表扫描,并使用条件过滤数据; |
Table:单表,没有过滤条件; | |
SubQueryN:子查询,N为自动编号; | |
Step:使用前一个Step的结果; | |
INNER/LEFT/FULL JOIN:连接操作; | |
WHERE:子查询的WHERE条件; | |
GROUP:分组操作; | |
ORDER:排序操作; | |
LIMIT:计算LIMIT,OFFSET; | |
AGG:distinct,聚集操作; | |
UNION/UNION ALL/MINUS/INTERSECT:UNION操作。 | |
TABLE | 某个操作OPERATION涉及的表,只显示别名和属性,超长截断为两个点 |
HASH分布表:中括号中显示HASH列; | |
复制表:显示[REP]; | |
随机分布表:显示[DIS]; | |
子查询:OPERATION列显示SubQueryN,其中N为数字,用来区分不同的子查询; | |
某个步骤的结果集:OPERATION列显示为Step,本列显示为其中N为ID列中的对应值,表示该步骤的结果。 | |
CONDITION | 显示某个操作OPERATION的条件: |
SCAN操作单表过滤条件;(b{H} = ‘2太阳光2’) 表示用HASH索引扫描。(b{S} = ‘2太阳光2’) 表示用全表扫描。 | |
JOIN操作的连接条件; | |
GROUP BY操作涉及列或表达式; | |
ORDER BY操作涉及列或表达式; | |
LIMIT OFFSET内容; |
四、实验
参考老紫竹的家:《GBase 8a执行计划从简单入门到复杂用例讲解》。
1、单表查询
随机分布表会提示:DIS
HASH分布表会提示:[a],[name]
复制表会提示:REP
个人理解:第二个HASH分布表的执行计划:
(1)各节点扫描vcname000001.czg.czg_hash表数据。
(2)汇总到RESULT操作符发给客户端。
gbase> explain select * from czg.czg;
+----+----------+-----------+----------+-----------+
| ID | MOTION | OPERATION | TABLE | CONDITION |
+----+----------+-----------+----------+-----------+
| 00 | [RESULT] | Table | czg[DIS] | |
+----+----------+-----------+----------+-----------+
1 row in set (Elapsed: 00:00:00.03)
gbase> explain select * from czg.czg_hash;
+----+----------+-----------+-------------+-----------+
| ID | MOTION | OPERATION | TABLE | CONDITION |
+----+----------+-----------+-------------+-----------+
| 00 | [RESULT] | Table | czg_hash[a] | |
+----+----------+-----------+-------------+-----------+
1 row in set (Elapsed: 00:00:00.02)
gbase> explain select * from czg.czg_hash_2;
+----+----------+-----------+------------------+-----------+
| ID | MOTION | OPERATION | TABLE | CONDITION |
+----+----------+-----------+------------------+-----------+
| 00 | [RESULT] | Table | czg_hash_2[name] | |
+----+----------+-----------+------------------+-----------+
1 row in set (Elapsed: 00:00:00.02)
gbase> explain select * from czg.czg_copy;
+----+----------+-----------+---------------+-----------+
| ID | MOTION | OPERATION | TABLE | CONDITION |
+----+----------+-----------+---------------+-----------+
| 00 | [RESULT] | Table | czg_copy[REP] | |
+----+----------+-----------+---------------+-----------+
1 row in set (Elapsed: 00:00:00.03)
2、单表加单过滤查询
个人理解:第二个HASH分布表的执行计划:
(1)各节点扫描vcname000001.czg.czg_hash表数据。
(2)各节点进行过滤 (a{S} = 1)。
(3)汇总到RESULT操作符发给客户端。
gbase> explain select * from czg.czg where a = 1;
+----+----------+-----------+----------+------------+
| ID | MOTION | OPERATION | TABLE | CONDITION |
+----+----------+-----------+----------+------------+
| 00 | [RESULT] | SCAN | czg[DIS] | (a{S} = 1) |
+----+----------+-----------+----------+------------+
1 row in set (Elapsed: 00:00:00.03)
gbase> explain select * from czg.czg_hash where a = 1;
+----+----------+-----------+-------------+------------+
| ID | MOTION | OPERATION | TABLE | CONDITION |
+----+----------+-----------+-------------+------------+
| 00 | [RESULT] | SCAN | czg_hash[a] | (a{S} = 1) |
+----+----------+-----------+-------------+------------+
1 row in set (Elapsed: 00:00:00.03)
gbase> explain select * from czg.czg_hash_2 where a = 1;
+----+----------+-----------+------------------+------------+
| ID | MOTION | OPERATION | TABLE | CONDITION |
+----+----------+-----------+------------------+------------+
| 00 | [RESULT] | SCAN | czg_hash_2[name] | (a{S} = 1) |
+----+----------+-----------+------------------+------------+
1 row in set (Elapsed: 00:00:00.02)
gbase> explain select * from czg.czg_copy where a = 1;
+----+----------+-----------+---------------+------------+
| ID | MOTION | OPERATION | TABLE | CONDITION |
+----+----------+-----------+---------------+------------+
| 00 | [RESULT] | SCAN | czg_copy[REP] | (a{S} = 1) |
+----+----------+-----------+---------------+------------+
1 row in set (Elapsed: 00:00:00.02)
可以从CONDITION发现给字段a的数据类型不相符,所以需要做数据转换。
gbase> explain select * from czg.czg_hash where a like '%1%';
+----+----------+-----------+-------------+----------------------------------+
| ID | MOTION | OPERATION | TABLE | CONDITION |
+----+----------+-----------+-------------+----------------------------------+
| 00 | [RESULT] | SCAN | czg_hash[a] | (cast(a as char(11)) LIKE '%1%') |
+----+----------+-----------+-------------+----------------------------------+
1 row in set (Elapsed: 00:00:00.03)
3、单表排序
将各个节点的数据扫描出来汇总到一个控制节点,再进行排序返回最终结果。
gbase> explain select * from czg.czg order by a ;
+----+----------+-----------+----------+----------------+
| ID | MOTION | OPERATION | TABLE | CONDITION |
+----+----------+-----------+----------+----------------+
| 01 | [RESULT] | Step | <00> | |
| | | ORDER | | ORDER BY a ASC |
| 00 | [GATHER] | Table | czg[DIS] | |
+----+----------+-----------+----------+----------------+
3 rows in set (Elapsed: 00:00:00.02)
gbase> explain select * from czg.czg_hash order by a ;
+----+----------+-----------+-------------+----------------+
| ID | MOTION | OPERATION | TABLE | CONDITION |
+----+----------+-----------+-------------+----------------+
| 01 | [RESULT] | Step | <00> | |
| | | ORDER | | ORDER BY a ASC |
| 00 | [GATHER] | Table | czg_hash[a] | |
+----+----------+-----------+-------------+----------------+
3 rows in set (Elapsed: 00:00:00.02)
gbase> explain select * from czg.czg_hash_2 order by a ;
+----+----------+-----------+------------------+----------------+
| ID | MOTION | OPERATION | TABLE | CONDITION |
+----+----------+-----------+------------------+----------------+
| 01 | [RESULT] | Step | <00> | |
| | | ORDER | | ORDER BY a ASC |
| 00 | [GATHER] | Table | czg_hash_2[name] | |
+----+----------+-----------+------------------+----------------+
3 rows in set (Elapsed: 00:00:00.02)
gbase> explain select * from czg.czg_copy order by a ;
+----+----------+-----------+---------------+----------------+
| ID | MOTION | OPERATION | TABLE | CONDITION |
+----+----------+-----------+---------------+----------------+
| 00 | [RESULT] | Table | czg_copy[REP] | |
| | | ORDER | | ORDER BY a ASC |
+----+----------+-----------+---------------+----------------+
2 rows in set (Elapsed: 00:00:00.02)
4、单表分组
随机分布表czg.czg、HASH分布表(HASH-KEY(name))czg.czg_hash_2在执行计划中我们可以看到,它们是在各个节点根据a进行分组,分组完进行数据重分布到临时表,再进行分组,最后返回结果。
HASH分布表(HASH-KEY(a))czg.czg_hash、复制表czg.czg_copy的数据都是在各自本节点所以不需要重分布,效率较优。
gbase> explain select a from czg.czg group by a ;
+----+-------------+-----------+----------+------------+
| ID | MOTION | OPERATION | TABLE | CONDITION |
+----+-------------+-----------+----------+------------+
| 01 | [RESULT] | Step | <00> | |
| | | GROUP | | GROUP BY a |
| 00 | [REDIST(a)] | Table | czg[DIS] | |
| | | GROUP | | GROUP BY a |
+----+-------------+-----------+----------+------------+
4 rows in set (Elapsed: 00:00:00.02)
gbase> explain select a from czg.czg_hash group by a ;
+----+----------+-----------+-------------+------------+
| ID | MOTION | OPERATION | TABLE | CONDITION |
+----+----------+-----------+-------------+------------+
| 00 | [RESULT] | Table | czg_hash[a] | |
| | | GROUP | | GROUP BY a |
+----+----------+-----------+-------------+------------+
2 rows in set (Elapsed: 00:00:00.01)
gbase> explain select a from czg.czg_hash_2 group by a ;
+----+-------------+-----------+------------------+------------+
| ID | MOTION | OPERATION | TABLE | CONDITION |
+----+-------------+-----------+------------------+------------+
| 01 | [RESULT] | Step | <00> | |
| | | GROUP | | GROUP BY a |
| 00 | [REDIST(a)] | Table | czg_hash_2[name] | |
| | | GROUP | | GROUP BY a |
+----+-------------+-----------+------------------+------------+
4 rows in set (Elapsed: 00:00:00.02)
gbase> explain select a from czg.czg_copy group by a ;
+----+----------+-----------+---------------+------------+
| ID | MOTION | OPERATION | TABLE | CONDITION |
+----+----------+-----------+---------------+------------+
| 00 | [RESULT] | Table | czg_copy[REP] | |
| | | GROUP | | GROUP BY a |
+----+----------+-----------+---------------+------------+
2 rows in set (Elapsed: 00:00:00.01)
5、TOPN带排序和有限结果的的聚合
随机分布表czg.czg、HASH分布表(HASH-KEY(name))czg.czg_hash_2在执行计划中我们可以看到,它们是在各个节点根据a进行分组,分组完进行数据重分布到临时表,再进行分组排序limit,之后汇总到一个节点再进行排序limit,最后返回结果。
HASH分布表(HASH-KEY(a))czg.czg_hash、复制表czg.czg_copy的数据都是在各自本节点所以不需要重分布,效率较优。
gbase> explain select a,count(*) as total_num from czg.czg group by a order by total_num desc limit 10;
+----+-------------+-----------+----------+-------------------------------------------------+
| ID | MOTION | OPERATION | TABLE | CONDITION |
+----+-------------+-----------+----------+-------------------------------------------------+
| 02 | [RESULT] | Step | <01> | |
| | | ORDER | | ORDER BY cast(SUM_2ND(COUNT(0)) as signed) DESC |
| | | LIMIT | | LIMIT 10 |
| 01 | [GATHER] | Step | <00> | |
| | | GROUP | | GROUP BY a |
| | | ORDER | | ORDER BY .. DESC |
| | | LIMIT | | LIMIT 10 |
| 00 | [REDIST(a)] | Table | czg[DIS] | |
| | | GROUP | | GROUP BY a |
+----+-------------+-----------+----------+-------------------------------------------------+
9 rows in set (Elapsed: 00:00:00.01)
gbase> explain select a,count(*) as total_num from czg.czg_hash group by a order by total_num desc limit 10;
+----+----------+-----------+-------------+----------------------------------------+
| ID | MOTION | OPERATION | TABLE | CONDITION |
+----+----------+-----------+-------------+----------------------------------------+
| 01 | [RESULT] | Step | <00> | |
| | | ORDER | | ORDER BY cast(COUNT(0) as signed) DESC |
| | | LIMIT | | LIMIT 10 |
| 00 | [GATHER] | Table | czg_hash[a] | |
| | | GROUP | | GROUP BY a |
| | | ORDER | | ORDER BY .. DESC |
| | | LIMIT | | LIMIT 10 |
+----+----------+-----------+-------------+----------------------------------------+
7 rows in set (Elapsed: 00:00:00.01)
gbase> explain select a,count(*) as total_num from czg.czg_hash_2 group by a order by total_num desc limit 10;
+----+-------------+-----------+------------------+-------------------------------------------------+
| ID | MOTION | OPERATION | TABLE | CONDITION |
+----+-------------+-----------+------------------+-------------------------------------------------+
| 02 | [RESULT] | Step | <01> | |
| | | ORDER | | ORDER BY cast(SUM_2ND(COUNT(0)) as signed) DESC |
| | | LIMIT | | LIMIT 10 |
| 01 | [GATHER] | Step | <00> | |
| | | GROUP | | GROUP BY a |
| | | ORDER | | ORDER BY .. DESC |
| | | LIMIT | | LIMIT 10 |
| 00 | [REDIST(a)] | Table | czg_hash_2[name] | |
| | | GROUP | | GROUP BY a |
+----+-------------+-----------+------------------+-------------------------------------------------+
9 rows in set (Elapsed: 00:00:00.02)
gbase> explain select a,count(*) as total_num from czg.czg_copy group by a order by total_num desc limit 10;
+----+----------+-----------+---------------+------------------+
| ID | MOTION | OPERATION | TABLE | CONDITION |
+----+----------+-----------+---------------+------------------+
| 00 | [RESULT] | Table | czg_copy[REP] | |
| | | GROUP | | GROUP BY a |
| | | ORDER | | ORDER BY .. DESC |
| | | LIMIT | | LIMIT 10 |
+----+----------+-----------+---------------+------------------+
4 rows in set (Elapsed: 00:00:00.01)
6、两表UNION
(1)随机分布表和随机分布表
各个节点的这两个表合并去重后汇总到一个节点,再进行去重,最后返回给前台。
AGG是去重的,因为汇总后,不能保证每个节点之间的数据不重复。
gbase> explain select * from czg.czg union select * from czg.czg;
+----+----------+-----------+----------+-----------+
| ID | MOTION | OPERATION | TABLE | CONDITION |
+----+----------+-----------+----------+-----------+
| 01 | [RESULT] | Step | <00> | |
| | | AGG | | |
| 00 | [GATHER] | Table | czg[DIS] | |
| | | UNION | | |
| | | Table | czg[DIS] | |
+----+----------+-----------+----------+-----------+
5 rows in set (Elapsed: 00:00:00.02)
(2)随机分布表和复制表
解释同上。
gbase> explain select * from czg.czg union select * from czg.czg_copy;
+----+----------+-----------+---------------+-----------+
| ID | MOTION | OPERATION | TABLE | CONDITION |
+----+----------+-----------+---------------+-----------+
| 01 | [RESULT] | Step | <00> | |
| | | AGG | | |
| 00 | [GATHER] | Table | czg[DIS] | |
| | | UNION | | |
| | | Table | czg_copy[REP] | |
+----+----------+-----------+---------------+-----------+
5 rows in set (Elapsed: 00:00:00.02)
(3)随机分布表和HASH分布表
解释同上。
gbase> explain select * from czg.czg union select * from czg.czg_hash;
+----+----------+-----------+-------------+-----------+
| ID | MOTION | OPERATION | TABLE | CONDITION |
+----+----------+-----------+-------------+-----------+
| 01 | [RESULT] | Step | <00> | |
| | | AGG | | |
| 00 | [GATHER] | Table | czg[DIS] | |
| | | UNION | | |
| | | Table | czg_hash[a] | |
+----+----------+-----------+-------------+-----------+
5 rows in set (Elapsed: 00:00:00.01)
gbase> explain select * from czg.czg union select * from czg.czg_hash_2;
+----+----------+-----------+------------------+-----------+
| ID | MOTION | OPERATION | TABLE | CONDITION |
+----+----------+-----------+------------------+-----------+
| 01 | [RESULT] | Step | <00> | |
| | | AGG | | |
| 00 | [GATHER] | Table | czg[DIS] | |
| | | UNION | | |
| | | Table | czg_hash_2[name] | |
+----+----------+-----------+------------------+-----------+
5 rows in set (Elapsed: 00:00:00.02)
(4)HASH分布表和HASH分布表
–1、分布键不相同
解释同上。
gbase> explain select * from czg.czg_hash union select * from czg.czg_hash_2;
+----+----------+-----------+------------------+-----------+
| ID | MOTION | OPERATION | TABLE | CONDITION |
+----+----------+-----------+------------------+-----------+
| 01 | [RESULT] | Step | <00> | |
| | | AGG | | |
| 00 | [GATHER] | Table | czg_hash[a] | |
| | | UNION | | |
| | | Table | czg_hash_2[name] | |
+----+----------+-----------+------------------+-----------+
5 rows in set (Elapsed: 00:00:00.02)
–2、分布键相同
因为分布键相同,重复数据在各自本节点就可以完成去重,所以各节点间不存在重复数据,省掉了汇总去重这一步。
gbase> explain select * from czg.czg_hash union select * from czg.czg_hash;
+----+----------+-----------+-------------+-----------+
| ID | MOTION | OPERATION | TABLE | CONDITION |
+----+----------+-----------+-------------+-----------+
| 00 | [RESULT] | Table | czg_hash[a] | |
| | | UNION | | |
| | | Table | czg_hash[a] | |
+----+----------+-----------+-------------+-----------+
3 rows in set (Elapsed: 00:00:00.02)