本文以 MySQL 为例。文档比较长,故分为5部分发出:
📝 邻接表(Adjacency List)模型
🎨 路径枚举(Path Enumeration)模型
🧑💻 闭包表(Closure Table)模型
🤹 嵌套集(Nested Set)模型
🧮 性能比较与分析
附:
树状结构数据的数据库表设计及使用 - 1. 邻接表
树状结构数据的数据库表设计及使用 - 2. 路径枚举(Path Enumeration)模型
树状结构数据的数据库表设计及使用 - 3. 闭包表(Closure Table)模型
树状结构数据的数据库表设计及使用 - 4. 嵌套集(Nested Set)模型
性能比较
这里比较一下不同模型类似数据量(大概600多条)下的性能。
优化前
这里除了闭包表的关联表都未加索引。
获取全树:
SET @@profiling = 0;
SET @@profiling_history_size = 0;
SET @@profiling_history_size = 100;
SET @@profiling = 1;
-- 邻接表
WITH RECURSIVE T1(ID,APPROVALGROUPNAME,FATHERID) AS (
SELECT T0.ID,T0.APPROVALGROUPNAME,T0.FATHERID FROM APPROVAL_GROUP T0 WHERE
T0.FATHERID = -1 AND SHOWFLAG=1
UNION ALL
SELECT T2.ID,T2.APPROVALGROUPNAME,T2.FATHERID FROM APPROVAL_GROUP T2, T1
WHERE T2.FATHERID = T1.ID AND SHOWFLAG=1
)
SELECT * FROM T1;
-- 路径枚举表
SELECT nodeid,node FROM AG_PathEnum WHERE path_string LIKE '16060/%';
-- 闭包表
SELECT n3.ID,n3.APPROVALGROUPNAME FROM APPROVAL_GROUP n1
INNER JOIN AG_Closure n2 ON n1.ID = n2.ancestor
INNER JOIN APPROVAL_GROUP n3 ON n2.descendant = n3.ID
WHERE n1.FATHERID = -1 AND n2.distance != 0;
-- 嵌套集
SELECT node.node,node.nodeid
FROM AG_Stack AS node,
AG_Stack AS parent
WHERE node.lft > parent.lft AND node.rgt < parent.rgt
AND parent.lft = 1
ORDER BY node.lft;
-- 查看性能
SHOW PROFILES;
+----------+------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Query_ID | Duration | Query |
+----------+------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| 1 | 0.01553425 | WITH RECURSIVE T1(ID,APPROVALGROUPNAME,FATHERID) AS ( SELECT T0.ID,T0.APPROVALGROUPNAME,T0.FATHERID FROM APPROVAL_GROUP T0 WHERE T0.FATHERID = -1 AND SHOWFLAG=1 UNION ALL SELECT T2.ID,T2.APPROVALGROUPNAME,T2.FATHERID FROM APPROVAL_GROUP T2, T1 WHERE T2.FATHERID = T1.ID AND SHOWFLAG=1 ) SELECT |
| 2 | 0.00199475 | SELECT nodeid,node FROM AG_PathEnum WHERE path_string LIKE '16060/%' |
| 3 | 0.01929400 | SELECT n3.ID,n3.APPROVALGROUPNAME FROM APPROVAL_GROUP n1
INNER JOIN AG_Closure n2 ON n1.ID = n2.ancestor
INNER JOIN APPROVAL_GROUP n3 ON n2.descendant = n3.ID
WHERE n1.FATHERID = -1 AND n2.distance != 0 |
| 4 | 0.00121350 | SELECT node.node,node.nodeid FROM AG_Stack AS node, AG_Stack AS parent WHERE node.lft > parent.lft AND node.rgt < parent.rgt AND parent.lft = 1 ORDER BY node.lft |
+----------+------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
可以看到嵌套集的查询速度对比邻接表是数量级上的快,600多条数据快了10倍;相较闭包表,嵌套集的速度也是快上6倍左右。而在没有优化的情况下,嵌套集与路径枚举表的速度差不多。
Explain分析
用 explain 语句分析一下四条 SQL,路径枚举表只用了1次简单查询,嵌套集2次,闭包表3次,邻接表4次。而且邻接表包含1个复杂查询、一个临时表的子查询和2个联合查询,其它模型都是简单查询(SIMPLE)。因为邻接表、路径枚举表和嵌套集都没有加索引,所以 type 都是 ALL 即全表扫描。
-- 邻接表
EXPLAIN
WITH RECURSIVE T1(ID,APPROVALGROUPNAME,FATHERID) AS (
SELECT T0.ID,T0.APPROVALGROUPNAME,T0.FATHERID FROM APPROVAL_GROUP T0 WHERE
T0.FATHERID = -1 AND SHOWFLAG=1
UNION ALL
SELECT T2.ID,T2.APPROVALGROUPNAME,T2.FATHERID FROM APPROVAL_GROUP T2, T1
WHERE T2.FATHERID = T1.ID AND SHOWFLAG=1
)
SELECT * FROM T1;
+----+-------------+------------+------------+------+---------------+------+---------+------+------+----------+--------------------------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+------------+------------+------+---------------+------+---------+------+------+----------+--------------------------------------------+
| 1 | PRIMARY | <derived2> | NULL | ALL | NULL | NULL | NULL | NULL | 616 | 100.00 | NULL |
| 2 | DERIVED | T0 | NULL | ALL | NULL | NULL | NULL | NULL | 2467 | 1.00 | Using where |
| 3 | UNION | T1 | NULL | ALL | NULL | NULL | NULL | NULL | 24 | 100.00 | Recursive |
| 3 | UNION | T2 | NULL | ALL | NULL | NULL | NULL | NULL | 2467 | 1.00 | Using where; Using join buffer (hash join) |
+----+-------------+------------+------------+------+---------------+------+---------+------+------+----------+--------------------------------------------+
-- 路径枚举表
EXPLAIN
SELECT nodeid,node FROM AG_PathEnum WHERE path_string LIKE '16060/%';
+----+-------------+-------------+------------+------+---------------+------+---------+------+------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------------+------------+------+---------------+------+---------+------+------+----------+-------------+
| 1 | SIMPLE | AG_PathEnum | NULL | ALL | NULL | NULL | NULL | NULL | 683 | 11.11 | Using where |
+----+-------------+-------------+------------+------+---------------+------+---------+------+------+----------+-------------+
-- 闭包表
EXPLAIN
SELECT n3.ID,n3.APPROVALGROUPNAME FROM APPROVAL_GROUP n1
INNER JOIN AG_Closure n2 ON n1.ID = n2.ancestor
INNER JOIN APPROVAL_GROUP n3 ON n2.descendant = n3.ID
WHERE n1.FATHERID = -1 AND n2.distance != 0;
+----+-------------+-------+------------+--------+---------------+---------------+---------+--------------------------+------+----------+------------------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+--------+---------------+---------------+---------+--------------------------+------+----------+------------------------------------+
| 1 | SIMPLE | n1 | NULL | ALL | PRIMARY | NULL | NULL | NULL | 2467 | 10.00 | Using where |
| 1 | SIMPLE | n2 | NULL | ref | uniq_anc_desc | uniq_anc_desc | 4 | OrgAndUser.n1.ID | 5 | 90.00 | Using index condition; Using where |
| 1 | SIMPLE | n3 | NULL | eq_ref | PRIMARY | PRIMARY | 4 | OrgAndUser.n2.descendant | 1 | 100.00 | Using where |
+----+-------------+-------+------------+--------+---------------+---------------+---------+--------------------------+------+----------+------------------------------------+
-- 嵌套集
EXPLAIN
SELECT node.node,node.nodeid
FROM AG_Stack AS node,
AG_Stack AS parent
WHERE node.lft > parent.lft AND node.rgt < parent.rgt
AND parent.lft = 1
ORDER BY node.lft;
+----+-------------+--------+------------+------+---------------+------+---------+------+------+----------+----------------------------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+--------+------------+------+---------------+------+---------+------+------+----------+----------------------------------------------+
| 1 | SIMPLE | parent | NULL | ALL | NULL | NULL | NULL | NULL | 683 | 10.00 | Using where; Using temporary; Using filesort |
| 1 | SIMPLE | node | NULL | ALL | NULL | NULL | NULL | NULL | 683 | 11.11 | Using where; Using join buffer (hash join) |
+----+-------------+--------+------------+------+---------------+------+---------+------+------+----------+----------------------------------------------+
优化后
这里对几个模型分别加上检索字段的索引:
-- 邻接表
CREATE UNIQUE INDEX APPROVAL_GROUP_TEMP_ID_IDX USING BTREE ON APPROVAL_GROUP_TEMP (ID);
CREATE INDEX APPROVAL_GROUP_TEMP_FATHERID_IDX USING BTREE ON APPROVAL_GROUP_TEMP (FATHERID);
CREATE INDEX APPROVAL_GROUP_TEMP_SHOWFLAG_IDX USING BTREE ON APPROVAL_GROUP_TEMP (SHOWFLAG);
-- 路径枚举
CREATE UNIQUE INDEX AG_PathEnum_path_string_IDX USING BTREE ON AG_PathEnum (path_string);
-- 闭包表
CREATE INDEX AG_Closure_ancestor_IDX USING BTREE ON AG_Closure (ancestor);
CREATE INDEX AG_Closure_descendant_IDX USING BTREE ON AG_Closure (descendant);
CREATE INDEX AG_Closure_distance_IDX USING BTREE ON AG_Closure (distance);
-- 嵌套集
CREATE UNIQUE INDEX AG_Stack_lft_IDX USING BTREE ON AG_Stack (lft);
CREATE UNIQUE INDEX AG_Stack_rgt_IDX USING BTREE ON AG_Stack (rgt);
重新执行前面的全树查询,SHOW PROFILES
的结果是:
+----------+------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Query_ID | Duration | Query |
+----------+------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| 12 | 0.00990250 | WITH RECURSIVE T1(ID,APPROVALGROUPNAME,FATHERID) AS (
SELECT T0.ID,T0.APPROVALGROUPNAME,T0.FATHERID FROM APPROVAL_GROUP T0 WHERE
T0.FATHERID = -1 AND SHOWFLAG=1
UNION ALL
SELECT T2.ID,T2.APPROVALGROUPNAME,T2.FATHERID FROM APPROVAL_GROUP T2, T1
WHERE T2.FATHERID = T1.ID AND SHOWFLAG=1
)
SELECT |
| 13 | 0.00184200 | SELECT nodeid,node FROM AG_PathEnum WHERE path_string LIKE '16060/%' |
| 14 | 0.00384525 | SELECT n3.ID,n3.APPROVALGROUPNAME FROM APPROVAL_GROUP n1
INNER JOIN AG_Closure n2 ON n1.ID = n2.ancestor
INNER JOIN APPROVAL_GROUP n3 ON n2.descendant = n3.ID
WHERE n1.FATHERID = -1 AND n2.distance != 0 |
| 15 | 0.00235000 | SELECT node.node,node.nodeid
FROM AG_Stack AS node,
AG_Stack AS parent
WHERE node.lft > parent.lft AND node.rgt < parent.rgt
AND parent.lft = 1
ORDER BY node.lft |
+----------+------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
看起来路径枚举与嵌套集差距不大,嵌套集反而比未优化前慢了一倍,这个应该与数据库的缓存及构建索引有关,再次运行同样的 SQL 就更快了。
我们再比较一下寻找父节点的性能:
-- 路径枚举
SELECT P2.* FROM AG_PathEnum AS P1, AG_PathEnum AS P2
WHERE P1.nodeid = 18903 AND POSITION(P2.path_string IN P1.path_string)= 1 ORDER BY P2.nodeid;
-- 嵌套集
SELECT parent.node,parent.nodeid FROM
AG_Stack AS node,
AG_Stack AS parent
WHERE node.lft > parent.lft AND node.rgt < parent.rgt
AND node.nodeid = 18903
ORDER BY parent.lft DESC;
SHOW PROFILES;
+----------+------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Query_ID | Duration | Query |
+----------+------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| 19 | 0.00192450 | SELECT P2.* FROM AG_PathEnum AS P1, AG_PathEnum AS P2
WHERE P1.nodeid = 18903 AND POSITION(P2.path_string IN P1.path_string)= 1 ORDER BY P2.nodeid |
| 20 | 0.00111425 | SELECT parent.node,parent.nodeid FROM
AG_Stack AS node,
AG_Stack AS parent
WHERE node.lft > parent.lft AND node.rgt < parent.rgt
AND node.nodeid = 18903
ORDER BY parent.lft DESC |
+----------+------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
差距也不大。
总结
树状或层次结构的数据在数据库里的设计,以上的四种模型为久经考验的常用模型。
-
是否归一化:邻接表和路径枚举表因为可能出现回环路径,所以不是归一化的模型,需要通过编程实现归一化;
-
是否无限深度:路径枚举表的路径长度有限,对一些深度需求很高的数据模型就不合适;
-
查询是否需要递归:邻接表只能通过递归来实现全树搜索,其它模型则不需要
-
需要几个表:闭包表需要额外的关系表,即通过空间换时间
-
增删改是否简单:邻接表只需要对一条记录操作即可,路径枚举表和闭包表都需要对关联的节点进行操作,嵌套集也需要重新计算修改节点的后续节点左右值
-
性能:如上测试,邻接表性能因为递归的原因最差,路径枚举和嵌套集因为简单查询及优化后性能都很好,闭包表查询的表多、次数较多,故性能一般
附:
树状结构数据的数据库表设计及使用 - 1. 邻接表
树状结构数据的数据库表设计及使用 - 2. 路径枚举(Path Enumeration)模型
树状结构数据的数据库表设计及使用 - 3. 闭包表(Closure Table)模型
树状结构数据的数据库表设计及使用 - 4. 嵌套集(Nested Set)模型
转: