树状结构数据的数据库表设计及使用 - 5. 四种模型的性能比较与分析

本文以 MySQL 为例。文档比较长,故分为5部分发出:

📝 邻接表(Adjacency List)模型

🎨 路径枚举(Path Enumeration)模型

🧑‍💻 闭包表(Closure Table)模型

🤹 嵌套集(Nested Set)模型

🧮 性能比较与分析

附:
树状结构数据的数据库表设计及使用 - 1. 邻接表
树状结构数据的数据库表设计及使用 - 2. 路径枚举(Path Enumeration)模型
树状结构数据的数据库表设计及使用 - 3. 闭包表(Closure Table)模型
树状结构数据的数据库表设计及使用 - 4. 嵌套集(Nested Set)模型

性能比较

这里比较一下不同模型类似数据量(大概600多条)下的性能。

优化前

这里除了闭包表的关联表都未加索引。

获取全树:
SET @@profiling = 0;

SET @@profiling_history_size = 0;

SET @@profiling_history_size = 100; 

SET @@profiling = 1;



-- 邻接表

WITH RECURSIVE T1(ID,APPROVALGROUPNAME,FATHERID) AS (

SELECT T0.ID,T0.APPROVALGROUPNAME,T0.FATHERID FROM APPROVAL_GROUP T0 WHERE

    T0.FATHERID = -1 AND SHOWFLAG=1

UNION ALL

SELECT T2.ID,T2.APPROVALGROUPNAME,T2.FATHERID FROM APPROVAL_GROUP T2, T1

  WHERE T2.FATHERID = T1.ID AND SHOWFLAG=1

)

SELECT * FROM T1;



-- 路径枚举表

SELECT nodeid,node FROM AG_PathEnum WHERE path_string LIKE '16060/%';



-- 闭包表

SELECT n3.ID,n3.APPROVALGROUPNAME FROM APPROVAL_GROUP n1

INNER JOIN AG_Closure n2 ON n1.ID = n2.ancestor

INNER JOIN APPROVAL_GROUP n3 ON n2.descendant = n3.ID

WHERE n1.FATHERID = -1 AND n2.distance != 0;



-- 嵌套集

SELECT node.node,node.nodeid

FROM AG_Stack AS node,

        AG_Stack AS parent

WHERE node.lft > parent.lft AND node.rgt < parent.rgt

        AND parent.lft = 1

ORDER BY node.lft;



-- 查看性能

SHOW PROFILES;

+----------+------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

| Query_ID | Duration   | Query                                                                                                                                                                                                                                                                                                        |

+----------+------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

|        1 | 0.01553425 | WITH RECURSIVE T1(ID,APPROVALGROUPNAME,FATHERID) AS ( SELECT T0.ID,T0.APPROVALGROUPNAME,T0.FATHERID FROM APPROVAL_GROUP T0 WHERE T0.FATHERID = -1 AND SHOWFLAG=1 UNION ALL SELECT T2.ID,T2.APPROVALGROUPNAME,T2.FATHERID FROM APPROVAL_GROUP T2, T1 WHERE T2.FATHERID = T1.ID AND SHOWFLAG=1 ) SELECT  |

|        2 | 0.00199475 | SELECT nodeid,node FROM AG_PathEnum WHERE path_string LIKE '16060/%' |

|        3 | 0.01929400 | SELECT n3.ID,n3.APPROVALGROUPNAME FROM APPROVAL_GROUP n1

INNER JOIN AG_Closure n2 ON n1.ID = n2.ancestor

INNER JOIN APPROVAL_GROUP n3 ON n2.descendant = n3.ID

WHERE n1.FATHERID = -1 AND n2.distance != 0 |

|        4 | 0.00121350 | SELECT node.node,node.nodeid FROM AG_Stack AS node, AG_Stack AS parent WHERE node.lft > parent.lft AND node.rgt < parent.rgt AND parent.lft = 1 ORDER BY node.lft                                                                                                                            |

+----------+------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

可以看到嵌套集的查询速度对比邻接表是数量级上的快,600多条数据快了10倍;相较闭包表,嵌套集的速度也是快上6倍左右。而在没有优化的情况下,嵌套集与路径枚举表的速度差不多。

Explain分析

用 explain 语句分析一下四条 SQL,路径枚举表只用了1次简单查询,嵌套集2次,闭包表3次,邻接表4次。而且邻接表包含1个复杂查询、一个临时表的子查询和2个联合查询,其它模型都是简单查询(SIMPLE)。因为邻接表、路径枚举表和嵌套集都没有加索引,所以 type 都是 ALL 即全表扫描。

-- 邻接表

EXPLAIN

WITH RECURSIVE T1(ID,APPROVALGROUPNAME,FATHERID) AS (

SELECT T0.ID,T0.APPROVALGROUPNAME,T0.FATHERID FROM APPROVAL_GROUP T0 WHERE

    T0.FATHERID = -1 AND SHOWFLAG=1

UNION ALL

SELECT T2.ID,T2.APPROVALGROUPNAME,T2.FATHERID FROM APPROVAL_GROUP T2, T1

  WHERE T2.FATHERID = T1.ID AND SHOWFLAG=1

)

SELECT * FROM T1;

+----+-------------+------------+------------+------+---------------+------+---------+------+------+----------+--------------------------------------------+

| id | select_type | table      | partitions | type | possible_keys | key  | key_len | ref  | rows | filtered | Extra                                      |

+----+-------------+------------+------------+------+---------------+------+---------+------+------+----------+--------------------------------------------+

|  1 | PRIMARY     | <derived2> | NULL       | ALL  | NULL          | NULL | NULL    | NULL |  616 |   100.00 | NULL                                       |

|  2 | DERIVED     | T0         | NULL       | ALL  | NULL          | NULL | NULL    | NULL | 2467 |     1.00 | Using where                                |

|  3 | UNION       | T1         | NULL       | ALL  | NULL          | NULL | NULL    | NULL |   24 |   100.00 | Recursive                                  |

|  3 | UNION       | T2         | NULL       | ALL  | NULL          | NULL | NULL    | NULL | 2467 |     1.00 | Using where; Using join buffer (hash join) |

+----+-------------+------------+------------+------+---------------+------+---------+------+------+----------+--------------------------------------------+



-- 路径枚举表

EXPLAIN

SELECT nodeid,node FROM AG_PathEnum WHERE path_string LIKE '16060/%';

+----+-------------+-------------+------------+------+---------------+------+---------+------+------+----------+-------------+

| id | select_type | table       | partitions | type | possible_keys | key  | key_len | ref  | rows | filtered | Extra       |

+----+-------------+-------------+------------+------+---------------+------+---------+------+------+----------+-------------+

|  1 | SIMPLE      | AG_PathEnum | NULL       | ALL  | NULL          | NULL | NULL    | NULL |  683 |    11.11 | Using where |

+----+-------------+-------------+------------+------+---------------+------+---------+------+------+----------+-------------+



-- 闭包表

EXPLAIN

SELECT n3.ID,n3.APPROVALGROUPNAME FROM APPROVAL_GROUP n1

INNER JOIN AG_Closure n2 ON n1.ID = n2.ancestor

INNER JOIN APPROVAL_GROUP n3 ON n2.descendant = n3.ID

WHERE n1.FATHERID = -1 AND n2.distance != 0;

+----+-------------+-------+------------+--------+---------------+---------------+---------+--------------------------+------+----------+------------------------------------+

| id | select_type | table | partitions | type   | possible_keys | key           | key_len | ref                      | rows | filtered | Extra                              |

+----+-------------+-------+------------+--------+---------------+---------------+---------+--------------------------+------+----------+------------------------------------+

|  1 | SIMPLE      | n1    | NULL       | ALL    | PRIMARY       | NULL          | NULL    | NULL                     | 2467 |    10.00 | Using where                        |

|  1 | SIMPLE      | n2    | NULL       | ref    | uniq_anc_desc | uniq_anc_desc | 4       | OrgAndUser.n1.ID         |    5 |    90.00 | Using index condition; Using where |

|  1 | SIMPLE      | n3    | NULL       | eq_ref | PRIMARY       | PRIMARY       | 4       | OrgAndUser.n2.descendant |    1 |   100.00 | Using where                        |

+----+-------------+-------+------------+--------+---------------+---------------+---------+--------------------------+------+----------+------------------------------------+



-- 嵌套集

EXPLAIN

SELECT node.node,node.nodeid

FROM AG_Stack AS node,

        AG_Stack AS parent

WHERE node.lft > parent.lft AND node.rgt < parent.rgt

        AND parent.lft = 1

ORDER BY node.lft;

+----+-------------+--------+------------+------+---------------+------+---------+------+------+----------+----------------------------------------------+

| id | select_type | table  | partitions | type | possible_keys | key  | key_len | ref  | rows | filtered | Extra                                        |

+----+-------------+--------+------------+------+---------------+------+---------+------+------+----------+----------------------------------------------+

|  1 | SIMPLE      | parent | NULL       | ALL  | NULL          | NULL | NULL    | NULL |  683 |    10.00 | Using where; Using temporary; Using filesort |

|  1 | SIMPLE      | node   | NULL       | ALL  | NULL          | NULL | NULL    | NULL |  683 |    11.11 | Using where; Using join buffer (hash join)   |

+----+-------------+--------+------------+------+---------------+------+---------+------+------+----------+----------------------------------------------+


优化后

这里对几个模型分别加上检索字段的索引:

-- 邻接表

CREATE UNIQUE INDEX APPROVAL_GROUP_TEMP_ID_IDX USING BTREE ON APPROVAL_GROUP_TEMP (ID);

CREATE INDEX APPROVAL_GROUP_TEMP_FATHERID_IDX USING BTREE ON APPROVAL_GROUP_TEMP (FATHERID);

CREATE INDEX APPROVAL_GROUP_TEMP_SHOWFLAG_IDX USING BTREE ON APPROVAL_GROUP_TEMP (SHOWFLAG);



-- 路径枚举

CREATE UNIQUE INDEX AG_PathEnum_path_string_IDX USING BTREE ON AG_PathEnum (path_string);



-- 闭包表

CREATE INDEX AG_Closure_ancestor_IDX USING BTREE ON AG_Closure (ancestor);

CREATE INDEX AG_Closure_descendant_IDX USING BTREE ON AG_Closure (descendant);

CREATE INDEX AG_Closure_distance_IDX USING BTREE ON AG_Closure (distance);



-- 嵌套集

CREATE UNIQUE INDEX AG_Stack_lft_IDX USING BTREE ON AG_Stack (lft);

CREATE UNIQUE INDEX AG_Stack_rgt_IDX USING BTREE ON AG_Stack (rgt);

重新执行前面的全树查询,SHOW PROFILES
的结果是:

+----------+------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

| Query_ID | Duration   | Query                                                                                                                                                                                                                                                                                                        |

+----------+------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

|       12 | 0.00990250 | WITH RECURSIVE T1(ID,APPROVALGROUPNAME,FATHERID) AS (

SELECT T0.ID,T0.APPROVALGROUPNAME,T0.FATHERID FROM APPROVAL_GROUP T0 WHERE

    T0.FATHERID = -1 AND SHOWFLAG=1

UNION ALL

SELECT T2.ID,T2.APPROVALGROUPNAME,T2.FATHERID FROM APPROVAL_GROUP T2, T1

  WHERE T2.FATHERID = T1.ID AND SHOWFLAG=1

)

SELECT  |

|       13 | 0.00184200 | SELECT nodeid,node FROM AG_PathEnum WHERE path_string LIKE '16060/%'     |

|       14 | 0.00384525 | SELECT n3.ID,n3.APPROVALGROUPNAME FROM APPROVAL_GROUP n1

INNER JOIN AG_Closure n2 ON n1.ID = n2.ancestor

INNER JOIN APPROVAL_GROUP n3 ON n2.descendant = n3.ID

WHERE n1.FATHERID = -1 AND n2.distance != 0                                                                                                   |

|       15 | 0.00235000 | SELECT node.node,node.nodeid

FROM AG_Stack AS node,

        AG_Stack AS parent

WHERE node.lft > parent.lft AND node.rgt < parent.rgt

        AND parent.lft = 1

ORDER BY node.lft                                                                                                                            |

+----------+------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

看起来路径枚举与嵌套集差距不大,嵌套集反而比未优化前慢了一倍,这个应该与数据库的缓存及构建索引有关,再次运行同样的 SQL 就更快了。

我们再比较一下寻找父节点的性能:
-- 路径枚举

SELECT P2.* FROM AG_PathEnum AS P1, AG_PathEnum AS P2 

WHERE P1.nodeid = 18903 AND POSITION(P2.path_string IN P1.path_string)= 1 ORDER BY P2.nodeid;



-- 嵌套集

SELECT parent.node,parent.nodeid FROM

  AG_Stack AS node,

  AG_Stack AS parent

WHERE node.lft > parent.lft AND node.rgt < parent.rgt

  AND node.nodeid = 18903

ORDER BY parent.lft DESC;



SHOW PROFILES;

+----------+------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

| Query_ID | Duration   | Query                                                                                                                                                                                                                                                                                                        |

+----------+------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

|       19 | 0.00192450 | SELECT P2.* FROM AG_PathEnum AS P1, AG_PathEnum AS P2 

WHERE P1.nodeid = 18903 AND POSITION(P2.path_string IN P1.path_string)= 1 ORDER BY P2.nodeid                                                                                                                                                           |

|       20 | 0.00111425 | SELECT parent.node,parent.nodeid FROM

AG_Stack AS node,

AG_Stack AS parent

WHERE node.lft > parent.lft AND node.rgt < parent.rgt

AND node.nodeid = 18903

ORDER BY parent.lft DESC                                                                                                                            |

+----------+------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+


差距也不大。

总结

树状或层次结构的数据在数据库里的设计,以上的四种模型为久经考验的常用模型。
在这里插入图片描述在这里插入图片描述

  • 是否归一化:邻接表和路径枚举表因为可能出现回环路径,所以不是归一化的模型,需要通过编程实现归一化;

  • 是否无限深度:路径枚举表的路径长度有限,对一些深度需求很高的数据模型就不合适;

  • 查询是否需要递归:邻接表只能通过递归来实现全树搜索,其它模型则不需要

  • 需要几个表:闭包表需要额外的关系表,即通过空间换时间

  • 增删改是否简单:邻接表只需要对一条记录操作即可,路径枚举表和闭包表都需要对关联的节点进行操作,嵌套集也需要重新计算修改节点的后续节点左右值

  • 性能:如上测试,邻接表性能因为递归的原因最差,路径枚举和嵌套集因为简单查询及优化后性能都很好,闭包表查询的表多、次数较多,故性能一般

附:
树状结构数据的数据库表设计及使用 - 1. 邻接表
树状结构数据的数据库表设计及使用 - 2. 路径枚举(Path Enumeration)模型
树状结构数据的数据库表设计及使用 - 3. 闭包表(Closure Table)模型
树状结构数据的数据库表设计及使用 - 4. 嵌套集(Nested Set)模型

转:

树状结构数据的数据库表设计及使用 - 5. 四种模型的性能比较与分析

评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值