本文以 MySQL 为例。文档比较长,故分为5部分发出:
📝 邻接表(Adjacency List)模型
🎨 路径枚举(Path Enumeration)模型
🧑💻 闭包表(Closure Table)模型
🤹 嵌套集(Nested Set)模型
🧮 性能比较与分析
嵌套集模型
嵌套集(Nested Set)模型的算法也叫做预排序遍历树算法 MPTT(Modified Preorder Tree Taversal)。
在嵌套集的表里我们会有 lft
和 rgt 两个字段,分别用来记录遍历整个集合或整棵树时,一个节点的左右边到根节点左边的距离。
CREATE TABLE nested_category (
category_id INT AUTO_INCREMENT PRIMARY KEY,
name VARCHAR(20) NOT NULL,
lft INT NOT NULL,
rgt INT NOT NULL
);
INSERT INTO nested_category VALUES
(1,'ELECTRONICS',1,20),(2,'TELEVISIONS',2,9),
(3,'TUBE',3,4),(4,'LCD',5,6),(5,'PLASMA',7,8),
(6,'PORTABLE ELECTRONICS',10,19),(7,'MP3 PLAYERS',11,14),
(8,'FLASH',12,13),(9,'CD PLAYERS',15,16),
(10,'2 WAY RADIOS',17,18);
SELECT * FROM nested_category ORDER BY category_id;
+-------------+----------------------+-----+-----+
| category_id | name | lft | rgt |
+-------------+----------------------+-----+-----+
| 1 | ELECTRONICS | 1 | 20 |
| 2 | TELEVISIONS | 2 | 9 |
| 3 | TUBE | 3 | 4 |
| 4 | LCD | 5 | 6 |
| 5 | PLASMA | 7 | 8 |
| 6 | PORTABLE ELECTRONICS | 10 | 19 |
| 7 | MP3 PLAYERS | 11 | 14 |
| 8 | FLASH | 12 | 13 |
| 9 | CD PLAYERS | 15 | 16 |
| 10 | 2 WAY RADIOS | 17 | 18 |
+-------------+----------------------+-----+-----+
如果我们用集合的图来看的话,根节点左边初始为1,那么从左向右划过所有子集,碰到集合边缘时计数加1,各个子集(即节点)的左右边的数就一目了然的出来了。
换成用树形结构展示,那么从根节点左边沿着树枝向下走,碰到一个子节点就该子节点左边就加1,若子节点为叶子节点时就转到节点右边向上、向右走,子节点右边也加1,这样遍历整棵树,每个节点左右边的数值也就出来了。
获取嵌套集全树或子树
SELECT node.name
FROM nested_category AS node,
nested_category AS parent
WHERE node.lft BETWEEN parent.lft AND parent.rgt
AND parent.name = 'ELECTRONICS'
ORDER BY node.lft;
+----------------------+
| name |
+----------------------+
| ELECTRONICS |
| TELEVISIONS |
| TUBE |
| LCD |
| PLASMA |
| PORTABLE ELECTRONICS |
| MP3 PLAYERS |
| FLASH |
| CD PLAYERS |
| 2 WAY RADIOS |
+----------------------+
-- 查询子树
SELECT node.name
FROM nested_category AS node,
nested_category AS parent
WHERE node.lft BETWEEN parent.lft AND parent.rgt
AND parent.name = 'PORTABLE ELECTRONICS'
ORDER BY node.lft;
+----------------------+
| name |
+----------------------+
| PORTABLE ELECTRONICS |
| MP3 PLAYERS |
| FLASH |
| CD PLAYERS |
| 2 WAY RADIOS |
+----------------------+
只需要查询左右值在父节点左右值之间的记录即可。
获取嵌套集父节点
SELECT parent.name FROM
nested_category AS node,
nested_category AS parent
WHERE node.lft > parent.lft AND node.rgt < parent.rgt
AND node.name = 'LCD'
ORDER BY parent.lft DESC
LIMIT 1
+-------------+
| name |
+-------------+
| TELEVISIONS |
+-------------+
子节点的左右值一定在父节点的左右值之间。这里只是获取了直接上级节点,如果去掉 LIMIT 1
,那么此节点的上级节点就都可以获取。
获取嵌套集叶节点
SELECT name
FROM nested_category
WHERE rgt = lft + 1;
+--------------+
| name |
+--------------+
| TUBE |
| LCD |
| PLASMA |
| FLASH |
| CD PLAYERS |
| 2 WAY RADIOS |
+--------------+
叶子节点的右值只比左值多步长的值(这里是1)。
获取嵌套集完整单路径
SELECT parent.name
FROM nested_category AS node,
nested_category AS parent
WHERE node.lft BETWEEN parent.lft AND parent.rgt
AND node.name = 'FLASH'
ORDER BY parent.lft;
+----------------------+
| name |
+----------------------+
| ELECTRONICS |
| PORTABLE ELECTRONICS |
| MP3 PLAYERS |
| FLASH |
+----------------------+
可以看到与邻接表相比,它不需要递归。
获取嵌套集节点深度
SELECT node.name, (COUNT(parent.name) - 1) AS depth
FROM nested_category AS node,
nested_category AS parent
WHERE node.lft BETWEEN parent.lft AND parent.rgt
GROUP BY node.name
ORDER BY depth;
+----------------------+-------+
| name | depth |
+----------------------+-------+
| ELECTRONICS | 0 |
| PORTABLE ELECTRONICS | 1 |
| TELEVISIONS | 1 |
| 2 WAY RADIOS | 2 |
| CD PLAYERS | 2 |
| MP3 PLAYERS | 2 |
| PLASMA | 2 |
| LCD | 2 |
| TUBE | 2 |
| FLASH | 3 |
+----------------------+-------+
通过按名称归并后对父节点数量进行计数,我们可以获得节点的深度值。我们还可以通过这种方式来具象化的展示节点关系:
SELECT CONCAT( REPEAT('--', COUNT(parent.name) - 1), node.name) AS name, node.lft
FROM nested_category AS node,
nested_category AS parent
WHERE node.lft BETWEEN parent.lft AND parent.rgt
GROUP BY node.name, node.lft
ORDER BY node.lft;
+------------------------+-----+
| name | lft |
+------------------------+-----+
| ELECTRONICS | 1 |
| --TELEVISIONS | 2 |
| ----TUBE | 3 |
| ----LCD | 5 |
| ----PLASMA | 7 |
| --PORTABLE ELECTRONICS | 10 |
| ----MP3 PLAYERS | 11 |
| ------FLASH | 12 |
| ----CD PLAYERS | 15 |
| ----2 WAY RADIOS | 17 |
+------------------------+-----+
获取子树深度
SELECT node.name, (COUNT(parent.name) - (sub_tree.depth2 + 1)) AS depth
FROM nested_category AS node,
nested_category AS parent,
nested_category AS sub_parent,
(
SELECT node2.name, (COUNT(parent2.name) - 1) AS depth2
FROM nested_category AS node2,
nested_category AS parent2
WHERE node2.lft BETWEEN parent2.lft AND parent2.rgt
AND node2.name = 'PORTABLE ELECTRONICS'
GROUP BY node2.name
ORDER BY depth2
) AS sub_tree
WHERE node.lft BETWEEN parent.lft AND parent.rgt
AND node.lft BETWEEN sub_parent.lft AND sub_parent.rgt
AND sub_parent.name = sub_tree.name
GROUP BY node.name, sub_tree.depth2
ORDER BY depth;
+----------------------+-------+
| name | depth |
+----------------------+-------+
| PORTABLE ELECTRONICS | 0 |
| 2 WAY RADIOS | 1 |
| CD PLAYERS | 1 |
| MP3 PLAYERS | 1 |
| FLASH | 2 |
+----------------------+-------+
这里用了两个自连接实现了获取任意子节点的子树的功能,对根节点也适用。
如果给上面的 SQL 加上 HAVING depth < 2
,我们就可以获得某个节点的所有子节点但不包含更深的孙子节点的结果:
SELECT node.name, (COUNT(parent.name) - (sub_tree.depth2 + 1)) AS depth
FROM nested_category AS node,
nested_category AS parent,
nested_category AS sub_parent,
(
SELECT node2.name, (COUNT(parent2.name) - 1) AS depth2
FROM nested_category AS node2,
nested_category AS parent2
WHERE node2.lft BETWEEN parent2.lft AND parent2.rgt
AND node2.name = 'PORTABLE ELECTRONICS'
GROUP BY node2.name
ORDER BY depth2
) AS sub_tree
WHERE node.lft BETWEEN parent.lft AND parent.rgt
AND node.lft BETWEEN sub_parent.lft AND sub_parent.rgt
AND sub_parent.name = sub_tree.name
GROUP BY node.name, sub_tree.depth2
HAVING depth < 2
ORDER BY depth;
+----------------------+-------+
| name | depth |
+----------------------+-------+
| PORTABLE ELECTRONICS | 0 |
| 2 WAY RADIOS | 1 |
| CD PLAYERS | 1 |
| MP3 PLAYERS | 1 |
+----------------------+-------+
这种功能对只展开第一层而不展开后续层次的情况很有用。
以上的 SQL 在 MySQL 8 中也可以用 CTE 语法来写:
WITH sub_tree AS (SELECT node2.name, (COUNT(parent2.name) - 1) AS depth2
FROM nested_category AS node2,
nested_category AS parent2
WHERE node2.lft BETWEEN parent2.lft AND parent2.rgt
AND node2.name = 'PORTABLE ELECTRONICS'
GROUP BY node2.name
ORDER BY depth2)
SELECT node.name, (COUNT(parent.name) - (sub_tree.depth2 + 1)) AS depth
FROM nested_category AS node,
nested_category AS parent,
nested_category AS sub_parent,
sub_tree
WHERE node.lft BETWEEN parent.lft AND parent.rgt
AND node.lft BETWEEN sub_parent.lft AND sub_parent.rgt
AND sub_parent.name = sub_tree.name
GROUP BY node.name, sub_tree.depth2
HAVING depth < 2
ORDER BY depth;
添加节点
嵌套集的节点添加比邻接表复杂不少,因为需要重新计算受影响的节点左右值。这里我们有一个存储过程,它根据父节点 ID 和新节点属性来添加新节点并重新计算受影响的节点左右值。
CREATE DEFINER = `root`@`localhost` PROCEDURE `AddNestedSetNode`(`parent_id` INT,`node_name` VARCHAR(20))
BEGIN
DECLARE _rgt INT;
DECLARE step INT;
SET step = 1;
SET autocommit=0;
IF EXISTS(SELECT category_id From nested_category WHERE category_id = parent_id)
THEN
START TRANSACTION;
SET _rgt = (SELECT rgt FROM nested_category WHERE category_id = parent_id);
UPDATE nested_category SET rgt = rgt + 2 * step WHERE rgt >= _rgt;
UPDATE nested_category SET lft = lft + 2 * step WHERE lft >= _rgt;
INSERT INTO nested_category(name, lft, rgt) values(node_name, _rgt, _rgt + step);
COMMIT;
END IF;
END;
我们尝试添加一个根节点下的子节点:
CALL OrgAndUser.AddNestedSetNode(1,'GAME CONSOLE');
SELECT * FROM nested_category;
+-------------+----------------------+-----+-----+
| category_id | name | lft | rgt |
+-------------+----------------------+-----+-----+
| 1 | ELECTRONICS | 1 | 22 |
| 2 | TELEVISIONS | 2 | 9 |
| 3 | TUBE | 3 | 4 |
| 4 | LCD | 5 | 6 |
| 5 | PLASMA | 7 | 8 |
| 6 | PORTABLE ELECTRONICS | 10 | 19 |
| 7 | MP3 PLAYERS | 11 | 14 |
| 8 | FLASH | 12 | 13 |
| 9 | CD PLAYERS | 15 | 16 |
| 10 | 2 WAY RADIOS | 17 | 18 |
| 11 | GAME CONSOLE | 20 | 21 |
+-------------+----------------------+-----+-----+
再添加一个 FLASH 的子节点:
CALL OrgAndUser.AddNestedSetNode(8,'ABC FLASH');
SELECT * FROM nested_category;
+-------------+----------------------+-----+-----+
| category_id | name | lft | rgt |
+-------------+----------------------+-----+-----+
| 1 | ELECTRONICS | 1 | 24 |
| 2 | TELEVISIONS | 2 | 9 |
| 3 | TUBE | 3 | 4 |
| 4 | LCD | 5 | 6 |
| 5 | PLASMA | 7 | 8 |
| 6 | PORTABLE ELECTRONICS | 10 | 21 |
| 7 | MP3 PLAYERS | 11 | 16 |
| 8 | FLASH | 12 | 15 |
| 9 | CD PLAYERS | 17 | 18 |
| 10 | 2 WAY RADIOS | 19 | 20 |
| 11 | GAME CONSOLE | 22 | 23 |
| 12 | ABC FLASH | 13 | 14 |
+-------------+----------------------+-----+-----+
SELECT CONCAT( REPEAT('--', COUNT(parent.name) - 1), node.name) AS name, node.lft
FROM nested_category AS node,
nested_category AS parent
WHERE node.lft BETWEEN parent.lft AND parent.rgt
GROUP BY node.name, node.lft
ORDER BY node.lft;
+------------------------+-----+
| name | lft |
+------------------------+-----+
| ELECTRONICS | 1 |
| --TELEVISIONS | 2 |
| ----TUBE | 3 |
| ----LCD | 5 |
| ----PLASMA | 7 |
| --PORTABLE ELECTRONICS | 10 |
| ----MP3 PLAYERS | 11 |
| ------FLASH | 12 |
| --------ABC FLASH | 13 |
| ----CD PLAYERS | 17 |
| ----2 WAY RADIOS | 19 |
| --GAME CONSOLE | 22 |
+------------------------+-----+
删除节点
删除嵌套集的节点与前面其它模型稍有不同,在我们的例子里,假设删除 PORTABLE ELECTRONICS 节点而其它节点不做改动,我们看看会有什么结果:
仅以 lft 和 rgt 数值来说,原来的 PORTABLE ELECTRONICS 节点的子节点会天然的变为 ELECTRONICS 的子节点而不会出现孤立子树:
下面是删除嵌套集叶节点的存储过程,这里重新计算受影响的节点左右值(其实我们知道不重新计算应该也没影响)。
CREATE DEFINER = `root`@`localhost` PROCEDURE `DeleteNestedSetLeaf`(`node_id` INT)
BEGIN
DECLARE _lft INT;
DECLARE _rgt INT;
DECLARE step INT;
DECLARE width INT;
SET step = 1;
SET autocommit=0;
IF EXISTS(SELECT category_id From nested_category WHERE category_id = node_id AND rgt = lft + step)
THEN
START TRANSACTION;
SELECT rgt,lft,(rgt-lft+step) INTO @_rgt,@_lft,@width FROM nested_category WHERE category_id = node_id;
DELETE FROM nested_category WHERE lft BETWEEN @_lft AND @_rgt;
UPDATE nested_category SET rgt = rgt - @width WHERE rgt > @_rgt;
UPDATE nested_category SET lft = lft - @width WHERE lft > @_rgt;
COMMIT;
END IF;
END;
我们删除前面添加的 GAME CONSOLE(ID 为 11):
CALL OrgAndUser.DeleteNestedSetLeaf(11);
SELECT CONCAT( REPEAT('--', COUNT(parent.name) - 1), node.name) AS name, node.lft
FROM nested_category AS node,
nested_category AS parent
WHERE node.lft BETWEEN parent.lft AND parent.rgt
GROUP BY node.name, node.lft
ORDER BY node.lft;
+------------------------+-----+
| name | lft |
+------------------------+-----+
| ELECTRONICS | 1 |
| --TELEVISIONS | 2 |
| ----TUBE | 3 |
| ----LCD | 5 |
| ----PLASMA | 7 |
| --PORTABLE ELECTRONICS | 10 |
| ----MP3 PLAYERS | 11 |
| ------FLASH | 12 |
| --------ABC FLASH | 13 |
| ----CD PLAYERS | 17 |
| ----2 WAY RADIOS | 19 |
+------------------------+-----+
删除子树
CREATE DEFINER = `root`@`localhost` PROCEDURE `DeleteNestedSetSubtree`(`node_id` INT)
BEGIN
DECLARE _lft INT;
DECLARE _rgt INT;
DECLARE step INT;
DECLARE width INT;
SET step = 1;
SET autocommit=0;
IF EXISTS(SELECT category_id From nested_category WHERE category_id = node_id)
THEN
START TRANSACTION;
SELECT rgt,lft,(rgt-lft+step) INTO @_rgt,@_lft,@width FROM nested_category WHERE category_id = node_id;
DELETE FROM nested_category WHERE lft BETWEEN @_lft AND @_rgt;
UPDATE nested_category SET rgt = rgt - @width WHERE rgt > @_rgt;
UPDATE nested_category SET lft = lft - @width WHERE lft > @_rgt;
COMMIT;
END IF;
END;
我们删除 FLASH(ID 为8) 及其子树:
CALL OrgAndUser.DeleteNestedSetSubtree(8);
SELECT CONCAT( REPEAT('--', COUNT(parent.name) - 1), node.name) AS name, node.lft
FROM nested_category AS node,
nested_category AS parent
WHERE node.lft BETWEEN parent.lft AND parent.rgt
GROUP BY node.name, node.lft
ORDER BY node.lft;
+------------------------+-----+
| name | lft |
+------------------------+-----+
| ELECTRONICS | 1 |
| --TELEVISIONS | 2 |
| ----TUBE | 3 |
| ----LCD | 5 |
| ----PLASMA | 7 |
| --PORTABLE ELECTRONICS | 10 |
| ----MP3 PLAYERS | 11 |
| ----CD PLAYERS | 13 |
| ----2 WAY RADIOS | 15 |
+------------------------+-----+
参考:
Managing Hierarchical Data in MySQL
Joe Celko’s Trees and hierarchies in SQL for smarties (ISBN 1-55860-920-2)
将已有邻接表转换为嵌套集表
我们实际的 OA 数据库里有一个汇报关系表,是一个邻接表模型,它大致的字段如下(为了方便我用的临时表 APPROVAL_GROUP_TEMP
):
CREATE TABLE `APPROVAL_GROUP_TEMP` (
`ID` decimal(8,0) NOT NULL,
`FATHERID` decimal(8,0) DEFAULT NULL, -- 上级ID
`APPROVALGROUPNAME` varchar(100) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci DEFAULT NULL, -- 名称
`SHOWFLAG` decimal(1,0) DEFAULT '1', -- 状态(1::启用,0:禁用,2:删除)
PRIMARY KEY (`ID`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_general_ci;
我们构建一个嵌套集表:
CREATE TABLE `AG_Stack` (
`stack_top` int NOT NULL,
`node` varchar(100) NOT NULL, -- 名称
`lft` int DEFAULT NULL,
`rgt` int DEFAULT NULL,
`nodeid` int NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_general_ci;
然后使用下面的存储过程将邻接表的关系转换到嵌套集表里面(只转换启用的 SHOWFLAG=1
):
CREATE DEFINER=`root`@`localhost` PROCEDURE `AdjToNested`()
DETERMINISTIC
BEGIN
DECLARE lft_rgt INTEGER;
DECLARE max_lft_rgt INTEGER;
DECLARE current_top INTEGER;
DECLARE step INTEGER;
SET step = 1;
SET lft_rgt = 2;
SET max_lft_rgt = 2 * (SELECT COUNT(*) FROM APPROVAL_GROUP_TEMP);
SET current_top = 1;
-- Clear Stack
DELETE FROM AG_Stack;
-- Insert 1st record, push 1 to stack
INSERT INTO AG_Stack
SELECT 1, APPROVALGROUPNAME, 1, max_lft_rgt, id
FROM APPROVAL_GROUP_TEMP
WHERE fatherid = -1;
-- Remove the 1st record from Old table
DELETE FROM APPROVAL_GROUP_TEMP WHERE fatherid = -1;
-- If there are still records
WHILE lft_rgt <= max_lft_rgt - 1 AND current_top > 0 DO
IF EXISTS (SELECT *
FROM AG_Stack AS S1, APPROVAL_GROUP_TEMP AS T1
WHERE S1.nodeid = T1.fatherid AND T1.SHOWFLAG = 1
AND S1.stack_top = current_top)
THEN BEGIN
-- Each time process 1 record
INSERT INTO AG_Stack SELECT (current_top + 1), T1.APPROVALGROUPNAME, lft_rgt, NULL, T1.id
FROM AG_Stack AS S1, APPROVAL_GROUP_TEMP AS T1
WHERE S1.nodeid = T1.fatherid AND T1.SHOWFLAG = 1
AND S1.stack_top = current_top LIMIT 1;
DELETE FROM APPROVAL_GROUP_TEMP
WHERE id = (SELECT nodeid
FROM AG_Stack
WHERE stack_top = (current_top + 1) AND lft = lft_rgt);
SET current_top = current_top + 1;
SET lft_rgt = lft_rgt + step;
END;
ELSEIF current_top >= 0 THEN BEGIN
UPDATE AG_Stack
SET rgt = lft_rgt,
stack_top = - stack_top
WHERE stack_top = current_top;
SET lft_rgt = lft_rgt + step;
SET current_top = current_top - 1;
END;
END IF;
END WHILE;
END;
运行后:
CALL AdjToNested();
SELECT * FROM AG_Stack;
+-----------+---------------------------------+-------+-------+--------+
| stack_top | node | lft | rgt | nodeid |
+-----------+---------------------------------+-------+-------+--------+
| -1 | 公司总部 | 1 | 68202 | 16060 |
| -2 | 研发中心 | 2 | 41052 | 16062 |
| -3 | 直属员工 | 52 | 102 | 16881 |
| -3 | 研发管理部 | 152 | 1702 | 19340 |
| -4 | 直属员工 | 202 | 252 | 19341 |
| -4 | 业务流程管理组 | 302 | 452 | 19720 |
| -5 | 直属员工 | 352 | 402 | 19721 |
| -4 | 业务标准管理组 | 502 | 652 | 19722 |
| -5 | 直属员工 | 552 | 602 | 19723 |
| -4 | 业务执行专家组 | 702 | 1652 | 19724 |
| -5 | 策划专家组 | 752 | 902 | 19342 |
| -6 | 直属员工 | 802 | 852 | 19343 |
| -5 | 程序专家组 | 952 | 1102 | 19344 |
| -6 | 直属员工 | 1002 | 1052 | 19345 |
| -5 | 美术专家组 | 1152 | 1302 | 19346 |
| -6 | 直属员工 | 1202 | 1252 | 19347 |
| -5 | 项目管理专家组 | 1352 | 1502 | 19348 |
| -6 | 直属员工 | 1402 | 1452 | 19349 |
| -5 | 直属员工 | 1552 | 1602 | 19725 |
| -3 | 某某工作室 | 1752 | 3002 | 19496 |
| -4 | XXX项目组 | 1802 | 2952 | 19800 |
| -5 | 策划组 | 1852 | 2002 | 20740 |
-- ...以下省略
附:
树状结构数据的数据库表设计及使用 - 1. 邻接表
树状结构数据的数据库表设计及使用 - 2. 路径枚举(Path Enumeration)模型
树状结构数据的数据库表设计及使用 - 3. 闭包表(Closure Table)模型