这是一个不使用游标,但使用单个递归查询的变体。
本质上,它将数据视为图形中的边并递归遍历图形的所有边,并在检测到循环时停止。然后,将所有找到的循环放入组中,并为每个组分配一个数字。
请参阅下面有关其工作原理的详细说明。我建议您按CTE逐个运行查询,并检查每个中间结果以了解其作用。
样品1
DECLARE @T TABLE (ID int, Ident1 char(1), Ident2 char(1));
INSERT INTO @T (ID, Ident1, Ident2) VALUES
(1, 'a', 'a'),
(2, 'b', 'b'),
(3, 'c', 'a'),
(4, 'c', 'b'),
(5, 'c', 'c');
样品2
我又添加了一行带有z值的行,以使多行具有未配对的值。
DECLARE @T TABLE (ID int, Ident1 char(1), Ident2 char(1));
INSERT INTO @T (ID, Ident1, Ident2) VALUES
(1, 'a', 'a'),
(1, 'a', 'c'),
(2, 'b', 'f'),
(3, 'a', 'g'),
(4, 'c', 'h'),
(5, 'b', 'j'),
(6, 'd', 'f'),
(7, 'e', 'k'),
(8, 'i', NULL),
(88, 'z', 'z'),
(9, 'l', 'h');
样品3
DECLARE @T TABLE (ID int, Ident1 char(1), Ident2 char(1));
INSERT INTO @T (ID, Ident1, Ident2) VALUES
(1, 'a', 'f'),
(2, 'a', 'g'),
(3, 'a', NULL),
(4, 'b', 'c'),
(5, 'b', 'a'),
(6, 'b', 'h'),
(7, 'b', 'j'),
(8, 'b', NULL),
(9, 'b', NULL),
(10, 'b', 'g'),
(11, 'c', 'k'),
(12, 'c', 'b'),
(13, 'd', 'l'),
(14, 'd', 'f'),
(15, 'd', 'g'),
(16, 'd', 'm'),
(17, 'd', 'a'),
(18, 'd', NULL),
(19, 'd', 'a'),
(20, 'e', 'c'),
(21, 'e', 'b'),
(22, 'e', NULL);
询问
WITH
CTE_Idents
AS
(
SELECT Ident1 AS Ident
FROM @T
UNION
SELECT Ident2 AS Ident
FROM @T
)
,CTE_Pairs
AS
(
SELECT Ident1, Ident2
FROM @T
WHERE Ident1 <> Ident2
UNION
SELECT Ident2 AS Ident1, Ident1 AS Ident2
FROM @T
WHERE Ident1 <> Ident2
)
,CTE_Recursive
AS
(
SELECT
CAST(CTE_Idents.Ident AS varchar(8000)) AS AnchorIdent
, Ident1
, Ident2
, CAST(',' + Ident1 + ',' + Ident2 + ',' AS varchar(8000)) AS IdentPath
, 1 AS Lvl
FROM
CTE_Pairs
INNER JOIN CTE_Idents ON CTE_Idents.Ident = CTE_Pairs.Ident1
UNION ALL
SELECT
CTE_Recursive.AnchorIdent
, CTE_Pairs.Ident1
, CTE_Pairs.Ident2
, CAST(CTE_Recursive.IdentPath + CTE_Pairs.Ident2 + ',' AS varchar(8000)) AS IdentPath
, CTE_Recursive.Lvl + 1 AS Lvl
FROM
CTE_Pairs
INNER JOIN CTE_Recursive ON CTE_Recursive.Ident2 = CTE_Pairs.Ident1
WHERE
CTE_Recursive.IdentPath NOT LIKE CAST('%,' + CTE_Pairs.Ident2 + ',%' AS varchar(8000))
)
,CTE_RecursionResult
AS
(
SELECT AnchorIdent, Ident1, Ident2
FROM CTE_Recursive
)
,CTE_CleanResult
AS
(
SELECT AnchorIdent, Ident1 AS Ident
FROM CTE_RecursionResult
UNION
SELECT AnchorIdent, Ident2 AS Ident
FROM CTE_RecursionResult
)
SELECT
CTE_Idents.Ident
,CASE WHEN CA_Data.XML_Value IS NULL
THEN CTE_Idents.Ident ELSE CA_Data.XML_Value END AS GroupMembers
,DENSE_RANK() OVER(ORDER BY
CASE WHEN CA_Data.XML_Value IS NULL
THEN CTE_Idents.Ident ELSE CA_Data.XML_Value END
) AS GroupID
FROM
CTE_Idents
CROSS APPLY
(
SELECT CTE_CleanResult.Ident+','
FROM CTE_CleanResult
WHERE CTE_CleanResult.AnchorIdent = CTE_Idents.Ident
ORDER BY CTE_CleanResult.Ident FOR XML PATH(''), TYPE
) AS CA_XML(XML_Value)
CROSS APPLY
(
SELECT CA_XML.XML_Value.value('.', 'NVARCHAR(MAX)')
) AS CA_Data(XML_Value)
WHERE
CTE_Idents.Ident IS NOT NULL
ORDER BY Ident;
结果1
+-------+--------------+---------+
| Ident | GroupMembers | GroupID |
+-------+--------------+---------+
| a | a,b,c, | 1 |
| b | a,b,c, | 1 |
| c | a,b,c, | 1 |
+-------+--------------+---------+
结果2
+-------+--------------+---------+
| Ident | GroupMembers | GroupID |
+-------+--------------+---------+
| a | a,c,g,h,l, | 1 |
| b | b,d,f,j, | 2 |
| c | a,c,g,h,l, | 1 |
| d | b,d,f,j, | 2 |
| e | e,k, | 3 |
| f | b,d,f,j, | 2 |
| g | a,c,g,h,l, | 1 |
| h | a,c,g,h,l, | 1 |
| i | i | 4 |
| j | b,d,f,j, | 2 |
| k | e,k, | 3 |
| l | a,c,g,h,l, | 1 |
| z | z | 5 |
+-------+--------------+---------+
结果3
+-------+--------------------------+---------+
| Ident | GroupMembers | GroupID |
+-------+--------------------------+---------+
| a | a,b,c,d,e,f,g,h,j,k,l,m, | 1 |
| b | a,b,c,d,e,f,g,h,j,k,l,m, | 1 |
| c | a,b,c,d,e,f,g,h,j,k,l,m, | 1 |
| d | a,b,c,d,e,f,g,h,j,k,l,m, | 1 |
| e | a,b,c,d,e,f,g,h,j,k,l,m, | 1 |
| f | a,b,c,d,e,f,g,h,j,k,l,m, | 1 |
| g | a,b,c,d,e,f,g,h,j,k,l,m, | 1 |
| h | a,b,c,d,e,f,g,h,j,k,l,m, | 1 |
| j | a,b,c,d,e,f,g,h,j,k,l,m, | 1 |
| k | a,b,c,d,e,f,g,h,j,k,l,m, | 1 |
| l | a,b,c,d,e,f,g,h,j,k,l,m, | 1 |
| m | a,b,c,d,e,f,g,h,j,k,l,m, | 1 |
+-------+--------------------------+---------+
怎么运行的
我将使用第二组示例数据进行解释。
CTE_Idents
CTE_Idents给出同时出现在Ident1和Ident2列中的所有标识符的列表。由于它们可以按任何顺序出现,因此我们将UNION这两列一起列出。UNION也会删除所有重复项。
+-------+
| Ident |
+-------+
| NULL |
| a |
| b |
| c |
| d |
| e |
| f |
| g |
| h |
| i |
| j |
| k |
| l |
| z |
+-------+
CTE_Pairs
CTE_Pairs给出两个方向上图形所有边缘的列表。同样,UNION用于删除任何重复项。
+--------+--------+
| Ident1 | Ident2 |
+--------+--------+
| a | c |
| a | g |
| b | f |
| b | j |
| c | a |
| c | h |
| d | f |
| e | k |
| f | b |
| f | d |
| g | a |
| h | c |
| h | l |
| j | b |
| k | e |
| l | h |
+--------+--------+
CTE_Recursive
CTE_Recursive是查询的主要部分,它从每个唯一的标识符开始递归遍历图。这些起始行由的第一部分产生UNION ALL。的第二部分以UNION ALL递归方式连接到本身,并链接Ident2到Ident1。由于我们预先制作CTE_Pairs了沿两个方向书写的所有边线,因此我们始终只能链接Ident2到,Ident1并在图中获得所有路径。同时,查询将生成IdentPath-到目前为止已遍历的字符串,以逗号分隔。它用于WHERE过滤器中:
CTE_Recursive.IdentPath NOT LIKE CAST('%,' + CTE_Pairs.Ident2 + ',%' AS varchar(8000))
一旦我们遇到之前包含在Path中的Identifier,随着连接节点列表的耗尽,递归就会停止。 AnchorIdent是递归的起始标识符,以后将用于对结果进行分组。 Lvl并没有真正使用过,我将其包括在内是为了更好地了解发生了什么。
+-------------+--------+--------+-------------+-----+
| AnchorIdent | Ident1 | Ident2 | IdentPath | Lvl |
+-------------+--------+--------+-------------+-----+
| a | a | c | ,a,c, | 1 |
| a | a | g | ,a,g, | 1 |
| b | b | f | ,b,f, | 1 |
| b | b | j | ,b,j, | 1 |
| c | c | a | ,c,a, | 1 |
| c | c | h | ,c,h, | 1 |
| d | d | f | ,d,f, | 1 |
| e | e | k | ,e,k, | 1 |
| f | f | b | ,f,b, | 1 |
| f | f | d | ,f,d, | 1 |
| g | g | a | ,g,a, | 1 |
| h | h | c | ,h,c, | 1 |
| h | h | l | ,h,l, | 1 |
| j | j | b | ,j,b, | 1 |
| k | k | e | ,k,e, | 1 |
| l | l | h | ,l,h, | 1 |
| l | h | c | ,l,h,c, | 2 |
| l | c | a | ,l,h,c,a, | 3 |
| l | a | g | ,l,h,c,a,g, | 4 |
| j | b | f | ,j,b,f, | 2 |
| j | f | d | ,j,b,f,d, | 3 |
| h | c | a | ,h,c,a, | 2 |
| h | a | g | ,h,c,a,g, | 3 |
| g | a | c | ,g,a,c, | 2 |
| g | c | h | ,g,a,c,h, | 3 |
| g | h | l | ,g,a,c,h,l, | 4 |
| f | b | j | ,f,b,j, | 2 |
| d | f | b | ,d,f,b, | 2 |
| d | b | j | ,d,f,b,j, | 3 |
| c | h | l | ,c,h,l, | 2 |
| c | a | g | ,c,a,g, | 2 |
| b | f | d | ,b,f,d, | 2 |
| a | c | h | ,a,c,h, | 2 |
| a | h | l | ,a,c,h,l, | 3 |
+-------------+--------+--------+-------------+-----+
CTE_CleanResult
CTE_CleanResult只留下来自的相关部分,CTE_Recursive然后再次合并Ident1和Ident2使用UNION。
+-------------+-------+
| AnchorIdent | Ident |
+-------------+-------+
| a | a |
| a | c |
| a | g |
| a | h |
| a | l |
| b | b |
| b | d |
| b | f |
| b | j |
| c | a |
| c | c |
| c | g |
| c | h |
| c | l |
| d | b |
| d | d |
| d | f |
| d | j |
| e | e |
| e | k |
| f | b |
| f | d |
| f | f |
| f | j |
| g | a |
| g | c |
| g | g |
| g | h |
| g | l |
| h | a |
| h | c |
| h | g |
| h | h |
| h | l |
| j | b |
| j | d |
| j | f |
| j | j |
| k | e |
| k | k |
| l | a |
| l | c |
| l | g |
| l | h |
| l | l |
+-------------+-------+
最终选择
现在,我们需要Ident为每个构建一个用逗号分隔的字符串AnchorIdent。 CROSS APPLY用FOR XML做它。 DENSE_RANK()计算GroupID每个的数字AnchorIdent。