昨天一朋友问我一个问题,在将一张表里指定的编号(2-4个),按照名称统计编号的数量,仅有一个编号数据的不显示。
模拟表结构如下:
CREATE
TABLE
TEMP
(
T_ID INT ,
T_NAME NVARCHAR ( 5 )
)
插入一些数据:
(
T_ID INT ,
T_NAME NVARCHAR ( 5 )
)
INSERT
TEMP
SELECT
1
,
'
A
'
UNION ALL SELECT 2 , ' B '
UNION ALL SELECT 3 , ' C '
UNION ALL SELECT 4 , ' D '
UNION ALL SELECT 1 , ' C '
UNION ALL SELECT 4 , ' B '
UNION ALL SELECT 4 , ' C '
UNION ALL SELECT 2 , ' A '
实际应得到的数据为:
UNION ALL SELECT 2 , ' B '
UNION ALL SELECT 3 , ' C '
UNION ALL SELECT 4 , ' D '
UNION ALL SELECT 1 , ' C '
UNION ALL SELECT 4 , ' B '
UNION ALL SELECT 4 , ' C '
UNION ALL SELECT 2 , ' A '
T_ID1 T_ID2 T_ID3 T_ID4 T_NAME
-- --------- ----------- ----------- ----------- ------
1 1 0 0 A
0 1 0 1 B
1 0 1 1 C
应为编号是给出的,那么得想办法得到符合条件的T_NAME,这样的话需要根据 T_ID和T_NAME分组数据然后将有多个编号的数据过滤出来,这里需要将结果集自连一下,变量表示不可以进行JOIN操作,临时表的话我不大想在这里用,所幸SQL SERVER 2005 提供了一个新的语法 CTE(COMMON TABLE EXPRESSION)公共表表达式,不用创建临时表,并且可以进行JOIN操作,遗憾的是必须紧跟着使用,在后面的语句就不能用了.用CTE实现这个功能:
-- --------- ----------- ----------- ----------- ------
1 1 0 0 A
0 1 0 1 B
1 0 1 1 C
WITH
_
TEMP
AS
(
SELECT T_ID,T_NAME FROM TEMP WHERE T_ID IN ( 1 , 2 , 3 , 4 ) GROUP BY T_ID,T_NAME
)
SELECT DISTINCT _ TEMP .T_NAME FROM _ TEMP JOIN _ TEMP TEMP_TEMP
ON _ TEMP .T_NAME = TEMP_TEMP.T_NAME WHERE _ TEMP .T_ID <> TEMP_TEMP.T_ID
查询的数据为:
(
SELECT T_ID,T_NAME FROM TEMP WHERE T_ID IN ( 1 , 2 , 3 , 4 ) GROUP BY T_ID,T_NAME
)
SELECT DISTINCT _ TEMP .T_NAME FROM _ TEMP JOIN _ TEMP TEMP_TEMP
ON _ TEMP .T_NAME = TEMP_TEMP.T_NAME WHERE _ TEMP .T_ID <> TEMP_TEMP.T_ID
T_NAME
-- ----
A
B
C
这样符合条件的T_NAME数据就取出来了,现在定义一个变量,将这个数据组合起来作为条件去取出符合条件的数据:
-- ----
A
B
C
DECLARE
@NAMES
NVARCHAR
(
20
)
SET @NAMES = '' ;
WITH _ TEMP AS
(
SELECT T_ID,T_NAME FROM TEMP WHERE T_ID IN ( 1 , 2 , 3 , 4 ) GROUP BY T_ID,T_NAME
)
SELECT @NAMES = @NAMES + '''' + _ TEMP .T_NAME + ''' , ' FROM _ TEMP JOIN _ TEMP TEMP_TEMP
ON _ TEMP .T_NAME = TEMP_TEMP.T_NAME WHERE _ TEMP .T_ID <> TEMP_TEMP.T_ID
SET @NAMES = LEFT ( @NAMES , LEN ( @NAMES ) - 1 )
EXEC ( ' SELECT T_ID,T_NAME FROM TEMP WHERE T_ID IN (1,2,3,4) AND T_NAME IN ( ' + @NAMES + ' ) ' )
-- 查询出的数据为:
T_ID T_NAME
-- --------- ------
1 A
2 B
3 C
1 C
4 B
4 C
2 A
现在需要做的事情就是按T_NAME统计T_ID并实现行列转换,在SQL 2005之前的版本可能需要些CASE语块,如果列数不确定的话就更麻烦了,现在SQL 2005提供了
PIVOT 运算符来实现行列转换,完整的SQL语句:
SET @NAMES = '' ;
WITH _ TEMP AS
(
SELECT T_ID,T_NAME FROM TEMP WHERE T_ID IN ( 1 , 2 , 3 , 4 ) GROUP BY T_ID,T_NAME
)
SELECT @NAMES = @NAMES + '''' + _ TEMP .T_NAME + ''' , ' FROM _ TEMP JOIN _ TEMP TEMP_TEMP
ON _ TEMP .T_NAME = TEMP_TEMP.T_NAME WHERE _ TEMP .T_ID <> TEMP_TEMP.T_ID
SET @NAMES = LEFT ( @NAMES , LEN ( @NAMES ) - 1 )
EXEC ( ' SELECT T_ID,T_NAME FROM TEMP WHERE T_ID IN (1,2,3,4) AND T_NAME IN ( ' + @NAMES + ' ) ' )
-- 查询出的数据为:
T_ID T_NAME
-- --------- ------
1 A
2 B
3 C
1 C
4 B
4 C
2 A
DECLARE
@NAMES
NVARCHAR
(
20
)
SET @NAMES = '' ;
WITH _ TEMP AS
(
SELECT T_ID,T_NAME FROM TEMP WHERE T_ID IN ( 1 , 2 , 3 , 4 ) GROUP BY T_ID,T_NAME
)
SELECT @NAMES = @NAMES + '''' + _ TEMP .T_NAME + ''' , ' FROM _ TEMP JOIN _ TEMP TEMP_TEMP
ON _ TEMP .T_NAME = TEMP_TEMP.T_NAME WHERE _ TEMP .T_ID <> TEMP_TEMP.T_ID
SET @NAMES = LEFT ( @NAMES , LEN ( @NAMES ) - 1 )
DECLARE @EXEC_SQL VARCHAR ( 1000 )
SET @EXEC_SQL = ( ' SELECT [1][T_ID1],[2][T_ID2],[3][T_ID3],[4][T_ID4],T_NAME FROM
(SELECT T_ID,T_NAME FROM TEMP WHERE T_ID IN (1,2,3,4) AND T_NAME IN ( ' + @NAMES + ' ))G
PIVOT
(
COUNT(T_ID)
FOR T_ID IN ([1],[2],[3],[4])
)P ' )
EXEC ( @EXEC_SQL )
因为符合条件的T_NAME不会很多,所以这里使用 IN 运算符,并用SQL 2005 的一些新特性实现要求。
SET @NAMES = '' ;
WITH _ TEMP AS
(
SELECT T_ID,T_NAME FROM TEMP WHERE T_ID IN ( 1 , 2 , 3 , 4 ) GROUP BY T_ID,T_NAME
)
SELECT @NAMES = @NAMES + '''' + _ TEMP .T_NAME + ''' , ' FROM _ TEMP JOIN _ TEMP TEMP_TEMP
ON _ TEMP .T_NAME = TEMP_TEMP.T_NAME WHERE _ TEMP .T_ID <> TEMP_TEMP.T_ID
SET @NAMES = LEFT ( @NAMES , LEN ( @NAMES ) - 1 )
DECLARE @EXEC_SQL VARCHAR ( 1000 )
SET @EXEC_SQL = ( ' SELECT [1][T_ID1],[2][T_ID2],[3][T_ID3],[4][T_ID4],T_NAME FROM
(SELECT T_ID,T_NAME FROM TEMP WHERE T_ID IN (1,2,3,4) AND T_NAME IN ( ' + @NAMES + ' ))G
PIVOT
(
COUNT(T_ID)
FOR T_ID IN ([1],[2],[3],[4])
)P ' )
EXEC ( @EXEC_SQL )
发送过去后,在真实环境(一百二十多万条数据)执行了下,速度还是很快的。