现在很多网站都提供了站内的搜索功能,有的很简单在SQL语句里加一个条件如:where names like ‘%words%’就可以实现最基本的搜索了。
我们来看看功能强大一点,复杂一点的搜索是如何实现的(在SQL SERVER200/2005通过存储过程实现搜索算法)。
我们把用户的搜索可以分为以下两种:
1.精确搜索,就是把用户输入的各个词语当成一个整体,不分割搜索.
2.像百度,GOOGLE一样的,按空格把输入的每一个词分离,只要包含这些词语,而不管出现的顺序,称为ALL-Word Search.
3.对输入的词只要有一个出现就为匹配 称为Any-Word Search
一、对搜索结果进行排序的算法
在前面提到的LIKE语句最大的问题就是搜索的结果是没有经过排序的,我们不知道结果出现在的顺序是如何的,因为它是随机的。像百度,GOOGLE都会对结果用算法进行排序再显示的.好我们也来建立一个简单的排序法。一个很常见的算法是计算关键词在被搜索内容中出现的次数,次数最多的排在结果的第一位。我们的是在存储过程中实现这个算法的,而在SQLSERVER中没有提供计算关键词在被搜索内容中出现的次数这样的函数,我们要自己写一个UDF(User-Defined Functions),UDF是SQLSERVER的内部函数,可以被存储过程调用或者被其他UDF调用。函数如下:
1
CREATE
FUNCTION
dbo.WordCount
2
![None.gif](https://www.cnblogs.com/Images/OutliningIndicators/None.gif)
3
(
@Word
VARCHAR
(
15
),
4
![None.gif](https://www.cnblogs.com/Images/OutliningIndicators/None.gif)
5
@Phrase
VARCHAR
(
1000
))
6
![None.gif](https://www.cnblogs.com/Images/OutliningIndicators/None.gif)
7
RETURNS
SMALLINT
8
![None.gif](https://www.cnblogs.com/Images/OutliningIndicators/None.gif)
9
AS
10
![None.gif](https://www.cnblogs.com/Images/OutliningIndicators/None.gif)
11
BEGIN
12
![None.gif](https://www.cnblogs.com/Images/OutliningIndicators/None.gif)
13
![ExpandedBlockStart.gif](https://www.cnblogs.com/Images/OutliningIndicators/ExpandedBlockStart.gif)
/**/
/* 如果@Word 或者@Phrase 为空返回 0 */
14
![None.gif](https://www.cnblogs.com/Images/OutliningIndicators/None.gif)
15
IF
@Word
IS
NULL
OR
@Phrase
IS
NULL
RETURN
0
16
![None.gif](https://www.cnblogs.com/Images/OutliningIndicators/None.gif)
17
![ExpandedBlockStart.gif](https://www.cnblogs.com/Images/OutliningIndicators/ExpandedBlockStart.gif)
/**/
/* @BiggerWord 比@Word长一个字符 */
18
![None.gif](https://www.cnblogs.com/Images/OutliningIndicators/None.gif)
19
DECLARE
@BiggerWord
VARCHAR
(
21
)
20
![None.gif](https://www.cnblogs.com/Images/OutliningIndicators/None.gif)
21
SELECT
@BiggerWord
=
@Word
+
'
x
'
22
![None.gif](https://www.cnblogs.com/Images/OutliningIndicators/None.gif)
23
![ExpandedBlockStart.gif](https://www.cnblogs.com/Images/OutliningIndicators/ExpandedBlockStart.gif)
/**/
/*在 @Phrase用@BiggerWord替换@Word */
24
![None.gif](https://www.cnblogs.com/Images/OutliningIndicators/None.gif)
25
DECLARE
@BiggerPhrase
VARCHAR
(
2000
)
26
![None.gif](https://www.cnblogs.com/Images/OutliningIndicators/None.gif)
27
SELECT
@BiggerPhrase
=
REPLACE
(
@Phrase
,
@Word
,
@BiggerWord
)
28
![None.gif](https://www.cnblogs.com/Images/OutliningIndicators/None.gif)
29
![ExpandedBlockStart.gif](https://www.cnblogs.com/Images/OutliningIndicators/ExpandedBlockStart.gif)
/**/
/* 相减结果就是出现的次数了 */
30
![None.gif](https://www.cnblogs.com/Images/OutliningIndicators/None.gif)
31
RETURN
LEN
(
@BiggerPhrase
)
-
LEN
(
@Phrase
)
32
![None.gif](https://www.cnblogs.com/Images/OutliningIndicators/None.gif)
33
END
34
以上就是整个UDF,它用了一个很高效的方法来计算关键词出现的次数。
二、参数传递
用户输入的关键词从一个到多个不等,我们可以把参数固定为@word1~@word5,这样比较方面实现。当用户输入超过5个时,忽略不计,少于5个的地方视为空。其实GOOGLE也是这样做的,只是GOOGLE的最大词语限制是10个。
三、搜索的实现过程
假定我们对Product表进行搜索,Product字段有:Id,Name ,Descripton(产品描述),搜索要同时对Name 和 Description进行。
Any-World Search实现如下:
1
SELECT
Product.Name,
2
3
*
WordCount(
@Word1
, Name)
+
WordCount(
@Word1
, Description)
+
3
![None.gif](https://www.cnblogs.com/Images/OutliningIndicators/None.gif)
4
3
*
WordCount(
@Word2
, Name)
+
WordCount(
@Word2
, Description)
+
5
![None.gif](https://www.cnblogs.com/Images/OutliningIndicators/None.gif)
6
![dot.gif](https://www.cnblogs.com/Images/dot.gif)
7
![None.gif](https://www.cnblogs.com/Images/OutliningIndicators/None.gif)
8
AS
Rank
9
FROM
Product
10
![None.gif](https://www.cnblogs.com/Images/OutliningIndicators/None.gif)
11
这里对Name赋予权重为3,Description为1(大家根据实际情况赋予不同的权重),Rank是计算列,通过前面定义的UDF计算所关键词出现的次数乘上权重等到的。
同样的All-Word Search实现如下:
1
SELECT
Product.Name,
2
![None.gif](https://www.cnblogs.com/Images/OutliningIndicators/None.gif)
3
(
3
*
WordCount(
@Word1
, Name)
+
WordCount(
@Word1
, Description))
*
4
![None.gif](https://www.cnblogs.com/Images/OutliningIndicators/None.gif)
5
CASE
6
![None.gif](https://www.cnblogs.com/Images/OutliningIndicators/None.gif)
7
WHEN
@Word2
IS
NULL
THEN
1
8
![None.gif](https://www.cnblogs.com/Images/OutliningIndicators/None.gif)
9
ELSE
3
*
WordCount(
@Word2
, Name)
+
WordCount(
@Word2
, Description)
10
![None.gif](https://www.cnblogs.com/Images/OutliningIndicators/None.gif)
11
END
*
12
![None.gif](https://www.cnblogs.com/Images/OutliningIndicators/None.gif)
13
![dot.gif](https://www.cnblogs.com/Images/dot.gif)
14
![None.gif](https://www.cnblogs.com/Images/OutliningIndicators/None.gif)
15
AS
Rank
16
![None.gif](https://www.cnblogs.com/Images/OutliningIndicators/None.gif)
17
FROM
Product
18
这时把每个关键词出现的次数相乘只要一个没出现
RANK
就为
0
,为
0
就是搜索结果为空。
还可以这样实现:
1
SELECT
Product.Name,
2
CASE
3
WHEN
@Word1
IS
NULL
THEN
0
4
ELSE
ISNULL
(
NULLIF
(dbo.WordCount(
@Word1
, Name
+
'
'
+
Description),
0
),
-
1000
)
5
END
+
6
CASE
7
WHEN
@Word2
IS
NULL
THEN
0
8
ELSE
ISNULL
(
NULLIF
(dbo.WordCount(
@Word2
, Name
+
'
'
+
Description),
0
),
-
1000
)
9
END
+
10
![dot.gif](https://www.cnblogs.com/Images/dot.gif)
11
AS
Rank
12
FROM
Product
对没出现的关键词赋值-1000,这样Rank就肯定为负数,负数表示搜索结果为空。
四、对结果进行分页
搜索的结果可能很多,对结果分页可以提高性能。我在如何在数据层分页以提高性能已经说明了如何用存储过程进行分页了,这里就不在详细复述了。
过程简单来说就是创建一个临时表,表中包含行号,读取时按行号来读取数据
五、完整代码
经过前面的分析,完整代码如下:
1
CREATE
PROCEDURE
SearchCatalog
2
(
3
@PageNumber
TINYINT
,
4
@ProductsPerPage
TINYINT
,
5
@HowManyResults
SMALLINT
OUTPUT,
6
@AllWords
BIT
,
7
@Word1
VARCHAR
(
15
)
=
NULL
,
8
@Word2
VARCHAR
(
15
)
=
NULL
,
9
@Word3
VARCHAR
(
15
)
=
NULL
,
10
@Word4
VARCHAR
(
15
)
=
NULL
,
11
@Word5
VARCHAR
(
15
)
=
NULL
)
12
AS
13
![ExpandedBlockStart.gif](https://www.cnblogs.com/Images/OutliningIndicators/ExpandedBlockStart.gif)
/**/
/* 创建临时表,保存搜索的结果(Sql Server2005适用,Sql Server2000见如何在数据层分页以提高性能) */
14
DECLARE
@Products
TABLE
15
(RowNumber
SMALLINT
IDENTITY
(
1
,
1
)
NOT
NULL
,
16
ID
INT
,
17
Name
VARCHAR
(
50
),
18
Description
VARCHAR
(
1000
),
19
Rank
INT
)
20
![None.gif](https://www.cnblogs.com/Images/OutliningIndicators/None.gif)
21
![ExpandedBlockStart.gif](https://www.cnblogs.com/Images/OutliningIndicators/ExpandedBlockStart.gif)
/**/
/* Any-words search */
22
IF
@AllWords
=
0
23
INSERT
INTO
@Products
24
SELECT
ID, Name, Description,
25
3
*
dbo.WordCount(
@Word1
, Name)
+
dbo.WordCount(
@Word1
, Description)
+
26
![None.gif](https://www.cnblogs.com/Images/OutliningIndicators/None.gif)
27
3
*
dbo.WordCount(
@Word2
, Name)
+
dbo.WordCount(
@Word2
, Description)
+
28
![None.gif](https://www.cnblogs.com/Images/OutliningIndicators/None.gif)
29
3
*
dbo.WordCount(
@Word3
, Name)
+
dbo.WordCount(
@Word3
, Description)
+
30
![None.gif](https://www.cnblogs.com/Images/OutliningIndicators/None.gif)
31
3
*
dbo.WordCount(
@Word4
, Name)
+
dbo.WordCount(
@Word4
, Description)
+
32
![None.gif](https://www.cnblogs.com/Images/OutliningIndicators/None.gif)
33
3
*
dbo.WordCount(
@Word5
, Name)
+
dbo.WordCount(
@Word5
, Description)
34
![None.gif](https://www.cnblogs.com/Images/OutliningIndicators/None.gif)
35
AS
Rank
36
![None.gif](https://www.cnblogs.com/Images/OutliningIndicators/None.gif)
37
FROM
Product
38
ORDER
BY
Rank
DESC
39
![None.gif](https://www.cnblogs.com/Images/OutliningIndicators/None.gif)
40
![ExpandedBlockStart.gif](https://www.cnblogs.com/Images/OutliningIndicators/ExpandedBlockStart.gif)
/**/
/* all-words search */
41
![None.gif](https://www.cnblogs.com/Images/OutliningIndicators/None.gif)
42
IF
@AllWords
=
1
43
![None.gif](https://www.cnblogs.com/Images/OutliningIndicators/None.gif)
44
INSERT
INTO
@Products
45
![None.gif](https://www.cnblogs.com/Images/OutliningIndicators/None.gif)
46
SELECT
ID, Name, Description,
47
![None.gif](https://www.cnblogs.com/Images/OutliningIndicators/None.gif)
48
(
3
*
dbo.WordCount(
@Word1
, Name)
+
dbo.WordCount
49
![None.gif](https://www.cnblogs.com/Images/OutliningIndicators/None.gif)
50
(
@Word1
, Description))
*
51
![None.gif](https://www.cnblogs.com/Images/OutliningIndicators/None.gif)
52
CASE
53
![None.gif](https://www.cnblogs.com/Images/OutliningIndicators/None.gif)
54
WHEN
@Word2
IS
NULL
THEN
1
55
![None.gif](https://www.cnblogs.com/Images/OutliningIndicators/None.gif)
56
ELSE
3
*
dbo.WordCount(
@Word2
, Name)
+
dbo.WordCount(
@Word2
,
57
![None.gif](https://www.cnblogs.com/Images/OutliningIndicators/None.gif)
58
Description)
59
![None.gif](https://www.cnblogs.com/Images/OutliningIndicators/None.gif)
60
END
*
61
![None.gif](https://www.cnblogs.com/Images/OutliningIndicators/None.gif)
62
CASE
63
![None.gif](https://www.cnblogs.com/Images/OutliningIndicators/None.gif)
64
WHEN
@Word3
IS
NULL
THEN
1
65
![None.gif](https://www.cnblogs.com/Images/OutliningIndicators/None.gif)
66
ELSE
3
*
dbo.WordCount(
@Word3
, Name)
+
dbo.WordCount(
@Word3
,
67
![None.gif](https://www.cnblogs.com/Images/OutliningIndicators/None.gif)
68
Description)
69
![None.gif](https://www.cnblogs.com/Images/OutliningIndicators/None.gif)
70
END
*
71
![None.gif](https://www.cnblogs.com/Images/OutliningIndicators/None.gif)
72
CASE
73
![None.gif](https://www.cnblogs.com/Images/OutliningIndicators/None.gif)
74
WHEN
@Word4
IS
NULL
THEN
1
75
![None.gif](https://www.cnblogs.com/Images/OutliningIndicators/None.gif)
76
ELSE
3
*
dbo.WordCount(
@Word4
, Name)
+
dbo.WordCount(
@Word4
,
77
![None.gif](https://www.cnblogs.com/Images/OutliningIndicators/None.gif)
78
Description)
79
![None.gif](https://www.cnblogs.com/Images/OutliningIndicators/None.gif)
80
END
*
81
![None.gif](https://www.cnblogs.com/Images/OutliningIndicators/None.gif)
82
CASE
83
![None.gif](https://www.cnblogs.com/Images/OutliningIndicators/None.gif)
84
WHEN
@Word5
IS
NULL
THEN
1
85
![None.gif](https://www.cnblogs.com/Images/OutliningIndicators/None.gif)
86
ELSE
3
*
dbo.WordCount(
@Word5
, Name)
+
dbo.WordCount(
@Word5
,
87
![None.gif](https://www.cnblogs.com/Images/OutliningIndicators/None.gif)
88
Description)
89
![None.gif](https://www.cnblogs.com/Images/OutliningIndicators/None.gif)
90
END
91
![None.gif](https://www.cnblogs.com/Images/OutliningIndicators/None.gif)
92
AS
Rank
93
![None.gif](https://www.cnblogs.com/Images/OutliningIndicators/None.gif)
94
FROM
Product
95
![None.gif](https://www.cnblogs.com/Images/OutliningIndicators/None.gif)
96
ORDER
BY
Rank
DESC
97
![None.gif](https://www.cnblogs.com/Images/OutliningIndicators/None.gif)
98
![ExpandedBlockStart.gif](https://www.cnblogs.com/Images/OutliningIndicators/ExpandedBlockStart.gif)
/**/
/* 在外部变量保存搜索结果数 */
99
![None.gif](https://www.cnblogs.com/Images/OutliningIndicators/None.gif)
100
SELECT
@HowManyResults
=
COUNT
(
*
)
101
![None.gif](https://www.cnblogs.com/Images/OutliningIndicators/None.gif)
102
FROM
@Products
103
![None.gif](https://www.cnblogs.com/Images/OutliningIndicators/None.gif)
104
WHERE
Rank
>
0
105
![None.gif](https://www.cnblogs.com/Images/OutliningIndicators/None.gif)
106
![ExpandedBlockStart.gif](https://www.cnblogs.com/Images/OutliningIndicators/ExpandedBlockStart.gif)
/**/
/* 按页返回结果*/
107
![None.gif](https://www.cnblogs.com/Images/OutliningIndicators/None.gif)
108
SELECT
ProductID, Name, Description, Price, Image1FileName,
109
![None.gif](https://www.cnblogs.com/Images/OutliningIndicators/None.gif)
110
Image2FileName, Rank
111
![None.gif](https://www.cnblogs.com/Images/OutliningIndicators/None.gif)
112
FROM
@Products
113
![None.gif](https://www.cnblogs.com/Images/OutliningIndicators/None.gif)
114
WHERE
Rank
>
0
115
![None.gif](https://www.cnblogs.com/Images/OutliningIndicators/None.gif)
116
AND
RowNumber
BETWEEN
(
@PageNumber
-
1
)
*
@ProductsPerPage
+
1
117
![None.gif](https://www.cnblogs.com/Images/OutliningIndicators/None.gif)
118
AND
@PageNumber
*
@ProductsPerPage
119
ORDER
BY
Rank
DESC
至此一个简单的搜索算法就实现了。