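This was the first time I came across an application of linear algebra outside of solving systems of linear equations, namely in information retrieval, so I am writing it down here.
Suppose the database contains the following books.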
B1. Applied Linear Algebra
B2. Elementary Linear Algebra
B3. Elementary Linear Algebra with Applications
B4. Linear Algebra and Its Applications
B5. Linear Algebra with Applications
B6. Matrix Algebra with Applications
B7. Matrix Theory
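The keywords that appear in the titles (and that a user would type when searching) are: algebra, application, elementary, linear, matrix, and theory.
Next, record which keywords match which book in a table (a matrix), as shown below: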
Books table
Key Words | B1 | B2 | B3 | B4 | B5 | B6 | B7 |
algebra | 1 | 1 | 1 | 1 | 1 | 1 | 0 |
application | 1 | 0 | 1 | 1 | 1 | 1 | 0 |
elementary | 0 | 1 | 1 | 0 | 0 | 0 | 0 |
linear | 1 | 1 | 1 | 1 | 1 | 0 | 0 |
matrix | 0 | 0 | 0 | 0 | 0 | 1 | 1 |
theory | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
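A 1 means the title contains the corresponding keyword, and a 0 means it does not, i.e. there is no match. Inflected forms count as a match, which is why "Applied" in B1 and "Applications" in B3 through B6 are both marked 1 for application.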
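The table above can be rebuilt from the titles with a few lines of Python. This is only a minimal sketch: the `STEMS` mapping is a hypothetical shortcut introduced here (it is not part of the original notes) so that "Applied" and "Applications" both count as the keyword application, and matching is done by plain substring search on the lowercased title.

```python
# Book titles from the list above, keyed by their labels.
TITLES = {
    "B1": "Applied Linear Algebra",
    "B2": "Elementary Linear Algebra",
    "B3": "Elementary Linear Algebra with Applications",
    "B4": "Linear Algebra and Its Applications",
    "B5": "Linear Algebra with Applications",
    "B6": "Matrix Algebra with Applications",
    "B7": "Matrix Theory",
}

# Assumption: each keyword is matched through a stem, so that
# "Applied" and "Applications" both match "application".
STEMS = {
    "algebra": "algebra",
    "application": "appli",
    "elementary": "elementary",
    "linear": "linear",
    "matrix": "matrix",
    "theory": "theory",
}


def build_matrix(titles, stems):
    """Return one row per keyword and one column per book (1 = match, 0 = no match)."""
    return [
        [1 if stem in title.lower() else 0 for title in titles.values()]
        for stem in stems.values()
    ]


books_matrix = build_matrix(TITLES, STEMS)
for keyword, row in zip(STEMS, books_matrix):
    print(f"{keyword:12s}", row)
```

A real search system would use a proper stemmer instead of hand-written stems; the mapping here is just enough to reproduce the seven-book example.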
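Suppose the user enters the keywords algebra, application, and linear. To make use of matrix operations, store the query as a column vector (a matrix with 6 rows and 1 column, whose rows correspond, in order, to the keywords algebra, application, elementary, linear, matrix, and theory), as shown below: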
Search Keywords |
1 |
1 |
0 |
1 |
0 |
0 |
Next, how do we count the total number of keyword matches for each book?
It is simple: transpose the Books table and multiply it by the Search Keywords vector, as shown below:
Transposed Books table
Book | algebra | application | elementary | linear | matrix | theory |
B1 | 1 | 1 | 0 | 1 | 0 | 0 |
B2 | 1 | 0 | 1 | 1 | 0 | 0 |
B3 | 1 | 1 | 1 | 1 | 0 | 0 |
B4 | 1 | 1 | 0 | 1 | 0 | 0 |
B5 | 1 | 1 | 0 | 1 | 0 | 0 |
B6 | 1 | 1 | 0 | 0 | 1 | 0 |
B7 | 0 | 0 | 0 | 0 | 1 | 1 |
Hits |
3 |
2 |
3 |
3 |
3 |
2 |
0 |
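The Hits column gives the total number of keyword matches for each book (computed with the MMULT function, entered as an array formula). B1, B3, B4, and B5 each score 3, so their titles contain all three search keywords (algebra, application, and linear).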
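The same product can be checked outside a spreadsheet. The sketch below uses plain Python lists with no external libraries; the helper names (`transpose`, `mat_vec`, `query_vector`) are chosen here for illustration and do not come from the original notes.

```python
KEYWORDS = ["algebra", "application", "elementary", "linear", "matrix", "theory"]
BOOKS = ["B1", "B2", "B3", "B4", "B5", "B6", "B7"]

# Books table: one row per keyword, one column per book.
BOOKS_MATRIX = [
    [1, 1, 1, 1, 1, 1, 0],  # algebra
    [1, 0, 1, 1, 1, 1, 0],  # application
    [0, 1, 1, 0, 0, 0, 0],  # elementary
    [1, 1, 1, 1, 1, 0, 0],  # linear
    [0, 0, 0, 0, 0, 1, 1],  # matrix
    [0, 0, 0, 0, 0, 0, 1],  # theory
]


def transpose(matrix):
    """Swap rows and columns, giving one row per book."""
    return [list(column) for column in zip(*matrix)]


def mat_vec(matrix, vector):
    """Multiply a matrix by a column vector (dot product of each row with the vector)."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def query_vector(terms):
    """Build the 6-entry 0/1 vector for the keywords the user typed."""
    return [1 if keyword in terms else 0 for keyword in KEYWORDS]


query = query_vector({"algebra", "application", "linear"})
hits = mat_vec(transpose(BOOKS_MATRIX), query)

# A book matches every search keyword when its hit count equals the number of keywords.
full_matches = [book for book, h in zip(BOOKS, hits) if h == sum(query)]

print(hits)          # [3, 2, 3, 3, 3, 2, 0]
print(full_matches)  # ['B1', 'B3', 'B4', 'B5']
```

Each entry of `hits` is the dot product of one book's row with the query vector, which is exactly the count of query keywords found in that title.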