Latent Semantic Indexing (LSI)

 

Because people use a tremendous variety of words to describe the same document, lexical matching methods are necessarily incomplete and imprecise. Using the singular value decomposition (SVD), one can take advantage of the implicit higher-order structure in the association of terms with documents by computing the SVD of large, sparse term-by-document matrices. Terms and documents represented by the 200-300 largest singular vectors are then matched against user queries. We call this retrieval method Latent Semantic Indexing (LSI) because the subspace represents important associative relationships between terms and documents that are not evident in individual documents.

 

 

LSI assumes that there is some underlying or latent structure in word usage that is partially obscured by variability in word choice.
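
The term-by-document matrix that LSI decomposes can be sketched in a few lines. This is only an illustration: the toy corpus, the whitespace tokenizer, and the helper name `term_document_matrix` are assumptions made here, and a real system would typically apply term weighting and use a sparse matrix format.

```python
from collections import Counter

# Hypothetical toy corpus; the documents are illustrative only.
docs = [
    "human machine interface for computer applications",
    "a survey of user opinion of computer system response time",
    "the generation of random binary unordered trees",
    "graph minors a survey",
]

def term_document_matrix(documents):
    """Return (terms, A) where A[i][j] counts occurrences of term i in document j."""
    # Naive whitespace tokenization (an assumption, not part of LSI itself).
    counts = [Counter(doc.split()) for doc in documents]
    terms = sorted(set().union(*counts))
    # Rows are terms, columns are documents; a Counter returns 0 for absent terms.
    matrix = [[c[t] for c in counts] for t in terms]
    return terms, matrix

terms, A = term_document_matrix(docs)
```

Most entries of `A` are zero even for this tiny corpus, which is why the matrices involved are described as large and sparse.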

 

 

SVD (singular value decomposition):

 

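Given an $m \times n$ matrix $A$, where without loss of generality $m \ge n$ and $\mathrm{rank}(A) = r$, the singular value decomposition of $A$, denoted by SVD($A$), is defined as

$$A = U \Sigma V^T$$

where $U^T U = V^T V = I$

and $\Sigma = \mathrm{diag}(\sigma_1, \ldots, \sigma_n)$,

$\sigma_i > 0$ for $1 \le i \le r$,

$\sigma_j = 0$ for $j \ge r+1$.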

 

The first $r$ columns of the orthogonal matrices $U$ and $V$ define the orthonormal eigenvectors associated with the $r$ nonzero eigenvalues of $AA^T$ and $A^TA$, respectively. The columns of $U$ and $V$ are referred to as the left and right singular vectors, respectively, and the singular values of $A$ are defined as the diagonal elements of $\Sigma$, which are the nonnegative square roots of the $n$ largest eigenvalues of $AA^T$.
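
The properties in this definition can be checked numerically. The sketch below uses NumPy's `numpy.linalg.svd` on a small made-up matrix; the matrix values and the `1e-10` rank threshold are assumptions chosen for illustration.

```python
import numpy as np

# Hypothetical 4x3 matrix (m = 4 >= n = 3) of rank 2: column 3 = column 1 + column 2.
A = np.array([
    [1.0, 0.0, 1.0],
    [0.0, 1.0, 1.0],
    [1.0, 1.0, 2.0],
    [2.0, 0.0, 2.0],
])

# Thin SVD: U is m x n, s holds the n singular values in descending order, Vt is V^T.
U, s, Vt = np.linalg.svd(A, full_matrices=False)
n = A.shape[1]
r = int(np.sum(s > 1e-10))  # numerical rank: count of nonzero singular values

reconstruction_ok = np.allclose(A, U @ np.diag(s) @ Vt)   # A = U Sigma V^T
u_orthonormal = np.allclose(U.T @ U, np.eye(n))           # U^T U = I
v_orthonormal = np.allclose(Vt @ Vt.T, np.eye(n))         # V^T V = I

# The singular values are the square roots of the eigenvalues of A^T A.
eigvals = np.sort(np.linalg.eigvalsh(A.T @ A))[::-1]
sqrt_eigs = np.sqrt(np.clip(eigvals, 0.0, None))  # clip tiny negative round-off
matches_eigs = np.allclose(s, sqrt_eigs, atol=1e-6)
```

Here `r` counts the singular values above the threshold, matching the condition that $\sigma_i > 0$ only for $1 \le i \le r$.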

 

 

Interpretation of SVD components within LSI.

 

$A_k$ = best rank-$k$ approximation to $A$

$U$ = term vectors

$\Sigma$ = singular values

$V$ = document vectors

$m$ = number of terms

$n$ = number of documents

$k$ = number of factors

$r$ = rank of $A$
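
As a rough sketch of how the rank-k approximation is formed, the code below keeps only the k largest singular triplets. The helper name `rank_k_approximation` and the 5-term by 4-document matrix are hypothetical; for the large sparse matrices used in practice, a sparse truncated solver would be a more likely choice than a full dense SVD.

```python
import numpy as np

def rank_k_approximation(A, k):
    """Return (U_k, s_k, Vt_k, A_k) built from the k largest singular triplets of A."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    U_k, s_k, Vt_k = U[:, :k], s[:k], Vt[:k, :]
    A_k = U_k @ np.diag(s_k) @ Vt_k
    return U_k, s_k, Vt_k, A_k

# Hypothetical count matrix: rows are terms (m = 5), columns are documents (n = 4).
A = np.array([
    [1.0, 1.0, 0.0, 0.0],
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 1.0, 0.0],
    [0.0, 0.0, 1.0, 1.0],
    [0.0, 0.0, 0.0, 1.0],
])
k = 2  # number of factors
U_k, s_k, Vt_k, A_k = rank_k_approximation(A, k)

# Eckart-Young: the Frobenius error equals the norm of the discarded singular values.
s_all = np.linalg.svd(A, compute_uv=False)
error = np.linalg.norm(A - A_k, "fro")
expected_error = np.sqrt(np.sum(s_all[k:] ** 2))
```

The rows of `U_k` are the k-dimensional term vectors and the columns of `Vt_k` are the k-dimensional document vectors.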

 

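The user query can be represented by

$$\hat{q} = q^T U_k \Sigma_k^{-1}$$

where $q$ is the query's vector of term weights over the $m$ terms, $U_k$ contains the first $k$ columns of $U$, and $\Sigma_k$ is the diagonal matrix of the $k$ largest singular values.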
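
A minimal sketch of this query projection follows. The function names `fold_in_query` and `cosine_scores`, the toy matrix, and the choice of cosine similarity against the rows of `V_k` are assumptions for illustration; implementations differ in how they scale and compare the vectors.

```python
import numpy as np

def fold_in_query(q, U_k, s_k):
    """Compute q_hat = q^T U_k Sigma_k^{-1} for an m-dimensional term vector q."""
    # Dividing by s_k is the same as multiplying by the inverse of diag(s_k).
    return (q @ U_k) / s_k

def cosine_scores(q_hat, V_k):
    """Cosine similarity between q_hat and each document vector (row of V_k)."""
    norms = np.linalg.norm(V_k, axis=1) * np.linalg.norm(q_hat)
    return (V_k @ q_hat) / np.where(norms == 0.0, 1.0, norms)

# Hypothetical count matrix: rows are terms (m = 5), columns are documents (n = 4).
A = np.array([
    [1.0, 1.0, 0.0, 0.0],
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 1.0, 0.0],
    [0.0, 0.0, 1.0, 1.0],
    [0.0, 0.0, 0.0, 1.0],
])
k = 2
U, s, Vt = np.linalg.svd(A, full_matrices=False)
U_k, s_k, V_k = U[:, :k], s[:k], Vt[:k, :].T  # rows of V_k are document vectors

q = np.array([1.0, 1.0, 0.0, 0.0, 0.0])  # query containing terms 0 and 1
q_hat = fold_in_query(q, U_k, s_k)
scores = cosine_scores(q_hat, V_k)
ranking = np.argsort(-scores)  # document indices, best match first
```

In this example `q` happens to equal the first column of `A`, so folding it in reproduces the first row of `V_k`. That follows from $A^T U_k = V_k \Sigma_k$ and is a quick sanity check on the formula.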

