Week 3-3: The vector space model

Latest recommended article published on 2015-12-22 00:08:18

zypandora

Category: NLP (Michigan)

Article link: https://blog.csdn.net/zypandora/article/details/49890009

Document similarity

Used in IR to determine which document (d1 or d2) is more similar to a given query q (documents and queries are represented in the same vector space).
The angle between two vectors, or the cosine of that angle, is used as a proxy for the similarity of the underlying documents.

$\sigma(D,Q) = \frac{(D, Q)}{\lvert D \rvert \, \lvert Q \rvert}$

$\sigma(D,Q) = \frac{\lvert D \cap Q \rvert}{\lvert D \cup Q \rvert}$
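The two measures above can be sketched in Python as follows. This is a minimal illustration, not code from the course: the function names, the choice to return 0 for empty inputs, and the sample vectors for d1, d2 and q are all assumptions made for the example.

```python
import math


def cosine_similarity(d, q):
    """Cosine of the angle between two equal-length term-count vectors."""
    if len(d) != len(q):
        raise ValueError("vectors must live in the same space")
    dot = sum(x * y for x, y in zip(d, q))
    norm_d = math.sqrt(sum(x * x for x in d))
    norm_q = math.sqrt(sum(y * y for y in q))
    if norm_d == 0 or norm_q == 0:
        return 0.0  # assumption: an all-zero vector is similar to nothing
    return dot / (norm_d * norm_q)


def jaccard_similarity(d_terms, q_terms):
    """Size of the term-set intersection divided by the size of the union."""
    d_set, q_set = set(d_terms), set(q_terms)
    union = d_set | q_set
    if not union:
        return 0.0  # assumption: two empty documents get similarity 0
    return len(d_set & q_set) / len(union)


def rank_documents(query, docs):
    """Return document names ordered from most to least similar to the query."""
    return sorted(
        docs,
        key=lambda name: cosine_similarity(docs[name], query),
        reverse=True,
    )


# Hypothetical count vectors over a three-term vocabulary.
q = [1, 0, 1]
docs = {"d1": [2, 0, 1], "d2": [0, 3, 0]}
print(rank_documents(q, docs))  # ['d1', 'd2']
```

Here d1 shares two terms with the query while d2 shares none, so d1 ranks first. Note that cosine similarity ignores vector length, so a long document is not favored just for repeating terms.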

Example, with each vector counting the terms (cat, dog, mouse) in that order:
D = “cat, dog, dog” = <1,2,0>
Q = “cat, dog, mouse, mouse” = <1,1,2>
Cosine similarity:

$\sigma(D,Q) = \frac{1 \times 1 + 2 \times 1 + 0 \times 2}{\sqrt{1^2 + 2^2 + 0^2} \, \sqrt{1^2 + 1^2 + 2^2}} = \frac{3}{\sqrt{30}} \approx 0.55$
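For comparison, the set-overlap formula can be applied to the same pair. This assumes each document is reduced to its set of distinct terms, so repeated words count once; the notes themselves only work through the cosine case.

$\sigma(D,Q) = \frac{\lvert \{cat, dog\} \cap \{cat, dog, mouse\} \rvert}{\lvert \{cat, dog\} \cup \{cat, dog, mouse\} \rvert} = \frac{2}{3} \approx 0.67$

The set-based score is higher here because it ignores term frequency: the two occurrences of "mouse" in Q pull the cosine down but count only once in the union.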