
Week 3-3: The vector space model


Document similarity

  • Used in IR to determine which document (d1 or d2) is more similar to a given query q (documents and queries are represented as vectors in the same space)
  • The angle between two vectors, or the cosine of that angle, is used as a proxy for the similarity of the underlying documents
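To make "the same space" concrete, here is a minimal sketch that maps documents and a query onto term-count vectors over one shared vocabulary. The helper name `count_vector`, the whitespace tokenization, and the sample texts `d1` and `d2` are assumptions made for illustration; they do not come from the lecture.

```python
from collections import Counter

def count_vector(text, vocabulary):
    """Return the term counts of `text`, in the fixed order given by `vocabulary`."""
    counts = Counter(text.lower().split())  # naive whitespace tokenization (assumption)
    return [counts[term] for term in vocabulary]

# Hypothetical sample documents and query.
docs = {"d1": "cat dog dog", "d2": "mouse mouse cat"}
query = "cat dog mouse mouse"

# A single shared vocabulary places documents and the query in the same space.
all_texts = list(docs.values()) + [query]
vocabulary = sorted({term for text in all_texts for term in text.lower().split()})

doc_vectors = {name: count_vector(text, vocabulary) for name, text in docs.items()}
query_vector = count_vector(query, vocabulary)

print(vocabulary)
print(doc_vectors, query_vector)
```

Every vector has one dimension per vocabulary term, so any document can be compared directly with the query.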

Cosine similarity

$$\sigma(D,Q) = \frac{D \cdot Q}{|D|\,|Q|}$$

A variant: Jaccard coefficient

$$\sigma(D,Q) = \frac{|D \cap Q|}{|D \cup Q|}$$
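The Jaccard coefficient treats each text as a set of terms and divides the size of the intersection by the size of the union. A minimal sketch follows; the function name `jaccard` and the choice to return 0.0 for two empty inputs are assumptions, not part of the lecture.

```python
def jaccard(d_terms, q_terms):
    """Jaccard coefficient of two term collections: |D ∩ Q| / |D ∪ Q|."""
    d_set, q_set = set(d_terms), set(q_terms)
    union = d_set | q_set
    if not union:
        return 0.0  # convention chosen here for two empty inputs (assumption)
    return len(d_set & q_set) / len(union)

d = "cat dog dog".split()
q = "cat dog mouse mouse".split()
print(jaccard(d, q))  # 2 shared terms out of 3 distinct terms
```

Because sets discard repetition, the Jaccard coefficient ignores term frequency, whereas cosine similarity over count vectors does not.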

Example

  • Vector dimensions are term counts in the order (cat, dog, mouse)
  • D = “cat, dog, dog” = <1,2,0>
  • Q = “cat, dog, mouse, mouse” = <1,1,2>

  • similarity

    $$\sigma(D,Q) = \frac{1\times 1 + 2\times 1 + 0\times 2}{\sqrt{1^2+2^2+0^2}\,\sqrt{1^2+1^2+2^2}} = \frac{3}{\sqrt{30}} \approx 0.55$$
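The same arithmetic can be checked in a few lines of Python. This is a sketch using only the standard library; the function name `cosine_similarity` and the decision to return 0.0 for a zero vector are assumptions made here.

```python
import math

def cosine_similarity(d, q):
    """Cosine of the angle between two equal-length count vectors."""
    dot = sum(x * y for x, y in zip(d, q))
    norm_d = math.sqrt(sum(x * x for x in d))
    norm_q = math.sqrt(sum(y * y for y in q))
    if norm_d == 0 or norm_q == 0:
        return 0.0  # convention chosen here for zero vectors (assumption)
    return dot / (norm_d * norm_q)

D = [1, 2, 0]  # "cat, dog, dog" over (cat, dog, mouse)
Q = [1, 1, 2]  # "cat, dog, mouse, mouse"
print(round(cosine_similarity(D, Q), 2))  # 0.55
```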

Distributional similarity

  • Two words that appear in similar contexts are likely to be semantically related

“You shall know a word by the company it keeps.” (J. R. Firth, 1957)
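One way to turn this idea into numbers is to describe each word by the counts of the words that occur near it, then compare those context vectors with cosine similarity. The sketch below assumes a symmetric window of two tokens and a tiny made-up corpus; the names `context_vectors` and `cosine`, the window size, and the sentences are all illustrative assumptions rather than the lecture's definition of context.

```python
import math
from collections import Counter, defaultdict

def context_vectors(sentences, window=2):
    """Map each word to a Counter of the words seen within `window` tokens of it."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, word in enumerate(tokens):
            start = max(0, i - window)
            end = min(len(tokens), i + window + 1)
            for j in range(start, end):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors

def cosine(a, b):
    """Cosine similarity of two sparse count vectors stored as Counters."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

# Hypothetical toy corpus.
corpus = [
    "the cat drinks milk",
    "the dog drinks water",
    "the car needs fuel",
]
vecs = context_vectors(corpus)
print(cosine(vecs["cat"], vecs["dog"]))  # shares "the" and "drinks"
print(cosine(vecs["cat"], vecs["car"]))  # shares only "the"
```

In this toy corpus "cat" and "dog" share more context words than "cat" and "car", so their context vectors score higher.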

The context

[Figure: image description not provided]
