Week 3-3: The vector space model
- Used in IR to determine which document (d1 or d2) is more similar to a given query q (documents and queries are represented as vectors in the same space)
- The angle between two vectors, or the cosine of that angle, is used as a proxy for the similarity of the underlying documents
A variant: Jaccard coefficient
- D = “cat, dog, dog” = <1,2,0>
Q = “cat, dog, mouse, mouse” = <1,1,2>
(each vector holds the term counts in the order cat, dog, mouse)
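The D and Q example can be worked through in a few lines of Python. This is only a minimal sketch: the function names cosine and jaccard are illustrative, and the Jaccard coefficient is assumed here to be computed on the sets of distinct words (intersection size over union size), which the notes do not spell out.

```python
import math

def cosine(u, v):
    """Cosine of the angle between two equal-length term-count vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # convention chosen here for an empty document or query
    return dot / (norm_u * norm_v)

def jaccard(words_a, words_b):
    """Jaccard coefficient on distinct words (assumed set-based definition)."""
    a, b = set(words_a), set(words_b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

# Term counts over the vocabulary (cat, dog, mouse)
D = [1, 2, 0]
Q = [1, 1, 2]

cos_dq = cosine(D, Q)  # 3 / sqrt(5 * 6) = 3 / sqrt(30), about 0.548
jac_dq = jaccard(["cat", "dog", "dog"],
                 ["cat", "dog", "mouse", "mouse"])  # 2 shared / 3 distinct = 2/3

print(round(cos_dq, 3), round(jac_dq, 3))
```

Note that the set-based Jaccard coefficient ignores how often a word occurs, while the cosine uses the counts.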
- Two words that appear in similar contexts are likely to be semantically related
“You shall know a word by the company it keeps.” (J. R. Firth)
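The same idea can be illustrated by describing each word with a vector of the words that occur next to it, then comparing those vectors by cosine. The sketch below is a toy illustration only: the four-sentence corpus, the window size of 1, and the helper names context_vectors and cosine_sparse are all made up for this example.

```python
import math
from collections import Counter, defaultdict

def context_vectors(sentences, window=1):
    """Map each word to counts of the words seen within `window` positions of it."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.split()
        for i, word in enumerate(tokens):
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors

def cosine_sparse(u, v):
    """Cosine similarity between two sparse count vectors stored as Counters."""
    dot = sum(count * v[term] for term, count in u.items())
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

# Hypothetical toy corpus: "cat" and "dog" occur in the same contexts
corpus = [
    "the cat chased the mouse",
    "the dog chased the mouse",
    "the cat sat quietly",
    "the dog sat quietly",
]

vecs = context_vectors(corpus)
sim_cat_dog = cosine_sparse(vecs["cat"], vecs["dog"])
sim_cat_mouse = cosine_sparse(vecs["cat"], vecs["mouse"])
print(sim_cat_dog > sim_cat_mouse)
```

In this corpus "cat" and "dog" have identical neighbours, so their context vectors point in the same direction, while "mouse" shares only the neighbour "the" with them.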