Recall WordNet
Given two words, we can count the number of links between them in the WordNet forest (tree). The greater the distance, the smaller the similarity.
Path similarity
- Version 1
- Sim(v, w) = -pathlength(v, w)
- Version 2
- Sim(v, w) = -log( pathlength(v, w) )
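
The two versions above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `PARENT` table is a made-up toy IS-A tree (not real WordNet data), `pathlength` counts links as described above, and treating identical words (path length 0) as infinitely similar in Version 2 is a choice made here, not something the notes specify.

```python
import math

# Hypothetical toy IS-A tree (child -> parent); not real WordNet data.
PARENT = {
    "mammal": "animal",
    "ungulate": "mammal",
    "horse": "ungulate",
    "deer": "ungulate",
    "elk": "deer",
    "dog": "mammal",
}


def ancestors(word):
    """Return the chain [word, parent, grandparent, ..., root]."""
    chain = [word]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain


def pathlength(v, w):
    """Number of IS-A links between v and w, or None if they share no tree."""
    depth_w = {node: i for i, node in enumerate(ancestors(w))}
    for i, node in enumerate(ancestors(v)):
        if node in depth_w:
            # First shared ancestor on the way up gives the shortest path.
            return i + depth_w[node]
    return None


def sim_v1(v, w):
    """Version 1: Sim(v, w) = -pathlength(v, w)."""
    d = pathlength(v, w)
    return None if d is None else -d


def sim_v2(v, w):
    """Version 2: Sim(v, w) = -log(pathlength(v, w))."""
    d = pathlength(v, w)
    if d is None:
        return None
    # Assumption: identical words (d == 0) are treated as maximally similar,
    # since log(0) is undefined.
    return math.inf if d == 0 else -math.log(d)
```

With this toy tree, `sim_v1("deer", "horse")` is -2 (two links, via ungulate) while `sim_v1("elk", "dog")` is -4, so deer and horse come out as more similar. A word missing from the table, such as `"cat"`, yields `None`.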
Problems of the approach
- There may be no tree for the specific domain or language
- A specific word (a term) may not be in any tree
- IS-A (hypernym) edges do not all cover the same distance in similarity space
Advanced version
Version 3
Sim(v, w) = -log P(LCS(v, w))
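LCS = lowest common subsumer, i.e. the most specific concept that is an ancestor of both words
P(c) = probability that a word drawn from a corpus is an instance of concept c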
e.g. ungulate for deer and horse
deer for deer and elk
Version 4