My understanding is limited, so please point out any mistakes!
tf-idf weighting
tf(term frequency)
A document or zone that mentions a query term more often has more to do with that query and should therefore receive a higher score.
query term
: one of the words in the query (a query is treated as a set of words)
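N
: the total number of terms in the document

$$tf_t = \frac{t}{N}$$

Here the numerator $t$ is the number of times term $t$ occurs in the document.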
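As a rough illustration of this definition (a term's occurrence count in a document divided by N, the total number of terms in that document), here is a minimal Python sketch. The function name `term_frequency` and the sample document are made up for this example, and the document is assumed to be already split into tokens.

```python
from collections import Counter


def term_frequency(tokens):
    """Return {term: count / N} for one tokenized document, where N = len(tokens)."""
    if not tokens:
        # An empty document has no terms, so there is nothing to score.
        return {}
    counts = Counter(tokens)   # occurrences of each term in the document
    total = len(tokens)        # N: total number of terms in the document
    return {term: count / total for term, count in counts.items()}


# Hypothetical example document, already tokenized.
doc = ["the", "cat", "sat", "on", "the", "mat"]
tf = term_frequency(doc)
print(tf["the"])  # 2 occurrences out of 6 tokens
print(tf["cat"])  # 1 occurrence out of 6 tokens
```

Dividing by N keeps long documents from scoring higher simply because they contain more words. Other tf variants exist (raw counts, log-scaled counts); this sketch only follows the normalized form used in this note.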
idf(inverse document frequency)
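N
: the number of documents in the collection

df_t
: the number of documents that contain term $t$

$$idf_t = \log \frac{N}{df_t}$$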
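From the formula we can see:
- The smaller $df_t$ is, the larger $idf_t$ is, which means $t$ does more to distinguish documents from one another.
- Conversely, the larger $df_t$ is, the smaller $idf_t$ is, and the less $t$ helps to distinguish documents.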
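The relationship between document frequency and idf can be checked with a small sketch. It assumes the common definition idf_t = log(N / df_t), where N is the number of documents and df_t is the number of documents containing t. The natural logarithm is an arbitrary choice here (another base only rescales the values), and the function name, the corpus, and the error raised for unseen terms are all assumptions made for this example.

```python
import math


def inverse_document_frequency(term, documents):
    """Return log(N / df_t) for `term` over a list of tokenized documents."""
    n_docs = len(documents)                               # N: number of documents
    df = sum(1 for doc in documents if term in doc)       # df_t: documents containing the term
    if df == 0:
        # Assumption for this sketch: a term that appears nowhere has no defined idf.
        raise ValueError(f"term {term!r} does not occur in any document")
    return math.log(n_docs / df)


# Hypothetical corpus of four tokenized documents.
corpus = [
    ["the", "cat", "sat"],
    ["the", "dog", "barked"],
    ["the", "cat", "purred"],
    ["the", "bird", "sang"],
]

idf_the = inverse_document_frequency("the", corpus)  # df = 4, so log(4 / 4)
idf_cat = inverse_document_frequency("cat", corpus)  # df = 2, so log(4 / 2)
idf_dog = inverse_document_frequency("dog", corpus)  # df = 1, so log(4 / 1)
print(idf_the, idf_cat, idf_dog)
```

In this corpus "the" occurs in every document and gets the lowest idf, while "dog" occurs in only one document and gets the highest. In the usual tf-idf weighting, the tf of a term in a document is multiplied by this idf value to give the term's weight in that document.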