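01 TF&IDF Concepts
TF
Term frequency: how many times each term of the search text appears in the field text. The more often it appears, the more relevant the doc.
This is the number of times the term occurs in a single doc; the more occurrences, the higher the score.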
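IDF
Inverse document frequency: how many times each term of the search text appears across all docs in the index. The more often it appears, the less relevant it is.
It reflects how common the term is across all docs (in the explain output, docFreq is the number of docs containing it); the more common the term, the lower the score.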
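As a small worked check, the idf formula printed in the explain output later in this section, log(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5)), can be evaluated by hand. The sketch below is only an illustration: the function name bm25_idf is made up for this note, and the inputs docFreq = 1 and docCount = 1 are the values reported in that output.

```python
import math


def bm25_idf(doc_freq: int, doc_count: int) -> float:
    """Evaluate the idf formula shown in the explain output.

    doc_freq  -- number of docs containing the term (docFreq)
    doc_count -- total number of docs considered (docCount)
    """
    return math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))


# Values taken from the explain output: docFreq = 1, docCount = 1
print(round(bm25_idf(1, 1), 7))  # 0.2876821

# A term found in more docs gets a lower idf than a rare one
print(bm25_idf(50, 100) < bm25_idf(1, 100))  # True
```

The first result matches the 0.2876821 reported for idf in the response, and the second line shows the "more common, lower score" behaviour described above.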
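length Norm
The length of the field the term is searched in. The longer the field, the lower the relevance and the score; the shorter the field, the higher the score.
Finally, TF, IDF, and the length norm are combined into the final score of the term for the doc.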
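To make the combination concrete, here is a minimal sketch that multiplies the three factors together. It is an assumption-laden toy, not what Elasticsearch runs: the formulas follow the shape of the classic TF-IDF similarity (square root of the frequency, a log-based idf, and one over the square root of the field length), the function names are invented for this note, and the explain output in this section uses a BM25-style formula, so the numbers will not match it.

```python
import math


def tf(freq: int) -> float:
    # More occurrences in the field -> higher factor (dampened by sqrt)
    return math.sqrt(freq)


def idf(doc_freq: int, doc_count: int) -> float:
    # Term found in more docs -> lower factor
    return 1.0 + math.log(doc_count / (doc_freq + 1))


def length_norm(field_length: int) -> float:
    # Longer field (more terms) -> lower factor
    return 1.0 / math.sqrt(field_length)


def score(freq: int, doc_freq: int, doc_count: int, field_length: int) -> float:
    # Simplified combination: product of the three factors
    return tf(freq) * idf(doc_freq, doc_count) * length_norm(field_length)


base = score(freq=1, doc_freq=5, doc_count=100, field_length=10)
print(score(2, 5, 100, 10) > base)   # higher term frequency -> higher score
print(score(1, 50, 100, 10) < base)  # more common term -> lower score
print(score(1, 5, 100, 40) < base)   # longer field -> lower score
```

Each comparison changes one input at a time, so the three printed lines correspond to the TF, IDF, and length norm rules above.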
How the score is computed
GET /website/article/1/_explain
{
  "query": {
    "match": {
      "title": "title"
    }
  }
}
{
  "_index" : "website",
  "_type" : "article",
  "_id" : "1",
  "matched" : true,
  "explanation" : {
    "value" : 0.2876821,
    "description" : "weight(title:title in 0) [PerFieldSimilarity], result of:",
    "details" : [
      {
        "value" : 0.2876821,
        "description" : "score(doc=0,freq=1.0 = termFreq=1.0), product of:",
        "details" : [
          {
            "value" : 0.2876821,
            "description" : "idf, computed as log(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5)) from:",
            "details" : [
              {
                "value" : 1.0,
                "description" : "docFreq",
                "details" : [ ]
              },
              {
                "value" : 1.0,
                "description" : "docCount",
                "details" : [ ]
              }
            ]
          },
          {
            "value" : 1.0,
            "description" : "tfNorm, computed as (freq * (k1 +