# Algorithms used in Elasticsearch scoring

## 1. Algorithm overview

A relevance score algorithm, put simply, computes how closely a piece of indexed text matches the search text.

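Elasticsearch scores relevance with the term frequency / inverse document frequency algorithm, TF/IDF for short. Since version 5.0 the default similarity is BM25, a refinement of classic TF/IDF with tunable term-frequency saturation (k1) and length normalization (b).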
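The three factors described next (term frequency, inverse document frequency, field-length norm) all appear in the BM25 formula that the explain output later in this article prints piece by piece. Combined into one expression, with $N$ the number of documents that have the field, $n_t$ the number of those containing term $t$, $f_{t,d}$ the occurrences of $t$ in the field of document $d$, $dl$ that field's length, $avgdl$ the average field length, and $k_1 = 1.2$, $b = 0.75$ the defaults shown in that output, the score of a query $q$ is roughly:

$$
\mathrm{score}(q, d) = \sum_{t \in q} \mathrm{boost} \cdot \ln\!\left(1 + \frac{N - n_t + 0.5}{n_t + 0.5}\right) \cdot \frac{f_{t,d}}{f_{t,d} + k_1 \left(1 - b + b \cdot \frac{dl}{avgdl}\right)}
$$

The middle factor is the IDF part, and the last factor combines term frequency with field-length normalization through $dl / avgdl$.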

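Term frequency: how many times each term of the search text appears in the field text. The more occurrences, the more relevant.

Search text: hello world

- doc1: hello you, and world is very good
- doc2: hello, how are you

doc1 contains both terms, hello and world, while doc2 contains only hello, so doc1 is more relevant and scores higher.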
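A minimal sketch of the term-frequency idea, assuming the search text is `hello world` and using a naive lowercase/word-split tokenizer as a stand-in for a real Elasticsearch analyzer (the `tokenize` and `term_frequencies` helpers are hypothetical): it counts how often each query term occurs in each document.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Naive stand-in for an Elasticsearch analyzer (assumption):
    # lowercase the text and keep runs of word characters.
    return re.findall(r"\w+", text.lower())


def term_frequencies(query: str, doc: str) -> dict[str, int]:
    """Occurrences of each query term in the document text."""
    counts = Counter(tokenize(doc))
    return {term: counts[term] for term in tokenize(query)}


docs = {
    "doc1": "hello you, and world is very good",
    "doc2": "hello, how are you",
}

for name, text in docs.items():
    print(name, term_frequencies("hello world", text))
# doc1 {'hello': 1, 'world': 1}
# doc2 {'hello': 1, 'world': 0}
```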

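Inverse document frequency: in how many documents of the whole index each term of the search text appears. The more documents contain a term, the less that term contributes to relevance.

- doc1: hello, today is very good
- doc2: hi world, how are you

Assuming hello appears in far more documents across the index than world does, doc2 is more relevant: its matching term, world, is the rarer one.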
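The IDF idea can be sketched with the BM25 IDF formula printed in the explain output later in this article, `log(1 + (N - n + 0.5) / (n + 0.5))` with a natural log. The index size and document counts below are made-up numbers, chosen only to show that a term found in many documents (`hello`) gets a lower weight than a rare one (`world`).

```python
import math


def bm25_idf(total_docs: int, docs_with_term: int) -> float:
    """IDF as printed in Elasticsearch's explain output: log(1 + (N - n + 0.5) / (n + 0.5))."""
    N, n = total_docs, docs_with_term
    return math.log(1 + (N - n + 0.5) / (n + 0.5))


# Hypothetical index statistics (assumption, not from the article):
# 10,000 documents; "hello" occurs in 1,000 of them, "world" in 100.
N = 10_000
doc_freq = {"hello": 1_000, "world": 100}

for term, n in doc_freq.items():
    print(f"{term}: idf = {bm25_idf(N, n):.2f}")
```

With these assumed counts, `hello` gets an IDF of about 2.30 and `world` about 4.60, so a match on `world` weighs roughly twice as much.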

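Field-length norm: the length of the field. The longer the field, the weaker the relevance.

- doc1: { "title": "hello article", "content": "babaaba (10,000 words)" }
- doc2: { "title": "my article", "content": "blablabala (10,000 words), hi world" }

hello and world occur equally often across the whole index.

doc1 is more relevant: hello matches in its short title field, whereas world in doc2 matches inside the 10,000-word content field.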
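Field-length normalization can be sketched with the BM25 term-frequency formula from the explain output below, `freq / (freq + k1 * (1 - b + b * dl / avgdl))`, using the defaults k1 = 1.2 and b = 0.75 shown there. The sketch keeps one occurrence of the term and an average field length of 3 (the values in that output) and varies only the field length `dl`; the 10,000 case is an illustrative extreme, not from the article.

```python
def bm25_tf(freq: float, dl: float, avgdl: float, k1: float = 1.2, b: float = 0.75) -> float:
    """Term-frequency component as printed in Elasticsearch's explain output."""
    return freq / (freq + k1 * (1 - b + b * dl / avgdl))


# One occurrence of the term, average field length 3; only the field length changes.
for dl in (2, 3, 4, 10_000):
    print(f"dl={dl}: tf = {bm25_tf(1.0, dl, 3.0):.4f}")
```

For dl = 2, 3 and 4 this gives 1/1.9 ≈ 0.5263, 1/2.2 ≈ 0.4545 and 1/2.5 = 0.4, the same tf values that appear in the explain output; the same single match in a very long field contributes almost nothing.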

## 2. Score details computed by Elasticsearch's TF/IDF-based scoring

#### Setting explain=true shows the detailed score of every term that matched the query

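```
GET raven_index/_search?explain=true
{
  "query": {
    "match": {
      "address": "<search text>"
    }
  }
}
```

`<search text>` is a placeholder for the query string; according to the explanation below, it matched the terms 陕西, 西西, 西安, 中国 and 西南 in the `address` field.

The response: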

```json
{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 3,
      "relation" : "eq"
    },
    "max_score" : 2.9424875,
    "hits" : [
      {
        "_shard" : "[raven_index][0]",
        "_node" : "rdTRZzVlQwKe0JWnYKyylA",
        "_index" : "raven_index",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 2.9424875,
        "_source" : {
          "age" : 18,
          "name" : "王宝强"
        },
        "_explanation" : {
          "value" : 2.9424875,
          "description" : "sum of:",
          "details" : [
            {
              "value" : 0.9808292,
              "description" : "weight(address:陕西 in 0) [PerFieldSimilarity], result of:",
              "details" : [
                {
                  "value" : 0.9808292,
                  "description" : "score(freq=1.0), product of:",
                  "details" : [
                    {
                      "value" : 2.2,
                      "description" : "boost",
                      "details" : [ ]
                    },
                    {
                      "value" : 0.98082924,
                      "description" : "idf, computed as log(1 + (N - n + 0.5) / (n + 0.5)) from:",
                      "details" : [
                        {
                          "value" : 1,
                          "description" : "n, number of documents containing term",
                          "details" : [ ]
                        },
                        {
                          "value" : 3,
                          "description" : "N, total number of documents with field",
                          "details" : [ ]
                        }
                      ]
                    },
                    {
                      "value" : 0.45454544,
                      "description" : "tf, computed as freq / (freq + k1 * (1 - b + b * dl / avgdl)) from:",
                      "details" : [
                        {
                          "value" : 1.0,
                          "description" : "freq, occurrences of term within document",
                          "details" : [ ]
                        },
                        {
                          "value" : 1.2,
                          "description" : "k1, term saturation parameter",
                          "details" : [ ]
                        },
                        {
                          "value" : 0.75,
                          "description" : "b, length normalization parameter",
                          "details" : [ ]
                        },
                        {
                          "value" : 3.0,
                          "description" : "dl, length of field",
                          "details" : [ ]
                        },
                        {
                          "value" : 3.0,
                          "description" : "avgdl, average length of field",
                          "details" : [ ]
                        }
                      ]
                    }
                  ]
                }
              ]
            },
            {
              "value" : 0.9808292,
              "description" : "weight(address:西西 in 0) [PerFieldSimilarity], result of:",
              "details" : [
                {
                  "value" : 0.9808292,
                  "description" : "score(freq=1.0), product of:",
                  "details" : [
                    {
                      "value" : 2.2,
                      "description" : "boost",
                      "details" : [ ]
                    },
                    {
                      "value" : 0.98082924,
                      "description" : "idf, computed as log(1 + (N - n + 0.5) / (n + 0.5)) from:",
                      "details" : [
                        {
                          "value" : 1,
                          "description" : "n, number of documents containing term",
                          "details" : [ ]
                        },
                        {
                          "value" : 3,
                          "description" : "N, total number of documents with field",
                          "details" : [ ]
                        }
                      ]
                    },
                    {
                      "value" : 0.45454544,
                      "description" : "tf, computed as freq / (freq + k1 * (1 - b + b * dl / avgdl)) from:",
                      "details" : [
                        {
                          "value" : 1.0,
                          "description" : "freq, occurrences of term within document",
                          "details" : [ ]
                        },
                        {
                          "value" : 1.2,
                          "description" : "k1, term saturation parameter",
                          "details" : [ ]
                        },
                        {
                          "value" : 0.75,
                          "description" : "b, length normalization parameter",
                          "details" : [ ]
                        },
                        {
                          "value" : 3.0,
                          "description" : "dl, length of field",
                          "details" : [ ]
                        },
                        {
                          "value" : 3.0,
                          "description" : "avgdl, average length of field",
                          "details" : [ ]
                        }
                      ]
                    }
                  ]
                }
              ]
            },
            {
              "value" : 0.9808292,
              "description" : "weight(address:西安 in 0) [PerFieldSimilarity], result of:",
              "details" : [
                {
                  "value" : 0.9808292,
                  "description" : "score(freq=1.0), product of:",
                  "details" : [
                    {
                      "value" : 2.2,
                      "description" : "boost",
                      "details" : [ ]
                    },
                    {
                      "value" : 0.98082924,
                      "description" : "idf, computed as log(1 + (N - n + 0.5) / (n + 0.5)) from:",
                      "details" : [
                        {
                          "value" : 1,
                          "description" : "n, number of documents containing term",
                          "details" : [ ]
                        },
                        {
                          "value" : 3,
                          "description" : "N, total number of documents with field",
                          "details" : [ ]
                        }
                      ]
                    },
                    {
                      "value" : 0.45454544,
                      "description" : "tf, computed as freq / (freq + k1 * (1 - b + b * dl / avgdl)) from:",
                      "details" : [
                        {
                          "value" : 1.0,
                          "description" : "freq, occurrences of term within document",
                          "details" : [ ]
                        },
                        {
                          "value" : 1.2,
                          "description" : "k1, term saturation parameter",
                          "details" : [ ]
                        },
                        {
                          "value" : 0.75,
                          "description" : "b, length normalization parameter",
                          "details" : [ ]
                        },
                        {
                          "value" : 3.0,
                          "description" : "dl, length of field",
                          "details" : [ ]
                        },
                        {
                          "value" : 3.0,
                          "description" : "avgdl, average length of field",
                          "details" : [ ]
                        }
                      ]
                    }
                  ]
                }
              ]
            }
          ]
        }
      },
      {
        "_shard" : "[raven_index][0]",
        "_node" : "rdTRZzVlQwKe0JWnYKyylA",
        "_index" : "raven_index",
        "_type" : "_doc",
        "_id" : "2",
        "_score" : 1.135697,
        "_source" : {
          "age" : 20,
          "name" : "王祖蓝"
        },
        "_explanation" : {
          "value" : 1.135697,
          "description" : "sum of:",
          "details" : [
            {
              "value" : 1.135697,
              "description" : "weight(address:中国 in 1) [PerFieldSimilarity], result of:",
              "details" : [
                {
                  "value" : 1.135697,
                  "description" : "score(freq=1.0), product of:",
                  "details" : [
                    {
                      "value" : 2.2,
                      "description" : "boost",
                      "details" : [ ]
                    },
                    {
                      "value" : 0.98082924,
                      "description" : "idf, computed as log(1 + (N - n + 0.5) / (n + 0.5)) from:",
                      "details" : [
                        {
                          "value" : 1,
                          "description" : "n, number of documents containing term",
                          "details" : [ ]
                        },
                        {
                          "value" : 3,
                          "description" : "N, total number of documents with field",
                          "details" : [ ]
                        }
                      ]
                    },
                    {
                      "value" : 0.5263158,
                      "description" : "tf, computed as freq / (freq + k1 * (1 - b + b * dl / avgdl)) from:",
                      "details" : [
                        {
                          "value" : 1.0,
                          "description" : "freq, occurrences of term within document",
                          "details" : [ ]
                        },
                        {
                          "value" : 1.2,
                          "description" : "k1, term saturation parameter",
                          "details" : [ ]
                        },
                        {
                          "value" : 0.75,
                          "description" : "b, length normalization parameter",
                          "details" : [ ]
                        },
                        {
                          "value" : 2.0,
                          "description" : "dl, length of field",
                          "details" : [ ]
                        },
                        {
                          "value" : 3.0,
                          "description" : "avgdl, average length of field",
                          "details" : [ ]
                        }
                      ]
                    }
                  ]
                }
              ]
            }
          ]
        }
      },
      {
        "_shard" : "[raven_index][0]",
        "_node" : "rdTRZzVlQwKe0JWnYKyylA",
        "_index" : "raven_index",
        "_type" : "_doc",
        "_id" : "3",
        "_score" : 0.86312973,
        "_source" : {
          "age" : 30,
          "name" : "王祖贤"
        },
        "_explanation" : {
          "value" : 0.86312973,
          "description" : "sum of:",
          "details" : [
            {
              "value" : 0.86312973,
              "description" : "weight(address:西南 in 2) [PerFieldSimilarity], result of:",
              "details" : [
                {
                  "value" : 0.86312973,
                  "description" : "score(freq=1.0), product of:",
                  "details" : [
                    {
                      "value" : 2.2,
                      "description" : "boost",
                      "details" : [ ]
                    },
                    {
                      "value" : 0.98082924,
                      "description" : "idf, computed as log(1 + (N - n + 0.5) / (n + 0.5)) from:",
                      "details" : [
                        {
                          "value" : 1,
                          "description" : "n, number of documents containing term",
                          "details" : [ ]
                        },
                        {
                          "value" : 3,
                          "description" : "N, total number of documents with field",
                          "details" : [ ]
                        }
                      ]
                    },
                    {
                      "value" : 0.4,
                      "description" : "tf, computed as freq / (freq + k1 * (1 - b + b * dl / avgdl)) from:",
                      "details" : [
                        {
                          "value" : 1.0,
                          "description" : "freq, occurrences of term within document",
                          "details" : [ ]
                        },
                        {
                          "value" : 1.2,
                          "description" : "k1, term saturation parameter",
                          "details" : [ ]
                        },
                        {
                          "value" : 0.75,
                          "description" : "b, length normalization parameter",
                          "details" : [ ]
                        },
                        {
                          "value" : 4.0,
                          "description" : "dl, length of field",
                          "details" : [ ]
                        },
                        {
                          "value" : 3.0,
                          "description" : "avgdl, average length of field",
                          "details" : [ ]
                        }
                      ]
                    }
                  ]
                }
              ]
            }
          ]
        }
      }
    ]
  }
}
```
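To check the arithmetic, the sketch below recomputes each hit's score from the raw statistics printed in the explanation above: for every matched term, score = boost × idf × tf, and a document's score is the sum over its matched terms ("sum of:"). One assumption: the printed `boost` of 2.2 is read here as the default query boost of 1 multiplied by (k1 + 1) = 2.2, which is how this Lucene version appears to report it. Python uses double precision while Elasticsearch reports single-precision floats, so the last digits can differ slightly.

```python
import math

K1, B = 1.2, 0.75  # defaults shown in the explain output


def idf(N: int, n: int) -> float:
    # "idf, computed as log(1 + (N - n + 0.5) / (n + 0.5))", natural log
    return math.log(1 + (N - n + 0.5) / (n + 0.5))


def tf(freq: float, dl: float, avgdl: float) -> float:
    # "tf, computed as freq / (freq + k1 * (1 - b + b * dl / avgdl))"
    return freq / (freq + K1 * (1 - B + B * dl / avgdl))


def term_score(freq: float, dl: float, avgdl: float, N: int, n: int,
               boost: float = K1 + 1) -> float:
    # "score(freq=...), product of:" boost * idf * tf
    # Assumption: printed boost 2.2 = query boost 1 * (k1 + 1).
    return boost * idf(N, n) * tf(freq, dl, avgdl)


# Statistics read off the explanation: N = 3 documents have an address field,
# each matched term occurs in n = 1 of them, freq = 1, avgdl = 3.
N, n, FREQ, AVGDL = 3, 1, 1.0, 3.0
docs = {
    # doc id: (dl of the address field, matched terms)
    "1": (3.0, ["陕西", "西西", "西安"]),
    "2": (2.0, ["中国"]),
    "3": (4.0, ["西南"]),
}
reported = {"1": 2.9424875, "2": 1.135697, "3": 0.86312973}


def doc_score(dl: float, terms: list[str]) -> float:
    # "sum of:" one weight(...) entry per matched term; all terms here
    # share the same statistics, so the term text itself is not used.
    return sum(term_score(FREQ, dl, AVGDL, N, n) for _ in terms)


for doc_id, (dl, terms) in docs.items():
    print(doc_id, f"{doc_score(dl, terms):.6f}", "reported:", reported[doc_id])
```

Document 1 wins because three terms match, each contributing about 0.98; document 2 beats document 3 only because its `address` field is shorter (dl = 2 versus 4).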

## 3. Analyzing how a document was matched

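#### Use the _explain API

In the typeless API (Elasticsearch 7.x, as used for `raven_index` above), the path is `GET <index>/_explain/<id>`:

```
GET raven_index/_explain/1
{
  "query": {
    "match": {
      "address": "<search text>"
    }
  }
}
```

In ES 5.x, where documents still have a mapping type, the path is `GET <index>/<type>/<id>/_explain`:

```
GET /test_index/test_type/6/_explain
{
  "query": {
    "match": {
      "test_field": "test hello"
    }
  }
}
```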