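1. Algorithm overview
The relevance score algorithm, put simply, computes how closely a piece of text in the index matches the search text.
Elasticsearch uses the term frequency/inverse document frequency algorithm, TF/IDF for short. (Since Elasticsearch 5.0 the default similarity is BM25, a refinement of TF/IDF built from the same three factors; the explain output in section 2 shows the BM25 formulas.)
Term frequency: how many times each term of the search text appears in the field text. The more often it appears, the more relevant the document.
Inverse document frequency: how many times each term of the search text appears across all documents in the index. The more often it appears, the less relevant a match on it is.
Field-length norm: the length of the field. The longer the field, the weaker the relevance.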
1.1 TF example: search request hello world
doc1: hello you, and world is very good
doc2: hello you, how are you
The search terms match twice in doc1 (hello and world) and once in doc2 (hello only), so doc1 is more relevant.
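The counting in this example can be sketched in a few lines of Python. This is only an illustration of the idea, not how Elasticsearch tokenizes or scores: the function name `term_frequencies` and the simple regex tokenizer are assumptions made for the sketch.

```python
import re
from collections import Counter

def term_frequencies(query, field_text):
    """Count how often each query term occurs in the field text.

    Hypothetical helper: lowercases and splits on non-word characters,
    which is far simpler than a real Elasticsearch analyzer.
    """
    tokens = Counter(re.findall(r"\w+", field_text.lower()))
    return {term: tokens[term] for term in query.lower().split()}

docs = {
    "doc1": "hello you, and world is very good",
    "doc2": "hello you, how are you",
}

for name, text in docs.items():
    tf = term_frequencies("hello world", text)
    print(name, tf, "total matches:", sum(tf.values()))
```

doc1 matches both terms and doc2 matches only hello, which is the ordering described above.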
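1.2 IDF example: search request hello world
doc1: hello, today is very good -- matches hello
doc2: hi world, how are you -- matches world
Suppose the index holds 10,000 documents, in which hello appears 1,000 times in total and world appears 100 times in total. Then doc2 is clearly more relevant: world is the rarer term across the index, so a match on it carries more weight than a match on the common term hello.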
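To put numbers on this example, here is a small sketch that uses the idf formula printed in the explain output of section 2, `log(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5))`. It assumes the counts 1,000 and 100 are the number of documents containing each term (the document frequency), which is the quantity the formula expects; the function name `bm25_idf` is made up for the sketch.

```python
import math

def bm25_idf(doc_freq, doc_count):
    """idf as shown in the explain output: rarer terms get a larger value."""
    return math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))

doc_count = 10_000                       # documents in the index
idf_hello = bm25_idf(1_000, doc_count)   # assumed: hello occurs in 1,000 documents
idf_world = bm25_idf(100, doc_count)     # assumed: world occurs in 100 documents

print(f"idf(hello) = {idf_hello:.4f}")
print(f"idf(world) = {idf_world:.4f}")
print("world weighs more than hello:", idf_world > idf_hello)
```

Because world is ten times rarer than hello, its idf is larger, so doc2 (which matches world) ranks above doc1.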
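1.3 Field-length norm example: the longer the field, the weaker the relevance
Search request: hello world
doc1: {"title":"hello article","content":"balabala 10,000 words"}
doc2: {"title":"my article","content":"balabala 10,000 words, hi world"}
Assume hello and world appear equally often across the whole index.
doc1 matches hello in title, while doc2 matches world in content. The title field is much shorter than the content field, so doc1 is more relevant.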
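The effect of field length can be illustrated with the tfNorm formula that appears in the explain output of section 2. The sketch below is a simplification: it compares one occurrence of a term in a 2-word field and in a 10,000-word field against a single hypothetical average field length of 100, whereas Elasticsearch keeps length statistics per field. The name `tf_norm` and the value of `avg_len` are assumptions.

```python
def tf_norm(freq, field_length, avg_field_length, k1=1.2, b=0.75):
    """tfNorm as printed in the explain output (k1 and b are the defaults shown there)."""
    return (freq * (k1 + 1)) / (
        freq + k1 * (1 - b + b * field_length / avg_field_length)
    )

avg_len = 100.0  # hypothetical average field length, for illustration only

short_field = tf_norm(freq=1, field_length=2, avg_field_length=avg_len)
long_field = tf_norm(freq=1, field_length=10_000, avg_field_length=avg_len)

print(f"one match in a 2-word field:      {short_field:.4f}")
print(f"one match in a 10,000-word field: {long_field:.4f}")
```

The same single match contributes far less when it sits in the long field, which is the behaviour the example describes.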
2. How is _score calculated?
• Run a search request
GET /test_index/test_type/_search
{
  "query": {
    "match": {
      "test_field": "test world"
    }
  }
}
• Result:
{
"took": 36,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 5,
"max_score": 1.4867525,
"hits": [
{
"_index": "test_index",
"_type": "test_type",
"_id": "6",
"_score": 1.4867525,
"_source": {
"test_field": "test world test"
}
},
{
"_index": "test_index",
"_type": "test_type",
"_id": "3",
"_score": 1.0754046,
"_source": {
"test_field": "test world"
}
},
{
"_index": "test_index",
"_type": "test_type",
"_id": "5",
"_score": 0.9471576,
"_source": {
"test_field": "test 001"
}
},
{
"_index": "test_index",
"_type": "test_type",
"_id": "1",
"_score": 0.7126236,
"_source": {
"test_field": "test hello001"
}
},
{
"_index": "test_index",
"_type": "test_type",
"_id": "2",
"_score": 0.45203948,
"_source": {
"test_field": "world xxxxxxxxxxxxxxxxxxxxxxxxxxxx"
}
}
]
}
}
• Add the explain parameter to see how each _score was calculated
GET /test_index/test_type/_search?explain
{
  "query": {
    "match": {
      "test_field": "test world"
    }
  }
}
• Result: the _explanation of each hit shows how that document's _score was calculated
{
"took": 294,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 5,
"max_score": 1.4867525,
"hits": [
{
"_shard": "[test_index][2]",
"_node": "5JcZFTo8TMGAcBR5psWKmg",
"_index": "test_index",
"_type": "test_type",
"_id": "6",
"_score": 1.4867525,
"_source": {
"test_field": "test world test"
},
"_explanation": {
"value": 1.4867526,
"description": "sum of:",
"details": [
{
"value": 1.4867526,
"description": "sum of:",
"details": [
{
"value": 1.1230313,
"description": "weight(test_field:test in 0) [PerFieldSimilarity], result of:",
"details": [
{
"value": 1.1230313,
"description": "score(doc=0,freq=2.0 = termFreq=2.0\n), product of:",
"details": [
{
"value": 0.98082924,
"description": "idf, computed as log(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5)) from:",
"details": [
{
"value": 1,
"description": "docFreq",
"details": []
},
{
"value": 3,
"description": "docCount",
"details": []
}
]
},
{
"value": 1.1449814,
"description": "tfNorm, computed as (freq * (k1 + 1)) / (freq + k1 * (1 - b + b * fieldLength / avgFieldLength)) from:",
"details": [
{
"value": 2,
"description": "termFreq=2.0",
"details": []
},
{
"value": 1.2,
"description": "parameter k1",
"details": []
},
{
"value": 0.75,
"description": "parameter b",
"details": []
},
{
"value": 2.3333333,
"description": "avgFieldLength",
"details": []
},
{
"value": 4,
"description": "fieldLength",
"details": []
}
]
}
]
}
]
},
{
"value": 0.36372137,
"description": "weight(test_field:world in 0) [PerFieldSimilarity], result of:",
"details": [
{
"value": 0.36372137,
"description": "score(doc=0,freq=1.0 = termFreq=1.0\n), product of:",
"details": [
{
"value": 0.47000363,
"description": "idf, computed as log(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5)) from:",
"details": [
{
"value": 2,
"description": "docFreq",
"details": []
},
{
"value": 3,
"description": "docCount",
"details": []
}
]
},
{
"value": 0.7738693,
"description": "tfNorm, computed as (freq * (k1 + 1)) / (freq + k1 * (1 - b + b * fieldLength / avgFieldLength)) from:",
"details": [
{
"value": 1,
"description": "termFreq=1.0",
"details": []
},
{
"value": 1.2,
"description": "parameter k1",
"details": []
},
{
"value": 0.75,
"description": "parameter b",
"details": []
},
{
"value": 2.3333333,
"description": "avgFieldLength",
"details": []
},
{
"value": 4,
"description": "fieldLength",
"details": []
}
]
}
]
}
]
}
]
},
{
"value": 0,
"description": "match on required clause, product of:",
"details": [
{
"value": 0,
"description": "# clause",
"details": []
},
{
"value": 1,
"description": "*:*, product of:",
"details": [
{
"value": 1,
"description": "boost",
"details": []
},
{
"value": 1,
"description": "queryNorm",
"details": []
}
]
}
]
}
]
}
},
{
"_shard": "[test_index][4]",
"_node": "5JcZFTo8TMGAcBR5psWKmg",
"_index": "test_index",
"_type": "test_type",
"_id": "3",
"_score": 1.0754046,
"_source": {
"test_field": "test world"
},
"_explanation": {
"value": 1.0754046,
"description": "sum of:",
"details": [
{
"value": 1.0754046,
"description": "sum of:",
"details": [
{
"value": 0.5377023,
"description": "weight(test_field:test in 0) [PerFieldSimilarity], result of:",
"details": [
{
"value": 0.5377023,
"description": "score(doc=0,freq=1.0 = termFreq=1.0\n), product of:",
"details": [
{
"value": 0.6931472,
"description": "idf, computed as log(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5)) from:",
"details": [
{
"value": 1,
"description": "docFreq",
"details": []
},
{
"value": 2,
"description": "docCount",
"details": []
}
]
},
{
"value": 0.7757405,
"description": "tfNorm, computed as (freq * (k1 + 1)) / (freq + k1 * (1 - b + b * fieldLength / avgFieldLength)) from:",
"details": [
{
"value": 1,
"description": "termFreq=1.0",
"details": []
},
{
"value": 1.2,
"description": "parameter k1",
"details": []
},
{
"value": 0.75,
"description": "parameter b",
"details": []
},
{
"value": 1.5,
"description": "avgFieldLength",
"details": []
},
{
"value": 2.56,
"description": "fieldLength",
"details": []
}
]
}
]
}
]
},
{
"value": 0.5377023,
"description": "weight(test_field:world in 0) [PerFieldSimilarity], result of:",
"details": [
{
"value": 0.5377023,
"description": "score(doc=0,freq=1.0 = termFreq=1.0\n), product of:",
"details": [
{
"value": 0.6931472,
"description": "idf, computed as log(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5)) from:",
"details": [
{
"value": 1,
"description": "docFreq",
"details": []
},
{
"value": 2,
"description": "docCount",
"details": []
}
]
},
{
"value": 0.7757405,
"description": "tfNorm, computed as (freq * (k1 + 1)) / (freq + k1 * (1 - b + b * fieldLength / avgFieldLength)) from:",
"details": [
{
"value": 1,
"description": "termFreq=1.0",
"details": []
},
{
"value": 1.2,
"description": "parameter k1",
"details": []
},
{
"value": 0.75,
"description": "parameter b",
"details": []
},
{
"value": 1.5,
"description": "avgFieldLength",
"details": []
},
{
"value": 2.56,
"description": "fieldLength",
"details": []
}
]
}
]
}
]
}
]
},
{
"value": 0,
"description": "match on required clause, product of:",
"details": [
{
"value": 0,
"description": "# clause",
"details": []
},
{
"value": 1,
"description": "*:*, product of:",
"details": [
{
"value": 1,
"description": "boost",
"details": []
},
{
"value": 1,
"description": "queryNorm",
"details": []
}
]
}
]
}
]
}
},
{
"_shard": "[test_index][1]",
"_node": "5JcZFTo8TMGAcBR5psWKmg",
"_index": "test_index",
"_type": "test_type",
"_id": "5",
"_score": 0.9471576,
"_source": {
"test_field": "test 001"
},
"_explanation": {
"value": 0.9471576,
"description": "sum of:",
"details": [
{
"value": 0.9471576,
"description": "sum of:",
"details": [
{
"value": 0.9471576,
"description": "weight(test_field:test in 0) [PerFieldSimilarity], result of:",
"details": [
{
"value": 0.9471576,
"description": "score(doc=0,freq=1.0 = termFreq=1.0\n), product of:",
"details": [
{
"value": 1.3862944,
"description": "idf, computed as log(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5)) from:",
"details": [
{
"value": 1,
"description": "docFreq",
"details": []
},
{
"value": 5,
"description": "docCount",
"details": []
}
]
},
{
"value": 0.6832298,
"description": "tfNorm, computed as (freq * (k1 + 1)) / (freq + k1 * (1 - b + b * fieldLength / avgFieldLength)) from:",
"details": [
{
"value": 1,
"description": "termFreq=1.0",
"details": []
},
{
"value": 1.2,
"description": "parameter k1",
"details": []
},
{
"value": 0.75,
"description": "parameter b",
"details": []
},
{
"value": 1.2,
"description": "avgFieldLength",
"details": []
},
{
"value": 2.56,
"description": "fieldLength",
"details": []
}
]
}
]
}
]
}
]
},
{
"value": 0,
"description": "match on required clause, product of:",
"details": [
{
"value": 0,
"description": "# clause",
"details": []
},
{
"value": 1,
"description": "*:*, product of:",
"details": [
{
"value": 1,
"description": "boost",
"details": []
},
{
"value": 1,
"description": "queryNorm",
"details": []
}
]
}
]
}
]
}
},
{
"_shard": "[test_index][3]",
"_node": "5JcZFTo8TMGAcBR5psWKmg",
"_index": "test_index",
"_type": "test_type",
"_id": "1",
"_score": 0.7126236,
"_source": {
"test_field": "test hello001"
},
"_explanation": {
"value": 0.71262366,
"description": "sum of:",
"details": [
{
"value": 0.71262366,
"description": "sum of:",
"details": [
{
"value": 0.71262366,
"description": "weight(test_field:test in 0) [PerFieldSimilarity], result of:",
"details": [
{
"value": 0.71262366,
"description": "score(doc=0,freq=1.0 = termFreq=1.0\n), product of:",
"details": [
{
"value": 0.98082924,
"description": "idf, computed as log(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5)) from:",
"details": [
{
"value": 1,
"description": "docFreq",
"details": []
},
{
"value": 3,
"description": "docCount",
"details": []
}
]
},
{
"value": 0.7265522,
"description": "tfNorm, computed as (freq * (k1 + 1)) / (freq + k1 * (1 - b + b * fieldLength / avgFieldLength)) from:",
"details": [
{
"value": 1,
"description": "termFreq=1.0",
"details": []
},
{
"value": 1.2,
"description": "parameter k1",
"details": []
},
{
"value": 0.75,
"description": "parameter b",
"details": []
},
{
"value": 1.3333334,
"description": "avgFieldLength",
"details": []
},
{
"value": 2.56,
"description": "fieldLength",
"details": []
}
]
}
]
}
]
}
]
},
{
"value": 0,
"description": "match on required clause, product of:",
"details": [
{
"value": 0,
"description": "# clause",
"details": []
},
{
"value": 1,
"description": "_type:test_type, product of:",
"details": [
{
"value": 1,
"description": "boost",
"details": []
},
{
"value": 1,
"description": "queryNorm",
"details": []
}
]
}
]
}
]
}
},
{
"_shard": "[test_index][2]",
"_node": "5JcZFTo8TMGAcBR5psWKmg",
"_index": "test_index",
"_type": "test_type",
"_id": "2",
"_score": 0.45203948,
"_source": {
"test_field": "world xxxxxxxxxxxxxxxxxxxxxxxxxxxx"
},
"_explanation": {
"value": 0.45203945,
"description": "sum of:",
"details": [
{
"value": 0.45203945,
"description": "sum of:",
"details": [
{
"value": 0.45203945,
"description": "weight(test_field:world in 0) [PerFieldSimilarity], result of:",
"details": [
{
"value": 0.45203945,
"description": "score(doc=0,freq=1.0 = termFreq=1.0\n), product of:",
"details": [
{
"value": 0.47000363,
"description": "idf, computed as log(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5)) from:",
"details": [
{
"value": 2,
"description": "docFreq",
"details": []
},
{
"value": 3,
"description": "docCount",
"details": []
}
]
},
{
"value": 0.96177864,
"description": "tfNorm, computed as (freq * (k1 + 1)) / (freq + k1 * (1 - b + b * fieldLength / avgFieldLength)) from:",
"details": [
{
"value": 1,
"description": "termFreq=1.0",
"details": []
},
{
"value": 1.2,
"description": "parameter k1",
"details": []
},
{
"value": 0.75,
"description": "parameter b",
"details": []
},
{
"value": 2.3333333,
"description": "avgFieldLength",
"details": []
},
{
"value": 2.56,
"description": "fieldLength",
"details": []
}
]
}
]
}
]
}
]
},
{
"value": 0,
"description": "match on required clause, product of:",
"details": [
{
"value": 0,
"description": "# clause",
"details": []
},
{
"value": 1,
"description": "*:*, product of:",
"details": [
{
"value": 1,
"description": "boost",
"details": []
},
{
"value": 1,
"description": "queryNorm",
"details": []
}
]
}
]
}
]
}
}
]
}
}
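A response this deeply nested is easier to read once it is flattened. The following sketch walks an `_explanation` node recursively and prints one indented line per node. The helper name `flatten_explanation` is made up, and `sample` is just the idf node of the hit with _id 3 copied from the response above.

```python
def flatten_explanation(node, depth=0):
    """Return (depth, value, description) tuples in depth-first order."""
    rows = [(depth, node["value"], node["description"])]
    for child in node.get("details", []):
        rows.extend(flatten_explanation(child, depth + 1))
    return rows

# idf node of the hit with _id 3, copied from the response above
sample = {
    "value": 0.6931472,
    "description": "idf, computed as log(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5)) from:",
    "details": [
        {"value": 1, "description": "docFreq", "details": []},
        {"value": 2, "description": "docCount", "details": []},
    ],
}

for depth, value, description in flatten_explanation(sample):
    print("  " * depth + f"{value}  {description}")
```

The same function can be applied to the full `_explanation` object of any hit after parsing the response as JSON.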
Explanation: take the second hit (_id 3, _score 1.0754046) as an example
"_explanation": {
"value": 1.0754046,
"description": "sum of:",
"details": [
{
"value": 1.0754046,
"description": "sum of:",
"details": [
{
"value": 0.5377023,
"description": "weight(test_field:test in 0) [PerFieldSimilarity], result of:",
"details": [
{
"value": 0.5377023,
Note: termFreq: the term occurs once in this field
"description": "score(doc=0,freq=1.0 = termFreq=1.0\n), product of:",
"details": [
{
"value": 0.6931472,
Note: the idf formula
"description": "idf, computed as log(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5)) from:",
"details": [
{
"value": 1,
"description": "docFreq",
"details": []
},
{
"value": 2,
"description": "docCount",
"details": []
}
]
},
{
"value": 0.7757405,
"description": "tfNorm, computed as (freq * (k1 + 1)) / (freq + k1 * (1 - b + b * fieldLength / avgFieldLength)) from:",
"details": [
{
"value": 1,
"description": "termFreq=1.0",
"details": []
},
{
"value": 1.2,
"description": "parameter k1",
"details": []
},
{
"value": 0.75,
"description": "parameter b",
"details": []
},
{
"value": 1.5,
"description": "avgFieldLength",
"details": []
},
{
"value": 2.56,
"description": "fieldLength",
"details": []
}
]
}
]
}
]
},
{
"value": 0.5377023,
"description": "weight(test_field:world in 0) [PerFieldSimilarity], result of:",
"details": [
{
"value": 0.5377023,
"description": "score(doc=0,freq=1.0 = termFreq=1.0\n), product of:",
"details": [
{
"value": 0.6931472,
"description": "idf, computed as log(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5)) from:",
"details": [
{
"value": 1,
"description": "docFreq",
"details": []
},
{
"value": 2,
"description": "docCount",
"details": []
}
]
},
{
"value": 0.7757405,
Note: tfNorm, the term frequency normalized by field length
"description": "tfNorm, computed as (freq * (k1 + 1)) / (freq + k1 * (1 - b + b * fieldLength / avgFieldLength)) from:",
"details": [
{
"value": 1,
"description": "termFreq=1.0",
"details": []
},
{
"value": 1.2,
"description": "parameter k1",
"details": []
},
{
"value": 0.75,
"description": "parameter b",
"details": []
},
{
"value": 1.5,
"description": "avgFieldLength",
"details": []
},
{
"value": 2.56,
"description": "fieldLength",
"details": []
}
]
}
]
}
]
}
]
},
{
"value": 0,
"description": "match on required clause, product of:",
"details": [
{
"value": 0,
"description": "# clause",
"details": []
},
{
"value": 1,
"description": "*:*, product of:",
"details": [
{
"value": 1,
"description": "boost",
"details": []
},
{
"value": 1,
"description": "queryNorm",
"details": []
}
]
}
]
}
]
}
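The numbers in this explanation can be checked by hand. The sketch below plugs the values reported for the hit with _id 3 (docFreq=1, docCount=2, termFreq=1, k1=1.2, b=0.75, avgFieldLength=1.5, fieldLength=2.56) into the two formulas quoted in the description fields. The function names are made up for the sketch, and the results agree with the response only up to floating-point rounding.

```python
import math

def bm25_idf(doc_freq, doc_count):
    """idf formula quoted in the explain output."""
    return math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))

def tf_norm(freq, field_length, avg_field_length, k1=1.2, b=0.75):
    """tfNorm formula quoted in the explain output."""
    return (freq * (k1 + 1)) / (
        freq + k1 * (1 - b + b * field_length / avg_field_length)
    )

# Values reported for both terms (test and world) of the hit with _id 3
idf = bm25_idf(doc_freq=1, doc_count=2)
norm = tf_norm(freq=1, field_length=2.56, avg_field_length=1.5)

per_term = idf * norm        # "score(...), product of:"
total = per_term + per_term  # "sum of:" over the terms test and world

print(f"idf      = {idf:.7f}")
print(f"tfNorm   = {norm:.7f}")
print(f"per term = {per_term:.7f}")
print(f"_score   = {total:.7f}")
```

Each term contributes idf × tfNorm, and the document's _score is the sum over the two matching terms.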
• You can also run explain against a single document
GET /test_index/test_type/1/_explain
{
  "query": {
    "match": {
      "test_field": "test world"
    }
  }
}