#Setup
/PUT {{host}}:{{port}}/demo
{
"mappings":{
"article":{
"properties":{
"content":{
"type":"text"
}
}
}
}
}
#Import data
[
{
"content": "测试语句1"
},
{
"content": "测试语句2"
},
{
"content": "测试语句3,字段长度不同"
}
]
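The notes do not say how the array above was loaded. One option is the `_bulk` API, which expects newline-delimited JSON rather than a JSON array. The sketch below only builds that request body; the helper name `to_bulk` is hypothetical, and the target path `/demo/article/_bulk` is an assumption based on the index and type used elsewhere in these notes.

```python
import json

# The three test documents from the array above.
DOCS = [
    {"content": "测试语句1"},
    {"content": "测试语句2"},
    {"content": "测试语句3,字段长度不同"},
]

def to_bulk(docs):
    """Turn a list of documents into a newline-delimited _bulk body.

    Each document is preceded by an action line. An empty "index" action
    lets the index and type come from the URL and lets Elasticsearch
    generate the document IDs.
    """
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {}}))
        lines.append(json.dumps(doc, ensure_ascii=False))
    # The bulk body must end with a newline.
    return "\n".join(lines) + "\n"

if __name__ == "__main__":
    print(to_bulk(DOCS), end="")
```

The printed body could then be sent as a POST to `{{host}}:{{port}}/demo/article/_bulk` with the `Content-Type: application/x-ndjson` header.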
#Query
/POST {{host}}:{{port}}/demo/article/_search
{
"query":{
"match":{
"content":"测"
}
}
}
#Test result:
{
"took": 0,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"skipped": 0,
"failed": 0
},
"hits": {
"total": 3,
"max_score": 0.2824934,
"hits": [
{
"_index": "demo",
"_type": "article",
"_id": "AWEIQ90700f4t28Wzjdj",
"_score": 0.2824934,
"_source": {
"content": "测试语句2"
}
},
{
"_index": "demo",
"_type": "article",
"_id": "AWEIQ71f00f4t28WzjZT",
"_score": 0.21247853,
"_source": {
"content": "测试语句1"
}
},
{
"_index": "demo",
"_type": "article",
"_id": "AWEIRAEw00f4t28Wzjkd",
"_score": 0.1293895,
"_source": {
"content": "测试语句3,字段长度不同"
}
}
]
}
}
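Strangely, documents 1 and 2 (`测试语句1` and `测试语句2`) receive different scores, even though their key inputs appear identical: the same term frequency, the same field length, and seemingly the same inverse document frequency (IDF). Why do the computed scores differ?
The main reason is that each shard computes a local IDF from only the documents stored on that shard. Documents that land on different shards therefore see different IDF values (and different average field lengths), so their scores differ as well.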
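Once the index holds many documents and they are spread evenly across the shards, the local statistics converge and the effect becomes negligible. For a small demo like this one, add
?search_type=dfs_query_then_fetch
to the search URL so that term statistics are first gathered from all shards and scoring uses the global IDF.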
/POST {{host}}:{{port}}/demo/article/_search?search_type=dfs_query_then_fetch
{
"query":{
"match":{
"content":"测"
}
}
}
#Test result:
{
"took": 1,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"skipped": 0,
"failed": 0
},
"hits": {
"total": 3,
"max_score": 0.14899126,
"hits": [
{
"_index": "demo",
"_type": "article",
"_id": "AWEIQ71f00f4t28WzjZT",
"_score": 0.14899126,
"_source": {
"content": "测试语句1"
}
},
{
"_index": "demo",
"_type": "article",
"_id": "AWEIQ90700f4t28Wzjdj",
"_score": 0.14899126,
"_source": {
"content": "测试语句2"
}
},
{
"_index": "demo",
"_type": "article",
"_id": "AWEIRAEw00f4t28Wzjkd",
"_score": 0.087505676,
"_source": {
"content": "测试语句3,字段长度不同"
}
}
]
}
}
As expected, documents 1 and 2 now score the same, while document 3 scores lower because its field is longer.
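Both result sets can be reproduced by hand, which makes the shard effect concrete. The sketch below rests on several assumptions that the notes do not state: the index uses the default BM25 similarity with k1 = 1.2 and b = 0.75; the standard analyzer emits one token per CJK character (5 tokens for the short documents, 11 for the long one); each document's length is read back from a lossy stored norm (effective lengths of about 5.22 and 16, chosen because they match the observed scores); and document 2 sits alone on one shard while documents 1 and 3 share another. The shard layout is inferred from the scores, not confirmed by the source.

```python
import math

K1, B = 1.2, 0.75  # assumed BM25 defaults

def idf(doc_count, doc_freq):
    """BM25 inverse document frequency."""
    return math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))

def bm25(tf, field_len, avg_len, doc_count, doc_freq):
    """Score of one term in one document."""
    tf_norm = tf * (K1 + 1) / (tf + K1 * (1 - B + B * field_len / avg_len))
    return idf(doc_count, doc_freq) * tf_norm

# Assumed exact token counts; these feed the average field length.
EXACT_LEN = {"doc1": 5, "doc2": 5, "doc3": 11}
# Assumed effective per-document lengths after lossy norm encoding.
STORED_LEN = {"doc1": 1 / 0.4375**2, "doc2": 1 / 0.4375**2, "doc3": 1 / 0.25**2}

def score_group(doc_ids):
    """Score the term "测" using statistics from this group of documents only."""
    n = len(doc_ids)
    avg_len = sum(EXACT_LEN[d] for d in doc_ids) / n
    # "测" occurs exactly once in every document, so doc_freq equals n.
    return {d: bm25(1, STORED_LEN[d], avg_len, n, n) for d in doc_ids}

# Default search: every shard scores with its own statistics (assumed layout).
local_scores = {**score_group(["doc2"]), **score_group(["doc1", "doc3"])}
# dfs_query_then_fetch: statistics are gathered across all shards first.
global_scores = score_group(["doc1", "doc2", "doc3"])

if __name__ == "__main__":
    for name in ("doc1", "doc2", "doc3"):
        print(name, round(local_scores[name], 7), round(global_scores[name], 7))
```

Under these assumptions the computed values agree with both responses to about five decimal places. With shard-local statistics, document 2 gets both a higher IDF (it is the only document on its shard) and a shorter average field length than document 1, which is why two identical-looking documents score differently.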
Next, test the effect of query-time boosting.
/POST {{host}}:{{port}}/demo/article/_search?search_type=dfs_query_then_fetch
{
"query": {
"bool": {
"should": [
{
"match": {
"content": {
"query": "1",
"boost": 2
}
}
},
{
"match": {
"content": "2"
}
}
]
}
}
}
#Test result:
{
"took": 2,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"skipped": 0,
"failed": 0
},
"hits": {
"total": 2,
"max_score": 2.1887734,
"hits": [
{
"_index": "demo",
"_type": "article",
"_id": "AWEIQ71f00f4t28WzjZT",
"_score": 2.1887734,
"_source": {
"content": "测试语句1"
}
},
{
"_index": "demo",
"_type": "article",
"_id": "AWEIQ90700f4t28Wzjdj",
"_score": 1.0943867,
"_source": {
"content": "测试语句2"
}
}
]
}
}
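Because the search term `1` is given a higher boost, document 1 scores higher than document 2. Document 3 contains neither term, so it does not appear in the results. Add ?explain to the request to see the detailed score breakdown.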
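The factor of two between the scores can be checked with a little arithmetic. This is a sketch under assumptions not stated in the notes: default BM25 (k1 = 1.2, b = 0.75), one token per CJK character, an effective stored length of about 5.22 for the 5-token documents, and the boost applied as a plain multiplier on the term score. The function name `term_score` is hypothetical.

```python
import math

K1, B = 1.2, 0.75  # assumed BM25 defaults

def term_score(boost, doc_count, doc_freq, field_len, avg_len, tf=1):
    """Boosted BM25 score of a single term in a single document."""
    idf = math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))
    tf_norm = tf * (K1 + 1) / (tf + K1 * (1 - B + B * field_len / avg_len))
    return boost * idf * tf_norm

STORED_LEN = 1 / 0.4375**2   # assumed effective length of a 5-token field
AVG_LEN = (5 + 5 + 11) / 3   # global average, since the query uses dfs_query_then_fetch

# The term "1" appears only in document 1 and carries boost 2.
doc1 = term_score(boost=2, doc_count=3, doc_freq=1, field_len=STORED_LEN, avg_len=AVG_LEN)
# The term "2" appears only in document 2 and carries the default boost 1.
doc2 = term_score(boost=1, doc_count=3, doc_freq=1, field_len=STORED_LEN, avg_len=AVG_LEN)

if __name__ == "__main__":
    print(round(doc1, 6), round(doc2, 6), round(doc1 / doc2, 6))
```

Both terms have identical statistics (each occurs once, in one document of the same length), so the only difference between the two scores is the boost, and the ratio comes out as 2.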
Other ways to modify scoring
- Boosting by popularity
- Boosting filtered subsets
- Random scoring
- The closer, the better
- Script scoring
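These five techniques roughly correspond to functions of the `function_score` query. The sketch below builds one request body that contains an example of each, purely for illustration; a real query would normally use only one or two. The fields `votes`, `featured`, and `location` are hypothetical and do not exist in the demo mapping, the helper name `build_function_score` is made up, and the script key is assumed to be `source` (older 5.x releases use `inline` instead).

```python
import json

def build_function_score(text):
    """Build a function_score search body with one example per technique."""
    return {
        "query": {
            "function_score": {
                "query": {"match": {"content": text}},
                "functions": [
                    # Boosting by popularity: scale by a numeric field (hypothetical "votes").
                    {"field_value_factor": {"field": "votes", "modifier": "log1p", "missing": 1}},
                    # Boosting filtered subsets: extra weight for documents matching a filter.
                    {"filter": {"term": {"featured": True}}, "weight": 2},
                    # Random scoring.
                    {"random_score": {}},
                    # The closer, the better: decay by distance from an origin point.
                    {"gauss": {"location": {"origin": {"lat": 0.0, "lon": 0.0}, "scale": "2km"}}},
                    # Script scoring: compute the value with a script.
                    {"script_score": {"script": {"source": "_score * 2"}}},
                ],
                "score_mode": "sum",       # how the function values are combined
                "boost_mode": "multiply",  # how the result combines with the query score
            }
        }
    }

if __name__ == "__main__":
    print(json.dumps(build_function_score("测"), ensure_ascii=False, indent=2))
```

The printed body would be posted to `{{host}}:{{port}}/demo/article/_search` once the index has the corresponding fields.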