Elasticsearch (7): relevance scoring rules, scroll, bouncing results, the mapping root object, and custom dynamic mapping strategies

Article link: https://blog.csdn.net/m0_37139189/article/details/83113472

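1. TF (term frequency)

The more often a term appears in a string field, the higher the relevance.

2. IDF (inverse document frequency)

The more often the term appears across the whole index, the lower the relevance.

3. FL (field-length norm)

The longer the string field, the lower the relevance.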
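
The three factors above can be sketched numerically. The formulas below are the ones used by Lucene's classic TF/IDF similarity; the function names and the way the factors are multiplied into `toy_score` are assumptions made for illustration, not the full scoring formula (and newer Elasticsearch versions default to BM25 instead).

```python
import math


def tf(term_freq: int) -> float:
    # Term frequency: grows with the number of occurrences in the field.
    return math.sqrt(term_freq)


def idf(num_docs: int, doc_freq: int) -> float:
    # Inverse document frequency: shrinks as more documents contain the term.
    return 1.0 + math.log(num_docs / (doc_freq + 1))


def field_length_norm(num_terms: int) -> float:
    # Field-length norm: shrinks as the field gets longer.
    return 1.0 / math.sqrt(num_terms)


def toy_score(term_freq: int, num_docs: int, doc_freq: int, num_terms: int) -> float:
    # Simplified product of the three factors (hypothetical helper).
    return tf(term_freq) * idf(num_docs, doc_freq) * field_length_norm(num_terms)
```

For instance, `toy_score(3, 1000, 10, 20)` is larger than `toy_score(1, 1000, 10, 20)` because only the term frequency changed.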

To see how the relevance score of a specific document is calculated, use the _explain API:

GET /ecommerce/producer/3/_explain
{
  "query": {
    "match": {
      "producer": "producer"
    }
  }
}
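4. Bouncing results: cause and fix

1. preference
Determines which shard copies are used to run the search.
Values: _primary (search primary shards only), _primary_first (prefer primary shards), _local, _only_node:xyz, _prefer_node:xyz, _shards:2,3
The bouncing results problem: the same search returns different results from one request to the next. The cause is that replication from primary to replica shards is not instantaneous, so requests served by different copies of a shard can see different data.
The fix is to set preference to _primary so that every request goes to the primary shards.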
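
A toy simulation may make the effect easier to see. This is a sketch under stated assumptions: the two lists stand in for a primary and a lagging replica copy of one shard, and `make_round_robin` and `search_pinned` are hypothetical helpers, not Elasticsearch APIs.

```python
import itertools

# Hypothetical shard copies: the replica has not yet received document 3.
primary = [1, 2, 3]
replica = [1, 2]


def make_round_robin(copies):
    # Each call is served by the next copy in turn, like unpinned searches.
    cycle = itertools.cycle(copies)
    return lambda: list(next(cycle))


def search_pinned(copy):
    # Every call is served by the same copy, like preference=_primary.
    return list(copy)


search = make_round_robin([primary, replica])
bouncing = [search() for _ in range(4)]               # results alternate
pinned = [search_pinned(primary) for _ in range(4)]   # results stay stable
```

Consecutive entries of `bouncing` differ, while every entry of `pinned` is identical.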

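2. timeout: caps how long the search may run. When the time is up, whatever hits have been collected so far are returned, which avoids overly long queries.
3. routing: document routing, which defaults to routing by _id. A custom value can be given, e.g. routing=user_id, so that all data for the same user lands on the same shard.
4. search_type: defaults to query_then_fetch.
Setting it to dfs_query_then_fetch can improve the accuracy of relevance sorting.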
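
The routing rule in item 3 comes down to `shard = hash(routing) % number_of_primary_shards`. The sketch below illustrates that idea only: Elasticsearch uses its own Murmur3-based hash, so `zlib.crc32` is a stand-in, and `pick_shard` and the sample user IDs are hypothetical.

```python
import zlib


def pick_shard(routing: str, num_primary_shards: int) -> int:
    # Stand-in hash for illustration; the real hash function differs,
    # so actual shard numbers will not match these.
    return zlib.crc32(routing.encode("utf-8")) % num_primary_shards


# Documents routed with the same user id always land on the same shard.
shard_a = pick_shard("user_42", 5)
shard_b = pick_shard("user_42", 5)
```

This is also why the number of primary shards cannot be changed freely after index creation: the modulus would change and existing documents would no longer be found on the expected shard.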

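Scroll

Fetching a large number of documents in a single request, say 100,000, performs very poorly. In that case scroll is normally used: fetch batch after batch until all the data has been read.

Example (scroll=1m sets how long the scroll context is kept alive):

GET /index/type/_search?scroll=1m
{
  "query": {
    "match_all": {}
  },
  "sort": ["_doc"],
  "size": 3
}

The response contains a _scroll_id.

Send that _scroll_id in the next request to fetch the next batch, and keep going the same way.

Scroll looks a lot like pagination, but the underlying mechanism is quite different: a scroll reads from a snapshot of the data taken at the time of the initial search.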
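
The batch-by-batch loop can be sketched as follows. The response shape (`_scroll_id`, `hits.hits`) follows the search API, but `scroll_all` and `FakeScrollBackend` are hypothetical names, and the backend is an in-memory stand-in so the loop can run without a cluster.

```python
def scroll_all(first_page, next_page):
    """Yield every hit by following scroll ids until an empty batch comes back."""
    page = first_page()
    while page["hits"]["hits"]:
        yield from page["hits"]["hits"]
        page = next_page(page["_scroll_id"])


class FakeScrollBackend:
    """In-memory stand-in for a cluster (hypothetical, for illustration only)."""

    def __init__(self, docs, size):
        self.snapshot = list(docs)  # scroll reads from a fixed snapshot
        self.size = size
        self.cursors = {}           # scroll id -> offset of the next batch

    def _page(self, start):
        scroll_id = f"scroll-{start}"
        self.cursors[scroll_id] = start + self.size
        hits = self.snapshot[start:start + self.size]
        return {"_scroll_id": scroll_id, "hits": {"hits": hits}}

    def first_page(self):
        return self._page(0)

    def next_page(self, scroll_id):
        return self._page(self.cursors[scroll_id])


backend = FakeScrollBackend(docs=[{"_id": str(i)} for i in range(7)], size=3)
all_hits = list(scroll_all(backend.first_page, backend.next_page))
```

Against a real cluster, `first_page` would issue the initial search with the scroll parameter and `next_page` would send the returned `_scroll_id` back to the scroll endpoint.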

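How types work underneath

Lucene has no concept of a type; ES stores the type as just another field, _type.

PUT /index1/type1/3
{
  "name": "zhangsan",
  "age": 2
}
PUT /index1/type2/4
{
  "name": "lisi",
  "sex": "male"
}

Underneath, these two documents are stored as

{
  "_type": "type1",
  "name": "zhangsan",
  "age": 2,
  "sex": ""
}

{
  "_type": "type2",
  "name": "lisi",
  "sex": "male",
  "age": ""
}

This explains why only documents with a similar structure should be placed in the same index: types with very different fields leave many empty values in every document.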
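
The flattening described above can be modelled in a few lines. This is a simplified sketch of the idea, not how Lucene actually lays out data; `flatten_types` is a hypothetical helper and the empty string stands for a missing value.

```python
def flatten_types(docs_by_type):
    # Collect the union of every field used by any type in the index.
    all_fields = sorted(
        {field for docs in docs_by_type.values() for doc in docs for field in doc}
    )
    flat = []
    for type_name, docs in docs_by_type.items():
        for doc in docs:
            row = {"_type": type_name}  # the type is stored as an ordinary field
            for field in all_fields:
                row[field] = doc.get(field, "")  # fields of other types stay empty
            flat.append(row)
    return flat


index1 = {
    "type1": [{"name": "zhangsan", "age": 2}],
    "type2": [{"name": "lisi", "sex": "male"}],
}
flat_docs = flatten_types(index1)
```

The more the field sets of two types differ, the more empty slots each flattened document carries.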

1. What is the root object?
It is the mapping JSON for a type, including properties, metadata (_id, _source, _type), settings (analyzer), and other settings (such as include_in_all).

In the request below, the part
{
    "properties":{
    }
    ...
}
is the root object of the mapping.

PUT /my_index
{
    "mappings":{
        "my_type":{
            "properties":{
            }
            ...
        }
    }
}

Next, what does the root object contain?
2. properties
Configures which fields the documents of a type have: each field's type, index (whether the field is indexed), and analyzer.

PUT /my_index/my_type/_mapping
{
    "my_type":{
        "properties":{
            "title":{
                "type":"text",
                "index":true,
                "analyzer":"standard"
            }
        }
    }
}

Example:

PUT /index0/my_type/_mapping
{
  "properties": {
    "title": {
      "type": "text",
      "index": true,
      "analyzer": "standard"
    }
  }
}

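3. _source
The original JSON text supplied when a document is saved is stored as the value of the document's _source.
Benefits:
1) A query returns the complete document directly; there is no need to fetch the document id first and then send a second request for the document.
2) Partial update is built on _source.
3) Reindexing works directly from _source, with no need to query a database (or other external storage) for the data and modify it. Zero-downtime reindexing is covered later.
4) The returned fields can be customized based on _source.
5) Debugging queries is easier because _source can be inspected directly.

If you don't need these benefits, _source can be disabled: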
PUT /my_index
{
    "mappings":{
        "my_type":{
            "_source":{
                "enabled":false
            }
        }
    }
}

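Example (shown here with _all): once a type exists, its enabled setting can no longer be changed.
Updating my_type2 succeeds:
PUT /index0/my_type2/_mapping
{
  "_all": {"enabled": false}
}

Updating the already existing my_type fails:
PUT /index0/_mapping/my_type
{
  "_all": {"enabled": false}
}
Result: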
{
  "error": {
    "root_cause": [
      {
        "type": "illegal_argument_exception",
        "reason": "mapper [_all] enabled is true now encountering false"
      }
    ],
    "type": "illegal_argument_exception",
    "reason": "mapper [_all] enabled is true now encountering false"
  },
  "status": 400
}

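4. _all
All fields are bundled together and indexed as a single _all field. A search that does not name any field runs against the _all field.

If you don't need it, it can be disabled:
PUT /index0/_mapping/my_type
{
    "_all":{
        "enabled":false
    }
}

You can also set include_in_all at the field level to control whether that field's value is included in the _all field: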

PUT /my_index/my_type/_mapping
{
    "properties":{
        "my_field":{
            "type":"text",
            "include_in_all":false
        }
    }
}

Example:
PUT /index0/_mapping/my_type2
{
  "properties": {
    "title":{
      "type": "text",
      "include_in_all": false
    }
  }
}

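Likewise, this setting cannot be changed on something that already exists; it has to be new. The title field above already exists in another type of the same index, and as the section on type internals showed, the types of one index share their field mappings, so the request fails: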
{
  "error": {
    "root_cause": [
      {
        "type": "illegal_argument_exception",
        "reason": "Mapper for [title] conflicts with existing mapping in other types:\n[mapper [title] has different [analyzer], mapper [title] is used by multiple types. Set update_all_types to true to update [search_analyzer] across all types., mapper [title] is used by multiple types. Set update_all_types to true to update [search_quote_analyzer] across all types.]"
      }
    ],
    "type": "illegal_argument_exception",
    "reason": "Mapper for [title] conflicts with existing mapping in other types:\n[mapper [title] has different [analyzer], mapper [title] is used by multiple types. Set update_all_types to true to update [search_analyzer] across all types., mapper [title] is used by multiple types. Set update_all_types to true to update [search_quote_analyzer] across all types.]"
  },
  "status": 400
}

Now create a new type with a new field:
PUT /index0/_mapping/my_type4
{
  "properties": {
    "content11":{
      "type": "text",
      "include_in_all": false
    }
  }
}

This succeeds.
Result:
{
"acknowledged": true
}

Adding the new field content12 to the existing my_type also succeeds:
PUT /index0/my_type/_mapping
{
  "properties": {
    "content12":{
      "type": "text",
      "include_in_all": false
    }
  }
}

Result:
{
"acknowledged": true
}

5. Identifying metadata
_index, _type, and _id
are the metadata that identify a document.

Customizing dynamic mapping

true: an unknown field is mapped automatically

false: an unknown field is ignored

strict: an unknown field causes an error
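
The three settings can be modelled with a small function. This is a sketch of the behaviour described above, not Elasticsearch code; `apply_dynamic` and the sample field names are hypothetical.

```python
def apply_dynamic(mapping_fields, doc, dynamic):
    """Return (updated mapping fields, indexed fields) for one document."""
    fields = set(mapping_fields)
    indexed = {}
    for name, value in doc.items():
        if name in fields:
            indexed[name] = value
        elif dynamic == "true":
            fields.add(name)       # unknown field is added to the mapping
            indexed[name] = value
        elif dynamic == "false":
            continue               # unknown field is ignored (not indexed)
        elif dynamic == "strict":
            raise ValueError(f"unknown field [{name}] is not allowed under strict")
        else:
            raise ValueError(f"unsupported dynamic setting: {dynamic!r}")
    return fields, indexed


doc = {"title": "a", "new_field": "b"}
fields_true, indexed_true = apply_dynamic({"title"}, doc, "true")
fields_false, indexed_false = apply_dynamic({"title"}, doc, "false")
```

With "false" the value is skipped by the index but still travels with the document's _source, which this sketch does not model.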

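date_detection: when the first value seen for a field matches a date format such as yyyy-MM-dd, the field is automatically mapped as date. If a later document then carries a value like zhangs in that field, indexing fails. To prevent this kind of error, set date_detection to false.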
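
The failure mode can be reproduced with a toy model. It assumes a single date format and a text fallback; `looks_like_date`, `index_value`, and the field name `note` are hypothetical.

```python
from datetime import datetime


def looks_like_date(value):
    # Only the yyyy-MM-dd format is checked in this sketch.
    try:
        datetime.strptime(value, "%Y-%m-%d")
        return True
    except (TypeError, ValueError):
        return False


def index_value(field_types, field, value, date_detection=True):
    # The first value seen for a field decides its mapped type.
    if field not in field_types:
        is_date = date_detection and looks_like_date(value)
        field_types[field] = "date" if is_date else "text"
    # Later values must be parseable as the mapped type.
    if field_types[field] == "date" and not looks_like_date(value):
        raise ValueError(f"failed to parse [{field}] value {value!r} as date")
    return field_types[field]


detected = {}
index_value(detected, "note", "2017-01-01")   # field becomes a date
plain = {}
index_value(plain, "note", "2017-01-01", date_detection=False)
index_value(plain, "note", "zhangs", date_detection=False)  # still fine
```

With detection on, indexing "zhangs" into `note` afterwards raises, which mirrors the error described above.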

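Custom dynamic templates: define your own dynamic strategy so that any field whose name matches a given wildcard pattern is automatically mapped with a given type.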
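
As an illustration, the mapping body below uses the dynamic_templates setting with a wildcard `match`. The template name `en`, the `*_en` pattern, and the `resolve_template` helper are assumptions for this sketch; the helper only imitates the name-matching step (it ignores `match_mapping_type`) and is not how Elasticsearch resolves templates internally.

```python
from fnmatch import fnmatchcase

# Hypothetical mapping body: string fields whose names end in _en
# are mapped as text with the english analyzer.
mapping_body = {
    "dynamic_templates": [
        {
            "en": {
                "match": "*_en",
                "match_mapping_type": "string",
                "mapping": {"type": "text", "analyzer": "english"},
            }
        }
    ]
}


def resolve_template(field_name, body):
    # Return the mapping of the first template whose pattern matches the name.
    for template in body["dynamic_templates"]:
        for _name, rule in template.items():
            if fnmatchcase(field_name, rule["match"]):
                return rule["mapping"]
    return None  # no template matched; default dynamic mapping would apply


title_en_mapping = resolve_template("title_en", mapping_body)
title_mapping = resolve_template("title", mapping_body)
```

Here `title_en` picks up the english analyzer, while `title` matches no template.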