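Document Mapping
A mapping is similar to a schema definition in a database. It does the following:
- Defines the names of the fields in an index
- Defines each field's data type, such as string, number, or boolean
- Configures how each field is stored in the inverted index (for example, its analyzer)
Mappings in ES fall into two kinds, dynamic and static (explicit):
Dynamic mapping: when a document is written to Elasticsearch, field types are detected automatically from the document's field values
Static mapping: the mapping is defined in Elasticsearch ahead of time, including each field's type, analyzer, and so on
If dynamic mapping infers the wrong type, some features will not work properly, for example Range queries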
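To make the Range query problem concrete: by default, a number that arrives as a JSON string such as "32" is dynamically mapped as a string field, and range comparisons on string terms are lexicographic rather than numeric. The sketch below is plain Python, not Elasticsearch code; it only mimics the two orderings, and the helper name in_range is made up for illustration.

```python
def in_range(values, low, high):
    """Keep the values v with low <= v <= high, using the ordering of the value type."""
    return [v for v in values if low <= v <= high]


ages_as_numbers = [9, 32, 100]
ages_as_strings = [str(age) for age in ages_as_numbers]

# Numeric ordering: only 32 lies between 10 and 40.
numeric_hits = in_range(ages_as_numbers, 10, 40)

# Lexicographic ordering: "100" sorts between "10" and "40", so it is wrongly included.
string_hits = in_range(ages_as_strings, "10", "40")

print(numeric_hits)  # [32]
print(string_hits)   # ['32', '100']
```

This is why a field that will be queried by range should be declared with a numeric or date type in a static mapping.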
- analyzer
ik_max_word
- dynamic
true|false|strict
- _reindex
POST source|dest
- _alias
PUT /user2/_alias/user
- index
"index": false
- index_options
docs|freqs|positions|offsets
- null_value
"null_value": "NULL"
- copy_to
"copy_to": "full_address"
- _bulk
PUT /address/_bulk
- index Template
PUT /_template/template_test
- Dynamic Template
dynamic_templates
- path_match/path_unmatch
dynamic_templates.path_match
PUT /user
{
"mappings": {
// dynamic: true (default): when a document with a new field is written, the mapping is updated as well
// false: the mapping is not updated; data in new fields is not indexed, but it still appears in _source
// strict: the document write fails and an exception is thrown
"dynamic": "true",
"properties": {
"province" : {
"type" : "keyword",
"copy_to": "full_address" // copies the field value into the target field for specific search needs; the copy_to target field does not appear in _source
},
"city" : {
"type" : "text",
"copy_to": "full_address",
"analyzer": "ik_max_word", // IK analyzer
"index_options": "offsets" // text defaults to positions, other types to docs; docs: doc id; freqs: + term frequencies; positions: + term positions; offsets: + character offsets
},
"name" : {
"type" : "keyword",
"null_value": "NULL" // makes null values searchable; the text type does not support null_value
},
"address": {
"type": "object",
"dynamic": "true" // index and index_options belong on leaf fields, not on an object field; index controls whether a field is indexed (default true), and with false the field cannot be searched
}
}
},
"settings" : {
"index" : {
"analysis.analyzer.default.type": "ik_max_word"
}
}
}
POST _reindex
{
"source": {
"index": "user"
},
"dest": {
"index": "user2"
}
}
PUT /user2/_alias/user
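Question: can a field's type in the mapping be changed later?
- Adding new fields
With dynamic set to true, the mapping is updated as soon as a document containing a new field is written
With dynamic set to false, the mapping is not updated; data in the new field is not indexed, but it still appears in _source
With dynamic set to strict, the document write fails and an exception is thrown
- Existing fields: once data has been written, the field definition can no longer be modified
The inverted index built by Lucene cannot be modified once it has been generated
To change a field's type, rebuild the index with the reindex API
Steps:
- 1) To replace the existing mapping, create a new index with the static mapping you want
- 2) Import the data from the old index into the new index
- 3) Delete the original index
- 4) Give the new index an alias equal to the original index name
Reason:
Changing a field's data type would leave already-indexed data unsearchable
Adding a new field has no such effect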
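The three dynamic settings can be modelled in a few lines. This is a toy sketch in plain Python, not how Elasticsearch is implemented: index_document and GUESSED_TYPES are hypothetical names, only flat documents are handled, and the type guess is deliberately crude.

```python
# Crude stand-in for dynamic type detection (an assumption for illustration only).
GUESSED_TYPES = {bool: "boolean", int: "long", float: "float", str: "text"}


def index_document(mapping, doc):
    """Return (indexed_fields, source) for one document under the mapping's dynamic setting."""
    mode = mapping.get("dynamic", "true")
    properties = mapping["properties"]
    new_fields = [name for name in doc if name not in properties]

    if new_fields and mode == "strict":
        # strict: the whole write is rejected
        raise ValueError(f"mapping set to strict, dynamic introduction of {new_fields} is not allowed")

    if mode == "true":
        # true: the mapping grows to include the new fields
        for name in new_fields:
            properties[name] = {"type": GUESSED_TYPES.get(type(doc[name]), "object")}

    # Only mapped fields are indexed; _source always keeps the full document.
    indexed = {name: value for name, value in doc.items() if name in properties}
    return indexed, dict(doc)
```

With dynamic set to "false", a document such as {"name": "fox", "age": 32} is accepted, age stays in the returned source, but it is missing from the indexed fields and so cannot be searched.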
Demo: adding a new field under a static mapping (the dynamic setting)
- Static mapping with dynamic set to "strict"
PUT /user
{
"mappings": {
"dynamic": "strict",
"properties": {
"name": {
"type": "text"
},
"address": {
"type": "object",
"dynamic": "true"
}
}
}
}
- Write a document containing a new field, age
Request
PUT /user/_doc/1
{
"name":"fox",
"age":32,
"address":{
"province":"湖南",
"city":"长沙"
}
}
response
{
"error" : {
"root_cause" : [
{
"type" : "strict_dynamic_mapping_exception",
"reason" : "mapping set to strict, dynamic introduction of [age] within [_doc] is not allowed"
}
],
"type" : "strict_dynamic_mapping_exception",
"reason" : "mapping set to strict, dynamic introduction of [age] within [_doc] is not allowed"
},
"status" : 400
}
- Change dynamic to true
PUT /user/_mapping
{
"dynamic":true
}
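Demo: changing an existing field
Steps:
- 1) To replace the existing mapping, create a new index with the static mapping you want
- 2) Import the data from the old index into the new index
- 3) Delete the original index
- 4) Give the new index an alias equal to the original index name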
PUT /user
{
"mappings": {
"properties": {
"name": {
"type": "text"
},
"address": {
"type": "text"
}
}
}
}
PUT /user/_doc/1
{
"name":"fox",
"age":32,
"address": "测试地址"
}
GET /user/_search
{
"query": {
"term": {
"address": "测试"
}
}
}
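Response: no hits. The default standard analyzer indexes "测试地址" as single characters, so no indexed term equals "测试"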
{
"took" : 0,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 0,
"relation" : "eq"
},
"max_score" : null,
"hits" : [ ]
}
}
- Create a new index with a static mapping
PUT /user2
{
"mappings": {
"properties": {
"name": {
"type": "text"
},
"address": {
"type": "text",
"analyzer": "ik_max_word"
}
}
}
}
- Import the data from the old index into the new index
POST _reindex
{
"source": {
"index": "user"
},
"dest": {
"index": "user2"
}
}
- The same query now finds the document in the new index
GET /user2/_search
{
"query": {
"term": {
"address": "测试"
}
}
}
{
"took" : 694,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 1,
"relation" : "eq"
},
"max_score" : 0.2876821,
"hits" : [
{
"_index" : "user2",
"_type" : "_doc",
"_id" : "1",
"_score" : 0.2876821,
"_source" : {
"name" : "fox",
"age" : 32,
"address" : "测试地址"
}
}
]
}
}
- Delete the original index
DELETE /user
- Give the new index an alias equal to the original index name
PUT /user2/_alias/user
GET /user
GET /user/_search
{
"query": {
"term": {
"address": "测试"
}
}
}
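Note: these steps migrate the index without changing the name that clients query. Between DELETE /user and the alias creation, the name user briefly does not resolve, so for a truly zero-downtime switch, swap the index for the alias in one atomic POST /_aliases request that combines an add action with a remove_index action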
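As a sketch, the whole migration can be written down as an ordered list of console requests. The function below only builds the requests as data and sends nothing; migration_plan is a hypothetical helper, and the last step assumes the atomic swap through POST /_aliases (an add action plus a remove_index action) instead of a separate DELETE followed by PUT /user2/_alias/user.

```python
def migration_plan(old_index, new_index, new_mappings):
    """Return the requests, in order, as (method, path, body) tuples."""
    return [
        # 1) create the new index with the desired static mapping
        ("PUT", f"/{new_index}", {"mappings": new_mappings}),
        # 2) copy the documents from the old index into the new one
        ("POST", "/_reindex", {
            "source": {"index": old_index},
            "dest": {"index": new_index},
        }),
        # 3) + 4) drop the old index and point its name at the new index in one step
        ("POST", "/_aliases", {
            "actions": [
                {"add": {"index": new_index, "alias": old_index}},
                {"remove_index": {"index": old_index}},
            ]
        }),
    ]


plan = migration_plan(
    "user",
    "user2",
    {"properties": {"address": {"type": "text", "analyzer": "ik_max_word"}}},
)
for method, path, body in plan:
    print(method, path)
```

Each tuple maps directly onto one request in the console, so the plan can be reviewed before anything is run against a cluster.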
Common mapping parameters
1. index: controls whether the field is indexed; the default is true. If it is set to false, the field cannot be searched
DELETE /user
PUT /user
{
"mappings" : {
"properties" : {
"address" : {
"type" : "text",
"index": false
},
"age" : {
"type" : "long"
},
"name" : {
"type" : "text"
}
}
}
}
PUT /user/_doc/1
{
"name":"fox",
"address":"广州白云山公园",
"age":30
}
GET /user
GET /user/_search
{
"query": {
"match": {
"address": "广州"
}
}
}
response
{
"error" : {
"root_cause" : [
{
"type" : "query_shard_exception",
"reason" : "failed to create query: Cannot search on field [address] since it is not indexed.",
"index_uuid" : "AlWZrE-XT4iwIJsd8V9IfQ",
"index" : "user"
}
],
"type" : "search_phase_execution_exception",
"reason" : "all shards failed",
"phase" : "query",
"grouped" : true,
"failed_shards" : [
{
"shard" : 0,
"index" : "user",
"node" : "rEYg9XpfS_uCtGpHpeoSCw",
"reason" : {
"type" : "query_shard_exception",
"reason" : "failed to create query: Cannot search on field [address] since it is not indexed.",
"index_uuid" : "AlWZrE-XT4iwIJsd8V9IfQ",
"index" : "user",
"caused_by" : {
"type" : "illegal_argument_exception",
"reason" : "Cannot search on field [address] since it is not indexed."
}
}
}
]
},
"status" : 400
}
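2. index_options: four levels control what the inverted index records
- docs: records the doc id
- freqs: records the doc id and term frequencies
- positions: records the doc id, term frequencies, and term positions
- offsets: records the doc id, term frequencies, term positions, and character offsets
The text type records positions by default; other types default to docs. The more that is recorded, the more storage space is used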
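The four levels are cumulative, which a toy inverted index makes easy to see. The sketch below is plain Python under simplifying assumptions: tokens are split on whitespace and lowercased, and build_postings is a made-up name, not an Elasticsearch or Lucene API.

```python
import re

LEVELS = ["docs", "freqs", "positions", "offsets"]


def build_postings(docs, index_options="positions"):
    """Build term -> {doc_id -> posting} for a corpus given as {doc_id: text}."""
    level = LEVELS.index(index_options)
    index = {}
    for doc_id, text in docs.items():
        for position, match in enumerate(re.finditer(r"\S+", text)):
            term = match.group().lower()
            # docs: the doc id alone is recorded (an empty posting)
            posting = index.setdefault(term, {}).setdefault(doc_id, {})
            if level >= 1:  # freqs: how often the term occurs in the document
                posting["freq"] = posting.get("freq", 0) + 1
            if level >= 2:  # positions: the token order, needed for phrase queries
                posting.setdefault("positions", []).append(position)
            if level >= 3:  # offsets: the character start and end of each occurrence
                posting.setdefault("offsets", []).append((match.start(), match.end()))
    return index


corpus = {1: "to be or not to be"}
print(build_postings(corpus, "docs")["to"])     # {1: {}}
print(build_postings(corpus, "offsets")["to"])
```

Each level adds one more piece of data per posting, which is where the extra storage cost comes from.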
DELETE /user
PUT /user
{
"mappings" : {
"properties" : {
"address" : {
"type" : "text",
"index_options": "offsets"
},
"age" : {
"type" : "long"
},
"name" : {
"type" : "text"
}
}
}
}
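3. null_value: indexes a substitute value in place of null so that null values can be searched. The text type does not support null_value; keyword (used here) and other exact-value types such as numeric, date, boolean, and ip do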
DELETE /user
PUT /user
{
"mappings" : {
"properties" : {
"address" : {
"type" : "keyword",
"null_value": "NULL"
},
"age" : {
"type" : "long"
},
"name" : {
"type" : "text"
}
}
}
}
PUT /user/_doc/1
{
"name":"fox",
"age":32,
"address":null
}
GET /user/_search
{
"query": {
"match": {
"address": "NULL"
}
}
}
4. copy_to: copies the field's value into a target field to support specific search needs. The copy_to target field does not appear in _source
# Configure copy_to
DELETE /address
PUT /address
{
"mappings" : {
"properties" : {
"province" : {
"type" : "keyword",
"copy_to": "full_address"
},
"city" : {
"type" : "text",
"copy_to": "full_address"
}
}
},
"settings" : {
"index" : {
"analysis.analyzer.default.type": "ik_max_word"
}
}
}
PUT /address/_bulk
{ "index": { "_id": "1"} }
{"province": "湖南","city": "长沙"}
{ "index": { "_id": "2"} }
{"province": "湖南","city": "常德"}
{ "index": { "_id": "3"} }
{"province": "广东","city": "广州"}
{ "index": { "_id": "4"} }
{"province": "湖南","city": "邵阳"}
GET /address/_search
{
"query": {
"match": {
"full_address": {
"query": "湖南常德",
"operator": "and"
}
}
}
}
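What copy_to does to one document can be sketched as follows. This is a toy model in plain Python with made-up helper names; as a simplifying assumption each field value is treated as a single indexed term, whereas a real analyzer such as ik_max_word may split text into several terms.

```python
def index_with_copy_to(mapping, doc):
    """Return (source, indexed), where indexed maps a field name to its indexed values."""
    indexed = {}
    for field, value in doc.items():
        indexed.setdefault(field, []).append(value)
        target = mapping.get(field, {}).get("copy_to")
        if target:
            # the value is also indexed under the target field
            indexed.setdefault(target, []).append(value)
    # _source is the document exactly as it was sent, without the target field
    return dict(doc), indexed


def match_all_terms(indexed, field, terms):
    """Mimic a match query with operator "and": every term must be present."""
    return all(term in indexed.get(field, []) for term in terms)


mapping = {
    "province": {"type": "keyword", "copy_to": "full_address"},
    "city": {"type": "text", "copy_to": "full_address"},
}
source, indexed = index_with_copy_to(mapping, {"province": "湖南", "city": "常德"})
print("full_address" in source)                                   # False
print(match_all_terms(indexed, "full_address", ["湖南", "常德"]))  # True
```

The target field exists only in the index, which is why it can be queried but never shows up in the returned _source.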
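5. Index Template
Index templates let you define mappings and settings that are applied automatically to newly created indices whose names match a pattern
- A template only takes effect when an index is newly created. Modifying a template does not affect indices that already exist
- You can define multiple index templates; their settings are merged together
- You can specify an "order" value to control the merging process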
PUT /_template/template_default
{
"index_patterns": ["*"],
"order": 0,
"version": 1,
"settings": {
"number_of_shards": 1,
"number_of_replicas": 1
}
}
PUT /_template/template_test
{
"index_patterns": ["test*"],
"order": 1,
"settings": {
"number_of_shards": 2,
"number_of_replicas": 1
},
"mappings": {
"date_detection": false,
"numeric_detection": true
}
}
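How an Index Template works
When an index is newly created:
- Elasticsearch's default settings and mappings are applied
- The settings from the Index Template with the lower order value are applied
- The settings from the Index Template with the higher order value are applied, overriding the earlier ones
- The settings and mappings specified by the user when creating the index are applied, overriding those from the templates
# View template information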
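The merge order above can be sketched in plain Python. This is a simplified model, not Elasticsearch's implementation: resolve_index_config and deep_merge are hypothetical names, fnmatchcase stands in for the simple * wildcard matching of index_patterns, and Elasticsearch's built-in defaults (the first step) are left out.

```python
from fnmatch import fnmatchcase


def deep_merge(base, override):
    """Merge two nested dicts; values from override win on conflicts."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged


def resolve_index_config(index_name, templates, request_body=None):
    """Apply matching templates from low order to high, then the create-index request body."""
    matching = [
        t for t in templates
        if any(fnmatchcase(index_name, pattern) for pattern in t["index_patterns"])
    ]
    config = {}
    for template in sorted(matching, key=lambda t: t.get("order", 0)):
        body = {k: template[k] for k in ("settings", "mappings") if k in template}
        config = deep_merge(config, body)
    return deep_merge(config, request_body or {})


templates = [
    {"index_patterns": ["*"], "order": 0,
     "settings": {"number_of_shards": 1, "number_of_replicas": 1}},
    {"index_patterns": ["test*"], "order": 1,
     "settings": {"number_of_shards": 2, "number_of_replicas": 1},
     "mappings": {"date_detection": False, "numeric_detection": True}},
]
print(resolve_index_config("testmy", templates, {"mappings": {"date_detection": True}}))
```

For an index named testmy created with date_detection set to true, the model keeps number_of_shards at 2 from the higher-order template while the request body overrides date_detection.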
GET /_template/template_default
GET /_template/temp*
PUT /testtemplate/_doc/1
{
"orderNo": 1,
"createDate": "2022/01/01"
}
GET /testtemplate/_mapping
GET /testtemplate/_settings
PUT /testmy
{
"mappings": {
"date_detection": true
}
}
PUT /testmy/_doc/1
{
"orderNo": 1,
"createDate": "2022/01/01"
}
GET /testmy/_mapping
6. Dynamic Template
A dynamic template is defined in the mapping of a specific index
# Dynamic mapping rules based on the detected type and the field name
DELETE my_index
PUT my_index/_doc/1
{
"firstName":"Ruan",
"isVIP":"true"
}
GET my_index/_mapping
DELETE my_index
PUT my_index
{
"mappings": {
"dynamic_templates": [
{
"strings_as_boolean": {
"match_mapping_type": "string",
"match":"is*",
"mapping": {
"type": "boolean"
}
}
},
{
"strings_as_keywords": {
"match_mapping_type": "string",
"mapping": {
"type": "keyword"
}
}
}
]
}
}
7. Matching on the field path
PUT /my_test_index
{
"mappings": {
"dynamic_templates": [
{
"full_name":{
"path_match": "name.*",
"path_unmatch": "*.middle",
"mapping":{
"type": "text",
"copy_to": "full_name"
}
}
}
]
}
}
PUT /my_test_index/_doc/1
{
"name":{
"first": "John",
"middle": "Winston",
"last": "Lennon"
}
}
GET /my_test_index/_search
{
"query": {
"match": {
"full_name": "John"
}
}
}
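How a new field is matched against dynamic_templates, in both the name-based and the path-based examples above, can be sketched like this. It is a simplified model in plain Python: pick_dynamic_template is a hypothetical name, fnmatchcase stands in for the simple * wildcard, and the first template whose conditions all hold wins.

```python
from fnmatch import fnmatchcase


def pick_dynamic_template(templates, full_path, detected_type):
    """Return (template_name, mapping) of the first matching template, or None."""
    field_name = full_path.rsplit(".", 1)[-1]
    for entry in templates:
        name, rule = next(iter(entry.items()))
        if "match_mapping_type" in rule and rule["match_mapping_type"] != detected_type:
            continue
        if "match" in rule and not fnmatchcase(field_name, rule["match"]):
            continue  # match tests the field name only
        if "path_match" in rule and not fnmatchcase(full_path, rule["path_match"]):
            continue  # path_match tests the full dotted path
        if "path_unmatch" in rule and fnmatchcase(full_path, rule["path_unmatch"]):
            continue  # path_unmatch excludes paths that would otherwise match
        return name, rule["mapping"]
    return None


by_name = [
    {"strings_as_boolean": {"match_mapping_type": "string", "match": "is*",
                            "mapping": {"type": "boolean"}}},
    {"strings_as_keywords": {"match_mapping_type": "string",
                             "mapping": {"type": "keyword"}}},
]
by_path = [
    {"full_name": {"path_match": "name.*", "path_unmatch": "*.middle",
                   "mapping": {"type": "text", "copy_to": "full_name"}}},
]
print(pick_dynamic_template(by_name, "isVIP", "string"))
print(pick_dynamic_template(by_path, "name.middle", "string"))
```

In this model name.first and name.last pick up the full_name template, while name.middle is excluded by path_unmatch and falls back to the default dynamic mapping.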