Elasticsearch Mapping

椰子Tyshawn

Article link: https://blog.csdn.net/litianxiang_kaola/article/details/103344432

This article is based on Elasticsearch 7.x.

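What is a Mapping?

A mapping defines the data structure of the fields in an index: field names, field types, analyzers, and so on. It is similar to a table schema in MySQL.

Before Elasticsearch 7.0, an index could define multiple types. From 7.0 onward, types are no longer defined manually; every index has a single default type, _doc.

Common field types

Core types

(1) Strings

text and keyword
Use text for strings that need full-text search, and keyword for strings that need sorting and aggregation. keyword strings are not analyzed.

(2) Numeric

byte, short, integer, long
float, double

(3) Dates

date

(4) Boolean

boolean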
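
The text versus keyword distinction under (1) can be sketched in a few lines of Python. This is only a toy model: the analyze_text function is a hypothetical stand-in for an analyzer (lowercase, split on non-alphanumerics), not what Elasticsearch actually runs.

```python
import re

def analyze_text(value):
    """Rough stand-in for an analyzer: lowercase and split on non-alphanumerics."""
    return [tok for tok in re.split(r"[^0-9a-z]+", value.lower()) if tok]

def analyze_keyword(value):
    """keyword fields are not analyzed: the whole string is one term."""
    return [value]

text_terms = analyze_text("Quick Brown Foxes")
keyword_terms = analyze_keyword("Quick Brown Foxes")
print(text_terms)     # several lowercase tokens
print(keyword_terms)  # one unmodified term
```

A search for "brown" can match one of the text terms, while the keyword term only matches the exact original string, which is what sorting and aggregation need.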

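Complex types

(1) object

A single JSON object.

(2) nested

An array of JSON objects, where each object is indexed separately so that it can be queried independently of the others.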
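
Why an array of objects usually needs nested rather than object can be illustrated with a simplified model. The helper names below are hypothetical; the sketch assumes that an object array is flattened into one list of values per sub-field, while nested keeps each object separate.

```python
def flatten_object_array(field, objects):
    """Model of an object array: values are flattened per sub-field,
    so the link between one object's name and age is lost."""
    flat = {}
    for obj in objects:
        for key, value in obj.items():
            flat.setdefault(f"{field}.{key}", []).append(value)
    return flat

def matches_flattened(flat, field, name, age):
    return name in flat[f"{field}.name"] and age in flat[f"{field}.age"]

def matches_nested(objects, name, age):
    """Model of nested: every object is checked on its own."""
    return any(o["name"] == name and o["age"] == age for o in objects)

employees = [{"name": "tyshawn", "age": 24}, {"name": "lee", "age": 34}]
flat = flatten_object_array("employees", employees)

# No employee is both "tyshawn" and 34, yet the flattened form matches.
cross_match_object = matches_flattened(flat, "employees", "tyshawn", 34)
cross_match_nested = matches_nested(employees, "tyshawn", 34)
```

In this model the flattened form produces a false positive, and the per-object check does not.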

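Special types

(1) join

Defines parent/child relationships between documents in the same index.

(2) alias

Defines an alias for an existing field.

Mapping parameters

Here are some commonly used mapping parameters.

(1) properties

Defines fields and their types. Types include ordinary types, object types, and nested types (arrays). For example:

a. Define the mapping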

PUT /my_index
{
  "mappings": {
    "properties": {
      "region": {
        "type": "text"
      },
      "manager": {
        "properties": {
          "age": {
            "type": "integer"
          },
          "name": {
            "type": "text"
          }
        }
      },
      "employees": {
        "type": "nested",
        "properties": {
           "age": {
            "type": "integer"
          },
          "name": {
            "type": "text"
          }
        }
      }
    }
  }
}

b. Add data

PUT /my_index/_doc/1
{
  "region": "shenzhen",
  "manager": {
    "name": "Tom",
    "age": 18
  },
  "employees": [
      {
        "name": "tyshawn",
        "age": "24"
      },
      {
        "name": "lee",
        "age": "34"
      }
    ]
}

(2) analyzer

Defines the analyzer used for the field. For example:

a. Define the mapping

PUT /my_index
{
  "mappings": {
    "properties": {
      "content": {
        "type": "text",
        "analyzer": "english"
      }
    }
  }
}

b. Analyze text with the field's analyzer

GET /my_index/_analyze
{
  "field": "content",
  "text": "2 running Quick brown-foxes leap over lazy dogs in the summer evening"
}

Result:

[ 2, run, quick, brown, fox, leap, over, lazi, dog, summer, even ]

(3) enabled

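Controls whether Elasticsearch parses and indexes the field at all, that is, whether it builds the forward index (used for sorting and aggregation) and the inverted index (used for full-text search). When set to false, the field cannot be searched, sorted on, or aggregated; its value is still kept in _source.

enabled can only be set on object fields or on the top-level mapping definition. For example:

a. Define the mapping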

PUT /my_index
{
  "mappings": {
    "properties": {
      "title": {
        "type": "object",
        "enabled": false
      }
    }
  }
}

or

PUT /my_index
{
  "mappings": {
    "enabled": false
  }
}

b. Add data

PUT /my_index/_doc/1
{
  "title": "Quick brown rabbits"
}

c. Search and aggregate

GET /my_index/_search
{
  "query": {
    "match": {
      "title": "rabbits"
    }
  }
}

GET /my_index/_search
{
  "size": 0, 
  "aggs": {
    "group_by_title": {
      "terms": {
        "field": "title.keyword"
      }
    }
  }
}

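Both the search and the aggregation return empty results.

(4) index

Defines whether an inverted index is built for the field. A field without an inverted index cannot be searched. Defaults to true. For example:

a. Define the mapping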

PUT /my_index
{
  "mappings": {
    "properties": {
      "content": {
        "type": "text",
        "index": false
      },
      "title": {
        "type": "text",
        "index": true
      }
    }
  }
}

b. Add data

PUT /my_index/_doc/1
{
  "content": "my first index",
  "title": "index name"
}

c. Search

GET /my_index/_search
{
  "query": {
    "match": {
      "title": "index"
    }
  }
}
// Returns one hit

GET /my_index/_search
{
  "query": {
    "match": {
      "content": "index"
    }
  }
}
// Error: the content field is not indexed, so it cannot be searched

(5) index_options

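Defines what information is stored in the inverted index. The possible values are:

value	description
docs	Records only the doc id
freqs	Records doc id and term frequency
positions	Records doc id, term frequency, and position
offsets	Records doc id, term frequency, position, and offset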
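
To make the four levels concrete, here is a toy inverted index in Python. It is a sketch under simplifying assumptions (whitespace-style tokenization via a regex, plain dicts as postings), not Lucene's actual on-disk format; build_postings and LEVELS are hypothetical names.

```python
import re

LEVELS = ["docs", "freqs", "positions", "offsets"]

def build_postings(docs, index_options):
    """Toy inverted index; each level adds data on top of the previous one."""
    level = LEVELS.index(index_options)
    postings = {}
    for doc_id, text in docs.items():
        for position, m in enumerate(re.finditer(r"\w+", text.lower())):
            entry = postings.setdefault(m.group(), {}).setdefault(doc_id, {})
            if level >= 1:
                entry["freq"] = entry.get("freq", 0) + 1
            if level >= 2:
                entry.setdefault("positions", []).append(position)
            if level >= 3:
                entry.setdefault("offsets", []).append((m.start(), m.end()))
    return postings

docs = {1: "my first index"}
freqs_only = build_postings(docs, "freqs")
with_positions = build_postings(docs, "positions")
with_offsets = build_postings(docs, "offsets")
```

With "freqs" the postings know that a term occurs in a document and how often, but not where, which is why a phrase query (which must check that terms are adjacent) needs at least "positions".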

For an introduction to the inverted index structure, see: https://blog.csdn.net/litianxiang_kaola/article/details/103059279

Let's look at an example:

a. Define the mapping

PUT /my_index
{
  "mappings": {
    "properties": {
      "content": {
        "type": "text",
        "index_options": "freqs"
      }
    }
  }
}

b. Add data

PUT /my_index/_doc/1
{
  "content": "my first index"
}

c. Run a phrase search

GET /my_index/_search
{
  "query": {
    "match_phrase": {
      "content": "first index"
    }
  }
}
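// Error: freqs stores no positions, so a phrase query cannot run

(6) norms

When norms are enabled, Elasticsearch stores various normalization factors on disk that are used to compute relevance scores for full-text search, at the cost of extra disk space. For fields used only for filtering and aggregation, norms can be disabled. They are enabled by default for text fields.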

PUT /my_index/
{
  "mappings": {
    "properties": {
      "title": {
        "type": "text",
        "norms": false
      }
    }
  }
}

(7) null_value

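A null value cannot be indexed, so it cannot be searched. To make null values searchable, use the null_value parameter to substitute a specific value for an explicit null when the inverted index is built. Note that the data in _source is not replaced.

null_value must have the same data type as the field. It is not available on text fields, which is why the example below uses keyword.

For example:

a. Define the mapping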

PUT /my_index
{
  "mappings": {
    "properties": {
      "content": {
        "type": "keyword",
        "null_value": "NULL"
      }
    }
  }
}

b. Add data

PUT /my_index/_doc/1
{
  "content": null
}

c. Search

GET /my_index/_search
{
  "query": {
    "match": {
      "content": "NULL"
    }
  }
}
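
The behavior of null_value can be modeled as a substitution that happens only on the indexing path. The following is a minimal sketch with a hypothetical index_keyword helper; it assumes only an explicit null is replaced and that a field missing from the document is left alone.

```python
def index_keyword(source, field, null_value=None):
    """Return the term that would be indexed for field.

    The source dict (standing in for _source) is never modified.
    """
    if field not in source:
        return None            # field absent: nothing to index
    value = source[field]
    if value is None:
        return null_value      # explicit null: index the substitute, if any
    return value

source = {"content": None}
indexed_term = index_keyword(source, "content", null_value="NULL")
```

Searching for "NULL" then finds the document through the indexed term, while the returned _source still shows null.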

(8) copy_to

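Copies the values of several fields into a single "group field", which is then indexed and searched. Note that the group field does not appear in _source; it exists only for searching.

a. Define the mapping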

PUT /my_index
{
  "mappings": {
    "properties": {
      "first_name": {
        "type": "text",
        "copy_to": "full_name"
      },
      "last_name": {
        "type": "text",
        "copy_to": "full_name"
      }
    } 
  }
}

b. Add data

PUT /my_index/_doc/1
{
  "first_name": "John",
  "last_name": "Smith"
}

c. Search

GET /my_index/_search
{
  "query": {
    "match": {
      "full_name": {
        "query": "John Smith",
        "operator": "and"
      }
    }
  }
}
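
As a rough mental model, copy_to adds the source fields' values to the target field at index time only. The index_with_copy_to function below is a hypothetical illustration, not Elasticsearch code.

```python
def index_with_copy_to(source, copy_to):
    """Build the indexed field values; source itself is not modified."""
    indexed = {field: [value] for field, value in source.items()}
    for field, target in copy_to.items():
        if field in source:
            indexed.setdefault(target, []).append(source[field])
    return indexed

source = {"first_name": "John", "last_name": "Smith"}
indexed = index_with_copy_to(
    source, {"first_name": "full_name", "last_name": "full_name"}
)
```

Both "John" and "Smith" end up under full_name in the indexed view, so a match query with operator "and" on full_name can succeed, while the original document never gains a full_name key.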

(9) fields

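Multi-fields (sub-fields). They index one field in more than one way, for example a string as text for full-text search plus a keyword sub-field for aggregation and sorting. For example:

a. Define the mapping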

PUT /my_index
{
  "mappings": {
    "properties": {
      "content": {
        "type": "text",
        "fields": {
          "raw": {
            "type": "keyword"
          }
        }
      }
    }
  }
}

raw is an arbitrary name.

b. Add data

PUT /my_index/_doc/1
{
  "content": "my first index"
}

PUT /my_index/_doc/2
{
  "content": "my second index"
}

c. Search and sort

GET /my_index/_search
{
  "query": {
    "match": {
      "content": "index"
    }
  },
  "sort": [
    {
      "content.raw": {
        "order": "desc"
      }
    }
  ]
}

(10) ignore_above

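Limits the length of indexed strings: a string longer than ignore_above characters is not indexed, although it still appears in _source.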

PUT /my_index
{
  "mappings": {
    "properties": {
      "content": {
        "type": "text",
        "fields": {
          "raw": {
            "type": "keyword",
            "ignore_above": 256
          }
        }
      }
    }
  }
}
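
The effect of ignore_above can be sketched as a simple length check at index time. The function name is hypothetical and the model ignores everything except the character count.

```python
def index_keyword_value(value, ignore_above):
    """Return the term to index, or None when the value is skipped."""
    if len(value) > ignore_above:
        return None   # too long: not indexed, but still present in _source
    return value

short_term = index_keyword_value("a" * 256, 256)
long_term = index_keyword_value("a" * 257, 256)
```

A value of exactly 256 characters is still indexed; one character more and the keyword sub-field holds nothing for that document, so it drops out of term queries, sorting, and aggregations on that sub-field.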

(11) dynamic

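dynamic configures dynamic mapping: when the user has not defined a mapping, ES infers field types from the documents written and builds the mapping itself. This is covered in detail below.

(12) doc_values

The inverted index serves full-text search; the forward index serves aggregation and sorting. When doc_values is true (the default), ES builds a forward index alongside the inverted index and stores it on disk. The text type does not support doc_values, so no forward index is built for text fields. Under dynamic mapping, ES by default adds a keyword multi-field to each text field, and aggregation and sorting go through that keyword sub-field.

a. Add data without defining a mapping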

PUT /my_index/_doc/1
{
  "content": "my first content"
}

b. View the mapping that ES generated for the text field

GET /my_index/_mapping

Result:

{
  "my_index" : {
    "mappings" : {
      "properties" : {
        "content" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        }
      }
    }
  }
}

c. Aggregate

GET /my_index/_search
{
  "size": 0,
  "aggs": {
    "group_by_content": {
      "terms": {
        "field": "content.keyword"
      }
    }
  }
}

Result:

"aggregations" : {
 "group_by_content" : {
  "doc_count_error_upper_bound" : 0,
  "sum_other_doc_count" : 0,
  "buckets" : [
    {
      "key" : "my first content",
      "doc_count" : 1
    }
  ]
 }
}
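
The division of labor between the two structures can be shown with plain Python dicts. This is a simplified model (whitespace tokenization, in-memory dicts), and the sample documents are made up for illustration.

```python
from collections import Counter

docs = {1: "my first content", 2: "my first content", 3: "another value"}

# Inverted index: term -> doc ids (answers "which docs contain this term?").
inverted = {}
for doc_id, value in docs.items():
    for term in value.split():
        inverted.setdefault(term, set()).add(doc_id)

# Forward index (doc values): doc id -> whole keyword value.
doc_values = dict(docs)

def terms_agg(matching_doc_ids):
    """Bucket the matching docs by their keyword value, like a terms aggregation."""
    return Counter(doc_values[d] for d in matching_doc_ids)

hits = inverted["first"]     # searching uses the inverted index
buckets = terms_agg(hits)    # aggregating uses the forward index
```

The search step goes from term to documents, and the aggregation step goes from document to value, which is why both structures are kept.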

(13) fielddata

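A text field can also be aggregated and sorted on by setting fielddata to true, but this rarely meets real requirements, for two reasons:

With fielddata set to true, ES builds a forward index for the text field the first time it is aggregated or sorted on. That forward index is derived from the inverted index, so it is built from terms rather than from the original document values.
This forward index is held in memory and consumes a lot of JVM heap, which is why fielddata is disabled by default.

Here is an example:

a. Define the mapping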

PUT /my_index
{
  "mappings": {
    "properties": {
      "content": {
        "type": "text",
        "fielddata": true
      }
    }
  }
}

b. Add data

PUT /my_index/_doc/1
{
  "content": "my first index"
}

c. Aggregate

GET /my_index/_search
{
  "size": 0,
  "aggs": {
    "group_by_content": {
      "terms": {
        "field": "content"
      }
    }
  }
}

Result:

"aggregations" : {
 "group_by_content" : {
  "doc_count_error_upper_bound" : 0,
  "sum_other_doc_count" : 0,
  "buckets" : [
    {
      "key" : "first",
      "doc_count" : 1
    },
    {
      "key" : "index",
      "doc_count" : 1
    },
    {
      "key" : "my",
      "doc_count" : 1
    }
  ]
 }
}

Because this forward index is built from terms, the aggregation returns three buckets, which is not what we want.
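
The three buckets follow directly from how the forward index is derived. A minimal sketch, assuming fielddata is obtained by "uninverting" a term-to-docs index (the uninvert helper is hypothetical):

```python
def uninvert(inverted_index):
    """Rebuild a doc -> terms view from a term -> docs inverted index."""
    fielddata = {}
    for term, doc_ids in inverted_index.items():
        for doc_id in doc_ids:
            fielddata.setdefault(doc_id, []).append(term)
    return fielddata

# Inverted index for the single doc "my first index" after analysis.
inverted_index = {"my": [1], "first": [1], "index": [1]}
fielddata = uninvert(inverted_index)

# A terms aggregation over fielddata can only see individual terms.
bucket_counts = {}
for terms in fielddata.values():
    for term in terms:
        bucket_counts[term] = bucket_counts.get(term, 0) + 1
```

The original string "my first index" is not recoverable from the terms alone, so there is no way to get a single bucket for it.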

(14) eager_global_ordinals

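With eager_global_ordinals enabled, ES builds an additional data structure (global ordinals) on top of doc_values ahead of time, which speeds up aggregations on keyword fields.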

PUT /my_index/
{
  "mappings": {
    "properties": {
      "title": {
        "type": "keyword",
        "eager_global_ordinals": true
      }
    }
  }
}

(15) store

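By default, all field values are stored together in the _source field, and individual field values are not stored separately. To store a field's value separately, set store to true.

What is this for? In most cases it is unnecessary. If you only want to retrieve the values of a few fields rather than the whole _source, you can enable it. For example:

a. Define the mapping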

PUT /my_index/
{
  "mappings": {
    "properties": {
      "title": {
        "type": "text",
        "store": true
      },
      "date": {
        "type": "date",
        "store": true
      },
      "content": {
        "type": "text"
      }
    }
  }
}

b. Add data

PUT /my_index/_doc/1
{
  "title":   "Some short title",
  "date":    "2015-01-01",
  "content": "A very long content field..."
}

c. Search

GET /my_index/_search
{
  "stored_fields": ["title", "date"]
}

Result:

"hits" : [
  {
    "_index" : "my_index",
    "_type" : "_doc",
    "_id" : "1",
    "_score" : 1.0,
    "fields" : {
      "date" : [
        "2015-01-01T00:00:00.000Z"
      ],
      "title" : [
        "Some short title"
      ]
    }
  }
]

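This feature is of limited use, because _source filtering can replace it entirely. _source filtering is introduced in the companion article on basic full-text search syntax ("ElasticSearch 全文搜索之基础语法API").

Dynamic Mapping

(1) What is dynamic mapping?

When the user has not defined a mapping, ES infers field types from the documents written and builds the mapping itself.

(2) Automatic type detection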

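JSON type	ES type
string	date if it matches a date format (disable by setting date_detection to false); float or long if it matches a numeric format (off by default, enable by setting numeric_detection to true); otherwise text with a keyword sub-field
boolean	boolean
floating-point number	float
integer	long
object	object
array	Determined by the type of the first non-null value
null	Ignored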
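
The table can be read as a small decision procedure. The sketch below is a simplified model: detect_type is a hypothetical function, and the date check assumes only the yyyy-MM-dd form, whereas the real default date formats cover more patterns.

```python
import re

DATE_RE = re.compile(r"\d{4}-\d{2}-\d{2}")  # simplified assumption: date-only strings

def detect_type(value, date_detection=True, numeric_detection=False):
    """Return the ES type that dynamic mapping would pick in this simplified model."""
    if value is None:
        return None                  # null: no field is added
    if isinstance(value, bool):      # check bool first: bool is a subclass of int
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "float"
    if isinstance(value, dict):
        return "object"
    if isinstance(value, list):
        first = next((v for v in value if v is not None), None)
        return detect_type(first, date_detection, numeric_detection)
    if date_detection and DATE_RE.fullmatch(value):
        return "date"
    if numeric_detection:
        if re.fullmatch(r"[+-]?\d+", value):
            return "long"
        if re.fullmatch(r"[+-]?\d*\.\d+", value):
            return "float"
    return "text"                    # the real mapping also adds a keyword sub-field

default_date = detect_type("2020-01-01")
default_number_string = detect_type("1")
```

With the defaults, "2020-01-01" is detected as date and "1" stays text; passing date_detection=False and numeric_detection=True flips both outcomes.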

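(3) Configuring dynamic mapping

dynamic	Description
true	Newly detected fields are added to the mapping. This is the default.
false	Newly detected fields are not added to the mapping, so they are not indexed and cannot be searched. They still appear in the _source of hits returned by searches on other fields.
strict	If a new field is detected, the request fails with an error.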
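
The three settings differ only in what happens when a document contains an unmapped field. A minimal sketch with a hypothetical apply_dynamic function, which tracks just the set of mapped field names:

```python
def apply_dynamic(mapping, doc, dynamic="true"):
    """Return the set of mapped fields after indexing doc under a dynamic setting."""
    mapped = set(mapping)
    new_fields = [f for f in doc if f not in mapped]
    if new_fields and dynamic == "strict":
        raise ValueError(f"strict mapping: unknown fields {new_fields}")
    if dynamic == "true":
        mapped.update(new_fields)
    return mapped   # with "false", new fields stay unmapped but remain in _source

doc = {"content": "my second index", "date": "2019-01-01"}
mapped_true = apply_dynamic({"content"}, doc, "true")
mapped_false = apply_dynamic({"content"}, doc, "false")
```

Under "true" the date field joins the mapping, under "false" the mapping is unchanged, and under "strict" the call raises instead of indexing the document.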

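Note that once a field's type is defined it cannot be changed. To change a field type, rebuild the index with the Reindex API.

(4) Example

a. Index a doc without defining a mapping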

PUT /my_index/_doc/1
{
  "content": "my first index"
}

b. View the mapping

GET /my_index/_mapping

Result:

{
  "my_index" : {
    "mappings" : {
      "properties" : {
        "content" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        }
      }
    }
  }
}

c. Index another doc that contains a new field

PUT /my_index/_doc/2
{
  "content": "my second index",
  "date": "2019-01-01"
}

d. View the mapping

GET /my_index/_mapping

Result:

{
  "my_index" : {
    "mappings" : {
      "properties" : {
        "content" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        },
        "date" : {
          "type" : "date"
        }
      }
    }
  }
}

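Dynamic Templates

The mapping rules used by dynamic mapping can be customized; these custom rules are called dynamic templates.

Syntax

The syntax for configuring dynamic templates: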

"dynamic_templates": [
	{
	  "my_template_name": { 
	    ...  match conditions ... 
	    "mapping": { ... } 
	  }
	},
	...
]

(1) my_template_name

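The template name; it can be anything.

(2) match conditions

Match conditions select the fields to customize by field name or by detected type:

match_mapping_type
The detected data type to match, such as long.
match
A pattern for field names to match, such as long_*.
unmatch
A pattern for field names to exclude.
match_pattern
Set to regex to have match and unmatch interpreted as regular expressions instead of wildcards.
path_match
Like match, but applied to the full dotted path of a field inside an object, such as some_object.*.some_field.
path_unmatch
Like unmatch, but applied to the full dotted path.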
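
How these conditions combine can be sketched with Python's fnmatch module. This is an approximation: template_applies is a hypothetical helper, match_pattern is not modeled, and fnmatch also gives special meaning to ? and [...], which goes beyond the simple * wildcard described above.

```python
from fnmatch import fnmatchcase

def template_applies(template, field_name, detected_type, full_path=None):
    """Check a simplified dynamic template against a newly seen field.

    Every condition present in the template must hold.
    """
    full_path = full_path or field_name
    checks = [
        ("match_mapping_type", lambda v: v == detected_type),
        ("match", lambda v: fnmatchcase(field_name, v)),
        ("unmatch", lambda v: not fnmatchcase(field_name, v)),
        ("path_match", lambda v: fnmatchcase(full_path, v)),
        ("path_unmatch", lambda v: not fnmatchcase(full_path, v)),
    ]
    return all(test(template[key]) for key, test in checks if key in template)

long_template = {"match_mapping_type": "string", "match": "long_*", "unmatch": "*_text"}
name_template = {"path_match": "name.*", "path_unmatch": "name.middle"}

applies_to_long_num = template_applies(long_template, "long_num", "string")
applies_to_long_text = template_applies(long_template, "long_text", "string")
```

A string field named long_num passes all three conditions of long_template, long_text is excluded by unmatch, and a long_num that arrives as a JSON number fails match_mapping_type.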

(3) mapping

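The mapping to apply to matched fields.

Examples

(1) During dynamic mapping, map integers as integer instead of long, and map strings as text with a keyword multi-field.

a. Define the dynamic template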

PUT /my_index
{
  "mappings": {
    "dynamic_templates": [
        {
          "long_to_integer": {
            "match_mapping_type": "long",
            "mapping": {
              "type": "integer"
            }
          }
        },
        {
          "string_as_text": {
            "match_mapping_type": "string",
            "mapping": {
              "type": "text",
              "fields": {
                "raw": {
                  "type": "keyword",
                  "ignore_above": 256
                }
              }
            }
          }
        }
      ]
  }
}

b. Add data

PUT /my_index/_doc/1
{
  "content": "my first index",
  "count": 3
}

c. View the mapping

"properties" : {
	"content" : {
	  "type" : "text",
	  "fields" : {
	    "raw" : {
	      "type" : "keyword",
	      "ignore_above" : 256
	    }
	  }
	},
	"count" : {
	  "type" : "integer"
	}
}

(2) During dynamic mapping, map string fields whose names start with "long_" as long, except those whose names end with "_text".

a. Define the dynamic template

PUT /my_index
{
  "mappings": {
    "dynamic_templates": [
        {
          "long_as_text": {
            "match_mapping_type": "string",
            "match": "long_*",
            "unmatch": "*_text",
            "mapping": {
              "type": "long"
            }
          }
        }
      ]
  }
}

b. Add data

PUT /my_index/_doc/1
{
  "long_num": "5",
  "long_text": "foo"
}

c. View the mapping

"properties" : {
	"long_num" : {
	  "type" : "long"
	},
	"long_text" : {
	  "type" : "text",
	  "fields" : {
	    "keyword" : {
	      "type" : "keyword",
	      "ignore_above" : 256
	    }
	  }
	}
}

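(3) During dynamic mapping, copy the values of the properties inside the object named "name" to the "full_name" field, except for the property named "middle".

a. Define the dynamic template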

PUT /my_index
{
  "mappings": {
    "dynamic_templates": [
        {
          "copy_to_full_name": {
            "path_match": "name.*",
            "path_unmatch": "name.middle",
            "mapping": {
              "type": "text",
              "copy_to": "full_name"
            }
          }
        }
      ]
  }
}

b. Add data

PUT /my_index/_doc/1
{
  "name": {
    "first": "John",
    "middle": "Winston",
    "last": "Lennon"
  }
}

c. View the mapping

"properties" : {
 "full_name" : {
  "type" : "text",
  "fields" : {
    "keyword" : {
      "type" : "keyword",
      "ignore_above" : 256
    }
  }
},
"name" : {
  "properties" : {
    "first" : {
      "type" : "text",
      "copy_to" : [
        "full_name"
      ]
    },
    "last" : {
      "type" : "text",
      "copy_to" : [
        "full_name"
      ]
    },
    "middle" : {
      "type" : "text",
      "fields" : {
        "keyword" : {
          "type" : "keyword",
          "ignore_above" : 256
        }
      }
    }
  }
}
}

d. Search full_name

GET /my_index/_search
{
  "query": {
    "match": {
      "full_name": {
        "query": "John Lennon",
        "operator": "and"
      }
    }
  }
}

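Index Templates

An index template defines settings, mappings, aliases, and other configuration. When a new index is created and its name matches a template, the template's configuration is applied.

Settings, mappings, and aliases defined explicitly at creation time override those from the template.

Syntax

PUT /_template/<template_name>
{
	"index_patterns": [ ... ],
	"aliases": { ... },
	"settings": { ... },
	"mappings": { ... },
	"version": 1,
	"order": 0
}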

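Common parameters

(1) index_patterns

An array of wildcard expressions matched against index names.

(2) aliases

Index aliases.

(3) settings

Index settings.

(4) mappings

Field mappings for the index.

(5) version

A version number for managing the index template.

(6) order

The order in which templates are applied; a higher order overrides a lower one.

How index templates are applied (later entries override earlier ones): default settings, index templates with a lower order, index templates with a higher order, and finally the configuration defined explicitly by the user.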
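
The override chain described above can be sketched as successive dictionary updates. This is a simplified model that handles flat settings only; resolve_settings is a hypothetical helper, and the template values mirror the two templates used in the examples that follow.

```python
from fnmatch import fnmatchcase

def resolve_settings(index_name, templates, defaults, explicit=None):
    """Merge defaults, matching templates (ascending order), then explicit settings."""
    settings = dict(defaults)
    matching = [
        t for t in templates
        if any(fnmatchcase(index_name, p) for p in t["index_patterns"])
    ]
    for template in sorted(matching, key=lambda t: t["order"]):
        settings.update(template["settings"])
    settings.update(explicit or {})
    return settings

templates = [
    {"index_patterns": ["*"], "order": 0,
     "settings": {"number_of_shards": 1, "number_of_replicas": 2}},
    {"index_patterns": ["test*"], "order": 1,
     "settings": {"number_of_shards": 2, "number_of_replicas": 3}},
]
defaults = {"number_of_shards": 1, "number_of_replicas": 1}

her_index = resolve_settings("her_index", templates, defaults)
test_index = resolve_settings("test_index", templates, defaults)
```

An index named test_index matches both patterns, so the order 1 template is applied last and wins; explicit settings passed at creation time would override both.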

Examples

(1) Create an index with the ES defaults, with no index template and no explicit index configuration

a. Add data

PUT /my_index/_doc/1
{
  "somenumber": "1",
  "somedate": "2020-01-01"
}

b. View the settings

GET /my_index/_settings

Result:

{
  "my_index" : {
    "settings" : {
      "index" : {
        "creation_date" : "1574778715305",
        "number_of_shards" : "1",
        "number_of_replicas" : "1",
        "uuid" : "ez5KdYEDRw-Rvg0EP0Y9ng",
        "version" : {
          "created" : "7040299"
        },
        "provided_name" : "my_index"
      }
    }
  }
}

c. View the mapping

GET /my_index/_mapping

Result:

{
  "my_index" : {
    "mappings" : {
      "properties" : {
        "somedate" : {
          "type" : "date"
        },
        "somenumber" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        }
      }
    }
  }
}

As shown here, the date-formatted string is detected as date, while the integer-formatted string is mapped as text.

(2) Create two index templates

a. Template one

PUT _template/template_default
{
  "index_patterns": ["*"],
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 2
  },
  "version": 1,
  "order": 0
}

b. Template two

PUT _template/template_test
{
  "index_patterns": ["test*"],
  "settings": {
    "number_of_shards": 2,
    "number_of_replicas": 3
  },
  "mappings": {
    "date_detection": false,
    "numeric_detection": true
  },
  "version": 1,
  "order": 1
}

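This template turns date detection off and numeric detection on. For what these two options mean, see "Dynamic Mapping: Automatic type detection" above.

(3) Create new indices using the index templates

a. Matching template one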

PUT /her_index/_doc/1
{
  "somenumber": "2",
  "somedate": "2021-01-01"
}

b. View the settings

GET /her_index/_settings

Result:

{
  "her_index" : {
    "settings" : {
      "index" : {
        "creation_date" : "1574779910634",
        "number_of_shards" : "1",
        "number_of_replicas" : "2",
        "uuid" : "npnRfztwQDC5mAnWrYxijA",
        "version" : {
          "created" : "7040299"
        },
        "provided_name" : "her_index"
      }
    }
  }
}

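c. Matching template two (template one also matches, but template two has the higher order and overrides it)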

PUT /test_index/_doc/1
{
  "somenumber": "3",
  "somedate": "2022-01-01"
}

d. View the settings

GET /test_index/_settings

Result:

{
  "test_index" : {
    "settings" : {
      "index" : {
        "creation_date" : "1574779966828",
        "number_of_shards" : "2",
        "number_of_replicas" : "3",
        "uuid" : "EfTzD_cPTS6AVxK1a_kjVg",
        "version" : {
          "created" : "7040299"
        },
        "provided_name" : "test_index"
      }
    }
  }
}

e. View the mapping

GET /test_index/_mapping

Result:

{
  "test_index" : {
    "mappings" : {
      "date_detection" : false,
      "numeric_detection" : true,
      "properties" : {
        "somedate" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        },
        "somenumber" : {
          "type" : "long"
        }
      }
    }
  }
}