Elasticsearch: Mapping and Template

打分几楼MOTO

Last modified on 2022-04-02 17:56:15

First published on 2022-04-02 17:56:09

Article link: https://blog.csdn.net/qq_44447372/article/details/123878253

Mapping

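A mapping is similar to a schema definition in a database. It serves to:
- Define the names of the fields in an index
- Define each field's data type, for example string, number, boolean, and so on
- Configure how each field is indexed (Elasticsearch builds an inverted index for every field): analyzed or not analyzed, and which analyzer to use
A mapping converts JSON documents into the flat format that Lucene requires
A mapping belongs to a type within an index
- Each document belongs to one type
- Each type has one mapping definition
- Starting with 7.0, you no longer need to specify type information in a mapping definition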

Field data types

Simple types

Text / Keyword
Date
Integer / Floating point
Boolean
IPv4 & IPv6

Complex types

Object type / Nested type

Special types

geo_point & geo_shape / percolator

Dynamic Mapping

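When a document is written to an index that does not exist, the index is created automatically
Thanks to the Dynamic Mapping mechanism, we do not have to define mappings by hand. Elasticsearch infers the field types from the content of the document
The inference is sometimes wrong, for example a geo-location field may be inferred as text
When a type is wrong, some features do not work properly, for example range queries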

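Automatic type detection

JSON type	Elasticsearch type
String	If it matches a date format, mapped as Date; if it matches a number, mapped as float or long (this option is off by default); otherwise mapped as Text with a keyword sub-field
Boolean	Boolean
Floating-point number	float
Integer	long
Object	Object
Array	Determined by the type of the first non-null value
Null	Ignored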
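The detection rules in the table can be sketched in a few lines of Python. This is only an illustration of the table, not Elasticsearch's real implementation: the function name `infer_type` and the `DATE_FORMATS` list are assumptions made for this example, and the real date detection uses Elasticsearch's own configurable format list.

```python
import re
from datetime import datetime

# Assumed, simplified stand-in for the date formats Elasticsearch tries.
DATE_FORMATS = ("%Y-%m-%d", "%Y/%m/%d", "%Y-%m-%dT%H:%M:%S")


def looks_like_date(text):
    """Return True if the string parses with one of the assumed formats."""
    for fmt in DATE_FORMATS:
        try:
            datetime.strptime(text, fmt)
            return True
        except ValueError:
            continue
    return False


def infer_type(value, date_detection=True, numeric_detection=False):
    """Return the field type the table above assigns to a JSON value."""
    if value is None:
        return None  # null values are ignored, no field is added
    if isinstance(value, bool):  # test bool first: bool is a subclass of int
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "float"
    if isinstance(value, dict):
        return "object"
    if isinstance(value, list):
        # The first non-null element decides the type of the array field.
        for item in value:
            if item is not None:
                return infer_type(item, date_detection, numeric_detection)
        return None
    if isinstance(value, str):
        if date_detection and looks_like_date(value):
            return "date"
        if numeric_detection and re.fullmatch(r"[+-]?\d+", value):
            return "long"
        if numeric_detection and re.fullmatch(r"[+-]?\d*\.\d+", value):
            return "float"
        return "text"  # Elasticsearch also adds a keyword sub-field
    raise TypeError(f"unsupported JSON value: {value!r}")
```

Calling `infer_type("true")` returns `"text"` while `infer_type(False)` returns `"boolean"`, which is the same difference the tests below show for quoted and unquoted values.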

Test

//First test

PUT /mapping_test/_doc/1
{
"firstName":"Chan",
"lastName":"JackIE",
"loginDate":"2018-07-12"
}

//For quoted strings: if the value matches a date format the type is date, otherwise the type is text

GET /mapping_test/_mapping

{
"mapping_test" : {
"mappings" : {
"properties" : {
"firstName" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"lastName" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"loginDate" : {
"type" : "date"
}
}
}
}
}

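//Second test (the mapping shown below assumes mapping_test was deleted before this request; otherwise the fields from the first test would still appear)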
PUT /mapping_test/_doc/1
{
"uid":"123",
"isVip":false,
"isAdmin":"true",
"age":19,
"heigh":180
}

//isVip is boolean; uid is text; isAdmin is text; age and heigh are long

GET /mapping_test/_mapping

{
"mapping_test" : {
"mappings" : {
"properties" : {
"age" : {
"type" : "long"
},
"heigh" : {
"type" : "long"
},
"isAdmin" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"isVip" : {
"type" : "boolean"
},
"uid" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
}
}
}
}
}

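Can the type of a field in a mapping be changed?

There are two cases
- Adding a new field
  - With dynamic set to true, the mapping is updated as soon as a document containing a new field is written
  - With dynamic set to false, the mapping is not updated and the new field's data is not indexed, but the data still appears in _source
  - With dynamic set to strict, the document write fails
- For an existing field, once data has been written, the field definition can no longer be modified
  - The inverted index implemented by Lucene cannot be modified once it has been generated
- To change a field's type, you must rebuild the index with the Reindex API
Reason
- If the data type of a field were changed, the data already indexed could no longer be searched
- Adding a new field has no such effect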

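Controlling Dynamic Mappings

	true	false	strict
Document indexable	YES	YES	NO
Field indexable	YES	NO	NO
Mapping updated	YES	NO	NO

When dynamic is set to false and a document containing a new field is written, the document is indexed but the new field is not (it is kept only in _source)
When dynamic is set to strict, the write fails with an error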
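The three rows of the table can be modelled with a small Python sketch. It is a toy model under stated assumptions: `index_document` and `StrictDynamicMappingError` are names invented for this example, the mapping is a plain dict, and every new field is simply recorded as `text`.

```python
class StrictDynamicMappingError(Exception):
    """Stands in for strict_dynamic_mapping_exception in this toy model."""


def index_document(mapping, doc, dynamic="true"):
    """Simulate writing `doc` into an index that has `mapping`.

    Returns (source, indexed_fields). `mapping` is updated in place
    only when dynamic is "true".
    """
    if dynamic not in ("true", "false", "strict"):
        raise ValueError(f"unknown dynamic setting: {dynamic}")
    indexed = {}
    for field, value in doc.items():
        if field in mapping:
            indexed[field] = value       # known field: always searchable
        elif dynamic == "true":
            mapping[field] = "text"      # mapping is updated (placeholder type)
            indexed[field] = value
        elif dynamic == "false":
            continue                     # not indexed, kept only in _source
        else:
            raise StrictDynamicMappingError(field)  # the whole write fails
    return dict(doc), indexed
```

With `dynamic="false"` the returned source still contains the new field while `indexed_fields` does not, which is why a search on that field finds nothing.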
Test

PUT /dynamic_mapping_test/_doc/1
{
"newField":"someValue"
}

//Search on this field: the document is found

POST /dynamic_mapping_test/_search
{
"query": {
"match": {
"newField": "someValue"
}
}
}

//Change dynamic to false
PUT /dynamic_mapping_test/_mapping
{
"dynamic":"false"
}

//Write a document with a different new field: the write succeeds

PUT /dynamic_mapping_test/_doc/1
{
"otherField":"someValue"
}

//Search on otherField: no matching document is found

POST /dynamic_mapping_test/_search
{
"query": {
"match": {
"otherField": "someValue"
}
}
}

//The mapping of this index does not contain otherField

GET /dynamic_mapping_test/_mapping

{
"dynamic_mapping_test" : {
"mappings" : {
"dynamic" : "false",
"properties" : {
"newField" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
}
}
}
}
}

//Note: PUT /dynamic_mapping_test/_doc/1 replaces the whole document. To add a field you must send the existing fields as well; otherwise the document will contain only the new field

//Change dynamic to strict
PUT /dynamic_mapping_test/_mapping
{
"dynamic":"strict"
}

//Writing a document with a new field now fails with strict_dynamic_mapping_exception

PUT /dynamic_mapping_test/_doc/1
{
"otherField":"someValue"
}

Explicit mapping settings and common parameters

How to define a mapping explicitly

PUT /index
{
"mappings":{
//define your mappings here
}
}

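Tips for writing a custom mapping

You can write the mapping entirely by hand with the help of the API reference
To reduce typing and the chance of mistakes, you can follow these steps
- Create a temporary index and write some sample data into it
- Use the Mapping API to retrieve the dynamic mapping generated for that temporary index
- Modify the definition as needed, then use it to create your real index
- Delete the temporary index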
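The "modify the definition" step above is mostly mechanical: the response of GET <index>/_mapping wraps the definition in the index name, while the create-index request expects a top-level "mappings" key. A minimal sketch of that transformation, where the helper name `create_body_from_mapping` and the `overrides` parameter are assumptions for illustration:

```python
import copy


def create_body_from_mapping(response, temp_index, overrides=None):
    """Turn a GET <temp_index>/_mapping response into a create-index body.

    `overrides` maps a field name to the definition that should replace
    the dynamically generated one.
    """
    # Deep copy so the original response is left untouched.
    mappings = copy.deepcopy(response[temp_index]["mappings"])
    for field, definition in (overrides or {}).items():
        mappings.setdefault("properties", {})[field] = definition
    return {"mappings": mappings}


# Response shape as returned by GET tmp_user/_mapping (index name is hypothetical).
response = {
    "tmp_user": {
        "mappings": {
            "properties": {
                "mobile": {
                    "type": "text",
                    "fields": {"keyword": {"type": "keyword", "ignore_above": 256}},
                }
            }
        }
    }
}

body = create_body_from_mapping(
    response, "tmp_user", overrides={"mobile": {"type": "keyword"}}
)
```

The resulting `body` can then be sent as the request body of PUT /<your_index>.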

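Controlling whether a field is indexed

index: controls whether the field is indexed. The default is true. If it is set to false, the field cannot be searched
- Keeps sensitive data from being searchable
- Saves disk space (no inverted index is built for the field)

//Create the user index with an explicit mapping and set index to false for the mobile field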

PUT /user
{
"mappings": {
"properties": {
"firstName":{
"type": "text"
},
"lastName":{
"type": "text"
},
"mobile":{
"type": "text",
"index": false
}
}
}
}

//Add a document

PUT /user/_doc/1
{
"firstName":"111",
"lastName":"222",
"mobile":"2222"
}

//Search on the mobile field

POST /user/_search
{
"query": {
"match": {
"mobile": "2222"
}
}
}

Search result: "failed to create query: Cannot search on field [mobile] since it is not indexed."

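Index Options
- There are four levels of index_options, which control what is recorded in the inverted index
  - docs: records the doc id
  - freqs: records the doc id and term frequencies
  - positions: records the doc id, term frequencies, and term positions
  - offsets: records the doc id, term frequencies, term positions, and character offsets
- Text fields record positions by default; other types default to docs
- The more that is recorded, the more storage space is used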
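To make the four levels concrete, here is a toy inverted index in Python that records progressively more per posting. It is a sketch only: `build_postings` is a name made up for this example, tokens are split on whitespace and lowercased, and the storage layout has nothing to do with Lucene's actual on-disk format.

```python
import re

LEVELS = ("docs", "freqs", "positions", "offsets")


def build_postings(docs, index_options="positions"):
    """Build {term: {doc_id: entry}} where entry grows with index_options.

    docs: {doc_id: text}
    """
    level = LEVELS.index(index_options)  # raises ValueError for unknown values
    postings = {}
    for doc_id, text in docs.items():
        for position, match in enumerate(re.finditer(r"\S+", text)):
            term = match.group().lower()
            # "docs" level: only the fact that the term occurs in the document.
            entry = postings.setdefault(term, {}).setdefault(doc_id, {})
            if level >= 1:   # freqs: how often the term occurs
                entry["freq"] = entry.get("freq", 0) + 1
            if level >= 2:   # positions: token index of each occurrence
                entry.setdefault("positions", []).append(position)
            if level >= 3:   # offsets: character span of each occurrence
                entry.setdefault("offsets", []).append((match.start(), match.end()))
    return postings


docs = {1: "to be or not to be"}
for option in LEVELS:
    print(option, build_postings(docs, option)["to"])
```

Positions are what phrase queries rely on, and offsets are mainly useful for highlighting, which is why recording more costs more space but enables more features.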

PUT /user
{
"mappings": {
"properties": {
"firstName":{
"type": "text"
},
"lastName":{
"type": "text",
"index_options": "positions"
},
"mobile":{
"type": "text",
"index": false
}

}
}
}

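null_value
- Use it when you need to be able to search for null values
- text fields do not support null_value; keyword fields do (as do some other non-text types such as numeric, date, boolean, and ip)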
PUT /person
{
"mappings": {
"properties": {
"firstName":{
"type": "text"
},
"lastName":{
"type": "text"
},
"mobile":{
"type": "text",
"null_value": "Null"
}
}
}
}

Error message: "Mapping definition for [mobile] has unsupported parameters: [null_value : Null]"

PUT /person
{
"mappings": {
"properties": {
"firstName":{
"type": "text"
},
"lastName":{
"type": "text"
},
"mobile":{
"type": "keyword",
"null_value": "NULL"
}
}
}
}

//Add a document whose mobile field is null

PUT /person/_doc/1
{
"firstName":"nie",
"lastName":"peng",
"mobile":null
}

//Query on mobile: the document is returned

POST /person/_search
{
"query":{
"match": {
"mobile": "NULL"
}
}
}

Array type
- Elasticsearch has no dedicated array type, but any field can hold multiple values of the same type

PUT /person/_doc/2
{
"firstName":"nie",
"lastName":"peng",
"mobile":["111","222","333"]
}

//The field type has not changed

GET /person/_mapping

{
"person" : {
"mappings" : {
"properties" : {
"firstName" : {
"type" : "text"
},
"lastName" : {
"type" : "text"
},
"mobile" : {
"type" : "keyword",
"null_value" : "NULL"
}
}
}
}
}

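The copy_to setting
- In 7.x, _all is replaced by copy_to
- It serves certain specific search needs
- copy_to copies the values of a field into a target field, achieving an effect similar to _all
- The target field of copy_to does not appear in _source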

PUT users
{
"mappings": {
"properties": {
"firstName":{
"type": "text",
"copy_to": "fullName"
},
"lastName":{
"type": "text",
"copy_to": "fullName"
}
}
}
}

PUT users/_doc/1
{
"firstName":"Ruan",
"lastName":"Yiming"
}

GET /users/_search?q=fullName:(Ruan Yiming)

POST users/_search
{
"query": {
"match": {
"fullName": {
"query": "Ruan Yiming",
"operator": "and"
}
}
}
}

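Multi-fields

When defining a mapping by hand, you can add a sub-field to a field or specify an analyzer for it

Exact Values vs Full Text

Exact values: numbers, dates, and specific strings (for example "Apple Store")

keyword in Elasticsearch
Exact values do not need to be tokenized; an inverted index is built for each field
- Exact values need no special analysis at index time

Full text: unstructured text data
- text in Elasticsearch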
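Custom analysis

When the analyzers built into Elasticsearch do not meet your needs, you can define a custom analyzer by combining different components
- Character Filter
  - Processes the text before the tokenizer, for example by adding, removing, or replacing characters. Multiple character filters can be configured. They affect the position and offset information produced by the tokenizer
  - Some built-in character filters
    - html_strip: removes HTML tags
    - mapping: string replacement
    - pattern_replace: regular-expression replacement
- Tokenizer
  - Splits the original text into terms (tokens) according to a set of rules
  - Tokenizers built into Elasticsearch
    - whitespace (splits on whitespace) / standard / uax_url_email / pattern / keyword (no tokenization) / path_hierarchy (splits along file paths)
  - You can develop a plugin in Java to implement your own tokenizer
- Token Filter
  - Adds, modifies, or removes the terms output by the tokenizer
  - Built-in token filters
    - lowercase / stop (removes stop words) / synonym (adds synonyms)

Character filter set to html_strip, tokenizer set to keyword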

POST _analyze
{
"tokenizer": "keyword",
"char_filter": ["html_strip"],
"text":"<b>hello word</b>"
}

//Result

{
"tokens" : [
{
"token" : "hello word",
"start_offset" : 3,
"end_offset" : 17,
"type" : "word",
"position" : 0
}
]
}

//Character filter set to mapping, tokenizer set to standard

POST _analyze
{
"tokenizer": "standard",
"char_filter": [{
"type":"mapping",
"mappings": ["- => _"]
}],
"text":"123-456,I-test? test-990 650-555-1234"
}

//Result

{
"tokens" : [
{
"token" : "123_456",
"start_offset" : 0,
"end_offset" : 7,
"type" : "<NUM>",
"position" : 0
},
{
"token" : "I_test",
"start_offset" : 8,
"end_offset" : 14,
"type" : "<ALPHANUM>",
"position" : 1
},
{
"token" : "test_990",
"start_offset" : 16,
"end_offset" : 24,
"type" : "<ALPHANUM>",
"position" : 2
},
{
"token" : "650_555_1234",
"start_offset" : 25,
"end_offset" : 37,
"type" : "<NUM>",
"position" : 3
}
]
}

//Character filter set to mapping (emoticon replacement), tokenizer set to standard

POST _analyze
{
"tokenizer": "standard",
"char_filter": [{
"type":"mapping",
"mappings": [":) => happy",":( => sad"]
}],
"text":["I an felling :)","Feeling :( today "]
}

//Result

{
"tokens" : [
{
"token" : "I",
"start_offset" : 0,
"end_offset" : 1,
"type" : "<ALPHANUM>",
"position" : 0
},
{
"token" : "an",
"start_offset" : 2,
"end_offset" : 4,
"type" : "<ALPHANUM>",
"position" : 1
},
{
"token" : "felling",
"start_offset" : 5,
"end_offset" : 12,
"type" : "<ALPHANUM>",
"position" : 2
},
{
"token" : "happy",
"start_offset" : 13,
"end_offset" : 15,
"type" : "<ALPHANUM>",
"position" : 3
},
{
"token" : "Feeling",
"start_offset" : 16,
"end_offset" : 23,
"type" : "<ALPHANUM>",
"position" : 104
},
{
"token" : "sad",
"start_offset" : 24,
"end_offset" : 26,
"type" : "<ALPHANUM>",
"position" : 105
},
{
"token" : "today",
"start_offset" : 27,
"end_offset" : 32,
"type" : "<ALPHANUM>",
"position" : 106
}
]
}

//Regular expression (pattern_replace character filter)

GET _analyze
{
"tokenizer": "standard",
"char_filter": [{
"type":"pattern_replace",
"pattern": "http://(.*)",
"replacement":"$1"
}],
"text":"http://www.elastic.co"
}

//Result

{
"tokens" : [
{
"token" : "www.elastic.co",
"start_offset" : 0,
"end_offset" : 21,
"type" : "<ALPHANUM>",
"position" : 0
}
]
}

//Split along the path hierarchy

GET _analyze
{
"tokenizer": "path_hierarchy",
"text":"/user/a/b/c/d"
}

//Result

{
"tokens" : [
{
"token" : "/user",
"start_offset" : 0,
"end_offset" : 5,
"type" : "word",
"position" : 0
},
{
"token" : "/user/a",
"start_offset" : 0,
"end_offset" : 7,
"type" : "word",
"position" : 0
},
{
"token" : "/user/a/b",
"start_offset" : 0,
"end_offset" : 9,
"type" : "word",
"position" : 0
},
{
"token" : "/user/a/b/c",
"start_offset" : 0,
"end_offset" : 11,
"type" : "word",
"position" : 0
},
{
"token" : "/user/a/b/c/d",
"start_offset" : 0,
"end_offset" : 13,
"type" : "word",
"position" : 0
}
]
}

//whitespace tokenizer with the stop filter (the tokenizer name must be lowercase)
GET _analyze
{
"tokenizer": "whitespace",
"filter": ["stop"],
"text":"The rain in Spain falls mainly on the plain."
}

//Result: "The" is kept because the stop filter is case-sensitive

{
"tokens" : [
{
"token" : "The",
"start_offset" : 0,
"end_offset" : 3,
"type" : "word",
"position" : 0
},
{
"token" : "rain",
"start_offset" : 4,
"end_offset" : 8,
"type" : "word",
"position" : 1
},
{
"token" : "Spain",
"start_offset" : 12,
"end_offset" : 17,
"type" : "word",
"position" : 3
},
{
"token" : "falls",
"start_offset" : 18,
"end_offset" : 23,
"type" : "word",
"position" : 4
},
{
"token" : "mainly",
"start_offset" : 24,
"end_offset" : 30,
"type" : "word",
"position" : 5
},
{
"token" : "plain.",
"start_offset" : 38,
"end_offset" : 44,
"type" : "word",
"position" : 8
}
]
}

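//whitespace with lowercase and stop: once lowercase is added, "The" is removed as a stop word. The order of the filters matters: running stop before lowercase gives a different result from running lowercase before stop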
GET _analyze
{
"tokenizer": "whitespace",
"filter": ["lowercase","stop"],
"text":"The rain in Spain falls mainly on the plain."
}

{
"tokens" : [
{
"token" : "rain",
"start_offset" : 4,
"end_offset" : 8,
"type" : "word",
"position" : 1
},
{
"token" : "spain",
"start_offset" : 12,
"end_offset" : 17,
"type" : "word",
"position" : 3
},
{
"token" : "falls",
"start_offset" : 18,
"end_offset" : 23,
"type" : "word",
"position" : 4
},
{
"token" : "mainly",
"start_offset" : 24,
"end_offset" : 30,
"type" : "word",
"position" : 5
},
{
"token" : "plain.",
"start_offset" : 38,
"end_offset" : 44,
"type" : "word",
"position" : 8
}
]
}
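Why the filter order matters can be reproduced outside Elasticsearch with a few lines of Python. This is a simplified sketch, not the real analysis chain: the function names are invented for this example, the tokenizer is a plain whitespace split, and `STOP_WORDS` is a small assumed subset of the English stop word list.

```python
# Assumed subset of English stop words, all lowercase.
STOP_WORDS = {"a", "an", "and", "in", "on", "the"}


def whitespace_tokenize(text):
    """Split on whitespace only, so punctuation stays attached to words."""
    return text.split()


def lowercase(tokens):
    return [token.lower() for token in tokens]


def stop(tokens):
    """Case-sensitive removal: 'The' does not equal 'the'."""
    return [token for token in tokens if token not in STOP_WORDS]


def analyze(text, filters):
    """Apply the token filters strictly in the order given."""
    tokens = whitespace_tokenize(text)
    for token_filter in filters:
        tokens = token_filter(tokens)
    return tokens


text = "The rain in Spain falls mainly on the plain."
print(analyze(text, [stop]))             # 'The' survives
print(analyze(text, [lowercase, stop]))  # 'the' is removed
print(analyze(text, [stop, lowercase]))  # 'The' survives, then becomes 'the'
```

Only the `[lowercase, stop]` order removes the leading "The", because the stop list contains lowercase entries and the comparison is exact.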

Defining a custom analyzer for an index

PUT my_index
{
"settings": {
"analysis": {
"analyzer": {
//Name of the custom analyzer
"my_custom_analyzer":{
//The type is custom
"type":"custom",
//Uses the custom char_filter named emoticons
"char_filter":["emoticons"],
//Uses the custom tokenizer named punctuuation
"tokenizer":"punctuuation",
//Uses the built-in lowercase filter and the custom filter named english_stop
"filter":["lowercase","english_stop"]
}
},
//Define the punctuuation tokenizer here
"tokenizer": {
"punctuuation":{
"type":"pattern",
"pattern":"[ .,!?]"
}
},
//Define the emoticons char_filter here
"char_filter": {
"emoticons":{
"type":"mapping",
"mappings":[
":) => happy",":( => sad"
]
}
},
//Define the english_stop filter here
"filter": {
"english_stop":{
"type":"stop",
"stopwords":"_english_"
}
}
}
}
}

POST my_index/_analyze
{
"analyzer": "my_custom_analyzer",
"text":"I'm a :) person ,and you??"
}

{
"tokens" : [
{
"token" : "i'm",
"start_offset" : 0,
"end_offset" : 3,
"type" : "word",
"position" : 0
},
{
"token" : "happy",
"start_offset" : 6,
"end_offset" : 8,
"type" : "word",
"position" : 2
},
{
"token" : "person",
"start_offset" : 9,
"end_offset" : 15,
"type" : "word",
"position" : 3
},
{
"token" : "you",
"start_offset" : 21,
"end_offset" : 24,
"type" : "word",
"position" : 5
}
]
}
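To see what each stage of my_custom_analyzer contributes, the same three-stage pipeline can be imitated in plain Python. This is a rough sketch under assumptions: the function names are invented, offsets and positions are not tracked, and `ENGLISH_STOP_WORDS` is my understanding of the default `_english_` list rather than something read from a running cluster.

```python
import re

# Assumed content of the default English stop word list.
ENGLISH_STOP_WORDS = {
    "a", "an", "and", "are", "as", "at", "be", "but", "by", "for", "if",
    "in", "into", "is", "it", "no", "not", "of", "on", "or", "such", "that",
    "the", "their", "then", "there", "these", "they", "this", "to", "was",
    "will", "with",
}

EMOTICONS = {":)": "happy", ":(": "sad"}


def mapping_char_filter(text, mappings):
    """Stage 1 (char_filter): rewrite the raw text before tokenization."""
    for source, target in mappings.items():
        text = text.replace(source, target)
    return text


def pattern_tokenizer(text, pattern):
    """Stage 2 (tokenizer): split on the pattern and drop empty tokens."""
    return [token for token in re.split(pattern, text) if token]


def token_filters(tokens):
    """Stage 3 (filter): lowercase first, then remove stop words."""
    lowered = [token.lower() for token in tokens]
    return [token for token in lowered if token not in ENGLISH_STOP_WORDS]


def my_custom_analyzer(text):
    filtered_text = mapping_char_filter(text, EMOTICONS)
    tokens = pattern_tokenizer(filtered_text, r"[ .,!?]")
    return token_filters(tokens)


print(my_custom_analyzer("I'm a :) person ,and you??"))
```

The sketch yields the same four terms as the response above: the emoticon becomes "happy" before tokenization, and "a" and "and" are dropped by the stop filter after lowercasing.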

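Index Template

An Index Template helps you define mappings and settings and applies them automatically to newly created indices according to matching rules
- A template takes effect only when an index is newly created. Modifying a template does not affect indices that already exist
- You can define multiple index templates; their settings are merged together
- You can specify an "order" value to control the merging process

How an Index Template works

When an index is newly created
- Elasticsearch's default settings and mappings are applied
- The settings of matching index templates with a lower order value are applied
- The settings of matching index templates with a higher order value are applied, overriding the earlier ones
- The settings and mappings the user specified when creating the index are applied, overriding the template settings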
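The merge order described above can be sketched as a small Python function. It is an illustrative model only: `resolve_settings` is a name invented here, the `defaults` dict is an assumption, and real templates also merge mappings and aliases, which this sketch ignores.

```python
from fnmatch import fnmatchcase


def resolve_settings(index_name, defaults, templates, request_settings=None):
    """Return the effective settings for a newly created index.

    Order of application: defaults, matching templates from low to high
    `order`, then the settings sent with the create-index request.
    """
    settings = dict(defaults)
    matching = [
        template for template in templates
        if any(fnmatchcase(index_name, pattern)
               for pattern in template["index_patterns"])
    ]
    for template in sorted(matching, key=lambda t: t.get("order", 0)):
        settings.update(template.get("settings", {}))   # higher order wins
    settings.update(request_settings or {})             # explicit request wins last
    return settings


# Assumed defaults for illustration.
defaults = {"number_of_shards": 1, "number_of_replicas": 1}
templates = [
    {"index_patterns": ["test*"], "order": 1,
     "settings": {"number_of_shards": 1, "number_of_replicas": 2}},
    {"index_patterns": ["*"], "order": 0,
     "settings": {"number_of_shards": 1, "number_of_replicas": 1}},
]

print(resolve_settings("testtemplate", defaults, templates))
print(resolve_settings("testmy", defaults, templates, {"number_of_replicas": 5}))
```

An index whose name starts with test ends up with 2 replicas from the higher-order template, unless the create request sets its own value, in which case the request wins.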

//Define an index template named template_defalut

PUT _template/template_defalut
{
//Every newly created index follows the definition below: 1 primary shard and 1 replica
"index_patterns": ["*"],
"order": 0,
"version": 1,
"settings": {
"number_of_shards": 1,
"number_of_replicas": 1
}
}

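//Define an index template named template_test

PUT _template/template_test
{
//Indices whose names start with test follow the definition below: 1 primary shard, 2 replicas, strings that look like dates are not converted automatically, and strings that contain a number are mapped to long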
"index_patterns": ["test*"],
"order": 1,
"version": 1,
"settings": {
"number_of_shards": 1,
"number_of_replicas": 2
},
"mappings": {
"date_detection":false,
"numeric_detection": true
}
}

//View the template
GET /_template/template_defalut
//Use a wildcard to view all matching templates
GET /_template/temp*

//Write a new document
PUT testtemplate/_doc/1
{
"someNumber":"1",
"someDate":"2019/01/01"
}

//View the mapping
GET testtemplate/_mapping

//The result follows the Index Template rules described above

{
"testtemplate" : {
"mappings" : {
"date_detection" : false,
"numeric_detection" : true,
"properties" : {
"someDate" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"someNumber" : {
"type" : "long"
}
}
}
}
}

//The settings also follow the Index Template rules described above
//View the settings
GET testtemplate/_settings

{
"testtemplate" : {
"settings" : {
"index" : {
"creation_date" : "1648890629178",
"number_of_shards" : "1",
"number_of_replicas" : "2",
"uuid" : "qAzSpbNkSsuTwgFdbGC5NQ",
"version" : {
"created" : "7080099"
},
"provided_name" : "testtemplate"
}
}
}
}

//Specify settings explicitly when creating the index
PUT testmy
{
"settings":{
"number_of_replicas": 5
}
}

//Add a document

PUT testmy/_doc/1
{
"key":"value"
}

//View the settings of this index

GET testmy/_settings

{
"testmy" : {
"settings" : {
"index" : {
"creation_date" : "1648890784839",
"number_of_shards" : "1",
"number_of_replicas" : "5",
"uuid" : "Jn0moCWYRZWBKwWC_brT2A",
"version" : {
"created" : "7080099"
},
"provided_name" : "testmy"
}
}
}
}

//View the mapping of this index

GET testmy/_mapping

{
"testmy" : {
"mappings" : {
"date_detection" : false,
"numeric_detection" : true,
"properties" : {
"key" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
}
}
}
}
}

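Dynamic Template

The index template described earlier applies to every index that matches its pattern; a dynamic template applies to one specific index

Based on the data type Elasticsearch detects, combined with the field name, it sets the field type dynamically
- Map all string values to keyword, or turn off the keyword sub-field
- Map fields whose names start with is to boolean
- Map fields whose names start with long_ to long

A dynamic template is defined in the mapping of a specific index
Each template has a name
The matching rules form an array
It sets the mapping for the fields that match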
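The matching logic can be pictured as "walk the array and take the first rule that fits". The Python sketch below is a simplified model under assumptions: `resolve_dynamic_type` and `json_type` are names invented for this example, only `match_mapping_type` and `match` are handled, and date or numeric detection on strings is left out.

```python
from fnmatch import fnmatchcase


def json_type(value):
    """Name of the detected JSON type, as used by match_mapping_type."""
    if isinstance(value, bool):   # test bool before int
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "double"
    if isinstance(value, str):
        return "string"
    if isinstance(value, dict):
        return "object"
    raise TypeError(f"unsupported JSON value: {value!r}")


def resolve_dynamic_type(field, value, dynamic_templates):
    """Return the mapped type from the first matching template, else None."""
    detected = json_type(value)
    for entry in dynamic_templates:
        rule = next(iter(entry.values()))   # each entry is {name: rule}
        if "match_mapping_type" in rule and rule["match_mapping_type"] != detected:
            continue
        if "match" in rule and not fnmatchcase(field, rule["match"]):
            continue
        return rule["mapping"]["type"]
    return None   # fall back to ordinary dynamic mapping


dynamic_templates = [
    {"string_as_boolean": {"match_mapping_type": "string", "match": "is*",
                           "mapping": {"type": "boolean"}}},
    {"string_as_keywords": {"match_mapping_type": "string",
                            "mapping": {"type": "keyword"}}},
]

print(resolve_dynamic_type("isVIP", "true", dynamic_templates))
print(resolve_dynamic_type("firstName", "Ruan", dynamic_templates))
```

Because the first match wins, the more specific rule (fields starting with is) has to come before the catch-all string rule.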

PUT my_index
{
"mappings": {

//Map every field under name except middle as text and copy its value to full_name
"dynamic_templates":[
{
"full_name":{
"path_match":"name.*",
"path_unmatch":"*.middle",
"mapping":{
"type":"text",
"copy_to": "full_name"
}
}
}
]
}
}

//Add a document
PUT my_index/_doc/1
{
"name":{
"first":"John",
"middle":"Winston",
"last":"Lennon"
}
}

//Search through full_name

GET my_index/_search?q=full_name:John

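//my_index already exists from the previous example, so delete it before recreating it with new dynamic templates
DELETE my_index

PUT my_index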
{
"mappings": {
"dynamic_templates":[
{
"string_as_boolean":{
//possible values are [object, string, long, double, boolean, date, binary]
"match_mapping_type":"string",
"match":"is*",
"mapping":{
"type":"boolean"
}
}
},
{
"string_as_keywords":{
"match_mapping_type":"string",
"mapping":{
"type":"keyword"
}
}
}
]
}
}

PUT my_index/_doc/1
{
"firstName":"Ruan",
"isVIP":"true"
}

GET my_index/_mapping

{
"my_index" : {
"mappings" : {
"dynamic_templates" : [
{
"string_as_boolean" : {
"match" : "is*",
"match_mapping_type" : "string",
"mapping" : {
"type" : "boolean"
}
}
},
{
"string_as_keywords" : {
"match_mapping_type" : "string",
"mapping" : {
"type" : "keyword"
}
}
}
],
"properties" : {
"firstName" : {
"type" : "keyword"
},
"isVIP" : {
"type" : "boolean"
}
}
}
}
}