Elasticsearch Indices

半新半旧

Last modified on 2024-06-15 15:34:20

Column: Elasticsearch Study Notes; Tags: elasticsearch, search engine

First published on 2024-06-14 20:52:41

Article link: https://blog.csdn.net/wangqiang996/article/details/139689883

Mappings

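ES can define the "table structure" for us automatically, but in many cases we need to design it ourselves according to the business scenario. In ES, a mapping is the process of defining how a document and the fields it contains are stored and indexed. For example, we can use a mapping to define:

which string fields should be treated as full-text fields
which fields contain numbers, dates, or geolocations
the format of date values
custom rules that control the mapping of dynamically added fields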

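Mapping types

Each index has one mapping type, which determines how its documents are indexed. (This applies to Elasticsearch 6.x, which the examples in this article use; mapping types were deprecated in 7.0 and removed in 8.0.)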

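A mapping type has:

Meta-fields (meta_fields): used to customize how a document's associated metadata is handled, for example the document's _index, _type, _id, and _source fields.
Fields or properties: the mapping type contains a list of the fields or properties that belong to the document.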

Field data types:

simple types such as text, keyword, date, long, double, boolean, or ip
types that support the hierarchical nature of JSON, such as object or nested
or specialized types such as geo_point, geo_shape, or completion

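It is often useful to index the same field in different ways for different purposes. For example, a string field can be indexed as a text field for full-text search, and also as a keyword field for sorting or aggregations.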
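To make the text vs. keyword distinction concrete, here is a minimal Python sketch. It assumes a drastically simplified "analyzer" (lowercase, split into word tokens) standing in for Elasticsearch's standard analyzer; all function names are hypothetical.

```python
import re


def analyze_text(value: str) -> list[str]:
    """Rough stand-in for a full-text analyzer: lowercase and split into word tokens."""
    return re.findall(r"\w+", value.lower())


def analyze_keyword(value: str) -> list[str]:
    """A keyword field indexes the whole value as one unmodified term."""
    return [value]


def index_multi_field(value: str) -> dict[str, list[str]]:
    """Index the same value twice: as text (for search) and as keyword (for sorting/aggregations)."""
    return {"text": analyze_text(value), "keyword": analyze_keyword(value)}


print(index_multi_field("Quick Brown Fox"))
# {'text': ['quick', 'brown', 'fox'], 'keyword': ['Quick Brown Fox']}
```

A search for `fox` would hit the text terms, while an exact match, a sort, or a terms aggregation would work on the single keyword term.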

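Mapping limits

Defining too many fields in an index can lead to a mapping explosion, which can cause out-of-memory errors and situations that are difficult to recover from. To prevent this, the following settings limit the number of field mappings that can be created, whether manually or dynamically:

index.mapping.total_fields.limit: the maximum number of fields in an index. Field and object mappings, as well as field aliases, count towards this limit. The default is 1000.
index.mapping.depth.limit: the maximum depth of a field, measured as the number of inner objects. For example, if all fields are defined at the root object level, the depth is 1; if there is one object mapping, the depth is 2, and so on. The default is 20.
index.mapping.nested_fields.limit: the maximum number of nested fields in an index; the default is 50. Indexing one document containing 100 nested objects actually indexes 101 documents, because each nested document is indexed as a separate hidden document.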
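The counting rules above can be sketched in Python. This is a simplified model, not Elasticsearch code: `mapping_stats` and `hidden_doc_count` are hypothetical helpers, every entry under `properties` (object fields included) counts as one field, and multi-fields and aliases are not modeled.

```python
TOTAL_FIELDS_LIMIT = 1000  # default of index.mapping.total_fields.limit
DEPTH_LIMIT = 20           # default of index.mapping.depth.limit


def mapping_stats(properties: dict, depth: int = 1) -> tuple[int, int]:
    """Return (field_count, max_depth) for a mapping's "properties" dict."""
    count, max_depth = 0, depth
    for spec in properties.values():
        count += 1  # leaf fields and object fields both count
        inner = spec.get("properties")
        if inner:  # an object mapping adds one level of depth
            sub_count, sub_depth = mapping_stats(inner, depth + 1)
            count += sub_count
            max_depth = max(max_depth, sub_depth)
    return count, max_depth


def hidden_doc_count(nested_objects: int) -> int:
    """Root document plus one hidden Lucene document per nested object."""
    return 1 + nested_objects


props = {
    "name": {"type": "text"},
    "info": {"properties": {"addr": {"type": "text"}, "tel": {"type": "text"}}},
}
fields, depth = mapping_stats(props)
print(fields, depth)           # 4 2
print(hidden_doc_count(100))   # 101
print(fields <= TOTAL_FIELDS_LIMIT and depth <= DEPTH_LIMIT)  # True
```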

A simple mapping example

PUT mapping_test1
{
  "mappings": {
    "test1":{
      "properties":{
        "name":{"type": "text"},
        "age":{"type":"long"}
      }
    }
  }
}

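In the example above, while creating the index mapping_test1, we customize its mapping (that is, we design the table structure) by adding a mapping type test1; all fields (properties) are specified inside properties.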

View it with GET:

GET mapping_test1
{
  "mapping_test1" : {
    "aliases" : { },
    "mappings" : {
      "test1" : {
        "properties" : {
          "age" : {
            "type" : "long"
          },
          "name" : {
            "type" : "text"
          }
        }
      }
    },
    "settings" : {
      "index" : {
        "creation_date" : "1550469220778",
        "number_of_shards" : "5",
        "number_of_replicas" : "1",
        "uuid" : "7I_m_ULRRXGzWcvhIZoxnQ",
        "version" : {
          "created" : "6050499"
        },
        "provided_name" : "mapping_test1"
      }
    }
  }
}

Add data to the index:

PUT mapping_test1/test1/1
{
  "name":"张开嘴",
  "age":16
}

Query:

GET mapping_test1/test1/_search
{
  "query": {
    "match": {
      "age": 16
    }
  }
}

Sample response:

{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 1,
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "mapping_test1",
        "_type" : "test1",
        "_id" : "1",
        "_score" : 1.0,
        "_source" : {
          "name" : "张开嘴",
          "age" : 16
        }
      }
    ]
  }
}

The three states of dynamic in mappings

Introduction

A mapping can be dynamic (dynamic mapping), static (explicit mapping), or strict (strict mapping); which one applies is controlled by the dynamic property.

Dynamic mapping (dynamic: true)

Suppose we have the following index:

PUT m1
{
  "mappings": {
    "doc":{
      "properties": {
        "name": {
          "type": "text"
        },
        "age": {
          "type": "long"
        }
      }
    }
  }
}

View the mappings:

{
  "m1" : {
    "mappings" : {
      "doc" : {
        "dynamic" : "true",
        "properties" : {
          "age" : {
            "type" : "long"
          },
          "name" : {
            "type" : "text"
          }
        }
      }
    }
  }
}

Add a document that includes a new field, sex:

PUT m1/doc/1
{
  "name": "小黑",
  "age": 18,
  "sex": "不详"
}

Querying on the new field works fine:

GET m1/doc/_search
{
  "query": {
    "match": {
      "sex": "不详"
    }
  }
}

Result:

{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 1,
    "max_score" : 0.5753642,
    "hits" : [
      {
        "_index" : "m1",
        "_type" : "doc",
        "_id" : "1",
        "_score" : 0.5753642,
        "_source" : {
          "name" : "小黑",
          "age" : 18,
          "sex" : "不详"
        }
      }
    ]
  }
}

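As shown above, the new field causes no problem. When Elasticsearch encounters a field in a document that it has not seen before, it uses dynamic mapping to determine the field's data type and automatically adds the new field to the type mapping. The mappings now look like this: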

{
  "m1" : {
    "mappings" : {
      "doc" : {
        "dynamic" : "true",
        "properties" : {
          "age" : {
            "type" : "long"
          },
          "name" : {
            "type" : "text"
          },
          "sex" : {
            "type" : "text",
            "fields" : {
              "keyword" : {
                "type" : "keyword",
                "ignore_above" : 256
              }
            }
          }
        }
      }
    }
  }
}

The example shows that Elasticsearch added a mapping for sex on our behalf. This is because when dynamic is true, Elasticsearch allows new fields to be added, and dynamic: true is the default.

In other words, this is equivalent to creating the index as follows:

PUT m1
{
  "mappings": {
    "doc":{
      "dynamic":true,
      "properties": {
        "name": {
          "type": "text"
        },
        "age": {
          "type": "long"
        }
      }
    }
  }
}
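The type Elasticsearch picked for sex follows from the JSON value it saw. Below is a rough Python sketch of the default dynamic-mapping rules; `infer_dynamic_mapping` is a hypothetical helper, and it deliberately does not model date detection (date-like strings become date by default), numeric detection, arrays, or dynamic templates.

```python
def infer_dynamic_mapping(value):
    """Approximate the default JSON-value -> field-type rules of dynamic mapping."""
    if isinstance(value, bool):  # check bool before int: bool is a subclass of int in Python
        return {"type": "boolean"}
    if isinstance(value, int):
        return {"type": "long"}
    if isinstance(value, float):
        return {"type": "float"}
    if isinstance(value, dict):  # inner JSON object -> object field with its own properties
        return {"properties": {k: infer_dynamic_mapping(v) for k, v in value.items()}}
    if isinstance(value, str):
        # plain strings become text plus a keyword multi-field, as seen for sex above
        return {
            "type": "text",
            "fields": {"keyword": {"type": "keyword", "ignore_above": 256}},
        }
    raise TypeError(f"not modeled in this sketch: {value!r}")


print(infer_dynamic_mapping("不详")["type"])  # text
print(infer_dynamic_mapping(18))             # {'type': 'long'}
```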

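Note: once a field's mapping has been created, it cannot be changed (new fields can still be added, as shown above), because Lucene cannot modify an inverted index after it has been built.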

Static mapping (dynamic: false)

Create the index with dynamic set to false:

PUT m2
{
  "mappings": {
    "doc":{
      "dynamic":false,
      "properties": {
        "name": {
          "type": "text"
        },
        "age": {
          "type": "long"
        }
      }
    }
  }
}

Let's test how false differs from true:

PUT m2/doc/1
{
  "name": "小黑",
  "age":18
}
PUT m2/doc/2
{
  "name": "小白",
  "age": 16,
  "sex": "不详"
}

Query using sex as the condition:

GET m2/doc/_search
{
  "query": {
    "match": {
      "sex": "不详"
    }
  }
}

The result:

{
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 0,
    "max_score" : null,
    "hits" : [ ]
  }
}

The result is empty. Let's look at the index's mappings:

{
  "m2" : {
    "mappings" : {
      "doc" : {
        "dynamic" : "false",
        "properties" : {
          "age" : {
            "type" : "long"
          },
          "name" : {
            "type" : "text"
          }
        }
      }
    }
  }
}

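We can see that Elasticsearch did not create a mapping for the new sex field, so it cannot be queried.

When Elasticsearch detects a new field while dynamic is false, it ignores the field for indexing, but still stores it (in _source).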

Strict mapping (dynamic: strict)

Create the mappings with dynamic set to strict:

PUT m3
{
  "mappings": {
    "doc": {
      "dynamic": "strict", 
      "properties": {
        "name": {
          "type": "text"
        },
        "age": {
          "type": "long"
        }
      }
    }
  }
}

Add data:

PUT m3/doc/1
{
  "name": "小黑",
  "age": 18
}
PUT m3/doc/2
{
  "name": "小白",
  "age": 18,
  "sex": "不详"
}

Adding the second document fails with the following error:

{
  "error": {
    "root_cause": [
      {
        "type": "strict_dynamic_mapping_exception",
        "reason": "mapping set to strict, dynamic introduction of [sex] within [doc] is not allowed"
      }
    ],
    "type": "strict_dynamic_mapping_exception",
    "reason": "mapping set to strict, dynamic introduction of [sex] within [doc] is not allowed"
  },
  "status": 400
}

Summary

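Dynamic mapping (dynamic: true): new fields are added to the mapping automatically.
Static mapping (dynamic: false): new fields are ignored; no new mappings are added on top of the existing ones, and such fields only appear in the _source of search results (they cannot be searched).
Strict mapping (dynamic: strict): an exception is thrown when a new field is encountered.
In practice, static mapping is used most often; if the data structure never changes, strict can be used.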
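The three modes can be summarized with a small Python simulation. It is only a model of the behavior described above, not Elasticsearch's implementation: `index_document` and `StrictDynamicMappingError` are hypothetical names, and new fields are always mapped as "text" for simplicity.

```python
class StrictDynamicMappingError(Exception):
    """Stands in for Elasticsearch's strict_dynamic_mapping_exception."""


def index_document(mapping: dict, dynamic: str, doc: dict) -> dict:
    """Simulate how unknown fields in one document are handled.

    mapping: field name -> field type; mutated when dynamic == "true".
    dynamic: "true", "false" or "strict".
    """
    for field in doc:
        if field in mapping:
            continue
        if dynamic == "true":
            mapping[field] = "text"  # simplification: real inference looks at the JSON value
        elif dynamic == "strict":
            raise StrictDynamicMappingError(
                f"mapping set to strict, dynamic introduction of [{field}] is not allowed")
        # dynamic == "false": leave the mapping alone; the field stays only in _source
    searchable = {field for field in doc if field in mapping}
    return {"_source": dict(doc), "searchable": searchable}


doc = {"name": "小白", "age": 16, "sex": "不详"}
for mode in ("true", "false", "strict"):
    mapping = {"name": "text", "age": "long"}
    try:
        result = index_document(mapping, mode, doc)
        print(mode, sorted(result["searchable"]))
    except StrictDynamicMappingError as err:
        print(mode, "rejected:", err)
# true ['age', 'name', 'sex']
# false ['age', 'name']
# strict rejected: mapping set to strict, dynamic introduction of [sex] is not allowed
```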

Other mapping settings

index

Create an index whose mappings add an index attribute to each property:

PUT m4
{
  "mappings": {
    "doc": {
      "dynamic": false,
      "properties": {
        "name": {
          "type": "text",
          "index": true
        },
        "age": {
          "type": "long",
          "index": false
        }
      }
    }
  }
}

Add a document:

PUT m4/doc/1
{
  "name": "小黑",
  "age": 18
}

Now check the query behavior:

GET m4/doc/_search
{
  "query": {
    "match": {
      "name": "小黑"
    }
  }
}

GET m4/doc/_search
{
  "query": {
    "match": {
      "age": 18
    }
  }
}

Querying by name works, but querying by age fails:

{
  "error": {
    "root_cause": [
      {
        "type": "query_shard_exception",
        "reason": "failed to create query: {\n  \"match\" : {\n    \"age\" : {\n      \"query\" : 18,\n      \"operator\" : \"OR\",\n      \"prefix_length\" : 0,\n      \"max_expansions\" : 50,\n      \"fuzzy_transpositions\" : true,\n      \"lenient\" : false,\n      \"zero_terms_query\" : \"NONE\",\n      \"auto_generate_synonyms_phrase_query\" : true,\n      \"boost\" : 1.0\n    }\n  }\n}",
        "index_uuid": "GHBPeT5pRnSi3g6DkpIkow",
        "index": "m4"
      }
    ],
    "type": "search_phase_execution_exception",
    "reason": "all shards failed",
    "phase": "query",
    "grouped": true,
    "failed_shards": [
      {
        "shard": 0,
        "index": "m4",
        "node": "dhkqLLTsRemm7qEgRdpvTg",
        "reason": {
          "type": "query_shard_exception",
          "reason": "failed to create query: {\n  \"match\" : {\n    \"age\" : {\n      \"query\" : 18,\n      \"operator\" : \"OR\",\n      \"prefix_length\" : 0,\n      \"max_expansions\" : 50,\n      \"fuzzy_transpositions\" : true,\n      \"lenient\" : false,\n      \"zero_terms_query\" : \"NONE\",\n      \"auto_generate_synonyms_phrase_query\" : true,\n      \"boost\" : 1.0\n    }\n  }\n}",
          "index_uuid": "GHBPeT5pRnSi3g6DkpIkow",
          "index": "m4",
          "caused_by": {
            "type": "illegal_argument_exception",
            "reason": "Cannot search on field [age] since it is not indexed."
          }
        }
      }
    ]
  },
  "status": 400
}

Summary:

The index attribute defaults to true. If it is set to false, Elasticsearch does not index that field, so it cannot be used as a query condition.
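A toy inverted index in Python shows why age cannot be searched while the document itself is still stored. This is a hypothetical sketch (`TinyIndex` is not an Elasticsearch class): values are stored as single un-analyzed terms, and the error text simply mirrors the message shown above.

```python
from collections import defaultdict


class TinyIndex:
    """Toy inverted index: only fields mapped with index=True get postings."""

    def __init__(self, mapping: dict):
        self.mapping = mapping             # field -> {"type": ..., "index": bool}
        self.postings = defaultdict(set)   # (field, term) -> set of doc ids
        self.sources = {}                  # doc id -> original document (like _source)

    def _is_indexed(self, field: str) -> bool:
        spec = self.mapping.get(field)
        return spec is not None and spec.get("index", True)  # index defaults to true

    def add(self, doc_id, doc: dict) -> None:
        self.sources[doc_id] = doc  # the whole document is kept regardless
        for field, value in doc.items():
            if self._is_indexed(field):
                self.postings[(field, str(value))].add(doc_id)

    def term_query(self, field: str, value) -> set:
        if field in self.mapping and not self._is_indexed(field):
            raise ValueError(f"Cannot search on field [{field}] since it is not indexed.")
        return set(self.postings.get((field, str(value)), set()))


idx = TinyIndex({"name": {"type": "text", "index": True},
                 "age": {"type": "long", "index": False}})
idx.add(1, {"name": "小黑", "age": 18})
print(idx.term_query("name", "小黑"))  # {1}
try:
    idx.term_query("age", 18)
except ValueError as err:
    print(err)  # Cannot search on field [age] since it is not indexed.
```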

copy_to

This attribute lets us copy the values of several fields into a group field, which can then be queried as a single field.

PUT m5
{
  "mappings": {
    "doc": {
      "dynamic":false,
      "properties": {
        "first_name":{
          "type": "text",
          "copy_to": "full_name"
        },
        "last_name": {
          "type": "text",
          "copy_to": "full_name"
        },
        "full_name": {
          "type": "text"
        }
      }
    }
  }
}

PUT m5/doc/1
{
  "first_name":"tom",
  "last_name":"ben"
}
PUT m5/doc/2
{
  "first_name":"john",
  "last_name":"smith"
}

GET m5/doc/_search
{
  "query": {
    "match": {
      "first_name": "tom"
    }
  }
}

GET m5/doc/_search
{
  "query": {
    "match": {
      "full_name": "tom"
    }
  }
}

In this example, both first_name and last_name are copied into full_name, and querying full_name also returns the result:

{
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 1,
    "max_score" : 0.2876821,
    "hits" : [
      {
        "_index" : "m5",
        "_type" : "doc",
        "_id" : "1",
        "_score" : 0.2876821,
        "_source" : {
          "first_name" : "tom",
          "last_name" : "ben"
        }
      }
    ]
  }
}

Query for tom or smith:

GET m5/doc/_search
{
  "query": {
    "match": {
      "full_name": {
        "query": "tom smith",
        "operator": "or"
      }
    }
  }
}

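Put the search terms, separated by spaces, inside query. The operator parameter sets how the terms are combined: or, or alternatively and. Since or is the default, the query can also be shortened to: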

GET m5/doc/_search
{
  "query": {
    "match": {
      "full_name": "tom smith"
    }
  }
}
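The effect of copy_to together with the match operator can be simulated in Python. This sketch assumes a simplified analyzer (lowercase word tokens); `index_with_copy_to`, `match`, and the `rules` dict are hypothetical names, not Elasticsearch APIs.

```python
import re


def tokens(text: str) -> set[str]:
    """Simplified analyzer: lowercase word tokens."""
    return set(re.findall(r"\w+", text.lower()))


def index_with_copy_to(doc: dict, copy_to: dict) -> dict:
    """Return field -> token set, also copying each value into its copy_to targets."""
    indexed = {}
    for field, value in doc.items():
        indexed.setdefault(field, set()).update(tokens(value))
        for target in copy_to.get(field, []):
            # the value is copied into the target field; _source itself is not modified
            indexed.setdefault(target, set()).update(tokens(value))
    return indexed


def match(indexed: dict, field: str, query: str, operator: str = "or") -> bool:
    """"or": any query term matches; "and": all query terms must match."""
    terms, have = tokens(query), indexed.get(field, set())
    return terms <= have if operator == "and" else bool(terms & have)


rules = {"first_name": ["full_name"], "last_name": ["full_name"]}
docs = {1: {"first_name": "tom", "last_name": "ben"},
        2: {"first_name": "john", "last_name": "smith"}}
indexed = {i: index_with_copy_to(d, rules) for i, d in docs.items()}
print([i for i, d in indexed.items() if match(d, "full_name", "tom smith")])         # [1, 2]
print([i for i, d in indexed.items() if match(d, "full_name", "tom smith", "and")])  # []
```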

copy_to can also copy the same field value into several different fields:

PUT m6
{
  "mappings": {
    "doc": {
      "dynamic":false,
      "properties": {
        "first_name":{
          "type": "text",
          "copy_to": "full_name"
        },
        "last_name": {
          "type": "text",
          "copy_to": ["field1", "field2"]
        },
        "field1": {
          "type": "text"
        },
        "field2": {
          "type": "text"
        },
        "full_name": {
          "type": "text"
        }
      }
    }
  }
}


PUT m6/doc/1
{
  "first_name":"tom",
  "last_name":"ben"
}
PUT m6/doc/2
{
  "first_name":"john",
  "last_name":"smith"
}

In this example, you simply list the copy_to targets as an array; the document can then be found by querying either field1 or field2.

Summary:

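copy_to copies the field's value, not the field (mapping) itself.
If a text copy_to target is used for aggregations, set fielddata to true on it.
To copy a value into several fields, use an array, for example: "copy_to": ["field1", "field2"]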

Object properties

Example:

PUT m7/doc/1
{
  "name":"tom",
  "age":18,
  "info":{
    "addr":"北京",
    "tel":"10010"
  }
}

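Design the mappings structure (create the index with this mapping before indexing the document above, because creating an index that already exists fails):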

PUT m7
{
  "mappings": {
    "doc": {
      "dynamic": false,
      "properties": {
        "name": {
          "type": "text"
        },
        "age": {
          "type": "text"
        },
        "info": {
          "properties": {
            "addr": {
              "type": "text"
            },
            "tel": {
              "type" : "text"
            }
          }
        }
      }
    }
  }
}

Query using the tel field inside info:

GET m7/doc/_search
{
  "query": {
    "match": {
      "info.tel": "10010"
    }
  }
}
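The dotted name info.tel works because inner objects are indexed as a flat list of key-value pairs with dotted field paths. A minimal Python sketch of that flattening (the `flatten` helper is hypothetical, not part of any client library):

```python
def flatten(doc: dict, prefix: str = "") -> dict:
    """Flatten inner objects into dotted field paths such as info.tel."""
    flat = {}
    for key, value in doc.items():
        path = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, path + "."))  # recurse into the inner object
        else:
            flat[path] = value
    return flat


doc = {"name": "tom", "age": 18, "info": {"addr": "北京", "tel": "10010"}}
print(flatten(doc))
# {'name': 'tom', 'age': 18, 'info.addr': '北京', 'info.tel': '10010'}
```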

settings (configuring primary and replica shards)

When creating an index, we can specify shard information in settings:

PUT s1
{
  "mappings": {
    "doc": {
      "properties": {
        "name": {
          "type": "text"
        }
      }
    }
  }, 
  "settings": {
    "number_of_replicas": 1,
    "number_of_shards": 5
  }
}

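number_of_shards is the number of primary shards (in Elasticsearch 6.x each index has 5 primary shards by default; since 7.0 the default is 1), and number_of_replicas is the number of replica shards per primary; by default each primary shard has one replica.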
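The total number of shards an index occupies follows directly from these two settings. A one-line Python sketch (the `total_shards` helper is hypothetical):

```python
def total_shards(number_of_shards: int, number_of_replicas: int) -> int:
    """Each primary shard plus number_of_replicas copies of it."""
    return number_of_shards * (1 + number_of_replicas)


print(total_shards(5, 1))  # 10, for index s1 above
print(total_shards(2, 1))  # 4
```

Because a replica is never allocated on the same node as its primary, number_of_replicas: 1 needs at least two nodes before all shards can be assigned.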

Mappings parameters

ignore_above

Strings longer than the ignore_above setting will not be indexed or stored (the original value is still kept in _source).

PUT w1
{
  "mappings": {
    "doc":{
      "properties":{
        "t1":{
          "type":"keyword",
          "ignore_above": 5
        },
        "t2":{
          "type":"keyword",
          "ignore_above": 10   ①
        }
      }
    }
  }
}
PUT w1/doc/1
{
  "t1":"elk",          ②
  "t2":"elasticsearch"  ③
}
GET w1/doc/_search   ④
{
  "query":{
    "term": {
      "t1": "elk"
    }
  }
}

GET w1/doc/_search  ⑤
{
  "query": {
    "term": {
      "t2": "elasticsearch"
    }
  }
}

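①: This field ignores any string longer than 10 characters.

②: This value is indexed successfully, so it can be queried and returns results.

③: This value is not indexed, so using this field as a query condition returns no results.

④: Returns results.

⑤: Returns no results.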
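The rule behind ② to ⑤ can be sketched in Python for the w1 example. `keyword_terms` is a hypothetical helper; it treats "longer than" as strictly greater, so a value exactly ignore_above characters long is still indexed.

```python
def keyword_terms(value: str, ignore_above: int) -> list[str]:
    """A keyword value longer than ignore_above characters produces no term."""
    return [value] if len(value) <= ignore_above else []


mapping = {"t1": 5, "t2": 10}  # field -> ignore_above, as in index w1
doc = {"t1": "elk", "t2": "elasticsearch"}
for field, value in doc.items():
    print(field, keyword_terms(value, mapping[field]))
# t1 ['elk']
# t2 []
```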

This parameter is also useful for protecting against Lucene's term byte-length limit, which is 32766 bytes.

Note that the ignore_above setting can be updated on existing fields using the PUT mapping API: https://www.elastic.co/guide/en/elasticsearch/reference/7.0/indices-put-mapping.html

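The value of ignore_above is a character count, but Lucene counts bytes. If you use UTF-8 text with many non-ASCII characters, you may want to set the limit to 32766 / 4 = 8191, since a UTF-8 character can occupy at most 4 bytes.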
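The character-vs-byte gap is easy to see in Python, where `len()` counts characters and `.encode("utf-8")` gives the byte form. The constants below are just the two numbers from the paragraph above.

```python
LUCENE_MAX_TERM_BYTES = 32766
MAX_UTF8_BYTES_PER_CHAR = 4

safe_ignore_above = LUCENE_MAX_TERM_BYTES // MAX_UTF8_BYTES_PER_CHAR
print(safe_ignore_above)  # 8191

for s in ["elk", "北京", "😀"]:
    # characters vs. UTF-8 bytes
    print(s, len(s), len(s.encode("utf-8")))
# elk 3 3
# 北京 2 6
# 😀 1 4
```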

Looking at the example above, note that the fields are of type keyword: the ignore_above parameter only applies to keyword fields.

If the field is of type text, ignore_above cannot be set on it directly, but it can be set on a keyword sub-field:

PUT w2
{
  "mappings": {
    "doc":{
      "properties":{
        "t1":{
          "type":"keyword",
          "ignore_above":5
        },
        "t2":{
          "type":"text",
          "fields":{
            "keyword":{
              "type":"keyword",
              "ignore_above": 10
            }
          }
        }
      }
    }
  }
}

PUT w2/doc/1
{
  "t1":"beautiful",
  "t2":"beautiful girl"
}

GET w2/doc/_search  ①
{
  "query": {
    "term": {
      "t1": {
        "value": "beautiful"
      }
    }
  }
}

GET w2/doc/_search  ②
{
  "query": {
    "term": {
      "t2": "beautiful"
    }
  }
}

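1. Returns no results ("beautiful" is 9 characters, longer than t1's ignore_above of 5).

2. Returns results, because t2 is a text field.

In other words, once the field type is text, the ignore_above limit no longer applies to the field itself; it only affects the t2.keyword sub-field.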
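A short Python sketch of what gets indexed for t2 in index w2. It assumes a simplified analyzer (lowercase word tokens) and a hypothetical helper `index_w2_t2`; the point is that the text field and its keyword sub-field are indexed independently.

```python
import re


def index_w2_t2(value: str, ignore_above: int = 10) -> dict:
    """Index a text field with a keyword sub-field, like t2 in index w2."""
    return {
        "t2": re.findall(r"\w+", value.lower()),  # text: analyzed, no ignore_above
        "t2.keyword": [value] if len(value) <= ignore_above else [],  # keyword sub-field
    }


print(index_w2_t2("beautiful girl"))
# {'t2': ['beautiful', 'girl'], 't2.keyword': []}
```

So a term query for beautiful on t2 matches, while a term query on t2.keyword for the 14-character value beautiful girl would return nothing.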

Index templates

Introduction

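Index templates let us define templates that are applied automatically when new indices are created. A template includes settings and mappings, plus a simple pattern that controls whether the template should be applied to a new index.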

Why do we need index templates?

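In development, a large share of Elasticsearch's work is processing log data. A typical strategy is to create a new log index every day, where each day's index has the same mapping and settings and only the index name changes. Creating each day's index by hand would be tedious. To solve problems like this, Elasticsearch can create indices from a predefined template, called an Index Template, so that similar indices reuse the same template.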

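Templates are only applied when an index is created. Changing a template has no effect on existing indices. When using the create index API, the settings/mappings defined as part of the create index call take precedence over any matching settings/mappings defined in the template.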

Create an index template

PUT _template/2019
{
  "index_patterns": ["20*", "product1*"],   ①
  "settings":{   ②
    "number_of_shards": 2,
    "number_of_replicas": 1
  },
  "mappings":{  ③
    "doc":{
      "properties":{
        "ip":{
          "type":"keyword"
        },
        "method":{
          "type": "keyword"
        }
      }
    }
  }
}

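index_patterns is the list of index patterns: the template is used whenever an index whose name starts with 20 or product1 is created.
In settings, we give such indices 2 primary shards and 1 replica.
mappings specifies the field mappings.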
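Pattern matching can be approximated in Python with `fnmatch.fnmatchcase`, where `*` matches any run of characters. This is an approximation: fnmatch also treats `?` and `[...]` specially, which Elasticsearch index patterns do not; `matching_templates` is a hypothetical helper.

```python
from fnmatch import fnmatchcase


def matching_templates(index_name: str, templates: dict) -> list[str]:
    """Return names of templates whose index_patterns match index_name."""
    return [name for name, patterns in templates.items()
            if any(fnmatchcase(index_name, p) for p in patterns)]


templates = {"2019": ["20*", "product1*"]}
for name in ["20190101", "product1_log", "logs"]:
    print(name, matching_templates(name, templates))
# 20190101 ['2019']
# product1_log ['2019']
# logs []
```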

Viewing index templates

View the template we just created:

GET _template/2019

We can also view several templates at once, using wildcards or a comma-separated list of names:

GET /_template/temp*
GET /_template/template_1,template_2

List all available templates:

GET /_template

Check whether a template exists:

HEAD _template/2019

Using index templates

Based on the template created above, create indices and add data:

PUT 20190101/doc/1
{
  "ip": "127.0.0.1",
  "method":"GET"
}

PUT 20190102/doc/2
{
  "ip":"192.168.1.1",
  "method":"POST"
}

PUT product1_log/doc/1
{
  "ip":"127.0.0.1",
  "method":"GET"
}

The example above automatically creates three indices from the template: 20190101, 20190102, and product1_log.

Query the data:

GET 2019*/doc/_search
{
  "query": {
    "match_all": {}
  }
}

View the index information:

GET 20190101
# Result:
{
  "20190101" : {
    "aliases" : { },
    "mappings" : {
      "doc" : {
        "properties" : {
          "ip" : {
            "type" : "keyword"
          },
          "method" : {
            "type" : "keyword"
          }
        }
      }
    },
    "settings" : {
      "index" : {
        "creation_date" : "1557889480975",
        "number_of_shards" : "2",
        "number_of_replicas" : "1",
        "uuid" : "FEuyT5aoTnGP3k_7hCtQFA",
        "version" : {
          "created" : "6050499"
        },
        "provided_name" : "20190101"
      }
    }
  }
}

Matching multiple templates

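Multiple index templates can apply to the same index; the order in which they are merged is determined by the order parameter (lower order is applied first, and higher order overrides it). For example, consider the following index template: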

PUT _template/2018_1
{
  "index_patterns": ["2018*"],
  "order":0,
  "settings":{
    "number_of_shards": 2
  }
}

This template applies to every index whose name starts with 2018, sets the number of primary shards to 2, and has order 0.

PUT 2018010101/doc/1
{
  "method":"GET"
}
GET 2018010101/_settings
# Result:
{
  "2018010101" : {
    "settings" : {
      "index" : {
        "creation_date" : "1557900456281",
        "number_of_shards" : "2",
        "number_of_replicas" : "1",
        "uuid" : "P53RDmT6RRCDY2DlHtpymg",
        "version" : {
          "created" : "6050499"
        },
        "provided_name" : "2018010101"
      }
    }
  }
}

As shown above, index 2018010101 has 2 primary shards, so the index template was applied successfully.

Next, create another index template:

PUT _template/2018_2
{
  "index_patterns": ["201802*"],
  "order":1,
  "settings":{
    "number_of_shards": 3
  }
}

This template applies to indices whose names start with 201802, sets the number of primary shards to 3, and has order 1.

Now create another index:

PUT 20180201/doc/1
{
  "method": "POST"
}
GET 20180201/_settings
# Result:
{
  "20180201" : {
    "settings" : {
      "index" : {
        "creation_date" : "1557901225020",
        "number_of_shards" : "3",
        "number_of_replicas" : "1",
        "uuid" : "-B8_ZiK7QgesmGSzD_8xlQ",
        "version" : {
          "created" : "6050499"
        },
        "provided_name" : "20180201"
      }
    }
  }
}
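Index 20180201 matches both 2018* (order 0, 2 shards) and 201802* (order 1, 3 shards), and ends up with 3 primary shards. The merge can be sketched in Python under simplifying assumptions: only flat settings are merged (Elasticsearch also merges mappings), `resolve_settings` is a hypothetical helper, and `fnmatchcase` approximates index-pattern matching. Settings passed in the create-index request itself win over every template, as described earlier.

```python
from fnmatch import fnmatchcase


def resolve_settings(index_name: str, templates: list[dict], request_settings=None) -> dict:
    """Merge settings of all matching templates: lower order first, higher order overrides."""
    matching = [t for t in templates
                if any(fnmatchcase(index_name, p) for p in t["index_patterns"])]
    merged = {}
    for t in sorted(matching, key=lambda t: t.get("order", 0)):
        merged.update(t.get("settings", {}))
    merged.update(request_settings or {})  # explicit create-index settings take precedence
    return merged


templates = [
    {"name": "2018_1", "index_patterns": ["2018*"], "order": 0, "settings": {"number_of_shards": 2}},
    {"name": "2018_2", "index_patterns": ["201802*"], "order": 1, "settings": {"number_of_shards": 3}},
]
print(resolve_settings("2018010101", templates))  # {'number_of_shards': 2}
print(resolve_settings("20180201", templates))    # {'number_of_shards': 3}
```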