1. Elasticsearch in Practice: Configuration

Latest recommended article published 2023-01-09 17:01:27

贝壳里的沙

Category: Big Data. Tags: es, ElasticSearch

Article link: https://blog.csdn.net/lixiang987654321/article/details/100983439

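I. Important Notes

Indices created in Elasticsearch 6.0.0 or later may contain only a single mapping type. Indices created in 5.x with multiple mapping types continue to work as before in Elasticsearch 6.x. At the time of writing, the official documentation stated that mapping types would be removed completely in Elasticsearch 7.0.0; in the event, 7.x deprecated types in the APIs and 8.0 removed them.

Official documentation: https://www.elastic.co/guide/en/elasticsearch/reference/current/removal-of-types.html

In short: 6.0.0 and later support only one mapping type per index, while multi-type indices created in 5.x can still be used. [Removing types is reasonable: with a single type per index, using a type is hardly different from addressing the index directly.]

Why remove mapping types?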

Initially, we spoke about an “index” being similar to a “database” in an SQL database, and a “type” being equivalent to a “table”.

This was a bad analogy that led to incorrect assumptions. In an SQL database, tables are independent of each other. The columns in one table have no bearing on columns with the same name in another table. This is not the case for fields in a mapping type.

In an Elasticsearch index, fields that have the same name in different mapping types are backed by the same Lucene field internally. In other words, using the example above, the user_name field in the user type is stored in exactly the same field as the user_name field in the tweet type, and both user_name fields must have the same mapping (definition) in both types.

This can lead to frustration when, for example, you want deleted to be a date field in one type and a boolean field in another type in the same index.

On top of that, storing different entities that have few or no fields in common in the same index leads to sparse data and interferes with Lucene’s ability to compress documents efficiently.

For these reasons, we have decided to remove the concept of mapping types from Elasticsearch.

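Alternatives

(1) One index per document type

(2) Custom type field. For reference, this is a 5.x-style index with two mapping types, user and tweet: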

PUT twitter
{
"mappings": {
"user": {
"properties": {
"name": { "type": "text" },
"user_name": { "type": "keyword" },
"email": { "type": "keyword" }
}
},
"tweet": {
"properties": {
"content": { "type": "text" },
"user_name": { "type": "keyword" },
"tweeted_at": { "type": "date" }
}
}
}}

(3) Merge all types into a single _doc type, with a custom type field to tell the documents apart

PUT twitter
{
"mappings": {
"_doc": {
"properties": {
"type": { "type": "keyword" },
"name": { "type": "text" },
"user_name": { "type": "keyword" },
"email": { "type": "keyword" },
"content": { "type": "text" },
"tweeted_at": { "type": "date" }
}
}
}}

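(4) Parent/child relationships without mapping types

II. Index Configuration

To be able to clean up old data periodically and free disk space later on, applications should access indices through an alias. The alias can then be moved from the old index to a new one, and the application migrates with no downtime.

Find the aliases that point to an index:

GET /my_index_2018/_alias/*

Or find which index an alias points to:

GET /*/_alias/my_index_latest

Create (or switch) an alias for an index: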

curl -i -H "Content-Type: application/json" -XPOST 'http://192.168.12.150:9200/_aliases?pretty' -d '
{
  "actions": [
    {"remove": { "index": "my_index_2018", "alias": "my_index_latest" }},
    {"add":    { "index": "my_index_2019", "alias": "my_index_latest" }}
  ]
}'
Or:
PUT /my_index_2019/_alias/my_index_latest

Delete the old index:

DELETE http://192.168.12.150:9200/my_index_2018

Access the index through the alias:

GET /my_index_latest/_search?q=user:kimchy&routing=
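
The remove/add pair sent to _aliases above is applied as one atomic operation, which is what makes the switch free of downtime. As a minimal sketch, the request body can be built in Python as follows; the helper name build_alias_swap is hypothetical, and actually sending the body to a cluster is left out.

```python
import json


def build_alias_swap(alias, old_index, new_index):
    """Build the body for POST /_aliases that moves `alias` from old_index to new_index."""
    return {
        "actions": [
            {"remove": {"index": old_index, "alias": alias}},
            {"add": {"index": new_index, "alias": alias}},
        ]
    }


# Same index and alias names as in the curl example above.
body = build_alias_swap("my_index_latest", "my_index_2018", "my_index_2019")
print(json.dumps(body, indent=2))
```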

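III. Mapping Configuration

Field data types supported by ES

Type	Range or description
byte	-128 to 127
short	-32768 to 32767
integer	-2^31 to 2^31-1
long	-2^63 to 2^63-1
string	string; used before ES 5.x, now deprecated in favor of text and keyword
Floating point	double, float, half_float, scaled_float
Date	date; internally converted to UTC and stored as a long of milliseconds since the epoch. The default format is strict_date_optional_time||epoch_millis, i.e. "format": "strict_date_optional_time||epoch_millis". A custom format such as "yyyy-MM-dd HH:mm:ss" can be set instead; documents must then supply dates in the same format that the mapping declares.
Boolean	boolean
Binary	binary
IP address	ip; stores IPv4 or IPv6 addresses
text	text; in ES 5.x string was split into text and keyword. Used for full-text search, not for sorting or aggregations
keyword	For structured data such as email addresses, hostnames, status codes and tags. Use it when a field needs filtering (for example, finding blog posts whose status is published), sorting or aggregations. A keyword field can only be found by its exact value.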
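
To make the date row concrete, here is a small Python sketch of the "milliseconds since the epoch" representation. It only illustrates the arithmetic; the function name to_epoch_millis is hypothetical, and the Python format string %Y-%m-%d %H:%M:%S is merely the counterpart of the yyyy-MM-dd HH:mm:ss pattern mentioned in the table, not something Elasticsearch itself accepts.

```python
from datetime import datetime, timezone


def to_epoch_millis(text, fmt="%Y-%m-%d %H:%M:%S"):
    """Parse `text` as a UTC timestamp and return milliseconds since the epoch."""
    parsed = datetime.strptime(text, fmt).replace(tzinfo=timezone.utc)
    return int(parsed.timestamp() * 1000)


# The long value a date field would hold for midnight UTC on 2019-10-01.
print(to_epoch_millis("2019-10-01 00:00:00"))
```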

View a type's mapping:

curl -i -XGET 'http://192.168.12.150:9200/website/_mapping/blog/?pretty'

Create an index with settings and a type mapping:

curl -i -H "Content-Type: application/json" -XPUT 'http://192.168.12.150:9200/my_index?pretty' -d '
{
  "settings": {
    "number_of_shards": 3,
    "number_of_replicas": 1,
    "refresh_interval": "30s",
    "analysis": {
        "char_filter": {
            "&_to_and": {
                "type":     "mapping",
                "mappings": ["&=> and "]
        }},
        "filter": {
            "my_stopwords": {
                "type":       "stop",
                "stopwords": [ "the", "a" ]
        }},
        "analyzer": {
            "my_analyzer": {
                "type":         "custom",
                "char_filter":  [ "html_strip", "&_to_and" ],
                "tokenizer":    "standard",
                "filter":       [ "lowercase", "my_stopwords" ]
        }}
    }
  },
  "mappings": {
    "my_type" : {
      "dynamic" : "true",
      "properties" : {
        "tweet" : {
          "type" :    "text",
          "analyzer": "english",
          "copy_to" : "my_all"
        },
        "date" : {
          "type" :   "date",
          "index": false
        },
        "name" : {
          "type": "object",
          "properties": {
             "first_name": {
               "type" :   "text",
               "copy_to" : "my_all"
              },
             "last_name": {
               "type" :   "text",
               "copy_to" : "my_all"
             }
          }
        },
        "user_id" : {
          "type" :   "long"
        },
        "description" : {
          "type" : "text",
          "analyzer": "my_analyzer",
          "index":   "true",
          "fields": {
            "keyword": {
              "type" : "keyword",
              "ignore_above": 256
            }
          }
        },
        "my_all": {
          "type" : "text"
        }
      }
    }
  }
}'

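Notes:

(1) analyzer sets the analyzer used for the field at both index time and search time (a separate search-time analyzer can be set with search_analyzer). index controls whether the field is added to the inverted index; with false the field is not indexed and therefore cannot be searched.

(2) copy_to builds a custom catch-all field. From 6.0 the _all field is disabled by default and deprecated, so a custom catch-all field such as my_all is recommended instead.

(3) dynamic controls how fields that are not in the mapping are handled: true (the default) adds them, false ignores them, and strict throws an exception on any unknown field (only declared fields are allowed). Setting dynamic to false does not change _source at all: _source still holds the complete JSON document you indexed. The unknown fields, however, are not added to the mapping and cannot be searched.

(4) The description field is mapped as text for full-text search and also, through fields, as keyword for sorting and aggregations. ignore_above means that strings longer than the given length are not indexed in the keyword sub-field and so cannot be searched there. fields > keyword defines a sub-field (a multi-field), addressed as description.keyword, rather than an alias. For example: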

PUT my_index
{
  "mappings": {
    "_doc": {
      "properties": {
        "city": {
          "type": "text",
          "fields": {
            "raw": {
              "type":  "keyword"
            }
          }
        }
      }
    }
  }
}

GET my_index/_search
{
  "query": {
    "match": {
      "city": "york"
    }
  },
  "sort": {
    "city.raw": "asc"
  },
  "aggs": {
    "Cities": {
      "terms": {
        "field": "city.raw"
      }
    }
  }
}

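(5) In Elasticsearch, the lightweight process of writing and opening a new segment is called a refresh. By default every shard refreshes automatically once per second. This is why Elasticsearch is described as near real-time search: changes to documents are not visible to search immediately, but become visible within one second. Setting refresh_interval (30s in the example above) lowers the refresh frequency.

For details on individual mapping parameters, see https://blog.csdn.net/qq_34646817/article/details/82585517

Mapping with a custom analyzer for Chinese text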

curl -i -H "Content-Type: application/json" -XPUT 'http://192.168.12.150:9200/my_index?pretty' -d '
{
  "settings": {
    "number_of_shards": 3,
    "number_of_replicas": 1,
    "analysis": {
        "char_filter": {
            "my_html": {
                "type": "html_strip",
                "escaped_tags": ["b"],
                "read_ahead": 1024
        }},
        "filter": {
            "myTokenFilter1": {
                "type": "stop",
                "stopwords": [ "嗯", "啊", "啦", "！", "噢", "耶" ]
            },
            "myTokenFilter2": {
                "type":  "length",
                "min": 0,
                "max": 2000
            }
        },
        "tokenizer" : {
            "myTokenizer1": {
                "type": "standard",
                "max_token_length": 900
            }
        },
        "analyzer": {
            "my_analyzer": {
                "type":         "custom",
                "char_filter":  ["my_html"],
                "tokenizer":    "myTokenizer1",
                "filter":       ["lowercase", "myTokenFilter1", "myTokenFilter2"]
        }}
    }
  },
  "mappings": {
    "my_type" : {
      "properties" : {
        "tweet" : {
          "type" :    "text",
          "analyzer": "english"
        },
        "date" : {
          "type" :   "date"
        },
        "name" : {
          "type" :   "text"
        },
        "user_id" : {
          "type" :   "long"
        },
        "description" : {
          "type" : "text",
          "analyzer": "my_analyzer"
        }
      }
    }
  }
}'

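A few notes:

index controls whether the field value is indexed. It can be true or false; a field that is not indexed cannot be searched.

A field can use one analyzer for both indexing and searching (analyzer), or a separate analyzer for searching (search_analyzer). Creating an index that does not exist yet uses PUT; many operations on an existing index, such as _refresh or _rollover, use POST.

If the analyzer in a mapping changes, the data must be reindexed for the change to take effect. The inverted index holds terms produced by the analyzer at index time, so once the analyzer changes, a query term may be tokenized differently from the terms already stored, and documents that contain the same word no longer match. Whenever the mapping, the analyzer dictionary, or the synonym dictionary changes, the data needs to be reindexed.

Reindexing is painful. Without solid groundwork the service may be interrupted; with large data volumes the rebuild can take a long time, and it can even fail partway. So avoid reindexing unless there is no alternative. (Teambition's reasons for rebuilding its index are omitted here, as they do not matter.)

A running service must access the index through an alias. The reason is simple: the search service will eventually use the rebuilt index, and the original index will be deleted. If your service currently uses the index name directly, create an alias and update the service before reindexing.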

curl -i -H "Content-Type: application/json" -XPOST 'http://192.168.12.150:9200/_aliases?pretty' -d '
{
  "actions": [
    {"remove": { "index": "my_index_2018", "alias": "my_index_latest" }},
    {"add":    { "index": "my_index_2019", "alias": "my_index_latest" }}
  ]
}'

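[Note]:

(1) The data structure has to be settled when the index is created; many index settings and fixed configuration values can never be changed afterwards. When the data structure does need to change, the index must be rebuilt for the change to take effect.

(2) A running service must access the index through an alias, because the search service will eventually use the rebuilt index and the original index will be deleted. If your service uses the index name directly, create an alias and update the service before reindexing.

(3) When only the mapping structure changes, for example dropping a field or changing how a field is analyzed, the reindex API alone is enough.

(4) When a new field has to be written and its value is computed from existing fields, use the script parameter of the reindex API.

Reindexing steps:

Assume the current version is v1 and we need to move to v2. The v1 index is goods_v1 and its alias is goods. Reindex as follows.

(1) Create the v2 index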

curl -XPUT --header 'Content-Type: application/json' 'http://es.zhuma.com/goods_v2' -d '{
        "mappings": {
            "default": {
                "dynamic": "false",
                "properties": {
                    "id": {
                        "type": "long"
                    },
                    "name": {
                        "type": "text",
                        "analyzer": "ik_max_word"
                    },
                    "desc": {
                        "type": "text",
                        "analyzer": "ik_max_word"
                    },
                    "quantity": {
                        "type": "integer"
                    }
                }
            }
        },
        "settings": {
            "index": {
                "number_of_shards": "5",
                "max_result_window": "50000",
                "analysis": {
                    "analyzer": {
                        "ik": {
                            "tokenizer": "ik_max_word"
                        }
                    }
                },
                "number_of_replicas": "1"
            }
        }
}'

(2) Import the data of the existing index into the target index

curl -XPOST --header 'Content-Type: application/json' 'http://es.zhuma.com/_reindex' -d '{
  "conflicts": "proceed",
  "source": {
    "index": "goods_v1"
  },
  "dest": {
    "index": "goods_v2",
    "op_type": "create",
    "version_type": "external"
  }
}'

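Remarks:

reindex copies the documents of one index into another, already existing index, but it does not copy the source index's mapping, shard or replica configuration. That is why step (1) creates a target index with essentially the same structure as the source.
Setting conflicts to proceed lets the copy continue when a version conflict occurs (by default reindex aborts on a version conflict).
Setting op_type to create means that only documents missing from the target index are created; any document that already exists causes a version conflict.
Setting version_type to external copies the version number along with each document and updates documents in the target index that are older than the version in the source index.
On performance: in production, an index of about 80,000 documents took a little over 3 seconds (it also depends on the size of each document, so treat this only as a reference point).

(3) Set the index alias

Remove the alias from v1 and point it at v2:

curl -XPOST --header 'Content-Type: application/json' 'http://es.zhuma.com/_aliases' -d '{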
    "actions": [
        {"remove": {"index": "goods_v1", "alias": "goods"}},
        { "add": {"index": "goods_v2",  "alias": "goods"}}
    ]
}'

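To make sure no data is missed, the v1 data can be reindexed into v2 once more.

(4) Delete the old index

Note: delete the old index only after confirming that the data import has finished.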

curl -XDELETE --header 'Accept: application/json' 'http://es.zhuma.com/goods_v1'
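
The four steps above can be summarized as an ordered list of requests. The Python sketch below only builds that list and does not talk to a cluster; reindex_plan is a hypothetical helper, and the empty mappings and settings in the first step are placeholders for the real definitions shown in step (1).

```python
def reindex_plan(alias, old_index, new_index):
    """Return the ordered (method, path, body) requests for a zero-downtime reindex."""
    return [
        # Step 1: create the target index (placeholder mappings/settings).
        ("PUT", f"/{new_index}", {"mappings": {}, "settings": {}}),
        # Step 2: copy documents, continuing past version conflicts.
        ("POST", "/_reindex", {
            "conflicts": "proceed",
            "source": {"index": old_index},
            "dest": {"index": new_index, "op_type": "create"},
        }),
        # Step 3: move the alias atomically from the old index to the new one.
        ("POST", "/_aliases", {
            "actions": [
                {"remove": {"index": old_index, "alias": alias}},
                {"add": {"index": new_index, "alias": alias}},
            ]
        }),
        # Step 4: delete the old index only after the copy is confirmed complete.
        ("DELETE", f"/{old_index}", None),
    ]


steps = reindex_plan("goods", "goods_v1", "goods_v2")
for method, path, _body in steps:
    print(method, path)
```

Keeping the delete as the last step mirrors the warning above: nothing destructive happens until the alias already points at the new index.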

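IV. Analyzer Settings

When a document is indexed, its full-text fields are analyzed into individual terms that make up the inverted index. When we search a full-text field, the query string has to go through the same analysis so that the terms we look for are in the form stored in the index.

Full-text queries are discussed later; for now, understand how each field is defined so that queries do the right thing:

When you query a full-text field, the query applies the same analyzer to the query string to produce the correct list of terms.
When you query an exact-value field, the query does not analyze the query string, although you can specify an analyzer yourself.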

curl -i -H "Content-Type: application/json" -XPOST 'http://192.168.12.150:9200/_analyze?pretty' -d '
{
    "analyzer": "chinese",
    "text":"认识你，很高兴！"
}
'
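
Why the query must go through the same analysis as the indexed text can be shown with a toy inverted index. This is a simplified sketch and not how Elasticsearch is implemented: the analyzer here only lowercases, splits on whitespace and drops two stopwords, and all names (analyze, build_index, search) are made up for the illustration.

```python
STOPWORDS = {"the", "a"}


def analyze(text):
    """Toy analyzer: lowercase, split on whitespace, drop stopwords."""
    return [term for term in text.lower().split() if term not in STOPWORDS]


def build_index(docs):
    """Map each analyzed term to the set of document ids that contain it."""
    index = {}
    for doc_id, text in docs.items():
        for term in analyze(text):
            index.setdefault(term, set()).add(doc_id)
    return index


def search(index, query, analyzer=analyze):
    """Return the ids of documents matching any term produced by `analyzer`."""
    hits = set()
    for term in analyzer(query):
        hits |= index.get(term, set())
    return hits


docs = {1: "The Quick Fox", 2: "A lazy dog"}
index = build_index(docs)

# Query analyzed like the documents: "QUICK" becomes "quick" and matches.
print(search(index, "QUICK"))
# Query only split, not lowercased: "QUICK" is not a term in the index.
print(search(index, "QUICK", analyzer=str.split))
```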

Using an analyzer in a mapping

curl -H "Content-Type: application/json" -i -XPUT 'http://192.168.12.150:9200/gb?pretty' -d '
{
  "mappings": {
    "tweet" : {
      "properties" : {
        "tweet" : {
          "type" :    "text",
          "analyzer": "chinese"
        },
        "date" : {
          "type" :   "date"
        },
        "name" : {
          "type" :   "text"
        },
        "user_id" : {
          "type" :   "long"
        }
      }
    }
  }
}
'

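V. Using Index Templates

In practice we may need rolling indices generated by year or date, for example test0, test1, test2 and so on. For those we would create a template whose index pattern is test* (the example below uses dys*).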

curl -i -H "Content-Type: application/json" -XPUT 'http://192.168.12.150:9200/_template/template_1?pretty' -d '
{
  "order": 0,
  "index_patterns": "dys*",
  "settings": {
    "number_of_shards": 3,
    "number_of_replicas": 1,
    "refresh_interval": "30s",
    "analysis": {
        "char_filter": {
            "&_to_and": {
                "type":     "mapping",
                "mappings": ["&=> and "]
            }
        },
        "filter": {
            "my_stopwords": {
                "type":       "stop",
                "stopwords": [ "the", "a" ]
            }
        },
        "analyzer": {
            "my_analyzer": {
                "type":         "custom",
                "char_filter":  [ "html_strip", "&_to_and" ],
                "tokenizer":    "standard",
                "filter":       [ "lowercase", "my_stopwords" ]
            }
        }
    }
  },
  "mappings": {
    "my_type" : {
      "dynamic" : "true",
      "_source" : { "enabled" : true},
      "properties" : {
          "tweet" : {
              "type" :    "text",
              "analyzer": "english",
              "copy_to" : "my_all"
          },
          "date" : {
              "type" :   "date",
              "index": "false"
          },
          "name" : {
              "type": "object",
              "properties": {
                "first_name": {
                   "type" :   "text",
                   "copy_to" : "my_all"
                },
                "last_name": {
                   "type" :   "text",
                   "copy_to" : "my_all"
                }
              }
        },
        "user_id" : {
           "type" :   "long"
        },
        "description" : {
           "type" : "text",
           "analyzer": "my_analyzer",
           "index":   "true",
           "fields": {
             "keyword": {
               "type" : "keyword",
               "ignore_above": 256
             }
           }
        },
        "my_all": {
           "type" : "text"
        }
      }
    }
  }
}'

Alternatively, nest the shard and other settings under an index key (the index level is optional):

curl -i -H "Content-Type: application/json" -XPUT 'http://192.168.12.150:9200/_template/template_1?pretty' -d '
{
  "order": 0,
  "index_patterns": "dys*",
  "settings": {
    "index":{
      "number_of_shards": 3,
      "number_of_replicas": 1,
      "refresh_interval": "30s",
      "analysis": {
        "char_filter": {
            "&_to_and": {
                "type":     "mapping",
                "mappings": ["&=> and "]
            }
        },
        "filter": {
            "my_stopwords": {
                "type":       "stop",
                "stopwords": [ "the", "a" ]
            }
        },
        "analyzer": {
            "my_analyzer": {
                "type":         "custom",
                "char_filter":  [ "html_strip", "&_to_and" ],
                "tokenizer":    "standard",
                "filter":       [ "lowercase", "my_stopwords" ]
            }
        }
      }
    }
  },
  "mappings": {
    "my_type" : {
      "dynamic" : "true",
      "_source" : { "enabled" : true},
      "properties" : {
          "tweet" : {
              "type" :    "text",
              "analyzer": "english",
              "copy_to" : "my_all"
          },
          "date" : {
              "type" :   "date",
              "index": "false"
          },
          "name" : {
              "type": "object",
              "properties": {
                "first_name": {
                   "type" :   "text",
                   "copy_to" : "my_all"
                },
                "last_name": {
                   "type" :   "text",
                   "copy_to" : "my_all"
                }
              }
        },
        "user_id" : {
           "type" :   "long"
        },
        "description" : {
           "type" : "text",
           "analyzer": "my_analyzer",
           "index":   "true",
           "fields": {
             "keyword": {
               "type" : "keyword",
               "ignore_above": 256
             }
           }
        },
        "my_all": {
           "type" : "text"
        }
      }
    }
  }
}'

Delete the template

curl -i -XDELETE 'http://192.168.12.150:9200/_template/template_1?pretty'

Get templates

curl -i -XGET 'http://192.168.12.150:9200/_template/template_1?pretty'
GET /_template/temp* 
GET /_template/template_1,template_2
GET /_template

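VI. Rollover Index: Managing Time-Series Data Efficiently

When the index behind an alias exceeds the document count or size given in the conditions, or is older than the given age, a new index is created and the alias is switched to it. For example, the new index is named logs-000002, the matching index template is applied to it, and the write alias logs_write is moved to logs-000002.

Rollover index naming

(1) If the index name ends in - followed by a number, the number of the new index is incremented by 1 and zero-padded to 6 digits (as in logs-000002 above).

(2) If the index name does not follow this convention, the new index name has to be given in the rollover request: POST /write_test_indices/_rollover/<new_index_name>

(3) Date-based names (date math)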
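
Rule (1) can be sketched in a few lines of Python. This is an illustration of the naming rule as described here, not Elasticsearch's own code; next_rollover_name is a hypothetical helper, it uses the same pattern that appears in the rollover error message later in this article (^.*-\d+$), and it does not handle date math names.

```python
import re

# A name qualifies when it ends in "-" followed by digits.
ROLLOVER_PATTERN = re.compile(r"^(.*)-(\d+)$")


def next_rollover_name(index_name):
    """Return the next index name: counter + 1, zero-padded to 6 digits."""
    match = ROLLOVER_PATTERN.match(index_name)
    if match is None:
        raise ValueError(f"index name [{index_name}] does not end in -<number>")
    prefix, counter = match.groups()
    return f"{prefix}-{int(counter) + 1:06d}"


print(next_rollover_name("logs-000001"))
print(next_rollover_name("dys-0001"))
```

Note that a shorter counter such as dys-0001 still yields a 6-digit successor, which matches the dys-000002 seen in the rollover responses below.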

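Usage notes

(1) An alias and an index cannot have the same name.

(2) The alias used for rollover must be a writable alias that points to exactly one index (the newest one).

(3) Versions 6.5 and later support is_write_index, which defaults to null; in that case the alias also points only to the newest index. With is_write_index=true, after a rollover writes through the alias (logs in this example) go to the newest index, while searches cover all the rolled-over indices.

Steps

(1) First set up the index template; the template above can be reused.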

curl -i -H "Content-Type: application/json" -XPUT 'http://192.168.12.150:9200/_template/template_1?pretty' -d '
{
  "order": 0,
  "index_patterns": "dys*",
  "settings": {
    "index":{
      "number_of_shards": 3,
      "number_of_replicas": 1,
      "refresh_interval": "30s",
      "analysis": {
        "char_filter": {
            "&_to_and": {
                "type":     "mapping",
                "mappings": ["&=> and "]
            }
        },
        "filter": {
            "my_stopwords": {
                "type":       "stop",
                "stopwords": [ "the", "a" ]
            }
        },
        "analyzer": {
            "my_analyzer": {
                "type":         "custom",
                "char_filter":  [ "html_strip", "&_to_and" ],
                "tokenizer":    "standard",
                "filter":       [ "lowercase", "my_stopwords" ]
            }
        }
      }
    }
  },
  "mappings": {
    "my_type" : {
      "dynamic" : "true",
      "_source" : { "enabled" : false},
      "properties" : {
          "tweet" : {
              "type" :    "text",
              "analyzer": "english",
              "copy_to" : "my_all"
          },
          "date" : {
              "type" :   "date",
              "index": "false"
          },
          "name" : {
              "type": "object",
              "properties": {
                "first_name": {
                   "type" :   "text",
                   "copy_to" : "my_all"
                },
                "last_name": {
                   "type" :   "text",
                   "copy_to" : "my_all"
                }
              }
        },
        "user_id" : {
           "type" :   "long"
        },
        "description" : {
           "type" : "text",
           "analyzer": "my_analyzer",
           "index":   "true",
           "fields": {
             "keyword": {
               "type" : "keyword",
               "ignore_above": 256
             }
           }
        },
        "my_all": {
           "type" : "text"
        }
      }
    }
  }
}'

(2) Create the index and set its alias

Create a rolling index that ends in an integer sequence number:

curl -i -H "Content-Type: application/json" -XPUT 'http://192.168.12.150:9200/dys-000001?pretty' -d '
{
  "aliases":{
      "dys_latest":{}
  }
}'
Or, for an index that already exists, add the alias to it:
curl -i -XPUT 'http://192.168.12.150:9200/dys-0001/_alias/dys_latest?pretty'

Or create a rolling index named by date:

curl -i -H "Content-Type: application/json" -XPUT 'http://192.168.12.150:9200/%3Cdys-%7Bnow%2Fd%7D-1%3E?pretty' -d '
{
  "aliases":{
      "dys_latest":{}
  }
}'

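Date math index names have the following form:

<static_name{date_math_expr{date_format|time_zone}}>

Part	Description
static_name	the static text part of the name
date_math_expr	a date math expression that computes the date dynamically
date_format	optional format in which the computed date is rendered; defaults to YYYY.MM.dd
time_zone	optional time zone; defaults to utc

Assuming the current time is noon UTC on 2024-03-22, the expressions resolve as follows:

Expression	Resolves to
<logstash-{now/d}>	logstash-2024.03.22
<logstash-{now/M}>	logstash-2024.03.01
<logstash-{now/M{YYYY.MM}}>	logstash-2024.03
<logstash-{now/M-1M{YYYY.MM}}>	logstash-2024.02
<logstash-{now/d{YYYY.MM.dd|+12:00}}>	logstash-2024.03.23

The special characters must be URL-encoded in the request path: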

<    %3C
>    %3E
/    %2F
{    %7B
}    %7D
|    %7C
+    %2B
:    %3A
,    %2C
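
Rather than encoding a date math name by hand from the table above, the standard library can do it. A minimal Python sketch; encode_index_name is a hypothetical helper name, and safe="" is passed so that / is encoded as well.

```python
from urllib.parse import quote


def encode_index_name(date_math_name):
    """Percent-encode a date math index name for use in a request path."""
    # safe="" makes quote() encode "/" too, which it leaves alone by default.
    return quote(date_math_name, safe="")


print(encode_index_name("<dys-{now/d}-1>"))
print(encode_index_name("<dys-{now/M{YYYY.MM}}>"))
```

The two results are the encoded paths used in the curl commands of this section.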

(3) Set the rollover conditions

curl -i -H "Content-Type: application/json" -XPOST 'http://192.168.12.150:9200/dys_latest/_rollover?pretty' -d '
{
  "conditions":{
      "max_age": "7d",
      "max_docs": 2,
      "max_size": "5g"
  }
}'

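With these conditions, once the index is older than 7 days, or holds 2 or more documents, or exceeds 5 GB in total size, a rollover creates a new index and points the alias at it.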

HTTP/1.1 200 OK
content-type: application/json; charset=UTF-8
content-length: 278

{
  "acknowledged" : false,
  "shards_acknowledged" : false,
  "old_index" : "dys-0001",
  "new_index" : "dys-000002",
  "rolled_over" : false,
  "dry_run" : false,
  "conditions" : {
    "[max_size: 5gb]" : false,
    "[max_docs: 2]" : false,
    "[max_age: 7d]" : false
  }
}

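The response shows that the next index created by a rollover will be named dys-000002 (no condition is met yet, so rolled_over is false).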
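
The conditions are combined with OR: a rollover happens as soon as any one of them holds. The Python sketch below mimics that decision using the thresholds from the request above; it is an illustration only, with hypothetical names and simplified units (days and gigabytes as plain numbers), not the server-side logic.

```python
def evaluate_conditions(age_days, doc_count, size_gb,
                        max_age_days=7, max_docs=2, max_size_gb=5):
    """Return (rolled_over, per-condition results) for the given index stats."""
    results = {
        "max_age": age_days >= max_age_days,
        "max_docs": doc_count >= max_docs,
        "max_size": size_gb >= max_size_gb,
    }
    # Any single satisfied condition is enough to trigger the rollover.
    return any(results.values()), results


# A fresh, empty index: nothing is met, as in the response above.
print(evaluate_conditions(age_days=0, doc_count=0, size_gb=0))
# After two documents are indexed, max_docs alone triggers the rollover.
print(evaluate_conditions(age_days=0, doc_count=2, size_gb=0))
```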

(4) Insert test data

curl -i -H "Content-Type: application/json" -XPUT 'http://192.168.12.150:9200/dys_latest/my_type/1?pretty' -d '
{
  "user_id": 1,
  "name": {
    "first_name": "xx",
    "last_name" : "li"
  },
  "date": "2019-10-01",
  "tweet": "this is my first article"
}'
{
  "_index" : "dys-0001",
  "_type" : "my_type",
  "_id" : "1",
  "_version" : 1,
  "result" : "created",
  "_shards" : {
    "total" : 2,
    "successful" : 2,
    "failed" : 0
  },
  "_seq_no" : 0,
  "_primary_term" : 1
}

curl -i -H "Content-Type: application/json" -XPUT 'http://192.168.12.150:9200/dys_latest/my_type/2?pretty' -d '
{
  "user_id": 2,
  "name": {
    "first_name": "san",
    "last_name" : "zhang"
  },
  "date": "2019-05-01",
  "tweet": "this is my second article"
}'
{
  "_index" : "dys-0001",
  "_type" : "my_type",
  "_id" : "2",
  "_version" : 1,
  "result" : "created",
  "_shards" : {
    "total" : 2,
    "successful" : 2,
    "failed" : 0
  },
  "_seq_no" : 0,
  "_primary_term" : 1
}

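(5) Call the rollover API

First force a refresh so that the newly written documents become visible and are counted (our refresh interval is set to 30s):

curl -i -XPOST 'http://192.168.12.150:9200/dys_latest/_refresh?pretty'

Call the rollover API periodically. Once a condition is met, the number in the index name is incremented by 1 and newly inserted data is stored in index 000002:

curl -i -H "Content-Type: application/json" -XPOST 'http://192.168.12.150:9200/dys_latest/_rollover?pretty' -d '
{
  "conditions":{
      "max_age": "7d",
      "max_docs": 2,
      "max_size": "5g"
  }
}'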
{
  "acknowledged" : true,
  "shards_acknowledged" : true,
  "old_index" : "dys-000001",
  "new_index" : "dys-000002",
  "rolled_over" : true,
  "dry_run" : false,
  "conditions" : {
    "[max_size: 5gb]" : false,
    "[max_docs: 2]" : true,
    "[max_age: 7d]" : false
  }
}

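As the response shows, with 2 documents inserted the max_docs condition is met and the rollover takes effect.

Calling rollover again without inserting any data shows that the index does not roll over: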

curl -i -H "Content-Type: application/json" -XPOST 'http://192.168.12.150:9200/dys_latest/_rollover?pretty' -d '
{
  "conditions":{
      "max_age": "7d",
      "max_docs": 2,
      "max_size": "5g"
  }
}'
HTTP/1.1 200 OK
content-type: application/json; charset=UTF-8
content-length: 280

{
  "acknowledged" : false,
  "shards_acknowledged" : false,
  "old_index" : "dys-000002",
  "new_index" : "dys-000003",
  "rolled_over" : false,
  "dry_run" : false,
  "conditions" : {
    "[max_size: 5gb]" : false,
    "[max_docs: 2]" : false,
    "[max_age: 7d]" : false
  }
}

Other related commands

Delete all related indices

curl -i -XDELETE 'http://192.168.12.150:9200/dys*?pretty'

Delete the template

curl -i -XDELETE 'http://192.168.12.150:9200/_template/template_1?pretty'

List all indices

curl -i -XGET 'http://192.168.12.150:9200/_cat/indices?pretty'

Show the indices an alias points to

curl -i -XGET 'http://192.168.12.150:9200/*/_alias/dys_latest?pretty'

Show the aliases of an index

curl -i -XGET 'http://192.168.12.150:9200/dys-000001/_alias/*?pretty'

(6) Testing date-named rollover indices

Create the index template

curl -i -H "Content-Type: application/json" -XPUT 'http://192.168.12.150:9200/_template/template_1?pretty' -d '
{
  "order": 0,
  "index_patterns": "dys*",
  "settings": {
    "index":{
      "number_of_shards": 3,
      "number_of_replicas": 1,
      "refresh_interval": "30s",
      "analysis": {
        "char_filter": {
            "&_to_and": {
                "type":     "mapping",
                "mappings": ["&=> and "]
            }
        },
        "filter": {
            "my_stopwords": {
                "type":       "stop",
                "stopwords": [ "the", "a" ]
            }
        },
        "analyzer": {
            "my_analyzer": {
                "type":         "custom",
                "char_filter":  [ "html_strip", "&_to_and" ],
                "tokenizer":    "standard",
                "filter":       [ "lowercase", "my_stopwords" ]
            }
        }
      }
    }
  },
  "mappings": {
    "my_type" : {
      "dynamic" : "true",
      "_source" : { "enabled" : false},
      "properties" : {
          "tweet" : {
              "type" :    "text",
              "analyzer": "english",
              "copy_to" : "my_all"
          },
          "date" : {
              "type" :   "date",
              "index": "false"
          },
          "name" : {
              "type": "object",
              "properties": {
                "first_name": {
                   "type" :   "text",
                   "copy_to" : "my_all"
                },
                "last_name": {
                   "type" :   "text",
                   "copy_to" : "my_all"
                }
              }
        },
        "user_id" : {
           "type" :   "long"
        },
        "description" : {
           "type" : "text",
           "analyzer": "my_analyzer",
           "index":   "true",
           "fields": {
             "keyword": {
               "type" : "keyword",
               "ignore_above": 256
             }
           }
        },
        "my_all": {
           "type" : "text"
        }
      }
    }
  }
}'

Create the first date-named rolling index (dys-2019.09, from the expression <dys-{now/M{YYYY.MM}}>)

curl -i -H "Content-Type: application/json" -XPUT 'http://192.168.12.150:9200/%3Cdys-%7Bnow%2FM%7BYYYY.MM%7D%7D%3E?pretty' -d '
{
  "aliases":{
      "dys_latest":{}
  }
}'
{
  "acknowledged" : true,
  "shards_acknowledged" : true,
  "index" : "dys-2019.09"
}

Insert data

curl -i -H "Content-Type: application/json" -XPUT 'http://192.168.12.150:9200/dys_latest/my_type/1?pretty' -d '
{
  "user_id": 1,
  "name": {
    "first_name": "xx",
    "last_name" : "li"
  },
  "date": "2019-10-01",
  "tweet": "this is my first article"
}'
curl -i -H "Content-Type: application/json" -XPUT 'http://192.168.12.150:9200/dys_latest/my_type/2?pretty' -d '
{
  "user_id": 2,
  "name": {
    "first_name": "san",
    "last_name" : "zhang"
  },
  "date": "2019-05-01",
  "tweet": "this is my second article"
}'

Try to roll the index over with the rollover API

curl -i -H "Content-Type: application/json" -XPOST 'http://192.168.12.150:9200/dys_latest/_rollover?pretty' -d '
{
  "conditions":{
      "max_age": "7d",
      "max_docs": 2,
      "max_size": "5g"
  }
}'

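The rollover of the date-named index fails. I found no article covering this at the time and initially guessed it was a version problem that might need ES 7.0 or later, since other people appear to use date-named rollover successfully. The error message itself points at the name instead: rollover can only derive the next index name when the current name matches ^.*-\d+$, that is, when it ends in - followed by a number. The index created above, dys-2019.09, has no such suffix. Likely fixes are to create the first index with a trailing counter, for example <dys-{now/M{YYYY.MM}}-000001>, or to pass the new index name explicitly in the rollover request, as described under rollover index naming above. If you have confirmed a fix, please let me know, thanks!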

{
  "error" : {
    "root_cause" : [
      {
        "type" : "illegal_argument_exception",
        "reason" : "index name [<dys{now/M{YYYY.MM}}>] does not match pattern '^.*-\\d+$'"
      }
    ],
    "type" : "illegal_argument_exception",
    "reason" : "index name [<dys{now/M{YYYY.MM}}>] does not match pattern '^.*-\\d+$'"
  },
  "status" : 400
}

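VII. Keeping One User's Data on One Shard (Custom Routing)

All document APIs (get, index, delete, bulk, update, mget) accept a routing parameter, which customizes the mapping from document to shard. A custom routing value ensures that all related documents, for example all documents belonging to the same person, are stored on the same shard.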

curl -H "Content-Type: application/json"  -i  -XPUT 'http://192.168.12.150:9200/magacorp/employee/2?routing=user1&pretty' -d '
{
    "first_name": "Jane",
    "last_name"    : "Smith",
    "age"        : 32,
    "about"        : "I like to collect rock albums",
    "interests"    : ["music"]
}'
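
The shard for a document is chosen as hash(routing) % number_of_primary_shards, where routing defaults to the document id. The Python sketch below illustrates why equal routing values always land on the same shard. It is an assumption-laden toy: zlib.crc32 stands in for the hash function Elasticsearch really uses, so the shard numbers it prints will not match a real cluster, and pick_shard is a hypothetical name.

```python
import zlib


def pick_shard(routing, number_of_primary_shards):
    """Map a routing value to a shard number in [0, number_of_primary_shards)."""
    # Assumption: crc32 is only a stand-in for Elasticsearch's real hash function.
    return zlib.crc32(routing.encode("utf-8")) % number_of_primary_shards


# Every document indexed with routing=user1 resolves to the same shard.
print(pick_shard("user1", 5))
print(pick_shard("user1", 5))
```

The formula also explains why the number of primary shards cannot be changed after index creation without reindexing: a different divisor would send existing routing values to different shards.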

VIII. Common ES Operations Commands

1. View node and version information

curl -i  -XGET 'http://192.168.12.150:9200/?pretty'
{
  "name" : "es01",
  "cluster_name" : "dys",
  "cluster_uuid" : "Aqo923bwRO6MPYl71KgYXg",
  "version" : {
    "number" : "6.5.1",
    "build_flavor" : "default",
    "build_type" : "tar",
    "build_hash" : "8c58350",
    "build_date" : "2018-11-16T02:22:42.182257Z",
    "build_snapshot" : false,
    "lucene_version" : "7.5.0",
    "minimum_wire_compatibility_version" : "5.6.0",
    "minimum_index_compatibility_version" : "5.0.0"
  },
  "tagline" : "You Know, for Search"
}

2. Check cluster health

Compact health summary (_cat/health)

curl -i  -XGET 'http://192.168.12.150:9200/_cat/health?pretty'
HTTP/1.1 200 OK
content-type: text/plain; charset=UTF-8
content-length: 57

1567732708 01:18:28 dys green 3 3 20 10 0 0 0 0 - 100.0%

Detailed cluster health

curl -i  -XGET 'http://192.168.12.150:9200/_cluster/health?pretty'
{
  "cluster_name" : "dys",
  "status" : "green",
  "timed_out" : false,
  "number_of_nodes" : 3,
  "number_of_data_nodes" : 3,
  "active_primary_shards" : 10,
  "active_shards" : 20,
  "relocating_shards" : 0,
  "initializing_shards" : 0,
  "unassigned_shards" : 0,
  "delayed_unassigned_shards" : 0,
  "number_of_pending_tasks" : 0,
  "number_of_in_flight_fetch" : 0,
  "task_max_waiting_in_queue_millis" : 0,
  "active_shards_percent_as_number" : 100.0
}

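green: all primary shards and all replica shards are available

yellow: all primary shards are available, but not all replica shards are

red: not all primary shards are available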
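
The three colors follow directly from which shard copies are allocated. As a small Python sketch of that rule (cluster_status is a hypothetical helper that takes one boolean per shard copy; it is not an Elasticsearch API):

```python
def cluster_status(primaries_active, replicas_active):
    """Derive the health color from per-shard availability flags."""
    if not all(primaries_active):
        return "red"      # at least one primary shard is unavailable
    if not all(replicas_active):
        return "yellow"   # primaries are fine, some replica is unavailable
    return "green"        # every primary and replica is available


print(cluster_status([True, True], [True, True]))
print(cluster_status([True, True], [True, False]))
print(cluster_status([True, False], [True, True]))
```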

3. List the cluster nodes

curl -i  -XGET 'http://192.168.12.150:9200/_cat/nodes?pretty'
HTTP/1.1 200 OK
content-type: text/plain; charset=UTF-8
content-length: 146

192.168.12.152 14 93 0 0.00 0.03 0.05 mdi - es03
192.168.12.151 11 92 0 0.00 0.01 0.05 mdi - es2
192.168.12.150  9 96 0 0.01 0.02 0.05 mdi * es01

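4. Shut down an Elasticsearch node (the _shutdown API was removed in Elasticsearch 2.0, so on the 6.5.1 cluster shown above, stop the node's process or service instead)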

curl -i  -XPOST 'http://192.168.12.150:9200/_shutdown'

5. List all indices in the cluster

curl -i  -XGET 'http://192.168.12.150:9200/_cat/indices?pretty'
HTTP/1.1 200 OK
content-type: text/plain; charset=UTF-8
content-length: 130

green open dys      wlb5Ex2KR1CCp0fHdj65yg 5 1 3 0   40kb   20kb
green open magacorp GmN8h4zfSHqxaPtqEX21dg 5 1 3 0 34.9kb 17.4kb

5.1 View the settings of an index

curl -i  -XGET 'http://192.168.12.150:9200/website/_settings?pretty'

6. View the mappings of all indices (all types)

curl -i  -XGET 'http://192.168.12.150:9200/_mapping?pretty'
{
  "dys" : {
    "mappings" : {
      "user" : {
        "properties" : {
          "address" : {
            "type" : "text",
            "fields" : {
              "keyword" : {
                "type" : "keyword",
                "ignore_above" : 256
              }
            }
          },
          "cellphone" : {
            "type" : "text",
            "fields" : {
              "keyword" : {
                "type" : "keyword",
                "ignore_above" : 256
              }
            }
          },
          "firstName" : {
            "type" : "text",
            "fields" : {
              "keyword" : {
                "type" : "keyword",
                "ignore_above" : 256
              }
            }
          },
          "id" : {
            "type" : "text",
            "fields" : {
              "keyword" : {
                "type" : "keyword",
                "ignore_above" : 256
              }
            }
          },
          "lastName" : {
            "type" : "text",
            "fields" : {
              "keyword" : {
                "type" : "keyword",
                "ignore_above" : 256
              }
            }
          },
          "registerDate" : {
            "type" : "text",
            "fields" : {
              "keyword" : {
                "type" : "keyword",
                "ignore_above" : 256
              }
            }
          }
        }
      }
    }
  },
  "magacorp" : {
    "mappings" : {
      "employee" : {
        "properties" : {
          "about" : {
            "type" : "text",
            "fields" : {
              "keyword" : {
                "type" : "keyword",
                "ignore_above" : 256
              }
            }
          },
          "age" : {
            "type" : "long"
          },
          "first_name" : {
            "type" : "text",
            "fields" : {
              "keyword" : {
                "type" : "keyword",
                "ignore_above" : 256
              }
            }
          },
          "interests" : {
            "type" : "text",
            "fields" : {
              "keyword" : {
                "type" : "keyword",
                "ignore_above" : 256
              }
            }
          },
          "last_name" : {
            "type" : "text",
            "fields" : {
              "keyword" : {
                "type" : "keyword",
                "ignore_above" : 256
              }
            }
          }
        }
      }
    }
  }
}

7. View the mapping of a specific type

curl -i  -XGET 'http://192.168.12.150:9200/<index>/<type>/_mapping?pretty'

8. Get only the total document count, without returning the documents themselves

curl  -i  -XGET 'http://192.168.12.150:9200/_count?pretty'

9. Delete an index

curl -i  -XDELETE 'http://192.168.12.150:9200/<index_name>?pretty'

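10. Two ways to do phrase search

(1) Map the field as keyword in the type's mapping (a keyword field is not analyzed, so it matches only the complete, exact value)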

curl -i -H "Content-Type: application/json" -XPUT 'http://192.168.12.150:9200/magacorp?pretty' -d '
{
  "mappings": {
    "employee": {
      "properties": {
        "about": {
          "type": "keyword",
          "index": true
        }
      }
    }
  }
}'

(2) Use a phrase query (match_phrase) at search time

curl -H "Content-Type: application/json"  -i  -XGET 'http://192.168.12.150:9200/magacorp/employee/_search?pretty' -d '
{
  "query" : {
      "match_phrase" : {
        "about" : "rock climbing"
      }
    }
}'
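
To make the difference between the two approaches concrete, here is a minimal Python sketch. It is only an illustration, not how Elasticsearch is implemented: the `analyze` function is a crude stand-in for a real analyzer (lowercase plus whitespace split), and the sample sentence and function names are assumptions made for the example.

```python
def analyze(text):
    """Crude stand-in for an analyzer: lowercase, then split on whitespace."""
    return text.lower().split()


def keyword_match(stored, query):
    """A keyword field is not analyzed, so only the complete value matches."""
    return stored == query


def phrase_match(stored, query):
    """match_phrase-like check: query terms must be adjacent and in order."""
    doc_terms = analyze(stored)
    query_terms = analyze(query)
    n = len(query_terms)
    if n == 0:
        return False
    return any(
        doc_terms[i:i + n] == query_terms
        for i in range(len(doc_terms) - n + 1)
    )


# Hypothetical value of the "about" field.
about = "I love to go rock climbing"

print(keyword_match(about, "rock climbing"))  # False: not the whole value
print(phrase_match(about, "rock climbing"))   # True: adjacent, in order
print(phrase_match(about, "climbing rock"))   # False: wrong order
```

In other words, approach (1) only finds documents whose whole field equals the search string, while approach (2) finds the phrase anywhere inside an analyzed text field.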

11. Highlight matched fields in search results

GET /megacorp/employee/_search
{
    "query" : {
        "match_phrase" : {
            "about" : "rock climbing"
        }
    },
    "highlight": {
        "fields" : {
            "about" : {}
        }
    }
}

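12. Aggregations: the equivalent of SQL GROUP BY

In the mapping shown in section 6, interests is a text field with a keyword sub-field. A terms aggregation on a text field fails unless fielddata is enabled, so aggregate on interests.keyword:

GET /megacorp/employee/_search
{
    "aggs": {
        "all_interests": {
            "terms": { "field": "interests.keyword" }
        }
    }
}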

Nested (multi-level) aggregation: average age per interest

GET /megacorp/employee/_search
{
    "aggs" : {
        "all_interests" : {
            "terms" : { "field" : "interests.keyword" },
            "aggs" : {
                "avg_age" : {
                    "avg" : { "field" : "age" }
                }
            }
        }
    }
}
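
As a rough illustration of what the nested aggregation computes, the following Python sketch groups a few in-memory documents by interest and averages the age per group. The sample employees and the function name `terms_with_avg` are made up for the example; ordering buckets by document count (ties broken by key) is done here only to keep the output deterministic.

```python
from collections import defaultdict


def terms_with_avg(docs, group_field, avg_field):
    """Group docs by each value of group_field and average avg_field per group."""
    buckets = defaultdict(list)
    for doc in docs:
        values = doc.get(group_field, [])
        if not isinstance(values, list):
            values = [values]
        for value in values:
            buckets[value].append(doc[avg_field])
    result = [
        {"key": key, "doc_count": len(nums), "avg": sum(nums) / len(nums)}
        for key, nums in buckets.items()
    ]
    # Largest buckets first; ties broken alphabetically by key.
    result.sort(key=lambda b: (-b["doc_count"], b["key"]))
    return result


# Hypothetical sample documents.
employees = [
    {"first_name": "John", "age": 25, "interests": ["sports", "music"]},
    {"first_name": "Jane", "age": 32, "interests": ["music"]},
    {"first_name": "Douglas", "age": 35, "interests": ["forestry"]},
]

for bucket in terms_with_avg(employees, "interests", "age"):
    print(bucket)
```

Each printed bucket corresponds to one entry of the `all_interests` buckets array in the search response, with `avg` playing the role of the `avg_age` sub-aggregation.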

13. PUT can either create or update a document; forcing PUT to create only

curl -H "Content-Type: application/json"  -i  -XPUT 'http://192.168.12.150:9200/website/blog/123?pretty&op_type=create' -d '
{
  "title": "My fourth blog entry",
  "text":  "I am starting to get the hang of this...",
  "date":  "2018/01/02"
}'

curl -H "Content-Type: application/json"  -i  -XPUT 'http://192.168.12.150:9200/website/blog/123/_create?pretty' -d '
{
  "title": "My fourth blog entry",
  "text":  "I am starting to get the hang of this...",
  "date":  "2018/01/02"
}'

14. Partial document update (fields are added if absent, updated if present)

curl -H "Content-Type: application/json"  -i  -XPOST 'http://192.168.12.150:9200/website/blog/124/_update?pretty' -d '
{
  "doc": {
    "title": "Hello world!",
    "text":  "Hello everybody~",
    "date":  "2018/02/02"
  }
}'
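
The "add if absent, update if present" behaviour can be pictured as a dictionary merge. The sketch below is a simplified model, assuming nested objects are merged recursively and every other value (including arrays) is replaced; `partial_update` and the sample documents are names invented for the illustration.

```python
def partial_update(source, doc):
    """Return a copy of source with the fields of doc merged in."""
    merged = dict(source)
    for field, value in doc.items():
        if isinstance(value, dict) and isinstance(merged.get(field), dict):
            # Nested objects are merged field by field.
            merged[field] = partial_update(merged[field], value)
        else:
            # Scalars and arrays overwrite; missing fields are added.
            merged[field] = value
    return merged


# Hypothetical existing document and partial update.
existing = {"title": "My first entry", "views": 3, "meta": {"lang": "en"}}
change = {"title": "Hello world!", "date": "2018/02/02", "meta": {"draft": False}}

print(partial_update(existing, change))
```

The existing `views` field survives, `title` is overwritten, and `date` is added, which is the same outcome the `_update` request aims for.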

15. Use the _source parameter to return only part of each document

curl -i -XGET 'http://192.168.12.150:9200/website/blog/_search?_source=title,text&pretty'
curl -i  -H "Content-Type: application/json" -XGET 'http://192.168.12.150:9200/website/blog/_search?pretty' -d '
{
  "query":{
     "match_all":{}
  },
  "_source":["title", "text"]
}'

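16. Check whether a document exists

curl -I 'http://192.168.12.150:9200/website/blog/123'

Returns 200 if the document exists and 404 if it does not. (curl -I sends a HEAD request and prints only the response headers; combining -i with -XHEAD can leave curl waiting for a body that never arrives.)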

17. Index a document with POST so that the id is generated automatically

curl -H "Content-Type: application/json"  -i  -XPOST 'http://192.168.12.150:9200/website/blog?pretty' -d '
{
  "title": "test document",
  "text":  "this is my test document from lixx",
  "date":  "2019/01/01"
}'

18. Optimistic locking: version control and updating at a specific version

Add a record

curl -H "Content-Type: application/json"  -i  -XPUT 'http://192.168.12.150:9200/website/blog/1/_create?pretty' -d '
{
  "title": "My SIX blog entry",
  "text":  "THIS is my six article",
  "date":  "2018/11/06"
}'

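Update the record only if its current version is 1 (once someone else has updated it the version changes, and the request is rejected with a version conflict)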

curl -H "Content-Type: application/json"  -i  -XPUT 'http://192.168.12.150:9200/website/blog/1?version=1&pretty' -d '
{
  "title": "My SIX blog entry--change",
  "text":  "THIS is my six article--change",
  "date":  "2018/11/06"
}'

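Use external versioning: Elasticsearch only checks that the current version is lower than the supplied one (here 5 is greater than the current version 1)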

curl -H "Content-Type: application/json"  -i  -XPUT 'http://192.168.12.150:9200/website/blog/1?pretty&version_type=external&version=5' -d '
{
  "title": "Hello world 3!",
  "text":  "Hello everybody~ 3",
  "date":  "2018/02/02"
}'
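
The two version checks above can be modelled with a small in-memory store. This is only a sketch of the rules as described in this section, not Elasticsearch code: `TinyStore` and `VersionConflict` are hypothetical names, and the exception stands in for the HTTP 409 response.

```python
class VersionConflict(Exception):
    """Stands in for the HTTP 409 Conflict response."""


class TinyStore:
    def __init__(self):
        self._docs = {}  # id -> (version, source)

    def create(self, doc_id, source):
        """Like PUT .../_create: fails if the document already exists."""
        if doc_id in self._docs:
            raise VersionConflict(f"document {doc_id} already exists")
        self._docs[doc_id] = (1, source)
        return 1

    def index(self, doc_id, source, version=None, version_type="internal"):
        """Like PUT with optional ?version=N&version_type=..."""
        current = self._docs.get(doc_id, (0, None))[0]
        if version is None:
            new_version = current + 1
        elif version_type == "internal":
            # Internal: the supplied version must equal the current one.
            if version != current:
                raise VersionConflict(f"current {current}, given {version}")
            new_version = current + 1
        elif version_type == "external":
            # External: the supplied version must be strictly greater.
            if version <= current:
                raise VersionConflict(f"current {current}, given {version}")
            new_version = version
        else:
            raise ValueError(f"unknown version_type: {version_type}")
        self._docs[doc_id] = (new_version, source)
        return new_version


store = TinyStore()
print(store.create("1", {"title": "My SIX blog entry"}))                   # 1
print(store.index("1", {"title": "My SIX blog entry--change"}, version=1))  # 2
print(store.index("1", {"title": "Hello world 3!"},
                  version=5, version_type="external"))                     # 5
```

Repeating the second call with `version=1` would raise `VersionConflict`, because the stored version has already moved on to 2.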

19. Retrieve multiple documents

curl -H "Content-Type: application/json"  -i  -XPOST 'http://192.168.12.150:9200/website/blog/_mget?pretty' -d '
{
  "docs": [
  {
     "_index": "website",
     "_type": "blog",
     "_id": 123
  },
  {
     "_index": "website",
     "_type": "blog",
     "_id": 124,
     "_source": ["title", "text"]
  }
  ]
}'

curl -H "Content-Type: application/json"  -i  -XPOST 'http://192.168.12.150:9200/website/blog/_mget?pretty' -d '
{
  "ids": ["1","2"]
}'

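20. Parameters accepted by Elasticsearch APIs such as _update, _search, _create and _mget

routing, replication, consistency, timeout, retry_on_conflict, version, from, size, preference, search_type=count

Note that replication, consistency and search_type=count come from older releases and are no longer accepted by 6.x.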

21. Update fields, with a retry count for version conflicts

curl -H "Content-Type: application/json"  -i  -XPOST 'http://192.168.12.150:9200/website/blog/123/_update?retry_on_conflict=3&pretty' -d '
{
  "doc": {
     "title": "Hello world!-- change",
     "text":  "Hello everybody~--change",
     "date":  "2018/02/02",
     "news":  "this is add text"
  }
}'
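
What retry_on_conflict asks for is a read, merge, write cycle that is repeated when the version changed in between. The Python sketch below models that loop on the client side purely for illustration; in Elasticsearch the retry happens inside the update request itself. `update_with_retry`, `make_flaky_store` and the simulated interference are all assumptions made for the example.

```python
class VersionConflict(Exception):
    """Stands in for a version conflict (HTTP 409)."""


def update_with_retry(read, write, change, retry_on_conflict=0):
    """Read, merge and write; retry up to retry_on_conflict times on conflict."""
    attempts = retry_on_conflict + 1
    for attempt in range(attempts):
        version, source = read()
        merged = {**source, **change}
        try:
            return write(merged, version)
        except VersionConflict:
            if attempt == attempts - 1:
                raise  # Retries exhausted: report the conflict.


def make_flaky_store(interference):
    """Hypothetical store in which another client wins the first N writes."""
    state = {"version": 1, "source": {"title": "Hello world!"},
             "interference": interference}

    def read():
        return state["version"], dict(state["source"])

    def write(source, expected_version):
        if state["interference"] > 0:
            # Another client updated the document between our read and write.
            state["interference"] -= 1
            state["version"] += 1
            raise VersionConflict("version changed")
        if expected_version != state["version"]:
            raise VersionConflict("stale version")
        state["version"] += 1
        state["source"] = source
        return state["version"]

    return state, read, write


state, read, write = make_flaky_store(interference=2)
print(update_with_retry(read, write, {"news": "this is add text"},
                        retry_on_conflict=3))
print(state["source"])
```

With two simulated conflicts and three allowed retries the third attempt succeeds; with `retry_on_conflict=1` the same scenario would end in a `VersionConflict`.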

22. Delete all indices (use with great care to avoid deleting data by accident)

DELETE /_all
DELETE /dys*

23. Change an index's replica settings

PUT /my_temp_index
{
    "settings": {
        "number_of_shards" : 1,
        "number_of_replicas" : 0
    }
}
PUT /my_temp_index/_settings
{
    "number_of_replicas": 1
}

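24. Modify a type: do not store the original data (_source) or do not use the _all field

Do not store the _source field (partial update requests will then no longer work)

PUT /my_index
{
    "mappings": {
        "my_type": {
            "_source": {
                "enabled": false
            }
        }
    }
}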

Do not use the _all field (this operation is no longer supported from ES 6.0 onward; use a custom field with copy_to instead)

PUT /my_index/_mapping/my_type
{
    "my_type": {
        "_all": { "enabled": false }
    }
}

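25. The _all meta field is disabled by default; enable it yourself if needed (for indices created in 6.0 or later it can no longer be enabled, so the request below only applies to indices from earlier versions)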

PUT myindex
{
  "mappings": {
    "mytype": {
      "_all": {"enabled": true},
      "properties": {
        "title": {
          "type": "text"
        },
        "content": {
          "type": "text"
        }
      }
    }
  }
}

Set the analyzer for the _all field

PUT /my_index/my_type/_mapping
{
    "my_type": {
        "_all": { "analyzer": "whitespace" }
    }
}

26. Custom dynamic templates

PUT /my_index
{
    "mappings": {
        "my_type": {
            "dynamic_templates": [
                {
                    "es": {
                        "match": "*_es", <1>
                        "match_mapping_type": "string",
                        "mapping": {
                            "type": "text",
                            "analyzer": "spanish"
                        }
                    }
                },
                {
                    "en": {
                        "match": "*", <2>
                        "match_mapping_type": "string",
                        "mapping": {
                            "type": "text",
                            "analyzer": "english"
                        }
                    }
                }
            ]
        }
    }
}

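<1> Matches fields whose names end in _es.

<2> Matches all other fields of string type.

With dynamic_templates you have full control over the mapping of newly detected fields; you can even apply a completely different mapping depending on the field name or data type. Each template has a name that describes what it is for, a mapping that specifies the mapping to apply, and at least one parameter (such as match) that defines which fields the template applies to. Templates are checked in order, and the first one that matches is applied. For example, we define two templates for string fields:

* es: fields whose names end in _es use the spanish analyzer.
* en: all other fields use the english analyzer.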
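
The "first matching template wins" rule can be sketched in a few lines of Python. This is an approximation for illustration only: `pick_template` is a made-up helper, Python's `fnmatchcase` is used as a stand-in for the wildcard matching of match, and the target field type is written as text on the assumption that the index lives on 5.x or later.

```python
from fnmatch import fnmatchcase

# The two templates from this section, in the order they are declared.
TEMPLATES = [
    ("es", {"match": "*_es", "match_mapping_type": "string",
            "mapping": {"type": "text", "analyzer": "spanish"}}),
    ("en", {"match": "*", "match_mapping_type": "string",
            "mapping": {"type": "text", "analyzer": "english"}}),
]


def pick_template(field_name, detected_type, templates=TEMPLATES):
    """Return (name, mapping) of the first template that matches, else None."""
    for name, template in templates:
        wanted_type = template.get("match_mapping_type", detected_type)
        if wanted_type != detected_type:
            continue
        if fnmatchcase(field_name, template.get("match", "*")):
            return name, template["mapping"]
    return None


print(pick_template("title_es", "string"))  # the "es" template
print(pick_template("title", "string"))     # falls through to "en"
print(pick_template("age", "long"))         # None: no template applies
```

Swapping the order of the two templates would make "en" match every string field first, which is why the more specific pattern has to come first.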

27. Shared fields and settings for all types

PUT /my_index
{
    "mappings": {
        "_default_": {
            "_all": { "enabled": false }
        },
        "blog": {
            "_all": { "enabled": true }
        }
    }
}

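In the example above, the _default_ mapping disables the _all field for all types and enables it only on the blog type. The _default_ mapping is also a good place to define index-level dynamic templates.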
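
One way to picture how _default_ and a type-specific mapping combine is a merge in which the type's own settings win. The sketch below uses a shallow dictionary merge, which is a simplification assumed for the example (the real merge is more detailed); `effective_mapping` and the user type are hypothetical.

```python
def effective_mapping(mappings, type_name):
    """Combine the _default_ mapping with the type's own mapping (type wins)."""
    default = mappings.get("_default_", {})
    specific = mappings.get(type_name, {})
    return {**default, **specific}


# The mappings from the request above.
mappings = {
    "_default_": {"_all": {"enabled": False}},
    "blog": {"_all": {"enabled": True}},
}

print(effective_mapping(mappings, "blog"))  # _all enabled for blog
print(effective_mapping(mappings, "user"))  # any other type inherits the default
```

A type that declares nothing of its own, such as the hypothetical user type here, simply ends up with whatever _default_ specifies.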