Personally collected Elasticsearch practice questions
1. node setting
Cluster name, node roles, and node attr settings.
Rewatching the video today, I realized there were still quite a few details I had missed.
## basic cluster/node settings
cluster.name: log-dev
node.name: node-2
## there are 4 settings related to node roles
node.master: true
node.data: true
node.ingest: true
node.ml: false
## masters generally should not connect out to remote clusters; non-master nodes can leave this unset
cluster.remote.connect: false
# snapshot repository path for a single-node setup
path.repo: ["/home/deploy/search/log-manager/elasticsearch-7.2.0/repository01"]
# can be set to _site_ to bind the machine's site-local (intranet) address; probably doesn't need changing in the exam
network.host: 19.76.3.145
# defaults to 9200; apparently not set in the exam
http.port: 12200
# apparently not set in the exam either (defaults to 9300)
transport.port: 12300
# seed_hosts entries can also be bare IPs, in which case transport.port is appended by default
discovery.seed_hosts: ["19.76.0.98:12300", "19.76.3.145:12300","19.76.0.129:12300"]
# the initial master-eligible nodes, listed by node name
cluster.initial_master_nodes: ["node-1", "node-2","node-3"]
bootstrap.system_call_filter: false
## data/log storage paths; usually not a focus in the exam
path.data: /home/deploy/search/log-manager/elasticsearch-7.2.0/data
path.logs: /home/deploy/search/log-manager/elasticsearch-7.2.0/logs
# when reindexing from another cluster, its address must be whitelisted here
reindex.remote.whitelist: "19.76.0.27:14200,19.76.0.98:14200, 19.76.3.145:14200, 19.76.0.129:14200"
# custom node attr settings
node.attr.size: small
node.attr.rack: rack01
node.attr.disk: big
node.attr.machine: m01
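Once the nodes are up, the custom attributes can be verified with the cat API (a quick check, not part of the config itself):
GET _cat/nodeattrs?v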
2. parent/child documents
1. nested objects
POST phone/_doc/1
{
"brand": "samsung",
"model": "AS1",
"features": [
{
"type": "os",
"value": "android"
},
{
"type": "memory",
"value": "100`"
},
{
"type": "capacity",
"value": "128"
}
]
}
POST phone/_doc/2
{
"brand": "apple",
"model": "AS2",
"features": [
{
"type": "os",
"value": "apple"
},
{
"type": "memory",
"value": "32"
},
{
"type": "capacity",
"value": "100"
}
]
}
With a query like the one below, only one document should come back (the one whose single feature object has type memory and value 100):
GET phone/_search
{
"query": {
"bool": {
"must": [
{
"match": {
"features.type": "memory"
}
},
{
"match": {
"features.value": "100"
}
}
]
}
}
}
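To stop matches leaking across array elements, the features field has to be mapped as nested and queried with a nested query. A sketch under that assumption (phone_nested is a made-up index name; the data would need to be reindexed into it):
PUT phone_nested
{
  "mappings": {
    "properties": {
      "features": {
        "type": "nested"
      }
    }
  }
}
GET phone_nested/_search
{
  "query": {
    "nested": {
      "path": "features",
      "query": {
        "bool": {
          "must": [
            { "match": { "features.type": "memory" } },
            { "match": { "features.value": "100" } }
          ]
        }
      }
    }
  }
}
With nested, type and value must match inside the same feature object, so only the doc whose memory feature is 100 comes back.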
2. join type
{
"title":"elastic",
"content":"ELK is a great tool"
}
{"comments":"good blogs"}
Of the two docs above, one is an article and the other is a comment on that article; store both in the same index.
PUT join_test03
{
"mappings": {
"properties": {
"title": {
"type": "text"
},
"content": {
"type": "text"
},
"comments": {
"type": "text"
},
"relation": {
"type": "join",
"relations": { # 注意这个地方是固定的,别忘了
"article": "comment"
}
}
}
}
}
PUT join_test03/_doc/1
{
"title": "elastic",
"content": "ELK is a great tool",
"relation": { #这个字段使用嵌套结构
"name": "article"
}
}
PUT join_test03/_doc/2?routing=1
{
"comments":"good blogs",
"relation":{
"name":"comment",
"parent":1 # 直接是parent
}
}
Test it:
GET join_test03/_search
{
"query": {
"has_child": {
"type": "comment",
"query": {
"match": {
"comments": "good"
}
}
}
}
}
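The opposite direction uses a has_parent query; a sketch returning comments whose parent article matches elastic:
GET join_test03/_search
{
  "query": {
    "has_parent": {
      "parent_type": "article",
      "query": {
        "match": { "title": "elastic" }
      }
    }
  }
}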
3. queries
1. basic highlighting
PUT query_highlight/_doc/1
{
"title":"ther beautifull door is yours",
"body":"i want to be a better man to left the door "
}
PUT query_highlight/_doc/2
{
"title":"do you like dog?",
"body":"the dog is a good friend more than a pet "
}
Find docs whose title contains door, and highlight the match:
GET query_highlight/_search
{
"query": {
"match": {
"title": "door"
}
},
"highlight": {
"fields": {
"title": {"pre_tags": ["<em>"],"post_tags": ["</em>"]},
"body": {} #这里的查询没有效果
}
}
}
By default the highlighter only highlights fields that the query actually matched (require_field_match defaults to true), which is why body gets nothing here.
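If highlights are wanted on body even though the query only targets title, the highlighter's require_field_match option can be turned off; a sketch:
GET query_highlight/_search
{
  "query": {
    "match": { "title": "door" }
  },
  "highlight": {
    "require_field_match": false,
    "fields": {
      "title": { "pre_tags": ["<em>"], "post_tags": ["</em>"] },
      "body": {}
    }
  }
}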
2. fuzzy queries
To tolerate edit distance, so that terms merely close to door also match, the query below uses a fuzziness of 2:
GET query_highlight/_search
{
"query": {
"match": {
"title": {
"query": "door",
"fuzziness": "2"
}
}
},
"highlight": {
"fields": {
"title": {
"pre_tags": [
"<em>"
],
"post_tags": [
"</em>"
]
},
"body": {}
}
}
}
3. multi_match queries
PUT multi_match/_doc/1
{
"title":"dog is friend",
"body":"we all should love and protect dogs,they are friend",
"detail":"do you really believe it is good thing"
}
PUT multi_match/_doc/2?refresh
{
"title":" cat is friend",
"body":"cat is my friend",
"detail":"do you really believe dog and cat is good thing"
}
Search for dog in title, body, and detail, with boosts of 1, 2, and 3 respectively:
GET multi_match/_search
{
"query": {
"multi_match" : {
"query": "dog",
"type": "most_fields",
"fields": [ "title", "body^2", "detail^3" ]
}
}
}
4. scoring queries
Index movie-1 holds movie data: title is the movie title, tags are the movie's tags.
Find movies whose title contains "my" or "me".
If tags contains "romatic movies", boost that doc's score; otherwise score normally.
POST movie-1/_search
{
"query": {
"bool": {
"must": [
{
"terms": {
"title": ["my","me"]
}
}
],
"should": {
"match": {
"tags": {
"query": "romatic movies",
"boost": 2
}
}
}
}
}
}
5. alias with a filter
Create an index alias alias2 for task23 so that by default queries return only movies rated above 3.
Note that an alias can be given a filter condition at creation time.
POST /_aliases
{
"actions":[
{
"add":{
"index":"task23",
"alias": "alias2",
"filter":{
"range": {
"score": {
"gt": 3
}
}
}
}
}
]
}
6. bool queries
Write a query requiring that "new york" appears in at least two of the four fields (overview/title/tags/tagline) of index task25.
POST task25/_search
{
"query": {
"bool": {
"should": [
{
"match": {
"overview": "new york"
}
},
{
"match": {
"title": "new york"
}
},
{
"match": {
"tags": "new york"
}
},
{
"match": {
"tagline": "new york"
}
}
],
"minimum_should_match": 2
}
}
}
I don't think there is really a better way to write this.
7. One alias over multiple indices, with a single write index
POST _aliases
{
"actions": [
{
"add": {
"index": "hamlet-1",
"alias": "hamlet",
"is_write_index": true
}
},
{
"add": {
"index": "hamlet-2",
"alias": "hamlet"
}
}
]
}
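With is_write_index set, index requests sent through the alias land in hamlet-1, while searches fan out over both indices. For example (doc body made up for illustration):
PUT hamlet/_doc/1
{
  "speaker": "BERNARDO",
  "text_entry": "Whos there?"
}
GET hamlet-1/_doc/1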
8. scroll query
The earth_quack index has 221 docs; iterate over them in batches of 100.
GET earth_quack/_search?scroll=1m&size=100
{
"query": {
"range": {
"Gap": {
"gte": 10
}
}
}
}
# Later calls need nothing else; just keep fetching with the scroll_id:
GET _search/scroll
{
"scroll": "1m",
"scroll_id": "DXF1ZXJ5QW5kRmV0Y2gBAAAAAAAAAAkWSmxIWXRPbmFRSmloeWNTTUVXM0xtQQ=="
}
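Scroll contexts hold resources on the cluster until they time out, so it is good practice to clear them once iteration finishes:
DELETE _search/scroll
{
  "scroll_id": "DXF1ZXJ5QW5kRmV0Y2gBAAAAAAAAAAkWSmxIWXRPbmFRSmloeWNTTUVXM0xtQQ=="
}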
4. Synonym search
Making a point of adding a lowercase filter to custom analyzers is a good habit.
1. Question 1: we-do should equal wedo
Data:
PUT synonym_test01/_doc/1
{
"title":"we-do work",
"des":"we-do like do work"
}
PUT synonym_test01/_doc/2
{
"title":"we-do work for a long time , you do not need do it ",
"des":"we-do like do work"
}
PUT synonym_test01/_doc/3
{
"title":"wedo work it ",
"des":"wedo like do work we do "
}
GET synonym_test01/_search
{
"query": {
"match": {
"title": "wedo"
}
}
}
Only doc 3 is returned. The requirement is that the hyphenated (x-x) and joined (xx) forms behave identically, i.e. querying we-do and wedo gives the same result.
PUT synonym_test
{
"settings": {
"analysis": {
"analyzer": {
"synonym":{
"type":"custom",
"tokenizer":"standard",
"char_filter":"map_filter"
}
},
"char_filter": {
"map_filter":{
"type":"mapping",
"mappings":["- =>"]
}
}
}
},
"mappings": {
"properties": {
"title":{
"type": "text",
"analyzer": "synonym"
}
}
}
}
POST _reindex
{
"source": {
"index": "synonym_test01"
},
"dest": {"index": "synonym_test"}
}
GET synonym_test/_search
{
"query": {
"match": {
"title": "wedo"
}
}
}
2. Question 2: dingding search
The data:
PUT dingding_test/_bulk
{"index":{"_id":1}}
{"title":"oa is very good"}
{"index":{"_id":2}}
{"title":"oA is very good"}
{"index":{"_id":3}}
{"title":"OA is very good"}
{"index":{"_id":4}}
{"title":"dingding is very good"}
{"index":{"_id":5}}
{"title":"dingding is ali software"}
{"index":{"_id":6}}
{"title":"0A is very good"}
Require that querying any of oa, oA, OA, dingding, or 0A returns the same result and hits all of the documents.
DELETE dingding_test
PUT dingding_test
{
"settings": {
"analysis": {
"analyzer": {
"my_analyzer": {
"type": "custom",
"tokenizer": "standard",
"filter": [
"lowercase",
"synonym"
]
}
},
"filter": {
"synonym": {
"type": "synonym",
"synonyms": ["oa,0a,dingding"]
}
}
}
},
"mappings": {
"properties": {
"title": {
"type": "text",
"analyzer": "my_analyzer"
}
}
}
}
GET dingding_test/_search
{
"query": {
"match": {
"title": "oA"
}
}
}
3. Question 3: dog & cat search
There is a document containing text like dog & cat; index it so that a match_phrase query for either dog & cat or dog and cat matches.
Data:
PUT dog_and_cat/_bulk
{"index":{ "_id":0}}
{"title" : "dog and cat are my familly"}
{"index":{ "_id":1}}
{"title" : "do you love dog & cat"}
{"index":{ "_id":2}}
{"title" : "you will finally find dog cat"}
This question requires match_phrase, and the hidden point is that the & symbol is dropped by the standard tokenizer. Note that it is removed by the tokenizer, not by a token filter, so the text must be transformed before tokenization (with a char_filter), or a different tokenizer must be used.
PUT dog_and_cat02
{
"settings": {
"analysis": {
"analyzer": {
"my_analyzer":{
"type":"custom",
"char_filter":["map_filter"],
"tokenizer":"standard"
}
},
"char_filter": {
"map_filter":{
"type":"mapping",
"mappings":["& => and"]
}
}
}
},
"mappings": {
"properties": {
"title":{
"type":"text",
"analyzer": "my_analyzer"
}
}
}
}
Reindex and search:
POST _reindex
{
"source": {"index": "dog_and_cat"},
"dest": {"index": "dog_and_cat02"}
}
GET dog_and_cat02/_search
{
"query": {
"match_phrase": {
"title": "dog & cat"
}
}
}
Solution two uses synonyms, but the tokenizer has to be switched to whitespace so that the & symbol survives tokenization and the synonym filter can see it.
Reference: https://elasticsearch.cn/article/6133
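A sketch of that second approach (dog_and_cat03 is a made-up index name), keeping & as a token via the whitespace tokenizer and equating it with and through a synonym filter:
PUT dog_and_cat03
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "type": "custom",
          "tokenizer": "whitespace",
          "filter": ["lowercase", "amp_syn"]
        }
      },
      "filter": {
        "amp_syn": {
          "type": "synonym",
          "synonyms": ["&, and"]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "title": {
        "type": "text",
        "analyzer": "my_analyzer"
      }
    }
  }
}
After reindexing the data into it, the same match_phrase queries should behave like solution one.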
5. multi_fields
1. Question 1: sub-fields with different analyzers
PUT multi_fields/_doc/1
{
"title":"manager",
"des":"this is the man who is more powerfull "
}
PUT multi_fields/_doc/2
{
"title":"employee",
"des":"this is the man who really do the job "
}
Create a new index that gives title a sub-field named space_f using the whitespace analyzer, then reindex the current index's data into the new one.
PUT multi_fields02
{
"mappings": {
"properties": {
"des": {
"type": "text",
"fields": {
"space_f": {
"type": "text",
"analyzer": "whitespace"
}
}
},
"title": {
"type": "text",
"fields": {
"space_f": {
"type": "text",
"analyzer": "whitespace"
}
}
}
}
}
}
POST _reindex
{
"source": {"index": "multi_fields"},
"dest": {"index": "multi_fields02"}
}
2. Question 2: sub-fields with different analyzers
Given an existing index multi_fields03 with data, design the mapping for a new index multi_fields04 and move the data over. One field (call it xxx; I forget the name) keeps standard as its analyzer and gains two sub-fields: xxx.english analyzed with english, and xxx.stop analyzed with stop. All other fields keep the same types as in the original index.
Sample data:
PUT multi_fields03/_doc/1
{
"content":"i want to be better",
"name":"chencc",
"age":180
}
PUT multi_fields03/_doc/2
{
"content":"she want a happy life",
"name":"zhaolu",
"age":18
}
PUT multi_fields03/_doc/3
{
"content":"best wish for you team",
"name":"wangj",
"age":28
}
Create the mapping:
PUT multi_fields04
{
"mappings" : {
"properties" : {
"age" : {
"type" : "long"
},
"content" : {
"type" : "text",
"analyzer": "standard",
"fields" : {
"english":{
"type":"text",
"analyzer":"english"
},
"stop":{
"type":"text",
"analyzer":"stop"
}
}
},
"name" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
}
}
}
}
POST _reindex
{
"source": {"index": "multi_fields03"},
"dest": {"index": "multi_fields04"}
}
6. dynamic_template
Task 1: fields whose names start with key_ become keyword type.
Task 2: only string fields whose names start with key_ become keyword type.
PUT _template/dynamic_template
{
"index_patterns": [
"dynamic*"
],
"settings": {
"number_of_shards": 3
},
"mappings": {
"dynamic_templates": [
{
"key_word": {
"match": "key_*",
"mapping": {
"type": "keyword"
}
}
}
]
}
}
PUT dynamic01/_doc/1
{
"title":"this doc is for dynamic use",
"key_want":"go back",
"key_like":123
}
GET dynamic01
Both key_-prefixed fields come out as keyword.
Adding a match_mapping_type constraint restricts the rule to fields whose JSON-detected type is string, so only string key_-prefixed fields become keyword:
PUT _template/dynamic_template02
{
"index_patterns": [
"02dyn*"
],
"settings": {
"number_of_shards": 3
},
"mappings": {
"dynamic_templates": [
{
"key_word": {
"match_mapping_type":"string",
"match": "key_*",
"mapping": {
"type": "keyword"
}
}
}
]
}
}
PUT 02dynamic/_doc/1
{
"title":"this doc is for dynamic use",
"key_want":"go back",
"key_like":123
}
GET 02dynamic
...
"key_like" : {
"type" : "long"
},
"key_want" : {
"type" : "keyword"
},
...
7. date range and count
Find docs where country is China, sex is woman, and birth is in January through March 2016.
PUT people_agg
{
"mappings": {
"properties": {
"birth": {
"type": "date",
"format": "yyyy/MM/dd HH:mm:ss.SS"
},
"country": {
"type": "keyword"
},
"sex": {
"type": "keyword"
},
"des": {
"type": "text"
}
}
}
}
PUT people_agg/_doc/1
{
"birth":"2016/01/04 21:18:48.64",
"country":"China",
"sex":"woman",
"des":"beauty woman"
}
PUT people_agg/_doc/2
{
"birth":"2016/02/04 21:18:48.64",
"country":"China",
"sex":"woman",
"des":"beauty woman"
}
PUT people_agg/_doc/3
{
"birth":"2016/01/04 21:18:48.64",
"country":"China",
"sex":"man",
"des":"beauty man"
}
PUT people_agg/_doc/4
{
"birth":"2016/01/04 21:18:48.64",
"country":"Japan",
"sex":"woman",
"des":"beauty woman"
}
PUT people_agg/_doc/5
{
"birth":"2016/03/04 21:18:48.64",
"country":"China",
"sex":"woman",
"des":"beauty woman"
}
GET people_agg/_count
{
"query": {
"bool": {
"must": [
{"range": {
"birth": {
"gte": "01/2016",
"lte": "04/2016",
"format": "MM/yyyy||yyyy"
}
}},
{"term": {
"sex": {
"value": "woman"
}
}},
{
"term": {
"country": {
"value": "China"
}
}
}
]
}
}
}
Note the format parameter on the date range query here; dates cannot really be matched with equality.
Also, no aggregation is needed; the _count API alone does the job.
8. aggregation search
1. Question 1: monthly maxima for earthquake data
For the earthquake data, find each month's maximum Depth and maximum Distance, and also pick out the month with the greatest depth. A sample doc:
{
"DateTime" : "2016/01/04 21:18:48.64",
"Latitude" : "37.3257",
"Longitude" : "-122.1043",
"Depth" : "-0.32",
"Magnitude" : "1.55",
"MagType" : "Md",
"NbStations" : "12",
"Gap" : "77",
"Distance" : "1",
"RMS" : "0.06",
"Source" : "NC",
"EventID" : "72573650"
}
GET earth_quack/_search?size=0
{
"aggs": {
"month": {
"date_histogram": {
"field": "DateTime",
"calendar_interval": "month"
},
"aggs": {
"max_dep": {
"max": {
"field": "Depth"
}
},
"max_dis":{
"max": {
"field": "Distance"
}
}
}
},
"max_bucket":{
"max_bucket": {
"buckets_path": "month>max_dep"
}
}
}
}
max_bucket here is a sibling pipeline aggregation, so it sits at the top level; a parent pipeline aggregation would be nested inside.
2. Question 2: filtering aggregation buckets
Building on question 1, keep only buckets whose max depth is greater than 0.
GET earth_quack/_search?size=0
{
"aggs": {
"month": {
"date_histogram": {
"field": "DateTime",
"calendar_interval": "month"
},
"aggs": {
"max_dep": {
"max": {
"field": "Depth"
}
},
"max_dis":{
"max": {
"field": "Distance"
}
},
"dep_filter":{
"bucket_selector": {
"buckets_path": {"m_dep":"max_dep"},
"script": "params.m_dep>0"
}
}
}
}
}
}
3. Question 3: nesting bucket aggregations inside bucket aggregations
PUT log_agg
{
"mappings" : {
"properties" : {
"name" : {
"type" : "text"
},
"param" : {
"type" : "keyword"
},
"status" : {
"type" : "long"
},
"uri" : {
"type" : "keyword"
}
}
}
}
Data:
PUT log_agg/_doc/1
{
"uri":"/query",
"status":200,
"param":"query dog"
}
PUT log_agg/_doc/2
{
"uri":"/query",
"status":200,
"param":"query cat"
}
PUT log_agg/_doc/3
{
"uri":"/query",
"status":400,
"param":"query bad"
}
PUT log_agg/_doc/4
{
"uri":"/login",
"status":200,
"param":"uid:123"
}
PUT log_agg/_doc/5
{
"uri":"/login",
"status":400,
"param":"uid:123 uid bad"
}
PUT log_agg/_doc/6
{
"uri":"/login",
"status":400,
"param":"uid:123,pass bad"
}
PUT log_agg/_doc/7
{
"uri":"/login",
"status":302,
"param":"uid:123,no user"
}
PUT log_agg/_doc/8
{
"uri":"/register",
"status":302,
"param":"phone:12345"
}
PUT log_agg/_doc/9
{
"uri":"/register",
"status":302,
"param":"query cat"
}
PUT log_agg/_doc/10
{
"uri":"/register",
"status":400,
"param":"server error"
}
Requirement:
For each status in the log index, find the top 3 URIs.
It did not occur to me at first that bucket aggregations can nest inside other bucket aggregations; I had assumed buckets could only contain metric aggregations.
GET log_agg/_search?size=0
{
"aggs": {
"status_term": {
"terms": {
"field": "status",
"size": 10
},
"aggs": {
"uri_ter": {
"terms": {
"field": "uri",
"size": 3,
"order": {
"_count": "desc"
}
}
}
}
}
}
}
Response:
"aggregations" : {
"status_term" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : 400,
"doc_count" : 4,
"uri_ter" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : "/login",
"doc_count" : 2
},
{
"key" : "/query",
"doc_count" : 1
},
{
"key" : "/register",
"doc_count" : 1
}
]
}
},
{
"key" : 200,
"doc_count" : 3,
"uri_ter" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : "/query",
"doc_count" : 2
},
{
"key" : "/login",
"doc_count" : 1
}
]
}
},
{
"key" : 302,
"doc_count" : 3,
"uri_ter" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : "/register",
"doc_count" : 2
},
{
"key" : "/login",
"doc_count" : 1
}
]
}
}
]
}
}
4. Question 4: manufacturer ranking
Against the food-additive index task15, find the top 10 manufacturers among docs whose ingredient field contains tt.
POST task15/_search
{
"query": {
"match": {
"ingredient": "tt"
}
},
"aggs": {
"top_10": {
"terms": {
"field": "manufacturer"
},
"size": 10
}
}
}
9. snapshot_and_restore
Create a snapshot repository for the cluster, then create a snapshot containing only the work02_test09 index.
elasticsearch.yml configuration:
path.repo: ["/home/deploy/search/log-manager/single_node/repository_global"]
PUT _snapshot/exam_bak
{
"type": "fs",
"settings": {
"location": "exam_back01"
}
}
POST _snapshot/exam_bak/_verify
PUT _snapshot/exam_bak/snapshot_1
{
"indices": "work02_test09"
}
Verify:
GET _snapshot/exam_bak/snapshot_1
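Restore is the mirror of the snapshot call; since work02_test09 still exists, the sketch below restores it under a new name (the restored_ prefix is my own choice):
POST _snapshot/exam_bak/snapshot_1/_restore
{
  "indices": "work02_test09",
  "rename_pattern": "work02_test09",
  "rename_replacement": "restored_work02_test09"
}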
10. search_template
Data:
PUT search_template/_doc/1
{
"title":"i love pet ,and i want to have a dog",
"age":8
}
PUT search_template/_doc/2
{
"title":"i love pet ,and i want to have a cat",
"age":18
}
PUT search_template/_doc/3
{
"title":"i love pet",
"age":88
}
Define a search template where the text matched against title is params1, the sort field is params2, the sort order is params3, and the result size is size:
GET _search/template
{
"id": "search_template",
"params": {
"params1":"xxxx",
"params2":"xxxx",
"params3":"asc",
"size":10
}
}
First sketch the query:
GET search_template/_search
{
"query": {
"match": {
"title": "TEXT"
}
},
"sort": [
{
"FIELD": {
"order": "desc"
}
}
],
"size":10
}
Then fill it into a template:
PUT _scripts/template_query
{
"script": {
"lang": "mustache",
"source": {
"query": {
"match": {
"title": "{{params1}}"
}
},
"sort": [
{
"{{params2}}": {
"order": "{{params3}}"
}
}
],
"size":"{{size}}"
}
}
}
Render it to verify:
GET _render/template/template_query
{
"params": {
"params1":"pet",
"params2":"age",
"params3":"asc",
"size":10
}
}
Search with it:
GET search_template/_search/template
{
"id":"template_query",
"params": {
"params1":"pet",
"params2":"age",
"params3":"desc",
"size":10
}
}
11. update_by_query
1. Question 1: joining field contents
Add a new field, new_field, whose content is title and content concatenated in order, as plain text.
PUT script_new_field/_doc/1
{
"title":"save food",
"content":"we all should save food ,it is important"
}
PUT script_new_field/_doc/2
{
"title":"protect water",
"content":"water is precious for all"
}
Solution 1:
POST script_new_field/_update_by_query
{
"script":{
"lang":"painless",
"source":"ctx._source.new_field=ctx._source.title+' '+ctx._source.content"
}
}
GET script_new_field/_search
Solution 2:
PUT _ingest/pipeline/join_field
{
"description": "join two field",
"processors": [
{"set": {
"field": "new_ffff",
"value": "{{title}} {{content}}"
}}
]
}
POST script_new_field/_update_by_query?pipeline=join_field
2. Question 2: query filter + scripted field update
PUT city_update/_doc/1
{
"city":"shanghai",
"name":"liurui"
}
PUT city_update/_doc/2
{
"city":"wuhan",
"name":"liuao"
}
Change every doc in the index whose city is shanghai to beijin:
POST city_update/_update_by_query
{
"query":{
"match":{
"city":"shanghai"
}
},
"script":{
"lang":"painless",
"source":"ctx._source.city='beijin'"
}
}
12. reindex with pipelines
POST _ingest/pipeline/_simulate
{
"pipeline" : {
// pipeline definition here
},
"docs" : [
{ "_source": {/** first document **/} },
{ "_source": {/** second document **/} },
// ...
]
}
At first I thought _simulate could only render an inline pipeline definition, not one that is already stored. In fact a stored pipeline can be simulated too:
POST _ingest/pipeline/my-pipeline-id/_simulate
{
"docs" : [
{ "_source": {/** first document **/} },
{ "_source": {/** second document **/} },
// ...
]
}
1. Question 1: split a field, trim whitespace, count length
Move an index's data to task2. In the source index one field holds values like " xx1 ", " xx2 ", " xx3 ". Requirements:
In task2 the field becomes an array, split on commas.
Each split-out string has its surrounding whitespace trimmed.
A new field num holds the array length.
Prepare the data:
PUT pipe_origin/_doc/1
{
"title":"teacher ,student , mom ",
"name":"diaom"
}
PUT pipe_origin/_doc/2
{
"title":"father ,engneer,son",
"name":"chenq"
}
Processing:
POST _ingest/pipeline/_simulate
{
"pipeline": {
"description": "split int arr and count num",
"processors": [
{
"split": {
"field": "title",
"target_field": "temp",
"separator": ","
}
},
{
"foreach": {
"field": "temp",
"processor": {
"trim": {
"field": "_ingest._value"
}
}
}
},
{
"script": {
"lang": "painless",
"source": "ctx.len=ctx.temp.length;"
}
}
]
},
"docs": [
{
"_source": {
"title": "teacher ,student , mom "
}
}
]
}
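The reindex below refers to a stored pipeline named array_deal, so the simulated definition has to be stored under that name first (this sketch uses num for the count, as the requirement asks, where the simulation used len, and size() rather than length for the List):
PUT _ingest/pipeline/array_deal
{
  "description": "split into arr and count num",
  "processors": [
    {
      "split": {
        "field": "title",
        "target_field": "temp",
        "separator": ","
      }
    },
    {
      "foreach": {
        "field": "temp",
        "processor": {
          "trim": { "field": "_ingest._value" }
        }
      }
    },
    {
      "script": {
        "lang": "painless",
        "source": "ctx.num = ctx.temp.size();"
      }
    }
  ]
}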
POST _reindex
{
"source": {"index": "pipe_origin"},
"dest": {
"index": "pipe_dest",
"pipeline": "array_deal"
}
}
2. Question 2: field concatenation and string length
Prepare the data:
PUT pipe_origin/_doc/1
{
"title":"teacher ,student , mom ",
"name":"diaom"
}
PUT pipe_origin/_doc/2
{
"title":"father ,engneer,son",
"name":"chenq"
}
Reindex into a new dest index, adding a new field join whose value is the two field values concatenated, plus a field len counting the characters of join.
Answer:
POST _ingest/pipeline/_simulate
{
"pipeline": {
"description": "split int arr and count num",
"processors": [
{
"set": {
"field": "join",
"value": "{{name}} {{title}}"
}
},
{
"script": {
"lang": "painless",
"source": "ctx.len=ctx.join.length()"
}
}
]
},
"docs": [
{
"_source": {
"title": "better man",
"name":"jack"
}
}
]
}
PUT _ingest/pipeline/you_know
{
"description": "split int arr and count num",
"processors": [
{
"set": {
"field": "join",
"value": "{{name}} {{title}}"
}
},
{
"script": {
"lang": "painless",
"source": "ctx.len=ctx.join.length()"
}
}
]
}
POST _reindex
{
"source": {"index": "pipe_origin"},
"dest": {"index": "join_res","pipeline": "you_know"}
}
GET join_res/_search
13. allocation filter
1. Question 1: hot-warm architecture
Deploy three ES nodes with a node attribute named warm_hot.
node01 is hot; node02 and node03 are warm.
Create two indices, task701 and task702, each with 2 shards; one index's shards all go to hot, the other's all to warm.
# node settings
node1: node.attr.warm_hot: hot
node2: node.attr.warm_hot: warm
node3: node.attr.warm_hot: warm
Index settings:
PUT task701
{
"settings": {
"index.routing.allocation.include.warm_hot":"warm",
"number_of_replicas": 0,
"number_of_shards": 3
}
}
PUT task702
{
"settings": {
"index.routing.allocation.include.warm_hot":"hot",
"number_of_replicas": 0,
"number_of_shards": 3
}
}
2. Question 2: rack awareness
Three nodes carry a rack attribute:
node01 and node02 are rack01; node03 is rack02.
Create an index task703 with 2 shards and 1 replica,
such that every shard of task703 is backed up across rack01 and rack02.
node01: node.attr.rack: rack01
node02: node.attr.rack: rack01
node03: node.attr.rack: rack02
PUT _cluster/settings
{
"persistent": {
"cluster.routing.allocation.awareness.attributes": "rack",
"cluster.routing.allocation.awareness.force.rack.values": "rack01,rack02"
}
}
PUT task703/
{
"settings": {
"number_of_replicas": 1,
"number_of_shards": 2
}
}
GET _cat/shards/task703
3. Question 3: pinning indices to nodes, removing a node
Indices books1 and books2 each have 3 primary shards and 1 replica.
books1 may only be allocated on node-1.
All shards of books2 must live on node-2 and node-3.
Don't reach for a custom attr right away; the node name is already available as the built-in _name allocation attribute, which is very handy.
PUT books1
{
"settings": {
"index.routing.allocation.include.name":"node-1",
"number_of_shards": 3,
"number_of_replicas": 0
}
}
GET _cat/shards/books1
PUT books2
{
"settings": {
"index.routing.allocation.include.name":"node-2,node-3",
"number_of_shards": 3,
"number_of_replicas": 1
}
}
GET _cat/shards/books2
To take a node out of the cluster, first let its data migrate away automatically:
PUT _cluster/settings
{
"transient" : {
"cluster.routing.allocation.exclude.name" : "node-1"
}
}
14. cross cluster search
Write the following data into cluster1:
PUT hamlet/_bulk
{"index":{"_id":0}}
{"line_number":"1","speaker":"BERNARDO","text_entry":"Whos there?"}
{"index":{"_id":1}}
{"line_number":"2","speaker":"FRANCISCO","text_entry":"Nay answer me: stand, and unfold yourself."}
Write the following data into cluster2:
PUT hamlet02/_bulk
{"index":{"_id":0}}
{"line_number":"1","speaker":"BERNARDO","text_entry":"Whos there?"}
{"index":{"_id":1}}
{"line_number":"2","speaker":"FRANCISCO","text_entry":"Nay answer me: stand, and unfold yourself."}
From cluster1, search both clusters at once for docs whose speaker is FRANCISCO:
PUT _cluster/settings
{
"persistent": {
"cluster": {
"remote": {
"cluster_one": {
"seeds": [
"10.76.3.145:16300"
],
"transport.ping_schedule": "30s"
}
}
}
}
}
GET hamlet,cluster_one:hamlet02/_search
{
"query": {
"match": {
"speaker": "FRANCISCO"
}
}
}
15. Real exam questions (screenshots)
https://github.com/mingyitianxia/elastic-certified-engineer/blob/master/review-practice/0011_zhenti.md