Personally collected Elasticsearch practice questions
1. node setting
Cluster name, node roles, and node attr settings.
Rewatching the video today, I realized there were still quite a few details I had missed.
## basic cluster/node settings
cluster.name: log-dev
node.name: node-2
## there are 4 settings related to node roles
node.master: true
node.data: true
node.ingest: true
node.ml: false
## masters generally should not connect out to remote clusters; non-master nodes can leave this unset
cluster.remote.connect: false
# snapshot repository path for a single-node setup
path.repo: ["/home/deploy/search/log-manager/elasticsearch-7.2.0/repository01"]
# can be set to _site_ to bind the machine's site-local (intranet) address; probably doesn't need changing in the exam
network.host: 19.76.3.145
# defaults to 9200; apparently not set in the exam
http.port: 12200
# apparently not set in the exam either (defaults to 9300)
transport.port: 12300
# seed_hosts entries can also be bare IPs, in which case transport.port is appended by default
discovery.seed_hosts: ["19.76.0.98:12300", "19.76.3.145:12300","19.76.0.129:12300"]
# the initial master-eligible nodes, listed by node name
cluster.initial_master_nodes: ["node-1", "node-2","node-3"]
bootstrap.system_call_filter: false
## data/log storage paths; usually not a focus in the exam
path.data: /home/deploy/search/log-manager/elasticsearch-7.2.0/data
path.logs: /home/deploy/search/log-manager/elasticsearch-7.2.0/logs
# when reindexing from another cluster, its address must be whitelisted here
reindex.remote.whitelist: "19.76.0.27:14200,19.76.0.98:14200, 19.76.3.145:14200, 19.76.0.129:14200"
# custom node attr settings
node.attr.size: small
node.attr.rack: rack01
node.attr.disk: big
node.attr.machine: m01
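Once the nodes are up, the custom attributes can be verified with the cat API (a quick check, not part of the config itself):
GET _cat/nodeattrs?v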
2. parent/child documents
1. nested objects
POST phone/_doc/1
{
"brand": "samsung",
"model": "AS1",
"features": [
{
"type": "os",
"value": "android"
},
{
"type": "memory",
"value": "100`"
},
{
"type": "capacity",
"value": "128"
}
]
}
POST phone/_doc/2
{
"brand": "apple",
"model": "AS2",
"features": [
{
"type": "os",
"value": "apple"
},
{
"type": "memory",
"value": "32"
},
{
"type": "capacity",
"value": "100"
}
]
}
With a query like the one below, only one document should come back (the one whose single feature object has type memory and value 100):
GET phone/_search
{
"query": {
"bool": {
"must": [
{
"match": {
"features.type": "memory"
}
},
{
"match": {
"features.value": "100"
}
}
]
}
}
}
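To stop matches leaking across array elements, the features field has to be mapped as nested and queried with a nested query. A sketch under that assumption (phone_nested is a made-up index name; the data would need to be reindexed into it):
PUT phone_nested
{
  "mappings": {
    "properties": {
      "features": {
        "type": "nested"
      }
    }
  }
}
GET phone_nested/_search
{
  "query": {
    "nested": {
      "path": "features",
      "query": {
        "bool": {
          "must": [
            { "match": { "features.type": "memory" } },
            { "match": { "features.value": "100" } }
          ]
        }
      }
    }
  }
}
With nested, type and value must match inside the same feature object, so only the doc whose memory feature is 100 comes back.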
2. join type
{
"title":"elastic",
"content":"ELK is a great tool"
}
{"comments":"good blogs"}
Of the two docs above, one is an article and the other is a comment on that article; store both in the same index.
PUT join_test03
{
"mappings": {
"properties": {
"title": {
"type": "text"
},
"content": {
"type": "text"
},
"comments": {
"type": "text"
},
"relation": {
"type": "join",
"relations": { # 注意这个地方是固定的,别忘了
"article": "comment"
}
}
}
}
}
PUT join_test03/_doc/1
{
"title": "elastic",
"content": "ELK is a great tool",
"relation": { #这个字段使用嵌套结构
"name": "article"
}
}
PUT join_test03/_doc/2?routing=1
{
"comments":"good blogs",
"relation":{
"name":"comment",
"parent":1 # 直接是parent
}
}
Test it:
GET join_test03/_search
{
"query": {
"has_child": {
"type": "comment",
"query": {
"match": {
"comments": "good"
}
}
}
}
}
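The opposite direction uses a has_parent query; a sketch returning comments whose parent article matches elastic:
GET join_test03/_search
{
  "query": {
    "has_parent": {
      "parent_type": "article",
      "query": {
        "match": { "title": "elastic" }
      }
    }
  }
}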
3. queries
1. basic highlighting
PUT query_highlight/_doc/1
{
"title":"ther beautifull door is yours",
"body":"i want to be a better man to left the door "
}
PUT query_highlight/_doc/2
{
"title":"do you like dog?",
"body":"the dog is a good friend more than a pet "
}
Find docs whose title contains door, and highlight the match:
GET query_highlight/_search
{
"query": {
"match": {
"title": "door"
}
},
"highlight": {
"fields": {
"title": {"pre_tags": ["<em>"],"post_tags": ["</em>"]},
"body": {} #这里的查询没有效果
}
}
}
By default the highlighter only highlights fields that the query actually matched (require_field_match defaults to true), which is why body gets nothing here.
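If highlights are wanted on body even though the query only targets title, the highlighter's require_field_match option can be turned off; a sketch:
GET query_highlight/_search
{
  "query": {
    "match": { "title": "door" }
  },
  "highlight": {
    "require_field_match": false,
    "fields": {
      "title": { "pre_tags": ["<em>"], "post_tags": ["</em>"] },
      "body": {}
    }
  }
}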
2. fuzzy queries
To tolerate edit distance, so that terms merely close to door also match, the query below uses a fuzziness of 2:
GET query_highlight/_search
{
"query": {
"match": {
"title": {
"query": "door",
"fuzziness": "2"
}
}
},
"highlight": {
"fields": {
"title": {
"pre_tags": [
"<em>"
],
"post_tags": [
"</em>"
]
},
"body": {}
}
}
}
3. multi_match queries
PUT multi_match/_doc/1
{
"title":"dog is friend",
"body":"we all should love and protect dogs,they are friend",
"detail":"do you really believe it is good thing"
}
PUT multi_match/_doc/2?refresh
{
"title":" cat is friend",
"body":"cat is my friend",
"detail":"do you really believe dog and cat is good thing"
}
Search for dog in title, body, and detail, with boosts of 1, 2, and 3 respectively:
GET multi_match/_search
{
"query": {
"multi_match" : {
"query": "dog",
"type": "most_fields",
"fields": [ "title", "body^2", "detail^3" ]
}
}
}
4. scoring queries
Index movie-1 holds movie data: title is the movie title, tags are the movie's tags.
Find movies whose title contains "my" or "me".
If tags contains "romatic movies", boost that doc's score; otherwise score normally.
POST movie-1/_search
{
"query": {
"bool": {
"must": [
{
"terms": {
"title": ["my","me"]
}
}
],
"should": {
"match": {
"tags": {
"query": "romatic movies",
"boost": 2
}
}
}
}
}
}
5. alias with a filter
Create an index alias alias2 for task23 so that by default queries return only movies rated above 3.
Note that an alias can be given a filter condition at creation time.
POST /_aliases
{
"actions":[
{
"add":{
"index":"task23",
"alias": "alias2",
"filter":{
"range": {
"score": {
"gt": 3
}
}
}
}
}
]
}
6. bool queries
Write a query requiring that "new york" appears in at least two of the four fields (overview/title/tags/tagline) of index task25.
POST task25/_search
{
"query": {
"bool": {
"should": [
{
"match": {
"overview": "new york"
}
},
{
"match": {
"title": "new york"
}
},
{
"match": {
"tags": "new york"
}
},
{
"match": {
"tagline": "new york"
}
}
],
"minimum_should_match": 2
}
}
}
I don't think there is really a better way to write this.
7. One alias over multiple indices, with a single write index
POST _aliases
{
"actions": [
{
"add": {
"index": "hamlet-1",
"alias": "hamlet",
"is_write_index": true
}
},
{
"add": {
"index": "hamlet-2",
"alias": "hamlet"
}
}
]
}
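With is_write_index set, index requests sent through the alias land in hamlet-1, while searches fan out over both indices. For example (doc body made up for illustration):
PUT hamlet/_doc/1
{
  "speaker": "BERNARDO",
  "text_entry": "Whos there?"
}
GET hamlet-1/_doc/1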
8. scroll query
The earth_quack index has 221 docs; iterate over them in batches of 100.
GET earth_quack/_search?scroll=1m&size=100
{
"query": {
"range": {
"Gap": {
"gte": 10
}
}
}
}
# Later calls need nothing else; just keep fetching with the scroll_id:
GET _search/scroll
{
"scroll": "1m",
"scroll_id": "DXF1ZXJ5QW5kRmV0Y2gBAAAAAAAAAAkWSmxIWXRPbmFRSmloeWNTTUVXM0xtQQ=="
}
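Scroll contexts hold resources on the cluster until they time out, so it is good practice to clear them once iteration finishes:
DELETE _search/scroll
{
  "scroll_id": "DXF1ZXJ5QW5kRmV0Y2gBAAAAAAAAAAkWSmxIWXRPbmFRSmloeWNTTUVXM0xtQQ=="
}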
4. Synonym search
Making a point of adding a lowercase filter to custom analyzers is a good habit.
1. Question 1: we-do should equal wedo
Data:
PUT synonym_test01/_doc/1
{
"title":"we-do work",
"des":"we-do like do work"
}
PUT synonym_test01/_doc/2
{
"title":"we-do work for a long time , you do not need do it ",
"des":"we-do like do work"
}
PUT synonym_test01/_doc/3
{
"title":"wedo work it ",
"des":"wedo like do work we do "
}
GET synonym_test01/_search
{
"query": {
"match": {
"title": "wedo"
}
}
}
Only doc 3 is returned. The requirement is that the hyphenated (x-x) and joined (xx) forms behave identically, i.e. querying we-do and wedo gives the same result.
PUT synonym_test
{
"settings": {
"analysis": {
"analyzer": {
"synonym":{
"type":"custom",
"tokenizer":"standard",
"char_filter":"map_filter"
}
},
"char_filter": {
"map_filter":{
"type":"mapping",
"mappings":["- =>"]
}
}
}
},
"mappings": {
"properties": {
"title":{
"type": "text",
"analyzer": "synonym"
}
}
}
}
POST _reindex
{
"source": {
"index": "synonym_test01"
},
"dest": {"index": "synonym_test"}
}
GET synonym_test/_search
{
"query": {
"match": {
"title": "wedo"
}
}
}
2. Question 2: dingding search
The data:
PUT dingding_test/_bulk
{"index":{"_id":1}}
{"title":"oa is very good"}
{"index":{"_id":2}}
{"title":"oA is very good"}
{"index":{"_id":3}}
{"title":"OA is very good"}
{"index":{"_id":4}}
{"title":"dingding is very good"}
{"index":{"_id":5}}
{"title":"dingding is ali software"}
{"index":{"_id":6}}
{"title":"0A is very good"}
Require that querying any of oa, oA, OA, dingding, or 0A returns the same result and hits all of the documents.
DELETE dingding_test
PUT dingding_test
{
"settings": {
"analysis": {
"analyzer": {
"my_analyzer": {
"type": "custom",
"tokenizer": "standard",
"filter": [
"lowercase",
"synonym"
]
}
},
"filter": {
"synonym": {
"type": "synonym",
"synonyms": ["oa,0a,dingding"]
}
}
}
},
"mappings": {
"properties": {
"title": {
"type": "text",
"analyzer": "my_analyzer"
}
}
}
}
GET dingding_test/_search
{
"query": {
"match": {
"title": "oA"
}
}
}
3. Question 3: dog & cat search
There is a document containing text like dog & cat; index it so that a match_phrase query for either dog & cat or dog and cat matches.
Data:
PUT dog_and_cat/_bulk
{"index":{ "_id":0}}
{"title" : "dog and cat are my familly"}
{"index":{ "_id":1}}
{"title" : "do you love dog & cat"}
{"index":{ "_id":2}}
{"title" : "you will finally find dog cat"}
This question requires match_phrase, and the hidden point is that the & symbol is dropped by the standard tokenizer. Note that it is removed by the tokenizer, not by a token filter, so the text must be transformed before tokenization (with a char_filter), or a different tokenizer must be used.
PUT dog_and_cat02
{
"settings": {
"analysis": {
"analyzer": {
"my_analyzer":{
"type":"custom",
"char_filter":["map_filter"],
"tokenizer":"standard"
}
},
"char_filter": {
"map_filter":{
"type":"mapping",
"mappings":["& => and"]
}
}
}
},
"mappings": {
"properties": {
"title":{
"type":"text",
"analyzer": "my_analyzer"
}
}
}
}
Reindex and search:
POST _reindex
{
"source": {"index": "dog_and_cat"},
"dest": {"index": "dog_and_cat02"}
}
GET dog_and_cat02/_search
{
"query": {
"match_phrase": {
"title": "dog & cat"
}
}
}
Solution two uses synonyms, but the tokenizer has to be switched to whitespace so that the & symbol survives tokenization and the synonym filter can see it.
Reference: https://elasticsearch.cn/article/6133
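A sketch of that second approach (dog_and_cat03 is a made-up index name), keeping & as a token via the whitespace tokenizer and equating it with and through a synonym filter:
PUT dog_and_cat03
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "type": "custom",
          "tokenizer": "whitespace",
          "filter": ["lowercase", "amp_syn"]
        }
      },
      "filter": {
        "amp_syn": {
          "type": "synonym",
          "synonyms": ["&, and"]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "title": {
        "type": "text",
        "analyzer": "my_analyzer"
      }
    }
  }
}
After reindexing the data into it, the same match_phrase queries should behave like solution one.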
5. multi_fields
1. Question 1: sub-fields with different analyzers
PUT multi_fields/_doc/1
{
"title":"manager",
"des":"this is the man who is more powerfull "
}
PUT multi_fields/_doc/2
{
"title":"employee",
"des":"this is the man who really do the job "
}
Create a new index that gives title a sub-field named space_f using the whitespace analyzer, then reindex the current index's data into the new one.
PUT multi_fields02
{
"mappings": {
"properties": {
"des": {
"type": "text",
"fields": {
"space_f": {
"type": "text",
"analyzer": "whitespace"
}
}
},
"title": {
"type": "text",
"fields": {
"space_f": {
"type": "text",
"analyzer": "whitespace"
}
}
}
}
}
}
POST _reindex
{
"source": {"index": "multi_fields"},
"dest": {"index": "multi_fields02"}
}
2. Question 2: sub-fields with different analyzers
Given an existing index multi_fields03 with data, design the mapping for a new index multi_fields04 and move the data over. One field (call it xxx; I forget the name) keeps standard as its analyzer and gains two sub-fields: xxx.english analyzed with english, and xxx.stop analyzed with stop. All other fields keep the same types as in the original index.
Sample data:
PUT multi_fields03/_doc/1
{
"content":"i want to be better",
"name":"chencc",
"age":180
}
PUT multi_fields03/_doc/2
{
"content":"she want a happy life",
"name":"zhaolu",
"age":18
}
PUT multi_fields03/_doc/3
{
"content":"best wish for you team",
"name":"wangj",
"age":28
}
Create the mapping:
PUT multi_fields04
{
"mappings" : {
"properties" : {
"age" : {
"type" : "long"
},
"content" : {
"type" : "text",
"analyzer": "standard",
"fields" : {
"english":{
"type":"text",
"analyzer":"english"
},
"stop":{
"type":"text",
"analyzer":"stop"
}
}
},
"name" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
}
}
}
}
POST _reindex
{
"source": {"index": "multi_fields03"},
"dest": {"index": "multi_fields04"}
}
6. dynamic_template
Task 1: fields whose names start with key_ become keyword type.
Task 2: only string fields whose names start with key_ become keyword type.
PUT _template/dynamic_template
{
"index_patterns": [
"dynamic*"
],
"settings": {
"number_of_shards": 3
},
"mappings": {
"dynamic_templates": [
{
"key_word": {
"match": "key_*",
"mapping": {
"type": "keyword"
}
}
}
]
}
}
PUT dynamic01/_doc/1
{
"title":"this doc is for dynamic use",
"key_want":"go back",
"key_like":123
}
GET dynamic01
Both key_-prefixed fields come out as keyword.
Adding a match_mapping_type constraint restricts the rule to fields whose JSON-detected type is string, so only string key_-prefixed fields become keyword:
PUT _template/dynamic_template02
{
"index_patterns": [
"02dyn*"
],
"settings": {
"number_of_shards": 3
},
"mappings": {
"dynamic_templates": [
{
"key_word": {
"match_mapping_type":"string",
"match": "key_*",
"mapping": {
"type": "keyword"
}
}
}
]
}
}
PUT 02dynamic/_doc/1
{
"title":"this doc is for dynamic use",
"key_want":"go back",
"key_like":123
}
GET 02dynamic
...
"key_like" : {
"type" : "long"
},
"key_want" : {
"type" : "keyword"
},
...
7. date range and count
Find docs where country is China, sex is woman, and birth is in January through March 2016.
PUT people_agg
{
"mappings": {
"properties": {
"birth": {
"type": "date",
"format": "yyyy/MM/dd HH:mm:ss.SS"
},
"country": {
"type": "keyword"
},
"sex": {
"type": "keyword"
},
"des": {
"type": "text"
}
}
}
}
PUT people_agg/_doc/1
{
"birth":"2016/01/04 21:18:48.64",
"country":"China",
"sex":"woman",
"des":"beauty woman"
}
PUT people_agg/_doc/2
{
"birth":"2016/02/04 21:18:48.64",
"country":"China",
"sex":"woman",
"des":"beauty woman"
}
PUT people_agg/_doc/3
{
"birth":"2016/01/04 21:18:48.64",
"country":"China",
"sex":"man",
"des":"beauty man"
}
PUT people_agg/_doc/4
{
"birth":"2016/01/04 21:18:48.64",
"country":"Japan",
"sex":"woman",
"des":"beauty woman"
}
PUT people_agg/_doc/5
{
"birth":"2016/03/04 21:18:48.64",
"country":"China",
"sex":"woman",
"des":"beauty woman"
}
GET people_agg/_count
{
"query": {
"bool": {
"must": [
{"range": {
"birth": {
"gte": "01/2016",
"lte": "04/2016",
"format": "MM/yyyy||yyyy"
}
}},
{"term": {
"sex": {
"value": "woman"
}
}},
{
"term": {
"country": {
"value": "China"
}
}
}
]
}
}
}
Note the format parameter on the date range query here; dates cannot really be matched with equality.
Also, no aggregation is needed; the _count API alone does the job.
8. aggregation search
1. Question 1: monthly maxima for earthquake data
For the earthquake data, find each month's maximum Depth and maximum Distance, and also pick out the month with the greatest depth. A sample doc:
{
"DateTime" : "2016/01/04 21:18:48.64",
"Latitude" : "37.3257",
"Longitude" : "-122.1043",
"Depth" : "-0.32",
"Magnitude" : "1.55",
"MagType" : "Md",
"NbStations" : "12",
"Gap" : "77",
"Distance" : "1",
"RMS" : "0.06",
"Source" : "NC",
"EventID" : "72573650"
}
GET earth_quack/_search?size=0
{
"aggs": {
"month": {
"date_histogram": {
"field": "DateTime",
"calendar_interval": "month"
},
"aggs": {
"max_dep": {
"max": {
"field": "Depth"
}
},
"max_dis":{
"max": {
"field": "Distance"
}
}
}
},
"max_bucket":{
"max_bucket": {
"buckets_path": "month>max_dep"
}
}
}
}
max_bucket here is a sibling pipeline aggregation, so it sits at the top level; a parent pipeline aggregation would be nested inside.
2. Question 2: filtering aggregation buckets
Building on question 1, keep only buckets whose max depth is greater than 0.
GET earth_quack/_search?size=0
{
"aggs": {
"month": {
"date_histogram": {
"field": "DateTime",
"calendar_interval": "month"
},
"aggs": {
"max_dep": {
"max": {
"field": "Depth"
}
},
"max_dis":{
"max": {
"field": "Distance"
}
},
"dep_filter":{
"bucket_selector": {
"buckets_path": {"m_dep":"max_dep"},
"script": "params.m_dep>0"
}
}
}
}
}
}
3. Question 3: nesting bucket aggregations inside bucket aggregations
PUT log_agg
{
"mappings" : {
"properties" : {
"name" : {
"type" : "text"
},
"param" : {
"type" : "keyword"
},
"status" : {
"type" : "long"
},
"uri" : {
"type" : "keyword"
}
}
}
}
Data:
PUT log_agg/_doc/1
{
"uri":"/query",
"status":200,
"param":"query dog"
}
PUT log_agg/_doc/2
{
"uri":"/query",
"status":200,
"param":"query cat"
}
PUT log_agg/_doc/3
{
"uri":"/query",
"status":400,
"param":"query bad"
}
PUT log_agg/_doc/4
{
"uri":"/login",
"status":200,
"param":"uid:123"
}
PUT log_agg/_doc/5
{
"uri":"/login",
"status":400,
"param":"uid:123 uid bad"
}
PUT log_agg/_doc/6
{
"uri":"/login",
"status":400,
"param":"uid:123,pass bad"
}
PUT log_agg/_doc/7
{
"uri":"/login",
"status":302,
"param":"uid:123,no user"
}
PUT log_agg/_doc/8
{
"uri":"/register",
"status":302,
"param":"phone:12345"
}
PUT log_agg/_doc/9
{
"uri":"/register",
"status":302,
"param":"query cat"
}
PUT log_agg/_doc/10
{
"uri":"/register",
"status":400,
"param":"server error"
}
Requirement:
For each status in the log index, find the top 3 URIs.
It did not occur to me at first that bucket aggregations can nest inside other bucket aggregations; I had assumed buckets could only contain metric aggregations.
GET log_agg/_search?size=0
{
"aggs": {
"status_term": {
"terms": {
"field": "status",
"size": 10
},
"aggs": {
"uri_ter": {
"terms": {
"field": "uri",
"size": 3,
"order": {
"_count": "desc"
}
}
}
}
}
}
}
Response:
"aggregations" : {
"status_term" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : 400,
"doc_count" : 4,
"uri_ter" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : "/login",
"doc_count" : 2
},
{
"key" : "/query",
"doc_count" : 1
},
{
"key" : "/register",
"doc_count" : 1
}
]
}
},
{
"key" : 200,
"doc_count" : 3,
"uri_ter" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : "/query",
"doc_count" : 2
},
{
"key" : "/login",
"doc_count" : 1
}
]
}
},
{
"key" : 302,
"doc_count" : 3,
"uri_ter" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : "/register",
"doc_count" : 2
},
{
"key" : "/login",
"doc_count" : 1
}
]
}
}
]
}
}
4. Question 4: manufacturer ranking
Against the food-additive index task15, find the top 10 manufacturers among docs whose ingredient field contains tt.
POST task15/_search
{
"query": {
"match": {
"ingredient": "tt"
}
},
"aggs": {
"top_10": {
"terms": {
"field": "manufacturer"
},
"size": 10
}
}
}
9. snapshot_and_restore
Create a snapshot repository for the cluster, then create a snapshot containing only the work02_test09 index.
elasticsearch.yml configuration:
path.repo: ["/home/deploy/search/log-manager/single_node/repository_global"]
PUT _snapshot/exam_bak
{
"type": "fs",
"settings": {
"location": "exam_back01"
}
}
POST _snapshot/exam_bak/_verify
PUT _snapshot/exam_bak/snapshot_1
{
"indices": "work02_test09"
}
Verify:
GET _snapshot/exam_bak/snapshot_1
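Restore is the mirror of the snapshot call; since work02_test09 still exists, the sketch below restores it under a new name (the restored_ prefix is my own choice):
POST _snapshot/exam_bak/snapshot_1/_restore
{
  "indices": "work02_test09",
  "rename_pattern": "work02_test09",
  "rename_replacement": "restored_work02_test09"
}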
10. search_template
Data:
PUT search_template/_doc/1
{
"title":"i love pet ,and i want to have a dog",
"age":8
}
PUT search_template/_doc/2
{
"title":"i love pet ,and i want to have a cat",
"age":18
}
PUT search_template/_doc/3
{
"title":"i love pet",
"age":88
}
Define a search template where the text matched against title is params1, the sort field is params2, the sort order is params3, and the result size is size:
GET _search/template
{
"id": "search_template",
"params": {
"params1":"xxxx",
"params2":"xxxx",
"params3":"asc",
"size":10
}
}
First sketch the query:
GET search_template/_search
{
"query": {
"match": {
"title": "TEXT"
}
},
"sort": [
{
"FIELD": {
"order": "desc"
}
}
],
"size":10
}
Then fill it into a template:
PUT _scripts/template_query
{
"script": {
"lang": "mustache",
"source": {
"query": {
"match": {
"title": "{{params1}}"
}
},
"sort": [
{
"{{params2}}": {
"order": "{{params3}}"
}
}
],
"size":"{{size}}"
}
}
}
Render it to verify:
GET _render/template/template_query
{
"params": {
"params1":"pet",
"params2":"age",
"params3":"asc",
"size":10
}
}
Search with it:
GET search_template/_search/template
{
"id":"template_query",
"params": {
"params1":"pet",
"params2":"age",
"params3":"desc",
"size":10
}
}
11. update_by_query
1. Question 1: joining field contents
Add a new field, new_field, whose content is title and content concatenated in order, as plain text.
PUT script_new_field/_doc/1
{
"title":"save food",
"content":"we all should save food ,it is important"
}
PUT script_new_field/_doc/2
{
"title":"protect water",
"content":"water is precious for all"
}
Solution 1:
POST script_new_field/_update_by_query
{
"script":{
"lang":"painless",
"source":"ctx._source.new_field=ctx._source.title+' '+ctx._source.content"
}
}
GET script_new_field/_search
Solution 2:
PUT _ingest/pipeline/join_field
{
"description": "join two field",
"processors": [
{"set": {
"field": "new_ffff",
"value": "{{title}} {{content}}"
}}
]
}
POST script_new_field/_update_by_query?pipeline=join_field
2. Question 2: query filter + scripted field update
PUT city_update/_doc/1
{
"city":"shanghai",
"name":"liurui"
}
PUT city_update/_doc/2
{
"city":"wuhan",
"name":"liuao"
}
Change every doc in the index whose city is shanghai to beijin:
POST city_update/_update_by_query
{
"query":{
"match":{
"city":"shanghai"
}
},
"script":{
"lang":"painless",
"source":"ctx._source.city='beijin'"
}
}
12. reindex with pipelines
POST _ingest/pipeline/_simulate
{
"pipeline" : {
// pipeline definition here
},
"docs" : [
{ "_source": {/** first document **/} },
{ "_source": {/** second document **/} },
// ...
]
}
At first I thought _simulate could only render an inline pipeline definition, not one that is already stored. In fact a stored pipeline can be simulated too:
POST _ingest/pipeline/my-pipeline-id/_simulate
{
"docs" : [
{ "_source": {/** first document **/} },
{ "_source": {/** second document **/} },
// ...
]
}
1. Question 1: split a field, trim whitespace, count length
Move an index's data to task2. In the source index one field holds values like " xx1 ", " xx2 ", " xx3 ". Requirements:
In task2 the field becomes an array, split on commas.
Each split-out string has its surrounding whitespace trimmed.
A new field num holds the array length.
Prepare the data:
PUT pipe_origin/_doc/1
{
"title":"teacher ,student , mom ",
"name":"diaom"
}
PUT pipe_origin/_doc/2
{
"title":"father ,engneer,son",
"name":"chenq"
}
Processing:
POST _ingest/pipeline/_simulate
{
"pipeline": {
"description": "split int arr and count num",
"processors": [
{
"split": {
"field": "title",
"target_field": "temp",
"separator": ","
}
},
{
"foreach": {
"field": "temp",
"processor": {
"trim": {
"field": "_ingest._value"
}
}
}
},
{
"script": {
"lang": "painless",
"source": "ctx.len=ctx.temp.length;"
}
}
]
},
"docs": [
{
"_source": {
"title": "teacher ,student , mom "
}
}
]
}
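The reindex below refers to a stored pipeline named array_deal, so the simulated definition has to be stored under that name first (this sketch uses num for the count, as the requirement asks, where the simulation used len, and size() rather than length for the List):
PUT _ingest/pipeline/array_deal
{
  "description": "split into arr and count num",
  "processors": [
    {
      "split": {
        "field": "title",
        "target_field": "temp",
        "separator": ","
      }
    },
    {
      "foreach": {
        "field": "temp",
        "processor": {
          "trim": { "field": "_ingest._value" }
        }
      }
    },
    {
      "script": {
        "lang": "painless",
        "source": "ctx.num = ctx.temp.size();"
      }
    }
  ]
}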
POST _reindex
{
"source": {"index": "pipe_origin"},
"dest": {
"index": "pipe_dest",
"pipeline": "array_deal"
}
}
2. Question 2: field concatenation and string length
Prepare the data:
PUT pipe_origin/_doc/1
{
"title":"teacher ,student , mom ",
"name":"diaom"
}
PUT pipe_origin/_doc/2
{
"title":"father ,engneer,son",
"name":"chenq"
}
Reindex into a new dest index, adding a new field join whose value is the two field values concatenated, plus a field len counting the characters of join.
Answer:
POST _ingest/pipeline/_simulate
{
"pipeline": {
"description": "split int arr and count num",
"processors": [
{
"set": {
"field": "join",
"value": "{{name}} {{title}}"
}
},
{
"script": {
"lang": "painless",
"source": "ctx.len=ctx.join.length()"
}
}
]
},
"docs": [
{
"_source": {
"title": "better man",
"name":"jack"
}
}
]
}
PUT _ingest/pipeline/you_know
{
"description": "split int arr and count num",
"processors": [
{
"set": {
"field": "join",
"value": "{{name}} {{title}}"
}
},
{
"script": {
"lang": "painless",
"source": "ctx.len=ctx.join.length()"
}
}
]
}
POST _reindex
{
"source": {"index": "pipe_origin"},
"dest": {"index": "join_res","pipeline": "you_know"}
}
GET join_res/_search
13. allocation filter
1. Question 1: hot-warm architecture
Deploy three ES nodes with a node attribute named warm_hot.
node01 is hot; node02 and node03 are warm.
Create two indices, task701 and task702, each with 2 shards; one index's shards all go to hot, the other's all to warm.
# node settings
node1: node.attr.warm_hot: hot
node2: node.attr.warm_hot: warm
node3: node.attr.warm_hot: warm
Index settings:
PUT task701
{
"settings": {
"index.routing.allocation.include.warm_hot":"warm",
"number_of_replicas": 0,
"number_of_shards": 3
}
}
PUT task702
{
"settings": {
"index.routing.allocation.include.warm_hot":"hot",
"number_of_replicas": 0,
"number_of_shards": 3
}
}
2. Question 2: rack awareness
Three nodes carry a rack attribute:
node01 and node02 are rack01; node03 is rack02.
Create an index task703 with 2 shards and 1 replica,
such that every shard of task703 is backed up across rack01 and rack02.
node01: node.attr.rack: rack01
node02: node.attr.rack: rack01
node03: node.attr.rack: rack02
PUT _cluster/settings
{
"persistent": {
"cluster.routing.allocation.awareness.attributes": "rack",
"cluster.routing.allocation.awareness.force.rack.values": "rack01,rack02"
}
}
PUT task703/
{
"settings": {
"number_of_replicas": 1,
"number_of_shards": 2
}
}
GET _cat/shards/task703
3. Question 3: pinning indices to nodes, removing a node
Indices books1 and books2 each have 3 primary shards and 1 replica.
books1 may only be allocated on node-1.
All shards of books2 must live on node-2 and node-3.
Don't reach for a custom attr right away; the node name is already available as the built-in _name allocation attribute, which is very handy.
PUT books1
{
"settings": {
"index.routing.allocation.include.name":"node-1",
"number_of_shards": 3,
"number_of_replicas": 0
}
}
GET _cat/shards/books1
PUT books2
{
"settings": {
"index.routing.allocation.include.name":"node-2,node-3",
"number_of_shards": 3,
"number_of_replicas": 1
}
}
GET _cat/shards/books2
To take a node out of the cluster, first let its data migrate away automatically:
PUT _cluster/settings
{
"transient" : {
"cluster.routing.allocation.exclude.name" : "node-1"
}
}
14. cross cluster search
Write the following data into cluster1:
PUT hamlet/_bulk
{"index":{"_id":0}}
{"line_number":"1","speaker":"BERNARDO","text_entry":"Whos there?"}
{"index":{"_id":1}}
{"line_number":"2","speaker":"FRANCISCO","text_entry":"Nay answer me: stand, and unfold yourself."}
Write the following data into cluster2:
PUT hamlet02/_bulk
{"index":{"_id":0}}
{"line_number":"1","speaker":"BERNARDO","text_entry":"Whos there?"}
{"index":{"_id":1}}
{"line_number":"2","speaker":"FRANCISCO","text_entry":"Nay answer me: stand, and unfold yourself."}
From cluster1, search both clusters at once for docs whose speaker is FRANCISCO:
PUT _cluster/settings
{
"persistent": {
"cluster": {
"remote": {
"cluster_one": {
"seeds": [
"10.76.3.145:16300"
],
"transport.ping_schedule": "30s"
}
}
}
}
}
GET hamlet,cluster_one:hamlet02/_search
{
"query": {
"match": {
"speaker": "FRANCISCO"
}
}
}
15. Real exam questions (screenshots)
https://github.com/mingyitianxia/elastic-certified-engineer/blob/master/review-practice/0011_zhenti.md