04. German PhD exercises: 04_index_data

1. exercise01: update and delete by query

# ** EXAM OBJECTIVE: INDEXING DATA **
# GOAL: Create, update and delete indices while satisfying a given
# set of requirements
# REQUIRED SETUP:
# (i) a running Elasticsearch cluster with at least one node
# and a Kibana instance,
# (ii) the cluster has no index with name `hamlet`,
# (iii) the cluster has no template that applies to indices
# starting by `hamlet`


# Create the index `hamlet-raw` with 1 primary shard and 3 replicas

# Add a document to `hamlet-raw`, so that the document (i) has id
# "1", (ii) has default type, (iii) has one field named `line`
# with value "To be, or not to be: that is the question"

# Update the document with id "1" by adding a field named
# `line_number` with value "3.1.64"
# Add a new document to `hamlet-raw`, so that the document (i) has
# the id automatically assigned by Elasticsearch, (ii) has
# default type, (iii) has a field named `text_entry` with value
# "Whether tis nobler in the mind to suffer", (iv) has a field
# named `line_number` with value "3.1.66"
# Update the last document by setting the value of `line_number` to
# "3.1.65"
# In one request, update all documents in `hamlet-raw` by adding a
# new field named `speaker` with value "Hamlet"


# Update the document with id "1" by renaming the field `line` into
# `text_entry`

Solution


PUT hamlet-raw
{
  "settings": {
    "number_of_replicas": 3,
    "number_of_shards": 1
  }
}

PUT hamlet-raw/_doc/1
{
  "line":"To be, or not to be: that is the question"
}

POST hamlet-raw/_update/1
{
    "doc" : {
        "line_number" : "3.1.64"
    }
}

GET hamlet-raw/_doc/1

POST hamlet-raw/_doc
{
  "text_entry": "Whether tis nobler in the mind to suffer",
  "line_number": "3.1.66"
}

# Update using the id returned by the previous request
POST hamlet-raw/_update/2uDDLHYBznFAtuOD6g0k
{
  "doc":{
    "line_number": "3.1.65"
  }
}

POST hamlet-raw/_update_by_query
{
  "script":{
    "lang":"painless",
    "source":"ctx._source.speaker='Hamlet'"
  }
}

GET hamlet-raw/_search

Using an ingest pipeline:
PUT _ingest/pipeline/rename_field
{
  "description": "rename field",
  "processors": [
    {"rename": {
      "field": "line",
      "target_field": "text_entry"
    }}
  ]
}
POST hamlet-raw/_update_by_query?pipeline=rename_field
{
  "query": {
    "ids": {
      "values": [
        "1"
      ]
    }
  }
}
GET hamlet-raw/_search

Alternatively, the rename can be done with a script:

POST hamlet-raw/_update/1
{
  "script":{
    "lang":"painless",
    "source": "ctx._source.text_entry=ctx._source.remove('line')"
  }
}




Part 2

# Create the index `hamlet` and add some documents by running the
# following _bulk command
PUT hamlet/_doc/_bulk
{"index":{"_index":"hamlet","_id":0}}
{"line_number":"1.1.1","speaker":"BERNARDO","text_entry":"Whos there?"}
{"index":{"_index":"hamlet","_id":1}}
{"line_number":"1.1.2","speaker":"FRANCISCO","text_entry":"Nay, answer me: stand, and unfold yourself."}
{"index":{"_index":"hamlet","_id":2}}
{"line_number":"1.1.3","speaker":"BERNARDO","text_entry":"Long live the king!"}
{"index":{"_index":"hamlet","_id":3}}
{"line_number":"1.2.1","speaker":"KING CLAUDIUS","text_entry":"Though yet of Hamlet our dear brothers death"}
{"index":{"_index":"hamlet","_id":4}}
{"line_number":"1.2.2","speaker":"KING CLAUDIUS","text_entry":"The memory be green, and that it us befitted"}
{"index":{"_index":"hamlet","_id":5}}
{"line_number":"1.3.1","speaker":"LAERTES","text_entry":"My necessaries are embarkd: farewell:"}
{"index":{"_index":"hamlet","_id":6}}
{"line_number":"1.3.4","speaker":"LAERTES","text_entry":"But let me hear from you."}
{"index":{"_index":"hamlet","_id":7}}
{"line_number":"1.3.5","speaker":"OPHELIA","text_entry":"Do you doubt that?"}
{"index":{"_index":"hamlet","_id":8}}
{"line_number":"1.4.1","speaker":"HAMLET","text_entry":"The air bites shrewdly; it is very cold."}
{"index":{"_index":"hamlet","_id":9}}
{"line_number":"1.4.2","speaker":"HORATIO","text_entry":"It is a nipping and an eager air."}
{"index":{"_index":"hamlet","_id":10}}
{"line_number":"1.4.3","speaker":"HAMLET","text_entry":"What hour now?"}
{"index":{"_index":"hamlet","_id":11}}
{"line_number":"1.5.2","speaker":"Ghost","text_entry":"Mark me."}
{"index":{"_index":"hamlet","_id":12}}
{"line_number":"1.5.3","speaker":"HAMLET","text_entry":"I will."}

# Create a script named `set_is_hamlet` and save it into the cluster
# state. The script (i) adds a field named `is_hamlet` to each
# document, (ii) sets the field to "true" if the document has
# `speaker` equals to "HAMLET", (iii) sets the field to "false"
# otherwise
# Update all documents in `hamlet` by running the `set_is_hamlet`
# script


Pretty convenient the “update_by_query” API, don’t you think? Do you also
know how to use its counterpart for deletion?
# Remove from `hamlet` the documents that have either "KING
# CLAUDIUS" or "LAERTES" as the value of `speaker`

Note the pattern here: the script is stored first and then referenced by id. I had rarely used it this way before.


# First, try it with an inline script
POST hamlet/_update_by_query
{
  "script":{
    "lang":"painless",
    "source":"""
    if(ctx._source.speaker.equals('HAMLET')){
      ctx._source.is_hamlet=true;
    }else{
      ctx._source.is_hamlet=false;
    }
    """
  }
}


Store the statement above as a stored script; a search template can be stored the same way (see the sketch after the next block):
PUT _scripts/set_is_hamlet
{
  "script":{
    "lang":"painless",
    "source":"""
    if(ctx._source.speaker.equals('HAMLET')){
      ctx._source.is_hamlet=true;
    }else{
      ctx._source.is_hamlet=false;
    }
    """
  }
}
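
As a side note, a minimal sketch of storing a search template through the same `_scripts` endpoint, with `"lang": "mustache"` (the id `hamlet_search_by_speaker` and the `speaker_name` parameter are illustrative, not part of the exercise):

PUT _scripts/hamlet_search_by_speaker
{
  "script": {
    "lang": "mustache",
    "source": {
      "query": {
        "match": {
          "speaker": "{{speaker_name}}"
        }
      }
    }
  }
}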

Use the stored script:
POST hamlet/_update_by_query
{
  "script":{
    "id":"set_is_hamlet"
  }
}

GET hamlet/_search

The delete operation:


POST hamlet/_delete_by_query
{
  "query": {
   "terms": {
     "speaker.keyword": [
       "KING CLAUDIUS",
       "LAERTES"
     ]
   }
  }
}

2. exercise02: index template

# ** EXAM OBJECTIVE: INDEXING DATA **
# GOAL: Create index templates that satisfy a given set of
# requirements
# REQUIRED SETUP:
# (i) a running Elasticsearch cluster with at least one node
# and a Kibana instance,
# (ii) the cluster has no index with name `hamlet`,
# (iii) the cluster has no template that applies to indices
# starting by `hamlet`

# Create the index template `hamlet_template`, so that the template
# (i) matches any index that starts by "hamlet_" or "hamlet-",
# (ii) allocates one primary shard and no replicas for each
# matching index
# Create the indices `hamlet2` and `hamlet_test`
# Verify that only `hamlet_test` applies the settings defined in
# `hamlet_template`

An index template cannot be partially updated: an update works like a create and completely replaces the existing template.


DELETE hamlet*

DELETE _template/hamlet*

PUT _template/hamlet_template
{
  "index_patterns":["hamlet_*","hamlet-*"],
  "settings":{
    "number_of_shards":1,
    "number_of_replicas":0
  }
}

PUT hamlet2
PUT hamlet_test

GET _cat/shards/hamlet2?v
GET _cat/shards/hamlet_test?v

# Update `hamlet_template` by defining a mapping for the type
# "_doc", so that (i) the type has three fields, named `speaker`,
# `line_number`, and `text_entry`, (ii) `text_entry` uses an
# "english" analyzer
Updates to an index template are not automatically reflected on the matching
indices that already exist. This is because index templates are only applied
once at index creation time.
# Verify that the updates in `hamlet_template` did not apply to the
# existing indices
# In one request, delete both `hamlet2` and `hamlet_test`


GET _template/hamlet_template
PUT _template/hamlet_template
{
   "index_patterns" : [
      "hamlet_*",
      "hamlet-*"
    ],
    "settings" : {
      "index" : {
        "number_of_shards" : "1",
        "number_of_replicas" : "0"
      }
    },
  "mappings": {
    "properties": {
      "speaker":{
        "type":"text"
      },
      "line_number":{
        "type":"text"
      },
      "text_entry":{
        "type":"text",
        "analyzer": "english"
      }
    }
  }
}

GET hamlet_test
DELETE hamlet2,hamlet_test


# Create the index `hamlet-1` and add some documents by running the
# following _bulk command
PUT hamlet-1/_doc/_bulk
{"index":{"_index":"hamlet-1","_id":0}}
{"line_number":"1.1.1","speaker":"BERNARDO","text_entry":"Whos there?"}
{"index":{"_index":"hamlet-1","_id":1}}
{"line_number":"1.1.2","speaker":"FRANCISCO","text_entry":"Nay, answer me: stand, and unfold yourself."}
{"index":{"_index":"hamlet-1","_id":2}}
{"line_number":"1.1.3","speaker":"BERNARDO","text_entry":"Long live the king!"}
{"index":{"_index":"hamlet-1","_id":3}}
{"line_number":"1.2.1","speaker":"KING CLAUDIUS","text_entry":"Though yet of Hamlet our dear brothers death"}

# Verify that the mapping of `hamlet-1` is consistent with what is defined
# in `hamlet_template`
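
No request is shown for this check; the mapping can be inspected directly and compared with the template:

GET hamlet-1/_mapping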

# Update `hamlet_template` so as to reject any document having a
# field that is not defined in the mapping
# Verify that you cannot index the following document in `hamlet-1`
PUT hamlet-1/_doc
{
 "author": "Shakespeare"
}

If you want an update to `hamlet_template` to take effect on the existing `hamlet-1`, the only way is to delete `hamlet-1` and recreate it. The solution below therefore applies `dynamic: strict` directly to the index mapping; the corresponding template update is sketched after the next block.


PUT hamlet-1/_mapping
{
  "dynamic":"strict"
}
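
For completeness, a sketch of updating the template itself as the exercise asks: re-send the full template body from above with `"dynamic": "strict"` added to the mappings (it only affects matching indices created after the update):

PUT _template/hamlet_template
{
  "index_patterns": ["hamlet_*", "hamlet-*"],
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 0
  },
  "mappings": {
    "dynamic": "strict",
    "properties": {
      "speaker":     { "type": "text" },
      "line_number": { "type": "text" },
      "text_entry":  { "type": "text", "analyzer": "english" }
    }
  }
}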

POST hamlet-1/_doc
{
 "author": "Shakespeare"
}

# Update `hamlet_template` so as to enable dynamic mapping again
# Update `hamlet_template` so as to (i) dynamically map to an
# integer any field that starts by "number_", (ii) dynamically
# map to unanalysed text any string field
# Create the index `hamlet-2` and add a document by running the
# following command
POST hamlet-2/_doc/4
{
 "text_entry": "With turbulent and dangerous lunacy?",
 "line_number": "3.1.4",
 "number_act": "3",
 "speaker": "KING CLAUDIUS"
}
# Verify that the mapping of `hamlet-2` is consistent with what
# defined in `hamlet_template`


GET _template/hamlet_template
PUT _template/hamlet_template
{
  "order": 0,
  "index_patterns": [
    "hamlet_*",
    "hamlet-*"
  ],
  "settings": {
    "index": {
      "number_of_shards": "1",
      "number_of_replicas": "0"
    }
  },
  "mappings": {
    "dynamic": true,
    "dynamic_templates": [
      {
        "numbers_as_integers": {
          "match": "number_*",
          "mapping": {
            "type": "integer"
          }
        }
      },
      {
        "strings_as_keywords": {
          "match_mapping_type": "string",
          "mapping": {
            "type": "keyword"
          }
        }
      }
    ],
    "properties": {
      "line_number": {
        "type": "text"
      },
      "text_entry": {
        "analyzer": "english",
        "type": "text"
      },
      "speaker": {
        "type": "text"
      }
    }
  },
  "aliases": {}
}


POST hamlet-2/_doc/4
{
 "text_entry": "With turbulent and dangerous lunacy?",
 "line_number": "3.1.4",
 "number_act": "3",
 "speaker": "KING CLAUDIUS"
}

GET hamlet-2/_mapping


3. exercise03: alias, reindex, and pipeline use

# ** EXAM OBJECTIVE: INDEXING DATA **
# GOAL: Create an alias, reindex indices, and create data pipelines
# REQUIRED SETUP:
# (i) a running Elasticsearch cluster with at least one node
# and a Kibana instance,
# (ii) the cluster has no index with name `hamlet`,
# (iii) the cluster has no template that applies to indices
# starting by `hamlet`

As usual, let’s begin by indexing some data.
# Create the indices `hamlet-1` and `hamlet-2`, each with two
# primary shards and no replicas
# Add some documents to `hamlet-1` by running the following command

PUT hamlet-1/_doc/_bulk
{"index":{"_index":"hamlet-1","_id":0}}
{"line_number":"1.1.1","speaker":"BERNARDO","text_entry":"Whos there?"}
{"index":{"_index":"hamlet-1","_id":1}}
{"line_number":"1.1.2","speaker":"FRANCISCO","text_entry":"Nay, answer me: stand, and unfold yourself."}
{"index":{"_index":"hamlet-1","_id":2}}
{"line_number":"1.1.3","speaker":"BERNARDO","text_entry":"Long live the king!"}
{"index":{"_index":"hamlet-1","_id":3}}
{"line_number":"1.2.1","speaker":"KING CLAUDIUS","text_entry":"Though yet of Hamlet our dear brothers death"}

# Add some documents to `hamlet-2` by running the following command
PUT hamlet-2/_doc/_bulk
{"index":{"_index":"hamlet-2","_id":4}}
{"line_number":"2.1.1","speaker":"LORD POLONIUS","text_entry":"Give him this money and these notes, Reynaldo."}
{"index":{"_index":"hamlet-2","_id":5}}
{"line_number":"2.1.2","speaker":"REYNALDO","text_entry":"I will, my lord."}
{"index":{"_index":"hamlet-2","_id":6}}
{"line_number":"2.1.3","speaker":"LORD POLONIUS","text_entry":"You shall do marvellous wisely, good Reynaldo,"}
{"index":{"_index":"hamlet-2","_id":7}}
{"line_number":"2.1.4","speaker":"LORD POLONIUS","text_entry":"Before you visit him, to make inquire"}



# Create the alias `hamlet` that maps both `hamlet-1` and `hamlet-2`
# Verify that the documents grouped by `hamlet` are 8
By default, if your alias includes more than one index, you cannot index
documents using the alias name. But defaults can be overwritten, if you know
how.
# Configure `hamlet-1` to be the write index of the `hamlet` alias

DELETE hamlet*
PUT hamlet-1
{
  "settings": {
    "number_of_shards": 2,
    "number_of_replicas": 0
  }
}
PUT hamlet-2
{
  "settings": {
    "number_of_shards": 2,
    "number_of_replicas": 0
  }
}

It has been a while since I last worked with aliases and I almost slipped up. Stay calm and check the documentation.
# Create the alias `hamlet` that maps both `hamlet-1` and `hamlet-2`
# Verify that the documents grouped by `hamlet` are 8
# Configure `hamlet-1` to be the write index of the `hamlet` alias

POST /_aliases
{
  "actions": [
    {
      "add": {
        "index": "hamlet-1",
        "alias": "hamlet",
        "is_write_index": true
      }
    },
    {
      "add": {
        "index": "hamlet-2",
        "alias": "hamlet"
      }
    }
  ]
}


PUT hamlet/_doc/1
{
  "message":"you want to be stronger"
}
GET hamlet/_count




# Add a document to `hamlet`, so that the document 
# (i) has id "8",
# (ii) has "_doc" type, 
# (iii) has a field `text_entry` with value  "With turbulent and dangerous lunacy?", 
# (iv) has a field  `line_number` with value "3.1.4", 
# (v) has a field `speaker`  with value "KING CLAUDIUS"
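
No request is shown in this section for the id "8" document; a minimal sketch, indexing through the `hamlet` alias (which writes to `hamlet-1`, its write index):

PUT hamlet/_doc/8
{
  "text_entry": "With turbulent and dangerous lunacy?",
  "line_number": "3.1.4",
  "speaker": "KING CLAUDIUS"
}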

# Create a script named `control_reindex_batch` and save it into the
# cluster state. The script checks whether a document has the
# field `reindexBatch`, and
# (i) in the affirmative case, it increments the field value by a script parameter named `increment`,
# (ii) otherwise, the script adds the field to the document setting its value to "1"

Practice these scenarios where the script has to be stored first. For the scripting API, see the Painless guide.


PUT _scripts/control_reindex_batch
{
  "script":{
    "lang":"painless",
    "source": """
    if(ctx._source.containsKey('reindexBatch')){
      ctx._source.reindexBatch+=params.increment;
    }else{
      ctx._source.reindexBatch=1;
    }
    """
  }
}

POST hamlet-1/_update_by_query
{
  "script":{
    "id":"control_reindex_batch",
    "params":{
      "increment":3
    }
  }
}
GET hamlet-1/_search



# Create the index `hamlet-new` with 2 primary shards and no replicas
# Reindex `hamlet` into `hamlet-new`, while satisfying the following
# criteria:
# (i) apply the `control_reindex_batch` script with the `increment` parameter set to "1",
# (ii) reindex using two parallel slices


# In one request, add `hamlet-new` to the alias `hamlet` and delete
# the `hamlet-1` and `hamlet-2` indices



PUT hamlet-new
{
  "settings": {
    "number_of_shards": 2,
    "number_of_replicas": 0
  }
}


POST _reindex?slices=2
{
  "source": {"index": "hamlet"},
  "dest": {
    "index": "hamlet-new"
  },
  "script":{
      "id":"control_reindex_batch",
      "params": {
        "increment":1
      }
    }
}

GET hamlet-new/_search


POST _aliases
{
  "actions": [
    {
      "add": {
        "index": "hamlet-new",
        "alias": "hamlet"
      }
    },
    {
      "remove_index": {
        "indices": ["hamlet-1", "hamlet-2"]
      }
    }
  ]
}

Note: with multiple indices the JSON key is `indices`; for a single index it is `index`. Also note that the `remove` action only detaches an index from the alias, whereas `remove_index` actually deletes the index, which is what the exercise asks for here.

GET hamlet/_search


# Create a pipeline named `split_act_scene_line`. The pipeline
# splits the value of `line_number` using the dots as a
# separator, and stores the split values into three
# new fields named `number_act`, `number_scene`, and
# `number_line`, respectively

# Test the pipeline on the following document
{
  "_source": {
    "line_number": "1.2.3"
  }
}
Satisfied with the outcome? Go update your documents, then!
# Update all documents in `hamlet-new` by using the
# `split_act_scene_line` pipeline

Combine a split processor with a script processor (plus a remove processor to clean up the temporary field):


POST _ingest/pipeline/_simulate
{
  "pipeline": {
    "description": "string split by dot",
    "processors": [
      {
        "split": {
          "field": "line_number",
          "separator": "\\.",
          "target_field":"temp_arry"
        }
      },
      {
        "script": {
          "lang": "painless",
          "source": """
        ctx.number_act=ctx.temp_arry[0];
        ctx.number_scene=ctx.temp_arry[1];
        ctx.number_line=ctx.temp_arry[2];
"""
        }
      },
      {
        "remove": {
          "field": "temp_arry"
        }
      }
    ]
  },
  "docs": [
    {
      "_source": {
        "line_number": "1.1.3",
        "text_entry": "Long live the king!",
        "reindexBatch": 2,
        "speaker": "BERNARDO"
      }
    }
  ]
}

PUT _ingest/pipeline/split_act_scene_line
{
  "description": "string split by dot",
  "processors": [
    {
      "split": {
        "field": "line_number",
        "separator": "\\.",
        "target_field": "temp_arry"
      }
    },
    {
      "script": {
        "lang": "painless",
        "source": """
        ctx.number_act=ctx.temp_arry[0];
        ctx.number_scene=ctx.temp_arry[1];
        ctx.number_line=ctx.temp_arry[2];
"""
      }
    },
    {
      "remove": {
        "field": "temp_arry"
      }
    }
  ]
}
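
To test against the exact document given in the exercise, the stored pipeline can also be simulated by id:

POST _ingest/pipeline/split_act_scene_line/_simulate
{
  "docs": [
    {
      "_source": {
        "line_number": "1.2.3"
      }
    }
  ]
}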

POST hamlet-new/_update_by_query?pipeline=split_act_scene_line

GET hamlet-new/_search
