1. exercise01: update delete by query
# ** EXAM OBJECTIVE: INDEXING DATA **
# GOAL: Create, update and delete indices while satisfying a given
# set of requirements
# REQUIRED SETUP:
# (i) a running Elasticsearch cluster with at least one node
# and a Kibana instance,
# (ii) the cluster has no index with name `hamlet`,
# (iii) the cluster has no template that applies to indices
# starting by `hamlet`
# Create the index `hamlet-raw` with 1 primary shard and 3 replicas
# Add a document to `hamlet-raw`, so that the document (i) has id
# "1", (ii) has default type, (iii) has one field named `line`
# with value "To be, or not to be: that is the question"
# Update the document with id "1" by adding a field named
# `line_number` with value "3.1.64"
# Add a new document to `hamlet-raw`, so that the document (i) has
# the id automatically assigned by Elasticsearch, (ii) has
# default type, (iii) has a field named `text_entry` with value
# "Whether tis nobler in the mind to suffer", (iv) has a field
# named `line_number` with value "3.1.66"
# Update the last document by setting the value of `line_number` to
# "3.1.65"
# In one request, update all documents in `hamlet-raw` by adding a
# new field named `speaker` with value "Hamlet"
# Update the document with id "1" by renaming the field `line` into
# `text_entry`
Solution
PUT hamlet-raw
{
"settings": {
"number_of_replicas": 3,
"number_of_shards": 1
}
}
PUT hamlet-raw/_doc/1
{
"line":"To be, or not to be: that is the question"
}
POST hamlet-raw/_update/1
{
"doc" : {
"line_number" : "3.1.64"
}
}
GET hamlet-raw/_doc/1
POST hamlet-raw/_doc
{
"text_entry": "Whether tis nobler in the mind to suffer",
"line_number": "3.1.66"
}
# Use the id returned by the previous request
POST hamlet-raw/_update/2uDDLHYBznFAtuOD6g0k
{
"doc":{
"line_number": "3.1.65"
}
}
POST hamlet-raw/_update_by_query
{
"script":{
"lang":"painless",
"source":"ctx._source.speaker='Hamlet'"
}
}
GET hamlet-raw/_search
Using an ingest pipeline
PUT _ingest/pipeline/rename_field
{
"description": "rename field",
"processors": [
{"rename": {
"field": "line",
"target_field": "text_entry"
}}
]
}
POST hamlet-raw/_update_by_query?pipeline=rename_field
{
"query": {
"ids": {
"values": [
"1"
]
}
}
}
GET hamlet-raw/_search
A script can also do the rename
POST hamlet-raw/_update/1
{
"script":{
"lang":"painless",
"source": "ctx._source.text_entry=ctx._source.remove('line')"
}
}
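The script above works because `remove` on the source map returns the value it removed, so a single assignment both drops `line` and fills `text_entry`. A minimal Python sketch of the same idea, with a plain dict standing in for `ctx._source` (the dict and the function name `rename_field` are illustrative assumptions, not part of any Elasticsearch API):

```python
def rename_field(source, old, new):
    """Mimic `ctx._source.<new> = ctx._source.remove('<old>')` on a plain dict."""
    # dict.pop plays the role of Map.remove: it deletes the key and returns its value
    source[new] = source.pop(old)
    return source


doc = {
    "line": "To be, or not to be: that is the question",
    "line_number": "3.1.64",
}
renamed = rename_field(doc, "line", "text_entry")
print(renamed)
```

One difference worth keeping in mind: `pop` raises `KeyError` when the field is missing, while `remove` on a Java map returns null, so the Painless version would leave `text_entry` set to null on a document that has no `line` field.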
Second part
# Create the index `hamlet` and add some documents by running the
# following _bulk command
PUT hamlet/_doc/_bulk
{"index":{"_index":"hamlet","_id":0}}
{"line_number":"1.1.1","speaker":"BERNARDO","text_entry":"Whos there?"}
{"index":{"_index":"hamlet","_id":1}}
{"line_number":"1.1.2","speaker":"FRANCISCO","text_entry":"Nay, answer me: stand, and unfold yourself."}
{"index":{"_index":"hamlet","_id":2}}
{"line_number":"1.1.3","speaker":"BERNARDO","text_entry":"Long live the king!"}
{"index":{"_index":"hamlet","_id":3}}
{"line_number":"1.2.1","speaker":"KING CLAUDIUS","text_entry":"Though yet of Hamlet our dear brothers death"}
{"index":{"_index":"hamlet","_id":4}}
{"line_number":"1.2.2","speaker":"KING CLAUDIUS","text_entry":"The memory be green, and that it us befitted"}
{"index":{"_index":"hamlet","_id":5}}
{"line_number":"1.3.1","speaker":"LAERTES","text_entry":"My necessaries are embarkd: farewell:"}
{"index":{"_index":"hamlet","_id":6}}
{"line_number":"1.3.4","speaker":"LAERTES","text_entry":"But let me hear from you."}
{"index":{"_index":"hamlet","_id":7}}
{"line_number":"1.3.5","speaker":"OPHELIA","text_entry":"Do you doubt that?"}
{"index":{"_index":"hamlet","_id":8}}
{"line_number":"1.4.1","speaker":"HAMLET","text_entry":"The air bites shrewdly; it is very cold."}
{"index":{"_index":"hamlet","_id":9}}
{"line_number":"1.4.2","speaker":"HORATIO","text_entry":"It is a nipping and an eager air."}
{"index":{"_index":"hamlet","_id":10}}
{"line_number":"1.4.3","speaker":"HAMLET","text_entry":"What hour now?"}
{"index":{"_index":"hamlet","_id":11}}
{"line_number":"1.5.2","speaker":"Ghost","text_entry":"Mark me."}
{"index":{"_index":"hamlet","_id":12}}
{"line_number":"1.5.3","speaker":"HAMLET","text_entry":"I will."}
# Create a script named `set_is_hamlet` and save it into the cluster
# state. The script (i) adds a field named `is_hamlet` to each
# document, (ii) sets the field to "true" if the document has
# `speaker` equals to "HAMLET", (iii) sets the field to "false"
# otherwise
# Update all documents in `hamlet` by running the `set_is_hamlet`
# script
Pretty convenient the “update_by_query” API, don’t you think? Do you also
know how to use its counterpart for deletion?
# Remove from `hamlet` the documents that have either "KING
# CLAUDIUS" or "LAERTES" as the value of `speaker`
Note the pattern here: store the script first, then reference it by id. I had rarely used it this way before.
# First, try the script inline
POST hamlet/_update_by_query
{
"script":{
"lang":"painless",
"source":"""
if(ctx._source.speaker.equals('HAMLET')){
ctx._source.is_hamlet=true;
}else{
ctx._source.is_hamlet=false;
}
"""
}
}
Now store the script above. Search templates can also be stored through the same `_scripts` endpoint.
PUT _scripts/set_is_hamlet
{
"script":{
"lang":"painless",
"source":"""
if(ctx._source.speaker.equals('HAMLET')){
ctx._source.is_hamlet=true;
}else{
ctx._source.is_hamlet=false;
}
"""
}
}
Use the stored script
POST hamlet/_update_by_query
{
"script":{
"id":"set_is_hamlet"
}
}
GET hamlet/_search
Delete operation
POST hamlet/_delete_by_query
{
"query": {
"terms": {
"speaker.keyword": [
"KING CLAUDIUS",
"LAERTES"
]
}
}
}
2. exercise02: index template
# ** EXAM OBJECTIVE: INDEXING DATA **
# GOAL: Create index templates that satisfy a given set of
# requirements
# REQUIRED SETUP:
# (i) a running Elasticsearch cluster with at least one node
# and a Kibana instance,
# (ii) the cluster has no index with name `hamlet`,
# (iii) the cluster has no template that applies to indices
# starting by `hamlet`
# Create the index template `hamlet_template`, so that the template
# (i) matches any index that starts by "hamlet_" or "hamlet-",
# (ii) allocates one primary shard and no replicas for each
# matching index
# Create the indices `hamlet2` and `hamlet_test`
# Verify that only `hamlet_test` applies the settings defined in
# `hamlet_template`
A template cannot be partially updated: an update works exactly like a create and overwrites the whole template.
DELETE hamlet*
DELETE _template/hamlet*
PUT _template/hamlet_template
{
"index_patterns":["hamlet_*","hamlet-*"],
"settings":{
"number_of_shards":1,
"number_of_replicas":0
}
}
PUT hamlet2
PUT hamlet_test
GET _cat/shards/hamlet2?v
GET _cat/shards/hamlet_test?v
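`hamlet2` stays outside the template because neither pattern matches a name that has no `_` or `-` right after `hamlet`. A rough Python sketch of that wildcard check, using `fnmatch` from the standard library as an approximation of how simple `*` patterns in `index_patterns` behave (the helper name `matches_template` is hypothetical):

```python
from fnmatch import fnmatchcase

# The two patterns declared in `hamlet_template`
INDEX_PATTERNS = ["hamlet_*", "hamlet-*"]


def matches_template(index_name, patterns=INDEX_PATTERNS):
    """Return True if the index name matches at least one template pattern."""
    return any(fnmatchcase(index_name, pattern) for pattern in patterns)


for name in ["hamlet2", "hamlet_test", "hamlet-1"]:
    print(name, matches_template(name))
```

This is why `GET _cat/shards/hamlet2?v` is expected to show the cluster defaults, while `hamlet_test` shows one primary shard and no replicas.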
# Update `hamlet_template` by defining a mapping for the type
# "_doc", so that (i) the type has three fields, named `speaker`,
# `line_number`, and `text_entry`, (ii) `text_entry` uses an
# "english" analyzer
Updates to an index template are not automatically reflected on the matching
indices that already exist. This is because index templates are only applied
once at index creation time.
# Verify that the updates in `hamlet_template` did not apply to the
# existing indices
# In one request, delete both `hamlet2` and `hamlet_test`
GET _template/hamlet_template
PUT _template/hamlet_template
{
"index_patterns" : [
"hamlet_*",
"hamlet-*"
],
"settings" : {
"index" : {
"number_of_shards" : "1",
"number_of_replicas" : "0"
}
},
"mappings": {
"properties": {
"speaker":{
"type":"text"
},
"line_number":{
"type":"text"
},
"text_entry":{
"type":"text",
"analyzer": "english"
}
}
}
}
GET hamlet_test
DELETE hamlet2,hamlet_test
# Create the index `hamlet-1` and add some documents by running the
# following _bulk command
PUT hamlet-1/_doc/_bulk
{"index":{"_index":"hamlet-1","_id":0}}
{"line_number":"1.1.1","speaker":"BERNARDO","text_entry":"Whos there?"}
{"index":{"_index":"hamlet-1","_id":1}}
{"line_number":"1.1.2","speaker":"FRANCISCO","text_entry":"Nay, answer me: stand, and unfold yourself."}
{"index":{"_index":"hamlet-1","_id":2}}
{"line_number":"1.1.3","speaker":"BERNARDO","text_entry":"Long live the king!"}
{"index":{"_index":"hamlet-1","_id":3}}
{"line_number":"1.2.1","speaker":"KING CLAUDIUS","text_entry":"Though yet of Hamlet our dear brothers death"}
# Verify that the mapping of `hamlet-1` is consistent with what is defined
# in `hamlet_template`
# Update `hamlet_template` so as to reject any document having a
# field that is not defined in the mapping
# Verify that you cannot index the following document in `hamlet-1`
POST hamlet-1/_doc
{
"author": "Shakespeare"
}
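For an update to `hamlet_template` to take effect on `hamlet-1`, the only option is to delete `hamlet-1` and recreate it. Here the mapping of the existing index is changed directly instead.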
PUT hamlet-1/_mapping
{
"dynamic":"strict"
}
POST hamlet-1/_doc
{
"author": "Shakespeare"
}
# Update `hamlet_template` so as to enable dynamic mapping again
# Update `hamlet_template` so as to (i) dynamically map to an
# integer any field that starts by "number_", (ii) dynamically
# map to unanalysed text any string field
# Create the index `hamlet-2` and add a document by running the
# following command
POST hamlet-2/_doc/4
{
"text_entry": "With turbulent and dangerous lunacy?",
"line_number": "3.1.4",
"number_act": "3",
"speaker": "KING CLAUDIUS"
}
# Verify that the mapping of `hamlet-2` is consistent with what
# defined in `hamlet_template`
GET _template/hamlet_template
PUT _template/hamlet_template
{
"order": 0,
"index_patterns": [
"hamlet_*",
"hamlet-*"
],
"settings": {
"index": {
"number_of_shards": "1",
"number_of_replicas": "0"
}
},
"mappings": {
"dynamic": true,
"dynamic_templates": [
{
"number_fields_as_integers": {
"match": "number_*",
"mapping": {
"type": "integer"
}
}
},
{
"strings_as_keywords": {
"match_mapping_type": "string",
"mapping": {
"type": "keyword"
}
}
}
],
"properties": {
"line_number": {
"type": "text"
},
"text_entry": {
"analyzer": "english",
"type": "text"
},
"speaker": {
"type": "text"
}
}
},
"aliases": {}
}
POST hamlet-2/_doc/4
{
"text_entry": "With turbulent and dangerous lunacy?",
"line_number": "3.1.4",
"number_act": "3",
"speaker": "KING CLAUDIUS"
}
GET hamlet-2/_mapping
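Dynamic templates are evaluated in order and the first one that matches wins, which is why `number_act` is mapped as an integer even though its JSON value is a string. A simplified Python sketch of that selection, covering only `match` and `match_mapping_type` (the function `pick_dynamic_type` and the reduced rule format are assumptions made for illustration, not the real mapping logic):

```python
from fnmatch import fnmatchcase

# Reduced form of the two dynamic templates, kept in the same order
DYNAMIC_TEMPLATES = [
    {"match": "number_*", "type": "integer"},
    {"match_mapping_type": "string", "type": "keyword"},
]


def pick_dynamic_type(field, json_type, templates=DYNAMIC_TEMPLATES):
    """Return the type of the first rule that matches, or None if none does."""
    for rule in templates:
        name_ok = "match" not in rule or fnmatchcase(field, rule["match"])
        type_ok = (
            "match_mapping_type" not in rule
            or rule["match_mapping_type"] == json_type
        )
        if name_ok and type_ok:
            return rule["type"]
    return None


print(pick_dynamic_type("number_act", "string"))
print(pick_dynamic_type("author", "string"))
```

Fields that already have an explicit entry under `properties` (`line_number`, `text_entry`, `speaker`) never reach this step, so only `number_act` is affected in the document above.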
3. exercise03: alias,reindex,pipeline use
# ** EXAM OBJECTIVE: INDEXING DATA **
# GOAL: Create an alias, reindex indices, and create data pipelines
# REQUIRED SETUP:
# (i) a running Elasticsearch cluster with at least one node
# and a Kibana instance,
# (ii) the cluster has no index with name `hamlet`,
# (iii) the cluster has no template that applies to indices
# starting by `hamlet`
As usual, let’s begin by indexing some data.
# Create the indices `hamlet-1` and `hamlet-2`, each with two
# primary shards and no replicas
# Add some documents to `hamlet-1` by running the following command
PUT hamlet-1/_doc/_bulk
{"index":{"_index":"hamlet-1","_id":0}}
{"line_number":"1.1.1","speaker":"BERNARDO","text_entry":"Whos there?"}
{"index":{"_index":"hamlet-1","_id":1}}
{"line_number":"1.1.2","speaker":"FRANCISCO","text_entry":"Nay, answer me: stand, and unfold yourself."}
{"index":{"_index":"hamlet-1","_id":2}}
{"line_number":"1.1.3","speaker":"BERNARDO","text_entry":"Long live the king!"}
{"index":{"_index":"hamlet-1","_id":3}}
{"line_number":"1.2.1","speaker":"KING CLAUDIUS","text_entry":"Though yet of Hamlet our dear brothers death"}
# Add some documents to `hamlet-2` by running the following command
PUT hamlet-2/_doc/_bulk
{"index":{"_index":"hamlet-2","_id":4}}
{"line_number":"2.1.1","speaker":"LORD POLONIUS","text_entry":"Give him this money and these notes, Reynaldo."}
{"index":{"_index":"hamlet-2","_id":5}}
{"line_number":"2.1.2","speaker":"REYNALDO","text_entry":"I will, my lord."}
{"index":{"_index":"hamlet-2","_id":6}}
{"line_number":"2.1.3","speaker":"LORD POLONIUS","text_entry":"You shall do marvellous wisely, good Reynaldo,"}
{"index":{"_index":"hamlet-2","_id":7}}
{"line_number":"2.1.4","speaker":"LORD POLONIUS","text_entry":"Before you visit him, to make inquire"}
# Create the alias `hamlet` that maps both `hamlet-1` and `hamlet-2`
# Verify that the documents grouped by `hamlet` are 8
By default, if your alias includes more than one index, you cannot index
documents using the alias name. But defaults can be overwritten, if you know
how.
# Configure `hamlet-1` to be the write index of the `hamlet` alias
DELETE hamlet*
PUT hamlet-1
{
"settings": {
"number_of_shards": 2,
"number_of_replicas": 0
}
}
PUT hamlet-2
{
"settings": {
"number_of_shards": 2,
"number_of_replicas": 0
}
}
I had not written anything alias-related for a long time and nearly slipped up. Stay calm and look it up in the documentation.
# Create the alias `hamlet` that maps both `hamlet-1` and `hamlet-2`
# Verify that the documents grouped by `hamlet` are 8
# Configure `hamlet-1` to be the write index of the `hamlet` alias
POST /_aliases
{
"actions": [
{
"add": {
"index": "hamlet-1",
"alias": "hamlet",
"is_write_index": true
}
},
{
"add": {
"index": "hamlet-2",
"alias": "hamlet"
}
}
]
}
PUT hamlet/_doc/1
{
"message":"you want to be stronger"
}
GET hamlet/_count
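Writing through `hamlet` succeeds here only because `hamlet-1` is flagged with `is_write_index`. A small Python sketch of how the write target could be resolved, under the assumption that an alias is modelled as a dict from index name to its `is_write_index` flag (`None` when unset); the function `resolve_write_index` is hypothetical and only approximates the documented behaviour:

```python
def resolve_write_index(members):
    """members maps index name -> is_write_index (True, False, or None if unset)."""
    flagged = [name for name, flag in members.items() if flag is True]
    if len(flagged) == 1:
        # Exactly one index is explicitly marked as the write index
        return flagged[0]
    if len(members) == 1 and next(iter(members.values())) is None:
        # A single-index alias with the flag unset writes to that index
        return next(iter(members))
    raise ValueError("alias has no single write index")


print(resolve_write_index({"hamlet-1": True, "hamlet-2": None}))
```

With both flags unset and two indices behind the alias, the sketch raises an error, which mirrors the rejected write described in the exercise text.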
# Add a document to `hamlet`, so that the document
# (i) has id "8",
# (ii) has "_doc" type,
# (iii) has a field `text_entry` with value "With turbulent and dangerous lunacy?",
# (iv) has a field `line_number` with value "3.1.4",
# (v) has a field `speaker` with value "KING CLAUDIUS"
# Create a script named `control_reindex_batch` and save it into the
# cluster state. The script checks whether a document has the
# field `reindexBatch`, and
# (i) in the affirmative case, it increments the field value by a script parameter named `increment`,
# (ii) otherwise, the script adds the field to the document setting its value to "1"
Practice scenarios like this one, where the script has to be stored. For the script API, see the Painless guide.
PUT _scripts/control_reindex_batch
{
"script":{
"lang":"painless",
"source": """
if(ctx._source.containsKey('reindexBatch')){
ctx._source.reindexBatch+=params.increment;
}else{
ctx._source.reindexBatch=1;
}
"""
}
}
POST hamlet-1/_update_by_query
{
"script":{
"id":"control_reindex_batch",
"params":{
"increment":3
}
}
}
GET hamlet-1/_search
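The branch in `control_reindex_batch` is easy to misread: on the first run the `increment` parameter is ignored and the field is simply created as 1. A minimal Python sketch of the same logic on a plain dict (the function below only mirrors the stored script for illustration; it is not how Elasticsearch executes it):

```python
def control_reindex_batch(source, increment):
    """Mirror of the stored Painless script, applied to a dict."""
    if "reindexBatch" in source:
        # Field already present: add the `increment` parameter
        source["reindexBatch"] += increment
    else:
        # Field missing: create it with value 1, regardless of `increment`
        source["reindexBatch"] = 1
    return source


doc = {"speaker": "BERNARDO"}
control_reindex_batch(doc, increment=3)  # first pass creates the field as 1
control_reindex_batch(doc, increment=1)  # second pass adds 1
print(doc["reindexBatch"])
```

So after the `_update_by_query` above with `increment` set to 3, a document that had no `reindexBatch` field holds 1, not 3.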
# Create the index `hamlet-new` with 2 primary shards and no replicas
# Reindex `hamlet` into `hamlet-new`, while satisfying the following
# criteria:
# (i) apply the `control_reindex_batch` script with the `increment` parameter set to "1",
# (ii) reindex using two parallel slices
# In one request, add `hamlet-new` to the alias `hamlet` and delete
# the `hamlet-1` and `hamlet-2` indices
PUT hamlet-new
{
"settings": {
"number_of_shards": 2,
"number_of_replicas": 0
}
}
POST _reindex?slices=2
{
"source": {"index": "hamlet"},
"dest": {
"index": "hamlet-new"
},
"script":{
"id":"control_reindex_batch",
"params": {
"increment":1
}
}
}
GET hamlet-new/_search
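# Note: when listing several indices the JSON key is `indices`; for a single index it is `index`.
# Note: `remove` only detaches the indices from the alias; actually deleting them, as the task asks, is the job of the `remove_index` action.
POST _aliases
{
"actions": [
{
"add": {
"index": "hamlet-new",
"alias": "hamlet"
}
},
{
"remove": {
"indices": ["hamlet-1","hamlet-2"],
"alias": "hamlet"
}
}
]
}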
GET hamlet/_search
# Create a pipeline named `split_act_scene_line`. The pipeline
# splits the value of `line_number` using the dots as a
# separator, and stores the split values into three
# new fields named `number_act`, `number_scene`, and
# `number_line`, respectively
# Test the pipeline on the following document
{
"_source": {
"line_number": "1.2.3"
}
}
Satisfied with the outcome? Go update your documents, then!
# Update all documents in `hamlet-new` by using the
# `split_act_scene_line` pipeline
Combine the split processor with a script processor, then remove the temporary array
POST _ingest/pipeline/_simulate
{
"pipeline": {
"description": "string split by dot",
"processors": [
{
"split": {
"field": "line_number",
"separator": "\\.",
"target_field":"temp_arry"
}
},
{
"script": {
"lang": "painless",
"source": """
ctx.number_act=ctx.temp_arry[0];
ctx.number_scene=ctx.temp_arry[1];
ctx.number_line=ctx.temp_arry[2];
"""
}
},
{
"remove": {
"field": "temp_arry"
}
}
]
},
"docs": [
{
"_source": {
"line_number": "1.1.3",
"text_entry": "Long live the king!",
"reindexBatch": 2,
"speaker": "BERNARDO"
}
}
]
}
PUT _ingest/pipeline/split_act_scene_line
{
"description": "string split by dot",
"processors": [
{
"split": {
"field": "line_number",
"separator": "\\.",
"target_field": "temp_arry"
}
},
{
"script": {
"lang": "painless",
"source": """
ctx.number_act=ctx.temp_arry[0];
ctx.number_scene=ctx.temp_arry[1];
ctx.number_line=ctx.temp_arry[2];
"""
}
},
{
"remove": {
"field": "temp_arry"
}
}
]
}
POST hamlet-new/_update_by_query?pipeline=split_act_scene_line
GET hamlet-new/_search
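To see what the three processors do to one document, here is a short Python sketch of the same transformation on a dict. The function name `split_act_scene_line` is borrowed from the pipeline for readability; the code is only an illustration of the split, assign, and remove steps, not the ingest implementation:

```python
def split_act_scene_line(doc):
    """Split `line_number` on dots and store the three parts in new fields."""
    result = dict(doc)
    # split processor: "1.2.3" -> ["1", "2", "3"], kept in a temporary field
    result["temp_arry"] = result["line_number"].split(".")
    # script processor: copy the parts into their own fields
    result["number_act"] = result["temp_arry"][0]
    result["number_scene"] = result["temp_arry"][1]
    result["number_line"] = result["temp_arry"][2]
    # remove processor: drop the temporary array
    del result["temp_arry"]
    return result


print(split_act_scene_line({"line_number": "1.2.3"}))
```

The new fields hold strings, since splitting a string yields string parts; a `line_number` with fewer than three parts would fail at the indexing step, just as the array access in the Painless script would.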