3. CRUD Operations in ElasticSearch

Latest recommended article published on 2024-08-21 12:09:53

码农的进阶之路

Column: Elastic Stack学习之旅. Tags: elasticsearch, search, index

Article link: https://blog.csdn.net/zyxwvuuvwxyz/article/details/108677639

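Before getting hands-on, it is worth going over the basic terms and concepts in ElasticSearch.

1. Basic concepts

1.1. Index

An index is how es logically stores logical data; you can think of an index as a table in a relational database.

es can keep an index on a single machine or spread it across several machines. Each index has one or more primary shards, and each primary shard can have multiple replicas.

An index is also a container for documents: a collection of documents of the same kind. Each index has its own Mapping, which defines the names and types of the fields in the documents it contains.

1.2. Document

The main entity stored in es is called a document. In relational-database terms, a document corresponds to one row in a table.

A document is made up of multiple fields. A field can be text, a number, a date, and so on, and it can also be a complex type.

Document metadata describes the document itself, for example: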

{
        "_index" : "house",
        "_type" : "room",
        "_id" : "7XfOkHQBEemceziRSxnW",
        "_score" : 1.0,
        "_source" : {
          "name" : "客厅",
          "area" : "30平米"
        }
      }

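The metadata fields have the following meanings:

_index: the name of the index the document belongs to
_type: the name of the type the document belongs to
_id: the unique id of the document
_source: the original JSON of the document
_score: the relevance score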

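1.3. Mapping

A Mapping is similar to a schema definition in a database. It is used to:

define the names of the fields in an index
define the data type of each field, for example string, number, or boolean
configure fields and the inverted index

The Mapping maps a JSON document to the flat format that Lucene requires.
A Mapping belongs to a Type of an index, and every document belongs to a Type (this describes the 6.x release used in this article; mapping types were deprecated in 7.0).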

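1.4. Dynamic Mapping

When a document is written to an index that does not exist, the index is created automatically.

With Dynamic Mapping there is no need to define Mappings by hand: Elasticsearch infers field types from the document content. The inference is sometimes wrong, for example for geo-location data. When a field has the wrong type, some features stop working properly, for example range queries.

An example of dynamic mapping is given later.

Automatic type detection:

JSON type	Field type
Boolean: `true` or `false`	`boolean`
Integer: `123`	`long`
Floating point: `123.45`	`float`
String `2020-09-20`	`date`
String `test`	`text` with a `keyword` sub-field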
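
The detection rules in the table can be sketched in a few lines of Python. This is only an illustration of the idea, not how Elasticsearch implements it: the helper names `infer_field_type` and `infer_mapping` are hypothetical, date detection is reduced to plain `yyyy-MM-dd` strings, and a floating-point number is mapped to `float`, which is what the Elasticsearch dynamic field mapping documentation lists.

```python
import re
from datetime import datetime

# Simplified stand-in for date detection: only plain yyyy-MM-dd strings.
_DATE_RE = re.compile(r"^\d{4}-\d{2}-\d{2}$")


def infer_field_type(value):
    """Guess the field type for one JSON value (hypothetical helper)."""
    # bool must be checked before int, because bool is a subclass of int.
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "float"
    if isinstance(value, str):
        if _DATE_RE.match(value):
            try:
                datetime.strptime(value, "%Y-%m-%d")
                return "date"
            except ValueError:
                pass  # shaped like a date, but not a valid calendar date
        return "text"  # in Elasticsearch this also gets a keyword sub-field
    raise TypeError(f"unsupported JSON value: {value!r}")


def infer_mapping(document):
    """Infer a field name -> type mapping for a flat document."""
    return {name: infer_field_type(value) for name, value in document.items()}


mapping = infer_mapping({"name": "卧室", "area": "20平米", "built": "2020-09-20"})
```

Note that `area` ends up as `text` here even though its content looks like a measurement, which is the kind of guess that makes an explicit mapping worthwhile.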

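Difference between text and keyword:

text: use it for fields that need full-text search, such as a description or the body of an email. The content of a text field is analyzed: before the inverted index is built, the analyzer splits the string into individual terms. text fields are not used for sorting and are rarely used for aggregations.
keyword: use it for structured fields such as tags or status codes. keyword fields can be used for sorting and aggregations, and can only be found by searching for the exact value.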
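
The difference can be made concrete with a toy inverted index. The sketch below is an assumption-laden simplification: `analyze` stands in for a real analyzer by lowercasing and splitting on whitespace, and `build_index` is a hypothetical helper, not an Elasticsearch API.

```python
from collections import defaultdict


def analyze(text):
    """Toy analyzer: lowercase the text and split it on whitespace."""
    return text.lower().split()


def build_index(docs, field_type):
    """Build a term -> set of doc ids index for a "text" or "keyword" field."""
    index = defaultdict(set)
    for doc_id, value in docs.items():
        # text: one entry per analyzed term; keyword: the whole value as one term
        terms = analyze(value) if field_type == "text" else [value]
        for term in terms:
            index[term].add(doc_id)
    return index


docs = {1: "Quick brown fox", 2: "Lazy brown dog"}
text_index = build_index(docs, "text")
keyword_index = build_index(docs, "keyword")
```

Looking up `brown` finds both documents in `text_index`, while `keyword_index` only has entries for the two complete strings.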

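1.5. Primary Shard and Replica Shard

Primary shards solve the problem of scaling data horizontally. With primary shards, data can be distributed across all nodes in the cluster.
- A shard is a running Lucene instance
- The number of primary shards is set when the index is created and cannot be changed afterwards, except by Reindex
Replicas solve the problem of high availability. A replica is a copy of a primary shard.
- The number of replica shards can be adjusted dynamically
- Increasing the number of replicas can also improve the availability of the service to some extent

An example illustrates how shards and replicas are allocated.

Assume a cluster with one master node and two other nodes, and an index declared as follows: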

PUT http://localhost:9200/house
{
  "settings": {
    "number_of_shards": 3, ## number of primary shards
    "number_of_replicas": 1 ## number of replicas per primary shard
  }
}

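This gives 3 primary shards, each with 1 replica. The replica of each shard is placed on a different node from its primary.
[Figure: allocation of the primary and replica shards across the cluster nodes]
In the figure, boxes with a thick border are primary shards and boxes with a thin border are replica shards.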
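
To get a feel for such a layout, here is a small round-robin placement sketch. It is not the allocation algorithm Elasticsearch uses; it only respects the one rule stated above, that a replica never shares a node with its primary. The node names and the `allocate` function are assumptions made for the illustration.

```python
def allocate(nodes, primaries, replicas):
    """Place every shard copy on a node, round-robin.

    Copy 0 of a shard is the primary (P), copies 1..replicas are replicas (R).
    Copies of the same shard always land on different nodes.
    """
    if replicas + 1 > len(nodes):
        # Toy behaviour; Elasticsearch itself would leave such replicas unassigned.
        raise ValueError("not enough nodes to keep all copies on separate nodes")
    layout = {node: [] for node in nodes}
    for shard in range(primaries):
        for copy in range(replicas + 1):
            node = nodes[(shard + copy) % len(nodes)]
            role = "P" if copy == 0 else "R"
            layout[node].append(f"{role}{shard}")
    return layout


layout = allocate(["node-1", "node-2", "node-3"], primaries=3, replicas=1)
```

With 3 primaries and 1 replica each, the cluster holds 3 x (1 + 1) = 6 shard copies, two per node in this sketch.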

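Choosing the number of shards

In production, the number of shards calls for capacity planning up front.
- Too few shards
  - The cluster cannot be scaled horizontally later by adding nodes
  - Each shard holds too much data, so relocating shards takes a long time
- Too many shards (since version 7.0 the default number of primary shards is 1, which addresses over-sharding)
  - Affects the relevance scoring of search results and the accuracy of statistics
  - Too many shards on a single node waste resources and also hurt performance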

2. Basic API operations

2.1. Creating an index

PUT http://localhost:9200/house
{
  "settings": {
    "number_of_shards": 3,
    "number_of_replicas": 1
  }
}
Response
{
  "acknowledged" : true,
  "shards_acknowledged" : true,
  "index" : "house"
}

This creates an index named house.
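
The same request can be prepared from a script. The sketch below uses only the Python standard library and builds the request without sending it; `build_create_index_request` is a hypothetical helper, and the create index call is issued with the PUT method. It assumes a node listening on localhost:9200, as in the examples here.

```python
import json
import urllib.request


def build_create_index_request(base_url, index, shards, replicas):
    """Build (but do not send) a create-index request with explicit settings."""
    body = {"settings": {"number_of_shards": shards, "number_of_replicas": replicas}}
    return urllib.request.Request(
        url=f"{base_url}/{index}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


request = build_create_index_request("http://localhost:9200", "house", 3, 1)

# To send it against a running node:
# with urllib.request.urlopen(request) as response:
#     print(json.load(response))
```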

2.2. Viewing index settings

GET http://localhost:9200/house/_settings?pretty

{
  "house" : {
    "settings" : {
      "index" : {
        "creation_date" : "1600145784758",
        "number_of_shards" : "3",
        "number_of_replicas" : "1",
        "uuid" : "MN4RCulHSHCWinor61alkQ",
        "version" : {
          "created" : "6050199"
        },
        "provided_name" : "house"
      }
    }
  }
}

2.3. Showing the mapping of an index

GET http://localhost:9200/house/_mapping
## Response
{
  "house" : {
    "mappings" : { }
  }
}

The empty result shows that no documents have been indexed into this index yet, so there is no mapping definition.

2.4. Deleting an index

DELETE http://localhost:9200/house

2.5. Inserting documents

2.5.1. Declaring the mapping explicitly, then adding data

Declare the mapping
PUT http://localhost:9200/house

## Request body
{
  "settings": {
    "number_of_shards": 3,
    "number_of_replicas": 1
  },
  "mappings": {
    "room": {
      "properties": {
        "name": {
          "type": "text"
        },
        "area": {
          "type": "keyword"
        }
      }
    }
  }
}

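This sets the number of primary and replica shards explicitly, and declares a type room with two fields, name and area, along with their data types.

Insert data
Format: http://ip:port/{indexName}/{documentTypeName}/{id}

POST http://localhost:9200/house/room/1
## Request body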
{
  "name":"卧室",
  "area":"20平米"
}

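## Response
{
  "_index" : "house",       ==============> index name
  "_type" : "room",         ==============> type name
  "_id" : "1",              ==============> primary key of the document, uniquely identifies it
  "_version" : 1,           ==============> document version, incremented on every change, similar to optimistic locking
  "result" : "created",     ==============> result of the operation; created means the document was created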
  "_shards" : {
    "total" : 2,
    "successful" : 2,
    "failed" : 0
  },
  "_seq_no" : 0,
  "_primary_term" : 1
}

View the mapping

GET http://localhost:9200/house/_mapping

## Response
{
  "house" : {
    "mappings" : {
      "room" : {
        "properties" : {
          "area" : {
            "type" : "keyword"
          },
          "name" : {
            "type" : "text"
          }
        }
      }
    }
  }
}

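The mapping of the type room is exactly the one we defined.

2.5.2. Adding data with Dynamic Mapping

Add data (the house index from 2.5.1 is deleted first, so this write creates the index automatically)

POST http://localhost:9200/house/room/1
## Request body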
{
  "name":"卧室",
  "area":"20平米"
}

## Response
#! Deprecation: the default number of shards will change from [5] to [1] in 7.0.0; if you wish to continue using the default of [5] shards, you must manage this on the create index request or with an index template
{
  "_index" : "house",
  "_type" : "room",
  "_id" : "1",
  "_version" : 1,
  "result" : "created",
  "_shards" : {
    "total" : 2,
    "successful" : 2,
    "failed" : 0
  },
  "_seq_no" : 0,
  "_primary_term" : 1
}

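Note: the warning in the response says that the default number of primary shards changes from 5 to 1 in 7.0.0; to keep using the default of 5, it has to be set explicitly on the create index request or with an index template.

View the index settings and the mapping

## Index settings
GET http://localhost:9200/house/_settings

## Response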
{
  "house" : {
    "settings" : {
      "index" : {
        "creation_date" : "1600155716142",
        "number_of_shards" : "5",
        "number_of_replicas" : "1",
        "uuid" : "tRC0tfojTKKpE2II2mrTrg",
        "version" : {
          "created" : "6050199"
        },
        "provided_name" : "house"
      }
    }
  }
}

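No shard count was specified explicitly, so the 5 primary shards and 1 replica shown here are the system defaults.

## Mapping of the documents
GET http://localhost:9200/house/_mapping

## Response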
{
  "house" : {
    "mappings" : {
      "room" : {
        "properties" : {
          "area" : {
            "type" : "text",
            "fields" : {
              "keyword" : {
                "type" : "keyword",
                "ignore_above" : 256
              }
            }
          },
          "name" : {
            "type" : "text",
            "fields" : {
              "keyword" : {
                "type" : "keyword",
                "ignore_above" : 256
              }
            }
          }
        }
      }
    }
  }
}

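Taken together, the examples in 2.5.1 and 2.5.2 show that when an index is created and data is inserted, es builds the index structure and the document mapping under the hood even if neither is specified. This is transparent to the user.

If no id is given explicitly when adding data, es generates one by default.
From here on, ip:port is omitted when calling the ElasticSearch RESTful api and only the uri is shown.

2.6. Updating data

2.6.1. Full update

In this mode, the values passed in the update overwrite the original document. Any field of the original document that is not included in the request is lost after the update.

For example, change the name of the document with id=1 to "卧室1":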

PUT /house/room/1
{
  "name":"卧室1",
  "area":"20平米"
}
## Response
{
  "_index" : "house",
  "_type" : "room",
  "_id" : "1",
  "_version" : 2,
  "result" : "updated",
  "_shards" : {
    "total" : 2,
    "successful" : 2,
    "failed" : 0
  },
  "_seq_no" : 1,
  "_primary_term" : 1
}

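The _version in the response works like optimistic locking: whenever the data changes, _version is incremented by 1.

2.6.2. Partial update

Only the fields that need to change are updated.

For example, change the area of the document with id=1 to "21平米":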

POST /house/room/1/_update
{
  "doc":{
    "area":"21平米"
  }
}
## Response
{
  "_index" : "house",
  "_type" : "room",
  "_id" : "1",
  "_version" : 3,
  "result" : "updated",
  "_shards" : {
    "total" : 2,
    "successful" : 2,
    "failed" : 0
  },
  "_seq_no" : 10,
  "_primary_term" : 1
}
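
The difference between the two update styles can be modelled on plain dictionaries. This is only a sketch of the semantics described in 2.6.1 and 2.6.2: `full_update` and `partial_update` are hypothetical helpers, the merge is shallow, and the version is simply incremented on every call.

```python
def full_update(stored, new_source):
    """Model of PUT /index/type/id: the new body replaces the whole _source."""
    return {"_version": stored["_version"] + 1, "_source": dict(new_source)}


def partial_update(stored, doc):
    """Model of POST /index/type/id/_update with {"doc": ...}: merge into _source."""
    merged = {**stored["_source"], **doc}
    return {"_version": stored["_version"] + 1, "_source": merged}


v1 = {"_version": 1, "_source": {"name": "卧室", "area": "20平米"}}

# area is not part of the request body, so it is lost after a full update
after_full = full_update(v1, {"name": "卧室1"})

# only area changes, name is kept
after_partial = partial_update(v1, {"area": "21平米"})
```

This is why the full update in 2.6.1 sends area again even though only name changes.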

2.7. Deleting data

For example, delete the document with id=1:

DELETE /house/room/1
## Response
{
  "_index" : "house",
  "_type" : "room",
  "_id" : "1",
  "_version" : 4,
  "result" : "deleted",
  "_shards" : {
    "total" : 2,
    "successful" : 2,
    "failed" : 0
  },
  "_seq_no" : 11,
  "_primary_term" : 1
}

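result: deleted in the response means the document has been deleted; the version number has changed as well.

Deleting a document that does not exist reports that it was not found.
For example, delete the document with id=111:

DELETE /house/room/111

## Response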
{
  "_index" : "house",
  "_type" : "room",
  "_id" : "111",
  "_version" : 1,
  "result" : "not_found",
  "_shards" : {
    "total" : 2,
    "successful" : 2,
    "failed" : 0
  },
  "_seq_no" : 12,
  "_primary_term" : 1
}

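The HTTP status code is 404 Not Found.
result: not_found means the document does not exist.

Note: deleting a document does not remove it from disk right away; it is only marked as deleted. Elasticsearch cleans up deleted documents in the background as you continue to index more data.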
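
The mark-then-clean-up behaviour can be pictured with a toy in-memory store. Everything in this sketch is hypothetical (`TinyStore` and its methods are not Elasticsearch APIs); it only shows that a delete can be acknowledged while the data is still physically present until a later cleanup.

```python
class TinyStore:
    """Toy document store that marks deletions instead of removing data."""

    def __init__(self):
        self._docs = {}  # id -> {"source": ..., "deleted": bool}

    def index(self, doc_id, source):
        self._docs[doc_id] = {"source": source, "deleted": False}

    def delete(self, doc_id):
        entry = self._docs.get(doc_id)
        if entry is None or entry["deleted"]:
            return "not_found"
        entry["deleted"] = True  # only a marker, the data is still stored
        return "deleted"

    def get(self, doc_id):
        entry = self._docs.get(doc_id)
        return None if entry is None or entry["deleted"] else entry["source"]

    def stored_count(self):
        """Number of entries physically held, including ones marked deleted."""
        return len(self._docs)

    def cleanup(self):
        """Background cleanup: physically drop everything marked as deleted."""
        self._docs = {k: v for k, v in self._docs.items() if not v["deleted"]}


store = TinyStore()
store.index("1", {"name": "卧室1"})
first = store.delete("1")     # existing document
second = store.delete("111")  # never indexed
```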

2.8. Simple searches

2.8.1. Get by id

GET /house/room/1
## Response
{
  "_index" : "house",
  "_type" : "room",
  "_id" : "1",
  "_version" : 1,
  "found" : true,
  "_source" : {
    "name" : "卧室1",
    "area" : "20平米"
  }
}

2.8.2. Selecting the fields to return

For example, return the area and name fields of the document with id=1:

GET /house/room/1?_source=area,name
## Response
{
  "_index" : "house",
  "_type" : "room",
  "_id" : "1",
  "_version" : 1,
  "found" : true,
  "_source" : {
    "area" : "20平米",
    "name" : "卧室1"
  }
}
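
The effect of the _source parameter can be sketched as a filter over the stored document. `filter_source` is a hypothetical helper that handles only a comma-separated list of top-level field names; it is an assumption for illustration, not the server-side implementation.

```python
def filter_source(source, source_param):
    """Keep only the fields named in a comma-separated _source parameter."""
    wanted = [name.strip() for name in source_param.split(",") if name.strip()]
    return {name: source[name] for name in wanted if name in source}


doc = {"name": "卧室1", "area": "20平米"}
only_area = filter_source(doc, "area")
both = filter_source(doc, "area,name")
```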

2.8.3. Returning only the source data

GET /house/room/1/_source
## Response
{
  "name" : "卧室1",
  "area" : "20平米"
}

2.8.4. Searching specific indices

Syntax	Scope
/_search	all indices in the cluster
/index/_search	index
/index1,index2/_search	index1 and index2
/index*/_search	indices whose name starts with index
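
How the index part of the uri expands can be sketched with the standard library's shell-style matching. `resolve_indices` and the list of existing index names are assumptions for the illustration; only the four forms from the table are covered.

```python
from fnmatch import fnmatchcase


def resolve_indices(expression, existing):
    """Expand the index part of a search uri into concrete index names."""
    if expression == "":
        # /_search with no index part: every index in the cluster
        return sorted(existing)
    matched = set()
    for pattern in expression.split(","):
        matched.update(name for name in existing if fnmatchcase(name, pattern))
    return sorted(matched)


existing = ["index", "index1", "index2", "house", "user"]
by_wildcard = resolve_indices("index*", existing)
by_list = resolve_indices("index1,index2", existing)
```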

Search all documents of type room in the house index:

GET /house/room/_search
## Response
{
  "took" : 90,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 2,
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "house",
        "_type" : "room",
        "_id" : "7XfOkHQBEemceziRSxnW",
        "_score" : 1.0,
        "_source" : {
          "name" : "客厅",
          "area" : "30平米"
        }
      },
      {
        "_index" : "house",
        "_type" : "room",
        "_id" : "1",
        "_score" : 1.0,
        "_source" : {
          "name" : "卧室1",
          "area" : "20平米"
        }
      }
    ]
  }
}

2.8.5. Highlighting

Example: highlight "音乐" in the hobby field of the search results

GET /user/_search
{
  "query": {
    "match": {
      "hobby": "音乐"
    }
  },
  "highlight": {
    "fields": {
      "hobby": {}
    }
  }
}
## Response
{
  "took" : 134,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 2,
    "max_score" : 0.58446556,
    "hits" : [
      {
        "_index" : "user",
        "_type" : "people",
        "_id" : "hmk5jHQBzgrkYgRNOrUU",
        "_score" : 0.58446556,
        "_source" : {
          "name" : "王五",
          "age" : 22,
          "mail" : "333@qq.com",
          "hobby" : "羽毛球、篮球、游泳、听音乐"
        },
        "highlight" : {
          "hobby" : [
            "羽毛球、篮球、游泳、听<em>音乐</em>"
          ]
        }
      },
      {
        "_index" : "user",
        "_type" : "people",
        "_id" : "iGk5jHQBzgrkYgRNOrUU",
        "_score" : 0.2876821,
        "_source" : {
          "name" : "孙七",
          "age" : 24,
          "mail" : "555@qq.com",
          "hobby" : "听音乐、看电影、羽毛球"
        },
        "highlight" : {
          "hobby" : [
            "听<em>音乐</em>、看电影、羽毛球"
          ]
        }
      }
    ]
  }
}
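
The shape of the highlight fragments above can be reproduced with a plain substring replacement. This is a deliberately naive sketch: the real highlighter works on analyzed terms rather than raw substrings, and `highlight` with its `pre_tag` and `post_tag` arguments is a hypothetical helper that defaults to the em tags seen in the response.

```python
def highlight(text, term, pre_tag="<em>", post_tag="</em>"):
    """Wrap every occurrence of term in text with the given tags."""
    return text.replace(term, f"{pre_tag}{term}{post_tag}")


fragment = highlight("听音乐、看电影、羽毛球", "音乐")
```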

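2.8.6. Aggregations

Aggregations are similar to GROUP BY and the aggregate functions in SQL.

Example: compute the average value of age in the user index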

GET /user/_search
{
  "aggs": {
    "return_avg_expires_in":{
      "avg": {
        "field": "age"
      }
    }
  }
}
## Response
{
  "took" : 29,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 5,
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "user",
        "_type" : "people",
        "_id" : "hGk5jHQBzgrkYgRNOrUU",
        "_score" : 1.0,
        "_source" : {
          "name" : "张三",
          "age" : 20,
          "mail" : "111@qq.com",
          "hobby" : "羽毛球、乒乓球、足球"
        }
      },
      {
        "_index" : "user",
        "_type" : "people",
        "_id" : "hWk5jHQBzgrkYgRNOrUU",
        "_score" : 1.0,
        "_source" : {
          "name" : "李四",
          "age" : 21,
          "mail" : "222@qq.com",
          "hobby" : "羽毛球、乒乓球、足球、篮球"
        }
      },
      {
        "_index" : "user",
        "_type" : "people",
        "_id" : "iGk5jHQBzgrkYgRNOrUU",
        "_score" : 1.0,
        "_source" : {
          "name" : "孙七",
          "age" : 24,
          "mail" : "555@qq.com",
          "hobby" : "听音乐、看电影、羽毛球"
        }
      },
      {
        "_index" : "user",
        "_type" : "people",
        "_id" : "hmk5jHQBzgrkYgRNOrUU",
        "_score" : 1.0,
        "_source" : {
          "name" : "王五",
          "age" : 22,
          "mail" : "333@qq.com",
          "hobby" : "羽毛球、篮球、游泳、听音乐"
        }
      },
      {
        "_index" : "user",
        "_type" : "people",
        "_id" : "h2k5jHQBzgrkYgRNOrUU",
        "_score" : 1.0,
        "_source" : {
          "name" : "赵六",
          "age" : 23,
          "mail" : "444@qq.com",
          "hobby" : "跑步、游泳、篮球"
        }
      }
    ]
  },
  "aggregations" : {
    "return_avg_expires_in" : {
      "value" : 22.0
    }
  }
}
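
The value in the aggregations section can be checked by hand against the five hits. The sketch below recomputes it in Python; `avg_aggregation` is a hypothetical helper that mimics only the shape of the result, and the hits are trimmed to the fields needed.

```python
def avg_aggregation(hits, field):
    """Average a numeric field over the _source of each hit."""
    values = [hit["_source"][field] for hit in hits if field in hit["_source"]]
    if not values:
        return {"value": None}
    return {"value": sum(values) / len(values)}


hits = [
    {"_source": {"name": "张三", "age": 20}},
    {"_source": {"name": "李四", "age": 21}},
    {"_source": {"name": "孙七", "age": 24}},
    {"_source": {"name": "王五", "age": 22}},
    {"_source": {"name": "赵六", "age": 23}},
]
result = avg_aggregation(hits, "age")  # (20 + 21 + 24 + 22 + 23) / 5
```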