Elasticsearch basic operations


飞鸟真人


Article link: https://blog.csdn.net/robinfoxnan/article/details/123851521

Basic operations

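1. Overview

For readers new to Elasticsearch, a comparison with MySQL makes it easier to understand:

An ES index corresponds to a MySQL database: one MySQL server can hold several databases, and likewise one ES cluster can hold several indices.
An ES type corresponds to a table in a MySQL database: a database can have several tables, each storing one kind of data.
The mapping of an ES type corresponds to a MySQL table schema: it defines the data type of each field.

In summary, to store a kind of business data in ES you:

1. create an index;

2. create a type in that index and define its data format (the mapping);

3. once the first two steps succeed, store your data in ES.

4. Note that newer versions discourage types: Elasticsearch 7.x deprecates them and stores documents directly in the index under the default _doc (the index then plays the role of a table), and 8.x no longer supports types at all.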

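2. Index management

Reference: https://www.jianshu.com/p/fd76204e26a0

2.1 Create an index

PUT http://127.0.0.1:9200/student_index

Note: before Elasticsearch 7.0, a new index gets 5 primary shards and 1 replica by default; from 7.0 on, the default is 1 primary shard and 1 replica.

You can set the number of shards and replicas with the following request body: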

{
	"settings": {
		"number_of_shards": 3,
		"number_of_replicas": 2
	}
}

On success:

{
    "acknowledged": true,
    "shards_acknowledged": true,
    "index": "student_index"
}
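The same request can be built from code. Below is a minimal sketch using only the Python standard library, assuming an unsecured node at http://127.0.0.1:9200 as in the examples above; create_index_request is a hypothetical helper. It only builds the request; passing it to urllib.request.urlopen would actually send it.

```python
import json
import urllib.request

# Assumption: a local node without authentication, as in the examples above.
ES_URL = "http://127.0.0.1:9200"


def create_index_request(index, shards=3, replicas=2):
    """Build (but do not send) PUT /<index> with shard and replica settings."""
    body = {"settings": {"number_of_shards": shards, "number_of_replicas": replicas}}
    return urllib.request.Request(
        url=f"{ES_URL}/{index}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


req = create_index_request("student_index")
print(req.get_method(), req.full_url)  # PUT http://127.0.0.1:9200/student_index

# To send it (requires a running node):
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp))
```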

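2.2 Create a type and its mapping (Elasticsearch 6.x)

In 6.x, a type's mapping is defined with the put-mapping API; the request body contains only the properties object:

PUT 127.0.0.1:9200/student_index/_mapping/student

{
    "properties": {
        "id": {
            "type": "keyword"
        },
        "name": {
            "type": "keyword"
        },
        "score": {
            "type": "double"
        }
    }
}

A successful call returns {"acknowledged": true}.

Do not POST a {"mappings": {...}} body to 127.0.0.1:9200/student_index/student instead. That URL is the document-indexing endpoint, so Elasticsearch stores the JSON as an ordinary document of type student and returns an auto-generated _id instead of applying the mapping:

{
    "_index": "student_index",
    "_type": "student",
    "_id": "Y1SU2H8BR65z4KJlw5rm",
    "_version": 1,
    "result": "created",
    "_shards": {
        "total": 3,
        "successful": 1,
        "failed": 0
    },
    "_seq_no": 0,
    "_primary_term": 1
}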

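2.3 Change the number of replicas

The replica count of an existing index can be changed with the following API:

PUT http://127.0.0.1:9200/commodity/_settings

Request body:

{
	"number_of_replicas": 3
}

This sets the replica count of the commodity index to 3. The number of primary shards, by contrast, cannot be changed this way once the index exists.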
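Raising the replica count only takes effect if the cluster has enough nodes, because Elasticsearch does not allocate two copies of the same shard to one node. The sketch below estimates how many replica copies would stay unassigned, which is what turns a cluster yellow. It is a simplified model: it ignores other allocation rules such as disk watermarks and allocation filtering, unassigned_replicas is a hypothetical helper, and the 1-primary commodity index in the usage is an assumption.

```python
def unassigned_replicas(primaries, replicas, data_nodes):
    """Estimate replica copies that cannot be allocated.

    Simplified model: each copy of a shard (primary or replica) must sit on a
    different data node, so at most data_nodes - 1 replicas per shard can be
    assigned. Disk watermarks, awareness and filtering rules are ignored.
    """
    assignable = min(replicas, max(data_nodes - 1, 0))
    return primaries * (replicas - assignable)


# Hypothetical: commodity with 1 primary shard and 3 replicas needs 4 data nodes.
for nodes in (1, 2, 4):
    print(nodes, "node(s):", unassigned_replicas(primaries=1, replicas=3, data_nodes=nodes))
```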

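2.4 Delete an index

DELETE http://127.0.0.1:9200/commodity

If the index does not exist, an index_not_found_exception is returned with status 404 (this sample came from deleting a nonexistent index named commodity1):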

{
    "error": {
        "root_cause": [
            {
                "type": "index_not_found_exception",
                "reason": "no such index [commodity1]",
                "resource.type": "index_or_alias",
                "resource.id": "commodity1",
                "index_uuid": "_na_",
                "index": "commodity1"
            }
        ],
        "type": "index_not_found_exception",
        "reason": "no such index [commodity1]",
        "resource.type": "index_or_alias",
        "resource.id": "commodity1",
        "index_uuid": "_na_",
        "index": "commodity1"
    },
    "status": 404
}

On success:

{
    "acknowledged": true
}

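2.5 Set the mapping directly, without a type (Elasticsearch 7.x and later)

Types are no longer officially recommended; documents are stored directly in the index. The mapping can be supplied when the index is created (the index must not exist yet, otherwise Elasticsearch returns resource_already_exists_exception):

PUT 127.0.0.1:9200/student_index

{
    "mappings": {
        "properties": {
            "id": {
                "type": "keyword"
            },
            "name": {
                "type": "keyword"
            },
            "score": {
                "type": "double"
            }
        }
    }
}

The index itself is then used as the table. In subsequent document create, read, update and delete requests, _doc takes the place of the type in the path, so the paths no longer contain a custom type name.

On success: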

{
    "acknowledged": true,
    "shards_acknowledged": true,
    "index": "student_index"
}
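When the field list lives in code, the typeless create-index body above can be generated from a simple field-to-type dictionary. A minimal sketch; mapping_body is a hypothetical helper, and it only covers flat fields with a plain type (no sub-fields or extra mapping parameters).

```python
import json


def mapping_body(fields):
    """Build a 7.x-style (typeless) create-index body from {field: es_type}."""
    return {"mappings": {"properties": {name: {"type": t} for name, t in fields.items()}}}


body = mapping_body({"id": "keyword", "name": "keyword", "score": "double"})
print(json.dumps(body, indent=4))  # same structure as the request body above
```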

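3. Document management

3.1 Insert a document

When a document is created, _index, _type and _id together identify it uniquely. The simplest way to make sure a document is newly added is to POST it without an _id so that Elasticsearch generates a unique one; alternatively, specify an _id yourself. Note that indexing to an existing _id overwrites that document; to fail instead, use the _create endpoint (for example PUT 127.0.0.1:9200/student_index/_create/21001 in 7.x), which returns 409 Conflict if the ID already exists.

_index and _type can be thought of as the database (_index) and the table (_type) of a relational database.

Add a document:

POST  127.0.0.1:9200/student_index/student/21001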

{
    "id":21001,
    "name":"robin",
    "score":90.1
}

Here the _id is specified explicitly; otherwise Elasticsearch generates a unique string ID. Response:

{
    "_index": "student_index",
    "_type": "student",
    "_id": "21001",
    "_version": 1,
    "result": "created",
    "_shards": {
        "total": 3,
        "successful": 1,
        "failed": 0
    },
    "_seq_no": 1,
    "_primary_term": 1
}

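Note: without a type, put _doc in the path instead.

For example, creating the document (with the same request body as above):

POST 127.0.0.1:9200/student_index/_doc/21001

Response: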

{
    "_index": "student_index",
    "_type": "_doc",
    "_id": "21001",
    "_version": 1,
    "result": "created",
    "_shards": {
        "total": 2,
        "successful": 1,
        "failed": 0
    },
    "_seq_no": 0,
    "_primary_term": 1
}
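The only difference between the typed and typeless requests is the path segment after the index name. A small sketch of a path builder covering both forms; doc_path is a hypothetical helper, and it percent-encodes the ID so that IDs containing / or other reserved characters stay in one path segment.

```python
from urllib.parse import quote


def doc_path(index, doc_id=None, doc_type=None):
    """Build a document endpoint path.

    doc_type=None  -> typeless form /<index>/_doc[/<id>] (7.x and later)
    doc_type="..." -> typed form /<index>/<type>[/<id>] (6.x and earlier)
    Without doc_id, POST to the path and Elasticsearch generates the ID.
    """
    parts = [index, doc_type or "_doc"]
    if doc_id is not None:
        # Percent-encode so IDs containing "/" stay a single path segment.
        parts.append(quote(str(doc_id), safe=""))
    return "/" + "/".join(parts)


print(doc_path("student_index", 21001))             # /student_index/_doc/21001
print(doc_path("student_index", 21001, "student"))  # /student_index/student/21001
print(doc_path("student_index"))                    # /student_index/_doc
```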

3.2 Update a document

Posting the complete document again to the same _id replaces the stored document:

POST  127.0.0.1:9200/student_index/student/21001

{
    "id":21001,
    "name":"robin-fox",
    "score":99.1
}

Update succeeded; result is updated and _version has increased to 2:

{
    "_index": "student_index",
    "_type": "student",
    "_id": "21001",
    "_version": 2,
    "result": "updated",
    "_shards": {
        "total": 2,
        "successful": 1,
        "failed": 0
    },
    "_seq_no": 3,
    "_primary_term": 1
}

3.3 Delete a document

DELETE 127.0.0.1:9200/student_index/student/21001
# without a type:
DELETE 127.0.0.1:9200/student_index/_doc/21001

Response:

{
    "_index": "student_index",
    "_type": "student",
    "_id": "21001",
    "_version": 3,
    "result": "deleted",
    "_shards": {
        "total": 2,
        "successful": 1,
        "failed": 0
    },
    "_seq_no": 4,
    "_primary_term": 1
}

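3.4 Search documents

3.4.1 Get a document by ID

If you know (or can obtain) the ID of the document you want, simply request index/type/id:

GET 127.0.0.1:9200/student_index/student/21001
# without a type:
GET 127.0.0.1:9200/student_index/_doc/21001

If the document is found: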

{
    "_index": "student_index",
    "_type": "student",
    "_id": "21001",
    "_version": 1,
    "_seq_no": 5,
    "_primary_term": 1,
    "found": true,
    "_source": {
        "id": 21001,
        "name": "robin-fox",
        "score": 99.1
    }
}

If it is not found (HTTP status 404):

{
    "_index": "student_index",
    "_type": "student",
    "_id": "21001",
    "found": false
}
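A client usually distinguishes the two responses above by the found flag. A minimal sketch (source_or_none is a hypothetical helper) that returns _source or None. Because the not-found case comes back with HTTP status 404, fetching it with urllib.request.urlopen raises urllib.error.HTTPError, whose body can be read from the exception.

```python
import json


def source_or_none(response_text):
    """Return _source from a get-by-ID response, or None if found is false."""
    doc = json.loads(response_text)
    return doc["_source"] if doc.get("found") else None


found = '{"_id": "21001", "found": true, "_source": {"id": 21001, "name": "robin-fox", "score": 99.1}}'
missing = '{"_id": "21001", "found": false}'
print(source_or_none(found))    # {'id': 21001, 'name': 'robin-fox', 'score': 99.1}
print(source_or_none(missing))  # None
```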

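3.4.2 Search with a query body

POST 127.0.0.1:9200/student_index/student/_search

Without a type, search the index directly:

GET http://127.0.0.1:9200/student_index/_search?pretty

Most of the time we do not know the ID of a document, or we want to fetch several documents at once. In that case use a query body (for the query syntax see the official ES documentation: https://www.elastic.co/guide/en/elasticsearch/reference/current/index.html):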

{
    "query":{
        "match_all":{} 
    },
    "size": 9999,
    "from": 1
}

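size and from

size is the number of hits returned per request (default 10).
from is the 0-based offset of the first hit to return (default 0). By default from + size may not exceed 10,000 (the index.max_result_window setting).
The example above requests 9,999 hits with from set to 1, so it skips the first hit and starts at the second one; 1 + 9,999 = 10,000 is exactly at the limit.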
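For page-by-page browsing, from is usually derived from a 1-based page number. A minimal sketch, assuming the default limit of 10,000 for index.max_result_window; page_params is a hypothetical helper.

```python
MAX_RESULT_WINDOW = 10_000  # default value of index.max_result_window


def page_params(page, per_page, max_window=MAX_RESULT_WINDOW):
    """Return (from, size) for a 1-based page number.

    Raises ValueError when from + size would exceed max_window, a request
    that Elasticsearch would reject.
    """
    if page < 1 or per_page < 1:
        raise ValueError("page and per_page must be >= 1")
    start = (page - 1) * per_page  # from is a 0-based offset
    if start + per_page > max_window:
        raise ValueError(f"from + size = {start + per_page} exceeds {max_window}")
    return start, per_page


from_, size = page_params(3, 20)
body = {"query": {"match_all": {}}, "from": from_, "size": size}
print(body)  # {'query': {'match_all': {}}, 'from': 40, 'size': 20}
```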

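Search by name. Note that match is a full-text query, not a fuzzy one: the query string is processed with the field's analyzer. Because name is mapped as keyword here, this query only matches the exact value robin-fox; for typo-tolerant matching, use the fuzzy query or the fuzziness parameter of match.

{
    "query": {
        "match": {
            "name": "robin-fox"
        }
    }
}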

3.4.3 Compound queries

The compound queries are:

bool query
boosting query
constant_score (constant score query)
dis_max (disjunction max, best-match query)
function_score (function score query)

Reference: Jianshu article https://www.jianshu.com/p/c451d89bc8bf

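3.4.3.1 bool query

The bool query has the following properties:

Clauses may appear in any order.
Multiple queries can be nested, including other bool queries.
If a bool query has no must or filter clause, at least one should clause must match for a document to be returned.

A bool query has four kinds of clauses: must, should, must_not and filter. Each takes a single query object or an array of query objects (the conditions).

{
    "bool": {
        "must": [],
        "should": [],
        "must_not": [],
        "filter": []
    }
}

must:

Documents must match these clauses to be included; the clauses contribute to the score.

must_not:

Documents must not match these clauses to be included. They run in filter context and do not affect the score.

should:

Matching any of these clauses increases _score; when must or filter clauses are present, not matching them has no other effect. They are mainly used to adjust the relevance score of each document.

filter:

Documents must match, but in non-scoring filter mode: these clauses do not contribute to the score, they only include or exclude documents according to the filter criteria.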
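To make the clause semantics concrete, here is a deliberately simplified in-memory model of how a bool query decides whether a document matches. It is not how Elasticsearch scores documents: the score here just counts matching must and should clauses (Elasticsearch computes real relevance scores, BM25 by default), and clauses are plain Python predicates. bool_match and has_tag are hypothetical helpers; the usage mirrors the official example below, with tags stored as a list.

```python
def bool_match(doc, must=(), filter_=(), should=(), must_not=(), minimum_should_match=None):
    """Simplified bool-query model: returns (matched, score).

    Every clause is a predicate doc -> bool. must and filter must all hold,
    no must_not may hold, and enough should clauses must hold. Only must and
    should add to the score (1 point per matching clause, not real BM25).
    """
    if not all(q(doc) for q in must) or not all(q(doc) for q in filter_):
        return False, 0
    if any(q(doc) for q in must_not):
        return False, 0
    should_hits = sum(1 for q in should if q(doc))
    if minimum_should_match is None:
        # Default: 0 if there is a must or filter clause, otherwise 1.
        minimum_should_match = 0 if (must or filter_) else 1
    if should and should_hits < minimum_should_match:
        return False, 0
    return True, len(must) + should_hits


def has_tag(tag):
    return lambda d: tag in d.get("tags", [])


query = dict(
    must=[lambda d: d.get("user") == "kimchy"],
    filter_=[has_tag("tech")],
    must_not=[lambda d: 10 <= d.get("age", -1) <= 20],
    should=[has_tag("wow"), has_tag("elasticsearch")],
    minimum_should_match=1,
)
print(bool_match({"user": "kimchy", "tags": ["tech", "wow"], "age": 30}, **query))  # (True, 2)
print(bool_match({"user": "kimchy", "tags": ["tech", "wow"], "age": 15}, **query))  # (False, 0)
```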

Official example:

{
  "query": {
    "bool" : {
      "must" : {
        "term" : { "user" : "kimchy" }
      },
      "filter": {
        "term" : { "tag" : "tech" }
      },
      "must_not" : {
        "range" : {
          "age" : { "gte" : 10, "lte" : 20 }
        }
      },
      "should" : [
        { "term" : { "tag" : "wow" } },
        { "term" : { "tag" : "elasticsearch" } }
      ],
      "minimum_should_match" : 1,
      "boost" : 1.0
    }
  }
}

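Queries specified under filter have no effect on scoring; scores come only from the other (scoring) clauses. In the following official example the bool query contains only a filter clause, so all matching documents get a score of 0: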
GET _search
{
  "query": {
    "bool": {
      "filter": {
        "term": {
          "status": "active"
        }
      }
    }
  }
}

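3.4.3.2 boosting query

Instead of the all-or-nothing exclusion of must_not, the boosting query works through scoring: documents matching the positive query are returned, and those that also match the negative query are not removed but have their score multiplied by negative_boost (a value between 0 and 1.0), which moves them further down the results.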
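A sketch of the request body and of the score rule just described, following the structure of the boosting query in the official reference (positive, negative, negative_boost). The field name content and the terms apple and pie are illustrative assumptions, and boosting_query and boosted_score are hypothetical helpers.

```python
def boosting_query(positive, negative, negative_boost):
    """Build a boosting query body; negative_boost must lie in [0, 1.0]."""
    if not 0.0 <= negative_boost <= 1.0:
        raise ValueError("negative_boost must be between 0 and 1.0")
    return {"query": {"boosting": {
        "positive": positive,
        "negative": negative,
        "negative_boost": negative_boost,
    }}}


def boosted_score(positive_score, matches_negative, negative_boost):
    """Positive-query score, multiplied by negative_boost if the negative query also matches."""
    return positive_score * negative_boost if matches_negative else positive_score


body = boosting_query({"match": {"content": "apple"}}, {"match": {"content": "pie"}}, 0.5)
print(boosted_score(2.0, matches_negative=True, negative_boost=0.5))   # 1.0
print(boosted_score(2.0, matches_negative=False, negative_boost=0.5))  # 2.0
```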

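3.4.3.3 constant_score

constant_score returns a fixed score: every document matching the wrapped filter gets the score given by boost. It is usually combined with filter, because queries in filter context ignore scoring anyway.

Example (results: documents 1, 2 and 3, all with a score of 2.5):
POST news/_search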
{
  "query": {
    "constant_score": {
      "filter": {
        "match": {
         "content":"apple"
        }
      },
      "boost": 2.5
    }
  }
}

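3.4.3.4 dis_max

dis_max (disjunction max) takes the score of the best-matching query; when tie_breaker is set, the scores of the other matching queries are added after being multiplied by tie_breaker.

Official example: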

GET /_search
{
    "query": {
        "dis_max" : {
            "queries" : [
                { "term" : { "title" : "Quick pets" }},
                { "term" : { "body" : "Quick pets" }}
            ],
            "tie_breaker" : 0.7
        }
    }
}

Suppose a document scores 1 on the title query and 1.6 on the body query. The total score is then 1.6 + 1 * 0.7 = 2.3.

If "tie_breaker" : 0.7 is removed, tie_breaker defaults to 0 and the document's score is 1.6 + 1 * 0 = 1.6.
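The arithmetic above as a small function: the best clause score plus tie_breaker times the sum of the other matching clauses' scores. dis_max_score is a hypothetical helper that only reproduces this combination step; the per-clause scores themselves come from Elasticsearch.

```python
def dis_max_score(clause_scores, tie_breaker=0.0):
    """Combine matching-clause scores the way dis_max does."""
    if not clause_scores:
        return 0.0
    best = max(clause_scores)
    return best + tie_breaker * (sum(clause_scores) - best)


print(round(dis_max_score([1.0, 1.6], tie_breaker=0.7), 6))  # 2.3
print(dis_max_score([1.0, 1.6]))                             # 1.6
```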

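3.4.3.5 function_score

function_score is mainly for complex scenarios where ranking should follow more human-like criteria, such as finding a suitable hotel: it adjusts the scores of matching documents with scoring functions.

Official explanation:

https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-function-score-query.html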

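3.5 Count documents

When searching with ES you will sooner or later need the total number of matching documents. Three ways of getting it are described below.

Option 2 solves the problem that, when more than 10,000 documents match, the reported hit count stops at 10,000: since Elasticsearch 7.0 a search counts hits accurately only up to 10,000 by default (track_total_hits defaults to 10000), for performance reasons that users usually do not want to give up globally. This is a different limit from index.max_result_window (also 10,000 by default), which caps from + size.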

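3.5.1 Option 1

Total number of documents across all indices:

GET http://127.0.0.1:9200/_cat/count?v=true

Number of documents in one index (the last path segment is the index name):

GET http://127.0.0.1:9200/_cat/count/student_index?v=true

The response is plain text: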

epoch      timestamp count
1648609772 03:09:32  1

Official docs: https://www.elastic.co/guide/en/elasticsearch/reference/current/cat-count.html
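The _cat output is meant for humans, but a script can parse its header and data rows. A minimal sketch (parse_cat_count is a hypothetical helper) that assumes the v=true header row shown above. For scripts it is often simpler to add format=json to the _cat request, or to use the _count API, which returns JSON.

```python
def parse_cat_count(text):
    """Parse `_cat/count?v=true` output (header row + data row) into an int."""
    header_line, data_line = text.strip().splitlines()[:2]
    row = dict(zip(header_line.split(), data_line.split()))
    return int(row["count"])


sample = """epoch      timestamp count
1648609772 03:09:32  1
"""
print(parse_cat_count(sample))  # 1
```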

3.5.2 Option 2

Set the track_total_hits property to true (<target> is the index name):

GET http://127.0.0.1:9200/<target>/_search


{
  "track_total_hits": true,
  "query": {
    "match_all" : {
    }
  }
}

Official docs: https://www.elastic.co/guide/en/elasticsearch/reference/current/search-your-data.html
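In Elasticsearch 7.x and later, hits.total in the search response is an object with value and relation: eq means the count is exact, gte means it is only a lower bound because the tracking limit was reached. A minimal sketch of reading it; describe_total is a hypothetical helper. If only the count is needed, "size": 0 can be added to the request body so that no hits are returned.

```python
def describe_total(response):
    """Read hits.total from a 7.x+ search response as a human-readable string."""
    total = response["hits"]["total"]
    prefix = "exactly" if total["relation"] == "eq" else "at least"
    return f"{prefix} {total['value']}"


print(describe_total({"hits": {"total": {"value": 1, "relation": "eq"}}}))       # exactly 1
print(describe_total({"hits": {"total": {"value": 10000, "relation": "gte"}}}))  # at least 10000
```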

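3.5.3 Option 3 (not recommended)

This command returns index statistics, which include the document count of each index. In some cases that count is not accurate, because it includes hidden nested documents. The official documentation explains: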
These metrics are retrieved directly from Lucene, which Elasticsearch uses internally to power indexing and search. As a result, all document counts include hidden nested documents.
To get an accurate count of Elasticsearch documents, use the cat count or count APIs.

GET /_cat/indices
GET /_cat/indices/<target>

Official docs: https://www.elastic.co/guide/en/elasticsearch/reference/current/cat-indices.html

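4. Cluster management

4.1 Cluster health

The _cat API is commonly used to check whether the cluster is healthy. Make sure port 9200 is reachable:

GET localhost:9200/_cat/health?v

green means everything is fine; yellow means all data is available but some replicas have not been allocated; red means some data is unavailable for some reason. The sample below is yellow: all 95 primary shards are active on the single node, but the 58 replica shards cannot be assigned, because a replica is never placed on the same node as its primary.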

epoch      timestamp cluster status node.total node.data shards pri relo init unassign pending_tasks max_task_wait_time active_shards_percent
1648619775 05:56:15  my-es   yellow          1         1     95  95    0    0       58             0                  -                 62.1%
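A sketch for reading this table from a script: split the header and data rows on whitespace and compute the share of active shard copies. parse_cat_table and active_percent are hypothetical helpers. Splitting on whitespace only works when no cell is empty or contains spaces, which holds for _cat/health but not for every _cat output (the empty load columns in the _cat/nodes sample below would shift the columns); format=json avoids the problem. The percentage formula is an assumption that reproduces the sample: 95 active out of 95 + 58 copies gives 62.1%.

```python
def parse_cat_table(text):
    """Parse a `_cat/...?v` table into one dict per data row, keyed by header."""
    rows = [line.split() for line in text.strip().splitlines()]
    header = rows[0]
    return [dict(zip(header, row)) for row in rows[1:]]


def active_percent(health):
    """Share of shard copies that are active (assumed formula; matches the sample)."""
    active = int(health["shards"])
    total = active + int(health["init"]) + int(health["unassign"])
    return round(100 * active / total, 1)


sample = """\
epoch      timestamp cluster status node.total node.data shards pri relo init unassign pending_tasks max_task_wait_time active_shards_percent
1648619775 05:56:15  my-es   yellow          1         1     95  95    0    0       58             0                  -                 62.1%
"""
health = parse_cat_table(sample)[0]
print(health["status"], active_percent(health))  # yellow 62.1
```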

4.2 List cluster nodes

GET localhost:9200/_cat/nodes?v

ip          heap.percent ram.percent cpu load_1m load_5m load_15m node.role master name
10.128.6.66           21          30   4                          mdi       *      node-1

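4.3 List indices

GET localhost:9200/_cat/indices?v

The output is long, so it is not shown here.

Statistics and mappings of one index:

GET 127.0.0.1:9200/student_index/_stats

GET 127.0.0.1:9200/student_index/_mappings

All mappings:

GET http://localhost:9200/_all/_mapping?pretty

Delete all indices (use with extreme care: this removes every index in the cluster; since 8.0 the request is rejected by default because action.destructive_requires_name defaults to true):

DELETE 127.0.0.1:9200/_all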
