【Elasticsearch系列八】高阶使用

Kwan的解忧杂货铺@新空间代码工作室

于 2024-09-17 08:30:00 发布

阅读量359

点赞数 10

分类专栏： s10 Elasticsearch 文章标签： elasticsearch 大数据搜索引擎

本文链接：https://blog.csdn.net/qyj19920704/article/details/142300742

版权

s10 Elasticsearch 专栏收录该内容

8 篇文章 0 订阅

订阅专栏

💝💝💝欢迎来到我的博客，很高兴能够在这里和您见面！希望您在这里可以感受到一份轻松愉快的氛围，不仅可以获得有趣的内容和知识，也可以畅所欲言、分享您的想法和见解。

推荐:kwan 的首页,持续学习,不断总结,共同进步,活到老学到老
导航
檀越剑指大厂系列:全面总结 java 核心技术,jvm,并发编程 redis,kafka,Spring,微服务等
常用开发工具系列:常用的开发工具,IDEA,Mac,Alfred,Git,typora 等
数据库系列:详细总结了常用数据库 mysql 技术点,以及工作中遇到的 mysql 问题等
新空间代码工作室:提供各种软件服务,承接各种毕业设计,毕业论文等
懒人运维系列:总结好用的命令,解放双手不香吗?能用一个命令完成绝不用两个操作
数据结构与算法系列:总结数据结构和算法,不同类型针对性训练,提升编程思维,剑指大厂

非常期待和您一起在这个小小的网络世界里共同探索、学习和成长。💝💝💝 ✨✨ 欢迎订阅本专栏 ✨✨

博客目录

1.属性分析

版本号
删除的时候做标记,使用的是懒删除
_id 分为手动 id 和默认 id

{
  "_index": "book",
  "_type": "_doc",
  "_id": "1",
  "_version": 6,
  "_seq_no": 7,
  "_primary_term": 1,
  "found": true,
  "_source": {
    "id": 1,
    "title": "这是一文章",
    "content": "xxxxx",
    "comment": "备注信息",
    "mobile": "13344556677"
  }
}

2._source 字段

含义：插入数据时的所有字段和值。在 get 获取数据时，在 source 字段中原样返回。

GET /book/_doc/1

定制返回:

GET /book/_doc/1?_source_includes=id,title

{
  "_index": "book",
  "_type": "_doc",
  "_id": "1",
  "_version": 6,
  "_seq_no": 7,
  "_primary_term": 1,
  "found": true,
  "_source": {
    "id": 1,
    "title": "这是一文章"
  }
}

3.强制创建

为防止覆盖原有数据，我们在新增时，设置为强制创建，不会覆盖原有文档。

PUT /index/_doc/1/_create

PUT /read_index/_doc/1/_create
{
    "id":1,
    "title":"这是一11文章",
    "content":"xxxxx",
    "comment":"备注信息",
    "mobile":"13344556677"
}

{
  "_index": "read_index",
  "_type": "_doc",
  "_id": "1",
  "_version": 1,
  "result": "created",
  "_shards": {
    "total": 2,
    "successful": 1,
    "failed": 0
  },
  "_seq_no": 0,
  "_primary_term": 1
}

4.脚本使用

#插入数据
PUT /test_index/_doc/6
{
  "num": 0
}

#执行脚本
POST /test_index/_doc/6/_update
{
  "script": "ctx._source.num+=1"
}

#查询数据
GET /test_index/_doc/6

{
  "_index": "test_index",
  "_type": "_doc",
  "_id": "6",
  "_version": 2,
  "_seq_no": 1,
  "_primary_term": 1,
  "found": true,
  "_source": {
    "num": 1
  }
}

5.查询索引

GET /test_index/_search

{
  "took": 339,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 1,
      "relation": "eq"
    },
    "max_score": 1.0,
    "hits": [
      {
        "_index": "test_index",
        "_type": "_doc",
        "_id": "6",
        "_score": 1.0,
        "_source": {
          "num": 1
        }
      }
    ]
  }
}

6._version 字段

删除的时候是异步删除,只是做了删除标记,延时删除策略.

es 内部主从同步时，是多线程异步，乐观锁机制。也是基于版本号的

线程 1 先到，线程 2 后到，副本数据是没有问题的
线程 2 先到，线程 1 后到：
- 副本分片先把数据改为 test3,verion=3.
- 线程 1 请求到了，副本分片，判断请求的 verison=1 太旧了，就会丢弃这个请求。

说明:

version 使用数据自带的 version 版本号
_version&version_type=external 则是并发时使用程序自己指定的 version,且是不存在的

PUT /test_index/_doc/4?version=2&version_type=external
{
  "test_field": "itcast1"
}

7.重试次数

指定重试次数

POST /test_index/_doc/5/_update?retry_on_conflict=3
{
  "doc": {
    "test_field": "itcast1"
  }
}

结合 version

POST /test_index/_doc/5/_update?retry_on_conflict=3&version=22&version_type=external
{
  "doc": {
    "test_field": "itcast1"
  }
}

8.批量查询 mget

单条查询 GET /test_index/1，如果查询多个 id 的文档一条一条查询，网络开销太大。

GET /_mget
{
   "docs" : [
      {
         "_index" : "test_index",
         "_type" :  "_doc",
         "_id" :    1
      },
      {
         "_index" : "test_index",
         "_type" :  "_doc",
         "_id" :    7
      }
   ]
}

{
  "docs": [
    {
      "_index": "test_index",
      "_type": "_doc",
      "_id": "2",
      "_version": 6,
      "_seq_no": 12,
      "_primary_term": 1,
      "found": true,
      "_source": {
        "test_field": "test12333123321321"
      }
    },
    {
      "_index": "test_index",
      "_type": "_doc",
      "_id": "3",
      "_version": 6,
      "_seq_no": 18,
      "_primary_term": 1,
      "found": true,
      "_source": {
        "test_field": "test3213"
      }
    }
  ]
}

提示去掉 type

GET /_mget
{
   "docs" : [
      {
         "_index" : "test_index",
         "_id" :    2
      },
      {
         "_index" : "test_index",
         "_id" :    3
      }
   ]
}

同一索引下批量查询：

GET /test_index/_mget
{
   "docs" : [
      {
         "_id" :    2
      },
      {
         "_id" :    3
      }
   ]
}

第三种写法：搜索写法

post /test_index/_doc/_search
{
    "query": {
        "ids" : {
            "values" : ["1", "7"]
        }
    }
}

9.bulk

Bulk 操作解释将文档的增删改查一些列操作，通过一次请求全都做完。减少网络传输次数。

#语法
POST /_bulk
{"action": {"metadata"}}
{"data"}

示例:

#如下操作，删除5，新增14，修改2。
POST /_bulk
{ "create": { "_index": "test_index","_id": "8"}}
{ "test_field": "test8" }
{ "update": { "_index": "test_index","_id": "3"} }
{ "doc": {"test_field": "bulk test"} }
{ "delete": { "_index": "test_index","_id": "5" }}

总结:

功能：
- delete：删除一个文档，只要 1 个 json 串就可以了
- create:相当于强制创建 PUT /index/type/id/_create
- index：普通的 put 操作，可以是创建文档，也可以是全量替换文档
- update：执行的是局部更新 partial update 操作
格式：每个 json 不能换行。相邻 json 必须换行。
隔离：每个操作互不影响。操作失败的行会返回其失败信息。
实际用法：bulk 请求一次不要太大，否则一下积压到内存中，性能会下降。所以，一次请求几千个操作、大小在几 M 正好。

10.定位错误语法

验证错误语句：

GET /book/_validate/query?explain
{
  "query": {
    "mach": {
      "description": "java程序员"
    }
  }
}

{
  "valid": false,
  "error": "org.elasticsearch.common.ParsingException: no [query] registered for [mach]"
}

正确

GET /book/_validate/query?explain
{
  "query": {
    "match": {
      "description": "java程序员"
    }
  }
}

{
  "_shards": {
    "total": 1,
    "successful": 1,
    "failed": 0
  },
  "valid": true,
  "explanations": [
    {
      "index": "book",
      "valid": true,
      "explanation": "description:java description:程序员"
    }
  ]
}

一般用在那种特别复杂庞大的搜索下，比如你一下子写了上百行的搜索，这个时候可以先用 validate api 去验证一下，搜索是否合法。

合法以后，explain 就像 mysql 的执行计划，可以看到搜索的目标等信息。

11.Scroll 分批查询

场景：下载某一个索引中 1 亿条数据，到文件或是数据库。

不能一下全查出来，系统内存溢出。所以使用 scoll 滚动搜索技术，一批一批查询。

scoll 搜索会在第一次搜索的时候，保存一个当时的视图快照，之后只会基于该旧的视图快照提供数据搜索，如果这个期间数据变更，是不会让用户看到的

每次发送 scroll 请求，我们还需要指定一个 scoll 参数，指定一个时间窗口，每次搜索请求只要在这个时间窗口内能完成就可以了。

搜索

GET /book/_search?scroll=1m
{
  "query": {
    "match_all": {}
  },
  "size": 3
}

{
  "_scroll_id": "DXF1ZXJ5QW5kRmV0Y2gBAAAAAAAAMOkWTURBNDUtcjZTVUdKMFp5cXloVElOQQ==",
  "took": 3,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 3,
      "relation": "eq"
    },
    "max_score": 1.0,
    "hits": []
  }
}

获得的结果会有一个 scoll_id，下一次再发送 scoll 请求的时候，必须带上这个 scoll_id

GET /_search/scroll
{
    "scroll": "1m",
    "scroll_id" : "DXF1ZXJ5QW5kRmV0Y2gBAAAAAAAAMOkWTURBNDUtcjZTVUdKMFp5cXloVElOQQ=="
}

与分页区别:

分页是给用户看的 deep paging
scroll 是用户系统内部操作，如下载批量数据，数据转移。零停机改变索引映射。

觉得有用的话点个赞 👍🏻 呗。
❤️❤️❤️本人水平有限，如有纰漏，欢迎各位大佬评论批评指正！😄😄😄

💘💘💘如果觉得这篇文对你有帮助的话，也请给个点赞、收藏下吧，非常感谢!👍 👍 👍

🔥🔥🔥Stay Hungry Stay Foolish 道阻且长,行则将至,让我们一起加油吧！🌙🌙🌙

Kwan的解忧杂货铺@新空间代码工作室

关注

10
点赞
踩
9

收藏

觉得还不错? 一键收藏
打赏
5
评论
复制链接

分享到 QQ

分享到新浪微博

扫一扫

专栏目录