Elasticsearch: The Definitive Guide, Chapter 1: ES Basic Concepts

吴名霄辈

Article link: https://blog.csdn.net/qq_31238727/article/details/77431049

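Index (noun): A cluster can contain multiple nodes, and a node can hold multiple indices. An index points to multiple shards, so the shards that make up one index can be spread across several nodes. An index is really just a logical namespace: it is a concept that groups shards rather than a physical thing in its own right.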
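Cluster health
green: all primary shards and all replica shards are available
yellow: all primary shards are available, but not all replica shards are
red: not all primary shards are available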
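
The three colors can be read as a simple rule over shard states. The following is a minimal Python sketch of that rule, not Elasticsearch code: the function name `cluster_health` and the dictionary layout of a shard copy are assumptions made for illustration.

```python
def cluster_health(shards):
    """Return 'green', 'yellow' or 'red' for a list of shard copies.

    Each shard copy is a dict such as {"primary": True, "active": True}
    (a hypothetical layout chosen for this sketch).
    """
    primaries = [s for s in shards if s["primary"]]
    replicas = [s for s in shards if not s["primary"]]
    if not all(s["active"] for s in primaries):
        return "red"       # at least one primary shard is unavailable
    if not all(s["active"] for s in replicas):
        return "yellow"    # all primaries are up, but some replica is not
    return "green"         # every primary and every replica is active


example = [
    {"primary": True, "active": True},
    {"primary": False, "active": False},
]
print(cluster_health(example))  # prints "yellow"
```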
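When the master node fails
One of the remaining nodes is quickly promoted to master, and for each primary shard that lived on the failed node, one of its replica shards on another node is quickly promoted to primary.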
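Document metadata
There are three metadata elements: _index, _type, and _id.
Every document has a _version
Each time a document changes, its _version is incremented by 1.
When a document is updated, the old version does not disappear immediately, but it can no longer be accessed. It is cleaned up later, in the background, as more data is indexed.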
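Creating a new document
How can we be sure that a request creates a brand-new document instead of overwriting an existing one?
A document is identified by the combination of _index, _type, and _id, so the request must succeed only if no document with the same index/type/id already exists: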

PUT /index/type/id?op_type=create
or
PUT /index/type/id/_create

If no such document exists, it is created. Otherwise ES returns a 409 status code, for example:

{
    "error" : "DocumentAlreadyExistsException[[website][4] [blog][123]: document already exists]",
    "status" : 409
}
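
The create-only behaviour can be pictured with an in-memory stand-in. This is a sketch of the semantics only, not of how Elasticsearch implements it: the `create` function, the `Conflict` exception (standing in for the 409 response), and the dictionary used as a store are all hypothetical.

```python
class Conflict(Exception):
    """Stands in for the HTTP 409 response in this sketch."""


def create(store, index, doc_type, doc_id, source):
    """Insert a document only if index/type/id is not taken yet."""
    key = (index, doc_type, doc_id)
    if key in store:
        # Same index/type/id already present: refuse instead of overwriting.
        raise Conflict("document already exists: %s/%s/%s" % key)
    store[key] = {"_version": 1, "_source": source}
    return store[key]


store = {}
create(store, "website", "blog", "123", {"title": "first"})
try:
    create(store, "website", "blog", "123", {"title": "second"})
except Conflict as err:
    print("409:", err)
```

The first document stays untouched after the rejected second call, which is the point of using create rather than a plain index request.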

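CAS in ES
To keep two concurrent updates to the same document from silently overwriting each other, ES lets an update be conditional on the document's _version (optimistic concurrency control).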

PUT /website/blog/1?version=2 <1>
{
    "title" : "sssssss",
    "text" : "aaaaaa"
}

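<1> The update is applied only if the document's current _version in the index is 2; otherwise ES rejects it with a 409 error. For example, a request sent with version=1 while the current version is 2 gets this response: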

{
    "error" : "VersionConflictEngineException[[website][2] [blog][1]: version conflict, current [2], provided [1]]",
    "status" : 409
}
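
The version check is a compare-and-set. A minimal Python sketch of the idea follows, assuming an in-memory document; `index_with_version` and `VersionConflict` are hypothetical names, and the error text only mimics the message shown above.

```python
class VersionConflict(Exception):
    """Stands in for the 409 version conflict response in this sketch."""


def index_with_version(doc, new_source, expected_version):
    """Replace the source only if the caller saw the current version."""
    current = doc["_version"]
    if current != expected_version:
        raise VersionConflict(
            "version conflict, current [%d], provided [%d]"
            % (current, expected_version)
        )
    doc["_source"] = new_source
    doc["_version"] = current + 1   # every successful change bumps the version
    return doc["_version"]


doc = {"_version": 2, "_source": {"title": "old"}}
index_with_version(doc, {"title": "sssssss"}, expected_version=2)  # succeeds
try:
    index_with_version(doc, {"title": "stale"}, expected_version=2)
except VersionConflict as err:
    print(err)
```

The second call fails because the first one already moved the document to version 3, so a writer holding stale data cannot overwrite the newer change.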

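Partial updates
Use the update API. While it applies the change, it automatically checks whether the document's version number has changed in the meantime; if it has, the update fails.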

POST /website/blog/1/_update
{
    "doc" : {
        "tags" : ["tag1"],
        "views" : 0
    }
}


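The doc parameter carries the partial document as key : value pairs. Each pair is put into the existing document, with the key as the field name and the value as the field value.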
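
As a rough picture of what the doc parameter does, the sketch below merges a partial document into an existing one in plain Python. It is a simplification made for illustration: `apply_partial_update` is a hypothetical helper, and it only merges top-level fields.

```python
def apply_partial_update(doc, partial):
    """Return a new document with the partial fields merged in.

    Top-level keys of `partial` are added to the source, replacing
    fields of the same name; the version goes up by one.
    """
    merged = dict(doc["_source"])   # copy, so the old version is untouched
    merged.update(partial)
    return {"_version": doc["_version"] + 1, "_source": merged}


old = {"_version": 1, "_source": {"title": "My first blog entry"}}
new = apply_partial_update(old, {"tags": ["tag1"], "views": 0})
print(new["_source"])
```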
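Partial updates with scripts
Besides passing a doc, the update API can also take a script, written here in the Groovy scripting language, to modify the document.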

POST /website/blog/1/_update
{
    "script" : "ctx._source.views+=1"
}

Syntax:
{
    "script" : "ctx._source.<field> <operation>"
}

Passing values in as parameters makes the script more reusable:

POST /website/blog/1/_update
{
    "script" : "ctx._source.tags+=new_tag",
    "params" : {
        "new_tag" : "search"
    }
}

Syntax:
{
    "params":{
        "param_key" : "param_value"
    }
}

After this request, "search" appears among the values of the tags field.
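
When such requests are generated by a program, keeping the script text fixed and varying only the parameters is what makes it reusable. The helper below is a small sketch of assembling the body; `script_update_body` is a hypothetical name, and the body layout simply follows the examples in this article, so other Elasticsearch versions may expect a different layout.

```python
import json


def script_update_body(script, params=None):
    """Build the JSON body for a scripted update as shown in this article."""
    body = {"script": script}
    if params:
        body["params"] = params   # values referenced by name inside the script
    return json.dumps(body, sort_keys=True)


# The same script text can be reused with different tags.
for tag in ("search", "nosql"):
    print(script_update_body("ctx._source.tags+=new_tag", {"new_tag": tag}))
```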

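setNX-style update (upsert)
Sometimes the document we want to update may not exist yet: if it does not exist, we want to create and initialize it; if it does, we want to operate on it.
The upsert parameter of the update API covers this case.

POST /website/blog/1/_update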
{
    "script" : "ctx._source.views+=1",
    "upsert" : {
        "views" : 1
    }
}

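If the document exists, the "script" is run against it.
If it does not exist, the "upsert" document is indexed as a new document.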
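
The two branches can be sketched in a few lines of Python. This models the semantics on an in-memory dictionary only; `update_with_upsert` and `increment_views` are hypothetical names, and the Python function plays the role of the script.

```python
def update_with_upsert(store, doc_id, script, upsert):
    """Run `script` on an existing document, or create it from `upsert`."""
    if doc_id not in store:
        # Document is missing: index the upsert document, script is skipped.
        store[doc_id] = {"_version": 1, "_source": dict(upsert)}
    else:
        doc = store[doc_id]
        script(doc["_source"])      # document exists: apply the change
        doc["_version"] += 1
    return store[doc_id]


def increment_views(source):
    source["views"] += 1


pages = {}
update_with_upsert(pages, "1", increment_views, {"views": 1})  # creates the doc
update_with_upsert(pages, "1", increment_views, {"views": 1})  # runs the script
print(pages["1"])
```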

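Retrying on conflict
When an update cannot be applied because of an error such as a version conflict (the version changed in the meantime), the retry_on_conflict parameter tells ES to retry it.

POST /website/blog/1/_update?retry_on_conflict=5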
{
    "script" : "ctx._source.views+=1",
    "upsert" : {
        "views":0 
    }
}

After a failure, the update is retried up to 5 times.
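
The retry behaviour amounts to a bounded loop around the update. The sketch below shows that loop in Python with a simulated conflict; `update_with_retry`, `VersionConflict`, and `flaky_update` are hypothetical, and the sketch assumes the parameter counts retries made after the first attempt.

```python
class VersionConflict(Exception):
    """Stands in for a version conflict error in this sketch."""


def update_with_retry(attempt_update, retry_on_conflict):
    """Call attempt_update, retrying on conflict a limited number of times."""
    retries = 0
    while True:
        try:
            return attempt_update()
        except VersionConflict:
            if retries >= retry_on_conflict:
                raise               # retries exhausted: report the conflict
            retries += 1


calls = {"n": 0}


def flaky_update():
    """Simulated update that conflicts twice before succeeding."""
    calls["n"] += 1
    if calls["n"] <= 2:
        raise VersionConflict("simulated conflict")
    return "updated"


result = update_with_retry(flaky_update, retry_on_conflict=5)
print(result, calls["n"])  # prints "updated 3"
```

This suits operations like incrementing a counter, where the order of the increments does not matter.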

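Retrieving multiple documents
Use the mget API.
Several separate requests are less efficient than one combined request, so when you need several documents, retrieve them in a single request.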

GET /_mget
{
    "docs":[
        {
            "_index" : "website",
            "_type" : "blog",
            "_id" : "2"
        },
        {
            "_index" : "website",
            "_type" : "pageviews",
            "_id" : "1",
            "_source" : "views"
        }
    ]
}
Each entry in the docs array describes one document to retrieve.

Response:

{
"docs" : [
        {
            "_index" : "website",
            "_id" : "2",
            "_type" : "blog",
            "found" : true,
            "_source" : {
            "text" : "This is a piece of cake...",
            "title" : "My first external blog entry"
            },
            "_version" : 10
        },
        {
            "_index" : "website",
            "_id" : "1",
            "_type" : "pageviews",
            "found" : true,
            "_version" : 2,
            "_source" : {
            "views" : 2
            }
        }
    ]
}

If the index and type are specified in the URL, the body only needs an ids array:

GET /website/blog/_mget
{
    "ids" : [
        "1","2"
    ]
}

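If a requested document does not exist, the response says so for that document, but the mget request itself still succeeds: the request was carried out, and whether each individual document was found is reported per document rather than through the status of the request as a whole.

Suppose document 1 does not exist: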
{
"docs" : [
    {
        "_index" : "website",
        "_type" : "blog",
        "_id" : "2",
        "_version" : 10,
        "found" : true,
        "_source" : {
        "title": "My first external blog entry",
        "text": "This is a piece of cake..."
        }
    },
    {
        "_index" : "website",
        "_type" : "blog",
        "_id" : "1",
        "found" : false
    }
    ]
}
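
Because the request succeeds even when documents are missing, the caller has to inspect the found flag of every entry. A minimal Python sketch of that check, using the response above as input; `split_mget_response` is a hypothetical helper that keys results by _id, which assumes all documents come from the same index and type.

```python
def split_mget_response(response):
    """Separate an mget response into found sources and missing ids."""
    found, missing = {}, []
    for doc in response["docs"]:
        if doc.get("found"):
            found[doc["_id"]] = doc["_source"]
        else:
            missing.append(doc["_id"])
    return found, missing


response = {
    "docs": [
        {"_index": "website", "_type": "blog", "_id": "2", "_version": 10,
         "found": True,
         "_source": {"title": "My first external blog entry",
                     "text": "This is a piece of cake..."}},
        {"_index": "website", "_type": "blog", "_id": "1", "found": False},
    ]
}
found, missing = split_mget_response(response)
print(sorted(found), missing)  # prints "['2'] ['1']"
```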

Bulk operations
Use the bulk API.

Syntax:
POST /_bulk
{"action" : {"metadata"}}
{"field" : "value" }
{"action" : {"metadata"}}
{"action" : {"metadata"}}

The
{"field" : "value"}
line, the request body, is not required for every action. The create, index, and update actions need one; delete does not.

For example:

POST /_bulk
{"create":{"_index":"website","_type":"blog","_id":"123"}}
{"title":"thats my first blog"}
{"index":{"_index":"website","_type":"blog","_id":"234"}}
{"title":"thats my second blog"}
{"delete":{"_index":"website","_type":"blog","_id":"123"}}
{"update":{"_index":"website","_type":"blog","_id":"234","_retry_on_conflict":3}}
{"doc":{"title":"i update my second blog"}}
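
The bulk body is newline-delimited JSON: one metadata line per action, followed by a body line where the action needs one, and a newline after the last line. A minimal Python sketch of building such a body follows; `bulk_body` and the tuple layout of an action are assumptions made for illustration.

```python
import json


def bulk_body(actions):
    """Serialize (action, metadata, source) tuples into a bulk request body.

    `source` is None for actions that take no body line, such as delete.
    """
    lines = []
    for action, metadata, source in actions:
        lines.append(json.dumps({action: metadata}))
        if source is not None:
            lines.append(json.dumps(source))
    return "\n".join(lines) + "\n"   # the last line must end with a newline


meta = {"_index": "website", "_type": "blog"}
body = bulk_body([
    ("create", dict(meta, _id="123"), {"title": "thats my first blog"}),
    ("index", dict(meta, _id="234"), {"title": "thats my second blog"}),
    ("delete", dict(meta, _id="123"), None),
    ("update", dict(meta, _id="234"), {"doc": {"title": "i update my second blog"}}),
])
print(body, end="")
```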

Response
The sample response below does not line up item for item with the request above (the actions and IDs differ), so match the entries up yourself and read it as an illustration of the response format.
{
    "took": 4,
    "errors": false,
    "items": [
        { "delete": {
            "_index": "website",
            "_type": "blog",
            "_id": "123",
            "_version": 2,
            "status": 200,
            "found": true
        }},
        { "create": {
            "_index": "website",
            "_type": "blog",
            "_id": "123",
            "_version": 3,
            "status": 201
        }},
        { "create": {
            "_index": "website",
            "_type": "blog",
            "_id": "EiwfApScQiiy7TIKFxRCTw",
            "_version": 1,
            "status": 201
        }},
        { "update": {
            "_index": "website",
            "_type": "blog",
            "_id": "123",
            "_version": 4,
            "status": 200
        }}
    ]
}

Note: a bulk request is not an atomic operation and cannot be used to implement transactions.
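
Since some actions of one bulk request can fail while others succeed, the response has to be checked item by item. The sketch below collects the failed items from a response in the format shown above; `failed_items` is a hypothetical helper, and the sample response with a 409 entry is invented for illustration.

```python
def failed_items(bulk_response):
    """Return (position, action, id, status) for every failed bulk item."""
    failures = []
    for position, item in enumerate(bulk_response["items"]):
        # Each item is a one-key dict: {"<action>": {...result...}}.
        action, result = next(iter(item.items()))
        if result["status"] >= 400:
            failures.append((position, action, result["_id"], result["status"]))
    return failures


# Hypothetical response: the create failed, the index succeeded.
sample = {
    "took": 3,
    "errors": True,
    "items": [
        {"create": {"_index": "website", "_type": "blog", "_id": "123",
                    "status": 409}},
        {"index": {"_index": "website", "_type": "blog", "_id": "234",
                   "_version": 5, "status": 200}},
    ],
}
print(failed_items(sample))  # prints "[(0, 'create', '123', 409)]"
```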
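Bulk indexing
Principle: don't repeat yourself. When the index and type are given in the URL, every action uses them by default, and an action's metadata only needs to spell out what differs, as the second index action below does by overriding _type while reusing the default index.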

POST /website/blog/_bulk
{"index":{}}
{"title":"thats a new blog"}
{"index":{"_type":"log"}}
{"event":"thats not a blog,its a log"}
