五、Elasticsearch 分布式文档系统

最新推荐文章于 2024-07-16 09:50:47 发布

after95

最新推荐文章于 2024-07-16 09:50:47 发布

阅读量134

点赞数

分类专栏： Elasticsearch 文章标签： elasticsearch 分布式

本文链接：https://blog.csdn.net/after95/article/details/105220514

版权

Elasticsearch 专栏收录该内容

7 篇文章 0 订阅

订阅专栏

核心元数据解析

数据示例

GET /test_index/test_type/1
{
    "_index": "test_index",
    "_type": "test_type",
    "_id": "1",
    "_version": 1,
    "_seq_no": 0,
    "_primary_term": 2,
    "found": true,
    "_source": {
        "test_content": "test test"
    }
}

`_index`

代表一个document存放在哪个index中
类似的数据存放在一个索引，非类似的数据存放于不同索引（如用户信息、商品信息建议不要放放在一个index中）
index中包含了很多类似的document
索引名称必须是小写的，不能用下划线开头，不能包含逗号

`_type`

代表document属于index中的那个类别（type）
一个索引通常会划分为多个type，逻辑上对index中有轻微不同的几类数据进行分类（如电子商品、生鲜商品建议不要放在一个type中）
type名称可以使大写或小写，但是不能以下划线开头，不能包含逗号

`_id`

代表document的唯一标识，与index和type一起，可以唯一标识和定位一个document
我们可以手动指定document的id，也可以不指定，由es自动为document创建一个id
- 手动生成ID：
  - a).根据应用情况来说，是否满足手动指定document id 的前提：一般来说数据是从其他系统中导入数据到es中，会采取指定id的方式，就是用系统中已有的数据的唯一标识作为es中document 的 id。如果数据被生成出来没有id，则不适合手动指定id的情况
  - b).
```
PUT /test_index/test_type/1
{
    "test_content": "test test"
}
```
- 自动生成ID：
  - a).长度为20个字符；以GUID方式生成，所以分布式系统并行生成时不可能会发生冲突；因为经过base64编码，所以URL安全，
  - b).
```
POST /test_index/test_type
{
    "test_content": "my content"
}

result:
{
    "_index": "test_index",
    "_type": "test_type",
    "_id": "lz0YG3EBBcJFoD7L9tB6",
    "_version": 1,
    "result": "created",
    "_shards": {
        "total": 2,
        "successful": 1,
        "failed": 0
    },
    "_seq_no": 1,
    "_primary_term": 2
}
```

`_source`

创建一个document的时候，使用的那个放在request body 中的json串，默认情况下会全部返回

写入一条数据

PUT /test_index/test_type/1
{
    "test_field1": "test_field1",
    "test_field2": "test_field2"
}

获取一条数据（默认所有的field全部返回）

GET /test_index/test_type/1
result:
{
    "_index": "test_index",
    "_type": "test_type",
    "_id": "1",
    "_version": 2,
    "_seq_no": 2,
    "_primary_term": 2,
    "found": true,
    "_source": {
        "test_field1": "test_field1",
        "test_field2": "test_field2"
    }
}

定制返回的结果（指定_source中返回哪些field，需要多个field用逗号隔开）

GET /test_index/test_type/1?_source=test_field1
result:
{
    "_index": "test_index",
    "_type": "test_type",
    "_id": "1",
    "_version": 2,
    "_seq_no": 2,
    "_primary_term": 2,
    "found": true,
    "_source": {
        "test_field1": "test_field1"
    }
}

document的操作

全量替换

语法与创建document一致，如果document id 不存在，那么就是创建；如果document id 存在，那么就是替换document的json串内容
document是不可变的，如果要修改document的内容，第一种方式就是全量替换，直接对document重新建立索引，替换里面所有的内容
es会将老的document被标记为deleted进行逻辑删除，同时创建一个新的document，并写入新的数据。当es中的数据越来越多的时候，es会自动在后台将标记为deleted的document进行物理删除，已释放空间

强制创建

创建document与全量替换的语法是一样的，有时候我们只是想新建document，而不想替换document
当document id 已经存在的时候会报错
语法
- PUT /index/type/id?op_type=create
- PUT /index/type/id/_create【推荐】

删除

不会进行物理删除，只会将其标记为deleted，当数据越来越多的时候，在后台会自动删除
语法：DELETE /index/type/id

并发冲突

悲观锁

各种情况下都上锁，上锁之后，只有一个线程可以操作这一条数据。不同的场景上不同的锁：行级锁、表级锁、读锁、写锁

乐观锁

每个线程都可以任意操作，但是写的时候会判断当前数据的_seq_no和_primary_term的跟es中的_seq_no和_primary_term是否一致

PUT /test_index/test_type/1?if_seq_no={_seq_no}&if_primary_term={_primary_term}
{
    "test_field": "test test"
}

es 5.x 只要_version不一致则不修改:PUT /test_index/test_type/1?version={_version}

第一次创建document的时候，它的version内部版本号为1
此后每次对这个document进行修改或者删除操作的时候，version版本号都会自动加1
即使是删除也会对document的版本号加1
基于自定义的version控制，语法：PUT /test_index/test_type/1?version={_version}&version_type=external，使用这种语法时传入的_version需要比 es 内部的 _version 大

总结

a). 悲观锁
- 优点：直接加锁方便，对应用程序来说更透明，不需要做额外的操作
- 缺点：并发能力很低，同一时间只能有一个线程操作数据
b).乐观锁
- 优点：并发能力很高，不给数据加锁，适合大量线程并发操作
- 缺点：实现复杂，每次更新的时候，都需要先比对版本号；其次可能需要重复加载多次数据

es的后台，很多的这种类似于replica同步请求都是多线程异步的，也就是说，多个修改请求之间是乱序的，可能后修改的先同步，先修改的后同步
es内部的多线程异步并发修改时，是基于自身的version版本号进行乐观锁并发控制，如果version不想等，则直接放弃当前修改

after95

关注

0
点赞
踩
0

收藏

觉得还不错? 一键收藏
0
评论
五、Elasticsearch 分布式文档系统

核心元数据解析数据示例GET /test_index/test_type/1{ "_index": "test_index", "_type": "test_type", "_id": "1", "_version": 1, "_seq_no": 0, "_primary_term": 2, "found": true, "...
复制链接

扫一扫