ES Study and Usage Notes: Near Real-Time Search and the Data Reliability Guarantee Mechanism


zhixingheyi_tian


Categories: Big Data, elasticsearch. Tags: real-time search

Article link: https://blog.csdn.net/zhixingheyi_tian/article/details/77865492


Near Real-Time Search
Lucene, the library that Elasticsearch is built on, introduced the concept of per-segment search. A segment is a fully functional inverted index. Before new documents are written to an on-disk segment, they are first written to an in-memory indexing buffer.

The English is fairly straightforward, so I quote it here rather than paraphrase it.

Sitting between Elasticsearch and the disk is the filesystem cache. Documents in the in-memory indexing buffer are written to a new segment. But the new segment is written to the filesystem cache first, which is cheap, and only later is it flushed to disk, which is expensive. But once a file is in the cache, it can be opened and read, just like any other file.
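The difference between "written to the filesystem cache" and "flushed to disk" can be illustrated with plain file I/O. The following is a minimal sketch, not Elasticsearch or Lucene code: the helper names `write_segment` and `read_segment` and the file name `segment_0` are made up for illustration. It only shows that a file is readable as soon as it has been written, while durability needs an explicit `os.fsync`.

```python
import os
import tempfile


def write_segment(path, payload, durable=False):
    """Write bytes to a file; optionally force them to the storage device."""
    with open(path, "wb") as f:
        f.write(payload)
        f.flush()  # hand the data to the OS, where it lands in the filesystem cache
        if durable:
            os.fsync(f.fileno())  # the expensive step: ask the OS to persist it


def read_segment(path):
    """Read the file back; works whether or not it has been fsynced."""
    with open(path, "rb") as f:
        return f.read()


with tempfile.TemporaryDirectory() as tmp:
    seg = os.path.join(tmp, "segment_0")  # hypothetical segment file name

    # Cheap write: no fsync, yet the file can already be opened and read.
    write_segment(seg, b"doc-1\n")
    cached_read = read_segment(seg)

    # Expensive write: same content, but forced to disk before returning.
    write_segment(seg, b"doc-1\n", durable=True)
    durable_read = read_segment(seg)
```

Both reads return the same bytes; the only difference is whether the data would survive a power failure right after the write.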

refresh

In Elasticsearch, this lightweight process of writing and opening a new segment is called a refresh. By default, every shard is refreshed automatically once every second. This is why we say that Elasticsearch has near real-time search: document changes are not visible to search immediately, but will become visible within 1 second.
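To make the visibility rule concrete, here is a toy model of a shard, written under heavy simplifying assumptions: `ToyShard` and its methods are hypothetical names, segments are plain tuples instead of inverted indexes, and refresh is called by hand instead of once per second.

```python
class ToyShard:
    """Toy model of one shard: an in-memory buffer plus searchable segments."""

    def __init__(self):
        self.buffer = []    # newly indexed docs, not yet visible to search
        self.segments = []  # each segment is an immutable tuple of docs

    def index(self, doc):
        # New documents only go to the in-memory indexing buffer.
        self.buffer.append(doc)

    def refresh(self):
        # Write the buffer out as a new segment and open it for search.
        if self.buffer:
            self.segments.append(tuple(self.buffer))
            self.buffer = []

    def search(self, term):
        # Per-segment search: query every segment and merge the hits.
        return [doc for seg in self.segments for doc in seg if term in doc]


shard = ToyShard()
shard.index("near real-time search")
hits_before = shard.search("search")  # the doc is still in the buffer
shard.refresh()
hits_after = shard.search("search")   # the doc is now in a segment
```

In a real cluster the interval is controlled by the index setting `index.refresh_interval`, and a refresh can be requested explicitly with `POST /<index>/_refresh`.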

commit

A refresh alone is not enough; the data also has to be persisted to disk.
The action of performing a commit and truncating the translog is known in Elasticsearch as a flush. Shards are flushed automatically every 30 minutes, or when the translog becomes too big.

To guarantee data reliability, Elasticsearch introduces a transaction log, the translog, which records data changes between two commit points.
New documents are added to the in-memory buffer and appended to the transaction log.
Every so often (such as when the translog is getting too big) the index is flushed: a new translog is created and a full commit is performed. The filesystem cache is flushed with an fsync, and the old translog is deleted.
The translog itself is also durable.
By default, the translog is fsync'ed every 5 seconds and after a write request completes (e.g. index, delete, update, bulk). This process occurs on both the primary and replica shards. Ultimately, that means your client won't receive a 200 OK response until the entire request has been fsync'ed in the translog of the primary and all replicas.
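The interplay of buffer, translog, flush, and crash recovery can be sketched with another toy model. Everything here is an assumption made for illustration: `ToyDurableShard` is a hypothetical class, the translog is treated as already fsynced on every write, and a "crash" simply discards whatever was not committed.

```python
class ToyDurableShard:
    """Toy model: segments in cache, committed segments on disk, and a translog."""

    def __init__(self):
        self.buffer = []     # in-memory indexing buffer (lost on crash)
        self.segments = []   # searchable segments, possibly only in cache
        self.committed = []  # segments made durable by the last commit
        self.translog = []   # operations since the last commit (assumed fsynced)

    def index(self, doc):
        # Each write goes to the buffer and is appended to the translog.
        self.buffer.append(doc)
        self.translog.append(doc)

    def refresh(self):
        if self.buffer:
            self.segments.append(tuple(self.buffer))
            self.buffer = []

    def flush(self):
        # Full commit: persist all segments, then start a fresh translog.
        self.refresh()
        self.committed = list(self.segments)
        self.translog = []

    def crash_and_recover(self):
        # Memory and uncommitted segments are gone; disk state survives.
        self.buffer = []
        self.segments = list(self.committed)
        # Replay the translog to restore changes made since the last commit.
        self.buffer.extend(self.translog)
        self.refresh()

    def search(self, term):
        return [doc for seg in self.segments for doc in seg if term in doc]


shard = ToyDurableShard()
shard.index("a1")
shard.flush()              # a1 is committed, translog is emptied
shard.index("a2")
shard.refresh()            # a2 is searchable but not committed
shard.index("a3")          # a3 is only in the buffer and the translog
shard.crash_and_recover()
recovered = sorted(shard.search("a"))
```

In a real index, how often the translog is fsynced is governed by settings such as `index.translog.durability` and `index.translog.sync_interval`; the toy model skips that detail and treats every append as durable.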

For details, see the chapter https://www.elastic.co/guide/en/elasticsearch/guide/current/inside-a-shard.html
