Elasticsearch Study Notes 4

Latest recommended article published on 2024-09-20 21:42:40

Sir.悟空

Category: Search Engines. Tags: elasticsearch, java, big data

Article link: https://blog.csdn.net/qq_41451415/article/details/122331004

Data Writes

Cluster

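The client picks a node and sends the data to it; this node acts as the coordinating node.
The coordinating node routes the document and forwards the request to the node that holds the corresponding primary shard.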

Routing algorithm: shard_index = hash(id) % number_of_primary_shards, where id is the routing value (the document id by default).
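The routing formula can be tried out with a minimal sketch. The function name `shard_index` and the use of `zlib.crc32` are assumptions made for illustration; crc32 is only a deterministic stand-in and is not the hash function Elasticsearch uses internally.

```python
import zlib


def shard_index(routing_value: str, number_of_primary_shards: int) -> int:
    """Pick a primary shard by hashing the routing value (the document id by default)."""
    # zlib.crc32 is a stand-in hash chosen for illustration only. It is
    # deterministic across runs, unlike Python's built-in hash() for strings.
    digest = zlib.crc32(routing_value.encode("utf-8"))
    return digest % number_of_primary_shards


if __name__ == "__main__":
    # Hypothetical document ids, routed across 5 primary shards.
    for doc_id in ["1", "2", "order-42"]:
        print(doc_id, "->", shard_index(doc_id, 5))
```

Because the result depends on number_of_primary_shards, the same id would generally land on a different shard if that number changed. This is why the number of primary shards is fixed when the index is created.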

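The primary shard on that node handles the request and then syncs the data to the replica nodes.
Once the coordinating node sees that the primary node and all replica nodes have finished, it returns the response to the client.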
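The steps above can be sketched as a small in-memory simulation. Every name here (`ShardCopy`, `ReplicationGroup`, `Cluster`, `coordinate_index`) is hypothetical, the hash is a crc32 stand-in, and real nodes, networking, and failure handling are left out entirely.

```python
import zlib
from dataclasses import dataclass, field


@dataclass
class ShardCopy:
    """One copy of a shard (primary or replica), holding documents by id."""
    docs: dict = field(default_factory=dict)

    def index(self, doc_id: str, source: dict) -> bool:
        self.docs[doc_id] = source
        return True  # acknowledge the write


@dataclass
class ReplicationGroup:
    """A primary shard together with its replica shards."""
    primary: ShardCopy
    replicas: list


class Cluster:
    def __init__(self, number_of_primary_shards: int, number_of_replicas: int):
        self.groups = [
            ReplicationGroup(
                ShardCopy(),
                [ShardCopy() for _ in range(number_of_replicas)],
            )
            for _ in range(number_of_primary_shards)
        ]

    def route(self, doc_id: str) -> int:
        # Stand-in hash for illustration, not the one Elasticsearch uses.
        return zlib.crc32(doc_id.encode("utf-8")) % len(self.groups)

    def coordinate_index(self, doc_id: str, source: dict) -> dict:
        """Play the coordinating node: route, write to primary, then replicas."""
        shard = self.route(doc_id)
        group = self.groups[shard]
        acks = [group.primary.index(doc_id, source)]
        acks += [replica.index(doc_id, source) for replica in group.replicas]
        # Respond to the client only after every copy has acknowledged.
        return {"shard": shard, "acknowledged": all(acks), "copies": len(acks)}


if __name__ == "__main__":
    cluster = Cluster(number_of_primary_shards=3, number_of_replicas=2)
    print(cluster.coordinate_index("1", {"title": "hello"}))
```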

Figure 4-1
Node

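Data is first written to an in-memory buffer, where it is not yet searchable; at the same time, the operation is appended to the translog file.
When the buffer is nearly full, or after a set interval (1 second by default), the buffered data is refreshed into a new segment file.
This is why ES is near real-time (NRT).
As soon as the data reaches the OS cache, the segment file is opened and its data becomes searchable.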
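The gap between indexing and searchability can be sketched with a toy model. The class `ShardWriter` and its methods are hypothetical, and a plain list stands in for a segment in the OS cache; the point is only that a document becomes visible to search after a refresh, not before.

```python
class ShardWriter:
    """Toy model of a shard's write path: buffer, translog, refreshed segments."""

    def __init__(self):
        self.buffer = []    # in-memory buffer: not searchable
        self.translog = []  # append-only log of every operation
        self.segments = []  # refreshed segments: searchable

    def index(self, doc_id: str, text: str) -> None:
        # A write goes to the buffer and to the translog at the same time.
        self.buffer.append((doc_id, text))
        self.translog.append(("index", doc_id, text))

    def refresh(self) -> None:
        # Move the buffered documents into a new segment and empty the buffer.
        if self.buffer:
            self.segments.append(list(self.buffer))
            self.buffer.clear()

    def search(self, term: str) -> list:
        # Only refreshed segments are consulted; the buffer is invisible.
        return [
            doc_id
            for segment in self.segments
            for doc_id, text in segment
            if term in text
        ]


if __name__ == "__main__":
    writer = ShardWriter()
    writer.index("1", "hello world")
    print(writer.search("hello"))  # nothing yet: still in the buffer
    writer.refresh()
    print(writer.search("hello"))  # found after the refresh
```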

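When the translog grows to a certain size, a commit is triggered (by default a commit is also executed automatically every 30 minutes).
The commit does three things: 1. writes a commit point; 2. fsyncs the data in the OS cache to disk; 3. clears the translog file.
The whole commit process is called a flush. A flush can also be triggered manually.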
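The three commit steps can be sketched as follows. The class `Shard`, its fields, and the `max_translog_ops` threshold are all hypothetical; the sketch counts translog operations instead of measuring size, and it moves names between lists where a real system would call fsync.

```python
class Shard:
    """Toy model of flush: commit point, fsync, translog truncation."""

    def __init__(self):
        self.translog = []          # operations not yet covered by a commit
        self.cached_segments = []   # segments in the OS cache, not yet on disk
        self.durable_segments = []  # segments that have been fsynced
        self.commit_point = []      # segment names recorded by the last commit

    def write_segment(self, name: str, ops: list) -> None:
        # Record the operations in the translog and add a cached segment.
        self.translog.extend(ops)
        self.cached_segments.append(name)

    def flush(self) -> None:
        # 1. Write a commit point naming every current segment.
        self.commit_point = self.durable_segments + self.cached_segments
        # 2. "fsync": cached segments become durable.
        self.durable_segments.extend(self.cached_segments)
        self.cached_segments.clear()
        # 3. The translog is no longer needed for recovery, so clear it.
        self.translog.clear()

    def maybe_flush(self, max_translog_ops: int) -> bool:
        # Trigger a flush once the translog reaches the assumed threshold.
        if len(self.translog) >= max_translog_ops:
            self.flush()
            return True
        return False


if __name__ == "__main__":
    shard = Shard()
    shard.write_segment("seg_0", ["op1", "op2"])
    print(shard.maybe_flush(max_translog_ops=3))  # below the threshold
    shard.write_segment("seg_1", ["op3"])
    print(shard.maybe_flush(max_translog_ops=3))  # threshold reached
    print(shard.commit_point, shard.translog)
```

Calling `flush()` directly corresponds to the manual flush mentioned above.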

Segment files keep accumulating, so a merge is run periodically.
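A merge can be pictured as combining several small segments into one larger segment. In this minimal sketch a segment is just a dict from document id to content, and `merge_segments` is a hypothetical helper. Dropping deleted documents during the merge is an addition that the notes above do not mention, and it is included only as an assumption about what a merge cleans up.

```python
def merge_segments(segments: list, deleted_ids: frozenset = frozenset()) -> dict:
    """Combine several segments (oldest first) into one merged segment."""
    merged = {}
    for segment in segments:
        # A newer segment's version of a document replaces the older one.
        merged.update(segment)
    # Assumption: documents marked as deleted are dropped while merging.
    return {
        doc_id: doc
        for doc_id, doc in merged.items()
        if doc_id not in deleted_ids
    }


if __name__ == "__main__":
    segments = [{"1": "v1"}, {"2": "v1"}, {"1": "v2"}]
    print(merge_segments(segments, deleted_ids=frozenset({"2"})))
```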
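Referenced Algorithms
The PacificA Algorithm
PacificA is a distributed consistency algorithm for log replication systems, proposed by Microsoft Research Asia; the paper was published in 2008 (PacificA paper). The ES documentation states explicitly that its replication model is based on this algorithm: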
https://github.com/elastic/elasticsearch/blob/master/docs/reference/docs/data-replication.asciidoc

Elasticsearch’s data replication model is based on the primary-backup model and is described very well in the PacificA paper of Microsoft Research. That model is based on having a single copy from the replication group that acts as the primary shard. The other copies are called replica shards. The primary serves as the main entry point for all indexing operations. It is in charge of validating them and making sure they are correct. Once an index operation has been accepted by the primary, the primary is also responsible for replicating the operation to the other copies.

Characteristics of the algorithm: