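Data write path
Cluster
- The client picks a node and sends the request to it; that node acts as the coordinating node.
- The coordinating node routes the document and forwards the request to the node that holds the corresponding primary shard.
Routing algorithm: shard_index = hash(id) % number_of_primary_shards
- The primary shard on that node processes the request, then replicates the data to the replica nodes.
- Once the coordinating node sees that the primary node and all replica nodes have finished, it returns the response to the client.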
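The routing step can be sketched in a few lines of Python. This is only an illustration of the formula above: the function name route_to_shard is hypothetical, and CRC32 is swapped in as a stable stand-in hash (Elasticsearch itself uses a Murmur3-based hash on the routing value, which defaults to the document id).

```python
import zlib


def route_to_shard(doc_id: str, number_of_primary_shards: int) -> int:
    """Return the index of the primary shard that should own doc_id."""
    if number_of_primary_shards <= 0:
        raise ValueError("number_of_primary_shards must be positive")
    # Stand-in hash (assumption): CRC32 is deterministic across runs,
    # unlike Python's built-in hash() for strings.
    hashed = zlib.crc32(doc_id.encode("utf-8"))
    return hashed % number_of_primary_shards


if __name__ == "__main__":
    for doc_id in ("user-1", "user-2", "user-3"):
        print(doc_id, "-> shard", route_to_shard(doc_id, 5))
```

Because the shard index depends on number_of_primary_shards, the same id can map to a different shard if that number changes, which is why the primary shard count is fixed when the index is created.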
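Node
- The data is first written to an in-memory buffer, where it is not yet searchable; at the same time it is appended to the translog file.
- When the buffer is nearly full, or after a set interval (1 second by default), the buffer contents are refreshed into a new segment file.
ES is near real-time (NRT): once the data has reached the OS cache, the segment file can serve searches.
When the translog reaches a certain size, a commit is triggered (by default a commit also runs automatically every 30 minutes).
Commit operation: 1. write a commit point; 2. fsync the data in the OS cache to disk; 3. clear the translog file.
The whole commit process is called a flush. A flush can also be triggered manually.
Segment files keep accumulating, so merges are run periodically.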
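The node-level steps above can be modelled with a small in-memory toy. This is a sketch under simplifying assumptions, not how Lucene or Elasticsearch is implemented: ToyShard and all of its fields are hypothetical names, segments are plain Python lists, and "fsync" is represented only by a counter of durable documents.

```python
from dataclasses import dataclass, field


@dataclass
class ToyShard:
    buffer: list = field(default_factory=list)    # in-memory buffer, not searchable
    translog: list = field(default_factory=list)  # operation log kept until the next flush
    segments: list = field(default_factory=list)  # searchable segments (lists of docs)
    commit_point: int = 0                         # number of docs covered by the last commit

    def index(self, doc: dict) -> None:
        # A write goes to the buffer and the translog at the same time.
        self.buffer.append(doc)
        self.translog.append(doc)

    def refresh(self) -> None:
        # Turn the buffer into a new segment; only now do the docs become searchable.
        if self.buffer:
            self.segments.append(list(self.buffer))
            self.buffer.clear()

    def flush(self) -> None:
        # Commit: record a commit point, treat all segments as durable, clear the translog.
        self.refresh()
        self.commit_point = sum(len(seg) for seg in self.segments)
        self.translog.clear()

    def merge(self) -> None:
        # Combine all segments into one larger segment.
        merged = [doc for seg in self.segments for doc in seg]
        self.segments = [merged] if merged else []

    def search(self, doc_id: str):
        for seg in self.segments:
            for doc in seg:
                if doc["id"] == doc_id:
                    return doc
        return None


if __name__ == "__main__":
    shard = ToyShard()
    shard.index({"id": "1", "title": "hello"})
    print(shard.search("1"))   # not searchable before refresh
    shard.refresh()
    print(shard.search("1"))   # searchable after refresh
```

The toy makes the ordering visible: a document is recoverable from the translog as soon as it is indexed, searchable only after a refresh, and covered by a commit point only after a flush.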
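Referenced algorithms
PacificA algorithm
PacificA is a distributed consistency algorithm for log replication systems, proposed by Microsoft Research Asia; the paper was published in 2008 (PacificA paper). The official ES documentation states explicitly that its replication model is based on this algorithm: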
https://github.com/elastic/elasticsearch/blob/master/docs/reference/docs/data-replication.asciidoc
Elasticsearch’s data replication model is based on the primary-backup model and is described very well in the PacificA paper of Microsoft Research. That model is based on having a single copy from the replication group that acts as the primary shard. The other copies are called replica shards. The primary serves as the main entry point for all indexing operations. It is in charge of validating them and making sure they are correct. Once an index operation has been accepted by the primary, the primary is also responsible for replicating the operation to the other copies.
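Algorithm characteristics:
- Strong consistency.
- Data is synchronized from a single primary to multiple secondaries.
- A separate consistency component maintains the configuration.
- Writes can still proceed when only a minority of replicas are available.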
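A very rough sketch of how these characteristics fit together, assuming a heavily simplified reading of the primary-backup model: the primary replicates each operation to every secondary in the current configuration, and when a secondary is unreachable it asks a separate configuration component to remove that secondary before acknowledging the write. ConfigManager, Replica and primary_write are hypothetical names for illustration; they are not taken from the PacificA paper or from the Elasticsearch code base.

```python
class ConfigManager:
    """Toy stand-in for the external component that maintains the configuration."""

    def __init__(self, primary: str, secondaries):
        self.version = 1
        self.primary = primary
        self.secondaries = set(secondaries)

    def remove_secondary(self, name: str, expected_version: int) -> bool:
        # Accept the change only if the proposer saw the latest configuration.
        if expected_version != self.version:
            return False
        self.secondaries.discard(name)
        self.version += 1
        return True


class Replica:
    def __init__(self, name: str):
        self.name = name
        self.alive = True
        self.log = []

    def append(self, op: str) -> None:
        if not self.alive:
            raise ConnectionError(f"{self.name} is unreachable")
        self.log.append(op)


def primary_write(op: str, config: ConfigManager, replicas: dict) -> bool:
    """Apply op on the primary, then on every secondary in the current configuration."""
    replicas[config.primary].append(op)
    for name in sorted(config.secondaries):  # sorted() copies, so removal below is safe
        try:
            replicas[name].append(op)
        except ConnectionError:
            # Drop the failed secondary from the configuration before acknowledging.
            if not config.remove_secondary(name, config.version):
                raise RuntimeError("configuration changed concurrently")
    return True


if __name__ == "__main__":
    replicas = {n: Replica(n) for n in ("p", "s1", "s2")}
    config = ConfigManager("p", ["s1", "s2"])
    primary_write("op1", config, replicas)
    replicas["s1"].alive = False
    primary_write("op2", config, replicas)
    print(config.version, sorted(config.secondaries))
```

In this toy the write is acknowledged only after every member of the current configuration has the operation, which is what keeps the copies consistent, while shrinking the configuration is what lets writes continue when few replicas remain.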