Elasticsearch: The Definitive Guide (Study Notes)

松间-明月

Category: ElasticSearch. Tags: elasticsearch, distributed systems, java

Article link: https://blog.csdn.net/weixin_42962086/article/details/107943240

Reference:

https://www.elastic.co/guide/en/elasticsearch/guide/master/index.html

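Notes

Learning goals

How does a cluster scale out and handle failover? (Life Inside a Cluster)
How does Elasticsearch store documents? (Distributed Document Store)
How does Elasticsearch execute a distributed search? (Distributed Search Execution)
What is a shard, and how does it work? (Inside a Shard)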

Getting Started

You Know, for Search…

Distributed Nature

Elasticsearch tries hard to hide the complexity of distributed systems. Here are some of the operations happening automatically under the hood:

Partitioning your documents into different containers or shards, which can be stored on a single node or on multiple nodes
Balancing these shards across the nodes in your cluster to spread the indexing and search load
Duplicating each shard to provide redundant copies of your data, to prevent data loss in case of hardware failure
Routing requests from any node in the cluster to the nodes that hold the data you’re interested in
Seamlessly integrating new nodes as your cluster grows or redistributing shards to recover from node loss

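Elasticsearch works hard to hide the complexity of distributed systems. Behind the scenes it does the following automatically:

Splits your documents across different shards, which may live on a single node or on several nodes.
Balances those shards across the nodes in the cluster to spread the indexing and search load.
Creates replicas of each shard so that data is not lost if hardware fails.
Routes a request received by any node to the nodes that hold the relevant data.
Integrates new nodes as the cluster grows, and redistributes shards to recover when a node is lost.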
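The balancing and replication steps above can be pictured with a toy model. This is only a sketch under stated assumptions: the function `allocate`, the node names, and the round-robin placement are all hypothetical and are not Elasticsearch's actual allocation logic. It only illustrates the idea that shards are spread over nodes and that a replica is kept on a different node from its primary.

```python
def allocate(nodes, number_of_primary_shards, number_of_replicas):
    """Toy placement: round-robin primaries, each replica on a following node.

    Hypothetical helper for illustration; not Elasticsearch's allocator.
    """
    if number_of_replicas >= len(nodes):
        raise ValueError("need more nodes than replicas to keep copies on distinct nodes")
    layout = {node: [] for node in nodes}
    for shard in range(number_of_primary_shards):
        primary_index = shard % len(nodes)
        layout[nodes[primary_index]].append(f"P{shard}")  # primary copy
        for r in range(1, number_of_replicas + 1):
            # Offset by r so no replica shares a node with its primary.
            replica_node = nodes[(primary_index + r) % len(nodes)]
            layout[replica_node].append(f"R{shard}")  # replica copy
    return layout


layout = allocate(["node-1", "node-2", "node-3"],
                  number_of_primary_shards=3,
                  number_of_replicas=1)
for node, shards in layout.items():
    print(node, shards)
```

In this toy layout, losing any single node still leaves one copy of every shard on the remaining nodes, which is the point of the replication step.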

As you read through this book, you’ll encounter supplemental chapters about the distributed nature of Elasticsearch. These chapters will teach you about how the cluster scales and deals with failover (Life Inside a Cluster), handles document storage (Distributed Document Store), executes distributed search (Distributed Search Execution), and what a shard is and how it works (Inside a Shard).

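Later chapters of the book cover more of the internals behind Elasticsearch's distributed nature: how a cluster scales out and handles failover, how documents are stored, how distributed search is executed, and what a shard is and how it works.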

Life Inside a Cluster

Distributed Document Store

In this chapter, we dive into those internal, technical details to help you understand how your data is stored in a distributed system.

This chapter looks under the hood of Elasticsearch to examine how data is stored in a distributed system.

Routing a Document to a Shard

When you index a document, it is stored on a single primary shard. How does Elasticsearch know which shard a document belongs to? When we create a new document, how does it know whether it should store that document on shard 1 or shard 2?

Elasticsearch stores each document on a single primary shard. The question this section answers is how Elasticsearch knows which shard a document belongs to, for example whether a newly created document should go to shard 1 or shard 2.

The process can’t be random, since we may need to retrieve the document in the future. In fact, it is determined by a simple formula:

shard = hash(routing) % number_of_primary_shards

The routing value is an arbitrary string, which defaults to the document’s _id but can also be set to a custom value. This routing string is passed through a hashing function to generate a number, which is divided by the number of primary shards in the index to return the remainder. The remainder will always be in the range 0 to number_of_primary_shards - 1, and gives us the number of the shard where a particular document lives.

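Because a document has to be retrievable after it is written, shard selection cannot be random. Elasticsearch computes the owning shard with the following formula:

shard = hash(routing) % number_of_primary_shards

The routing value can be any string. It defaults to the document's _id but can also be set by the user. The routing value is passed through a hash function to produce a number, and that number modulo the number of primary shards gives a remainder in the range 0 to number_of_primary_shards - 1. The remainder is the number of the shard that holds the document.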
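The formula can be tried out in a few lines. This is a minimal sketch, not Elasticsearch's implementation: the function name `shard_for` is hypothetical, and `zlib.crc32` is an assumed stand-in for whatever hash function Elasticsearch uses internally. A stable hash is chosen on purpose, because Python's built-in `hash()` for strings changes between runs and would break the "same routing, same shard" property.

```python
import zlib


def shard_for(routing: str, number_of_primary_shards: int) -> int:
    """Return the shard number for a routing value (illustrative only)."""
    # Assumption: crc32 stands in for Elasticsearch's internal hash function.
    hashed = zlib.crc32(routing.encode("utf-8"))
    # The remainder is always in the range 0 .. number_of_primary_shards - 1.
    return hashed % number_of_primary_shards


# By default the routing value is the document's _id.
for doc_id in ["1", "2", "user-42"]:
    print(doc_id, "->", shard_for(doc_id, 5))
```

Calling `shard_for` twice with the same routing value and shard count always gives the same shard, which is what lets a later get request find the document.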

This explains why the number of primary shards can be set only when an index is created and never changed: if the number of primary shards ever changed in the future, all previous routing values would be invalid and documents would never be found.

Users sometimes think that having a fixed number of primary shards makes it difficult to scale out an index later. In reality, there are techniques that make it easy to scale out as and when you need. We talk more about these in Designing for Scale.

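This routing scheme explains why the number of primary shards must be specified when the index is created and cannot be changed afterwards. If the number of primary shards changed, the formula would yield different shard values for existing documents, and those documents could no longer be found.

Users often assume that a fixed number of primary shards limits how far an index can scale. In fact, there are techniques that make it easy to scale out an index. They are covered later in the Designing for Scale chapter.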
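To see why the primary shard count is fixed, the sketch below recomputes the shard for the same ids under two different shard counts. As before, `shard_for` and `moved_ids` are hypothetical helpers and `zlib.crc32` is an assumed stand-in for the real hash function. The ids and the counts 5 and 6 are made up for illustration.

```python
import zlib


def shard_for(routing, number_of_primary_shards):
    # Assumption: crc32 stands in for Elasticsearch's internal hash function.
    return zlib.crc32(routing.encode("utf-8")) % number_of_primary_shards


def moved_ids(doc_ids, old_count, new_count):
    """Return the ids whose computed shard differs between the two shard counts."""
    return [d for d in doc_ids
            if shard_for(d, old_count) != shard_for(d, new_count)]


doc_ids = [f"doc-{i}" for i in range(1000)]
moved = moved_ids(doc_ids, 5, 6)
print(f"{len(moved)} of {len(doc_ids)} ids would be looked up on a different shard")
```

Every id in `moved` was written to one shard under the old count but would be searched for on another under the new count, so a lookup by id would miss it.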

All document APIs (get, index, delete, bulk, update, and mget) accept a routing parameter that can be used to customize the document-to-shard mapping. A custom routing value could be used to ensure that all related documents (for instance, all the documents belonging to the same user) are stored on the same shard. We discuss in detail why you may want to do this in Designing for Scale.

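All of Elasticsearch's document APIs (get, index, delete, bulk, update, and mget) accept a custom routing parameter that controls the document-to-shard mapping. With custom routing, related documents, such as all records belonging to the same user, can be placed on the same shard. The benefits are covered later in the Designing for Scale chapter.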
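The effect of a custom routing value can be simulated with the same formula. This is a sketch under assumptions: `shard_for` and `place` are hypothetical helpers, `zlib.crc32` stands in for the real hash function, and the documents and user names are made up. It does not call any Elasticsearch API.

```python
import zlib
from collections import defaultdict


def shard_for(routing, number_of_primary_shards):
    # Assumption: crc32 stands in for Elasticsearch's internal hash function.
    return zlib.crc32(routing.encode("utf-8")) % number_of_primary_shards


def place(docs, number_of_primary_shards, use_custom_routing):
    """Group document ids by shard, routing by user or by id."""
    shards = defaultdict(list)
    for doc in docs:
        # Custom routing uses the user name; the default uses the document id.
        routing = doc["user"] if use_custom_routing else doc["id"]
        shards[shard_for(routing, number_of_primary_shards)].append(doc["id"])
    return dict(shards)


docs = [{"id": f"order-{i}", "user": "alice" if i % 2 == 0 else "bob"}
        for i in range(10)]
by_user = place(docs, 5, use_custom_routing=True)
by_id = place(docs, 5, use_custom_routing=False)
print("routed by user:", by_user)
print("routed by id:  ", by_id)
```

With routing set to the user name, all of one user's documents land on a single shard, so the ten documents occupy at most two shards. Since the shard is recomputed from the routing value, the same value would also have to be supplied when reading such a document back.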