Elasticsearch Study Notes

Latest recommended article published on 2024-05-10 06:57:18

silent1


1 Elasticsearch concepts compared with a relational database.

Relational DB ⇒ Databases ⇒ Tables ⇒ Rows ⇒ Columns

Elasticsearch ⇒ Indices ⇒ Types ⇒ Documents ⇒ Fields

2 In Elasticsearch, all data in every field is indexed by default. That is, every field has a dedicated inverted index for fast retrieval.

In other words, every field is indexed by default: each one gets its own inverted index, which is what makes queries fast.
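
To make "a dedicated inverted index per field" concrete, here is a minimal Python sketch. It is not how Lucene stores its indexes; the whitespace tokenizer, the function name `build_inverted_indexes`, and the sample documents are all assumptions made for illustration.

```python
from collections import defaultdict

def build_inverted_indexes(docs):
    """Build one inverted index per field: field -> term -> set of doc ids."""
    indexes = defaultdict(lambda: defaultdict(set))
    for doc_id, doc in docs.items():
        for field, value in doc.items():
            # Naive analysis (an assumption): lowercase and split on whitespace.
            for term in str(value).lower().split():
                indexes[field][term].add(doc_id)
    return indexes

# Hypothetical sample documents.
docs = {
    1: {"title": "Quick brown fox", "tag": "animal"},
    2: {"title": "Quick start guide", "tag": "docs"},
}
indexes = build_inverted_indexes(docs)
print(sorted(indexes["title"]["quick"]))  # [1, 2]
print(sorted(indexes["tag"]["docs"]))     # [2]
```

Because each field has its own term dictionary, a lookup on `title` never has to scan the values stored under `tag`.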

3 We can send our requests to any node in the cluster. Every node is fully capable of serving any request. Every node knows the location of every document in the cluster and so can forward requests directly to the required node.

A request can go to any node. Each node can answer any request, and because each node knows where every document lives, it can forward the request to the node that holds it.

4 Options for create, index, and delete requests

replication

The default value for replication is sync. This causes the primary shard to wait for successful responses from the replica shards before returning.

If you set replication to async, it will return success to the client as soon as the request has been executed on the primary shard. It will still forward the request to the replicas, but you will not know whether the replicas succeeded.

This option is mentioned specifically to advise against using it. The default sync replication allows Elasticsearch to exert back pressure on whatever system is feeding it with data. With async replication, it is possible to overload Elasticsearch by sending too many requests without waiting for their completion.

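To restate: the default value for replication is sync. With this setting, the primary shard waits for the replica shards to respond before it returns.

If replication is set to async, success is returned to the client as soon as the request has executed on the primary shard. The request is still forwarded to the replicas, but the client does not learn whether it succeeded there.

The option is mentioned only to advise against using it. The default synchronous replication lets Elasticsearch push back pressure onto the client program; with asynchronous replication, Elasticsearch can be overloaded.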

consistency

By default, the primary shard requires a quorum, or majority, of shard copies (where a shard copy can be a primary or a replica shard) to be available before even attempting a write operation. This is to prevent writing data to the “wrong side” of a network partition. A quorum is defined as follows:

int( (primary + number_of_replicas) / 2 ) + 1

The allowed values for consistency are one (just the primary shard), all (the primary and all replicas), or the default quorum, or majority, of shard copies.

Note that the number_of_replicas is the number of replicas specified in the index settings, not the number of replicas that are currently active. If you have specified that an index should have three replicas, a quorum would be as follows:

int( (primary + 3 replicas) / 2 ) + 1 = 3

But if you start only two nodes, there will be insufficient active shard copies to satisfy the quorum, and you will be unable to index or delete any documents.

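To restate: by default, before a write is even attempted, the primary shard requires a quorum of shard copies to be available. This prevents data from being written to the “wrong side” of a network partition.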
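
The quorum arithmetic quoted above can be checked with a few lines of Python. This is only a sketch of the formula as written in these notes; the function names `quorum` and `can_write` are made up for the example and are not part of any Elasticsearch API.

```python
def quorum(number_of_replicas, primary=1):
    """Shard copies that must be available before a write is attempted."""
    # Integer division mirrors int(...) in the formula for non-negative values.
    return (primary + number_of_replicas) // 2 + 1

def can_write(active_copies, number_of_replicas):
    """True if enough copies of the shard are active to satisfy the quorum."""
    return active_copies >= quorum(number_of_replicas)

print(quorum(3))        # 3
print(can_write(2, 3))  # False
```

The second call models the two-node case from the quote: with three replicas configured, at most two copies of a shard can be active on two nodes, which is below the quorum of three, so writes are refused.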

timeout

5 It is possible that, while a document is being indexed, the document will already be present on the primary shard but not yet copied to the replica shards. In this case, a replica might report that the document doesn’t exist, while the primary would have returned the document successfully.

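That is, while a document is being indexed it may already be present on the primary shard but not yet on the replicas. At that moment a replica may report that the document does not exist, while the primary would return it successfully.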

6 Document-Based Replication

When a primary shard forwards changes to its replica shards, it doesn’t forward the update request. Instead it forwards the new version of the full document. Remember that these changes are forwarded to the replica shards asynchronously, and there is no guarantee that they will arrive in the same order that they were sent. If Elasticsearch forwarded just the change, it is possible that changes would be applied in the wrong order, resulting in a corrupt document.

The primary shard does not send only the changed part to its replicas; it sends the whole updated document, together with its version number. Updates are forwarded asynchronously, which differs from indexing a new document. ...
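
The following Python sketch illustrates why forwarding the full document with a version number tolerates out-of-order delivery while forwarding bare changes does not. It is a simplified model, not Elasticsearch's replication code; `apply_full_document`, `apply_delta`, and the sample documents are assumptions for illustration.

```python
def apply_full_document(replica, version, document):
    """Keep the incoming document only if it is newer than what the replica holds."""
    if version > replica.get("version", 0):
        replica["version"] = version
        replica["doc"] = dict(document)

def apply_delta(replica_doc, delta):
    """Apply a partial change blindly, with no version check."""
    replica_doc.update(delta)

# Two successive versions of one document, as produced on the primary.
v1 = (1, {"title": "draft", "views": 1})
v2 = (2, {"title": "final", "views": 2})

in_order, out_of_order = {}, {}
for version, doc in (v1, v2):
    apply_full_document(in_order, version, doc)
for version, doc in (v2, v1):  # messages arrive in reverse order
    apply_full_document(out_of_order, version, doc)
print(in_order == out_of_order)  # True

# The same reversed arrival with bare deltas leaves the replica stale.
stale = {}
for delta in ({"title": "final"}, {"title": "draft"}):
    apply_delta(stale, delta)
print(stale["title"])  # draft
```

With full documents, the late-arriving older version is simply ignored, so both replicas converge on version 2.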

7 Multidocument patterns

The patterns for the mget and bulk APIs are similar to those for individual documents. The difference is that the requesting node knows in which shard each document lives. It breaks up the multidocument request into a multidocument request per shard, and forwards these in parallel to each participating node.

To restate: mget and bulk follow the same pattern as single-document requests. The difference is that the requesting node knows which shard each document belongs to, so it splits the multidocument request by shard and forwards each part to the node responsible for it.
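
As a rough illustration of "one multidocument request per shard", the sketch below groups document IDs by shard before they would be forwarded. The routing rule (hash of the ID modulo the number of primary shards) comes from the Elasticsearch documentation rather than from these notes, and `zlib.crc32` is only a deterministic stand-in for the real hash function. The shard count and the function names are assumptions.

```python
import zlib
from collections import defaultdict

NUMBER_OF_PRIMARY_SHARDS = 5  # assumed value for this sketch

def shard_for(doc_id, shards=NUMBER_OF_PRIMARY_SHARDS):
    """Pick a shard from the document ID (crc32 stands in for the real hash)."""
    return zlib.crc32(doc_id.encode("utf-8")) % shards

def split_by_shard(doc_ids):
    """Break one multidocument request into one list of IDs per shard."""
    per_shard = defaultdict(list)
    for doc_id in doc_ids:
        per_shard[shard_for(doc_id)].append(doc_id)
    return dict(per_shard)

requests = split_by_shard(["1", "2", "3", "4", "5", "6"])
for shard, ids in sorted(requests.items()):
    # Each of these sub-requests would be forwarded in parallel.
    print(shard, ids)
```

The coordinating node would then collect the per-shard responses and assemble them into a single reply for the client.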

8 Searching one index that has five primary shards is exactly equivalent to searching five indices that have one primary shard each.

Put differently, one index with five primary shards and five indices with one primary shard each behave the same way at search time.

9 Customizing Field Mappings

analyzer

For analyzed string fields, use the analyzer attribute to specify which analyzer to apply both at search time and at index time.

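For string fields that are analyzed, the analyzer attribute names the analyzer used at both index time and search time. In other words, once it is set in the mapping, it does not have to be specified again when indexing or searching.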

10 Updating a Mapping

You can specify the mapping for a type when you first create an index. Alternatively, you can add the mapping for a new type (or update the mapping for an existing type) later, using the /_mapping endpoint.

Although you can add to an existing mapping, you can’t change it. If a field already exists in the mapping, the data from that field probably has already been indexed. If you were to change the field mapping, the already indexed data would be wrong and would not be properly searchable.

We can update a mapping to add a new field, but we can’t change an existing field from analyzed to not_analyzed.

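To restate: when you first create an index you can specify the mapping for a type. Once the index exists, you can use the /_mapping endpoint to add the mapping for a new type, or to add to the mapping of an existing type.

You can add to an existing mapping, but you cannot change it. If a field is already in the mapping, data for that field has probably been indexed already; changing the field's mapping would make that indexed data wrong, and it could no longer be searched correctly.

So a mapping can be updated to add a new field, but an existing field cannot be switched from analyzed to not_analyzed.

“Overall this passage feels rather long-winded. Or is there some other meaning in it?”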
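
The "add, but don't change" rule can be modelled in a few lines of Python. This is a toy model of the rule as stated in the quote, not Elasticsearch's actual mapping merge logic; `MappingConflict`, `merge_mapping`, and the field names are hypothetical.

```python
class MappingConflict(Exception):
    """Raised when an update tries to change an existing field (hypothetical name)."""

def merge_mapping(existing, update):
    """Return a new mapping with added fields; refuse changes to existing ones."""
    merged = dict(existing)
    for field, definition in update.items():
        if field in merged and merged[field] != definition:
            raise MappingConflict(f"cannot change mapping of existing field {field!r}")
        merged[field] = definition
    return merged

mapping = {"title": {"type": "string", "index": "analyzed"}}

# Adding a new field is accepted.
mapping = merge_mapping(mapping, {"tag": {"type": "string", "index": "not_analyzed"}})
print(sorted(mapping))  # ['tag', 'title']

# Switching an existing field from analyzed to not_analyzed is rejected.
try:
    merge_mapping(mapping, {"title": {"type": "string", "index": "not_analyzed"}})
except MappingConflict as err:
    print(err)
```

The rejection reflects the reasoning in the quote: data already indexed under the old definition would no longer match how the field is searched.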

11 Distributed Search Execution

query phase

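A coordinating node will round-robin through all shard copies on subsequent requests in order to spread the load.

On subsequent requests the coordinating node rotates through all copies of a shard, so the load is shared among them.

This is why replicas can improve search performance.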
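
A minimal Python sketch of round-robin selection across shard copies follows. The copy names and the `pick_copy` helper are made up for the example; the real coordinating node tracks this per shard inside the cluster.

```python
from itertools import cycle

# One shard with a primary and two replicas; the copy names are assumptions.
shard_copies = cycle(["primary", "replica-1", "replica-2"])

def pick_copy():
    """Return the shard copy that should serve the next search request."""
    return next(shard_copies)

served_by = [pick_copy() for _ in range(6)]
print(served_by)
# ['primary', 'replica-1', 'replica-2', 'primary', 'replica-1', 'replica-2']
```

Each copy ends up serving a third of the requests, which is the load-spreading effect the note attributes to replicas.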

fetch phase

The coordinating node builds a multi-get request for each shard that holds a pertinent document and sends the request to the same shard copy that handled the query phase.

In other words, the multi-get request goes to the same shard copy that ran the query.

12 Index Management

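The chapter spends a long time on “dynamic mapping” without getting to what I wanted to know. Dynamic mapping, as described there, covers the case where a field that could not be anticipated suddenly shows up in the data, and how to configure things so that the dynamically generated mapping fits your requirements. What I actually wanted to know is how to add a new field to an existing index on purpose; that is, I already know which field I want to add, which is probably the more common case. As it turns out, the script below works both for creating a mapping and for adding to one. (http://stackoverflow.com/questions/25471715/create-or-update-mapping-in-elasticsearch)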

curl -XPUT 'http://localhost:9200/advert_index/advert_type/_mapping' -d '
{
    "advert_type" : {
        "properties" : {

          //your new mapping properties

        }
    }
}
'

index alias

This concept is very useful.

Switch transparently between one index and another on a running cluster

Group multiple indices (for example, last_three_months)

Create “views” on a subset of the documents in an index

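Switch between two indices transparently on a running cluster. (The application uses the alias, and the alias is repointed at the other index.)

Group several indices together. (Presumably this means one alias pointing at several indices.)

Create a “view” over a subset of an index's documents. (Does this mean a search selects the subset and the alias then points at that subset?)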
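
The three uses above map onto request bodies for the `/_aliases` endpoint. The Python sketch below only builds those bodies as plain dictionaries and sends nothing to a cluster. The alias and index names (`adverts`, `adverts_v1`, `adverts_v2`, `adverts_active`) and the `status` field are hypothetical. On the question about “views”: as far as I understand it, the subset is not materialized; the alias carries a filter that is applied whenever the alias is searched.

```python
import json

def swap_alias_actions(alias, old_index, new_index):
    """Body for POST /_aliases that repoints `alias` in a single request."""
    return {
        "actions": [
            {"remove": {"index": old_index, "alias": alias}},
            {"add": {"index": new_index, "alias": alias}},
        ]
    }

def filtered_alias_action(alias, index, term_filter):
    """An `add` action with a filter, so the alias exposes only matching documents."""
    return {"add": {"index": index, "alias": alias, "filter": {"term": term_filter}}}

# Transparent switch: the application keeps querying "adverts".
body = swap_alias_actions("adverts", "adverts_v1", "adverts_v2")
print(json.dumps(body, indent=2))

# A "view": only documents whose status is "active" are visible through the alias.
view = filtered_alias_action("adverts_active", "adverts_v2", {"status": "active"})
print(json.dumps({"actions": [view]}, indent=2))
```

Grouping several indices works the same way: several `add` actions that name different indices but the same alias.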
