Breaking changes in Elasticsearch 7.x

俺是刘铁柱

Article link: https://blog.csdn.net/weixin_43249121/article/details/108641958

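Elasticsearch 7.x introduces many breaking changes, covering Aggregations, Cluster, Discovery, Indices, API, Mapping, ML and other areas. For example, the deprecated execution hints for terms aggregations have been removed, search.max_buckets now has a default limit in the cluster settings, several defaults have changed, and there are incompatible updates across security, search, indexing and scripting. When upgrading to 7.x, remember to reindex indices created in older versions, update scripts and configuration, and handle the various deprecated options and features.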

Breaking changes in 7.0

This section discusses the changes that you need to be aware of when migrating your application to Elasticsearch 7.0.

Indices created before 7.0

Elasticsearch 7.0 can read indices created in version 6.0 or above. An Elasticsearch 7.0 node will not start in the presence of indices created in a version of Elasticsearch before 6.0.

IMPORTANT

Reindex indices from Elasticsearch 5.x or before: indices created in Elasticsearch 5.x or before must be reindexed with Elasticsearch 6.x in order to be readable by Elasticsearch 7.x.

1.Aggregations changes

Deprecated `global_ordinals_hash` and `global_ordinals_low_cardinality` execution hints for terms aggregations have been removed

These execution_hint values have been removed and should be replaced by global_ordinals.
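As a rough illustration of the migration, the sketch below walks an aggregation request body (a plain Python dict) and swaps either removed hint for global_ordinals. The function name `replace_removed_hints` and the sample field `genre` are hypothetical and not part of Elasticsearch.

```python
# Execution hints removed in 7.0, as listed in the text above.
REMOVED_HINTS = {"global_ordinals_hash", "global_ordinals_low_cardinality"}


def replace_removed_hints(aggs):
    """Return a copy of an aggregation tree with removed hints replaced."""
    if isinstance(aggs, dict):
        out = {}
        for key, value in aggs.items():
            if (key == "execution_hint" and isinstance(value, str)
                    and value in REMOVED_HINTS):
                out[key] = "global_ordinals"
            else:
                # Recurse so that nested sub-aggregations are covered too.
                out[key] = replace_removed_hints(value)
        return out
    if isinstance(aggs, list):
        return [replace_removed_hints(item) for item in aggs]
    return aggs


# Hypothetical 6.x-style request body.
old = {"genres": {"terms": {"field": "genre",
                            "execution_hint": "global_ordinals_hash"}}}
new = replace_removed_hints(old)
```

The input dict is left untouched, so the old and new bodies can be compared before a request is sent.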

`search.max_buckets` in the cluster setting

The dynamic cluster setting named search.max_buckets now defaults to 10,000 (instead of unlimited in the previous version). Requests that try to return more than the limit will fail with an exception.

`missing` option of the `composite` aggregation has been removed

The missing option of the composite aggregation, deprecated in 6.x, has been removed. missing_bucket should be used instead.

Replaced `params._agg` with `state` context variable in scripted metric aggregations

The object used to share aggregation state between the scripts in a Scripted Metric Aggregation is now a variable called state available in the script context, rather than being provided via the params object as params._agg.

Make metric aggregation script parameters `reduce_script` and `combine_script` mandatory

The scripted metric aggregation now requires these two script parameters, so that users explicitly define how their data is processed.
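The two scripted metric changes above can be sketched together as a small request-body builder. This is a minimal sketch: the helper `scripted_metric` and the field name `amount` are hypothetical, and the use of `states` inside the reduce script reflects my understanding of the 7.0 script context rather than anything stated in this article.

```python
def scripted_metric(init_script, map_script, combine_script, reduce_script):
    """Build a scripted_metric aggregation body that follows the 7.0 rules."""
    scripts = {
        "init_script": init_script,
        "map_script": map_script,
        "combine_script": combine_script,
        "reduce_script": reduce_script,
    }
    # combine_script and reduce_script are mandatory as of 7.0.
    for name in ("combine_script", "reduce_script"):
        if not scripts[name]:
            raise ValueError(f"{name} is mandatory")
    # params._agg is gone; shared state lives in the `state` variable.
    for name, script in scripts.items():
        if script and "params._agg" in script:
            raise ValueError(f"{name} still uses params._agg; use state instead")
    return {"scripted_metric": scripts}


agg = scripted_metric(
    init_script="state.amounts = []",
    map_script="state.amounts.add(doc['amount'].value)",
    combine_script="double total = 0; for (a in state.amounts) { total += a } return total",
    reduce_script="double total = 0; for (s in states) { total += s } return total",
)
```

Running old 6.x scripts through a check like this before the upgrade surfaces both problems early, instead of at query time.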

`percentiles` and `percentile_ranks` now return `null` instead of `NaN`

The percentiles and percentile_ranks aggregations used to return NaN in the response if they were applied to an empty set of values. Because NaN is not officially supported by JSON, it has been replaced with null.
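Client code that used to test for NaN now needs to handle null, which Python's json module decodes as None. The sketch below assumes the default keyed response layout (`aggregations.<name>.values`); the helper name and the aggregation name `load_time` are hypothetical.

```python
import json


def percentile_or_default(response_text, agg_name, percentile, default=None):
    """Read one percentile from a search response, mapping null to a default."""
    body = json.loads(response_text)
    value = body["aggregations"][agg_name]["values"].get(percentile)
    return default if value is None else value


# Hypothetical response for an aggregation over an empty set of values.
empty = '{"aggregations": {"load_time": {"values": {"50.0": null, "99.0": null}}}}'
median = percentile_or_default(empty, "load_time", "50.0", default=0.0)
```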

`stats` and `extended_stats` now return 0 instead of `null` for zero docs

When the stats and extended_stats aggregations collected zero docs (doc_count: 0), their value would be null. This was in contrast with the sum aggregation which would return 0. The stats and extended_stats aggs are now consistent with sum and also return zero.

2.Cluster changes

`:` is no longer allowed in cluster name

Due to cross-cluster search using : to separate a cluster and index name, cluster names may no longer contain :.

New default for `wait_for_active_shards` parameter of the open index command

The default value for the wait_for_active_shards parameter of the open index API is changed from 0 to 1, which means that the command will now by default wait for all primary shards of the opened index to be allocated.

Shard preferences `_primary`, `_primary_first`, `_replica`, and `_replica_first` are removed

These shard preferences are removed in favour of the _prefer_nodes and _only_nodes preferences.

Cluster-wide shard soft limit

Clusters now have soft limits on the total number of open shards in the cluster based on the number of nodes and the cluster.max_shards_per_node cluster setting, to prevent accidental operations that would destabilize the cluster. More information can be found in the documentation for that setting.
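The arithmetic behind the soft limit can be sketched as follows. Two points here are assumptions not stated in this article: that the limit is counted per data node, and that cluster.max_shards_per_node defaults to 1000. The function names are hypothetical, so check the setting's documentation before relying on the numbers.

```python
def shard_limit(data_nodes, max_shards_per_node=1000):
    """Cluster-wide cap on open shards (assumed: per data node, default 1000)."""
    return data_nodes * max_shards_per_node


def creation_allowed(open_shards, primaries, replicas, data_nodes,
                     max_shards_per_node=1000):
    """Check whether creating an index would stay within the soft limit."""
    # Every replica copy counts as a shard, not only the primaries.
    new_shards = primaries * (1 + replicas)
    return open_shards + new_shards <= shard_limit(data_nodes, max_shards_per_node)


limit = shard_limit(3)
fits = creation_allowed(open_shards=2995, primaries=1, replicas=1, data_nodes=3)
too_many = creation_allowed(open_shards=2995, primaries=5, replicas=1, data_nodes=3)
```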

3.Discovery changes

Cluster bootstrapping is required if discovery is configured

The first time a cluster is started, cluster.initial_master_nodes must be set to perform cluster bootstrapping. It should contain the names of the master-eligible nodes in the initial cluster and be defined on every master-eligible node in the cluster. See the discovery settings summary for an example; the cluster bootstrapping reference documentation describes this setting in more detail.

The discovery.zen.minimum_master_nodes setting is permitted, but ignored, on 7.x nodes.

Removing master-eligible nodes sometimes requires voting exclusions

If you wish to remove half or more of the master-eligible nodes from a cluster, you must first exclude the affected nodes from the voting configuration using the voting config exclusions API. If you remove fewer than half of the master-eligible nodes at the same time, voting exclusions are not required. If you remove only master-ineligible nodes such as data-only nodes or coordinating-only nodes, voting exclusions are not required. Likewise, if you add nodes to the cluster, voting exclusions are not required.
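The rule above reduces to a single comparison. A minimal sketch, with a hypothetical function name, that only counts master-eligible nodes (removing data-only or coordinating-only nodes never requires exclusions):

```python
def needs_voting_exclusions(master_eligible_total, master_eligible_removed):
    """True when half or more of the master-eligible nodes leave at once."""
    if master_eligible_removed <= 0:
        return False
    return 2 * master_eligible_removed >= master_eligible_total


one_of_three = needs_voting_exclusions(3, 1)   # fewer than half
two_of_three = needs_voting_exclusions(3, 2)   # more than half
two_of_four = needs_voting_exclusions(4, 2)    # exactly half
```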

Discovery configuration is required in production

Production deployments of Elasticsearch now require at least one of the following settings to be specified in the elasticsearch.yml configuration file:

discovery.seed_hosts
discovery.seed_providers
cluster.initial_master_nodes
discovery.zen.ping.unicast.hosts
discovery.zen.hosts_provider

The first three settings in this list are only available in versions 7.0 and above. If you are preparing to upgrade from an earlier version, you must set discovery.zen.ping.unicast.hosts or discovery.zen.hosts_provider.
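A pre-flight check for this requirement might look like the sketch below. It assumes elasticsearch.yml has already been parsed into a flat dict keyed by setting name. The parsing step is omitted, the function name is hypothetical, and an empty value is treated as not set.

```python
# The five settings named in the list above.
DISCOVERY_SETTINGS = (
    "discovery.seed_hosts",
    "discovery.seed_providers",
    "cluster.initial_master_nodes",
    "discovery.zen.ping.unicast.hosts",
    "discovery.zen.hosts_provider",
)


def has_discovery_config(settings):
    """Return True if at least one discovery setting has a non-empty value."""
    return any(settings.get(key) for key in DISCOVERY_SETTINGS)


# Hypothetical parsed configurations.
dev_only = {"cluster.name": "my-cluster"}
production = {"cluster.name": "my-cluster",
              "discovery.seed_hosts": ["10.0.0.1", "10.0.0.2"]}
```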

New name for `no_master_block` setting

The discovery.zen.no_master_block setting is now known as cluster.no_master_block. Any value set for discovery.zen.no_master_block is now ignored. You should remove this setting and, if needed, set cluster.no_master_block appropriately after the upgrade.

Reduced default timeouts for fault detection

By default the cluster fault detection subsystem now considers a node to be faulty if it fails to respond to 3 consecutive pings, each of which times out after 10 seconds. Thus a node that is unresponsive for longer than 30 seconds is liable to be removed from the cluster. Previously the default timeout for each ping was 30 seconds, so that an unresponsive node might be kept in the cluster for over 90 seconds.
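The figures in this paragraph follow from multiplying the retry count by the per-ping timeout. A tiny sketch with a hypothetical function name:

```python
def seconds_until_faulty(consecutive_failures, ping_timeout_seconds):
    """Shortest unresponsive period after which a node is considered faulty."""
    return consecutive_failures * ping_timeout_seconds


new_default = seconds_until_faulty(3, 10)   # 7.0 defaults from the text
old_default = seconds_until_faulty(3, 30)   # pre-7.0 per-ping timeout
```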

Master-ineligible nodes are ignored by discovery

In earlier versions it was possible to use master-ineligible nodes during the discovery process, either as seed nodes or to transfer discovery gossip indirectly between the master-eligible nodes. Clusters that relied on master-ineligible nodes like this were fragile and unable to automatically recover from some kinds of failure. Discovery now involves only the master-eligible nodes in the cluster so that it is not possible to rely on master-ineligible nodes like this. You should configure discovery.seed_hosts or another seed hosts provider to provide the addresses of all the master-eligible nodes in your cluster.

4.Indices changes

Index creation no longer defaults to five shards

Previous versions of Elasticsearch defaulted to creating five shards per index. Starting with 7.0.0, the default is now one shard per index.

`:` is no longer allowed in index name

Due to cross-cluster search using : to separate a cluster and index name, index names may no longer contain :.
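To see why the colon is reserved, consider how a cross-cluster search expression is taken apart. The sketch below is illustrative only: both function names are hypothetical, and the validity check covers just the colon rule, not the other naming restrictions Elasticsearch applies.

```python
def split_index_expression(expression):
    """Split 'cluster:index' into (cluster, index); cluster is None if local."""
    cluster, separator, index = expression.partition(":")
    if not separator:
        return None, expression
    return cluster, index


def has_no_colon(name):
    """Check only the 7.0 rule that cluster and index names cannot contain ':'."""
    return ":" not in name


remote = split_index_expression("europe:logs-2019")
local = split_index_expression("logs-2019")
```

If names could contain a colon, an expression such as `a:b:c` would be ambiguous between a remote index and a local one.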

`index.unassigned.node_left.delayed_timeout` may no longer be negative

Negative values were interpreted as zero in earlier versions but are no longer accepted.

`_flush` and `_forcemerge` will no longer refresh

In previous versions issuing a _flush or _forcemerge (with flush=true) had the undocumented side-effect of refreshing the index, which made new documents visible to searches and non-realtime GET operations. From now on these operations no longer have this side-effect. To make documents visible, an explicit _refresh call is needed unless the index is refreshed by the internal scheduler.

Limit to the difference between max_gram and min_gram in NGramTokenFilter and NGramTokenizer

To safeguard against creating too many index terms, the difference between max_gram and min_gram in NGramTokenFilter and NGramTokenizer has been limited to 1. This default limit can be changed with the index setting index.max_ngram_diff. Note that if the limit is exceeded, an error is thrown only for new indices. For existing pre-7.0 indices, a deprecation warning is logged.
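A settings builder can compute the difference and add index.max_ngram_diff only when the default limit of 1 would be exceeded. This is a sketch under assumptions: the function name and the tokenizer name `my_ngram` are hypothetical, and the body layout follows the usual create-index settings structure.

```python
def ngram_index_settings(min_gram, max_gram, default_limit=1):
    """Build index settings for an ngram tokenizer, raising the diff limit if needed."""
    if min_gram > max_gram:
        raise ValueError("min_gram must not exceed max_gram")
    diff = max_gram - min_gram
    index = {
        "analysis": {
            "tokenizer": {
                # "my_ngram" is a hypothetical tokenizer name.
                "my_ngram": {"type": "ngram",
                             "min_gram": min_gram,
                             "max_gram": max_gram}
            }
        }
    }
    if diff > default_limit:
        # Without this, index creation fails on 7.0 for new indices.
        index["max_ngram_diff"] = diff
    return {"settings": {"index": index}}


within_limit = ngram_index_settings(2, 3)
over_limit = ngram_index_settings(2, 5)
```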

Document distribution changes

Indices created with version 7.0.0 onwards will have an automatic index.number_of_routing_shards value set. This might change how documents are distributed across shards depending on how many shards the index has. In order to maintain the exact same distribution as a pre-7.0.0 index, index.number_of_routing_shards must be set to the value of index.number_of_shards at index creation time. Note: if the number of routing shards equals the number of shards, _split operations are not supported.
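For an index that must keep both the old five-shard default and the pre-7.0.0 document distribution, the create-index body could be built as in this sketch. The function name is hypothetical, and the resulting index cannot be split.

```python
def legacy_distribution_settings(number_of_shards=5):
    """Create-index body that pins routing shards to the shard count."""
    return {
        "settings": {
            "index": {
                # 7.0 defaults to 1 shard, so the old default is set explicitly.
                "number_of_shards": number_of_shards,
                # Equal values reproduce the pre-7.0.0 distribution,
                # at the cost of _split support.
                "number_of_routing_shards": number_of_shards,
            }
        }
    }


body = legacy_distribution_settings()
```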

Skipped background refresh on search idle shards

Shards belonging to an index that does not have an explicit index.refresh_interval configured will no longer refresh in the background once the shard becomes “search idle”, i.e. the shard hasn’t seen any search traffic for index.search.idle.after seconds (defaults to 30s). Searches that access a search idle shard will be “parked” until the next refresh happens. Indexing requests with refresh=wait_for will also trigger a background refresh.

Remove deprecated url parameters for Clear Indices Cache API

The following previously deprecated URL parameters have been removed:

filter - use query instead
filter_cache - use query instead
request_cache - use request instead
field_data - use fielddata instead
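Client code that still sends the old names can be migrated with a simple rename table, as in this sketch. The function name is hypothetical. If both filter and filter_cache are present they collapse into a single query entry, with the later one winning.

```python
# Old parameter name -> replacement, as listed above.
RENAMED_PARAMS = {
    "filter": "query",
    "filter_cache": "query",
    "request_cache": "request",
    "field_data": "fielddata",
}


def migrate_clear_cache_params(params):
    """Return URL parameters for the clear cache API with old names replaced."""
    return {RENAMED_PARAMS.get(name, name): value for name, value in params.items()}


migrated = migrate_clear_cache_params({"field_data": "true", "request_cache": "true"})
```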

`network.breaker.inflight_requests.overhead` increased to 2

Previously the in-flight requests circuit breaker considered only the raw byte representation. With network.breaker.inflight_requests.overhead raised from 1 to 2, this circuit breaker now also considers the memory overhead of representing the request as a structured object.

Parent circuit breaker changes

The parent circuit breaker defines a new setting indices.breaker.total.use_real_memory which is true by default. This means that the parent circuit breaker will trip based on currently used heap memory instead of only considering the reserved memory by child circuit breakers. When this setting is true, the default parent breaker limit also changes from 70% to 95% of the JVM heap size. The previous behavior can be restored by setting indices.breaker.total.use_real_memory to false.
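To get a feel for the change in the default limit, the sketch below computes the parent breaker threshold for a given heap size under both modes. The function name is hypothetical, and the percentages are the defaults quoted above.

```python
def parent_breaker_limit_bytes(heap_bytes, use_real_memory=True):
    """Default parent breaker limit: 95% of heap with real memory, else 70%."""
    percent = 95 if use_real_memory else 70
    return heap_bytes * percent // 100


heap = 8 * 1024 ** 3  # hypothetical 8 GiB heap
real_memory_limit = parent_breaker_limit_bytes(heap)
legacy_limit = parent_breaker_limit_bytes(heap, use_real_memory=False)
```

The two numbers are not directly comparable: the 95% limit is measured against actual heap usage, while the 70% limit only counts memory reserved by the child breakers.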

Field data circuit breaker changes

As doc values have been enabled by default in earlier versions of Elasticsearch, there is less need for fielddata. Therefore, the default value of the setting indices.breaker.fielddata.limit has been lowered from 60% to 40% of the JVM heap size.

`fix` value for `index.shard.check_on_startup` is removed

The deprecated option value fix for the setting index.shard.check_on_startup is no longer supported.

`elasticsearch-translog` is removed

Use the elasticsearch-shard tool to remove corrupted translog data.

5.API changes

Ingest configuration exception information is now transmitted in metadata field

Previously, some ingest configuration exception information about ingest processors was sent to the client in the HTTP headers, which is inconsistent with how exceptions are conveyed in other parts of Elasticsearch.

Configuration exception information is now conveyed as a field in the response body.

Ingest plugin special handling has been removed

There was some special handling for installing and removing the ingest-geoip and ingest-user-agent plugins after they were converted to modules. This special handling was done to minimize breaking users in a minor release, and would exit with a status code zero to avoid breaking automation.

This special handling has now been removed.

6.Mapping changes

The `_all` meta field is removed

The _all field, deprecated in 6, has now been removed.

The `_uid` meta field is removed

This field used to index a composite key formed of the _type and the _id. Now that indices cannot have multiple types, this has been removed in favour of _id.

The `_default_` mapping is no longer allowed

The _default_ mapping has been deprecated in 6.0 and is now no longer allowed in 7.0. Trying to configure a _default_ mapping on 7.x indices will result in an error.

`index_options` for numeric fields has been removed

The index_options field for numeric fields has been deprecated in 6 and has now been removed.

Limiting the number of `nested` json objects

To safeguard against out of memory errors, the number of nested json objects within a single document across all fields has been limited to 10000. This default limit can be changed with the index setting index.mapping.nested_objects.limit.

The `update_all_types` option has been removed

This option is useless now that all indices have at most one type.

The `classic` similarity has been removed

The classic similarity relied on coordination factors for scoring to be good in presence of stopwords in the query. This feature has been removed from Lucene, which means that the classic similarity now produces scores of lower quality. It is advised to switch to BM25 instead, which is widely accepted as a better alternative.

Similarities fail when unsupported options are provided

An error will now be thrown when unknown configuration options are provided to similarities. Such unknown parameters were ignored before.

Changed default `geo_shape` indexing strategy

geo_shape types now default to using a vector indexing approach based on Lucene’s new LatLonShape field type. This indexes shapes as a triangular mesh instead of decomposing them into individual grid cells. To index using legacy prefix trees the tree parameter must be explicitly set to one of quadtree or geohash. Note that these strategies are now deprecated and will be removed in a future version.

IMPORTANT NOTE: If using timed index creation from templates, the geo_shape mapping should also be changed in the template to explicitly define tree to one of geohash or quadtree. This will ensure compatibility with previously created indexes.

Deprecated `geo_shape` parameters

The following type parameters are deprecated for the geo_shape field type: tree, precision, tree_levels, distance_error_pct, points_only, and strategy. They will be removed in a future version.

`include_type_name` now defaults to `false`

The default for include_type_name is now false for all APIs that accept the parameter.
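With include_type_name defaulting to false, request bodies that nest mappings under a type name need that level removed. The sketch below is a heuristic, not an official migration tool: the function name is hypothetical, the keyword set is not exhaustive, and it only unwraps when mappings has exactly one key that looks like a type name.

```python
# Keys that may appear directly under "mappings" in a typeless body (not exhaustive).
MAPPING_KEYWORDS = {"properties", "dynamic", "dynamic_templates",
                    "date_detection", "numeric_detection", "dynamic_date_formats"}


def strip_type_level(body):
    """Return the create-index body with a single 6.x type level removed."""
    mappings = body.get("mappings")
    if not isinstance(mappings, dict) or len(mappings) != 1:
        return body
    name, inner = next(iter(mappings.items()))
    # Meta fields such as _source start with "_"; the only type name that does is _doc.
    looks_like_type = name not in MAPPING_KEYWORDS and (
        name == "_doc" or not name.startswith("_"))
    if looks_like_type and isinstance(inner, dict):
        return {**body, "mappings": inner}
    return body


typed = {"mappings": {"_doc": {"properties": {"title": {"type": "text"}}}}}
typeless = strip_type_level(typed)
```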

7.ML changes

Types in Datafeed config are no longer valid

Types have been removed from the datafeed config and are no longer valid parameters.

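Off-heap terms index

The terms dictionary is the part of the inverted index that records, in sorted order, all terms that occur within a segment. To provide fast retrieval, terms dictionaries come with a small terms index that allows efficient random access by term. Until now this terms index was always loaded on-heap.

As of 7.0, the terms index is loaded on-heap only for fields that have unique values, such as _id fields, and off-heap otherwise, which likely covers most other fields. This is expected to reduce memory requirements but might slow down search requests if both of the following conditions are met:

The size of the data directory on each node is significantly larger than the amount of memory available to the filesystem cache.
The number of matches of the query is not several orders of magnitude greater than the number of terms that the query tries to match, either explicitly via term or terms queries, or implicitly via multi-term queries such as prefix, wildcard or fuzzy queries.

This change affects both existing indices created with Elasticsearch 6.x and new indices created with Elasticsearch 7.x.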

Changes to queries

The default value for transpositions parameter of fuzzy query has been changed to true.
The query_string options use_dismax, split_on_whitespace, all_fields, locale, auto_generate_phrase_query and lowercase_expanded_terms deprecated in 6.x have been removed.
Purely negative queries (only MUST_NOT clauses) now return a score of 0 rather than 1.
The boundary specified using geohashes in the geo_bounding_box query now includes the entire geohash cell, instead of just the geohash center.
Attempts to generate multi-term phrase queries against non-text fields with a custom analyzer will now throw an exception.
An envelope crossing the dateline in a geo_shape query is now processed correctly when specified using the REST API, instead of having its left and right corners flipped.
Attempts to set boost on inner span queries will now throw a parsing exception.
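To illustrate what the new transpositions default means for fuzzy matching, the sketch below computes an edit distance in which swapping two adjacent characters optionally counts as a single edit. It is a textbook dynamic-programming version written for illustration, not Lucene's actual implementation, and the function name is hypothetical.

```python
def edit_distance(a, b, transpositions=True):
    """Edit distance where an adjacent swap optionally counts as one edit."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i
    for j in range(cols):
        d[0][j] = j
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
            if (transpositions and i > 1 and j > 1
                    and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]):
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # adjacent swap
    return d[-1][-1]


with_swap = edit_distance("ab", "ba", transpositions=True)
without_swap = edit_distance("ab", "ba", transpositions=False)
```

With transpositions counted as one edit, "ab" and "ba" are one edit apart instead of two, so a fuzzy query with a fuzziness of 1 can match terms that it previously would have missed.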