ES官方说明：Removal of mapping types

最新推荐文章于 2023-01-07 17:10:36 发布

end

最新推荐文章于 2023-01-07 17:10:36 发布

阅读量503

点赞数 2

分类专栏： ES

本文链接：https://blog.csdn.net/endlu/article/details/85230792

版权

ES 专栏收录该内容

4 篇文章 0 订阅

订阅专栏

Indices created in Elasticsearch 6.0.0 or later may only contain a single mapping type. Indices created in 5.x with multiple mapping types will continue to function as before in Elasticsearch 6.x. Mapping types will be completely removed in Elasticsearch 7.0.0.

What are mapping types?

Since the first release of Elasticsearch, each document has been stored in a single index and assigned a single mapping type. A mapping type was used to represent the type of document or entity being indexed, for instance a twitter index might have a user type and a tweet type.

Each mapping type could have its own fields, so the user type might have a full_name field, a user_name field, and an email field, while the tweet type could have a content field, a tweeted_atfield and, like the user type, a user_name field.

Each document had a _type meta-field containing the type name, and searches could be limited to one or more types by specifying the type name(s) in the URL:

GET twitter/user,tweet/_search
{
  "query": {
    "match": {
      "user_name": "kimchy"
    }
  }
}

The _type field was combined with the document’s _id to generate a _uid field, so documents of different types with the same _id could exist in a single index.

Mapping types were also used to establish a parent-child relationship between documents, so documents of type question could be parents to documents of type answer.

Why are mapping types being removed?

Initially, we spoke about an “index” being similar to a “database” in an SQL database, and a “type” being equivalent to a “table”.

This was a bad analogy that led to incorrect assumptions. In an SQL database, tables are independent of each other. The columns in one table have no bearing on columns with the same name in another table. This is not the case for fields in a mapping type.

In an Elasticsearch index, fields that have the same name in different mapping types are backed by the same Lucene field internally. In other words, using the example above, the user_name field in the user type is stored in exactly the same field as the user_name field in the tweet type, and bothuser_name fields must have the same mapping (definition) in both types.

This can lead to frustration when, for example, you want deleted to be a date field in one type and a boolean field in another type in the same index.

On top of that, storing different entities that have few or no fields in common in the same index leads to sparse data and interferes with Lucene’s ability to compress documents efficiently.

For these reasons, we have decided to remove the concept of mapping types from Elasticsearch.

Alternatives to mapping types

Index per document type

The first alternative is to have an index per document type. Instead of storing tweets and users in a single twitter index, you could store tweets in the tweets index and users in the user index. Indices are completely independent of each other and so there will be no conflict of field types between indices.

This approach has two benefits:

Data is more likely to be dense and so benefit from compression techniques used in Lucene.
The term statistics used for scoring in full text search are more likely to be accurate because all documents in the same index represent a single entity.

Each index can be sized appropriately for the number of documents it will contain: you can use a smaller number of primary shards for users and a larger number of primary shards for tweets.

包括这个计划在每个版本内会做的改变、如何自己实现类似之前的多type、从多type到单一type的迁移等等。