Indices created in Elasticsearch 6.0.0 or later may only contain a single mapping type. Indices created in 5.x with multiple mapping types will continue to function as before in Elasticsearch 6.x. Mapping types will be completely removed in Elasticsearch 7.0.0.
What are mapping types?
Since the first release of Elasticsearch, each document has been stored in a single index and assigned a single mapping type. A mapping type was used to represent the type of document or entity being indexed; for instance, a twitter index might have a user type and a tweet type.
Each mapping type could have its own fields, so the user type might have a full_name field, a user_name field, and an email field, while the tweet type could have a content field, a tweeted_at field and, like the user type, a user_name field.
Each document had a _type meta-field containing the type name, and searches could be limited to one or more types by specifying the type name(s) in the URL:
GET twitter/user,tweet/_search
{
  "query": {
    "match": {
      "user_name": "kimchy"
    }
  }
}
The _type field was combined with the document’s _id to generate a _uid field, so documents of different types with the same _id could exist in a single index.
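The effect of combining _type and _id can be illustrated with a small sketch. This is a toy model rather than Elasticsearch code: the make_uid helper is hypothetical, and the "type#id" layout is assumed here for illustration only.

```python
def make_uid(doc_type: str, doc_id: str) -> str:
    """Combine a mapping type and a document _id into one key."""
    # Assumption: a "type#id" layout, chosen for illustration.
    return f"{doc_type}#{doc_id}"


# Toy "index": a dict keyed by the combined identifier.
index = {}
index[make_uid("user", "1")] = {"user_name": "kimchy"}
index[make_uid("tweet", "1")] = {"user_name": "kimchy", "content": "hello"}

# Both documents share _id "1" but neither overwrites the other.
print(sorted(index))  # ['tweet#1', 'user#1']
```

Because the key includes the type, the same _id can be reused across types without a collision.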
Mapping types were also used to establish a parent-child relationship between documents, so documents of type question could be parents to documents of type answer.
Why are mapping types being removed?
Initially, we spoke about an “index” being similar to a “database” in an SQL database, and a “type” being equivalent to a “table”.
This was a bad analogy that led to incorrect assumptions. In an SQL database, tables are independent of each other. The columns in one table have no bearing on columns with the same name in another table. This is not the case for fields in a mapping type.
In an Elasticsearch index, fields that have the same name in different mapping types are backed by the same Lucene field internally. In other words, using the example above, the user_name field in the user type is stored in exactly the same field as the user_name field in the tweet type, and both user_name fields must have the same mapping (definition) in both types.
This can lead to frustration when, for example, you want deleted to be a date field in one type and a boolean field in another type in the same index.
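The conflict described above can be sketched with a toy model in which every mapping type writes into one shared field namespace. The ToyIndex class and its put_mapping method are hypothetical names used only for this illustration; they are not part of Elasticsearch or Lucene, and the real validation logic is more involved.

```python
class ToyIndex:
    """Toy model: one field namespace shared by every mapping type."""

    def __init__(self):
        self.fields = {}  # field name -> data type, shared by all types

    def put_mapping(self, doc_type, properties):
        # Validate first so a rejected mapping leaves the index unchanged.
        for name, data_type in properties.items():
            existing = self.fields.get(name)
            if existing is not None and existing != data_type:
                raise ValueError(
                    f"field [{name}] is already mapped as [{existing}]; "
                    f"type [{doc_type}] cannot map it as [{data_type}]"
                )
        self.fields.update(properties)


index = ToyIndex()
index.put_mapping("user", {"user_name": "keyword", "deleted": "date"})

try:
    # Same field name, different data type: rejected in this model.
    index.put_mapping("tweet", {"user_name": "keyword", "deleted": "boolean"})
    conflict = None
except ValueError as err:
    conflict = str(err)

print(conflict)
```

In this model the second mapping is rejected even though its user_name definition is compatible, because deleted already exists as a date field in the shared namespace.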
On top of that, storing different entities that have few or no fields in common in the same index leads to sparse data and interferes with Lucene’s ability to compress documents efficiently.
For these reasons, we have decided to remove the concept of mapping types from Elasticsearch.
Alternatives to mapping types
Index per document type
The first alternative is to have an index per document type. Instead of storing tweets and users in a single twitter index, you could store tweets in the tweets index and users in the user index. Indices are completely independent of each other and so there will be no conflict of field types between indices.
This approach has two benefits:
- Data is more likely to be dense and so benefit from compression techniques used in Lucene.
- The term statistics used for scoring in full text search are more likely to be accurate because all documents in the same index represent a single entity.
Each index can be sized appropriately for the number of documents it will contain: you can use a smaller number of primary shards for users and a larger number of primary shards for tweets.
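As a rough sketch of sizing each index separately, the snippet below builds one create-index request per document type. The create_index_request helper is hypothetical and the shard counts (2 and 5) are illustrative assumptions, not recommendations. The snippet only builds the request parts and sends nothing to a cluster.

```python
import json


def create_index_request(index_name, primary_shards):
    """Return (method, path, body) for a create-index request."""
    body = {"settings": {"index": {"number_of_shards": primary_shards}}}
    return "PUT", f"/{index_name}", json.dumps(body)


# Fewer primary shards for users, more for tweets (illustrative values).
create_requests = [
    create_index_request("user", 2),
    create_index_request("tweets", 5),
]

for method, path, body in create_requests:
    print(method, path, body)
```

Each request carries only its own settings, which reflects the point that indices are independent of each other.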
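This includes the changes the plan will make in each release, how to implement something similar to the previous multiple types yourself, how to migrate from multiple types to a single type, and so on.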