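1. What is the problem with sorting on strings?
If you sort on an analyzed string field, the result is often inaccurate: after analysis the field is indexed as several separate terms, so the sort works on individual terms rather than on the whole string, which is not what we want.
The usual solution is to index the string field twice: once analyzed, for searching, and once not analyzed, for sorting.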
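To make the problem concrete, here is a minimal Python sketch that imitates the behavior outside Elasticsearch. The `analyze` function is only a rough stand-in for a real analyzer, and the titles are made-up sample data. It relies on Elasticsearch's default sort modes for multi-valued fields (`min` for ascending, `max` for descending).

```python
import re

def analyze(text):
    # Rough stand-in for an analyzer: lowercase, split on non-alphanumerics.
    return [t for t in re.split(r"[^0-9a-z]+", text.lower()) if t]

def analyzed_sort_key(title, order):
    # An analyzed field holds several terms; an ascending sort uses the
    # smallest term and a descending sort the largest (default sort modes).
    terms = analyze(title)
    return min(terms) if order == "asc" else max(terms)

def sort_titles(titles, order, analyzed):
    # analyzed=True mimics sorting on the tokenized field,
    # analyzed=False mimics sorting on the whole, untokenized string.
    if analyzed:
        key = lambda t: analyzed_sort_key(t, order)
    else:
        key = lambda t: t
    return sorted(titles, key=key, reverse=(order == "desc"))

titles = ["zebra crossing apple", "banana"]  # hypothetical sample titles
by_terms = sort_titles(titles, "asc", analyzed=True)
by_whole = sort_titles(titles, "asc", analyzed=False)
print(by_terms)
print(by_whole)
```

In this sketch the term-based sort puts "zebra crossing apple" first because its smallest term is "apple", while the whole-string sort puts "banana" first.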
Example:
GET /website/article/_search
{
"query": {
"match_all": {}
},
"sort": [
{
"title": "asc"
}
]
}
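This returns an error: an analyzed text field has no forward index to sort on, and fielddata is disabled by default. The details are not covered here; they are explained later.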
{
"error": {
"root_cause": [
{
"type": "illegal_argument_exception",
"reason": "Fielddata is disabled on text fields by default. Set fielddata=true on [title] in order to load fielddata in memory by uninverting the inverted index. Note that this can however use significant memory."
}
],
"type": "search_phase_execution_exception",
"reason": "all shards failed",
"phase": "query",
"grouped": true,
"failed_shards": [
{
"shard": 0,
"index": "website",
"node": "5JcZFTo8TMGAcBR5psWKmg",
"reason": {
"type": "illegal_argument_exception",
"reason": "Fielddata is disabled on text fields by default. Set fielddata=true on [title] in order to load fielddata in memory by uninverting the inverted index. Note that this can however use significant memory."
}
}
],
"caused_by": {
"type": "illegal_argument_exception",
"reason": "Fielddata is disabled on text fields by default. Set fielddata=true on [title] in order to load fielddata in memory by uninverting the inverted index. Note that this can however use significant memory."
}
},
"status": 400
}
Here is how to solve the sorting problem:
1) First, delete the index:
DELETE /website
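2) Recreate the index
Note: "fielddata": true is required on title if you want to sort on the analyzed field itself, as the comparison further down does. It makes Elasticsearch build the forward structure in memory; without it, the sort fails with the error shown above. Sorting on the non-analyzed sub-field does not depend on this setting.
Add a non-analyzed sub-field for sorting. (In Elasticsearch 5.x the string type is deprecated in favor of text and keyword; "type": "keyword" is the current way to declare such a field.)
"fields": {
"raw":{
"type":"string",
"index":"not_analyzed"
}
}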
Full command:
PUT /website
{
"mappings": {
"article":{
"properties": {
"title":{
"type": "text",
"fields": {
"raw":{
"type":"string",
"index":"not_analyzed"
}
},
"fielddata": true
},
"content":{
"type":"text"
},
"post_date":{
"type":"date"
},
"author_id":{
"type":"long"
}
}
}
}
}
Result:
{
"acknowledged": true,
"shards_acknowledged": true
}
Prepare the data:
PUT /website/article/1
{
"title":"second article",
"content":"this is my second article",
"post_date":"2017-01-01",
"author_id":100
}
PUT /website/article/2
{
"title":"first article",
"content":"this is my first article",
"post_date":"2017-02-01",
"author_id":100
}
PUT /website/article/3
{
"title":"third article",
"content":"this is my third article",
"post_date":"2017-03-01",
"author_id":100
}
Run a plain search, without sorting, to check the data:
{
"took": 120,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 3,
"max_score": 1,
"hits": [
{
"_index": "website",
"_type": "article",
"_id": "2",
"_score": 1,
"_source": {
"title": "first article",
"content": "this is my first article",
"post_date": "2017-02-01",
"author_id": 100
}
},
{
"_index": "website",
"_type": "article",
"_id": "1",
"_score": 1,
"_source": {
"title": "second article",
"content": "this is my second article",
"post_date": "2017-01-01",
"author_id": 100
}
},
{
"_index": "website",
"_type": "article",
"_id": "3",
"_score": 1,
"_source": {
"title": "third article",
"content": "this is my third article",
"post_date": "2017-03-01",
"author_id": 100
}
}
]
}
}
Run an ordinary sort on the analyzed field:
GET /website/article/_search
{
"query": {
"match_all": {}
},
"sort": [
{
"title": {
"order": "desc"
}
}
]
}
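Each hit carries a sort array that shows the actual value it was sorted by. For an analyzed field, that value is a single term taken from the tokenized string rather than the whole title, for example:
"sort": [
"third"
]
Result: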
{
"took": 1304,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 3,
"max_score": null,
"hits": [
{
"_index": "website",
"_type": "article",
"_id": "3",
"_score": null,
"_source": {
"title": "third article",
"content": "this is my third article",
"post_date": "2017-03-01",
"author_id": 100
},
"sort": [
"third"
]
},
{
"_index": "website",
"_type": "article",
"_id": "1",
"_score": null,
"_source": {
"title": "second article",
"content": "this is my second article",
"post_date": "2017-01-01",
"author_id": 100
},
"sort": [
"second"
]
},
{
"_index": "website",
"_type": "article",
"_id": "2",
"_score": null,
"_source": {
"title": "first article",
"content": "this is my first article",
"post_date": "2017-02-01",
"author_id": 100
},
"sort": [
"first"
]
}
]
}
}
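The sort values in this response can be reproduced with a small Python sketch. It assumes a simple lowercase-and-split tokenizer as a stand-in for the real analyzer, and it uses the default sort modes (`max` for descending, `min` for ascending). The `sort_value` helper is hypothetical and exists only for illustration.

```python
def sort_value(title, order):
    # Stand-in for analysis: lowercase and split on whitespace.
    terms = title.lower().split()
    # Default sort modes: "max" for descending, "min" for ascending.
    return max(terms) if order == "desc" else min(terms)

# The three sample documents indexed above, keyed by _id.
titles = {"1": "second article", "2": "first article", "3": "third article"}

desc_values = {doc_id: sort_value(t, "desc") for doc_id, t in titles.items()}
ranking = sorted(desc_values, key=desc_values.get, reverse=True)
asc_values = {doc_id: sort_value(t, "asc") for doc_id, t in titles.items()}

print(desc_values)
print(ranking)
print(asc_values)
```

Under these assumptions the descending order happens to match the titles, because "first", "second" and "third" all sort after "article". An ascending sort would use "article" for all three documents, so the term-based order would say nothing about the titles.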
We can sort on title.raw instead. raw is a sub-field of title that is indexed without analysis:
GET /website/article/_search
{
"query": {
"match_all": {}
},
"sort": [
{
"title.raw": {
"order": "desc"
}
}
]
}
Now the sort value is the entire content of title, not a single term:
{
"took": 22,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 3,
"max_score": null,
"hits": [
{
"_index": "website",
"_type": "article",
"_id": "3",
"_score": null,
"_source": {
"title": "third article",
"content": "this is my third article",
"post_date": "2017-03-01",
"author_id": 100
},
"sort": [
"third article"
]
},
{
"_index": "website",
"_type": "article",
"_id": "1",
"_score": null,
"_source": {
"title": "second article",
"content": "this is my second article",
"post_date": "2017-01-01",
"author_id": 100
},
"sort": [
"second article"
]
},
{
"_index": "website",
"_type": "article",
"_id": "2",
"_score": null,
"_source": {
"title": "first article",
"content": "this is my first article",
"post_date": "2017-02-01",
"author_id": 100
},
"sort": [
"first article"
]
}
]
}
}