ElasticSearch42: Getting to Know the Search Engine: How to Index a Field Twice to Solve the String Sorting Problem

Author: 一枚程序员

Article link: https://blog.csdn.net/m0_37557582/article/details/78983018

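1. What is the problem with sorting on strings?
If you sort on an analyzed string field, the result is often inaccurate: after analysis the field holds several separate terms, and sorting on those terms does not give the order we want.
The usual solution is to index the string field twice: once analyzed, for searching, and once not analyzed, for sorting.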
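
To make the problem concrete, here is a minimal Python sketch. It does not use Elasticsearch; the `analyze` function is only a rough, assumed stand-in for an analyzer (lowercase, split on non-word characters). It assumes an ascending sort compares the smallest term of each document. Under those assumptions, every title collapses to the same sort key, while sorting on the whole string gives the expected order.

```python
import re

def analyze(text):
    # Rough stand-in for an analyzer (an assumption, not the real thing):
    # lowercase the text and split it on non-word characters.
    return re.findall(r"\w+", text.lower())

titles = ["second article", "first article", "third article"]

# Ascending sort on the analyzed terms: each document is represented
# by its smallest term, so all three titles end up with the same key.
analyzed_keys = [min(analyze(t)) for t in titles]

# Ascending sort on the whole, unanalyzed string.
raw_sorted = sorted(titles)

print(analyzed_keys)
print(raw_sorted)
```

With identical keys for every document, the ascending order on the analyzed field says nothing about the titles themselves.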

Example:
GET /website/article/_search
{
  "query": {
    "match_all": {}
  },
  "sort": [
    {
      "title": "asc"
    }
  ]
}

The request fails because the field has no forward index to sort on. The reason is not covered here; it is explained later.
{
"error": {
    "root_cause": [
      {
        "type": "illegal_argument_exception",
        "reason": "Fielddata is disabled on text fields by default. Set fielddata=true on [title] in order to load fielddata in memory by uninverting the inverted index. Note that this can however use significant memory."
      }
    ],
    "type": "search_phase_execution_exception",
    "reason": "all shards failed",
    "phase": "query",
    "grouped": true,
    "failed_shards": [
      {
        "shard": 0,
        "index": "website",
        "node": "5JcZFTo8TMGAcBR5psWKmg",
        "reason": {
          "type": "illegal_argument_exception",
          "reason": "Fielddata is disabled on text fields by default. Set fielddata=true on [title] in order to load fielddata in memory by uninverting the inverted index. Note that this can however use significant memory."
        }
      }
    ],
    "caused_by": {
      "type": "illegal_argument_exception",
      "reason": "Fielddata is disabled on text fields by default. Set fielddata=true on [title] in order to load fielddata in memory by uninverting the inverted index. Note that this can however use significant memory."
    }
},
"status": 400
}

Here is how to solve the sorting problem:
1) First, delete the index

DELETE /website

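2) Recreate the index
Note: "fielddata": true is required here in order to sort on the analyzed title field itself. It builds the forward index for that text field; without it, the sort on title fails as shown above.
Add a non-analyzed sub-field for sorting. The title field is already declared as text, so the sub-field uses the matching keyword type rather than the deprecated "type": "string" with "index": "not_analyzed":
"fields": {
            "raw":{
              "type":"keyword"
            }
          }

Full command:

PUT /website
{
  "mappings": {
    "article":{
      "properties": {
        "title":{
          "type": "text",
          "fields": {
            "raw":{
              "type":"keyword"
            }
          },
          "fielddata": true
        },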
        "content":{
          "type":"text"
        },
        "post_date":{
          "type":"date"
        },
        "author_id":{
          "type":"long"
        }
      }
    }
  }
}

Result:
{
"acknowledged": true,
"shards_acknowledged": true
}
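
As a way to picture what "indexing a field twice" means, the following Python sketch models a single title value being stored in two forms. It is only an illustration: `analyze` and `index_title` are hypothetical helpers, not Elasticsearch APIs, and the tokenizer is a rough approximation.

```python
import re

def analyze(text):
    # Rough stand-in for an analyzer (assumption): lowercase and split.
    return re.findall(r"\w+", text.lower())

def index_title(title):
    # Hypothetical model of a multi-field: one source value,
    # two indexed forms.
    return {
        "title": analyze(title),  # analyzed terms, used for searching
        "title.raw": title,       # exact, unanalyzed value, used for sorting
    }

doc = index_title("second article")
print(doc)
```

The search side works with the individual terms, while the sort side sees the title as one unbroken value.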

Prepare the data:

PUT /website/article/1
{
  "title":"second article",
  "content":"this is my second article",
  "post_date":"2017-01-01",
  "author_id":100
}
PUT /website/article/2
{
  "title":"first article",
  "content":"this is my first article",
  "post_date":"2017-02-01",
  "author_id":100
}
PUT /website/article/3
{
  "title":"third article",
  "content":"this is my third article",
  "post_date":"2017-03-01",
  "author_id":100
}

Query the data to check it:

{
  "took": 120,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 3,
    "max_score": 1,
    "hits": [
      {
        "_index": "website",
        "_type": "article",
        "_id": "2",
        "_score": 1,
        "_source": {
          "title": "first article",
          "content": "this is my first article",
          "post_date": "2017-02-01",
          "author_id": 100
        }
      },
      {
        "_index": "website",
        "_type": "article",
        "_id": "1",
        "_score": 1,
        "_source": {
          "title": "second article",
          "content": "this is my second article",
          "post_date": "2017-01-01",
          "author_id": 100
        }
      },
      {
        "_index": "website",
        "_type": "article",
        "_id": "3",
        "_score": 1,
        "_source": {
          "title": "third article",
          "content": "this is my third article",
          "post_date": "2017-03-01",
          "author_id": 100
        }
      }
    ]
  }
}

Run an ordinary sort on the analyzed title field:

GET /website/article/_search
{
  "query": {
    "match_all": {}
  },
  "sort": [
    {
      "title": {
        "order": "desc"
      }
    }
  ]
}

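Result:
Each hit carries a sort array showing the actual value used for sorting. On an analyzed string field this is by default a single term picked from the analyzed terms, not the whole title: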

        "sort": [
          "third"
        ]

Full response:

{
  "took": 1304,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 3,
    "max_score": null,
    "hits": [
      {
        "_index": "website",
        "_type": "article",
        "_id": "3",
        "_score": null,
        "_source": {
          "title": "third article",
          "content": "this is my third article",
          "post_date": "2017-03-01",
          "author_id": 100
        },
        "sort": [
          "third"
        ]
      },
      {
        "_index": "website",
        "_type": "article",
        "_id": "1",
        "_score": null,
        "_source": {
          "title": "second article",
          "content": "this is my second article",
          "post_date": "2017-01-01",
          "author_id": 100
        },
        "sort": [
          "second"
        ]
      },
      {
        "_index": "website",
        "_type": "article",
        "_id": "2",
        "_score": null,
        "_source": {
          "title": "first article",
          "content": "this is my first article",
          "post_date": "2017-02-01",
          "author_id": 100
        },
        "sort": [
          "first"
        ]
      }
    ]
  }
}
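
The sort values in this response can be reproduced with a small Python sketch. It assumes a descending sort represents each document by its largest term, and uses a rough tokenizer in place of the real analyzer; `desc_sort_value` is a hypothetical helper name.

```python
import re

def analyze(text):
    # Rough stand-in for an analyzer (assumption): lowercase and split.
    return re.findall(r"\w+", text.lower())

# Document id -> title, taken from the sample data above.
docs = {"1": "second article", "2": "first article", "3": "third article"}

def desc_sort_value(title):
    # Assumption: for a descending sort, the largest term of the
    # analyzed field represents the document.
    return max(analyze(title))

ranked = sorted(docs, key=lambda i: desc_sort_value(docs[i]), reverse=True)
sort_values = [desc_sort_value(docs[i]) for i in ranked]

print(ranked)
print(sort_values)
```

Under these assumptions the ids come out as 3, 1, 2 with sort values "third", "second", "first", matching the response.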

We can sort on title.raw instead. raw is a sub-field of title that is indexed without analysis.

GET /website/article/_search
{
  "query": {
    "match_all": {}
  },
  "sort": [
    {
      "title.raw": {
        "order": "desc"
      }
    }
  ]
}

Now the sort value is the entire content of title:

{
  "took": 22,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 3,
    "max_score": null,
    "hits": [
      {
        "_index": "website",
        "_type": "article",
        "_id": "3",
        "_score": null,
        "_source": {
          "title": "third article",
          "content": "this is my third article",
          "post_date": "2017-03-01",
          "author_id": 100
        },
        "sort": [
          "third article"
        ]
      },
      {
        "_index": "website",
        "_type": "article",
        "_id": "1",
        "_score": null,
        "_source": {
          "title": "second article",
          "content": "this is my second article",
          "post_date": "2017-01-01",
          "author_id": 100
        },
        "sort": [
          "second article"
        ]
      },
      {
        "_index": "website",
        "_type": "article",
        "_id": "2",
        "_score": null,
        "_source": {
          "title": "first article",
          "content": "this is my first article",
          "post_date": "2017-02-01",
          "author_id": 100
        },
        "sort": [
          "first article"
        ]
      }
    ]
  }
}
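
In the sample data both sorts happen to return the documents in the same order, so the difference is easy to miss. The sketch below uses two hypothetical titles (not from the article) where sorting on analyzed terms and sorting on the raw value disagree. As before, the tokenizer and the "largest term for descending order" rule are stated assumptions rather than Elasticsearch itself.

```python
import re

def analyze(text):
    # Rough stand-in for an analyzer (assumption): lowercase and split.
    return re.findall(r"\w+", text.lower())

# Hypothetical titles chosen so that the two sort orders differ.
titles = ["a zebra", "b apple"]

# Descending sort on analyzed terms: the largest term represents each title.
by_terms = sorted(titles, key=lambda t: max(analyze(t)), reverse=True)

# Descending sort on the raw, unanalyzed value.
by_raw = sorted(titles, reverse=True)

print(by_terms)
print(by_raw)
```

Sorting on terms puts "a zebra" first because of the term "zebra", whereas sorting on the raw value puts "b apple" first, which is the order a reader would expect from the titles.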
