ElasticSearch42: Getting to know the search engine_How indexing one field twice solves the string sorting problem

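1. What is wrong with sorting on strings?
Sorting on a string field often gives inaccurate results: after analysis the value becomes several separate tokens, and sorting on those tokens is not what we want.
The usual solution is to index the string field twice: once analyzed, for searching, and once not analyzed, for sorting.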
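To make the problem concrete, here is a minimal Python sketch, not Elasticsearch code. It assumes a simplified analyzer that only lowercases and splits on whitespace, and it assumes an ascending sort on a multi-token field uses the smallest token as the sort key. The function names and sample titles are hypothetical.

```python
def analyze(text):
    """Simplified stand-in for an analyzer: lowercase, split on whitespace."""
    return text.lower().split()


def sort_by_analyzed(titles):
    """Ascending sort where each title is keyed by its smallest token."""
    return sorted(titles, key=lambda t: min(analyze(t)))


def sort_by_raw(titles):
    """Ascending sort on the whole, unanalyzed string."""
    return sorted(titles)


titles = ["zebra apple", "banana", "cherry"]
print(sort_by_analyzed(titles))  # "zebra apple" is keyed by the token "apple"
print(sort_by_raw(titles))       # every title is keyed by its full string
```

In this sketch "zebra apple" comes first when sorted by token, even though the full string would sort last, which is the kind of surprise the second, non-analyzed copy of the field avoids.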

Example:
GET /website/article/_search
{
  "query": {
    "match_all": {}
  },
  "sort": [
    {
      "title": "asc"
    }
  ]
}

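The request fails because the text field has no forward index (fielddata) available for sorting. This is not explained here; it is covered later.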
{
  "error": {
    "root_cause": [
      {
        "type": "illegal_argument_exception",
        "reason": "Fielddata is disabled on text fields by default. Set fielddata=true on [title] in order to load fielddata in memory by uninverting the inverted index. Note that this can however use significant memory."
      }
    ],
    "type": "search_phase_execution_exception",
    "reason": "all shards failed",
    "phase": "query",
    "grouped": true,
    "failed_shards": [
      {
        "shard": 0,
        "index": "website",
        "node": "5JcZFTo8TMGAcBR5psWKmg",
        "reason": {
          "type": "illegal_argument_exception",
          "reason": "Fielddata is disabled on text fields by default. Set fielddata=true on [title] in order to load fielddata in memory by uninverting the inverted index. Note that this can however use significant memory."
        }
      }
    ],
    "caused_by": {
      "type": "illegal_argument_exception",
      "reason": "Fielddata is disabled on text fields by default. Set fielddata=true on [title] in order to load fielddata in memory by uninverting the inverted index. Note that this can however use significant memory."
    }
  },
  "status": 400
}
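The error message talks about "uninverting the inverted index". The following Python sketch illustrates that idea only conceptually: an inverted index maps terms to documents, while sorting needs the opposite direction, from a document to its terms. It is not how Lucene implements fielddata; the function names and the whitespace tokenizer are assumptions made for illustration.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each term to the set of document ids that contain it."""
    inverted = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            inverted[term].add(doc_id)
    return inverted


def uninvert(inverted):
    """Rebuild a doc id -> sorted terms map from the term -> doc ids map."""
    per_doc = defaultdict(list)
    for term in sorted(inverted):
        for doc_id in inverted[term]:
            per_doc[doc_id].append(term)
    return dict(per_doc)


docs = {1: "second article", 2: "first article", 3: "third article"}
inverted = build_inverted_index(docs)
fielddata = uninvert(inverted)
print(fielddata[1])
```

The per-document structure has to be held in memory for every document in the field, which is why the error message warns about significant memory use.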




Now let's look at how to solve the sorting problem:
1) First, delete the index

DELETE /website

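2) Recreate the index
Note: "fielddata": true is required in order to sort on the analyzed title field itself, because it builds the in-memory forward index for that field; without it, sorting on title fails with the error shown above. The title.raw sub-field does not need it.
Add a non-analyzed sub-field to sort on:
 "fields": {
            "raw":{
              "type":"keyword"
            }
          }

Full command:
PUT /website
{
  "mappings": {
    "article":{
      "properties": {
        "title":{
          "type": "text",
          "fields": {
            "raw":{
              "type":"keyword"
            }
          },
          "fielddata": true
        },
        "content":{
          "type":"text"
        },
        "post_date":{
          "type":"date"
        },
        "author_id":{
          "type":"long"
        }
      }
    }
  }
}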
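What the two copies of title give us can be sketched in a few lines of Python. This is a conceptual model only, assuming a whitespace tokenizer and plain dictionaries in place of real index structures; index_title, analyzed_index and raw_index are hypothetical names.

```python
def index_title(doc_id, title, analyzed_index, raw_index):
    """Index one title value twice: as tokens and as a single exact value."""
    for token in title.lower().split():
        # Analyzed copy: token -> document ids, used for full-text search.
        analyzed_index.setdefault(token, set()).add(doc_id)
    # Raw copy: document id -> untouched string, used for sorting.
    raw_index[doc_id] = title


analyzed_index, raw_index = {}, {}
index_title(1, "second article", analyzed_index, raw_index)
index_title(2, "first article", analyzed_index, raw_index)

# Searching for "article" goes through the analyzed tokens.
print(sorted(analyzed_index["article"]))
# Sorting uses the whole, unanalyzed value of each document.
print(sorted(raw_index, key=raw_index.get))
```

The same source value feeds both structures, so nothing extra has to be sent when indexing a document.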



Result:
{
  "acknowledged": true,
  "shards_acknowledged": true
}



Prepare the data:
PUT /website/article/1
{
  "title":"second article",
  "content":"this is my second article",
  "post_date":"2017-01-01",
  "author_id":100
}
PUT /website/article/2
{
  "title":"first article",
  "content":"this is my first article",
  "post_date":"2017-02-01",
  "author_id":100
}
PUT /website/article/3
{
  "title":"third article",
  "content":"this is my third article",
  "post_date":"2017-03-01",
  "author_id":100
}
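The three documents could also be sent in one request through the _bulk API, whose body is newline-delimited JSON. The Python sketch below only builds such a body and does not send it; build_bulk_body is a hypothetical helper, and including "_type" in the action line is an assumption that matches the typed index used in this post.

```python
import json


def build_bulk_body(index, doc_type, docs):
    """Return an NDJSON string: one action line plus one source line per doc."""
    lines = []
    for doc_id, source in docs.items():
        action = {"index": {"_index": index, "_type": doc_type, "_id": doc_id}}
        lines.append(json.dumps(action))
        lines.append(json.dumps(source))
    # The bulk body must end with a newline character.
    return "\n".join(lines) + "\n"


docs = {
    "1": {"title": "second article", "content": "this is my second article",
          "post_date": "2017-01-01", "author_id": 100},
    "2": {"title": "first article", "content": "this is my first article",
          "post_date": "2017-02-01", "author_id": 100},
    "3": {"title": "third article", "content": "this is my third article",
          "post_date": "2017-03-01", "author_id": 100},
}
body = build_bulk_body("website", "article", docs)
print(body)
```

The printed text can be pasted after a POST /_bulk line in the console.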



Search the index to check the data; the response:
{
  "took": 120,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 3,
    "max_score": 1,
    "hits": [
      {
        "_index": "website",
        "_type": "article",
        "_id": "2",
        "_score": 1,
        "_source": {
          "title": "first article",
          "content": "this is my first article",
          "post_date": "2017-02-01",
          "author_id": 100
        }
      },
      {
        "_index": "website",
        "_type": "article",
        "_id": "1",
        "_score": 1,
        "_source": {
          "title": "second article",
          "content": "this is my second article",
          "post_date": "2017-01-01",
          "author_id": 100
        }
      },
      {
        "_index": "website",
        "_type": "article",
        "_id": "3",
        "_score": 1,
        "_source": {
          "title": "third article",
          "content": "this is my third article",
          "post_date": "2017-03-01",
          "author_id": 100
        }
      }
    ]
  }
}






Run an ordinary sort on title:
GET /website/article/_search
{
  "query": {
    "match_all": {}
  },
  "sort": [
    {
      "title": {
        "order": "desc"
      }
    }
  ]
}



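Result:
Each hit carries a sort array showing the value actually used for sorting. On the analyzed field this is a single token taken from the analyzed title, not the whole string:

        "sort": [
          "third"
        ]

Full response: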
{
  "took": 1304,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 3,
    "max_score": null,
    "hits": [
      {
        "_index": "website",
        "_type": "article",
        "_id": "3",
        "_score": null,
        "_source": {
          "title": "third article",
          "content": "this is my third article",
          "post_date": "2017-03-01",
          "author_id": 100
        },
        "sort": [
          "third"
        ]
      },
      {
        "_index": "website",
        "_type": "article",
        "_id": "1",
        "_score": null,
        "_source": {
          "title": "second article",
          "content": "this is my second article",
          "post_date": "2017-01-01",
          "author_id": 100
        },
        "sort": [
          "second"
        ]
      },
      {
        "_index": "website",
        "_type": "article",
        "_id": "2",
        "_score": null,
        "_source": {
          "title": "first article",
          "content": "this is my first article",
          "post_date": "2017-02-01",
          "author_id": 100
        },
        "sort": [
          "first"
        ]
      }
    ]
  }
}
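The sort values in the response above can be reproduced with a small Python sketch. It assumes a whitespace tokenizer and assumes the default behaviour for multi-valued fields, where an ascending sort keys on the smallest token and a descending sort on the largest; sort_value and sort_hits are hypothetical names.

```python
def sort_value(title, order):
    """Pick the token used as the sort key: min for asc, max for desc."""
    tokens = title.lower().split()
    return min(tokens) if order == "asc" else max(tokens)


def sort_hits(titles, order):
    """Return (title, sort value) pairs in the requested order."""
    keyed = [(t, sort_value(t, order)) for t in titles]
    return sorted(keyed, key=lambda pair: pair[1], reverse=(order == "desc"))


titles = ["second article", "first article", "third article"]
for title, value in sort_hits(titles, "desc"):
    print(title, "->", value)
```

Under the same assumptions an ascending sort would key every one of these titles on "article", so the three documents would tie and their relative order would no longer reflect the titles at all.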






We can instead sort explicitly on title.raw, the non-analyzed field indexed from title:
GET /website/article/_search
{
  "query": {
    "match_all": {}
  },
  "sort": [
    {
      "title.raw": {
        "order": "desc"
      }
    }
  ]
}



Now the sort value of each hit is the entire content of title:
{
  "took": 22,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 3,
    "max_score": null,
    "hits": [
      {
        "_index": "website",
        "_type": "article",
        "_id": "3",
        "_score": null,
        "_source": {
          "title": "third article",
          "content": "this is my third article",
          "post_date": "2017-03-01",
          "author_id": 100
        },
        "sort": [
          "third article"
        ]
      },
      {
        "_index": "website",
        "_type": "article",
        "_id": "1",
        "_score": null,
        "_source": {
          "title": "second article",
          "content": "this is my second article",
          "post_date": "2017-01-01",
          "author_id": 100
        },
        "sort": [
          "second article"
        ]
      },
      {
        "_index": "website",
        "_type": "article",
        "_id": "2",
        "_score": null,
        "_source": {
          "title": "first article",
          "content": "this is my first article",
          "post_date": "2017-02-01",
          "author_id": 100
        },
        "sort": [
          "first article"
        ]
      }
    ]
  }
}


