7.3 Elasticsearch Internals: Sorting

1. Introduction
By default, Elasticsearch sorts hits by relevance score. You can define your own sort order with the sort parameter.

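2. Sorting query results
The request below finds documents whose job field is "Java engineer" and sorts them in descending order: by birthday first, then by relevance score, then by internal document id.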

POST /employee/_search
{
  "query": {
    "match": {
      "job": "Java engineer"
    }
  },
  "sort": [
    {
      "birthday": "desc"
    },
    {
      "_score": "desc"
    },
    {
      "_doc": "desc"
    }
  ]
}
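  • _score: the relevance score
  • _doc: the internal document id, which follows indexing order (unique within a shard; different shards can reuse the same id)

Response: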
{
  "took" : 5,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 4,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [
      {
        "_index" : "employee",
        "_type" : "_doc",
        "_id" : "x4l1hnsBEsHOdz1YbMo-",
        "_score" : 0.2876821,
        "_source" : {
          "name" : "Jason Tatum",
          "job" : "Java engineer",
          "age" : 24,
          "salary" : 15000.0,
          "birthday" : "1997-08-02"
        },
        "sort" : [
          870480000000,
          0.2876821,
          0
        ]
      },
      {
        "_index" : "employee",
        "_type" : "_doc",
        "_id" : "wol1hnsBEsHOdz1YHsp3",
        "_score" : 0.6931472,
        "_source" : {
          "name" : "Stephen Curry",
          "job" : "Java engineer",
          "age" : 27,
          "salary" : 20000.0,
          "birthday" : "1995-08-06"
        },
        "sort" : [
          807667200000,
          0.6931472,
          0
        ]
      },
      {
        "_index" : "employee",
        "_type" : "_doc",
        "_id" : "wYl0hnsBEsHOdz1Y4cqT",
        "_score" : 0.47000363,
        "_source" : {
          "name" : "James Harden",
          "job" : "Java engineer",
          "age" : 31,
          "salary" : 30000.0,
          "birthday" : "1991-01-01"
        },
        "sort" : [
          662688000000,
          0.47000363,
          0
        ]
      },
      {
        "_index" : "employee",
        "_type" : "_doc",
        "_id" : "xol1hnsBEsHOdz1YXcqt",
        "_score" : 0.47000363,
        "_source" : {
          "name" : "Chirs Paul",
          "job" : "Java engineer",
          "age" : 33,
          "salary" : 29000.0,
          "birthday" : "1988-12-02"
        },
        "sort" : [
          597024000000,
          0.47000363,
          2
        ]
      }
    ]
  }
}
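As a cross-check, the pure-Python sketch below (a simulation, not Elasticsearch code) rebuilds the three-level descending sort. It assumes, consistent with the sort arrays in the response above, that a date value is sorted as milliseconds since the epoch at midnight UTC. The _score and _doc values are copied from that response; the helper names epoch_millis and sort_values are hypothetical.

```python
from datetime import date, datetime, timezone

# Copied from the response above: (name, birthday, _score, _doc)
DOCS = [
    ("Stephen Curry", "1995-08-06", 0.6931472, 0),
    ("Chirs Paul", "1988-12-02", 0.47000363, 2),
    ("Jason Tatum", "1997-08-02", 0.2876821, 0),
    ("James Harden", "1991-01-01", 0.47000363, 0),
]


def epoch_millis(iso_day: str) -> int:
    """Hypothetical helper: midnight UTC of an ISO date, in epoch milliseconds."""
    d = date.fromisoformat(iso_day)
    dt = datetime(d.year, d.month, d.day, tzinfo=timezone.utc)
    return int(dt.timestamp()) * 1000


def sort_values(doc):
    """Build the [birthday, _score, _doc] array, mirroring the "sort" field of a hit."""
    _name, birthday, score, doc_id = doc
    return [epoch_millis(birthday), score, doc_id]


# Lists compare element by element, so one reverse sort gives
# birthday desc, then _score desc, then _doc desc.
ranked = sorted(DOCS, key=sort_values, reverse=True)

for doc in ranked:
    print(doc[0], sort_values(doc))
```

Because all three keys are descending, a single reverse sort on the list key is enough; with mixed directions you would negate numeric keys or sort in several stable passes instead.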

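3. Sorting on strings
(1). Overview
Strings need special care because Elasticsearch has two string types, text and keyword. Sorting on a text field fails with the following error: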

POST /employee/_search
{
	"sort": [{
		"name": "desc"
	}]
}

Response:
{
  "error": {
    "root_cause": [
      {
        "type": "illegal_argument_exception",
        "reason": "Fielddata is disabled on text fields by default. Set fielddata=true on [name] in order to load fielddata in memory by uninverting the inverted index. Note that this can however use significant memory. Alternatively use a keyword field instead."
      }
    ],
    "type": "search_phase_execution_exception",
    "reason": "all shards failed",
    "phase": "query",
    "grouped": true,
    "failed_shards": [
      {
        "shard": 0,
        "index": "employee",
        "node": "0JDoSBAVQr29ZB1mytSQgw",
        "reason": {
          "type": "illegal_argument_exception",
          "reason": "Fielddata is disabled on text fields by default. Set fielddata=true on [name] in order to load fielddata in memory by uninverting the inverted index. Note that this can however use significant memory. Alternatively use a keyword field instead."
        }
      }
    ],
    "caused_by": {
      "type": "illegal_argument_exception",
      "reason": "Fielddata is disabled on text fields by default. Set fielddata=true on [name] in order to load fielddata in memory by uninverting the inverted index. Note that this can however use significant memory. Alternatively use a keyword field instead.",
      "caused_by": {
        "type": "illegal_argument_exception",
        "reason": "Fielddata is disabled on text fields by default. Set fielddata=true on [name] in order to load fielddata in memory by uninverting the inverted index. Note that this can however use significant memory. Alternatively use a keyword field instead."
      }
    }
  },
  "status": 400
}

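By default, fielddata is disabled on text fields. To make sorting on name work, you could set fielddata=true on that field so that Elasticsearch loads fielddata into memory by uninverting the inverted index. Because this can use a significant amount of memory, the recommended alternative is to map name as a keyword field.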

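(2). Solutions
Sorting orders documents by the original values of a field, and the inverted index (which maps terms to documents) cannot help with that. What is needed is a forward index, which quickly returns a field's value given a document id. Elasticsearch provides two implementations: fielddata, which is disabled by default, and doc_values, which is enabled by default for every field type except text. They compare as follows: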

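| Aspect | fielddata | doc_values |
| --- | --- | --- |
| When built | On the fly, at search time | At index time, together with the inverted index |
| Where stored | JVM heap | Disk |
| Advantage | Uses no extra disk space | Uses no heap memory |
| Drawback | With many documents, building it at search time takes a long time and uses a lot of heap | Slows indexing and uses extra disk space |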
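The two directions of indexing can be illustrated with a toy model in plain Python. It is not how Lucene lays out data; it only shows the idea: an inverted index answers "which documents contain this term?", uninverting it yields the per-document terms that fielddata keeps on the heap, and a doc_values-style column stores each document's value directly. The four names come from the section 2 response; the document ids, the crude analyzer and all function names are hypothetical.

```python
from collections import defaultdict

# Names from the section 2 response; the doc ids are made up for this sketch.
SOURCE = {0: "Stephen Curry", 1: "Chirs Paul", 2: "Jason Tatum", 3: "James Harden"}


def analyze(text):
    """Crude stand-in for an analyzer: lowercase and split on whitespace."""
    return text.lower().split()


def build_inverted_index(source):
    """term -> sorted doc ids: what full-text search looks up."""
    inverted = defaultdict(set)
    for doc_id, text in source.items():
        for term in analyze(text):
            inverted[term].add(doc_id)
    return {term: sorted(ids) for term, ids in inverted.items()}


def uninvert(inverted):
    """doc id -> sorted terms: conceptually what fielddata builds at search time."""
    forward = defaultdict(list)
    for term, ids in inverted.items():
        for doc_id in ids:
            forward[doc_id].append(term)
    return {doc_id: sorted(terms) for doc_id, terms in forward.items()}


def build_doc_values(source):
    """doc id -> whole value: a column written at index time, like keyword doc_values."""
    return dict(source)


inverted = build_inverted_index(SOURCE)
fielddata = uninvert(inverted)
doc_values = build_doc_values(SOURCE)

print(inverted["curry"])  # which docs contain the term "curry"
print(fielddata[0])       # terms of doc 0, recovered by uninverting
print(doc_values[0])      # original value of doc 0, read straight from the column
```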

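4. fielddata
(1). Overview
fielddata is disabled by default. Once it is enabled, a string is sorted by the terms produced by analysis, which rarely gives the order you expect. It is usually enabled only when you need aggregations over analyzed terms. Note that the fielddata setting only applies to text fields.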
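Why is term-based sorting surprising? With fielddata enabled, a text field holds several terms per document, and Elasticsearch picks one of them according to the sort mode: by default the smallest term for ascending order and the largest for descending order. The pure-Python sketch below imitates that with a simplified analyzer (lowercase, split on whitespace; the real standard analyzer does more) and compares it with sorting the whole value as a keyword field would, using the four names from section 2. The function names are hypothetical.

```python
NAMES = ["Jason Tatum", "Stephen Curry", "James Harden", "Chirs Paul"]


def terms(text):
    """Simplified analyzer: lowercase and split on whitespace."""
    return text.lower().split()


def sort_as_text_fielddata(values, descending):
    """Pick one term per value (max for desc, min for asc), then sort on that term."""
    pick = max if descending else min
    return sorted(values, key=lambda v: pick(terms(v)), reverse=descending)


def sort_as_keyword(values, descending):
    """Sort on the whole original string, as a keyword field would."""
    return sorted(values, reverse=descending)


print(sort_as_text_fielddata(NAMES, descending=True))
print(sort_as_keyword(NAMES, descending=True))
```

The text-based order follows the surname ("tatum", "stephen", "paul", "james") rather than the full name, which is exactly the kind of result that rarely matches expectations.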

(2). Enabling fielddata

PUT /people/_mapping
{
  "properties": {
    "name": {
      "type": "text",
      "fielddata": true
    },
    "country": {
      "type": "keyword"
    },
    "age": {
      "type": "integer"
    },
    "birthday": {
      "type": "date",
      "format": "yyyy-MM-dd HH:mm:ss||yyyy-MM-dd||epoch_millis"
    },
    "description": {
      "type": "text"
    }
  }
}

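5. doc_values
(1). Overview
doc_values are enabled by default. If you know when creating the index that a field will never be used for sorting or aggregations, you can disable doc_values on it to speed up indexing and save disk space. Turning doc_values back on later requires a reindex.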

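(2). Disabling doc_values
doc_values cannot be switched off on an existing field, so set "doc_values": false when the index is created. The request below creates the people index with doc_values disabled on age:

PUT /people
{
  "mappings": {
    "properties": {
      "name": {
        "type": "text",
        "fielddata": true
      },
      "country": {
        "type": "keyword"
      },
      "age": {
        "type": "integer",
        "doc_values": false
      },
      "birthday": {
        "type": "date",
        "format": "yyyy-MM-dd HH:mm:ss||yyyy-MM-dd||epoch_millis"
      },
      "description": {
        "type": "text"
      }
    }
  }
}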

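6. Sorting in aggregations
(1). Overview
Buckets can be ordered with built-in keys: _count orders buckets by document count, and _key orders them by the bucket key (the field value). The basic syntax is: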

POST /employee/_search
{
  "size": 0,
  "aggs": {
    "aggs_name": {
      "terms": {
        "field": "job",
        "size": 10,
        "order": [
          { "_count": "desc" },
          { "_key": "desc" }
        ]
      }
    }
  }
}

Response:
{
  "took" : 6,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 7,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "aggregations" : {
    "aggs_name" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [
        {
          "key" : "Java engineer",
          "doc_count" : 4
        },
        {
          "key" : "Vue engineer",
          "doc_count" : 2
        },
        {
          "key" : "Technical director",
          "doc_count" : 1
        }
      ]
    }
  }
}
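The bucket ordering itself is easy to reproduce. The sketch below counts job values and orders the buckets by _count descending, breaking ties by _key descending, as in the request above. The job list is rebuilt from the doc_count values in the response (4, 2, 1) rather than read from the index, and terms_agg is a hypothetical, single-shard simplification that ignores how Elasticsearch merges per-shard counts.

```python
from collections import Counter

# Job values rebuilt from the doc_count figures in the response above.
JOBS = ["Java engineer"] * 4 + ["Vue engineer"] * 2 + ["Technical director"]


def terms_agg(values, size=10):
    """Count each value, then order by _count desc and _key desc (single shard)."""
    counts = Counter(values)
    # Both keys are descending, so one reverse sort on (count, key) handles them.
    ordered = sorted(counts.items(), key=lambda kv: (kv[1], kv[0]), reverse=True)
    return [{"key": k, "doc_count": c} for k, c in ordered[:size]]


for bucket in terms_agg(JOBS):
    print(bucket)
```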

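(2). Ordering by a sub-aggregation
Buckets can also be ordered by the value of a single-value metric sub-aggregation. The request below groups documents into salary buckets 5000 wide and orders the buckets by the average age in each bucket, descending.

POST /employee/_search
{
  "size": 0,
  "aggs": {
    "histogram_salary": {
      "histogram": {
        "field": "salary",
        "interval": 5000,
        "order": {
          "avg_age": "desc"
        }
      },
      "aggs": {
        "avg_age": {
          "avg": {
            "field": "age"
          }
        }
      }
    }
  }
}

Response:
{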
  "took" : 23,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 7,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "aggregations" : {
    "histogram_salary" : {
      "buckets" : [
        {
          "key" : 50000.0,
          "doc_count" : 1,
          "avg_age" : {
            "value" : 35.0
          }
        },
        {
          "key" : 25000.0,
          "doc_count" : 2,
          "avg_age" : {
            "value" : 31.5
          }
        },
        {
          "key" : 30000.0,
          "doc_count" : 1,
          "avg_age" : {
            "value" : 31.0
          }
        },
        {
          "key" : 20000.0,
          "doc_count" : 1,
          "avg_age" : {
            "value" : 27.0
          }
        },
        {
          "key" : 15000.0,
          "doc_count" : 2,
          "avg_age" : {
            "value" : 24.5
          }
        },
        {
          "key" : 35000.0,
          "doc_count" : 0,
          "avg_age" : {
            "value" : null
          }
        },
        {
          "key" : 40000.0,
          "doc_count" : 0,
          "avg_age" : {
            "value" : null
          }
        },
        {
          "key" : 45000.0,
          "doc_count" : 0,
          "avg_age" : {
            "value" : null
          }
        }
      ]
    }
  }
}
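The histogram bucketing and the ordering by avg_age can be simulated in Python. The sketch assumes bucket keys are computed as floor(salary / interval) * interval (no offset), that empty buckets between the lowest and highest key are kept, and that buckets without a value sort last with ties broken by ascending key, which matches the response above. Only the four documents from the section 2 response are used, because the other three documents in the index are not listed in this article, so the output differs from the response. The function name is hypothetical.

```python
import math
from statistics import mean

# (salary, age) of the four documents shown in section 2.
DOCS = [(15000.0, 24), (20000.0, 27), (30000.0, 31), (29000.0, 33)]


def histogram_by_avg_age(docs, interval):
    """Bucket salaries like a histogram aggregation, then order buckets by avg age desc."""
    buckets = {}
    for salary, age in docs:
        key = math.floor(salary / interval) * interval
        buckets.setdefault(key, []).append(age)

    # Keep empty buckets inside the [lowest, highest] key range.
    key, high = min(buckets), max(buckets)
    while key <= high:
        buckets.setdefault(key, [])
        key += interval

    result = [
        {"key": float(k), "doc_count": len(ages), "avg_age": mean(ages) if ages else None}
        for k, ages in buckets.items()
    ]
    # avg_age desc, buckets without a value last, ties by key asc.
    result.sort(key=lambda b: (b["avg_age"] is None, -(b["avg_age"] or 0), b["key"]))
    return result


for bucket in histogram_by_avg_age(DOCS, 5000):
    print(bucket)
```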

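(3). Ordering by a deeper nested aggregation
When the metric sits inside another single-bucket aggregation such as filter, refer to it with a path whose aggregation names are separated by >. Here age>avg_age points to the avg_age metric inside the age filter aggregation.

POST /employee/_search
{
  "size": 0,
  "aggs": {
    "histogram_salary": {
      "histogram": {
        "field": "salary",
        "interval": 5000,
        "order": {
          "age>avg_age": "desc"
        }
      },
      "aggs": {
        "age": {
          "filter": {
            "range": {
              "age": {
                "gte": 10
              }
            }
          },
          "aggs": {
            "avg_age": {
              "avg": {
                "field": "age"
              }
            }
          }
        }
      }
    }
  }
}

Response:
{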
  "took" : 6,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 7,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "aggregations" : {
    "histogram_salary" : {
      "buckets" : [
        {
          "key" : 50000.0,
          "doc_count" : 1,
          "age" : {
            "doc_count" : 1,
            "avg_age" : {
              "value" : 35.0
            }
          }
        },
        {
          "key" : 25000.0,
          "doc_count" : 2,
          "age" : {
            "doc_count" : 2,
            "avg_age" : {
              "value" : 31.5
            }
          }
        },
        {
          "key" : 30000.0,
          "doc_count" : 1,
          "age" : {
            "doc_count" : 1,
            "avg_age" : {
              "value" : 31.0
            }
          }
        },
        {
          "key" : 20000.0,
          "doc_count" : 1,
          "age" : {
            "doc_count" : 1,
            "avg_age" : {
              "value" : 27.0
            }
          }
        },
        {
          "key" : 15000.0,
          "doc_count" : 2,
          "age" : {
            "doc_count" : 2,
            "avg_age" : {
              "value" : 24.5
            }
          }
        },
        {
          "key" : 35000.0,
          "doc_count" : 0,
          "age" : {
            "doc_count" : 0,
            "avg_age" : {
              "value" : null
            }
          }
        },
        {
          "key" : 40000.0,
          "doc_count" : 0,
          "age" : {
            "doc_count" : 0,
            "avg_age" : {
              "value" : null
            }
          }
        },
        {
          "key" : 45000.0,
          "doc_count" : 0,
          "age" : {
            "doc_count" : 0,
            "avg_age" : {
              "value" : null
            }
          }
        }
      ]
    }
  }
}
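To see how a path such as age>avg_age reaches the metric, the sketch below walks the same nesting in the response JSON: histogram bucket, then the age filter aggregation, then the avg_age metric. It is a hypothetical helper working on three buckets copied from the response above, not Elasticsearch's own path resolver; it only handles the > separator, while real paths can also use . to pick one value of a multi-value metric.

```python
# Three buckets copied from the response above.
BUCKETS = [
    {"key": 35000.0, "doc_count": 0, "age": {"doc_count": 0, "avg_age": {"value": None}}},
    {"key": 25000.0, "doc_count": 2, "age": {"doc_count": 2, "avg_age": {"value": 31.5}}},
    {"key": 50000.0, "doc_count": 1, "age": {"doc_count": 1, "avg_age": {"value": 35.0}}},
]


def resolve_path(bucket, path):
    """Hypothetical helper: follow an 'agg>agg' path and return the metric's value."""
    node = bucket
    for name in path.split(">"):
        node = node[name]
    return node["value"]


def order_by_path(buckets, path):
    """Order buckets by the resolved value, descending; missing values go last."""
    def sort_key(bucket):
        value = resolve_path(bucket, path)
        return (value is None, -(value or 0.0), bucket["key"])
    return sorted(buckets, key=sort_key)


print([b["key"] for b in order_by_path(BUCKETS, "age>avg_age")])
```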