7.3 How Elasticsearch Works: Sorting

1.Introduction
By default, Elasticsearch sorts results by relevance score. Users can define their own ordering with the sort parameter.

2.query
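Find the documents whose job field matches "Java engineer", then sort them in descending order, using birthday as the first sort field, relevance score as the second, and document id as the third.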

POST /employee/_search
{
  "query": {
    "match": {
      "job": "Java engineer"
    }
  },
  "sort": [
    {
      "birthday": "desc"
    },
    {
      "_score": "desc"
    },
    {
      "_doc": "desc"
    }
  ]
}
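  • _score: the relevance score
  • _doc: the internal document id, which follows index order (unique within a shard, but the same value may appear in different shards)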
{
  "took" : 5,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 4,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [
      {
        "_index" : "employee",
        "_type" : "_doc",
        "_id" : "x4l1hnsBEsHOdz1YbMo-",
        "_score" : 0.2876821,
        "_source" : {
          "name" : "Jason Tatum",
          "job" : "Java engineer",
          "age" : 24,
          "salary" : 15000.0,
          "birthday" : "1997-08-02"
        },
        "sort" : [
          870480000000,
          0.2876821,
          0
        ]
      },
      {
        "_index" : "employee",
        "_type" : "_doc",
        "_id" : "wol1hnsBEsHOdz1YHsp3",
        "_score" : 0.6931472,
        "_source" : {
          "name" : "Stephen Curry",
          "job" : "Java engineer",
          "age" : 27,
          "salary" : 20000.0,
          "birthday" : "1995-08-06"
        },
        "sort" : [
          807667200000,
          0.6931472,
          0
        ]
      },
      {
        "_index" : "employee",
        "_type" : "_doc",
        "_id" : "wYl0hnsBEsHOdz1Y4cqT",
        "_score" : 0.47000363,
        "_source" : {
          "name" : "James Harden",
          "job" : "Java engineer",
          "age" : 31,
          "salary" : 30000.0,
          "birthday" : "1991-01-01"
        },
        "sort" : [
          662688000000,
          0.47000363,
          0
        ]
      },
      {
        "_index" : "employee",
        "_type" : "_doc",
        "_id" : "xol1hnsBEsHOdz1YXcqt",
        "_score" : 0.47000363,
        "_source" : {
          "name" : "Chirs Paul",
          "job" : "Java engineer",
          "age" : 33,
          "salary" : 29000.0,
          "birthday" : "1988-12-02"
        },
        "sort" : [
          597024000000,
          0.47000363,
          2
        ]
      }
    ]
  }
}
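
The ordering in the response above can be reproduced outside Elasticsearch. The Python sketch below only illustrates multi-key descending sorting and is not how Elasticsearch implements it. The `hits` list is copied from the `sort` arrays in the response, and the tuple layout is an assumption made for this example.

```python
# Each tuple mirrors one hit's "sort" array: (birthday in epoch millis, _score, _doc).
hits = [
    ("James Harden", (662688000000, 0.47000363, 0)),
    ("Chirs Paul", (597024000000, 0.47000363, 2)),
    ("Jason Tatum", (870480000000, 0.2876821, 0)),
    ("Stephen Curry", (807667200000, 0.6931472, 0)),
]

# Tuples compare element by element, so birthday decides first,
# _score breaks ties on birthday, and _doc breaks ties on both.
ordered = sorted(hits, key=lambda hit: hit[1], reverse=True)
names = [name for name, _ in ordered]
print(names)
```

Because every birthday here is distinct, the second and third sort fields never come into play, which is why the hit with the highest _score is not listed first.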

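3.Sorting strings
(1).Introduction
Sorting strings is a special case because Elasticsearch has two string types, text and keyword. Sorting on a text field produces the following error.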

POST /employee/_search
{
	"sort": [{
		"name": "desc"
	}]
}
{
  "error": {
    "root_cause": [
      {
        "type": "illegal_argument_exception",
        "reason": "Fielddata is disabled on text fields by default. Set fielddata=true on [name] in order to load fielddata in memory by uninverting the inverted index. Note that this can however use significant memory. Alternatively use a keyword field instead."
      }
    ],
    "type": "search_phase_execution_exception",
    "reason": "all shards failed",
    "phase": "query",
    "grouped": true,
    "failed_shards": [
      {
        "shard": 0,
        "index": "employee",
        "node": "0JDoSBAVQr29ZB1mytSQgw",
        "reason": {
          "type": "illegal_argument_exception",
          "reason": "Fielddata is disabled on text fields by default. Set fielddata=true on [name] in order to load fielddata in memory by uninverting the inverted index. Note that this can however use significant memory. Alternatively use a keyword field instead."
        }
      }
    ],
    "caused_by": {
      "type": "illegal_argument_exception",
      "reason": "Fielddata is disabled on text fields by default. Set fielddata=true on [name] in order to load fielddata in memory by uninverting the inverted index. Note that this can however use significant memory. Alternatively use a keyword field instead.",
      "caused_by": {
        "type": "illegal_argument_exception",
        "reason": "Fielddata is disabled on text fields by default. Set fielddata=true on [name] in order to load fielddata in memory by uninverting the inverted index. Note that this can however use significant memory. Alternatively use a keyword field instead."
      }
    }
  },
  "status": 400
}

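By default, fielddata is disabled on text fields. To get the expected sorting behavior you can set fielddata=true on the name field, so that fielddata is loaded into memory by uninverting the inverted index. Note that this can use a lot of memory, so mapping the name field as keyword is the recommended option.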

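(2).Solution
Sorting is essentially sorting on the original content of a field. The inverted index is of no use here. What is needed is a forward index, which returns the original content of a field quickly given a document id and a field name. Elasticsearch offers two implementations: fielddata, which is disabled by default, and doc_values, which is enabled by default (except on text fields). The two compare as follows.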

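| Comparison | fielddata | doc_values |
| Created when | On the fly at search time | At index time, together with the inverted index |
| Stored in | JVM heap | Disk |
| Advantage | Uses no extra disk space | Uses no heap memory |
| Disadvantage | With many documents, building it on the fly takes a long time and uses a lot of heap memory | Slows down indexing and uses extra disk space |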
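
The difference between the two structures can be pictured with a toy model. The Python sketch below is an illustration under simplifying assumptions (whitespace tokenization, plain dictionaries, hypothetical sample documents) and says nothing about how Lucene stores either structure internally.

```python
from collections import defaultdict

# Hypothetical documents: doc id -> original field value.
docs = {0: "java engineer", 1: "vue engineer", 2: "java engineer"}

# Inverted index: term -> ids of the documents that contain it (good for search).
inverted = defaultdict(set)
for doc_id, text in docs.items():
    for term in text.split():
        inverted[term].add(doc_id)

# doc_values-style: the forward index (doc id -> value) is written while indexing.
doc_values = dict(docs)

# fielddata-style: the forward index is rebuilt at search time by
# "uninverting" the inverted index, so it only sees analyzed terms.
def uninvert(index):
    forward = defaultdict(list)
    for term, doc_ids in index.items():
        for doc_id in doc_ids:
            forward[doc_id].append(term)
    return {doc_id: sorted(terms) for doc_id, terms in forward.items()}

fielddata = uninvert(inverted)
print(doc_values[1])
print(fielddata[1])
```

In this model the uninverting step touches every term of every document at query time, which matches the cost listed for fielddata in the comparison above.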

3.fielddata
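(1).Introduction
fielddata is disabled by default. Once it is enabled, strings are sorted by their analyzed terms, so the results often do not match expectations. It is generally enabled only for aggregations over analyzed terms. Note that fielddata applies only to text fields.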
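
Why term-based sorting surprises people is easier to see with a small example. This Python sketch assumes a simplified analyzer (lowercase, split on whitespace) in place of a real one, and relies on the fact that a descending sort on a multi-valued field uses the largest value per document (sort mode max). The names are taken from the response in section 2.

```python
names = ["Jason Tatum", "Stephen Curry", "James Harden", "Chirs Paul"]

def analyze(text):
    # Simplified stand-in for an analyzer: lowercase and split on whitespace.
    return text.lower().split()

# text + fielddata: each document is represented by its terms, and a
# descending sort compares the largest term of each document.
by_terms = sorted(names, key=lambda n: max(analyze(n)), reverse=True)

# keyword: the whole unanalyzed string is compared.
by_keyword = sorted(names, reverse=True)

print(by_terms)
print(by_keyword)
```

With terms, "Jason Tatum" comes first because of "tatum", whereas a keyword sort puts "Stephen Curry" first.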

(2).Enabling

PUT /people/_mapping
{
  "properties": {
    "name": {
      "type": "text",
      "fielddata": true
    },
    "country": {
      "type": "keyword"
    },
    "age": {
      "type": "integer"
    },
    "birthday": {
      "type": "date",
      "format": "yyyy-MM-dd HH:mm:ss||yyyy-MM-dd||epoch_millis"
    },
    "description": {
      "type": "text"
    }
  }
}

4.doc_values
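(1).Introduction
doc_values is enabled by default. If you know when creating the index that a field will not be used for sorting or aggregations, you can disable it to speed up indexing and save disk space. Turning doc_values back on later requires a reindex.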

(2).Disabling

PUT /people/_mapping
{
  "properties": {
    "name": {
      "type": "text",
      "fielddata": true
    },
    "country": {
      "type": "keyword"
    },
    "age": {
      "type": "integer",
      "doc_values": false
    },
    "birthday": {
      "type": "date",
      "format": "yyyy-MM-dd HH:mm:ss||yyyy-MM-dd||epoch_millis"
    },
    "description": {
      "type": "text"
    }
  }
}

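5.Sorting in aggregations
(1).Introduction
Buckets can be ordered with built-in keys: _count orders by document count and _key orders by the field value. The basic syntax is as follows.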

POST /employee/_search
{
	"size": 0,
	"aggs": {
		"aggs_name": {
			"terms": {
				"field": "job",
				"size": 10,
				"order": [{
						"_count": "desc"
					},
					{
						"_key": "desc"
					}
				]
			}
		}
	}
}
{
  "took" : 6,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 7,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "aggregations" : {
    "aggs_name" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [
        {
          "key" : "Java engineer",
          "doc_count" : 4
        },
        {
          "key" : "Vue engineer",
          "doc_count" : 2
        },
        {
          "key" : "Technical director",
          "doc_count" : 1
        }
      ]
    }
  }
}
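
The bucket order in this response follows from the two order keys. The Python sketch below imitates that ordering on hypothetical input: the `jobs` list is made up so that its counts match the response, and it is not the data actually stored in the index.

```python
from collections import Counter

# Hypothetical job values chosen to match the bucket counts in the response above.
jobs = ["Java engineer"] * 4 + ["Vue engineer"] * 2 + ["Technical director"]

counts = Counter(jobs)

# Equivalent of "order": [{"_count": "desc"}, {"_key": "desc"}]:
# larger counts first, ties broken by key in descending order.
buckets = sorted(counts.items(), key=lambda kv: (kv[1], kv[0]), reverse=True)

# "size": 10 keeps at most ten buckets.
top = buckets[:10]
print(top)
```

The second key only matters when two buckets have the same document count, which does not happen with this data.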

(2).order

POST /employee/_search
{
	"size": 0,
	"aggs": {
		"histogram_salary": {
			"histogram": {
				"field": "salary",
				"interval": 5000,
				"order": {
					"avg_age": "desc"
				}
			},
			"aggs": {
				"avg_age": {
					"avg": {
						"field": "age"
					}
				}
			}
		}
	}
}
{
  "took" : 23,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 7,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "aggregations" : {
    "histogram_salary" : {
      "buckets" : [
        {
          "key" : 50000.0,
          "doc_count" : 1,
          "avg_age" : {
            "value" : 35.0
          }
        },
        {
          "key" : 25000.0,
          "doc_count" : 2,
          "avg_age" : {
            "value" : 31.5
          }
        },
        {
          "key" : 30000.0,
          "doc_count" : 1,
          "avg_age" : {
            "value" : 31.0
          }
        },
        {
          "key" : 20000.0,
          "doc_count" : 1,
          "avg_age" : {
            "value" : 27.0
          }
        },
        {
          "key" : 15000.0,
          "doc_count" : 2,
          "avg_age" : {
            "value" : 24.5
          }
        },
        {
          "key" : 35000.0,
          "doc_count" : 0,
          "avg_age" : {
            "value" : null
          }
        },
        {
          "key" : 40000.0,
          "doc_count" : 0,
          "avg_age" : {
            "value" : null
          }
        },
        {
          "key" : 45000.0,
          "doc_count" : 0,
          "avg_age" : {
            "value" : null
          }
        }
      ]
    }
  }
}
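
Ordering buckets by a sub-aggregation can be sketched in a few lines. The Python example below uses only the four documents shown in full in section 2, so its averages differ from the seven-document response above. It is an illustration of the idea rather than Elasticsearch's implementation, and empty buckets are not generated.

```python
from collections import defaultdict

# (salary, age) pairs taken from the four documents shown in section 2.
employees = [(15000.0, 24), (20000.0, 27), (30000.0, 31), (29000.0, 33)]
interval = 5000

# Histogram bucketing: round each salary down to a multiple of the interval.
ages_by_bucket = defaultdict(list)
for salary, age in employees:
    key = (salary // interval) * interval
    ages_by_bucket[key].append(age)

# Sub-aggregation: average age per bucket.
avg_age = {key: sum(ages) / len(ages) for key, ages in ages_by_bucket.items()}

# "order": {"avg_age": "desc"} sorts buckets by that metric, not by key.
ordered = sorted(avg_age.items(), key=lambda kv: kv[1], reverse=True)
print(ordered)
```

The salary of 29000 lands in the 25000 bucket, and that bucket comes first because it has the highest average age.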

(3).Deeper nesting

POST /employee/_search
{
	"size": 0,
	"aggs": {
		"histogram_salary": {
			"histogram": {
				"field": "salary",
				"interval": 5000,
				"order": {
					"age>avg_age": "desc"
				}
			},
			"aggs": {
				"age": {
					"filter": {
						"range": {
							"age": {
								"gte": 10
							}
						}
					},
					"aggs": {
						"avg_age": {
							"avg": {
								"field": "age"
							}
						}
					}
				}
			}
		}
	}
}
{
  "took" : 6,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 7,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "aggregations" : {
    "histogram_salary" : {
      "buckets" : [
        {
          "key" : 50000.0,
          "doc_count" : 1,
          "age" : {
            "doc_count" : 1,
            "avg_age" : {
              "value" : 35.0
            }
          }
        },
        {
          "key" : 25000.0,
          "doc_count" : 2,
          "age" : {
            "doc_count" : 2,
            "avg_age" : {
              "value" : 31.5
            }
          }
        },
        {
          "key" : 30000.0,
          "doc_count" : 1,
          "age" : {
            "doc_count" : 1,
            "avg_age" : {
              "value" : 31.0
            }
          }
        },
        {
          "key" : 20000.0,
          "doc_count" : 1,
          "age" : {
            "doc_count" : 1,
            "avg_age" : {
              "value" : 27.0
            }
          }
        },
        {
          "key" : 15000.0,
          "doc_count" : 2,
          "age" : {
            "doc_count" : 2,
            "avg_age" : {
              "value" : 24.5
            }
          }
        },
        {
          "key" : 35000.0,
          "doc_count" : 0,
          "age" : {
            "doc_count" : 0,
            "avg_age" : {
              "value" : null
            }
          }
        },
        {
          "key" : 40000.0,
          "doc_count" : 0,
          "age" : {
            "doc_count" : 0,
            "avg_age" : {
              "value" : null
            }
          }
        },
        {
          "key" : 45000.0,
          "doc_count" : 0,
          "age" : {
            "doc_count" : 0,
            "avg_age" : {
              "value" : null
            }
          }
        }
      ]
    }
  }
}
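
The order path "age>avg_age" names the aggregations to walk through, separated by ">". The Python sketch below resolves such a path against three buckets copied from the response above. The helper names `resolve` and `sort_key` are hypothetical, and placing empty buckets last simply mirrors what the response shows.

```python
# Three buckets copied from the response above, in scrambled order.
buckets = [
    {"key": 35000.0, "age": {"doc_count": 0, "avg_age": {"value": None}}},
    {"key": 15000.0, "age": {"doc_count": 2, "avg_age": {"value": 24.5}}},
    {"key": 50000.0, "age": {"doc_count": 1, "avg_age": {"value": 35.0}}},
]

def resolve(bucket, path):
    # Walk the bucket one aggregation name at a time, e.g. "age>avg_age".
    node = bucket
    for name in path.split(">"):
        node = node[name]
    return node["value"]

def sort_key(bucket):
    value = resolve(bucket, "age>avg_age")
    # Empty buckets have no average; place them last, as in the response.
    return (value is not None, value if value is not None else 0.0)

ordered = sorted(buckets, key=sort_key, reverse=True)
keys = [b["key"] for b in ordered]
print(keys)
```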