小编典典
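It's not ideal, but I think it does what you need.
Assuming field1 is the field you use to decide whether documents are "duplicates", change its mapping as follows: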
PUT /lastseen
{
  "mappings": {
    "test": {
      "properties": {
        "field1": {
          "type": "string",
          "fields": {
            "raw": {
              "type": "string",
              "index": "not_analyzed"
            }
          }
        },
        "field2": {
          "type": "string"
        },
        "lastseen": {
          "type": "long"
        }
      }
    }
  }
}
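That is, you add a .raw subfield that is not_analyzed, which means its value is indexed as-is instead of being analyzed and broken into terms. This is what makes a rough form of "duplicate document detection" possible.
Then use a terms aggregation on field1.raw (to group the duplicates) with a top_hits sub-aggregation to get a single document for each field1 value: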
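To illustrate why the .raw subfield matters, here is a rough Python sketch contrasting the two indexing modes. `analyze_standard` is a hypothetical, heavily simplified stand-in for an analyzed string field (lowercase, split on non-alphanumeric characters); Elasticsearch's real standard analyzer does more. A not_analyzed field keeps the whole value as a single term, so a terms aggregation on it buckets by the complete field1 value rather than by individual words.

```python
import re


def analyze_standard(value: str) -> list:
    """Simplified stand-in for an analyzed string field: lowercase, then split into word terms."""
    return re.findall(r"[a-z0-9]+", value.lower())


def analyze_not_analyzed(value: str) -> list:
    """A not_analyzed field indexes the exact value as one single term."""
    return [value]


docs = [
    "dinner carrot potato broccoli",
    "dinner carrot potato broccoli",
    "fish chicken something",
]

# Distinct terms a terms aggregation would bucket on, for each indexing mode
analyzed_terms = sorted({t for d in docs for t in analyze_standard(d)})
raw_terms = sorted({t for d in docs for t in analyze_not_analyzed(d)})

print(analyzed_terms)  # one bucket per word: useless for finding duplicates
print(raw_terms)       # one bucket per full value: duplicates fall into the same bucket
```

With the analyzed field you would get seven single-word buckets; with the raw subfield you get exactly two, one per distinct field1 value.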
GET /lastseen/test/_search
{
  "size": 0,
  "query": {
    "query_string": {
      "query": "dinner"
    }
  },
  "aggs": {
    "field1_unique": {
      "terms": {
        "field": "field1.raw",
        "size": 2
      },
      "aggs": {
        "first_one": {
          "top_hits": {
            "size": 1,
            "sort": [{"lastseen": {"order": "desc"}}]
          }
        }
      }
    }
  }
}
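Also, the single document returned by top_hits is the one with the highest lastseen, which is what "sort": [{"lastseen": {"order":"desc"}}] ensures.
The results you get look like this (under aggregations, not hits):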
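The logic of this aggregation can be approximated in plain Python: group the matching documents by their exact field1 value, order the buckets by document count (the terms aggregation's default order), keep the first `size` buckets, and within each bucket keep only the document with the highest lastseen. This is a local illustration under stated assumptions, not how Elasticsearch executes the query: the function `dedupe_latest` is hypothetical, the query_string filter is skipped (the sample documents are assumed to already match "dinner"), and the sample documents are made up except for the two that appear in the output below.

```python
from collections import defaultdict


def dedupe_latest(docs, key="field1", sort_field="lastseen", size=2):
    """Mimic a terms aggregation on `key` (top `size` buckets) with a
    top_hits sub-aggregation of size 1 sorted by `sort_field` descending."""
    buckets = defaultdict(list)
    for doc in docs:
        buckets[doc[key]].append(doc)  # exact value, like a not_analyzed field
    # Order by doc_count descending; ties broken by key here for determinism
    ordered = sorted(buckets.items(), key=lambda kv: (-len(kv[1]), kv[0]))[:size]
    result = []
    for value, group in ordered:
        latest = max(group, key=lambda d: d[sort_field])  # the single "top hit"
        result.append({"key": value, "doc_count": len(group), "top_hit": latest})
    return result


# Hypothetical sample documents, assumed to already match the "dinner" query
docs = [
    {"field1": "dinner carrot potato broccoli", "field2": "something here", "lastseen": 1000},
    {"field1": "dinner carrot potato broccoli", "field2": "something else", "lastseen": 500},
    {"field1": "fish chicken something", "field2": "dinner", "lastseen": 2000},
    {"field1": "fish chicken something", "field2": "dinner", "lastseen": 1500},
]

for bucket in dedupe_latest(docs):
    print(bucket["key"], bucket["doc_count"], bucket["top_hit"]["lastseen"])
```

Note that, as with the real terms aggregation, `size` caps the number of distinct values returned, so it needs to be large enough to cover every duplicate group you care about.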
...
"aggregations": {
  "field1_unique": {
    "doc_count_error_upper_bound": 0,
    "sum_other_doc_count": 0,
    "buckets": [
      {
        "key": "dinner carrot potato broccoli",
        "doc_count": 2,
        "first_one": {
          "hits": {
            "total": 2,
            "max_score": null,
            "hits": [
              {
                "_index": "lastseen",
                "_type": "test",
                "_id": "AU60ZObtjKWeJgeyudI-",
                "_score": null,
                "_source": {
                  "field1": "dinner carrot potato broccoli",
                  "field2": "something here",
                  "lastseen": 1000
                },
                "sort": [
                  1000
                ]
              }
            ]
          }
        }
      },
      {
        "key": "fish chicken something",
        "doc_count": 2,
        "first_one": {
          "hits": {
            "total": 2,
            "max_score": null,
            "hits": [
              {
                "_index": "lastseen",
                "_type": "test",
                "_id": "AU60ZObtjKWeJgeyudJA",
                "_score": null,
                "_source": {
                  "field1": "fish chicken something",
                  "field2": "dinner",
                  "lastseen": 2000
                },
                "sort": [
                  2000
                ]
              }
            ]
          }
        }
      }
    ]
  }
}
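On the client side, the deduplicated documents are therefore read from the buckets, not from the top-level hits. A minimal Python sketch, assuming the response body has already been parsed into a dict (here with `json.loads` on a trimmed copy of the response above); the function name `unique_sources` is hypothetical, while `field1_unique` and `first_one` are the aggregation names used in the query.

```python
import json


def unique_sources(response: dict) -> list:
    """Return the _source of the single top hit in each field1_unique bucket."""
    buckets = response["aggregations"]["field1_unique"]["buckets"]
    return [b["first_one"]["hits"]["hits"][0]["_source"] for b in buckets]


# Trimmed-down copy of the response shown above (only the fields used here)
raw = """{"aggregations": {"field1_unique": {"buckets": [
  {"key": "dinner carrot potato broccoli", "doc_count": 2,
   "first_one": {"hits": {"hits": [{"_source": {"field1": "dinner carrot potato broccoli",
       "field2": "something here", "lastseen": 1000}}]}}},
  {"key": "fish chicken something", "doc_count": 2,
   "first_one": {"hits": {"hits": [{"_source": {"field1": "fish chicken something",
       "field2": "dinner", "lastseen": 2000}}]}}}
]}}}"""

for src in unique_sources(json.loads(raw)):
    print(src["field1"], src["lastseen"])
```

Indexing `[0]` is safe here because a top_hits sub-aggregation with size 1 returns exactly one hit for every non-empty bucket.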
2020-06-22