Using ElasticsearchCRUD (16): Elasticsearch Aggregations

This article shows how to build Elasticsearch aggregation search requests and read their responses with ElasticsearchCRUD.

Elasticsearch Aggregations

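The Elasticsearch aggregations API lets you summarize, calculate, and group data in near real time. An aggregation can contain sub-aggregations, which can in turn contain further sub-aggregations as required, so the API is very flexible. ElasticsearchCRUD supports the following aggregations: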

Min Aggregation, Max Aggregation, Sum Aggregation, Avg Aggregation, Stats Aggregation, Extended Stats Aggregation, Value Count Aggregation, Percentiles Aggregation, Percentile Ranks Aggregation, Cardinality Aggregation, Geo Bounds Aggregation, Top Hits Aggregation, Scripted Metric Aggregation, Global Aggregation, Filter Aggregation, Filters Aggregation, Filters Named Aggregation, Missing Aggregation, Nested Aggregation, Reverse Nested Aggregation, Children Aggregation, Terms Aggregation, Significant Terms Aggregation, Range Aggregation, Date Range Aggregation, Histogram Aggregation, Date Histogram Aggregation, Geo Distance Aggregation, GeoHash Grid Aggregation.

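These aggregations fall into two groups: metric aggregations and bucket aggregations. A metric aggregation returns either a single value or multiple values; a bucket aggregation returns either a single bucket or multiple buckets. Bucket aggregations can contain sub-aggregations.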

Metric aggregations
- Single-value metric
- Multi-value metric

Bucket aggregations
- Single-bucket aggregation
- Multi-bucket aggregation
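
To make the distinction concrete, here is a minimal Python sketch that is independent of both Elasticsearch and ElasticsearchCRUD. It computes a single-value metric, a multi-value metric, and a multi-bucket grouping over a small in-memory list. The sample records and the function names are made up for illustration only.

```python
from collections import Counter

# Made-up sample documents; not taken from the article's index.
people = [
    {"firstname": "james", "age": 31},
    {"firstname": "julia", "age": 27},
    {"firstname": "james", "age": 45},
]

def max_metric(docs, field):
    """Single-value metric: one number for the whole document set."""
    return max(d[field] for d in docs)

def stats_metric(docs, field):
    """Multi-value metric: several numbers for the whole document set."""
    values = [d[field] for d in docs]
    return {
        "count": len(values),
        "min": min(values),
        "max": max(values),
        "sum": sum(values),
        "avg": sum(values) / len(values),
    }

def terms_buckets(docs, field):
    """Multi-bucket aggregation: one bucket (key + doc_count) per distinct value."""
    counts = Counter(d[field] for d in docs)
    return [{"key": k, "doc_count": c} for k, c in counts.most_common()]

print(max_metric(people, "age"))
print(stats_metric(people, "age"))
print(terms_buckets(people, "firstname"))
```

A sub-aggregation corresponds to running one of the metric functions again on the documents of each bucket, which is why only bucket aggregations can have children.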

Example of a Terms Bucket Aggregation

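A terms bucket aggregation is a multi-bucket aggregation based on a single field. The following example creates an aggregation named testFirstName on the firstname field of the person type in the persons index.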

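SeachType (the property name as it appears in the code) is set to count so that the search response contains no hits. The aggregation result can then be converted directly from the JSON object into a TermsBucketAggregationsResult by using the aggregation name testFirstName.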

TermsBucketAggregationsResult aggResult;
var search = new Search
{
    Aggs = new List<IAggs>
    {
        new TermsBucketAggregation("testFirstName", "firstname")
        {
            Size = 20
        }
    }
};

using (var context = new ElasticsearchContext(ConnectionString, ElasticsearchMappingResolver))
{
    var items = context.Search<Person>(
        search, 
        new SearchUrlParameters 
        { 
            SeachType = SeachType.count 
        });

    aggResult = 
        items.PayloadResult.Aggregations.GetComplexValue<TermsBucketAggregationsResult>("testFirstName");
}

The request is sent as follows:

POST http://localhost:9200/persons/person/_search?&search_type=count HTTP/1.1
Content-Type: application/json
Host: localhost:9200
Content-Length: 68
Expect: 100-continue
Connection: Keep-Alive

{
    "aggs": {
        "testFirstName": {
            "terms": {
                "field": "firstname",
                "size": 20 }
        }
    }
}
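
For readers who want to reproduce this request body without the C# client, the following Python sketch builds the same JSON from a plain dictionary. It uses only the standard library; the variable names are illustrative and nothing here is part of ElasticsearchCRUD.

```python
import json

# Same aggregation name, field and size as the C# example above.
body = {
    "aggs": {
        "testFirstName": {
            "terms": {"field": "firstname", "size": 20}
        }
    }
}

# Compact separators drop the whitespace that was added above for readability.
payload = json.dumps(body, separators=(",", ":"))
print(payload)
print(len(payload.encode("utf-8")))
```

The compact payload is 68 bytes long, which agrees with the Content-Length: 68 header in the captured request and suggests that the client sends the body without any whitespace.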

The result is:

HTTP/1.1 200 OK
Content-Type: application/json; charset=UTF-8
Content-Length: 875

{
    "took": 3,
    "timed_out": false,
    "_shards": {
        "total": 5,
        "successful": 5,
        "failed": 0
    },
    "hits": {
        "total": 19972,
        "max_score": 0.0,
        "hits": []
    },
    "aggregations": {
        "testFirstName": {
            "doc_count_error_upper_bound": 50,
            "sum_other_doc_count": 18159,
            "buckets": [{
                "key": "katherine",
                "doc_count": 99 },
            {
                "key": "james",
                "doc_count": 97 },
            {
                "key": "marcus",
                "doc_count": 97 },
            {
                "key": "alexandra",
                "doc_count": 93 },
            {
                "key": "dalton",
                "doc_count": 93 },
            {
                "key": "lucas",
                "doc_count": 93 },
            {
                "key": "morgan",
                "doc_count": 93 },
            {
                "key": "richard",
                "doc_count": 93 },
            {
                "key": "isabella",
                "doc_count": 92 },
            {
                "key": "seth",
                "doc_count": 92 },
            {
                "key": "natalie",
                "doc_count": 91 },
            {
                "key": "eduardo",
                "doc_count": 90 },
            {
                "key": "kaitlyn",
                "doc_count": 90 },
            {
                "key": "robert",
                "doc_count": 90 },
            {
                "key": "sydney",
                "doc_count": 90 },
            {
                "key": "ian",
                "doc_count": 89 },
            {
                "key": "julia",
                "doc_count": 89 },
            {
                "key": "chloe",
                "doc_count": 88 },
            {
                "key": "xavier",
                "doc_count": 88 },
            {
                "key": "david",
                "doc_count": 87 }]
        }
    }
}
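
In the response, each bucket carries a key (the term) and a doc_count, with buckets ordered by descending count, while sum_other_doc_count covers the terms that did not make it into the returned buckets. As a rough illustration of how such a response can be read outside of C#, here is a Python sketch that works on a trimmed copy of the JSON above. The helper name read_terms_buckets is hypothetical.

```python
import json

# Trimmed copy of the response above: only the first three buckets are kept.
response_text = """
{
  "hits": {"total": 19972, "max_score": 0.0, "hits": []},
  "aggregations": {
    "testFirstName": {
      "doc_count_error_upper_bound": 50,
      "sum_other_doc_count": 18159,
      "buckets": [
        {"key": "katherine", "doc_count": 99},
        {"key": "james", "doc_count": 97},
        {"key": "marcus", "doc_count": 97}
      ]
    }
  }
}
"""

def read_terms_buckets(response, agg_name):
    """Return (key, doc_count) pairs for a terms aggregation, in response order."""
    agg = response["aggregations"][agg_name]
    return [(b["key"], b["doc_count"]) for b in agg["buckets"]]

response = json.loads(response_text)
for key, count in read_terms_buckets(response, "testFirstName"):
    print(f"{key}: {count}")
```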

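Top hits can also be added to a bucket aggregation as a sub-aggregation. In the following example, the Aggs property of the TermsBucketAggregation contains a single TopHitsMetricAggregation.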

var search = new Search
{
    Aggs = new List<IAggs>
    {
        new TermsBucketAggregation("testLastName", "lastname")
        {
            Size = 5,
            Aggs = new List<IAggs>
            {
                new TopHitsMetricAggregation("tophits")
                {
                    Size = 2
                }
            }
        }
    }
};

The code above is sent to Elasticsearch as follows:

POST http://localhost:9200/persons/person/_search?&search_type=count HTTP/1.1
Content-Type: application/json
Host: localhost:9200
Content-Length: 108
Expect: 100-continue

{
    "aggs": {
        "testLastName": {
            "terms": {
                "field": "lastname",
                "size": 5 },
            "aggs": {
                "tophits": { "top_hits": { "size": 2 } } }
        }
    }
}
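
The nesting rule visible in this request is that a sub-aggregation sits in an "aggs" object next to the parent's own definition, not inside it. The Python sketch below shows one possible way to build such a body recursively. The agg helper is a hypothetical function written for this illustration, not part of any library.

```python
import json

def agg(name, kind, params, sub_aggs=None):
    """Build one named aggregation; sub_aggs is a list of dicts built the same way."""
    node = {kind: params}
    if sub_aggs:
        # Children go into a sibling "aggs" object of the parent definition.
        node["aggs"] = {}
        for sub in sub_aggs:
            node["aggs"].update(sub)
    return {name: node}

top_hits = agg("tophits", "top_hits", {"size": 2})
terms = agg("testLastName", "terms", {"field": "lastname", "size": 5}, [top_hits])
body = {"aggs": terms}

payload = json.dumps(body, separators=(",", ":"))
print(payload)
print(len(payload))
```

Serialized without whitespace, this body is 108 characters long, the same as the Content-Length header of the captured request.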

The TopHitsMetricAggregationsResult class can be used to access the hits of each bucket. The name tophits is the aggregation name configured in the search request.

var hits = childbucket.GetSubAggregationsFromJTokenName<TopHitsMetricAggregationsResult<Person>>("tophits");

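Multi-bucket aggregations can also be added as sub-aggregations. The following example adds a SignificantTermsBucketAggregation to a TermsBucketAggregation, which can be used to find all persons that have the same first name and last name. The SignificantTermsBucketAggregation itself contains a top hits sub-aggregation.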

var search = new Search
{
    Aggs = new List<IAggs>
    {
        new TermsBucketAggregation("testLastName", "lastname")
        {
            Size = 0,
            Aggs = new List<IAggs>
            {
                new SignificantTermsBucketAggregation("testFirstName", "firstname")
                {
                    Size = 20,
                    Aggs = new List<IAggs>
                    {
                        new TopHitsMetricAggregation("tophits")
                        {
                            Size = 20
                        }
                    }
                }
            }
        }
    }
};

The request is sent as follows:

POST http://localhost:9200/persons/person/_search?&search_type=count HTTP/1.1
Content-Type: application/json
Host: localhost:9200
Content-Length: 188
Expect: 100-continue

{
    "aggs": {
        "testLastName": {
            "terms": {
                "field": "lastname",
                "size": 0 },
            "aggs": {
                "testFirstName": { "significant_terms": { "field": "firstname", "size": 20 }, "aggs": { "tophits": { "top_hits": { "size": 20 } } } } }
        }
    }
}

The results can be displayed in a console application as follows:

// Write the aggregation results to the console
foreach (var bucket in aggResult.Buckets)
{
    var significantTermsBucketAggregationsResult = bucket.GetSubAggregationsFromJTokenName<SignificantTermsBucketAggregationsResult>("testFirstName");

    foreach (var childbucket in significantTermsBucketAggregationsResult.Buckets)
    {
        bool writeHeader = true;
        var hits = childbucket.GetSubAggregationsFromJTokenName<TopHitsMetricAggregationsResult<Person>>("tophits");
        foreach (var hit in hits.Hits.HitsResult)
        {
            if (writeHeader)
            {
                Console.Write("\n{0} {1}, Found Ids: ", hit.Source.FirstName, hit.Source.LastName);
            }
            Console.Write("{0} ", hit.Id);
            writeHeader = false;
        }
    }
}
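
The article does not show the raw response for this nested request. To illustrate the traversal that the C# loop performs, the Python sketch below walks a hypothetical, heavily trimmed response. All names, ids, and counts in it are invented; only the nesting of buckets, sub-aggregation names, and hits follows the request above.

```python
# Hypothetical, heavily trimmed response for the nested request above.
# The names, ids and counts are invented; only the nesting is the point.
response = {
    "aggregations": {
        "testLastName": {
            "buckets": [
                {
                    "key": "smith",
                    "doc_count": 2,
                    "testFirstName": {
                        "doc_count": 2,
                        "buckets": [
                            {
                                "key": "anna",
                                "doc_count": 2,
                                "tophits": {
                                    "hits": {
                                        "total": 2,
                                        "hits": [
                                            {"_id": "11", "_source": {"firstname": "Anna", "lastname": "Smith"}},
                                            {"_id": "42", "_source": {"firstname": "Anna", "lastname": "Smith"}},
                                        ],
                                    }
                                },
                            }
                        ],
                    },
                }
            ]
        }
    }
}

def collect_ids(response):
    """Return one 'First Last, Found Ids: ...' line per inner bucket."""
    lines = []
    for bucket in response["aggregations"]["testLastName"]["buckets"]:
        for child in bucket["testFirstName"]["buckets"]:
            hits = child["tophits"]["hits"]["hits"]
            if not hits:
                continue
            source = hits[0]["_source"]
            ids = " ".join(hit["_id"] for hit in hits)
            lines.append(f'{source["firstname"]} {source["lastname"]}, Found Ids: {ids}')
    return lines

for line in collect_ids(response):
    print(line)
```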

Example with DateRangeBucketAggregation and ExtendedStatsMetricAggregation

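This example shows how to get extended statistics for all documents in the index and, by using a DateRangeBucketAggregation, for specific years. The DateRangeBucketAggregation contains a sub-aggregation with one extended stats multi-value metric aggregation.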

var search = new Search
{
    Aggs = new List<IAggs>
    {
        new ExtendedStatsMetricAggregation("stats", "modifieddate"),
        new DateRangeBucketAggregation("testRangesBucketAggregation", "modifieddate", "MM-yyy", new List<RangeAggregationParameter<string>>
        {
            new ToRangeAggregationParameter<string>("now-10y/y"),
            new ToFromRangeAggregationParameter<string>("now-8y/y", "now-9y/y"),
            new ToFromRangeAggregationParameter<string>("now-7y/y", "now-8y/y"),
            new ToFromRangeAggregationParameter<string>("now-6y/y", "now-7y/y"),
            new ToFromRangeAggregationParameter<string>("now-5y/y", "now-6y/y"),
            new FromRangeAggregationParameter<string>("now-5y/y")
        })
        {
            Aggs = new List<IAggs>
            {
                new ExtendedStatsMetricAggregation("stats", "modifieddate")
            } 
        }
    }
};

This request is sent to Elasticsearch:

POST http://localhost:9200/persons/person/_search?&search_type=count HTTP/1.1
Content-Type: application/json
Host: localhost:9200
Content-Length: 405
Expect: 100-continue
Connection: Keep-Alive

{
    "aggs": {
        "stats": {
            "extended_stats": {
                "field": "modifieddate" }
        },
        "testRangesBucketAggregation": {
            "date_range": {
                "field": "modifieddate",
                "format": "MM-yyy",
                "ranges": [{ "to": "now-10y/y" }, { "to": "now-8y/y", "from": "now-9y/y" }, { "to": "now-7y/y", "from": "now-8y/y" }, { "to": "now-6y/y", "from": "now-7y/y" }, { "to": "now-5y/y", "from": "now-6y/y" }, { "from": "now-5y/y" }] },
            "aggs": {
                "stats": { "extended_stats": { "field": "modifieddate" } } }
        }
    }
}
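
The range boundaries use date math: "now-10y/y" means the current time minus ten years, rounded down to the start of the year. The Python sketch below resolves just this one pattern in order to show which calendar boundaries the ranges describe. The reference date is an assumption (the article does not state when the request ran; the bucket keys in the response suggest 2015), and resolve_year_math is a hypothetical helper, not an Elasticsearch API.

```python
import re
from datetime import datetime, timezone

def resolve_year_math(expr, now):
    """Resolve 'now-<N>y/y': subtract N years, then round down to 1 January.

    Only this one pattern is handled; real date math supports many more units.
    """
    match = re.fullmatch(r"now-(\d+)y/y", expr)
    if match is None:
        raise ValueError(f"unsupported expression: {expr}")
    years = int(match.group(1))
    return datetime(now.year - years, 1, 1, tzinfo=timezone.utc)

# Assumed reference time; the article does not state when the request ran.
now = datetime(2015, 3, 1, tzinfo=timezone.utc)

for expr in ["now-10y/y", "now-9y/y", "now-8y/y", "now-5y/y"]:
    boundary = resolve_year_math(expr, now)
    millis = int(boundary.timestamp() * 1000)
    print(expr, boundary.strftime("%m-%Y"), millis)
```

Under this assumption the first range ends at the start of 2005 and the second begins at the start of 2006, so documents modified during 2005 do not fall into any of the configured ranges.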

This returns one set of statistics for the whole index, plus one result per date range as a sub-aggregation.

HTTP/1.1 200 OK
Content-Type: application/json; charset=UTF-8
Content-Length: 2280

{
    "took": 16,
    "timed_out": false,
    "_shards": {
        "total": 5,
        "successful": 5,
        "failed": 0
    },
    "hits": {
        "total": 19972,
        "max_score": 0.0,
        "hits": []
    },
    "aggregations": {
        "stats": {
            "count": 19972,
            "min": 9.643968E11,
            "max": 1.242491613123E12,
            "avg": 1.1821553155683428E12,
            "sum": 2.3610005962530944E16,
            "sum_of_squares": 2.793293745713814E28,
            "variance": 1.1137296180611115E21,
            "std_deviation": 3.3372587823857944E10
        },
        "testRangesBucketAggregation": {
            "buckets": [{
                "key": "*-01-2005",
                "to": 1.1045376E12,
                "to_as_string": "01-2005",
                "doc_count": 527,
                "stats": { "count": 527, "min": 9.643968E11, "max": 1.1043648E12, "avg": 1.0483479256166982E12, "sum": 5.524793568E14, "sum_of_squares": 5.79347110411684E26, "variance": 2.97007142991014E20, "std_deviation": 1.7233895177556755E10 } },
            {
                "key": "01-2006-01-2007",
                "from": 1.1360736E12,
                "from_as_string": "01-2006",
                "to": 1.1676096E12,
                "to_as_string": "01-2007",
                "doc_count": 3071,
                "stats": { "count": 3071, "min": 1.1360736E12, "max": 1.1675232E12, "avg": 1.1524215434711821E12, "sum": 3.53908656E15, "sum_of_squares": 4.0787668666288754E27, "variance": 8.051796664238183E19, "std_deviation": 8.97318040843835E9 } },
            {
                "key": "01-2007-01-2008",
                "from": 1.1676096E12,
                "from_as_string": "01-2007",
                "to": 1.1991456E12,
                "to_as_string": "01-2008",
                "doc_count": 7958,
                "stats": { "count": 7958, "min": 1.1676096E12, "max": 1.1990592E12, "avg": 1.188431685147022E12, "sum": 9.4575393504E15, "sum_of_squares": 1.1240140107520037E28, "variance": 6.291530282695307E19, "std_deviation": 7.931916718357113E9 } },
            {
                "key": "01-2008-01-2009",
                "from": 1.1991456E12,
                "from_as_string": "01-2008",
                "to": 1.230768E12,
                "to_as_string": "01-2009",
                "doc_count": 7101,
                "stats": { "count": 7101, "min": 1.1991456E12, "max": 1.2174624E12, "avg": 1.207894813688213E12, "sum": 8.577261072E15, "sum_of_squares": 1.0360606565306019E28, "variance": 2.49825077336829E19, "std_deviation": 4.9982504672818165E9 } },
            {
                "key": "01-2009-01-2010",
                "from": 1.230768E12,
                "from_as_string": "01-2009",
                "to": 1.262304E12,
                "to_as_string": "01-2010",
                "doc_count": 10,
                "stats": { "count": 10, "min": 1.24249161306E12, "max": 1.242491613123E12, "avg": 1.2424916130944E12, "sum": 1.2424916130944E13, "sum_of_squares": 1.543785408609924E25, "variance": 0.0, "std_deviation": 0.0 } },
            {
                "key": "01-2010-*",
                "from": 1.262304E12,
                "from_as_string": "01-2010",
                "doc_count": 0,
                "stats": { "count": 0, "min": null, "max": null, "avg": null, "sum": null, "sum_of_squares": null, "variance": null, "std_deviation": null } }]
        }
    }
}
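
Because modifieddate is a date field, the statistics are expressed in epoch milliseconds. The following Python sketch, which uses only the standard library and the top-level values copied from the response above, shows how the fields relate to one another and how a millisecond value can be converted back into a date. The helper name millis_to_utc is made up for this example.

```python
import math
from datetime import datetime, timezone

# Top-level "stats" values copied from the response above.
stats = {
    "count": 19972,
    "min": 9.643968e11,
    "max": 1.242491613123e12,
    "avg": 1.1821553155683428e12,
    "sum": 2.3610005962530944e16,
    "variance": 1.1137296180611115e21,
    "std_deviation": 3.3372587823857944e10,
}

def millis_to_utc(millis):
    """Convert epoch milliseconds to a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(millis / 1000, tz=timezone.utc)

# avg is sum / count, and std_deviation is the square root of variance.
derived_avg = stats["sum"] / stats["count"]
derived_std = math.sqrt(stats["variance"])

print(math.isclose(derived_avg, stats["avg"], rel_tol=1e-9))
print(math.isclose(derived_std, stats["std_deviation"], rel_tol=1e-9))
print(millis_to_utc(stats["min"]).date(), millis_to_utc(stats["max"]).date())
```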

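A result class is provided for every Elasticsearch aggregation type, so you do not need to create your own result DTOs unless you want to. Any class can be used to read the data. The JToken that contains the aggregation result is public and can be used directly if preferred.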
