Elasticsearch (Part 11) [NEST High-Level Client: Aggregations]

.NET跨平台

Aggregations

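The aggregations framework provides aggregated data based on a search query. It is built from simple building blocks called aggregations, which can be composed to build complex summaries of the data.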

 Writing aggregations
 Reserved aggregation names

Writing aggregations

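NEST allows you to write aggregations using

 a strict fluent DSL
 a verbatim object initializer syntax that maps one-to-one to the Elasticsearch API
 a more terse object initializer aggregation DSL

Three different ways, that's a lot to take in! Let's go through them one at a time and explain when you might want to use each.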

Each of the following examples produces this JSON request body:

{
  "aggs": {
    "name_of_child_agg": {
      "children": {
        "type": "commits"
      },
      "aggs": {
        "average_per_child": {
          "avg": { "field": "confidenceFactor" } },
        "max_per_child": {
          "max": { "field": "confidenceFactor" } },
        "min_per_child": {
          "min": { "field": "confidenceFactor" } }
      }
    }
  }
}
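Each of the three metric sub-aggregations in this request reduces the documents in a bucket to a single number. As a minimal in-memory sketch of those semantics (not how Elasticsearch computes them server-side), the code below applies avg, max and min to one bucket. The `CommitActivity` class here is a hypothetical stand-in with only a `ConfidenceFactor` property, and returning nulls for an empty bucket is a choice made for this sketch.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the document type: only the field the aggregations read.
public class CommitActivity
{
    public double ConfidenceFactor { get; set; }
}

public static class BucketMetrics
{
    // Mirrors the avg/max/min sub-aggregations for the documents in a single bucket.
    // An empty bucket yields nulls (a choice made for this sketch).
    public static (double? Average, double? Max, double? Min) Compute(IReadOnlyCollection<CommitActivity> bucket)
    {
        if (bucket.Count == 0) return (null, null, null);
        var values = bucket.Select(c => c.ConfidenceFactor).ToList();
        return (values.Average(), values.Max(), values.Min());
    }

    public static void Main()
    {
        var bucket = new List<CommitActivity>
        {
            new CommitActivity { ConfidenceFactor = 2.0 },
            new CommitActivity { ConfidenceFactor = 4.0 },
            new CommitActivity { ConfidenceFactor = 9.0 },
        };
        var (avg, max, min) = Compute(bucket);
        Console.WriteLine($"average_per_child={avg}, max_per_child={max}, min_per_child={min}");
    }
}
```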

Fluent DSL

The fluent lambda syntax is the most terse way to write aggregations. It benefits from types being carried over to sub-aggregations.

Fluent DSL example

s => s
    .Aggregations(aggs => aggs
        .Children<CommitActivity>("name_of_child_agg", child => child
            .Aggregations(childAggs => childAggs
                .Average("average_per_child", avg => avg.Field(p => p.ConfidenceFactor))
                .Max("max_per_child", max => max.Field(p => p.ConfidenceFactor))
                .Min("min_per_child", min => min.Field(p => p.ConfidenceFactor))
            )
        )
    )

Object Initializer syntax

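The object initializer syntax (OIS) is a one-to-one mapping of how aggregations have to be represented in the Elasticsearch API. While being a one-to-one mapping is a benefit, being dictionary-based in C# means it can grow verbose rather quickly.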

Here is the same aggregation as in the Fluent DSL section, expressed with the dictionary-based object initializer syntax:

Object Initializer syntax example

new SearchRequest<Project>
{
    Aggregations = new AggregationDictionary
    {
        { "name_of_child_agg", new ChildrenAggregation("name_of_child_agg", typeof(CommitActivity))
            {
                Aggregations = new AggregationDictionary
                {
                    { "average_per_child", new AverageAggregation("average_per_child", "confidenceFactor") },
                    { "max_per_child", new MaxAggregation("max_per_child", "confidenceFactor") },
                    { "min_per_child", new MinAggregation("min_per_child", "confidenceFactor") },
                }
            }
        }
    }
}

This is starting to get hard to read, don't you agree? There is a better way...

Terse Object Initializer syntax

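The object initializer syntax can be shortened significantly by using the *Aggregation types directly, so you no longer need intermediate dictionaries to represent the aggregation DSL. When using them this way, multiple aggregations can also be combined with the && operator.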

Compare the following example with the previous, plain object initializer syntax:

Terse Object Initializer syntax example

// Field<T>(...) is Nest.Infer.Field<T>, brought into scope with "using static Nest.Infer;"
new SearchRequest<Project>
{
    Aggregations = new ChildrenAggregation("name_of_child_agg", typeof(CommitActivity))
    {
        Aggregations =
            new AverageAggregation("average_per_child", Field<CommitActivity>(p => p.ConfidenceFactor))
            && new MaxAggregation("max_per_child", Field<CommitActivity>(p => p.ConfidenceFactor))
            && new MinAggregation("min_per_child", Field<CommitActivity>(p => p.ConfidenceFactor))
    }
}

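Now that's much cleaner! Assigning an *Aggregation type directly to the Aggregations property of a search request works because NEST defines implicit conversions that handle this for you.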
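C# does not let a type overload `&&` directly. Instead, `a && b` works when the type overloads `&` (with that type as both parameters and return type) and declares `operator true` and `operator false`; the compiler then evaluates it as `T.false(a) ? a : T.&(a, b)`. The sketch below is a hypothetical, simplified model of how an `&&` chain plus an implicit conversion can feed a dictionary-typed property. The types `Agg`, `AggDictionary` and `Request` are invented for illustration and are not NEST's actual source.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical, simplified model; NEST's real *Aggregation types differ.
public class Agg
{
    public string Name { get; }
    // Leaf aggregations collected so far; a single aggregation holds just itself.
    internal readonly List<Agg> Parts = new List<Agg>();

    public Agg(string name) { Name = name; Parts.Add(this); }
    private Agg() { } // used only for combined (unnamed) results

    // `operator false` always returns false, so `a && b` always evaluates `a & b`.
    public static bool operator false(Agg a) => false;
    public static bool operator true(Agg a) => false;

    // Combine both sides' leaf aggregations into a new (unnamed) container.
    public static Agg operator &(Agg left, Agg right)
    {
        var combined = new Agg();
        combined.Parts.AddRange(left.Parts);
        combined.Parts.AddRange(right.Parts);
        return combined;
    }
}

public class AggDictionary : Dictionary<string, Agg>
{
    // Lets a single Agg, or an && chain of them, be assigned where a dictionary is expected.
    public static implicit operator AggDictionary(Agg agg)
    {
        var dict = new AggDictionary();
        foreach (var part in agg.Parts) dict.Add(part.Name, part); // duplicate names throw
        return dict;
    }
}

public class Request
{
    public AggDictionary Aggregations { get; set; }
}

public static class CombineDemo
{
    public static void Main()
    {
        var request = new Request
        {
            Aggregations = new Agg("average_per_child") && new Agg("max_per_child") && new Agg("min_per_child")
        };
        Console.WriteLine(string.Join(", ", request.Aggregations.Keys));
    }
}
```

Using `Dictionary.Add` in the conversion means a repeated name fails fast instead of silently overwriting an earlier aggregation.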

Aggregating over a collection of aggregations

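An advanced scenario may involve an existing collection of aggregation functions that should be set as aggregations on the request. Using LINQ's .Aggregate() method, each function can be applied in turn to the aggregation descriptor (childAggs below), returning the descriptor after each function is applied.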

Fluent DSL example

var aggregations = new List<Func<AggregationContainerDescriptor<CommitActivity>, IAggregationContainer>> 
{
    a => a.Average("average_per_child", avg => avg.Field(p => p.ConfidenceFactor)),
    a => a.Max("max_per_child", avg => avg.Field(p => p.ConfidenceFactor)),
    a => a.Min("min_per_child", avg => avg.Field(p => p.ConfidenceFactor))
};

return s => s
    .Aggregations(aggs => aggs
        .Children<CommitActivity>("name_of_child_agg", child => child
            .Aggregations(childAggs =>
                aggregations.Aggregate(childAggs, (acc, agg) => { agg(acc); return acc; }) 
            )
        )
    );
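The fold above threads a single mutable descriptor through every function. Because each function in the list returns `IAggregationContainer` rather than the descriptor, the accumulator lambda ignores that return value and hands back `acc` itself. Here is a minimal, NEST-free sketch of the same pattern, assuming a hypothetical `Descriptor` builder class that stands in for `AggregationContainerDescriptor<T>`.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for a mutable fluent descriptor; it only records names.
public class Descriptor
{
    public List<string> Names { get; } = new List<string>();

    public Descriptor Average(string name) { Names.Add(name); return this; }
    public Descriptor Max(string name) { Names.Add(name); return this; }
    public Descriptor Min(string name) { Names.Add(name); return this; }
}

public static class FoldDemo
{
    // Enumerable.Aggregate(seed, func): apply each function to the same accumulator,
    // then return that accumulator so the next function sees all earlier additions.
    public static Descriptor ApplyAll(Descriptor seed, IEnumerable<Func<Descriptor, Descriptor>> funcs) =>
        funcs.Aggregate(seed, (acc, f) => { f(acc); return acc; });

    public static void Main()
    {
        var aggregations = new List<Func<Descriptor, Descriptor>>
        {
            a => a.Average("average_per_child"),
            a => a.Max("max_per_child"),
            a => a.Min("min_per_child"),
        };
        var result = ApplyAll(new Descriptor(), aggregations);
        Console.WriteLine(string.Join(", ", result.Names));
        // prints: average_per_child, max_per_child, min_per_child
    }
}
```

If the functions returned a new immutable descriptor instead of mutating one, the accumulator would have to be `(acc, f) => f(acc)` so each result is carried forward.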

Multiple AggregationContainerDescriptor instances can also be combined with the && operator:

var aggregations = new AggregationContainerDescriptor<CommitActivity>()
        .Average("average_per_child", avg => avg.Field(p => p.ConfidenceFactor))
        .Max("max_per_child", avg => avg.Field(p => p.ConfidenceFactor))
        && new AggregationContainerDescriptor<CommitActivity>()
            .Min("min_per_child", avg => avg.Field(p => p.ConfidenceFactor));

return s => s
        .Aggregations(aggs => aggs
            .Children<CommitActivity>("name_of_child_agg", child => child
                .Aggregations(childAggs => aggregations)
            )
        );

Aggs vs. Aggregations

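The response exposes both an .Aggregations and an .Aggs property for handling aggregations. Why two properties, you ask? The former is a dictionary that maps aggregation names to IAggregate, the common interface for aggregation responses (called aggregates in NEST). The latter is a convenience helper that fetches the aggregation response of the correct type out of that dictionary, given a key name.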

This is best illustrated with an example. Let's assume we make the following request:

s => s
.Aggregations(aggs => aggs
    .Children<CommitActivity>("name_of_child_agg", child => child
        .Aggregations(childAggs => childAggs
            .Average("average_per_child", avg => avg.Field(p => p.ConfidenceFactor))
            .Max("max_per_child", avg => avg.Field(p => p.ConfidenceFactor))
            .Min("min_per_child", avg => avg.Field(p => p.ConfidenceFactor))
        )
    )
)

Now, using .Aggs, we can easily get the Children aggregation response and, from that, the Average and Max sub-aggregations.

Handling responses

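// response: the ISearchResponse<Project> returned by client.Search<Project>(...) for the request above.
// Should(), BeTrue() and NotBeNull() are FluentAssertions extension methods used as test assertions.
response.IsValid.Should().BeTrue();

// .Aggs looks up "name_of_child_agg" and returns it typed as a children (single bucket) aggregate
var childAggregation = response.Aggs.Children("name_of_child_agg");

// Sub-aggregations are read from the bucket by key name in the same way
var averagePerChild = childAggregation.Average("average_per_child");

averagePerChild.Should().NotBeNull();

var maxPerChild = childAggregation.Max("max_per_child");

maxPerChild.Should().NotBeNull();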
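To show why the typed helper is convenient, here is a hypothetical, simplified model of the two properties: a raw name-to-`IAggregate` dictionary, and a helper that looks up a key and casts to the expected response type. All type names below (`ValueAggregate`, `SingleBucketAggregate`, `AggsHelper`) are illustrative stand-ins, not NEST's real classes, and returning null for a missing or mistyped key is an assumption of this sketch.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical, simplified model; not NEST's actual types.
public interface IAggregate { }

public class ValueAggregate : IAggregate
{
    public double? Value { get; set; }
}

public class SingleBucketAggregate : IAggregate
{
    // Sub-aggregations nested under this bucket, keyed by name.
    public IReadOnlyDictionary<string, IAggregate> Aggregations { get; set; } = new Dictionary<string, IAggregate>();
    // Typed view over the same dictionary, analogous to .Aggs.
    public AggsHelper Aggs => new AggsHelper(Aggregations);
}

public class AggsHelper
{
    private readonly IReadOnlyDictionary<string, IAggregate> _aggregations;
    public AggsHelper(IReadOnlyDictionary<string, IAggregate> aggregations) => _aggregations = aggregations;

    // Look up the key and cast; null if the key is missing or holds another aggregate type.
    private T TryGet<T>(string key) where T : class, IAggregate =>
        _aggregations.TryGetValue(key, out var agg) ? agg as T : null;

    public ValueAggregate Average(string key) => TryGet<ValueAggregate>(key);
    public ValueAggregate Max(string key) => TryGet<ValueAggregate>(key);
    public SingleBucketAggregate Children(string key) => TryGet<SingleBucketAggregate>(key);
}

public static class AggsDemo
{
    public static void Main()
    {
        // Raw dictionary, as an .Aggregations-style property would hold it.
        var aggregations = new Dictionary<string, IAggregate>
        {
            ["name_of_child_agg"] = new SingleBucketAggregate
            {
                Aggregations = new Dictionary<string, IAggregate>
                {
                    ["average_per_child"] = new ValueAggregate { Value = 5 },
                    ["max_per_child"] = new ValueAggregate { Value = 9 },
                }
            }
        };

        // Without a helper: index and cast by hand at every level.
        var manual = (ValueAggregate)((SingleBucketAggregate)aggregations["name_of_child_agg"]).Aggregations["average_per_child"];

        // With the helper: typed accessors by key name.
        var average = new AggsHelper(aggregations).Children("name_of_child_agg").Aggs.Average("average_per_child");
        Console.WriteLine($"{manual.Value} {average.Value}");
    }
}
```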

Reserved aggregation names

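NEST uses a heuristics-based parser to parse aggregations back into typed responses. Because of this, some of the key properties it relies on to make parsing decisions are not allowed as key names for aggregations in a request.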

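Elasticsearch 5.x will at some point get a flag to return aggregations in a parseable fashion. When that happens, this restriction will be lifted; until then, avoid the following names for aggregation keys: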

new []
{
    "score",
    "value_as_string",
    "keys",
    "max_score"
}
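If aggregation names come from configuration or user input, a small guard can catch these names before a request is built. This is a hypothetical helper, not part of NEST; it uses an ordinal (case-sensitive) match because aggregation names are JSON keys.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical guard, not part of NEST: rejects the reserved aggregation names listed above.
public static class AggregationNameGuard
{
    private static readonly HashSet<string> Reserved = new HashSet<string>(StringComparer.Ordinal)
    {
        "score", "value_as_string", "keys", "max_score"
    };

    public static bool IsReserved(string name) => Reserved.Contains(name);

    // Throws if any of the given aggregation names is reserved.
    public static void EnsureAllowed(IEnumerable<string> names)
    {
        var clashes = names.Where(IsReserved).ToList();
        if (clashes.Count > 0)
            throw new ArgumentException("Reserved aggregation name(s): " + string.Join(", ", clashes));
    }

    public static void Main()
    {
        EnsureAllowed(new[] { "average_per_child", "max_per_child" }); // no exception
        try
        {
            EnsureAllowed(new[] { "max_score" });
        }
        catch (ArgumentException e)
        {
            Console.WriteLine(e.Message);
        }
    }
}
```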