03. Elasticsearch pipeline aggregation queries


Elasticsearch aggregation queries have become increasingly rich. There are currently four categories:

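  1. metric aggregation: single statistics such as min, max, avg, sum, and percentile
  2. bucket aggregation: grouping operations similar to SQL GROUP BY
  3. matrix aggregation: computes over the values of multiple fields to produce a multi-dimensional matrix
  4. pipeline aggregation: takes the output of other aggregations as input and post-processes it to enrich the results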

This post focuses on pipeline aggregations.

1. Pipeline aggregation syntax

1. Path notation

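  1. Aggregation separator >: specifies a parent-child aggregation relationship, e.g. "my_bucket>my_stats.avg"
  2. Metric separator .: selects a specific metric of an aggregation
  3. Aggregation name <name of the aggregation>: refers to an aggregation directly by its name
  4. Metric name <name of the metric>: refers to a metric directly by its name
  5. Full path agg_name[> agg_name]*[. metrics]: combines the notations above into a complete path
  6. Special value _count: the document count of a bucket. It is a special metric that lets a pipeline reference each bucket's doc count.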
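The path grammar above can be illustrated with a small Python sketch that splits a simplified buckets_path of the form agg_name[>agg_name]*[.metric] into its chain of aggregation names and an optional metric. parse_buckets_path is a hypothetical helper, not an Elasticsearch API, and it ignores parts of the real grammar such as bracketed bucket keys.

```python
from typing import List, Optional, Tuple


def parse_buckets_path(path: str) -> Tuple[List[str], Optional[str]]:
    """Split a simplified buckets_path into (aggregation chain, metric).

    Hypothetical helper for illustration only: it assumes names contain
    neither '>' nor '.', and it ignores bracketed keys such as agg['key'].
    """
    aggs = path.split(">")  # '>' separates parent and child aggregations
    metric: Optional[str] = None
    if "." in aggs[-1]:
        # '.' selects a metric; it may only appear in the last segment
        aggs[-1], metric = aggs[-1].split(".", 1)
    return aggs, metric


print(parse_buckets_path("my_bucket>my_stats.avg"))   # (['my_bucket', 'my_stats'], 'avg')
print(parse_buckets_path("month_days_count._count"))  # (['month_days_count'], '_count')
print(parse_buckets_path("max_view_count"))           # (['max_view_count'], None)
```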

2. Aggregation levels

** 1.parent **
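The input of this type of aggregation is the output of its [parent aggregation], which it processes further. It generally does not create new buckets; instead it enriches the parent's buckets by adding a new metric to each bucket of the parent agg.
Typical examples are moving averages and derivatives: each bucket of the parent gets one additional metric.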

** 2.sibling **
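The input of this type of aggregation is the output of a [sibling aggregation]. It computes a new aggregation at the same level, i.e. it produces a new agg result next to the existing one.
Typical examples are min_bucket and max_bucket, which add a new entry alongside the original buckets holding the min or max value.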

2. Overview of pipeline aggregation types

1. sibling aggregation

  1. Avg Bucket Aggregation: sibling agg, averages a metric across a set of buckets
  2. Max Bucket Aggregation: sibling agg, finds the bucket(s) with the maximum value
  3. Min Bucket Aggregation: sibling agg, finds the bucket(s) with the minimum value
  4. Sum Bucket Aggregation: sibling agg, sums a metric across a set of buckets
  5. Stats Bucket Aggregation: sibling agg, computes stats over a set of buckets
  6. Extended Stats Bucket Aggregation: sibling agg, computes extended stats over a set of buckets
  7. Percentiles Bucket Aggregation: sibling agg, computes percentiles over a set of buckets

2. parent aggregation

  1. Derivative Aggregation: parent agg, computes the derivative of a metric in a histogram or date_histogram
  2. Moving Average Aggregation: parent agg, computes a moving average over a set of buckets (deprecated)
  3. Moving Function Aggregation: parent agg, applies a script function over a sliding window of buckets
  4. Cumulative Sum Aggregation: parent agg, running total up to the current bucket
  5. Bucket Script Aggregation: parent agg, runs a script over [one or more metrics] of the parent aggregation
  6. Bucket Selector Aggregation: parent agg, filters a set of buckets; only buckets that satisfy the condition are kept in the result
  7. Bucket Sort Aggregation: parent agg, sorts buckets
  8. Serial Differencing Aggregation: parent agg, serial differencing

3. Data preparation

The traffic_stats index stores daily reading statistics for a blog: the number of visits and the maximum time spent reading.

PUT traffic_stats
{
  "mappings": {
    "properties": {
      "date": {
        "type": "date",
        "format": "dateOptionalTime"
      },
      "visits": {
        "type": "integer"
      },
      "max_time_spent": {
        "type": "integer"
      }
    }
  }
}

Data

PUT _bulk
{"index":{"_index":"traffic_stats"}}
{"visits":"488", "date":"2018-10-1", "max_time_spent":"900"}
{"index":{"_index":"traffic_stats"}}
{"visits":"783", "date":"2018-10-6", "max_time_spent":"928"}
{"index":{"_index":"traffic_stats"}}
{"visits":"789", "date":"2018-10-12", "max_time_spent":"1834"}
{"index":{"_index":"traffic_stats"}}
{"visits":"1299", "date":"2018-11-3", "max_time_spent":"592"}
{"index":{"_index":"traffic_stats"}}
{"visits":"394", "date":"2018-11-6", "max_time_spent":"1249"}
{"index":{"_index":"traffic_stats"}}
{"visits":"448", "date":"2018-11-24", "max_time_spent":"874"}
{"index":{"_index":"traffic_stats"}}
{"visits":"768", "date":"2018-12-18", "max_time_spent":"876"}
{"index":{"_index":"traffic_stats"}}
{"visits":"1194", "date":"2018-12-24", "max_time_spent":"1249"}
{"index":{"_index":"traffic_stats"}}
{"visits":"987", "date":"2018-12-28", "max_time_spent":"1599"}
{"index":{"_index":"traffic_stats"}}
{"visits":"872", "date":"2019-01-1", "max_time_spent":"828"}
{"index":{"_index":"traffic_stats"}}
{"visits":"972", "date":"2019-01-5", "max_time_spent":"723"}
{"index":{"_index":"traffic_stats"}}
{"visits":"827", "date":"2019-02-5", "max_time_spent":"1300"}
{"index":{"_index":"traffic_stats"}}
{"visits":"1584", "date":"2019-02-15", "max_time_spent":"1500"}
{"index":{"_index":"traffic_stats"}}
{"visits":"1604", "date":"2019-03-2", "max_time_spent":"1488"}
{"index":{"_index":"traffic_stats"}}
{"visits":"1499", "date":"2019-03-27", "max_time_spent":"1399"}
{"index":{"_index":"traffic_stats"}}
{"visits":"1392", "date":"2019-04-8", "max_time_spent":"1294"}
{"index":{"_index":"traffic_stats"}}
{"visits":"1247", "date":"2019-04-15", "max_time_spent":"1194"}
{"index":{"_index":"traffic_stats"}}
{"visits":"984", "date":"2019-05-15", "max_time_spent":"1184"}
{"index":{"_index":"traffic_stats"}}
{"visits":"1228", "date":"2019-05-18", "max_time_spent":"1485"}
{"index":{"_index":"traffic_stats"}}
{"visits":"1423", "date":"2019-06-14", "max_time_spent":"1452"}
{"index":{"_index":"traffic_stats"}}
{"visits":"1238", "date":"2019-06-24", "max_time_spent":"1329"}
{"index":{"_index":"traffic_stats"}}
{"visits":"1388", "date":"2019-07-14", "max_time_spent":"1542"}
{"index":{"_index":"traffic_stats"}}
{"visits":"1499", "date":"2019-07-24", "max_time_spent":"1742"}
{"index":{"_index":"traffic_stats"}}
{"visits":"1523", "date":"2019-08-13", "max_time_spent":"1552"}
{"index":{"_index":"traffic_stats"}}
{"visits":"1443", "date":"2019-08-19", "max_time_spent":"1511"}
{"index":{"_index":"traffic_stats"}}
{"visits":"1587", "date":"2019-09-14", "max_time_spent":"1497"}
{"index":{"_index":"traffic_stats"}}
{"visits":"1534", "date":"2019-09-27", "max_time_spent":"1434"}

4. Examples

1. sibling aggregation

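1. Avg Bucket Aggregation: sibling agg, averages a metric across buckets
1. Averaging an ordinary metric

First use date_histogram to compute, for each month, how many days had reads (doc_count) and the visit count of that month's busiest day (max_view_count).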


GET traffic_stats/_search
{
  "size": 0,
  "aggs": {
    "month_term": {
      "date_histogram": {
        "field": "date",
        "calendar_interval": "month"
      },
      "aggs": {
        "max_view_count": {
          "max": {
            "field": "visits"
          }
        }
      }
    }
  }
}


Response

"aggregations" : {
    "month_term" : {
      "buckets" : [
        {
          "key_as_string" : "2018-10-01T00:00:00.000Z",
          "key" : 1538352000000,
          "doc_count" : 3,
          "max_view_count" : {
            "value" : 789.0
          }
        },
        {
          "key_as_string" : "2018-11-01T00:00:00.000Z",
          "key" : 1541030400000,
          "doc_count" : 3,
          "max_view_count" : {
            "value" : 1299.0
          }
        },
        {
          "key_as_string" : "2018-12-01T00:00:00.000Z",
          "key" : 1543622400000,
          "doc_count" : 3,
          "max_view_count" : {
            "value" : 1194.0
          }
        },
        {
          "key_as_string" : "2019-01-01T00:00:00.000Z",
          "key" : 1546300800000,
          "doc_count" : 2,
          "max_view_count" : {
            "value" : 972.0
          }
        },
        {
          "key_as_string" : "2019-02-01T00:00:00.000Z",
          "key" : 1548979200000,
          "doc_count" : 2,
          "max_view_count" : {
            "value" : 1584.0
          }
        },
        {
          "key_as_string" : "2019-03-01T00:00:00.000Z",
          "key" : 1551398400000,
          "doc_count" : 2,
          "max_view_count" : {
            "value" : 1604.0
          }
        },
        {
          "key_as_string" : "2019-04-01T00:00:00.000Z",
          "key" : 1554076800000,
          "doc_count" : 2,
          "max_view_count" : {
            "value" : 1392.0
          }
        },
        {
          "key_as_string" : "2019-05-01T00:00:00.000Z",
          "key" : 1556668800000,
          "doc_count" : 2,
          "max_view_count" : {
            "value" : 1228.0
          }
        },
        {
          "key_as_string" : "2019-06-01T00:00:00.000Z",
          "key" : 1559347200000,
          "doc_count" : 2,
          "max_view_count" : {
            "value" : 1423.0
          }
        },
        {
          "key_as_string" : "2019-07-01T00:00:00.000Z",
          "key" : 1561939200000,
          "doc_count" : 2,
          "max_view_count" : {
            "value" : 1499.0
          }
        },
        {
          "key_as_string" : "2019-08-01T00:00:00.000Z",
          "key" : 1564617600000,
          "doc_count" : 2,
          "max_view_count" : {
            "value" : 1523.0
          }
        },
        {
          "key_as_string" : "2019-09-01T00:00:00.000Z",
          "key" : 1567296000000,
          "doc_count" : 2,
          "max_view_count" : {
            "value" : 1587.0
          }
        }
      ]
    }
  }
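To check what this query computes, here is a small Python sketch that groups the _bulk documents by calendar month and takes doc_count and max(visits) per month, mimicking date_histogram with a max sub-aggregation. It assumes UTC dates and does not emit the empty buckets a real date_histogram would create for months without documents (there are none in this data set).

```python
from typing import Dict, List, Tuple

# (date, visits) pairs copied from the _bulk request above.
DOCS: List[Tuple[str, int]] = [
    ("2018-10-1", 488), ("2018-10-6", 783), ("2018-10-12", 789),
    ("2018-11-3", 1299), ("2018-11-6", 394), ("2018-11-24", 448),
    ("2018-12-18", 768), ("2018-12-24", 1194), ("2018-12-28", 987),
    ("2019-01-1", 872), ("2019-01-5", 972),
    ("2019-02-5", 827), ("2019-02-15", 1584),
    ("2019-03-2", 1604), ("2019-03-27", 1499),
    ("2019-04-8", 1392), ("2019-04-15", 1247),
    ("2019-05-15", 984), ("2019-05-18", 1228),
    ("2019-06-14", 1423), ("2019-06-24", 1238),
    ("2019-07-14", 1388), ("2019-07-24", 1499),
    ("2019-08-13", 1523), ("2019-08-19", 1443),
    ("2019-09-14", 1587), ("2019-09-27", 1534),
]


def monthly_buckets(docs: List[Tuple[str, int]]) -> List[Dict]:
    """Group by calendar month and compute doc_count and max(visits) per bucket."""
    buckets: Dict[Tuple[int, int], Dict] = {}
    for date, visits in docs:
        year, month, _day = (int(p) for p in date.split("-"))
        b = buckets.setdefault((year, month), {
            "key": f"{year}-{month:02d}-01", "doc_count": 0, "max_view_count": None})
        b["doc_count"] += 1
        if b["max_view_count"] is None or visits > b["max_view_count"]:
            b["max_view_count"] = float(visits)
    # date_histogram returns its buckets in ascending key order
    return [buckets[k] for k in sorted(buckets)]


for b in monthly_buckets(DOCS):
    print(b["key"], b["doc_count"], b["max_view_count"])
```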

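Now add a sibling avg_bucket aggregation that averages the per-month maxima: for each month take the visit count of its busiest day, then average those values across all months.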


GET traffic_stats/_search
{
  "size": 0,
  "aggs": {
    "month_term": {
      "date_histogram": {
        "field": "date",
        "calendar_interval": "month"
      },
      "aggs": {
        "max_view_count": {
          "max": {
            "field": "visits"
          }
        }
      }
    },
    "average_month_max": {
      "avg_bucket": {
        "buckets_path": "month_term>max_view_count.value"
      }
    }
  }
}

Note the buckets_path here:

"buckets_path": "month_term>max_view_count.value"

month_term and max_view_count are both aggregation names, so they are joined with >.
value is a metric of max_view_count, so it is joined with . (dot).

The result is:

"aggregations" : {
    "month_term" : {
      "buckets" : [
        {
          "key_as_string" : "2018-10-01T00:00:00.000Z",
          "key" : 1538352000000,
          "doc_count" : 3,
          "max_view_count" : {
            "value" : 789.0
          }
        },
	...
	...
        {
          "key_as_string" : "2019-09-01T00:00:00.000Z",
          "key" : 1567296000000,
          "doc_count" : 2,
          "max_view_count" : {
            "value" : 1587.0
          }
        }
      ]
    },
    "average_month_max" : {
      "value" : 1341.1666666666667
    }
  }
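As a cross-check of average_month_max, the following Python sketch averages the per-month maxima copied from the month_term buckets above. Skipping missing values is an assumption about how the default gap_policy (skip) behaves; avg_bucket here is a local helper, not an Elasticsearch API.

```python
import math
from typing import List, Optional

# Per-month max_view_count values copied from the month_term buckets above.
MONTHLY_MAX: List[Optional[float]] = [789.0, 1299.0, 1194.0, 972.0, 1584.0, 1604.0,
                                      1392.0, 1228.0, 1423.0, 1499.0, 1523.0, 1587.0]


def avg_bucket(values: List[Optional[float]]) -> Optional[float]:
    """Mean of the per-bucket values, skipping missing (None/NaN) ones."""
    present = [v for v in values if v is not None and not math.isnan(v)]
    if not present:
        return None  # nothing to average
    return sum(present) / len(present)


print(avg_bucket(MONTHLY_MAX))
```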

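Note: the following query does not return a correct result.
Correction: the problem lies in how buckets_path is written, and it can be fixed.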


GET traffic_stats/_search
{
  "size": 0,
  "aggs": {
    "month_term": {
      "date_histogram": {
        "field": "date",
        "calendar_interval": "month"
      }
    },
    "avg_month_max": {
      "avg_bucket": {
        "buckets_path": "month_term.doc_count"  # changing this to month_term._count makes it return a result
      }
    }
  }
}

Response

  "aggregations" : {
    "month_term" : {
      "buckets" : [
        {
          "key_as_string" : "2018-10-01T00:00:00.000Z",
          "key" : 1538352000000,
          "doc_count" : 3
        },
	...
	...

        {
          "key_as_string" : "2019-09-01T00:00:00.000Z",
          "key" : 1567296000000,
          "doc_count" : 2
        }
      ]
    },
    "avg_month_max" : {
      "value" : null    # no correct value is returned here
    }
  }

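My first guess was that a sibling agg requires a multi-bucket aggregation whose referenced metric is numeric, and that month_term.doc_count resolved to an object instead.

Correction: the real cause is incorrect usage. doc_count cannot be referenced in buckets_path; the document count of each date_histogram bucket must be referenced through the special metric _count.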

2. Averaging the special metric _count

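For each month, count the days with reads, then report the month(s) with the most reading days and the average number of reading days per month.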


GET traffic_stats/_search
{
  "size": 0,
  "aggs": {
    "month_days_count": {  # days with reads: each bucket's doc_count, referenced as the metric _count
      "date_histogram": {
        "field": "date",
        "calendar_interval": "month"
      }
    },
    "max_day_month": {  # month(s) with the most reading days, using the special metric _count
      "max_bucket": {
        "buckets_path": "month_days_count._count"
      }
    },
    "avg_day_each_month": {  # average reading days per month, using the special metric _count
      "avg_bucket": {
        "buckets_path": "month_days_count._count"
      }
    }
  }
}

Response

"aggregations" : {
    "month_days_count" : {
      "buckets" : [
        {
          "key_as_string" : "2018-10-01T00:00:00.000Z",
          "key" : 1538352000000,
          "doc_count" : 3
        },
        {
          "key_as_string" : "2018-11-01T00:00:00.000Z",
          "key" : 1541030400000,
          "doc_count" : 3
        },
        {
          "key_as_string" : "2018-12-01T00:00:00.000Z",
          "key" : 1543622400000,
          "doc_count" : 3
        },
        {
          "key_as_string" : "2019-01-01T00:00:00.000Z",
          "key" : 1546300800000,
          "doc_count" : 2
        },
        {
          "key_as_string" : "2019-02-01T00:00:00.000Z",
          "key" : 1548979200000,
          "doc_count" : 2
        },
        {
          "key_as_string" : "2019-03-01T00:00:00.000Z",
          "key" : 1551398400000,
          "doc_count" : 2
        },
        {
          "key_as_string" : "2019-04-01T00:00:00.000Z",
          "key" : 1554076800000,
          "doc_count" : 2
        },
        {
          "key_as_string" : "2019-05-01T00:00:00.000Z",
          "key" : 1556668800000,
          "doc_count" : 2
        },
        {
          "key_as_string" : "2019-06-01T00:00:00.000Z",
          "key" : 1559347200000,
          "doc_count" : 2
        },
        {
          "key_as_string" : "2019-07-01T00:00:00.000Z",
          "key" : 1561939200000,
          "doc_count" : 2
        },
        {
          "key_as_string" : "2019-08-01T00:00:00.000Z",
          "key" : 1564617600000,
          "doc_count" : 2
        },
        {
          "key_as_string" : "2019-09-01T00:00:00.000Z",
          "key" : 1567296000000,
          "doc_count" : 2
        }
      ]
    },
    "max_day_month" : {
      "value" : 3.0,
      "keys" : [
        "2018-10-01T00:00:00.000Z",
        "2018-11-01T00:00:00.000Z",
        "2018-12-01T00:00:00.000Z"
      ]
    },
    "avg_day_each_month" : {
      "value" : 2.25
    }
  }
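A Python sketch of what max_day_month and avg_day_each_month compute from the doc_count values above. max_bucket reports the keys of every bucket that reaches the maximum, which is why three months are listed. extreme_bucket is a hypothetical local helper, and the bucket keys are shortened to YYYY-MM for brevity.

```python
from typing import Callable, Dict, Iterable, List, Tuple

# Bucket keys (shortened to YYYY-MM) and doc_count values from the response above.
KEYS = ["2018-10", "2018-11", "2018-12"] + [f"2019-{m:02d}" for m in range(1, 10)]
DOC_COUNTS = [3, 3, 3] + [2] * 9
MONTH_DAYS: List[Tuple[str, int]] = list(zip(KEYS, DOC_COUNTS))


def extreme_bucket(buckets: List[Tuple[str, int]],
                   pick: Callable[[Iterable[int]], int]) -> Dict:
    """Like max_bucket/min_bucket: the extreme value plus the keys of
    all buckets that reach it (ties are all reported)."""
    best = pick(v for _, v in buckets)
    return {"value": float(best), "keys": [k for k, v in buckets if v == best]}


# The special metric _count is simply each bucket's doc_count.
print(extreme_bucket(MONTH_DAYS, max))    # {'value': 3.0, 'keys': ['2018-10', '2018-11', '2018-12']}
print(sum(DOC_COUNTS) / len(DOC_COUNTS))  # 2.25
```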

2. Max Bucket Aggregation: sibling agg, finds the bucket with the maximum value

Continuing from the avg query; usage is the same as avg_bucket.


GET traffic_stats/_search
{
  "size": 0,
  "aggs": {
    "month_term": {
      "date_histogram": {
        "field": "date",
        "calendar_interval": "month"
      },
      "aggs": {
        "max_view_count": {
          "max": {
            "field": "visits"
          }
        }
      }
    },
    "max_month_max": {
      "max_bucket": {
        "buckets_path": "month_term>max_view_count.value"
      }
    }
  }
}


  "aggregations" : {
    "month_term" : {
      "buckets" : [
	...
	...

        {
          "key_as_string" : "2019-08-01T00:00:00.000Z",
          "key" : 1564617600000,
          "doc_count" : 2,
          "max_view_count" : {
            "value" : 1523.0
          }
        },
        {
          "key_as_string" : "2019-09-01T00:00:00.000Z",
          "key" : 1567296000000,
          "doc_count" : 2,
          "max_view_count" : {
            "value" : 1587.0
          }
        }
      ]
    },
    "max_month_max" : {
      "value" : 1604.0,
      "keys" : [
        "2019-03-01T00:00:00.000Z"
      ]
    }
  }

3. Min Bucket Aggregation: sibling agg, finds the bucket with the minimum value in a set of buckets

GET traffic_stats/_search
{
  "size": 0,
  "aggs": {
    "month_term": {
      "date_histogram": {
        "field": "date",
        "calendar_interval": "month"
      },
      "aggs": {
        "max_view_count": {
          "max": {
            "field": "visits"
          }
        }
      }
    },
    "min_month_max": {
      "min_bucket": {
        "buckets_path": "month_term>max_view_count.value"
      }
    }
  }
}



Response

 "aggregations" : {
    "month_term" : {
      "buckets" : [
	...
	...
        {
          "key_as_string" : "2019-08-01T00:00:00.000Z",
          "key" : 1564617600000,
          "doc_count" : 2,
          "max_view_count" : {
            "value" : 1523.0
          }
        },
        {
          "key_as_string" : "2019-09-01T00:00:00.000Z",
          "key" : 1567296000000,
          "doc_count" : 2,
          "max_view_count" : {
            "value" : 1587.0
          }
        }
      ]
    },
    "min_month_max" : {
      "value" : 789.0,
      "keys" : [
        "2018-10-01T00:00:00.000Z"
      ]
    }
  }

4. Sum Bucket Aggregation: sibling agg, sums a metric over a set of buckets

Example


GET traffic_stats/_search
{
  "size": 0,
  "aggs": {
    "month_term": {
      "date_histogram": {
        "field": "date",
        "calendar_interval": "month"
      },
      "aggs": {
        "max_view_count": {
          "max": {
            "field": "visits"
          }
        }
      }
    },
    "sum_month_max": {
      "sum_bucket": {
        "buckets_path": "month_term>max_view_count.value"
      }
    }
  }
}



Response

 "aggregations" : {
    "month_term" : {
      "buckets" : [
	...
	...
        {
          "key_as_string" : "2019-08-01T00:00:00.000Z",
          "key" : 1564617600000,
          "doc_count" : 2,
          "max_view_count" : {
            "value" : 1523.0
          }
        },
        {
          "key_as_string" : "2019-09-01T00:00:00.000Z",
          "key" : 1567296000000,
          "doc_count" : 2,
          "max_view_count" : {
            "value" : 1587.0
          }
        }
      ]
    },
    "sum_month_max" : {
      "value" : 16094.0
    }
  }

5. Stats Bucket Aggregation: sibling agg, computes stats over a set of buckets

Example


GET traffic_stats/_search
{
  "size": 0,
  "aggs": {
    "month_term": {
      "date_histogram": {
        "field": "date",
        "calendar_interval": "month"
      },
      "aggs": {
        "max_view_count": {
          "max": {
            "field": "visits"
          }
        }
      }
    },
    "stats_month_max": {
      "stats_bucket": {
        "buckets_path": "month_term>max_view_count.value"
      }
    }
  }
}


Response

 "aggregations" : {
    "month_term" : {
      "buckets" : [
	...
	...
        {
          "key_as_string" : "2019-09-01T00:00:00.000Z",
          "key" : 1567296000000,
          "doc_count" : 2,
          "max_view_count" : {
            "value" : 1587.0
          }
        }
      ]
    },
    "stats_month_max" : {
      "count" : 12,
      "min" : 789.0,
      "max" : 1604.0,
      "avg" : 1341.1666666666667,
      "sum" : 16094.0
    }
  }
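stats_bucket reports count, min, max, avg and sum of the per-bucket values in one go, which also covers the separate max_bucket, min_bucket and sum_bucket examples above. A Python sketch over the same per-month maxima, copied from the earlier responses (stats_bucket here is a local helper):

```python
from typing import Dict, List

# Per-month max_view_count values from the month_term buckets above.
MONTHLY_MAX: List[float] = [789.0, 1299.0, 1194.0, 972.0, 1584.0, 1604.0,
                            1392.0, 1228.0, 1423.0, 1499.0, 1523.0, 1587.0]


def stats_bucket(values: List[float]) -> Dict[str, float]:
    """count, min, max, avg and sum of the per-bucket values."""
    total = sum(values)
    return {
        "count": len(values),
        "min": min(values),
        "max": max(values),
        "avg": total / len(values),
        "sum": total,
    }


print(stats_bucket(MONTHLY_MAX))
```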


6. Extended Stats Bucket Aggregation: sibling agg, computes extended stats over a set of buckets

Example


GET traffic_stats/_search
{
  "size": 0,
  "aggs": {
    "month_term": {
      "date_histogram": {
        "field": "date",
        "calendar_interval": "month"
      },
      "aggs": {
        "max_view_count": {
          "max": {
            "field": "visits"
          }
        }
      }
    },
    "extend_stats_month_max": {
      "extended_stats_bucket": {
        "buckets_path": "month_term>max_view_count.value"
      }
    }
  }
}


Response

  "aggregations" : {
    "month_term" : {
      "buckets" : [
	...
	...

        {
          "key_as_string" : "2018-10-01T00:00:00.000Z",
          "key" : 1538352000000,
          "doc_count" : 3,
          "max_view_count" : {
            "value" : 789.0
          }
        },
        {
          "key_as_string" : "2019-09-01T00:00:00.000Z",
          "key" : 1567296000000,
          "doc_count" : 2,
          "max_view_count" : {
            "value" : 1587.0
          }
        }
      ]
    },
    "extend_stats_month_max" : {
      "count" : 12,
      "min" : 789.0,
      "max" : 1604.0,
      "avg" : 1341.1666666666667,
      "sum" : 16094.0,
      "sum_of_squares" : 2.231789E7,
      "variance" : 61096.13888888899,
      "std_deviation" : 247.17633157098393,
      "std_deviation_bounds" : {
        "upper" : 1835.5193298086347,
        "lower" : 846.8140035246988
      }
    }
  }
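The extra fields follow from the basic stats. In the Python sketch below, variance is the population variance (sum_of_squares / count - avg * avg) and std_deviation_bounds is avg plus or minus sigma * std_deviation. sigma = 2 is assumed as the default; it reproduces the bounds shown in the response above.

```python
import math
from typing import Dict, List

# Per-month max_view_count values from the month_term buckets above.
MONTHLY_MAX: List[float] = [789.0, 1299.0, 1194.0, 972.0, 1584.0, 1604.0,
                            1392.0, 1228.0, 1423.0, 1499.0, 1523.0, 1587.0]


def extended_stats_bucket(values: List[float], sigma: float = 2.0) -> Dict:
    """Basic stats plus sum of squares, population variance, std deviation and bounds."""
    n = len(values)
    total = sum(values)
    sum_sq = sum(v * v for v in values)
    avg = total / n
    variance = sum_sq / n - avg * avg  # population variance
    std = math.sqrt(variance)
    return {
        "count": n, "min": min(values), "max": max(values), "avg": avg, "sum": total,
        "sum_of_squares": sum_sq,
        "variance": variance,
        "std_deviation": std,
        "std_deviation_bounds": {"upper": avg + sigma * std, "lower": avg - sigma * std},
    }


print(extended_stats_bucket(MONTHLY_MAX))
```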

7. Percentiles Bucket Aggregation: sibling agg, computes percentiles over a set of buckets

Example


GET traffic_stats/_search
{
  "size": 0,
  "aggs": {
    "month_term": {
      "date_histogram": {
        "field": "date",
        "calendar_interval": "month"
      },
      "aggs": {
        "max_view_count": {
          "max": {
            "field": "visits"
          }
        }
      }
    },
    "percentile_month_max": {
      "percentiles_bucket": {
        "buckets_path": "month_term>max_view_count.value"
      }
     
    }
  }
}

Response

 "aggregations" : {
    "month_term" : {
      "buckets" : [
	...
	...

        {
          "key_as_string" : "2019-08-01T00:00:00.000Z",
          "key" : 1564617600000,
          "doc_count" : 2,
          "max_view_count" : {
            "value" : 1523.0
          }
        },
        {
          "key_as_string" : "2019-09-01T00:00:00.000Z",
          "key" : 1567296000000,
          "doc_count" : 2,
          "max_view_count" : {
            "value" : 1587.0
          }
        }
      ]
    },
    "percentile_month_max" : {
      "values" : {
        "1.0" : 789.0,
        "5.0" : 972.0,
        "25.0" : 1228.0,
        "50.0" : 1423.0,
        "75.0" : 1523.0,
        "95.0" : 1587.0,
        "99.0" : 1604.0
      }
    }
  }
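The percentile values above are all actual bucket values, which suggests percentiles_bucket picks from the sorted list rather than interpolating. The Python sketch below assumes the index is round(p / 100 * (n - 1)) with half-up rounding; this rule is an assumption about the implementation, but it reproduces every value in the response above.

```python
import math
from typing import Dict, List, Sequence

# Per-month max_view_count values from the month_term buckets above.
MONTHLY_MAX: List[float] = [789.0, 1299.0, 1194.0, 972.0, 1584.0, 1604.0,
                            1392.0, 1228.0, 1423.0, 1499.0, 1523.0, 1587.0]


def percentiles_bucket(values: List[float],
                       percents: Sequence[float] = (1.0, 5.0, 25.0, 50.0, 75.0, 95.0, 99.0)
                       ) -> Dict[str, float]:
    """Nearest-rank style percentiles over the sorted bucket values (assumed rule)."""
    data = sorted(values)
    result = {}
    for p in percents:
        # Half-up rounding, like Java's Math.round for non-negative numbers.
        # (Python's round() uses banker's rounding, e.g. round(2.5) == 2.)
        idx = math.floor(p / 100.0 * (len(data) - 1) + 0.5)
        result[str(p)] = data[idx]
    return result


print(percentiles_bucket(MONTHLY_MAX))
```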

2. parent aggregation

1. Derivative Aggregation: parent agg, computes the derivative of a metric in a histogram or date_histogram

Example query


GET traffic_stats/_search
{
  "size": 0,
  "aggs": {
    "month_term": {
      "date_histogram": {
        "field": "date",
        "calendar_interval": "month"
      },
      "aggs": {
        "max_view_count": {
          "max": {
            "field": "visits"
          }
        },
        "deriv_month_max": {  # note the level change: the sibling aggs above sat at the same level as month_term
          "derivative": {
            "buckets_path": "max_view_count.value"
          }
        }
      }
    }
  }
}


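A first derivative is simply the difference between adjacent buckets, so the first bucket has no deriv_month_max value. Note again that deriv_month_max sits at a different level from the sibling examples: it is nested inside month_term.

Response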

  "aggregations" : {
    "month_term" : {
      "buckets" : [
        {
          "key_as_string" : "2018-10-01T00:00:00.000Z",
          "key" : 1538352000000,
          "doc_count" : 3,
          "max_view_count" : {
            "value" : 789.0
          }
        },
        {
          "key_as_string" : "2018-11-01T00:00:00.000Z",
          "key" : 1541030400000,
          "doc_count" : 3,
          "max_view_count" : {
            "value" : 1299.0
          },
          "deriv_month_max" : {
            "value" : 510.0
          }
        },
        {
          "key_as_string" : "2018-12-01T00:00:00.000Z",
          "key" : 1543622400000,
          "doc_count" : 3,
          "max_view_count" : {
            "value" : 1194.0
          },
          "deriv_month_max" : {
            "value" : -105.0
          }
        },
        {
          "key_as_string" : "2019-01-01T00:00:00.000Z",
          "key" : 1546300800000,
          "doc_count" : 2,
          "max_view_count" : {
            "value" : 972.0
          },
          "deriv_month_max" : {
            "value" : -222.0
          }
        },
        {
          "key_as_string" : "2019-02-01T00:00:00.000Z",
          "key" : 1548979200000,
          "doc_count" : 2,
          "max_view_count" : {
            "value" : 1584.0
          },
          "deriv_month_max" : {
            "value" : 612.0
          }
        },
        {
          "key_as_string" : "2019-03-01T00:00:00.000Z",
          "key" : 1551398400000,
          "doc_count" : 2,
          "max_view_count" : {
            "value" : 1604.0
          },
          "deriv_month_max" : {
            "value" : 20.0
          }
        },
        {
          "key_as_string" : "2019-04-01T00:00:00.000Z",
          "key" : 1554076800000,
          "doc_count" : 2,
          "max_view_count" : {
            "value" : 1392.0
          },
          "deriv_month_max" : {
            "value" : -212.0
          }
        },
        {
          "key_as_string" : "2019-05-01T00:00:00.000Z",
          "key" : 1556668800000,
          "doc_count" : 2,
          "max_view_count" : {
            "value" : 1228.0
          },
          "deriv_month_max" : {
            "value" : -164.0
          }
        },
        {
          "key_as_string" : "2019-06-01T00:00:00.000Z",
          "key" : 1559347200000,
          "doc_count" : 2,
          "max_view_count" : {
            "value" : 1423.0
          },
          "deriv_month_max" : {
            "value" : 195.0
          }
        },
        {
          "key_as_string" : "2019-07-01T00:00:00.000Z",
          "key" : 1561939200000,
          "doc_count" : 2,
          "max_view_count" : {
            "value" : 1499.0
          },
          "deriv_month_max" : {
            "value" : 76.0
          }
        },
        {
          "key_as_string" : "2019-08-01T00:00:00.000Z",
          "key" : 1564617600000,
          "doc_count" : 2,
          "max_view_count" : {
            "value" : 1523.0
          },
          "deriv_month_max" : {
            "value" : 24.0
          }
        },
        {
          "key_as_string" : "2019-09-01T00:00:00.000Z",
          "key" : 1567296000000,
          "doc_count" : 2,
          "max_view_count" : {
            "value" : 1587.0
          },
          "deriv_month_max" : {
            "value" : 64.0
          }
        }
      ]
    }
  }
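The first-order derivative above is just the difference between consecutive bucket values. A minimal Python sketch using the max_view_count values from the response (it ignores the unit option and gap handling of the real aggregation):

```python
from typing import List, Optional

# Per-month max_view_count values from the response above.
MONTHLY_MAX: List[float] = [789.0, 1299.0, 1194.0, 972.0, 1584.0, 1604.0,
                            1392.0, 1228.0, 1423.0, 1499.0, 1523.0, 1587.0]


def derivative(values: List[float]) -> List[Optional[float]]:
    """value[i] - value[i - 1]; the first bucket has no predecessor (None)."""
    return [None] + [cur - prev for prev, cur in zip(values, values[1:])]


print(derivative(MONTHLY_MAX))
```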

2. Moving Average Aggregation: parent agg, computes a moving average over a set of buckets

This aggregation is deprecated; the recommended replacement is the Moving Function Aggregation.
Inside moving_fn, MovingFunctions.unweightedAvg(values) reproduces a simple moving average.

3. Moving Function Aggregation: parent agg, applies a function over a sliding window of buckets

GET traffic_stats/_search
{
  "size": 0,
  "aggs": {
    "month_term": {
      "date_histogram": {
        "field": "date",
        "calendar_interval": "month"
      },
      "aggs": {
        "max_view_count": {
          "max": {
            "field": "visits"
          }
        },
        "move_avg_view": {
          "moving_fn": {
            "buckets_path": "max_view_count",
            "window": 2,
            "script": "MovingFunctions.unweightedAvg(values)"
          }
        }
      }
    }
  }
}

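The window is set to 2 here, so each bucket averages the two buckets immediately before it; the current bucket itself is not part of the window.
The first bucket has no preceding bucket to average, so its value is null; the second bucket's value equals the first bucket's value; the third bucket's moving average is (bucket01 + bucket02) / 2.
Response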

  "aggregations" : {
    "month_term" : {
      "buckets" : [
        {
          "key_as_string" : "2018-10-01T00:00:00.000Z",
          "key" : 1538352000000,
          "doc_count" : 3,
          "max_view_count" : {
            "value" : 789.0
          },
          "move_avg_view" : {
            "value" : null
          }
        },
        {
          "key_as_string" : "2018-11-01T00:00:00.000Z",
          "key" : 1541030400000,
          "doc_count" : 3,
          "max_view_count" : {
            "value" : 1299.0
          },
          "move_avg_view" : {
            "value" : 789.0
          }
        },
        {
          "key_as_string" : "2018-12-01T00:00:00.000Z",
          "key" : 1543622400000,
          "doc_count" : 3,
          "max_view_count" : {
            "value" : 1194.0
          },
          "move_avg_view" : {
            "value" : 1044.0
          }
        },
        {
          "key_as_string" : "2019-01-01T00:00:00.000Z",
          "key" : 1546300800000,
          "doc_count" : 2,
          "max_view_count" : {
            "value" : 972.0
          },
          "move_avg_view" : {
            "value" : 1246.5
          }
        },
        {
          "key_as_string" : "2019-02-01T00:00:00.000Z",
          "key" : 1548979200000,
          "doc_count" : 2,
          "max_view_count" : {
            "value" : 1584.0
          },
          "move_avg_view" : {
            "value" : 1083.0
          }
        },
        {
          "key_as_string" : "2019-03-01T00:00:00.000Z",
          "key" : 1551398400000,
          "doc_count" : 2,
          "max_view_count" : {
            "value" : 1604.0
          },
          "move_avg_view" : {
            "value" : 1278.0
          }
        },
        {
          "key_as_string" : "2019-04-01T00:00:00.000Z",
          "key" : 1554076800000,
          "doc_count" : 2,
          "max_view_count" : {
            "value" : 1392.0
          },
          "move_avg_view" : {
            "value" : 1594.0
          }
        },
        {
          "key_as_string" : "2019-05-01T00:00:00.000Z",
          "key" : 1556668800000,
          "doc_count" : 2,
          "max_view_count" : {
            "value" : 1228.0
          },
          "move_avg_view" : {
            "value" : 1498.0
          }
        },
	...
	...
      ]
    }
  }
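A Python sketch of the windowing described above: with shift = 0 the window for bucket i covers the `window` buckets before i, and an empty window yields NaN, which Elasticsearch renders as null. unweighted_avg is meant to mirror what MovingFunctions.unweightedAvg does; the window arithmetic in moving_fn is an assumption that matches the response above.

```python
import math
from typing import Callable, List

# Per-month max_view_count values from the response above.
MONTHLY_MAX: List[float] = [789.0, 1299.0, 1194.0, 972.0, 1584.0, 1604.0,
                            1392.0, 1228.0, 1423.0, 1499.0, 1523.0, 1587.0]


def unweighted_avg(window: List[float]) -> float:
    """Arithmetic mean of the non-NaN values in the window; NaN if there are none."""
    vals = [v for v in window if not math.isnan(v)]
    return sum(vals) / len(vals) if vals else float("nan")


def moving_fn(values: List[float], window: int,
              fn: Callable[[List[float]], float], shift: int = 0) -> List[float]:
    """Apply fn to values[i - window + shift : i + shift] for each bucket i,
    clamping both bounds to the list (assumed window semantics)."""
    def clamp(idx: int) -> int:
        return max(0, min(len(values), idx))

    return [fn(values[clamp(i - window + shift):clamp(i + shift)]) for i in range(len(values))]


print(moving_fn(MONTHLY_MAX, window=2, fn=unweighted_avg)[:8])
```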

4. Cumulative Sum Aggregation

parent agg.
For a metric of the parent aggregation (which must be a histogram or date_histogram), each bucket gets the running total of that metric over all buckets up to and including itself.



GET traffic_stats/_search
{
  "size": 0,
  "aggs": {
    "month_term": {
      "date_histogram": {
        "field": "date",
        "calendar_interval": "month"
      },
      "aggs": {
        "max_view_count": {
          "max": {
            "field": "visits"
          }
        },
        "cur_sum_view": {
          "cumulative_sum": {
            "buckets_path": "max_view_count"
          }
        }
      }
    }
  }
}

Response

 "aggregations" : {
    "month_term" : {
      "buckets" : [
        {
          "key_as_string" : "2018-10-01T00:00:00.000Z",
          "key" : 1538352000000,
          "doc_count" : 3,
          "max_view_count" : {
            "value" : 789.0
          },
          "cur_sum_view" : {
            "value" : 789.0
          }
        },
        {
          "key_as_string" : "2018-11-01T00:00:00.000Z",
          "key" : 1541030400000,
          "doc_count" : 3,
          "max_view_count" : {
            "value" : 1299.0
          },
          "cur_sum_view" : {
            "value" : 2088.0
          }
        }
	...
	...
      ]
    }
  }
}
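The running total is a prefix sum over the bucket values. A Python sketch using itertools.accumulate on the per-month maxima (cumulative_sum is a local helper):

```python
from itertools import accumulate
from typing import List

# Per-month max_view_count values from the earlier responses.
MONTHLY_MAX: List[float] = [789.0, 1299.0, 1194.0, 972.0, 1584.0, 1604.0,
                            1392.0, 1228.0, 1423.0, 1499.0, 1523.0, 1587.0]


def cumulative_sum(values: List[float]) -> List[float]:
    """Running total: bucket i gets values[0] + ... + values[i]."""
    return list(accumulate(values))


print(cumulative_sum(MONTHLY_MAX)[:2])  # [789.0, 2088.0]
```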

5. Bucket Script Aggregation: parent agg, runs a script over [one or more metrics] of the parent aggregation and adds the result to each bucket
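As an illustration only, here is a hypothetical bucket_script request (written as a Python dict) that divides a per-month sum of visits by the bucket's _count to get average visits per reading day, followed by a local emulation of what it would compute. The names sum_visits, visits_per_day, total and days are assumptions, not taken from this post.

```python
from typing import Callable, Dict, List

# Hypothetical request body: a sum sub-aggregation plus a bucket_script that divides
# it by the bucket's document count (_count). All names here are illustrative.
REQUEST = {
    "size": 0,
    "aggs": {
        "month_term": {
            "date_histogram": {"field": "date", "calendar_interval": "month"},
            "aggs": {
                "sum_visits": {"sum": {"field": "visits"}},
                "visits_per_day": {
                    "bucket_script": {
                        "buckets_path": {"total": "sum_visits", "days": "_count"},
                        "script": "params.total / params.days",
                    }
                },
            },
        }
    },
}

# Daily visits per month, taken from the _bulk data above.
VISITS_BY_MONTH: Dict[str, List[int]] = {
    "2018-10": [488, 783, 789], "2018-11": [1299, 394, 448], "2018-12": [768, 1194, 987],
    "2019-01": [872, 972], "2019-02": [827, 1584], "2019-03": [1604, 1499],
    "2019-04": [1392, 1247], "2019-05": [984, 1228], "2019-06": [1423, 1238],
    "2019-07": [1388, 1499], "2019-08": [1523, 1443], "2019-09": [1587, 1534],
}


def bucket_script(buckets: Dict[str, Dict[str, float]],
                  script: Callable[[Dict[str, float]], float]) -> Dict[str, float]:
    """Run the script once per bucket on its named variables; the result is a new metric."""
    return {key: script(params) for key, params in buckets.items()}


params_per_bucket = {m: {"total": float(sum(v)), "days": len(v)} for m, v in VISITS_BY_MONTH.items()}
visits_per_day = bucket_script(params_per_bucket, lambda p: p["total"] / p["days"])
print(visits_per_day["2019-01"])  # 922.0
```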




6. Bucket Selector Aggregation: parent agg, filters a set of buckets; only buckets that meet the condition are kept in the result
GET traffic_stats/_search
{
  "size": 0,
  "aggs": {
    "month_term": {
      "date_histogram": {
        "field": "date",
        "calendar_interval": "month"
      },
      "aggs": {
        "max_view_count": {
          "max": {
            "field": "visits"
          }
        },
        "select_bucket": {
          "bucket_selector": {
            "buckets_path": {
              "var01": "max_view_count"
            },
            "script": "params.var01>1500"
          }
        }
      }
    }
  }
}


Response

"aggregations" : {
    "month_term" : {
      "buckets" : [
        {
          "key_as_string" : "2019-02-01T00:00:00.000Z",
          "key" : 1548979200000,
          "doc_count" : 2,
          "max_view_count" : {
            "value" : 1584.0
          }
        },
        {
          "key_as_string" : "2019-03-01T00:00:00.000Z",
          "key" : 1551398400000,
          "doc_count" : 2,
          "max_view_count" : {
            "value" : 1604.0
          }
        },
        {
          "key_as_string" : "2019-08-01T00:00:00.000Z",
          "key" : 1564617600000,
          "doc_count" : 2,
          "max_view_count" : {
            "value" : 1523.0
          }
        },
        {
          "key_as_string" : "2019-09-01T00:00:00.000Z",
          "key" : 1567296000000,
          "doc_count" : 2,
          "max_view_count" : {
            "value" : 1587.0
          }
        }
      ]
    }
  }
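bucket_selector evaluates its script once per bucket and drops the buckets for which it returns false. A Python sketch of the same filter on the per-month maxima; bucket_selector here is a hypothetical local helper and the keys are shortened to YYYY-MM.

```python
from typing import Callable, Dict, List

KEYS = ["2018-10", "2018-11", "2018-12"] + [f"2019-{m:02d}" for m in range(1, 10)]
MONTHLY_MAX = [789.0, 1299.0, 1194.0, 972.0, 1584.0, 1604.0,
               1392.0, 1228.0, 1423.0, 1499.0, 1523.0, 1587.0]
BUCKETS: List[Dict] = [{"key": k, "max_view_count": v} for k, v in zip(KEYS, MONTHLY_MAX)]


def bucket_selector(buckets: List[Dict], predicate: Callable[[Dict], bool]) -> List[Dict]:
    """Keep only the buckets for which the predicate (the script) is true."""
    return [b for b in buckets if predicate(b)]


# Equivalent of the script "params.var01>1500" with var01 -> max_view_count.
kept = bucket_selector(BUCKETS, lambda b: b["max_view_count"] > 1500)
print([b["key"] for b in kept])  # ['2019-02', '2019-03', '2019-08', '2019-09']
```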

7. Bucket Sort Aggregation:

parent agg, sorts a set of buckets
Example

GET traffic_stats/_search
{
  "size": 0,
  "aggs": {
    "month_term": {
      "date_histogram": {
        "field": "date",
        "calendar_interval": "month"
      },
      "aggs": {
        "max_view_count": {
          "max": {
            "field": "visits"
          }
        },
        "select_bucket": {
          "bucket_sort": {
            "sort": [
              {
                "max_view_count": {
                  "order": "desc"
                }
              }
            ]
          }
        }
      }
    }
  }
}


Response

  "aggregations" : {
    "month_term" : {
      "buckets" : [
        {
          "key_as_string" : "2019-03-01T00:00:00.000Z",
          "key" : 1551398400000,
          "doc_count" : 2,
          "max_view_count" : {
            "value" : 1604.0
          }
        },
        {
          "key_as_string" : "2019-09-01T00:00:00.000Z",
          "key" : 1567296000000,
          "doc_count" : 2,
          "max_view_count" : {
            "value" : 1587.0
          }
        },
        {
          "key_as_string" : "2019-02-01T00:00:00.000Z",
          "key" : 1548979200000,
          "doc_count" : 2,
          "max_view_count" : {
            "value" : 1584.0
          }
        },
        {
          "key_as_string" : "2019-08-01T00:00:00.000Z",
          "key" : 1564617600000,
          "doc_count" : 2,
          "max_view_count" : {
            "value" : 1523.0
          }
        },
        {
          "key_as_string" : "2019-07-01T00:00:00.000Z",
          "key" : 1561939200000,
          "doc_count" : 2,
          "max_view_count" : {
            "value" : 1499.0
          }
        },
        {
          "key_as_string" : "2019-06-01T00:00:00.000Z",
          "key" : 1559347200000,
          "doc_count" : 2,
          "max_view_count" : {
            "value" : 1423.0
          }
        },
        {
          "key_as_string" : "2019-04-01T00:00:00.000Z",
          "key" : 1554076800000,
          "doc_count" : 2,
          "max_view_count" : {
            "value" : 1392.0
          }
        },
	...
      ]
    }
  }
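bucket_sort sorts the parent's buckets by the given metric and can also truncate them with its from and size parameters. A Python sketch on the same data; bucket_sort is a local helper, from_ stands in for from (a Python keyword), and when size is not set all buckets are returned.

```python
from typing import Callable, Dict, List, Optional

KEYS = ["2018-10", "2018-11", "2018-12"] + [f"2019-{m:02d}" for m in range(1, 10)]
MONTHLY_MAX = [789.0, 1299.0, 1194.0, 972.0, 1584.0, 1604.0,
               1392.0, 1228.0, 1423.0, 1499.0, 1523.0, 1587.0]
BUCKETS: List[Dict] = [{"key": k, "max_view_count": v} for k, v in zip(KEYS, MONTHLY_MAX)]


def bucket_sort(buckets: List[Dict], key: Callable[[Dict], float], descending: bool = True,
                from_: int = 0, size: Optional[int] = None) -> List[Dict]:
    """Sort buckets by key, then keep buckets[from_ : from_ + size] (all when size is None)."""
    ordered = sorted(buckets, key=key, reverse=descending)
    end = None if size is None else from_ + size
    return ordered[from_:end]


top = bucket_sort(BUCKETS, key=lambda b: b["max_view_count"])
print([b["key"] for b in top[:7]])
```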

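8. Serial Differencing Aggregation: parent agg, serial differencing

Configurable parameters

lag: the lag interval (e.g. lag=7 means the value of the 7th preceding bucket is subtracted from the current bucket's value)
buckets_path: path to the metric to be differenced
gap_policy: how empty buckets are handled (skip / insert_zeros)
format: output format of this aggregation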


GET traffic_stats/_search
{
  "size": 0,
  "aggs": {
    "month_term": {
      "date_histogram": {
        "field": "date",
        "calendar_interval": "month"
      },
      "aggs": {
        "max_view_count": {
          "max": {
            "field": "visits"
          }
        },
        "diff_bucket": {
          "serial_diff": {
            "buckets_path": "max_view_count",
            "lag": 2
          }
        }
      }
    }
  }
}


Response

 "aggregations" : {
    "month_term" : {
      "buckets" : [
        {
          "key_as_string" : "2018-10-01T00:00:00.000Z",
          "key" : 1538352000000,
          "doc_count" : 3,
          "max_view_count" : {
            "value" : 789.0
          }
        },
        {
          "key_as_string" : "2018-11-01T00:00:00.000Z",
          "key" : 1541030400000,
          "doc_count" : 3,
          "max_view_count" : {
            "value" : 1299.0
          }
        },
        {
          "key_as_string" : "2018-12-01T00:00:00.000Z",
          "key" : 1543622400000,
          "doc_count" : 3,
          "max_view_count" : {
            "value" : 1194.0
          },
          "diff_bucket" : {
            "value" : 405.0
          }
        },
        {
          "key_as_string" : "2019-01-01T00:00:00.000Z",
          "key" : 1546300800000,
          "doc_count" : 2,
          "max_view_count" : {
            "value" : 972.0
          },
          "diff_bucket" : {
            "value" : -327.0
          }
        },
        {
          "key_as_string" : "2019-02-01T00:00:00.000Z",
          "key" : 1548979200000,
          "doc_count" : 2,
          "max_view_count" : {
            "value" : 1584.0
          },
          "diff_bucket" : {
            "value" : 390.0
          }
        },
	...
	...
      ]
    }
  }


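As you can see, diff_bucket only has a value from the 3rd bucket onward: diff_bucket = (max_view_count of bucket 3) - (max_view_count of bucket 1), and in general each bucket's value minus the value two buckets earlier.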
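Serial differencing with lag n subtracts the value n buckets earlier, and the first n buckets get no value. A Python sketch reproducing diff_bucket with lag = 2 (serial_diff is a local helper; its lag defaults to 1 here):

```python
from typing import List, Optional

# Per-month max_view_count values from the response above.
MONTHLY_MAX: List[float] = [789.0, 1299.0, 1194.0, 972.0, 1584.0, 1604.0,
                            1392.0, 1228.0, 1423.0, 1499.0, 1523.0, 1587.0]


def serial_diff(values: List[float], lag: int = 1) -> List[Optional[float]]:
    """values[i] - values[i - lag]; the first `lag` buckets have no value (None)."""
    return [None if i < lag else values[i] - values[i - lag] for i in range(len(values))]


print(serial_diff(MONTHLY_MAX, lag=2)[:5])  # [None, None, 405.0, -327.0, 390.0]
```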
