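Metric Aggregation
- Single-value analysis: outputs a single result
  - max, min, avg, sum
  - cardinality: an approximate distinct count, similar to SQL's COUNT(DISTINCT)
- Multi-value analysis: outputs multiple results
  - stats, extended_stats
  - percentiles, percentile_ranks
  - top_hits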
POST employees/_search
{
  "size": 0,
  "aggs": {
    "max_salary": {
      "max": { "field": "salary" }
    },
    "min_salary": {
      "min": { "field": "salary" }
    },
    "avg_salary": {
      "avg": { "field": "salary" }
    }
  }
}
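The request above returns one number per aggregation. As a rough sketch of the multi-value metrics, the following assumes the same employees index and salary field; the aggregation names and the chosen percents are illustrative only.
# stats returns count, min, max, avg and sum in one object
# percentiles returns one value per requested percent
POST employees/_search
{
  "size": 0,
  "aggs": {
    "salary_stats": {
      "stats": { "field": "salary" }
    },
    "salary_percentiles": {
      "percentiles": {
        "field": "salary",
        "percents": [50, 95, 99]
      }
    }
  }
}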
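Bucket Aggregation
Documents are assigned to buckets according to a rule, which groups them into categories. Common bucket aggregations provided by ES:
- Terms
- Numeric types
  - Range / Date Range
  - Histogram / Date Histogram
Buckets can be nested: a bucket can be split further into sub-buckets.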
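A range aggregation on salary might look like the sketch below. The boundaries 10000 and 20000, the key labels and the name salary_range are made up for illustration. Each range includes its from value and excludes its to value.
POST employees/_search
{
  "size": 0,
  "aggs": {
    "salary_range": {
      "range": {
        "field": "salary",
        "ranges": [
          { "key": "<10000", "to": 10000 },
          { "key": "10000-20000", "from": 10000, "to": 20000 },
          { "key": ">=20000", "from": 20000 }
        ]
      }
    }
  }
}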
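Terms Aggregation
- A field must have doc_values or fielddata available before a terms aggregation can run on it
  - keyword fields support doc_values by default
  - text fields need fielddata enabled in the mapping; buckets are then built from the analyzed tokens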
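# terms aggregation on the keyword sub-field
POST employees/_search
{
  "size": 0,
  "aggs": {
    "jobs": {
      "terms": { "field": "job.keyword" }
    }
  }
}
# terms aggregation on the text field; this returns an error until fielddata is enabled on job
POST employees/_search
{
  "size": 0,
  "aggs": {
    "jobs": {
      "terms": { "field": "job" }
    }
  }
}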
# Enable fielddata on the text field so that it supports terms aggregation
PUT /employees/_mapping
{
  "properties": {
    "job": {
      "type": "text",
      "fielddata": true
    }
  }
}
Terms aggregations on job.keyword and on job do not return the same total number of buckets.
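# Distinct count on the keyword sub-field
POST employees/_search
{
  "size": 0,
  "aggs": {
    "cardinate": {
      "cardinality": { "field": "job.keyword" }
    }
  }
}
# Distinct count on the analyzed text field (requires fielddata on job)
POST employees/_search
{
  "size": 0,
  "aggs": {
    "cardinate": {
      "cardinality": { "field": "job" }
    }
  }
}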
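The two requests above return different results because job is analyzed: cardinality counts distinct tokens on job, but distinct whole values on job.keyword.
To speed up terms aggregations, enable eager_global_ordinals. Global ordinals are then built when a shard refreshes instead of on the first aggregation after new data arrives.
Use case: aggregations run frequently, performance requirements are high, and new documents are added continuously.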
PUT index
{
  "mappings": {
    "properties": {
      "foo": {
        "type": "keyword",
        "eager_global_ordinals": true
      }
    }
  }
}
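Histogram bucketing
# Salary from 0 to 20000, bucketed in intervals of 5000
# extended_bounds only forces empty buckets in that range to appear; it does not filter out salaries outside it
POST /employees/_search
{
  "size": 0,
  "aggs": {
    "salary_histogram": {
      "histogram": {
        "field": "salary",
        "interval": 5000,
        "extended_bounds": {
          "min": 0,
          "max": 20000
        }
      }
    }
  }
}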
Nesting
# Nested aggregation 1: bucket by job and compute salary statistics per bucket
POST employees/_search
{
  "size": 0,
  "aggs": {
    "job_salary_stats": {
      "terms": { "field": "job.keyword" },
      "aggs": {
        "salary": {
          "stats": { "field": "salary" }
        }
      }
    }
  }
}
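# Nested aggregation 2: bucket by job, then by gender, and compute salary statistics
# The inner "aggs" must sit inside gender_stats, not next to it
POST employees/_search
{
  "size": 0,
  "aggs": {
    "job_salary_stats": {
      "terms": { "field": "job.keyword" },
      "aggs": {
        "gender_stats": {
          "terms": { "field": "gender" },
          "aggs": {
            "salary_stats": {
              "stats": { "field": "salary" }
            }
          }
        }
      }
    }
  }
}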
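A metric such as top_hits can be nested inside a bucket in the same way. The sketch below assumes the age field used elsewhere in these notes; the names jobs and oldest_employees and the size of 3 are illustrative.
# For each job bucket, return the three oldest employees
POST employees/_search
{
  "size": 0,
  "aggs": {
    "jobs": {
      "terms": { "field": "job.keyword" },
      "aggs": {
        "oldest_employees": {
          "top_hits": {
            "size": 3,
            "sort": [ { "age": { "order": "desc" } } ]
          }
        }
      }
    }
  }
}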
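Pipeline
- min_bucket: finds the minimum value among the results of an earlier aggregation; the path to those results is given with the buckets_path keyword
# Among the job types, find the one with the lowest average salary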
# min_bucket reads the buckets of jobs, so it sits at the same level as jobs, not inside it
POST /employees/_search
{
  "size": 0,
  "aggs": {
    "jobs": {
      "terms": { "field": "job.keyword" },
      "aggs": {
        "avg_salary": {
          "avg": { "field": "salary" }
        }
      }
    },
    "min_salary_by_job": {
      "min_bucket": { "buckets_path": "jobs>avg_salary" }
    }
  }
}
- avg_bucket: the average of the earlier results
- stats_bucket: summary statistics over the earlier results
- percentiles_bucket: percentiles over the earlier results
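The other bucket pipelines take the same buckets_path. A minimal sketch, assuming the employees index from these notes; the names avg_salary_by_job and stats_salary_by_job are illustrative. Both pipelines are placed next to jobs, at the same level.
POST /employees/_search
{
  "size": 0,
  "aggs": {
    "jobs": {
      "terms": { "field": "job.keyword" },
      "aggs": {
        "avg_salary": {
          "avg": { "field": "salary" }
        }
      }
    },
    "avg_salary_by_job": {
      "avg_bucket": { "buckets_path": "jobs>avg_salary" }
    },
    "stats_salary_by_job": {
      "stats_bucket": { "buckets_path": "jobs>avg_salary" }
    }
  }
}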
Parent pipeline (the result is written into each bucket of the parent aggregation)
- derivative
- cumulative_sum
- moving_fn
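A parent pipeline is declared inside a histogram (or date_histogram) and refers to a metric in the same bucket. The sketch below assumes the age and salary fields from these notes; the interval of 5 and the aggregation names are illustrative.
# derivative: change in average salary from one age bucket to the next
# cumulative_sum: running total of the per-bucket average salary
POST /employees/_search
{
  "size": 0,
  "aggs": {
    "age": {
      "histogram": {
        "field": "age",
        "interval": 5
      },
      "aggs": {
        "avg_salary": {
          "avg": { "field": "salary" }
        },
        "derivative_avg_salary": {
          "derivative": { "buckets_path": "avg_salary" }
        },
        "cumulative_salary": {
          "cumulative_sum": { "buckets_path": "avg_salary" }
        }
      }
    }
  }
}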
Scope of aggregations
By default, an ES aggregation runs over the result set of the query.
ES also supports three other scopes: filter, post_filter, and global.
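A sketch of how filter and global change the scope, assuming the employees fields used in these notes; the age thresholds and aggregation names are illustrative. post_filter is not shown: it filters the returned hits after the aggregations have been computed, so it leaves aggregation results unchanged.
# jobs: runs over the query result (age >= 20)
# older_person: narrows the scope further to age >= 35 for its own sub-aggregation only
# all: ignores the query and runs over every document in the index
POST employees/_search
{
  "size": 0,
  "query": {
    "range": { "age": { "gte": 20 } }
  },
  "aggs": {
    "jobs": {
      "terms": { "field": "job.keyword" }
    },
    "older_person": {
      "filter": {
        "range": { "age": { "gte": 35 } }
      },
      "aggs": {
        "jobs": {
          "terms": { "field": "job.keyword" }
        }
      }
    },
    "all": {
      "global": {},
      "aggs": {
        "salary_avg": {
          "avg": { "field": "salary" }
        }
      }
    }
  }
}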
Sorting in aggregations
Add the order keyword to the aggregation.
# Sorting with order
# by _count and _key
POST employees/_search
{
  "size": 0,
  "query": {
    "range": {
      "age": { "gte": 20 }
    }
  },
  "aggs": {
    "jobs": {
      "terms": {
        "field": "job.keyword",
        "order": [
          { "_count": "asc" },
          { "_key": "desc" }
        ]
      }
    }
  }
}
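Buckets can also be ordered by a single-value metric computed in a sub-aggregation. A minimal sketch, assuming the same index; the name avg_salary is illustrative and must match the sub-aggregation it refers to.
# Order job buckets by their average salary, highest first
POST employees/_search
{
  "size": 0,
  "aggs": {
    "jobs": {
      "terms": {
        "field": "job.keyword",
        "order": [
          { "avg_salary": "desc" }
        ]
      },
      "aggs": {
        "avg_salary": {
          "avg": { "field": "salary" }
        }
      }
    }
  }
}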