Elasticsearch metric aggregations and bucket aggregations

Chris_Chris_

Article link: https://blog.csdn.net/weixin_41029286/article/details/116504790

Preparing sample data

# Create the index (the ik_max_word analyzer requires the IK analysis plugin)
PUT /book
{
  "settings": {},
  "mappings": {
    "properties": {
      "description": {
        "type": "text",
        "analyzer": "ik_max_word"
      },
      "name": {
        "type": "text",
        "analyzer": "ik_max_word"
      },
      "price": {
        "type": "float"
      },
      "timestamp": {
        "type": "date",
        "format": "yyyy-MM-dd HH:mm:ss||yyyy-MM-dd||epoch_millis"
      }
    }
  }
}

# Insert the sample documents
PUT /book/_doc/1
{
"name": "lucene",
"description": "Lucene Core is a Java library providing powerful indexing and search features, as well as spellchecking, hit highlighting and advanced analysis/tokenization capabilities. The PyLucene sub project provides Python bindings for Lucene Core. ",
"price":100.45,
"timestamp":"2020-08-21 19:11:35"
}

PUT /book/_doc/2
{
"name": "solr",
"description": "Solr is highly scalable, providing fully fault tolerant distributed indexing, search and analytics. It exposes Lucenes features through easy to use JSON/HTTP interfaces or native clients for Java and other languages.",
"price":320.45,
"timestamp":"2020-07-21 17:11:35"
}

PUT /book/_doc/3
{
"name": "Hadoop",
"description": "The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models.",
"price":620.45,
"timestamp":"2020-08-22 19:18:35"
}

PUT /book/_doc/4
{
"name": "ElasticSearch",
"description": "Elasticsearch是一个基于Lucene的搜索服务器。它提供了一个分布式多用户能力的全文搜索引擎，基于RESTful web接口。Elasticsearch是用Java语言开发的，并作为Apache许可条款下的开放源码发布，是一种流行的企业级搜索引擎。Elasticsearch用于云计算中，能够达到实时搜索，稳定，可靠，快速，安装使用方便。官方客户端在Java、.NET(C#)、PHP、Python、Apache Groovy、Ruby和许多其他语言中都是可用的。根据DB-Engines的排名显示，Elasticsearch是最受欢迎的企业搜索引擎，其次是Apache Solr，也是基于Lucene。",
"price":999.99,
"timestamp":"2020-08-15 10:11:35"
}

Metric aggregations

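A metric aggregation computes a figure such as the maximum, minimum, sum, or average over a set of documents.
Setting size to 0 suppresses the search hits so that only the aggregation results are returned. Outside of aggregations, size is normally paired with from to paginate results.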

Using max, min, sum and avg

POST /book/_search
{
  "size": 0,
  "aggs": {
    "max_price": {
      "max": {
        "field": "price"
      }
    }
  }
}

POST /book/_search
{
  "size": 0, 
  "aggs": {
    "sum_price": {
      "sum": {
        "field": "price"
      }
    }
  }
}
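
To make the four single-value metrics concrete, here is a minimal local sketch in Python that applies them to the prices of the four sample documents. It does not talk to Elasticsearch, and the helper name metric is an assumption made for illustration. Because price is mapped as float (32-bit), the numbers Elasticsearch returns can differ slightly in the trailing decimals.

```python
# Prices of the four sample documents indexed above.
prices = [100.45, 320.45, 620.45, 999.99]


def metric(name, values):
    """Compute one single-value metric, mirroring max/min/sum/avg (hypothetical helper)."""
    funcs = {
        "max": max,
        "min": min,
        "sum": sum,
        "avg": lambda v: sum(v) / len(v),
    }
    return funcs[name](values)


for name in ("max", "min", "sum", "avg"):
    print(name, metric(name, prices))
```

Swapping "max" for "min" or "avg" in the request body works the same way as in this sketch: only the aggregation type changes, the field stays the same.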

Using count

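Count the documents whose price is greater than 300. This uses the _count API with a query rather than an aggregation.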

POST /book/_count
{
  "query": {
    "range": {
      "price": {
        "gt": 300
      }
    }
  }
}
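
As a rough local equivalent, the range query with gt can be sketched in Python over the sample data. The function name count_matching is hypothetical and the sketch only mirrors the strict greater-than comparison, not the _count API itself.

```python
# Name and price of the four sample documents indexed above.
docs = [
    {"name": "lucene", "price": 100.45},
    {"name": "solr", "price": 320.45},
    {"name": "Hadoop", "price": 620.45},
    {"name": "ElasticSearch", "price": 999.99},
]


def count_matching(documents, gt):
    """Count documents whose price is strictly greater than gt (hypothetical helper)."""
    return sum(
        1
        for doc in documents
        if doc.get("price") is not None and doc["price"] > gt
    )


print(count_matching(docs, 300))
```

With the sample data, three documents (solr, Hadoop and ElasticSearch) have a price above 300.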

Using value_count

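Count the values of the price field. Documents without a price are ignored, so for a single-valued field this equals the number of documents that have a value.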

POST /book/_search?size=0
{
  "aggs": {
    "price_count": {
      "value_count": {
        "field": "price"
      }
    }
  }
}
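
The difference between counting documents and counting values can be sketched locally as follows. The two extra documents (one without a price, one with two prices) are hypothetical and exist only to show the behaviour; value_count here is an illustrative helper, not a client call.

```python
# The four sample documents plus two hypothetical ones for illustration.
docs = [
    {"name": "lucene", "price": 100.45},
    {"name": "solr", "price": 320.45},
    {"name": "Hadoop", "price": 620.45},
    {"name": "ElasticSearch", "price": 999.99},
    {"name": "no-price-example"},                        # hypothetical: field missing
    {"name": "multi-value-example", "price": [10.0, 20.0]},  # hypothetical: two values
]


def value_count(documents, field):
    """Count field values: missing fields add 0, arrays add one per element."""
    total = 0
    for doc in documents:
        value = doc.get(field)
        if value is None:
            continue
        total += len(value) if isinstance(value, list) else 1
    return total


print(value_count(docs[:4], "price"))  # only the original sample documents
print(value_count(docs, "price"))      # including the hypothetical ones
```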

Using cardinality

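cardinality counts the distinct values of a field, comparable to COUNT(DISTINCT ...) in MySQL. The result is approximate (it is based on the HyperLogLog++ algorithm), although it is effectively exact for small sets like this one. Note that aggregating on _id is discouraged, and newer Elasticsearch versions may reject it unless fielddata access on _id is explicitly enabled.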

POST /book/_search?size=0
{
  "aggs": {
    "_id_count": {
      "cardinality": {
        "field": "_id"
      }
    },
    "price_count": {
      "cardinality": {
        "field": "price"
      }
    }
  }
}
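
For intuition, an exact distinct count can be sketched with a Python set. This is not how Elasticsearch computes cardinality (it uses an approximate sketch structure); the duplicate prices below are hypothetical and were added only to show that repeated values collapse.

```python
# Prices of the four sample documents indexed above.
prices = [100.45, 320.45, 620.45, 999.99]

# Hypothetical extra values, added only to demonstrate de-duplication.
with_duplicates = prices + [100.45, 999.99]


def exact_cardinality(values):
    """Exact distinct count; Elasticsearch returns an approximation instead."""
    return len(set(values))


print(exact_cardinality(prices))
print(exact_cardinality(with_duplicates))
```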

Using stats

stats returns count, min, max, avg and sum in a single request.

POST /book/_search?size=0
{
  "aggs": {
    "price_stats": {
      "stats": {
        "field": "price"
      }
    }
  }
}

Using extended_stats

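On top of what stats returns, extended_stats adds the sum of squares, the variance, the standard deviation, and the interval of the mean plus/minus two standard deviations.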

POST /book/_search?size=0
{
  "aggs": {
    "price_stats": {
      "extended_stats": {
        "field": "price"
      }
    }
  }
}
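
The extra figures can be reproduced by hand. The following Python sketch computes them for the sample prices, assuming the population variance and the default bound of two standard deviations; the function name extended_stats and the dictionary layout are illustrative and only loosely follow the response shape.

```python
import math

# Prices of the four sample documents indexed above.
prices = [100.45, 320.45, 620.45, 999.99]


def extended_stats(values, sigma=2):
    """Compute stats plus sum of squares, variance, std deviation and bounds."""
    count = len(values)
    total = sum(values)
    avg = total / count
    sum_of_squares = sum(v * v for v in values)
    # Population variance: mean of squared deviations from the average.
    variance = sum((v - avg) ** 2 for v in values) / count
    std_deviation = math.sqrt(variance)
    return {
        "count": count,
        "min": min(values),
        "max": max(values),
        "avg": avg,
        "sum": total,
        "sum_of_squares": sum_of_squares,
        "variance": variance,
        "std_deviation": std_deviation,
        "std_deviation_bounds": {
            "upper": avg + sigma * std_deviation,
            "lower": avg - sigma * std_deviation,
        },
    }


for key, value in extended_stats(prices).items():
    print(key, value)
```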

Percentiles

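A percentile is the value below which a given percentage of the observed values fall. When no percents are specified, Elasticsearch returns the 1st, 5th, 25th, 50th, 75th, 95th and 99th percentiles.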

POST /book/_search?size=0
{
  "aggs": {
    "price_percents": {
      "percentiles": {
        "field": "price"
      }
    }
  }
}

# Request specific percentiles instead of the defaults
POST /book/_search?size=0
{
  "aggs": {
    "price_percents": {
      "percentiles": {
        "field": "price",
        "percents": [
          75,
          99,
          99.9
        ]
      }
    }
  }
}
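
To illustrate what a percentile means, here is a small Python sketch that uses linear interpolation between the closest ranks. This is an assumption chosen for simplicity: Elasticsearch computes percentiles with an approximate algorithm (TDigest by default), so its output for the same data may differ, especially on a tiny data set.

```python
# Prices of the four sample documents indexed above.
prices = [100.45, 320.45, 620.45, 999.99]


def percentile(values, p):
    """Percentile p (0..100) by linear interpolation between closest ranks."""
    ordered = sorted(values)
    rank = (len(ordered) - 1) * p / 100
    lower = int(rank)  # rank is never negative, so int() floors it
    upper = min(lower + 1, len(ordered) - 1)
    fraction = rank - lower
    return ordered[lower] + (ordered[upper] - ordered[lower]) * fraction


for p in (75, 99, 99.9):
    print(p, percentile(prices, p))
```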

Percentile ranks

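percentile_ranks reports the percentage of observed values that are less than or equal to a given value.
The request below asks for the share of documents whose price is at most 100 and at most 200.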

POST /book/_search?size=0
{
  "aggs": {
    "gge_perc_rank": {
      "percentile_ranks": {
        "field": "price",
        "values": [
          100,
          200
        ]
      }
    }
  }
}
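
The idea can be sketched locally as an exact share of values at or below a threshold. The function percentile_rank is a hypothetical helper; Elasticsearch derives its answer from the same approximate structure it uses for percentiles, so the percentages it reports may be interpolated and differ from this exact count.

```python
# Prices of the four sample documents indexed above.
prices = [100.45, 320.45, 620.45, 999.99]


def percentile_rank(values, threshold):
    """Exact percentage of values that are less than or equal to threshold."""
    at_or_below = sum(1 for v in values if v <= threshold)
    return 100.0 * at_or_below / len(values)


for threshold in (100, 200):
    print(threshold, percentile_rank(prices, threshold))
```

In this exact version no sample price is at or below 100 (the cheapest book costs 100.45), and one of the four is at or below 200.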

Bucket aggregations

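Bucket aggregations work like GROUP BY in MySQL: documents that share a characteristic are placed in the same bucket, each bucket is one group, and the response can contain several buckets.
Sub-aggregations then compute metrics over the documents in each bucket. The example below groups the books into price ranges (from is inclusive, to is exclusive) and reports the average price and the value count for each range.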

POST /book/_search
{
  "size": 0,
  "aggs": {
    "group_by_price": {
      "range": {
        "field": "price",
        "ranges": [
          {
            "from": 0,
            "to": 200
          },
          {
            "from": 200,
            "to": 400
          },
          {
            "from": 400,
            "to": 1000
          }
        ]
      },
      "aggs": {
        "average_price": {
          "avg": {
            "field": "price"
          }
        },
        "count_price": {
          "value_count": {
            "field": "price"
          }
        }
      }
    }
  }
}
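
The grouping and the per-bucket metrics can be sketched in plain Python. The function range_buckets and the dictionary layout are assumptions that only loosely follow the response format; the lower bound is treated as inclusive and the upper bound as exclusive, matching the range aggregation.

```python
# Prices of the four sample documents indexed above.
prices = [100.45, 320.45, 620.45, 999.99]

# Same ranges as in the request: from is inclusive, to is exclusive.
ranges = [(0, 200), (200, 400), (400, 1000)]


def range_buckets(values, bounds):
    """Group values into [low, high) buckets and compute per-bucket metrics."""
    buckets = []
    for low, high in bounds:
        members = [v for v in values if low <= v < high]
        buckets.append({
            "key": f"{float(low)}-{float(high)}",
            "doc_count": len(members),
            "average_price": sum(members) / len(members) if members else None,
            "count_price": len(members),
        })
    return buckets


for bucket in range_buckets(prices, ranges):
    print(bucket)
```

With the sample data the first two ranges hold one book each and the last range holds two.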

Filtering buckets

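bucket_selector is a pipeline aggregation that filters the buckets produced by its parent aggregation, similar to HAVING in MySQL. buckets_path maps a script variable to a sibling metric, and the script decides whether a bucket is kept. Here only buckets whose average price is at least 200 remain.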

POST /book/_search
{
  "size": 0,
  "aggs": {
    "group_by_price": {
      "range": {
        "field": "price",
        "ranges": [
          {
            "from": 0,
            "to": 200
          },
          {
            "from": 200,
            "to": 400
          },
          {
            "from": 400,
            "to": 1000
          }
        ]
      },
      "aggs": {
        "average_price": {
          "avg": {
            "field": "price"
          }
        },
        "count_price": {
          "value_count": {
            "field": "price"
          }
        },
        "having": {
          "bucket_selector": {
            "buckets_path": {
              "avg_price": "average_price"
            },
            "script": {
              "source": "params.avg_price >= 200 "
            }
          }
        }
      }
    }
  }
}
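
A local sketch of the same filtering step is shown below. The bucket averages were worked out by hand from the sample prices, and bucket_selector here is an illustrative Python function, not a client API; the lambda stands in for the script and receives the variables named in buckets_path.

```python
# Buckets as produced by the range aggregation on the sample data
# (averages computed by hand from the four sample prices).
buckets = [
    {"key": "0.0-200.0", "average_price": 100.45},
    {"key": "200.0-400.0", "average_price": 320.45},
    {"key": "400.0-1000.0", "average_price": 810.22},
]


def bucket_selector(all_buckets, buckets_path, predicate):
    """Keep the buckets for which predicate(params) is true."""
    kept = []
    for bucket in all_buckets:
        # Resolve each script variable to the metric it points at.
        params = {name: bucket[path] for name, path in buckets_path.items()}
        if predicate(params):
            kept.append(bucket)
    return kept


kept = bucket_selector(
    buckets,
    {"avg_price": "average_price"},
    lambda params: params["avg_price"] >= 200,
)
print([bucket["key"] for bucket in kept])
```

The 0 to 200 bucket is dropped because its average price is below 200; the other two buckets are kept.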