ES Primer (Part 2): Elasticsearch Query DSL, CRUD Operations, and Aggregations

买个橘籽

Category: es  Tags: es, elasticsearch, java, distributed systems

Article link: https://blog.csdn.net/u011485472/article/details/108920301


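Document-oriented search and analytics engine vs. object-oriented application systems

Elasticsearch is a document-oriented search and analytics engine.

Application data structures are object-oriented and often complex.
To store such an object in a relational database, you have to break it apart into several flat tables and then reassemble the object on every query, which is tedious.
ES is document-oriented: the data structure stored in a document mirrors the object's own structure. On top of this document model, ES provides rich indexing, full-text search, and aggregation analysis.
An ES document is expressed in JSON.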
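
To make the contrast concrete, here is a small Python sketch (not from the original post) that holds one product as a single JSON document and then flattens the same object into relational-style rows. The field names mirror the sample documents used later in this article; the two-table layout and the helper names are hypothetical.

```python
import json

# One product object, as an application would hold it in memory.
product = {
    "name": "奶粉",
    "price": 270,
    "tags": ["婴幼儿", "二段"],
}

def as_document(obj):
    """Serialize the object as one JSON document, nested structure intact."""
    return json.dumps(obj, ensure_ascii=False)

def as_rows(product_id, obj):
    """Flatten the same object into two hypothetical relational tables."""
    product_row = (product_id, obj["name"], obj["price"])
    tag_rows = [(product_id, tag) for tag in obj["tags"]]
    return product_row, tag_rows

doc = as_document(product)
product_row, tag_rows = as_rows(1, product)
print(doc)
print(product_row, tag_rows)
```

The document form round-trips back to the original object in one step, while the row form needs a join across both tables to rebuild it.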

Simple Query DSL

(1) Quickly check the cluster's health

GET /_cat/health?v

(2) Quickly list the indices in the cluster

GET /_cat/indices?v

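(3) Create and delete an index

# Create the index
PUT /my_index001?pretty
# Delete the index
DELETE /my_index001?pretty

(4) CRUD on product documents

Method 1: insert data and let ES generate the document's _id automatically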

POST /my_index001/_doc
{
  "name":"奶粉",
  "dec":"婴幼儿奶粉",
  "price":270,
  "producer":"奶粉producer",
  "tags":["婴幼儿","二段"]
}

Method 2: insert data with a manually specified document _id

PUT /my_index001/_doc/1
{
  "name":"奶粉",
  "dec":"中老年奶粉",
  "price":250,
  "producer":"中老年奶粉producer",
  "tags":["中老年","补钙"]
}
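
The console requests above are plain HTTP calls against the Elasticsearch REST API. As a rough sketch, the Python below builds (but does not send) the equivalent requests with the standard library. The base URL `http://localhost:9200` and the helper name `index_request` are assumptions for illustration.

```python
import json
from urllib.request import Request

BASE_URL = "http://localhost:9200"  # assumption: a local cluster on the default port

def index_request(index, doc, doc_id=None):
    """Build (without sending) the HTTP request that indexes one document."""
    if doc_id is None:
        # Method 1: POST without an ID, Elasticsearch generates the _id.
        method, path = "POST", f"/{index}/_doc"
    else:
        # Method 2: PUT with an explicit ID chosen by the caller.
        method, path = "PUT", f"/{index}/_doc/{doc_id}"
    body = json.dumps(doc, ensure_ascii=False).encode("utf-8")
    return Request(BASE_URL + path, data=body, method=method,
                   headers={"Content-Type": "application/json"})

auto = index_request("my_index001", {"name": "奶粉", "price": 270})
manual = index_request("my_index001", {"name": "奶粉", "price": 250}, doc_id=1)
print(auto.get_method(), auto.full_url)
print(manual.get_method(), manual.full_url)
```

Passing either request to `urllib.request.urlopen` would send it, provided a cluster is actually listening at the assumed address.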

Query all documents

Query all data in the index my_index001
GET my_index001/_search
{
  "query": {
    "match_all": {}
  }
}

Response:
{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 2,
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "my_index001",
        "_type" : "_doc",
        "_id" : "fXbW9nQBgKKpTl0QYML0",
        "_score" : 1.0,
        "_source" : {
          "name" : "奶粉",
          "dec" : "婴幼儿奶粉",
          "price" : 270,
          "producer" : "奶粉producer",
          "tags" : [
            "婴幼儿",
            "二段"
          ]
        }
      },
      {
        "_index" : "my_index001",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 1.0,
        "_source" : {
          "name" : "奶粉",
          "dec" : "中老年奶粉",
          "price" : 250,
          "producer" : "中老年奶粉producer",
          "tags" : [
            "中老年",
            "补钙"
          ]
        }
      }
    ]
  }
}

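In the response, the document with _id=fXbW9nQBgKKpTl0QYML0 was inserted with Method 1, where no ID was given and ES generated one automatically. The document with _id=1 was inserted with a manually specified ID. Note that PUT /my_index001/_doc/1 does not fail when the ID already exists: it replaces the existing document and increments its _version. To get an error for an existing ID, index with create semantics instead, for example PUT /my_index001/_doc/1?op_type=create.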

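Fields in the response:

took: how long the search took, in milliseconds
timed_out: whether the search timed out; here it did not
_shards: how many shards the search ran on, and how many of them succeeded, were skipped, or failed. When an index has several shards, a search request is sent to every primary shard (or to one of its replica shards instead)
hits.total: the number of matching documents, here 2
hits.max_score: the highest _score among the hits. The _score measures how relevant a document is to the search: the better the match, the higher the score
hits.hits: the detailed data of the documents that matched the search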
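
As an illustration of reading these fields, here is a minimal Python sketch that pulls them out of a response that has already been parsed into a dict. The function name `summarize` and the trimmed sample response are hypothetical. The `isinstance` branch covers Elasticsearch 7 and later, where hits.total is an object with `value` and `relation` instead of a plain number.

```python
def summarize(response):
    """Extract the key fields of a search response parsed from JSON."""
    hits = response["hits"]
    total = hits["total"]
    # Elasticsearch 7+ reports total as {"value": ..., "relation": ...}.
    if isinstance(total, dict):
        total = total["value"]
    return {
        "took_ms": response["took"],
        "timed_out": response["timed_out"],
        "failed_shards": response["_shards"]["failed"],
        "total": total,
        "max_score": hits["max_score"],
        "ids": [hit["_id"] for hit in hits["hits"]],
    }

# A trimmed copy of the response shown above (sources omitted).
sample = {
    "took": 1,
    "timed_out": False,
    "_shards": {"total": 1, "successful": 1, "skipped": 0, "failed": 0},
    "hits": {
        "total": 2,
        "max_score": 1.0,
        "hits": [
            {"_id": "fXbW9nQBgKKpTl0QYML0", "_score": 1.0},
            {"_id": "1", "_score": 1.0},
        ],
    },
}
summary = summarize(sample)
print(summary)
```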

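Update a product: partially update specified fields of a given document. The typed endpoint below is the Elasticsearch 6.x form; from 7.0 on, the typeless equivalent is POST /my_index001/_update/1.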

POST /my_index001/_doc/1/_update
{
  "doc" : {
      "price" : 280
   }
}

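Replace a document: this does not update individual fields. It replaces the whole document with _id=1 with the new document given below. Note: if you update this way, you must include every existing field you want to keep, otherwise those fields are lost.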

POST /my_index001/_doc/1
{
  "price" : 280
}
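
The difference between the two calls can be modeled with plain dictionaries. This is only a sketch of the semantics, not of how Elasticsearch implements them: the real update API merges nested objects recursively, while the hypothetical `partial_update` below handles flat documents only.

```python
def partial_update(stored, partial_doc):
    """Model of the update API's "doc" merge, for flat documents only."""
    merged = dict(stored)
    merged.update(partial_doc)
    return merged

def full_replace(stored, new_doc):
    """Model of indexing again under the same _id: the old source is discarded."""
    return dict(new_doc)

stored = {"name": "奶粉", "dec": "中老年奶粉", "price": 250}

updated = partial_update(stored, {"price": 280})
replaced = full_replace(stored, {"price": 280})
print(updated)   # all original fields survive, price is changed
print(replaced)  # only the fields of the new document remain
```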

Delete a document

DELETE /my_index001/_doc/1

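Bool queries: query and filter context, and multi-string, multi-field queries

A bool query combines one or more query clauses. There are four clause types: two contribute to the relevance score and two do not.

must: the clause must match. Contributes to the score
should: the clause may match. Contributes to the score when it does
must_not: filter context. The clause must not match, and it does not contribute to the score
filter: filter context. The clause must match, but it does not contribute to the score

Relevance is not exclusive to full-text search. It also applies to yes|no clauses: the more clauses match, the higher the relevance score. When several query clauses are merged into one compound query, such as a bool query, the score computed by each clause is combined into the overall relevance score.

A combined bool query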

GET /my_index001/_search
{
  "query": {
    "bool": {
      "must": [
        {"match": {
          "name": "奶粉"
        }}
      ],
      "must_not": [
        {"match": {
          "name": "牙膏"
        }}
      ],
      "filter": {
        "range": {
          "price": {
            "gte": 200,
            "lte": 300
          }
        }
      },
      "should": [
        {"match": {
          "producer": "producer"
        }}
      ]
    }
  }
}

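The bool query above uses must, must_not, should, and filter. The four can appear side by side in any order. If a bool query has no must or filter clause, at least one should clause has to match; when must or filter is present, as here, should clauses are optional and only raise the score of documents that match them.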
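
The clause semantics described above can be captured in a deliberately simplified Python model. It is a sketch of the matching and scoring rules only, not Elasticsearch's implementation. Each clause result is a hypothetical `(matched, score)` pair, and the scores are made-up numbers.

```python
def bool_score(must=(), should=(), must_not=(), filters=()):
    """Simplified bool query model.

    Each argument is a sequence of (matched, score) pairs, one per clause.
    Returns the combined score, or None if the document is excluded.
    """
    if any(matched for matched, _ in must_not):
        return None  # must_not: a match excludes the document
    if not all(matched for matched, _ in filters):
        return None  # filter: must match, adds nothing to the score
    if not all(matched for matched, _ in must):
        return None  # must: must match, adds to the score
    matched_should = [score for matched, score in should if matched]
    # With no must or filter clause, at least one should clause has to match.
    if not must and not filters and should and not matched_should:
        return None
    return sum(score for _, score in must) + sum(matched_should)

# must and should both match, filter matches: the scores add up.
both = bool_score(must=[(True, 1.0)], should=[(True, 0.5)], filters=[(True, 0.0)])
# must matches, should does not: still a hit, with a lower score.
must_only = bool_score(must=[(True, 1.0)], should=[(False, 0.0)])
# Only should clauses and none match: the document is excluded.
no_should = bool_score(should=[(False, 0.0), (False, 0.0)])
print(both, must_only, no_should)
```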

The structure of a query affects relevance scoring

Query:
GET /my_index001/_search
{
  "query": {
    "bool": {
      "should": [
        {"match": {"tags": "老人"}},
        {"match": {"tags": "婴儿"}},
        {"bool": {
          "should": [
            {"match": {
              "tags": "中年"
            }}
          ]
        }}
      ]
    }
  }
}

Response:
"hits" : {
    "total" : 3,
    "max_score" : 2.345461,
    "hits" : [
      {
        "_index" : "my_index001",
        "_type" : "_doc",
        "_id" : "2",
        "_score" : 2.345461,
        "_source" : {
          "name" : "奶粉",
          "dec" : "老人奶粉",
          "price" : 240,
          "producer" : "老人奶粉producer",
          "tags" : [
            "老人"
          ]
        }
      },
      {
        "_index" : "my_index001",
        "_type" : "_doc",
        "_id" : "fXbW9nQBgKKpTl0QYML0",
        "_score" : 2.345461,
        "_source" : {
          "name" : "奶粉",
          "dec" : "婴儿奶粉",
          "price" : 270,
          "producer" : "奶粉producer",
          "tags" : [
            "婴儿"
          ]
        }
      },
      {
        "_index" : "my_index001",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 1.4779618,
        "_source" : {
          "name" : "奶粉",
          "dec" : "中年奶粉",
          "price" : 240,
          "producer" : "中年奶粉producer",
          "tags" : [
            "中年",
            "帮助睡眠"
          ]
        }
      }
    ]
  }

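In the results above, the documents with tags "老人" and "婴儿" both score 2.345461, while the document with tags "中年" scores 1.4779618. The original explanation is that the first two conditions sit at the top level of the should clause, while the third sits one level deeper inside a nested bool, as the query shows. A bool query adds up the scores of its matching clauses, and a nested bool passes its own total up to its parent as a single score. Here the nested bool holds only one clause, so its total is just that clause's score, and the lower score probably also reflects field-length normalization: document 1 has more terms in its tags field than the other two documents.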
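
Query structure is not the only input to _score. Each match clause is scored with BM25 by default, which also depends on term frequency and field length. The sketch below uses the textbook BM25 formula with Elasticsearch's default parameters k1=1.2 and b=0.75. The corpus statistics are hypothetical and Lucene's actual implementation differs in details, so it will not reproduce the scores above.

```python
import math

def bm25_term_score(tf, doc_len, avg_doc_len, n_docs, doc_freq, k1=1.2, b=0.75):
    """Textbook BM25 score of one term in one field of one document."""
    # Rarer terms (small doc_freq) get a larger inverse document frequency.
    idf = math.log(1 + (n_docs - doc_freq + 0.5) / (doc_freq + 0.5))
    # Longer fields (doc_len above average) are penalized.
    norm = tf + k1 * (1 - b + b * doc_len / avg_doc_len)
    return idf * tf * (k1 + 1) / norm

# Hypothetical statistics: 3 documents, each term appears in exactly one of them.
short_field = bm25_term_score(tf=1, doc_len=2, avg_doc_len=3, n_docs=3, doc_freq=1)
long_field = bm25_term_score(tf=1, doc_len=6, avg_doc_len=3, n_docs=3, doc_freq=1)
print(short_field > long_field)
```

With everything else equal, the same term scores lower in the longer field.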

Boosting: control how much each search term contributes to the score

Query:
GET /my_index001/_search
{
  "query": {
    "bool": {
      "should": [
        {"match": {
          "tags": {
            "query": "老人", 
            "boost":4
          }
        }},
        {"match": {
          "tags": {
            "query": "婴儿",
            "boost":1
          }
        }}
      ]
    }
  }
}

Response:
"hits" : [
      {
        "_index" : "my_index001",
        "_type" : "_doc",
        "_id" : "2",
        "_score" : 9.381844,
        "_source" : {
          "name" : "奶粉",
          "dec" : "老人奶粉",
          "price" : 240,
          "producer" : "老人奶粉producer",
          "tags" : [
            "老人"
          ]
        }
      },
      {
        "_index" : "my_index001",
        "_type" : "_doc",
        "_id" : "fXbW9nQBgKKpTl0QYML0",
        "_score" : 2.345461,
        "_source" : {
          "name" : "奶粉",
          "dec" : "婴儿奶粉",
          "price" : 270,
          "producer" : "奶粉producer",
          "tags" : [
            "婴儿"
          ]
        }
      }
    ]
  }

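The hit for the term with boost 4 scores 9.381844, while the hit for the term with boost 1 scores 2.345461, exactly a factor of 4 apart. If both boosts were set to 4, both documents would score 9.381844.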
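
The relationship between the two scores is plain multiplication, which a few lines of Python can confirm. The helper name `boosted` is hypothetical, and the base score is the unboosted value taken from the response above.

```python
def boosted(score, boost):
    """A boost multiplies the clause's score by the given factor."""
    return score * boost

base = 2.345461  # score of the unboosted match, from the response above
with_boost_4 = round(boosted(base, 4), 6)
with_boost_1 = round(boosted(base, 1), 6)
print(with_boost_4, with_boost_1)
```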

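Phrase search

The queries so far all use full-text search: the search string is broken into terms, each term is looked up in the inverted index, and a document is returned as long as it matches any one of the terms. Phrase search, by contrast, requires the specified field to contain the entire search string exactly, with the terms adjacent and in the same order, before the document counts as a match and is returned. The syntax is: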

GET /my_index001/_search
{
  "query": {
    "match_phrase": {
      "dec": "老人奶粉"
    }
  }
}
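
To contrast the two matching modes, here is a toy Python model. It assumes one token per character, which is how the default standard analyzer treats Chinese text; the function names are hypothetical, and real phrase matching works on positions stored in the inverted index rather than on raw lists.

```python
def tokenize(text):
    """Assumption: one token per character, as the standard analyzer does for Chinese."""
    return list(text)

def match_any(query, field):
    """Full-text match: true if any query token occurs in the field."""
    field_tokens = tokenize(field)
    return any(token in field_tokens for token in tokenize(query))

def match_phrase(query, field):
    """Phrase match: all query tokens must appear adjacent and in order."""
    q, f = tokenize(query), tokenize(field)
    return any(f[i:i + len(q)] == q for i in range(len(f) - len(q) + 1))

print(match_any("老人奶粉", "中年奶粉"))     # shares the tokens of 奶粉
print(match_phrase("老人奶粉", "中年奶粉"))  # the full phrase is absent
print(match_phrase("老人奶粉", "老人奶粉"))  # the full phrase is present
```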

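Aggregation analysis

Before running a terms aggregation on a text field, you must set that field's fielddata property to true; otherwise the request fails (the original post reports an action_request_validation_exception; the error message typically states that fielddata is disabled on text fields by default). keyword and numeric fields do not need this setting. To set fielddata on a field of an index (typed mapping endpoint, as in Elasticsearch 6.x):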

PUT /my_index001/_mapping/_doc
{
  "properties": {
    "name":{
      "type": "text",
      "fielddata": true
    }
  }
}

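Count the products under each name. Note that for Chinese text you need to install and configure the IK analyzer yourself; the default standard analyzer splits Chinese text into individual characters: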

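Aggregation syntax:
GET /my_index001/_search
{
  "size": 0,                   // without "size": 0, the search hits are returned along with the aggregation
  "aggs": {
    "group_by_name": {         // name of the aggregation in the response
      "terms": {
        "field": "name"        // field to aggregate on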
      }
    }
  }
}

Response:
"aggregations" : {
    "group_by_name" : {                        // aggregation name, matching the request above
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [                            // aggregation results
        {
          "key" : "奶粉",
          "doc_count" : 3
        },
        {
          "key" : "裤子",
          "doc_count" : 1
        }
      ]
    }
  }
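
Conceptually, a terms aggregation counts documents per distinct term. The Python sketch below mimics that with `collections.Counter`. It assumes each field value is indexed as a single term (for example with a keyword field, or an analyzer that keeps 奶粉 as one token); the sample documents and the function name `terms_agg` are hypothetical.

```python
from collections import Counter

def terms_agg(docs, field):
    """Count documents per distinct term of `field`, like a terms aggregation."""
    counts = Counter()
    for doc in docs:
        value = doc[field]
        values = value if isinstance(value, list) else [value]
        counts.update(set(values))  # a document counts once per distinct term
    buckets = [{"key": key, "doc_count": count} for key, count in counts.items()]
    # Default ordering: doc_count descending, ties broken by key ascending.
    return sorted(buckets, key=lambda b: (-b["doc_count"], b["key"]))

docs = [
    {"name": "奶粉", "tags": ["老人"]},
    {"name": "奶粉", "tags": ["婴儿"]},
    {"name": "奶粉", "tags": ["中年", "帮助睡眠"]},
    {"name": "裤子", "tags": ["中年"]},
]
buckets = terms_agg(docs, "name")
print(buckets)
```

Calling `terms_agg(docs, "tags")` counts an array field the same way, one bucket per distinct tag.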

Count the products under each tag, restricted to documents with name="奶粉":

GET /my_index001/_search
{
  "size": 0, 
  "query": {
    "match": {
      "name": "奶粉"
    }
  }, 
  "aggs": {
    "group_by_tags": {
      "terms": {
        "field": "tags"
      }
    }
  }
}

Group first, then average within each group: compute the average price of the products under each name:

GET /my_index001/_search
{
  "size": 0, 
  "aggs": {
    "group_by_name": {
      "terms": {
        "field": "name"
      },
      "aggs": {
        "avg_price": {
          "avg": {
            "field": "price"
          }
        }
      }
    }
  }
}

Compute the average price under each name and sort the buckets by average price in ascending order:

GET /my_index001/_search
{
  "size": 0, 
  "aggs": {
    "group_by_name": {
      "terms": {
        "field": "name",
        "order": {
          "avg_price": "asc"
        }
      },
      "aggs": {
        "avg_price": {
          "avg": {
            "field": "price"
          }
        }
      }
    }
  }
}
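
The nested aggregation above amounts to "group by name, average the price per group, then order the groups by that average". A minimal Python equivalent, with hypothetical sample documents and function name, looks like this:

```python
from collections import defaultdict

def avg_price_by_name(docs, ascending=True):
    """Group documents by name, average price per group, sort by the average."""
    groups = defaultdict(list)
    for doc in docs:
        groups[doc["name"]].append(doc["price"])
    buckets = [
        {"key": name, "doc_count": len(prices), "avg_price": sum(prices) / len(prices)}
        for name, prices in groups.items()
    ]
    return sorted(buckets, key=lambda b: b["avg_price"], reverse=not ascending)

docs = [
    {"name": "奶粉", "price": 240},
    {"name": "奶粉", "price": 270},
    {"name": "奶粉", "price": 240},
    {"name": "裤子", "price": 100},
]
result = avg_price_by_name(docs)
print(result)
```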

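Group by the specified price ranges, then group by name within each range (set "field" to "tags" to group by tag instead), and finally compute the average price of each group: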

GET /my_index001/_search
{
  "size": 0, 
  "aggs": {
    "group_by_price": {
      "range": {
        "field": "price",
        "ranges": [
          {
            "from": 0,
            "to": 100
          },
          {
            "from": 100,
            "to": 200
          },
          {
            "from": 200,
            "to": 300
          }
        ]
      },
      "aggs": {
        "group_by_name": {
          "terms": {
            "field": "name"
          },
          "aggs": {
            "average_price": {
              "avg": {
                "field": "price"
              }
            }
          }
        }
      }
    }
  }
}
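
The three-level aggregation can be read as two nested groupings followed by an average. The Python sketch below reproduces that logic on hypothetical sample documents. As in the range aggregation, each range includes its `from` value and excludes its `to` value; the bucket keys are simplified labels, not the exact keys Elasticsearch returns.

```python
def range_then_name_avg(docs, ranges):
    """Bucket by price range, then by name, then average the price per bucket."""
    result = {}
    for low, high in ranges:
        # "from" is inclusive and "to" is exclusive.
        in_range = [d for d in docs if low <= d["price"] < high]
        by_name = {}
        for doc in in_range:
            by_name.setdefault(doc["name"], []).append(doc["price"])
        result[f"{low}-{high}"] = {
            name: sum(prices) / len(prices) for name, prices in by_name.items()
        }
    return result

docs = [
    {"name": "奶粉", "price": 240},
    {"name": "奶粉", "price": 270},
    {"name": "裤子", "price": 100},
]
result = range_then_name_avg(docs, [(0, 100), (100, 200), (200, 300)])
print(result)
```

A price of exactly 100 lands in the 100-200 bucket rather than 0-100, which is the boundary behavior to keep in mind when choosing ranges.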
