Elasticsearch Beginner Tutorial: Basic Query Operations (Part 3)

尼古拉斯大树

Article link: https://blog.csdn.net/weixin_42109071/article/details/120562362

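1. Retrieving documents with search

ES supports two basic ways to search:

By sending search parameters in the REST request URI (URI + search parameters);
By sending them in the REST request body (URI + request body).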

GET bank/_search?q=*&sort=account_number:desc

Searching with request parameters:
GET bank/_search?q=*&sort=account_number:asc
Where:
q=* # match all documents
sort # field to sort on
asc # ascending order

The first request above (descending order) is equivalent to the following request body:
GET /bank/_search
{
  "query": {
    "match_all": {}
  },
  "sort": [
    {
      "account_number": {
        "order": "desc"
      }
    }
  ]
}
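
The two request styles can also be assembled programmatically. The following is only a sketch using the Python standard library: the helper names `uri_search` and `body_search` are invented for this example, and nothing is actually sent to a cluster.

```python
import json
from urllib.parse import urlencode


def uri_search(index, q="*", sort_field=None, order="asc"):
    """Build the path and query string of a URI search (hypothetical helper)."""
    params = {"q": q}
    if sort_field is not None:
        params["sort"] = f"{sort_field}:{order}"
    # safe="*:" keeps * and : unescaped, as they are typed in the console
    return f"/{index}/_search?" + urlencode(params, safe="*:")


def body_search(sort_field=None, order="asc"):
    """Build the request body for the same search (hypothetical helper)."""
    body = {"query": {"match_all": {}}}
    if sort_field is not None:
        body["sort"] = [{sort_field: {"order": order}}]
    return body


print(uri_search("bank", sort_field="account_number", order="desc"))
print(json.dumps(body_search("account_number", order="desc"), indent=2))
```

The first line printed is the URI form and the JSON that follows is the body form; both describe the same search.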


Retrieve everything under bank, including type and docs:
GET bank/_search

Description of the fields in the response:

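{
	"took": 6,            # time the query took, in milliseconds
	"timed_out": false,   # whether the search timed out
	"_shards": {          # shard information for the search
		"total": 1,       # total number of shards searched
		"successful": 1,  # number of shards that succeeded
		"skipped": 0,     # number of shards skipped (not searched)
		"failed": 0       # number of shards that failed
	},
	"hits": {             # the result set; everything the application needs is read from hits
		"total": {        # how many documents matched
			"value": 1000,
			"relation": "eq"
		},
		"max_score": 1.0,
		"hits": [{       # by default the top 10 hits are returned, sorted by score in descending order
			"_index": "bank",   # index name
			"_type": "account", # type name
			"_id": "1",         # id of this document
			"_score": 1.0,      # relevance score of this document for the query
			"_source": {        # the stored document; all fields are returned unless a subset is specified
				"account_number": 1,
				"balance": 39225,
				"firstname": "Amber",
				"lastname": "Duke",
				"age": 32,
				"gender": "M",
				"address": "880 Holmes Lane",
				"employer": "Pyrami",
				"email": "amberduke@pyrami.com",
				"city": "Brogan",
				"state": "IL"
			}
		}]
	}
}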
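
As a small illustration of reading such a response in application code, the sketch below pulls the total and the documents out of a response that has already been parsed into a Python dict. The `summarize` helper is hypothetical, and the sample is a shortened copy of the response above.

```python
def summarize(response):
    """Return (total hit count, list of (_id, _source)) from a search response dict."""
    hits = response["hits"]
    total = hits["total"]["value"]  # 7.x format: {"value": ..., "relation": ...}
    docs = [(hit["_id"], hit["_source"]) for hit in hits["hits"]]
    return total, docs


sample = {
    "took": 6,
    "timed_out": False,
    "hits": {
        "total": {"value": 1000, "relation": "eq"},
        "max_score": 1.0,
        "hits": [
            {
                "_index": "bank",
                "_id": "1",
                "_score": 1.0,
                "_source": {"account_number": 1, "firstname": "Amber", "balance": 39225},
            }
        ],
    },
}

total, docs = summarize(sample)
print(total, docs[0][1]["firstname"])
```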

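2 Query DSL (domain-specific language)

Elasticsearch provides a JSON-style DSL (domain-specific language) for executing queries. It is called the Query DSL, and it is a very comprehensive query language.

2.1 _source: returning only some fields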

GET /bank/_search
{
  "query": {
    "match_all": {}
  },
  "sort": [
    {
      "balance": {
        "order": "desc"
      }
    }
  ],
  "_source": ["account_number","balance"]
}

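2.2 match queries

For non-string fields, match performs an exact match. For string (text) fields, it performs a full-text search.

2.2.1 Matching non-string fields

Find the records with age=21. Because age is a non-string type, the query is an exact match.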

GET /bank/_search
{
  "query": {
    "match": {
      "age": "21"
    }
  }
}

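2.2.2 Strings: full-text search

A full-text search tokenizes the search condition before matching and sorts the results by relevance score (note: with the default analyzer the search is case-insensitive).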

GET /bank/_search
{
  "query": {
    "match": {
      "address": "mill road"
    }
  }
}

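2.3 match_phrase: matching without splitting

The search value is treated as one whole phrase: its terms must all appear in the field, adjacent and in the same order.

match_phrase: searches for the string without splitting it into independent terms (note: case-insensitive, matched word by word)
field.keyword: the whole field value must match for a hit

GET /bank/_search
{
  "query": {
    "match_phrase": {
      "address": "mill road" # do not match documents containing only mill or only road; match the whole phrase mill road
    }
  }
}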

2.4 field.keyword: exact match on the whole value

The whole field value must match for a hit, and matching is case-sensitive.

GET /bank/_search
{
  "query": {
    "match": {
      "address.keyword": "mill road"
    }
  }
}

GET /bank/_search
{
  "query": {
    "match": {
      "address.keyword": "Mill Road"
    }
  }
}

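Result: neither query returns any hits, because address.keyword is compared against the entire field value and "mill road" is only part of it.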

GET /bank/_search
{
  "query": {
    "match": {
      "address.keyword": "990 Mill Road"
    }
  }
}

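Result: with the complete field value as the condition, only a document whose address is exactly "990 Mill Road" can match.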

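2.5 Differences between match, match_phrase and keyword searches

Command	Behavior	Example
match	Matches text fields; a document is returned if any of the search terms matches, sorted by score	Searching for mill road returns mill xxx, road xxxx, mill road xxxx, xxx road, etc.
match_phrase	Phrase matching; a document is returned if its text contains the whole phrase, sorted by score	Searching for mill road returns xxxx mill road, mill road xxxx, xxxx mill road xxx
field.keyword	With keyword, the condition must equal the entire field value; this is an exact match	Searching for mill road returns only mill road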
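
To make the three behaviors concrete, here is a toy model in plain Python. It is a deliberate simplification and not how Elasticsearch is implemented: `analyze` only lowercases and splits on whitespace as a stand-in for the default analyzer, and the addresses are sample values chosen for illustration.

```python
def analyze(text):
    """Rough stand-in for the default analyzer: lowercase, split on whitespace."""
    return text.lower().split()


def match(field_value, query):
    """match: true if ANY query token appears in the field."""
    tokens = analyze(field_value)
    return any(token in tokens for token in analyze(query))


def match_phrase(field_value, query):
    """match_phrase: all query tokens must appear adjacent and in order."""
    tokens, wanted = analyze(field_value), analyze(query)
    n = len(wanted)
    return any(tokens[i:i + n] == wanted for i in range(len(tokens) - n + 1))


def keyword(field_value, query):
    """field.keyword: the whole, unanalyzed value must be equal (case-sensitive)."""
    return field_value == query


addresses = ["990 Mill Road", "198 Mill Lane", "263 Aviation Road"]
for name, fn in [("match", match), ("match_phrase", match_phrase), ("keyword", keyword)]:
    print(name, [a for a in addresses if fn(a, "mill road")])
```

In this model, match returns all three addresses, match_phrase returns only "990 Mill Road", and keyword returns nothing until the query is exactly "990 Mill Road".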

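2.6 multi_match: matching across multiple fields

Matches documents whose state or address contains mill (or road); the search condition is tokenized during the query.

GET /bank/_search
{
  "query": {
    "multi_match": {  # the earlier match queries named only one field
      "query": "mill road",
      "fields": [  # state or address must contain a term; they do not both have to
        "state",
        "address"
        ]
    }
  }
}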

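2.7 bool/must compound queries

A compound clause can combine any other query clauses, including other compound clauses. This means compound clauses can be nested inside one another to express very complex logic.

must: every condition listed under must has to match
must_not: none of the conditions listed under must_not may match
should: the conditions listed under should ought to match. Matching is preferred but not required, and a match gives a higher score

Example: find the records with gender=M whose address contains mill

GET /bank/_search
{
  "query": {
    "bool": {
      "must": [ # all of these conditions must match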
        {
          "match": {
            "gender": "M"
          }
        },
        {
          "match_phrase": {
            "address": "mill"
          }
        }
      ]
    }
  }
}

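2.7.1 bool/must_not compound queries

Documents must not match the specified conditions.

Example: find the records with gender=M whose address contains mill, excluding those with age equal to 38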

GET /bank/_search
{
  "query": {
    "bool": {
      "must": [ # gender=M and address contains mill
        {
          "match": {
             "gender": "M"
          }
        },
        {
          "match_phrase": {
            "address": "mill"
          }
        }
      ],
      "must_not": [ # age must not equal 38
        {
          "match": {
            "age": "38"
          }
        }
      ]
    }
  }
}

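2.8 bool/should compound queries

should: documents ought to match the conditions listed under should. A match raises the relevance score of the document but does not change which documents are returned. However, if the bool query contains only should clauses (no must or filter), at least one should clause has to match, so in that case should does change the result set.

Example: on top of the previous query, records whose lastname matches Tom should score higher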

GET /bank/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
             "gender": "M"
          }
        },
        {
          "match": {
            "address": "mill"
          }
        }
      ],
      "must_not": [
        {
          "match": {
            "age": "38"
          }
        }
      ],
      "should": [
        {
          "match": {
            "lastname": "Tom"
          }
        }
      ]
    }
  }
}
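
The roles of the three clauses can be sketched with a toy evaluator. Everything here is an assumption made for illustration: the documents are invented, `matches` is a crude stand-in for match, and the score is just a count of matching clauses rather than the real relevance formula.

```python
def matches(doc, field, value):
    """Toy 'match': case-insensitive token lookup for strings, equality otherwise."""
    actual = doc[field]
    if isinstance(actual, str):
        return str(value).lower() in actual.lower().split()
    return str(actual) == str(value)


def bool_query(docs, must=(), must_not=(), should=()):
    """Return (score, doc) pairs, highest score first.

    must and must_not decide WHETHER a document is returned;
    should only adds to the score when must clauses are present.
    """
    results = []
    for doc in docs:
        if not all(matches(doc, f, v) for f, v in must):
            continue
        if any(matches(doc, f, v) for f, v in must_not):
            continue
        score = len(must) + sum(matches(doc, f, v) for f, v in should)
        results.append((score, doc))
    return sorted(results, key=lambda pair: pair[0], reverse=True)


docs = [
    {"lastname": "Smith", "gender": "M", "age": 28, "address": "12 Mill Street"},
    {"lastname": "Wallace", "gender": "M", "age": 36, "address": "990 Mill Road"},
    {"lastname": "Jones", "gender": "M", "age": 38, "address": "5 Mill Lane"},
    {"lastname": "Brown", "gender": "F", "age": 30, "address": "7 Mill Avenue"},
]
hits = bool_query(
    docs,
    must=[("gender", "M"), ("address", "mill")],
    must_not=[("age", "38")],
    should=[("lastname", "Wallace")],
)
print([doc["lastname"] for _, doc in hits])
```

Jones is dropped by must_not and Brown by must; Smith is still returned without matching should, but Wallace ranks first because the should clause adds to the score.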

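2.9 bool/filter (filtering results)

Not every query needs to produce scores, in particular queries that are used only to filter documents. To avoid computing scores that are not needed, Elasticsearch detects these situations and optimizes query execution automatically.

Clauses that do not take part in scoring run faster.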

GET /bank/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "address": "Street"
          }
        }
      ],
      "filter": [
        {
          "range": {
            "balance": {
              "gte": 38000,
              "lte": 40000
            }
          }
        }
      ]
    }
  },
  "sort": [
    {
      "balance": {
        "order": "desc"
      }
    }
  ]
}

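This first finds all documents whose address matches Street, then filters them down to those with 38000 <= balance <= 40000, and finally sorts the results by balance in descending order.

A filter clause does not contribute to the relevance score. In this example _score is null in the results because the request sorts by balance: when sorting on a field, Elasticsearch does not compute scores unless track_scores is enabled.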

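2.10 query/term: matching the value of a field

Like match, term matches the value of a field.

Use match for full-text fields,
and term for other, non-text fields.

Do not use term to query text fields. By default ES analyzes (tokenizes) text values when it stores them, so to search text values use match.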

GET /bank/_search
{
  "query": {
    "term": {
      "address": {
        "value": "Street"
      }
    }
  }
}

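With term on a string (text) field nothing matches: the indexed tokens are lowercased (street), while term looks up the value Street exactly as given, without analysis.

Now switch to match: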

GET /bank/_search
{
  "query": {
    "match": {
      "address": "Street"
    }
  }
}

Using match on the string field does return results.

Using term on a non-string field:

GET /bank/_search
{
  "query": {
    "term": {
      "age": {
        "value": "34"
      }
    }
  }
}

Summary: use match for full-text fields, and term for other, non-text fields.
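
Why term finds nothing for Street while match does can be shown with a tiny inverted index. This is a simplified model, not Elasticsearch internals: `analyze` merely lowercases and splits on whitespace, and the two addresses are sample values.

```python
from collections import defaultdict


def analyze(text):
    """Rough stand-in for the default analyzer: lowercase, split on whitespace."""
    return text.lower().split()


def build_index(docs):
    """Map each analyzed token to the ids of the documents that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in analyze(text):
            index[token].add(doc_id)
    return index


def term(index, value):
    """term: look the value up as-is, with no analysis."""
    return index.get(value, set())


def match(index, query):
    """match: analyze the query first, then look up each token (OR)."""
    found = set()
    for token in analyze(query):
        found |= index.get(token, set())
    return found


index = build_index({1: "880 Holmes Lane", 2: "702 Quentin Street"})
print(term(index, "Street"))   # empty: the index only holds "street"
print(match(index, "Street"))  # finds document 2
```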

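2.11 aggs/agg1 (aggregations)

Aggregations provide the ability to group data and extract statistics from it. The simplest aggregations are roughly equivalent to SQL GROUP BY and the SQL aggregate functions.

In Elasticsearch, a search can return hits and, in the same response, aggregation results that are kept separate from the hits. This is powerful and efficient: you can run a query together with several aggregations and get each of their results back in a single request, using one concise API and avoiding extra network round trips.

aggs: runs aggregations. The syntax is as follows:

"aggs":{ # aggregations
    "aggs_name":{ # name of this aggregation, used to label it in the results
        "AGG_TYPE":{} # aggregation type (avg, sum, terms, ...)
     }
}

terms: shows the distribution of values; documents are grouped by the value of the field and a count is given for each value
avg: the average of the values

Example: for everyone whose address contains Street, get the age distribution, the average age and the sum of the ages, without showing the individual documents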

GET /bank/_search
{
  "query": {
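    "match": { # find documents whose address contains Street
      "address": "Street"
    }
  },
  "aggs": { # aggregate over the query results
    "ageAgg": { # name of the aggregation, chosen freely
      "terms": { # distribution of values, i.e. group by this field
        "field": "age",
        "size": 10
      }
    },
    "ageAvg":{
      "avg": {  # average of the age values
        "field": "age"
      }
    },
    "ageSum":{
      "sum": {  # sum of the age values
        "field": "age"
      }
    }
  },
  "size": 0  # do not return the individual documents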
}


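In MySQL terms, ageAgg is roughly a GROUP BY on age with a count per group, while ageAvg and ageSum are computed over all matching documents rather than per group (the address condition is omitted here):

select age, count(*) from bank group by age

select sum(age), avg(age) from bank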
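
The same three aggregations can be mimicked in a few lines of Python to show what each one computes. This is only a sketch over invented in-memory documents; the function names are made up, and the order of buckets with equal counts may differ from what Elasticsearch returns.

```python
from collections import Counter


def terms_agg(docs, field, size=10):
    """terms: one bucket per distinct value, largest doc_count first."""
    counts = Counter(doc[field] for doc in docs)
    return [{"key": key, "doc_count": count} for key, count in counts.most_common(size)]


def avg_agg(docs, field):
    """avg: mean of the field over ALL documents, not per bucket."""
    return sum(doc[field] for doc in docs) / len(docs)


def sum_agg(docs, field):
    """sum: total of the field over ALL documents."""
    return float(sum(doc[field] for doc in docs))


docs = [{"age": 26}, {"age": 35}, {"age": 26}, {"age": 33}]
print(terms_agg(docs, "age"))
print(avg_agg(docs, "age"), sum_agg(docs, "age"))
```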

Result:

#! Elasticsearch built-in security features are not enabled. Without authentication, your cluster could be accessible to anyone. See https://www.elastic.co/guide/en/elasticsearch/reference/7.14/security-minimal-setup.html to enable security.
{
  "took" : 3,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 385,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "aggregations" : {
    "ageAgg" : { # result of the first aggregation
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 168,
      "buckets" : [
        {
          "key" : 26, # 25 documents have age 26
          "doc_count" : 25
        },
        {
          "key" : 35,
          "doc_count" : 24
        },
        {
          "key" : 33,
          "doc_count" : 23
        },
        {
          "key" : 23,
          "doc_count" : 22
        },
        {
          "key" : 31,
          "doc_count" : 22
        },
        {
          "key" : 32,
          "doc_count" : 22
        },
        {
          "key" : 39,
          "doc_count" : 21
        },
        {
          "key" : 28,
          "doc_count" : 20
        },
        {
          "key" : 22,
          "doc_count" : 19
        },
        {
          "key" : 36,
          "doc_count" : 19
        }
      ]
    },
    "ageAvg" : { # result of the second aggregation
      "value" : 30.194805194805195 # the average age is about 30.19
    },
    "ageSum" : { # result of the third aggregation
      "value" : 11625.0 # the sum of the age field is 11625
    }
  }
}
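
The numbers in this response can be cross-checked against each other. The short sketch below copies the values from the response above and assumes every matching document has an age value.

```python
import math

total_hits = 385                       # hits.total.value
bucket_counts = [25, 24, 23, 22, 22, 22, 21, 20, 19, 19]  # doc_count of the 10 buckets
sum_other_doc_count = 168              # documents outside the top 10 buckets
age_sum = 11625.0                      # ageSum.value
age_avg = 30.194805194805195           # ageAvg.value

# The 10 listed buckets plus the remainder account for every hit
assert sum(bucket_counts) + sum_other_doc_count == total_hits

# The average is the sum divided by the number of matching documents
assert math.isclose(age_sum / total_hits, age_avg)
print(age_sum / total_hits)
```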

Example: aggregate by age and compute the average balance of the people in each age group (i.e. group by age and take the average balance)

GET /bank/_search
{
  "query": {
    "match": {
      "address": "street"
    }
  },
  "aggs": {
    "ageAgg": {
      "terms": {
        "field": "age"
      },
      "aggs": {
        "avgBal": {
           "avg": {
             "field": "balance"
           }
        }
      }
    }
  },
  "size": 0
}

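Nested sub-aggregations: get the full age distribution and, within each age group, the average balance for M, the average balance for F, and the overall average balance of that age group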

GET /bank/_search
{
  "query": {
    "match_all": {}
  },
  "aggs": {
    "ageAggs": {
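      "terms": { # age distribution, i.e. group by age
        "field": "age"
      },
      "aggs": {  # sub-aggregation
        "genderAggs": {
          "terms": {  # gender distribution, i.e. group by gender
            "field": "gender.keyword" # note: use .keyword for text fields
          },
          "aggs": { # sub-aggregation
            "ageGenderBalanceAvg":{
              "avg": {  # average balance for each gender
                "field": "balance"
              }
            }
          }
        },
        "ageBalanceAvg":{
          "avg": {  # average balance for the age group (both genders)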
            "field": "balance"
          }
        }
      }
    }
  },
  "size": 0
}
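
The nesting in this request (age buckets, gender buckets inside them, and an average at each level) can be mirrored in plain Python. This is a sketch over invented documents; the function name is hypothetical and the output shape is simplified compared with a real response.

```python
from collections import defaultdict


def avg(values):
    return sum(values) / len(values)


def age_gender_balance(docs):
    """Group by age, then by gender, averaging balance at both levels."""
    by_age = defaultdict(list)
    for doc in docs:
        by_age[doc["age"]].append(doc)

    buckets = []
    for age, group in by_age.items():
        by_gender = defaultdict(list)
        for doc in group:
            by_gender[doc["gender"]].append(doc["balance"])
        buckets.append({
            "key": age,
            "doc_count": len(group),
            "genderAggs": {g: avg(balances) for g, balances in by_gender.items()},
            "ageBalanceAvg": avg([doc["balance"] for doc in group]),
        })
    # terms buckets are ordered by doc_count, largest first, by default
    return sorted(buckets, key=lambda bucket: bucket["doc_count"], reverse=True)


docs = [
    {"age": 31, "gender": "M", "balance": 30000},
    {"age": 31, "gender": "F", "balance": 20000},
    {"age": 31, "gender": "M", "balance": 40000},
    {"age": 34, "gender": "F", "balance": 10000},
]
for bucket in age_gender_balance(docs):
    print(bucket)
```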

Result:

#! Elasticsearch built-in security features are not enabled. Without authentication, your cluster could be accessible to anyone. See https://www.elastic.co/guide/en/elasticsearch/reference/7.14/security-minimal-setup.html to enable security.
{
  "took" : 4,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1000,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "aggregations" : {
    "ageAggs" : {  # first-level aggregation result
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 463,
      "buckets" : [
        {
          "key" : 31,  # age 31
          "doc_count" : 61, # 61 documents with age 31
          "genderAggs" : { # second-level aggregation result
            "doc_count_error_upper_bound" : 0,
            "sum_other_doc_count" : 0,
            "buckets" : [
              {
                "key" : "M", # gender M
                "doc_count" : 35,  # 35 documents with gender M
                "ageGenderBalanceAvg" : {  # average balance for gender M, age 31
                  "value" : 29565.628571428573
                }
              },
              {
                "key" : "F", # gender F
                "doc_count" : 26,  # 26 documents with gender F
                "ageGenderBalanceAvg" : { # average balance for gender F, age 31
                  "value" : 26626.576923076922
                }
              }
            ]
          },
          "ageBalanceAvg" : { # average balance for age 31, regardless of gender
            "value" : 28312.918032786885
          }
        },
      
        {
          "key" : 34,
          "doc_count" : 49,
          "genderAggs" : {
            "doc_count_error_upper_bound" : 0,
            "sum_other_doc_count" : 0,
            "buckets" : [
              {
                "key" : "F",
                "doc_count" : 30,
                "ageGenderBalanceAvg" : {
                  "value" : 26039.166666666668
                }
              },
              {
                "key" : "M",
                "doc_count" : 19,
                "ageGenderBalanceAvg" : {
                  "value" : 28027.0
                }
              }
            ]
          },
          "ageBalanceAvg" : { # average balance for this age group
            "value" : 26809.95918367347
          }
        }
      ]
    }
  }
}
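
The relationship between the per-gender averages and the overall average of an age group can be verified by hand: the overall value is the doc_count-weighted mean of the gender averages. The check below uses the age-31 bucket, with all numbers copied from the response above.

```python
import math

# (doc_count, ageGenderBalanceAvg) from the age-31 bucket
male = (35, 29565.628571428573)
female = (26, 26626.576923076922)
overall = 28312.918032786885  # ageBalanceAvg of the same bucket

weighted = (male[0] * male[1] + female[0] * female[1]) / (male[0] + female[0])
print(weighted)
assert math.isclose(weighted, overall, rel_tol=1e-9)
```

The same check works for the age-34 bucket with 30 and 19 documents.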
