Elasticsearch: Query DSL

贺鹏123

Last modified: 2022-06-15 19:41:14

Category: ElasticStack. Tags: elasticsearch, search engine, solr

First published: 2022-06-15 19:38:59

Article link: https://blog.csdn.net/zhangHP_123/article/details/125303150

Query DSL (Domain Specific Language)

I. Sample data

DELETE product
PUT /product/_doc/1
{
    "name" : "xiaomi phone",
    "desc" :  "shouji zhong de zhandouji",
    "date": "2021-06-01",
    "price" :  3999,
    "tags": [ "xingjiabi", "fashao", "buka" ]
}
PUT /product/_doc/2
{
    "name" : "xiaomi nfc phone",
    "desc" :  "zhichi quangongneng nfc,shouji zhong de jianjiji",
    "date": "2021-06-02",
    "price" :  4999,
    "tags": [ "xingjiabi", "fashao", "gongjiaoka" ]
}
PUT /product/_doc/3
{
    "name" : "nfc phone",
    "desc" :  "shouji zhong de hongzhaji",
    "date": "2021-06-03",
    "price" :  2999,
    "tags": [ "xingjiabi", "fashao", "menjinka" ]
}
PUT /product/_doc/4
{
    "name" : "xiaomi erji",
    "desc" :  "erji zhong de huangmenji",
    "date": "2021-04-15",
    "price" :  999,
    "tags": [ "low", "bufangshui", "yinzhicha" ]
}
PUT /product/_doc/5
{
    "name" : "hongmi erji",
    "desc" :  "erji zhong de kendeji 2021-06-01",
    "date": "2021-04-16",
    "price" :  399,
    "tags": [ "lowbee", "xuhangduan", "zhiliangx" ]
}

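II. query

Searches issued with the query keyword are relevance-oriented, so a score is computed for every hit. Search is the most important part of Elasticsearch.

1. Match all documents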

GET /product/_search

GET /product/_search
{
  "query": {
    "match_all": {}
  }
}

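2. Query with URI parameters

# q=field:value searches a single field; the sample data has no partlist.name field, so no hits are expected here
GET product/_search?q=partlist.name:adapter
GET product/_search?q=name:xiaomi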

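3. Pagination

from: offset of the first hit to return (starts at 0)

size: number of hits to return

sort: sort order, written as field:asc or field:desc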

GET product/_search?from=0&size=5&sort=price:asc
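
To make the three parameters concrete, here is a minimal Python sketch of what from, size and sort do to a result set. It works on an in-memory copy of the sample prices rather than on Elasticsearch, and `paginate` is a hypothetical helper name used only for illustration.

```python
# Prices of the five sample products, keyed by document id.
PRICES = {1: 3999, 2: 4999, 3: 2999, 4: 999, 5: 399}


def paginate(prices, from_=0, size=10, descending=False):
    """Sort ids by price, skip `from_` hits, then return at most `size` ids."""
    ordered = sorted(prices, key=prices.get, reverse=descending)
    return ordered[from_:from_ + size]


if __name__ == "__main__":
    # Same parameters as ?from=0&size=5&sort=price:asc
    print(paginate(PRICES, from_=0, size=5))   # [5, 4, 3, 1, 2]
    # Second page when the page size is 2
    print(paginate(PRICES, from_=2, size=2))   # [3, 1]
```

The sketch sorts first and slices afterwards, which mirrors the order in which the three parameters take effect.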

4. Exact match

# date
GET /product/_search?q=date:2021-06-01

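5. _all search (search across all indexed fields)

DELETE product

# Verify the _all search: desc is kept in _source but is not indexed
PUT product
{
  "mappings": {
    "properties": {
      "desc": {
        "type": "text",
        "index": false
      }
    }
  }
}

# Re-index the sample data from the first section, then update document 5
POST /product/_update/5
{
  "doc": {
    "desc": "erji zhong de kendeji 2021-06-01"
  }
}

# Omitting the field name searches every indexed field; desc is not indexed, so document 5 should not match on it
GET /product/_search?q=2021-06-01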

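III. _score

Concept: the relevance score is used to rank search results. The higher the score, the more closely the result is considered to match what the search was looking for. Before version 5.0 the score was computed with the TF/IDF algorithm by default; since 5.0 the default is BM25. At this introductory level there is no need to understand how the score is computed; knowing the concept is enough.

Ordering: search results are sorted by relevance score by default, so higher-scoring hits come first.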

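IV. _source

Official documentation: https://www.elastic.co/guide/en/elasticsearch/reference/7.17/mapping-source-field.html

Disabling _source:

Advantage: saves storage space

Disadvantages: the following features are lost

The update, update_by_query and reindex APIs.
On-the-fly highlighting.
Reindexing from one index to another, whether to change mappings or analyzers or to upgrade an index to a new major version.
Debugging queries or aggregations by looking at the original document used at index time.
Possible automatic repair of index corruption in the future.

Summary: if the only goal is to save disk space, increasing the index compression level is a better choice than disabling _source.

Source filtering:

Including: which fields to return in the results

Excluding: which fields to leave out of the results. A field that is not returned can still be searched on, because being absent from _source does not mean being absent from the index.

Defining the filter in the mapping: wildcards are supported, but this approach is not recommended because the mapping cannot be changed afterwards.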

Common filter rules

"_source": false,
"_source": "obj.*",
"_source": [ "obj1.*", "obj2.*" ],
"_source": {
  "includes": [ "obj1.*", "obj2.*" ],
  "excludes": [ "*.description" ]
}
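
The include and exclude rules above can be approximated in a few lines of Python. This is only a sketch of the idea, not how Elasticsearch implements it: `flatten`, `matches` and `filter_source` are hypothetical helpers, and `fnmatchcase` stands in for Elasticsearch's wildcard matching on dotted field paths.

```python
from fnmatch import fnmatchcase


def flatten(doc, prefix=""):
    """Yield (dotted_path, value) for every leaf of a nested dict."""
    for key, value in doc.items():
        path = prefix + key
        if isinstance(value, dict):
            yield from flatten(value, path + ".")
        else:
            yield path, value


def matches(path, patterns):
    """True if any pattern matches the path or one of its parent objects."""
    parts = path.split(".")
    prefixes = [".".join(parts[:i]) for i in range(1, len(parts) + 1)]
    return any(fnmatchcase(p, pat) for p in prefixes for pat in patterns)


def filter_source(doc, includes=(), excludes=()):
    """Keep leaves matching `includes` (all leaves if empty) and no `excludes`."""
    kept = {}
    for path, value in flatten(doc):
        if includes and not matches(path, includes):
            continue
        if matches(path, excludes):
            continue
        node = kept
        *parents, leaf = path.split(".")
        for part in parents:          # rebuild the nested structure
            node = node.setdefault(part, {})
        node[leaf] = value
    return kept


if __name__ == "__main__":
    doc = {
        "owner": {"name": "zhangsan", "age": 18},
        "name": "hongmi erji",
        "price": 399,
    }
    print(filter_source(doc, includes=["owner.*", "name"], excludes=["*.age"]))
```

An exclude always wins over an include in this sketch, which is why `owner.age` is dropped even though `owner.*` includes it.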

# _source includes and excludes defined in the mapping
DELETE product2
PUT product2
{
  "mappings": {
    "_source": {
      "includes": [
        "name",
        "price"
      ],
      "excludes": [
        "desc",
        "tags"
      ]
    }
  }
}


PUT product2/_doc/1
{
  "owner": {
    "name": "zhangsan",
    "sex": "男",
    "age": 18
  },
  "name": "hongmi erji",
  "desc": "erji zhong dekendeji",
  "price": 399,
  "tags": [
    "lowbee",
    "xuhangduan",
    "zhiliangx"
  ]
}

GET product2/_search

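# Return only the specified _source fields
DELETE product2

# Re-index document 1 (PUT product2/_doc/1 above) before searching;
# without the mapping-level filter the whole document is kept in _source
GET product2/_search
{
  "_source": ["owner.name", "owner.sex"],
  "query": {
    "match_all": {}
  }
}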

# Do not return _source
GET product/_search
{
  "_source": false,
  "query": {
    "match_all": {}
  }
}

V. match: full-text queries

match_phrase

# multi_match: run the analyzed search terms against several fields
GET product/_search
{
  "query": {
    "multi_match": {
      "query": "phone huangmenji",
      "fields": ["name", "desc"]
    }
  }
}
# match_all
GET product/_search
{
  "query": {
    "match_all": {}
  }
} 
# match: the search text is analyzed into terms
GET product/_search
{
  "query": {
    "match": {
      "name": "xiaomi phone"
    }
  }
}

# match_phrase: phrase matching
GET product/_search
{
  "query": {
    "match_phrase": {
      "name": "nfc phone"
    }
  }
}

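VI. Term

Official documentation: https://www.elastic.co/guide/en/elasticsearch/reference/7.17/query-dsl-term-query.html

# term: exact match; the search text is not analyzed, so "xiaomi phone" finds nothing in the analyzed name field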
GET product/_search
{
  "query": {
    "term": {
      "name": "xiaomi phone"
    }
  }
}

# term vs. match_phrase
GET product/_search
{
  "query": {
    "match_phrase": {
      "name": "xiaomi phone"
    }
  }
}

# term
GET product/_search
{
  "query": {
    "term": {
      "name": {
       "value": "xiaomi phone" 
      }
    }
  }
}

# term vs. keyword
GET product/_mapping
GET product/_search
{
  "query": {
    "term": {
      "name": "xiaomi phone"
    }
  }
}

GET product/_search
{
  "query": {
    "term": {
      "name.keyword": "xiaomi phone"
    }
  }
}

# terms
GET product/_search
{
  "query": {
    "terms": {
      "tags": ["xingjiabi","buka"],
      "boost": 1.2
    }
  }
}

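Difference between match and term

How it works

term vs. match_phrase:

match_phrase analyzes the search text into terms. Every resulting term must appear among the terms of the searched field, in the same order, and by default the terms must be adjacent.

term does not analyze the search text.

term vs. keyword

term is a query that leaves the search text unanalyzed.

keyword is a field type: the field value from the source data is indexed without being analyzed.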
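
The differences can be reproduced with a toy model of the name field. The sketch below is an illustration under stated assumptions, not Elasticsearch's implementation: `analyze` lowercases and splits on whitespace (enough to mimic the standard analyzer on this sample data), and `match`, `match_phrase` and `term` are hypothetical Python functions named after the queries they imitate.

```python
# The name field of the five sample products.
NAMES = {
    1: "xiaomi phone",
    2: "xiaomi nfc phone",
    3: "nfc phone",
    4: "xiaomi erji",
    5: "hongmi erji",
}


def analyze(text):
    """Toy analyzer: lowercase and split on whitespace."""
    return text.lower().split()


def match(query):
    """match: analyze the query, keep docs containing ANY of its terms."""
    wanted = set(analyze(query))
    return [i for i, name in NAMES.items() if wanted & set(analyze(name))]


def match_phrase(query):
    """match_phrase: all query terms must be adjacent and in order."""
    terms = analyze(query)
    hits = []
    for i, name in NAMES.items():
        tokens = analyze(name)
        windows = (tokens[k:k + len(terms)] for k in range(len(tokens)))
        if any(window == terms for window in windows):
            hits.append(i)
    return hits


def term(query, keyword=False):
    """term: the query is NOT analyzed; compare it with the indexed terms."""
    hits = []
    for i, name in NAMES.items():
        # A keyword field indexes the whole value as a single term.
        indexed = [name] if keyword else analyze(name)
        if query in indexed:
            hits.append(i)
    return hits


if __name__ == "__main__":
    print(match("xiaomi phone"))               # [1, 2, 3, 4]
    print(match_phrase("xiaomi phone"))        # [1]
    print(term("xiaomi phone"))                # []
    print(term("xiaomi phone", keyword=True))  # [1]
```

The empty result for `term("xiaomi phone")` is the key point: the text field only holds the separate terms xiaomi and phone, so an unanalyzed two-word search text can only match the keyword sub-field.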

VII. Range

Official documentation: https://www.elastic.co/guide/en/elasticsearch/reference/7.17/query-dsl-range-query.html

# range
GET /product/_search?sort=price:desc
# [3999, 4999]
GET /product/_search
{
  "query": {
    "range": {
      "price": {
        "gte": 3999,
        "lte": 4999
      }
    }
  }
}

# (3000, 4000)
GET /product/_search
{
  "query": {
    "range": {
      "price": {
        "gt": 3000,
        "lt": 4000
      }
    }
  }
}

# [2021-06-01, 2021-06-02]
GET product/_search
{
  "query": {
    "range": {
      "date": {
        "gte": "2021-06-01",
        "lte": "2021-06-02"
      }
    }
  }
}

# [yesterday, today]
GET product/_search
{
  "query": {
    "range": {
      "date": {
        "gte": "now-1d/d",
        "lte": "now/d"
      }
    }
  }
}


# time_zone: the date strings are interpreted in UTC+08:00
GET product/_search
{
  "query": {
    "range": {
      "date": {
        "time_zone": "+08:00", 
        "gte": "2021-06-01",
        "lte": "2021-06-02"
      }
    }
  }
}
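
The date-math expressions and the time_zone parameter are easier to follow with concrete timestamps. The Python sketch below computes the bounds that "now-1d/d" and "now/d" stand for (gte rounds down to the start of the day, lte rounds up to its last millisecond) and shows how a local midnight maps to UTC. Both function names are hypothetical, and the fixed `now` value is chosen only for the example.

```python
from datetime import datetime, timedelta, timezone


def yesterday_to_today(now):
    """Bounds meant by gte "now-1d/d" and lte "now/d"."""
    start_of_today = now.replace(hour=0, minute=0, second=0, microsecond=0)
    # gte rounds down: first millisecond of yesterday
    gte = start_of_today - timedelta(days=1)
    # lte rounds up: last millisecond of today
    lte = start_of_today + timedelta(days=1, milliseconds=-1)
    return gte, lte


def local_midnight_as_utc(date_text, offset_hours):
    """Start of `date_text` in a fixed-offset zone, expressed in UTC."""
    zone = timezone(timedelta(hours=offset_hours))
    local = datetime.strptime(date_text, "%Y-%m-%d").replace(tzinfo=zone)
    return local.astimezone(timezone.utc)


if __name__ == "__main__":
    now = datetime(2021, 6, 2, 15, 30, tzinfo=timezone.utc)
    gte, lte = yesterday_to_today(now)
    print(gte.isoformat(), lte.isoformat())
    # "gte": "2021-06-01" with "time_zone": "+08:00"
    print(local_midnight_as_utc("2021-06-01", 8).isoformat())
```

With time_zone set to +08:00, the lower bound 2021-06-01 therefore starts at 16:00 UTC on 2021-05-31.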

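VIII. Filter

Official documentation: https://www.elastic.co/guide/en/elasticsearch/reference/7.17/query-filter-context.html

filter: no relevance score is computed and results are not sorted by relevance; frequently used filters are also cached automatically, so performance is good

query: a relevance score is computed and results are sorted by it; the results cannot be cached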

GET product/_search
{
  "query": {
    "constant_score": {
      "filter": {
        "term": {
          "name": "phone"
        }
      },
      "boost": 1.2
    }
  }
}
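
What constant_score does to the score can be shown with a small Python sketch: every document that passes the filter gets the same score, equal to boost. The function name is hypothetical and the whitespace split is an assumed stand-in for the analyzer, which is adequate for the sample names only.

```python
# The name field of the five sample products.
NAMES = {
    1: "xiaomi phone",
    2: "xiaomi nfc phone",
    3: "nfc phone",
    4: "xiaomi erji",
    5: "hongmi erji",
}


def constant_score_term(field_values, term, boost=1.0):
    """Return {doc_id: score}; every matching document scores exactly `boost`."""
    return {
        doc_id: boost
        for doc_id, text in field_values.items()
        if term in text.lower().split()  # filter: yes/no only, no relevance
    }


if __name__ == "__main__":
    print(constant_score_term(NAMES, "phone", boost=1.2))
```

All three phone documents come back with the same score, so their relative order no longer says anything about relevance.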

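IX. Boolean queries

Official documentation: https://www.elastic.co/guide/en/elasticsearch/reference/7.17/query-dsl-bool-query.html

The bool query is the most commonly used compound query. It combines sub-queries (clauses), and Elasticsearch returns a document only when it satisfies the combination of those clauses.

Clauses supported by bool

must: the clause (query) must appear in matching documents and contributes to the score.
filter: the clause (query) must appear in matching documents, but unlike must its score is ignored. Filter clauses run in filter context, which means scoring is skipped and the clauses are considered for caching.
should: the clause (query) should appear in matching documents (OR semantics).
must_not: the clause (query) must not appear in matching documents (NOT semantics). These clauses run in filter context, which means scoring is skipped and the clauses are considered for caching. Because scoring is ignored, a score of 0 is returned for all documents.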
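
How the four clause types combine can be sketched in Python. This is a simplified model under explicit assumptions: clauses are plain predicates using exact equality (no text analysis), the score is a toy count of matching must and should clauses rather than BM25, and `bool_query`, `field_is` and `age_between` are hypothetical helpers. The one rule taken from the documented behaviour is that, when there is no must or filter clause, at least one should clause has to match.

```python
# A trimmed copy of the four example documents used in this section.
DOCS = {
    1: {"name": "熊大", "age": 20, "from": "树林"},
    2: {"name": "熊二", "age": 19, "from": "树林"},
    3: {"name": "吉吉国王", "age": 18, "from": "森林"},
    4: {"name": "光头强", "age": 32, "from": "房子"},
}


def field_is(field, value):
    """Clause: the field equals the value exactly."""
    return lambda doc: doc[field] == value


def age_between(low, high):
    """Clause: low <= age <= high."""
    return lambda doc: low <= doc["age"] <= high


def bool_query(docs, must=(), filter_=(), should=(), must_not=()):
    """Return {doc_id: toy_score} for documents accepted by the bool query."""
    hits = {}
    for doc_id, doc in docs.items():
        if not all(clause(doc) for clause in (*must, *filter_)):
            continue  # every must and filter clause is required
        if any(clause(doc) for clause in must_not):
            continue  # any must_not match rejects the document
        should_hits = sum(1 for clause in should if clause(doc))
        # Without must/filter, at least one should clause has to match.
        if should and not (must or filter_) and should_hits == 0:
            continue
        # Toy score: only must and should clauses contribute.
        hits[doc_id] = len(must) + should_hits
    return hits


if __name__ == "__main__":
    print(bool_query(DOCS, should=[field_is("from", "森林"), field_is("from", "房子")]))
    print(bool_query(DOCS, filter_=[age_between(18, 20)]))
```

A query made only of filter or must_not clauses yields a score of 0 for every hit in this model, which matches the description of those two clause types above.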

👻Data preparation

PUT xiongchumo/_doc/1
{
  "name":"熊大",
  "age":20,
  "from": "树林",
  "desc": "反应灵敏，伸手敏捷",
  "tags": ["灵敏", "敏捷"]
}

PUT xiongchumo/_doc/2
{
  "name":"熊二",
  "age":19,
  "from":"树林",
  "desc":"娇憨可爱，吃货",
  "tags":["可爱", "吃"]
}


PUT xiongchumo/_doc/3
{
  "name":"吉吉国王",
  "age":18,
  "from":"森林",
  "desc":"看见香蕉走不动道，时不时头脑灵敏，但大多数是憨憨",
  "tags":["香蕉", "憨"]
}

PUT xiongchumo/_doc/4
{
  "name":"光头强",
  "age": 32,
  "from":"房子",
  "desc":"砍树赚钱，地中海，被老板熊",
  "tags":["砍树", "光头", "挨熊"]
}

🚩must

Equivalent to SQL

xxx = xxx

# must
GET xiongchumo/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "from": "森林"
          }
        }
      ]
    }
  }
}

GET xiongchumo/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "from": "森林"
          }
        },
        {
          "multi_match": {
            "query": "香蕉",
            "fields": ["tags"]
          }
        }
      ]
    }
  }
}

🚩should

Equivalent to SQL

xxx = xxx or yyy = xxx

# should
GET xiongchumo/_search
{
  "query": {
    "bool": {
      "should": [
        {
          "match": {
            "from": "森林"
          }
        },
        {
          "match": {
            "from": "房子"
          }
        }
      ]
    }
  }
}

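🚩must_not

The clause (query) must not appear in matching documents. It runs in filter context, which means scoring is skipped and the clause is considered for caching. Because scoring is ignored, a score of 0 is returned for all documents.

Equivalent to SQL

xxx not in ()

# must_not
GET xiongchumo/_search
# 熊二 is analyzed into the tokens 熊 and 二, so 熊大 (which also contains 熊) is filtered out as well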
GET xiongchumo/_search
{
  "query": {
    "bool": {
      "must_not": [
        {
          "match": {
            "from": "房子"
          }
        },
        {
          "match": {
            "name": "熊二"
          }
        }
      ]
    }
  }
}

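🚩filter

The clause (query) must appear in matching documents, but unlike must its score is ignored. Filter clauses run in filter context, which means scoring is skipped and the clauses are considered for caching.

range is equivalent to SQL

xxx >= xxx and xxx <= xxx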

# filter
GET xiongchumo/_search?sort=age:asc
{
  "query": {
    "bool": {
      "filter": [
        {
          "range": {
            "age": {
              "gte": 18,
              "lte": 20
            }
          }
        }
      ]
    }
  }
}