Query DSL (Domain Specific Language)
I. Sample data
DELETE product
PUT /product/_doc/1
{
"name" : "xiaomi phone",
"desc" : "shouji zhong de zhandouji",
"date": "2021-06-01",
"price" : 3999,
"tags": [ "xingjiabi", "fashao", "buka" ]
}
PUT /product/_doc/2
{
"name" : "xiaomi nfc phone",
"desc" : "zhichi quangongneng nfc,shouji zhong de jianjiji",
"date": "2021-06-02",
"price" : 4999,
"tags": [ "xingjiabi", "fashao", "gongjiaoka" ]
}
PUT /product/_doc/3
{
"name" : "nfc phone",
"desc" : "shouji zhong de hongzhaji",
"date": "2021-06-03",
"price" : 2999,
"tags": [ "xingjiabi", "fashao", "menjinka" ]
}
PUT /product/_doc/4
{
"name" : "xiaomi erji",
"desc" : "erji zhong de huangmenji",
"date": "2021-04-15",
"price" : 999,
"tags": [ "low", "bufangshui", "yinzhicha" ]
}
PUT /product/_doc/5
{
"name" : "hongmi erji",
"desc" : "erji zhong de kendeji 2021-06-01",
"date": "2021-04-16",
"price" : 399,
"tags": [ "lowbee", "xuhangduan", "zhiliangx" ]
}
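II. query
The query keyword performs relevance-oriented search, so a relevance score has to be computed for every hit. Search is the most important part of Elasticsearch.
1. Match all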
GET /product/_search
GET /product/_search
{
"query": {
"match_all": {}
}
}
2. Query with parameters
GET product/_search?q=partlist.name:adapter
GET product/_search?q=name:xiaomi
3. Pagination
from: offset of the first hit to return (starts at 0)
size: number of hits to return
sort: sort order
GET product/_search?from=0&size=5&sort=price:asc
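To make the three URL parameters concrete, here is a minimal Python sketch that imitates them on the five sample products from section I. The `paginate` helper is hypothetical and only mimics the observable behavior (sort first, then skip `from` hits and return `size` hits); it says nothing about how Elasticsearch pages internally.

```python
# Prices of the five sample products from section I, keyed by document id.
PRICES = {1: 3999, 2: 4999, 3: 2999, 4: 999, 5: 399}


def paginate(prices, from_=0, size=10, descending=False):
    """Hypothetical helper: sort ids by price, then slice out one page."""
    ordered = sorted(prices, key=lambda doc_id: prices[doc_id], reverse=descending)
    return ordered[from_:from_ + size]


# Roughly what ?from=0&size=2&sort=price:asc and ?from=2&size=2&sort=price:asc select.
first_page = paginate(PRICES, from_=0, size=2)
second_page = paginate(PRICES, from_=2, size=2)
print(first_page, second_page)
```

The second page starts where the first one ends, which is why `from` is usually computed as page number times `size`.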
4. Exact match
# Date
GET /product/_search?q=date:2021-06-01
5. _all search (search across all indexed fields)
DELETE product
# Verify _all search
PUT product
{
"mappings": {
"properties": {
"desc": {
"type": "text",
"index": false
}
}
}
}
# Initialize the data first
POST /product/_update/5
{
"doc": {
"desc": "erji zhong de kendeji 2021-06-01"
}
}
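III. _score
Concept: the relevance score is used to rank search results. The higher the score, the more relevant the result is considered to be, that is, the better it matches what was searched for. Before version 5.0 the score was computed with TF/IDF by default; since 5.0 the default is BM25. At this introductory stage there is no need to understand how the score is computed; knowing the concept is enough.
Sorting: the relevance score is the default sort key for search results, so the higher the score, the earlier the hit appears.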
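For readers who want a feel for what BM25 rewards, the sketch below implements the textbook single-term BM25 formula in Python. It is an illustration only: the function name and the example numbers are assumptions, `k1=1.2` and `b=0.75` are the commonly cited defaults, and the scores Elasticsearch actually reports may differ by constant factors.

```python
import math


def bm25_term_score(freq, doc_len, avg_doc_len, doc_count, docs_with_term,
                    k1=1.2, b=0.75):
    """Textbook BM25 score of one term in one document (illustrative only)."""
    # Rare terms (present in few documents) get a higher inverse document frequency.
    idf = math.log(1 + (doc_count - docs_with_term + 0.5) / (docs_with_term + 0.5))
    # Longer-than-average documents are penalized through this length normalization.
    norm = k1 * (1 - b + b * doc_len / avg_doc_len)
    return idf * freq * (k1 + 1) / (freq + norm)


# Name field of the sample data: 5 documents, 11 tokens in total (average 2.2).
rare = bm25_term_score(freq=1, doc_len=2, avg_doc_len=2.2, doc_count=5, docs_with_term=1)
common = bm25_term_score(freq=1, doc_len=2, avg_doc_len=2.2, doc_count=5, docs_with_term=3)
short_doc = bm25_term_score(freq=1, doc_len=2, avg_doc_len=2.2, doc_count=5, docs_with_term=3)
long_doc = bm25_term_score(freq=1, doc_len=3, avg_doc_len=2.2, doc_count=5, docs_with_term=3)
print(rare > common, short_doc > long_doc)
```

The two comparisons capture the intuition: a term that occurs in fewer documents counts for more, and the same term counts for more in a shorter field.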
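IV. _source
Official documentation: https://www.elastic.co/guide/en/elasticsearch/reference/7.17/mapping-source-field.html
Disabling _source:
Benefit: saves storage.
Drawbacks (the following stop working):
- The update, update_by_query and reindex APIs.
- On-the-fly highlighting.
- Reindexing from one index to another, whether to change the mapping or analysis or to upgrade the index to a new major version.
- Debugging queries or aggregations by looking at the original document that was used at index time.
- Possibly, in the future, automatic repair of index corruption.
Summary: if saving disk space is the only goal, raising the index compression level is a better choice than disabling _source.
Source filtering:
Including: which fields are returned in the result.
Excluding: which fields are not returned in the result. A field that is not returned can still be searched, because being absent from the stored _source does not mean being absent from the index.
Defining the filter in the mapping: wildcards are supported, but this approach is not recommended because the mapping cannot be changed afterwards.
Common filtering rules
- "_source": false,
- "_source": "obj.*",
- "_source": [ "obj1.*", "obj2.*" ],
- "_source": {
"includes": [ "obj1.*", "obj2.*" ],
"excludes": [ "*.description" ]
}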
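The include/exclude rules above can be imitated in a few lines of Python. This is a simplified sketch under stated assumptions: `flatten` and `filter_source` are hypothetical helpers, the result is returned as flat dotted paths rather than the nested JSON Elasticsearch returns, and a bare object name such as `owner` is not expanded to its sub-fields here.

```python
from fnmatch import fnmatchcase


def flatten(doc, prefix=""):
    """Turn a nested dict into dotted paths such as "owner.name"."""
    flat = {}
    for key, value in doc.items():
        path = prefix + key
        if isinstance(value, dict):
            flat.update(flatten(value, path + "."))
        else:
            flat[path] = value
    return flat


def filter_source(doc, includes=None, excludes=None):
    """Hypothetical helper: keep paths that match an include and no exclude."""
    kept = {}
    for path, value in flatten(doc).items():
        included = not includes or any(fnmatchcase(path, p) for p in includes)
        excluded = any(fnmatchcase(path, p) for p in (excludes or []))
        if included and not excluded:
            kept[path] = value
    return kept


doc = {
    "owner": {"name": "zhangsan", "age": 18},
    "name": "hongmi erji",
    "desc": "erji zhong de kendeji",
    "price": 399,
}
result = filter_source(doc, includes=["owner.*", "name"], excludes=["*.age"])
print(result)
```

Excludes win over includes in this sketch: `owner.age` matches the include pattern `owner.*` but is still dropped because it also matches `*.age`.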
# _source includes and excludes defined in the mapping
DELETE product2
PUT product2
{
"mappings": {
"_source": {
"includes": [
"name",
"price"
],
"excludes": [
"desc",
"tags"
]
}
}
}
PUT product2/_doc/1
{
"owner": {
"name": "zhangsan",
"sex": "男",
"age": 18
},
"name": "hongmi erji",
"desc": "erji zhong dekendeji",
"price": 399,
"tags": [
"lowbee",
"xuhangduan",
"zhiliangx"
]
}
GET product2/_search
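# Return only the specified _source fields
DELETE product2
# Index document 1 again (the PUT product2/_doc/1 request above), then search:
GET product2/_search
{
"_source": ["owner.name", "owner.sex"],
"query": {
"match_all": {}
}
}
# Do not return _source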
GET product/_search
{
"_source": false,
"query": {
"match_all": {}
}
}
V. match: full-text queries
# multi_match: run the analyzed search text against each of the listed fields
GET product/_search
{
"query": {
"multi_match": {
"query": "phone huangmenji",
"fields": ["name", "desc"]
}
}
}
# match_all
GET product/_search
{
"query": {
"match_all": {}
}
}
# match: analyzed (tokenized) search
GET product/_search
{
"query": {
"match": {
"name": "xiaomi phone"
}
}
}
# match_phrase: phrase match
GET product/_search
{
"query": {
"match_phrase": {
"name": "nfc phone"
}
}
}
VI. Term
Official documentation: https://www.elastic.co/guide/en/elasticsearch/reference/7.17/query-dsl-term-query.html
# term: exact match
GET product/_search
{
"query": {
"term": {
"name": "xiaomi phone"
}
}
}
# term vs. match_phrase
GET product/_search
{
"query": {
"match_phrase": {
"name": "xiaomi phone"
}
}
}
# term
GET product/_search
{
"query": {
"term": {
"name": {
"value": "xiaomi phone"
}
}
}
}
# Difference between term and keyword
GET product/_mapping
GET product/_search
{
"query": {
"term": {
"name": "xiaomi phone"
}
}
}
GET product/_search
{
"query": {
"term": {
"name.keyword": "xiaomi phone"
}
}
}
# terms
GET product/_search
{
"query": {
"terms": {
"tags": ["xingjiabi","buka"],
"boost": 1.2
}
}
}
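Difference between match and term:
- match analyzes the search text before searching; term does not.
Difference between term and match_phrase:
- match_phrase analyzes the search text. Every resulting token must be present among the tokens of the searched field, in the same order and, by default, adjacent to one another.
- term does not analyze the search text.
Difference between term and keyword:
- term means the search text is not analyzed.
- keyword is a field type; it means the field value from the source data is not analyzed at index time.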
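The differences can be reproduced with a toy inverted index in Python. This is only a sketch: `analyze` is a stand-in that lowercases and splits on whitespace (the real standard analyzer does more), and the `term`, `match` and `match_phrase` functions are hypothetical simplifications that return matching ids without any scoring.

```python
# The name field of the five sample products from section I.
NAMES = {
    1: "xiaomi phone",
    2: "xiaomi nfc phone",
    3: "nfc phone",
    4: "xiaomi erji",
    5: "hongmi erji",
}


def analyze(text):
    """Toy analyzer: lowercase and split on whitespace."""
    return text.lower().split()


# A text field indexes tokens; a keyword field indexes the whole value unchanged.
TEXT_INDEX = {doc_id: analyze(name) for doc_id, name in NAMES.items()}
KEYWORD_INDEX = dict(NAMES)


def term(index, value):
    """term: the search text is not analyzed and must equal one indexed term."""
    hits = []
    for doc_id, stored in index.items():
        terms = stored if isinstance(stored, list) else [stored]
        if value in terms:
            hits.append(doc_id)
    return hits


def match(index, text):
    """match: the search text is analyzed; one matching token is enough."""
    wanted = set(analyze(text))
    return [doc_id for doc_id, tokens in index.items() if wanted & set(tokens)]


def match_phrase(index, text):
    """match_phrase: all tokens must appear in order and next to each other."""
    wanted = analyze(text)
    n = len(wanted)
    return [doc_id for doc_id, tokens in index.items()
            if any(tokens[i:i + n] == wanted for i in range(len(tokens) - n + 1))]


print(term(TEXT_INDEX, "xiaomi phone"))          # no single token equals the whole text
print(term(KEYWORD_INDEX, "xiaomi phone"))       # the keyword value matches as a whole
print(match(TEXT_INDEX, "xiaomi phone"))         # any document with xiaomi or phone
print(match_phrase(TEXT_INDEX, "xiaomi phone"))  # only the adjacent, ordered phrase
```

Document 2 ("xiaomi nfc phone") matches `match` but not `match_phrase`, because "nfc" sits between the two tokens.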
VII. Range
Official documentation: https://www.elastic.co/guide/en/elasticsearch/reference/7.17/query-dsl-range-query.html
# range
GET /product/_search?sort=price:desc
# [3999, 4999]
GET /_search
{
"query": {
"range": {
"price": {
"gte": 3999,
"lte": 4999
}
}
}
}
# (3000, 4000)
GET /_search
{
"query": {
"range": {
"price": {
"gt": 3000,
"lt": 4000
}
}
}
}
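The difference between the closed interval [3999, 4999] and the open interval (3000, 4000) comes down to which comparison operator each bound uses. A small Python sketch on the sample prices, with hypothetical helpers `in_range` and `range_query` whose parameter names mirror the four range keys:

```python
# Prices of the five sample products from section I, keyed by document id.
PRICES = {1: 3999, 2: 4999, 3: 2999, 4: 999, 5: 399}


def in_range(value, gt=None, gte=None, lt=None, lte=None):
    """Return True when value satisfies every bound that was given."""
    if gt is not None and not value > gt:
        return False
    if gte is not None and not value >= gte:
        return False
    if lt is not None and not value < lt:
        return False
    if lte is not None and not value <= lte:
        return False
    return True


def range_query(prices, **bounds):
    """Hypothetical helper: ids of the documents whose price is within bounds."""
    return sorted(doc_id for doc_id, price in prices.items() if in_range(price, **bounds))


closed = range_query(PRICES, gte=3999, lte=4999)  # [3999, 4999]
opened = range_query(PRICES, gt=3000, lt=4000)    # (3000, 4000)
print(closed, opened)
```

With gte/lte both boundary prices (3999 and 4999) are included; with gt/lt only values strictly between the bounds qualify.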
# [2021-06-01, 2021-06-02]
GET product/_search
{
"query": {
"range": {
"date": {
"gte": "2021-06-01",
"lte": "2021-06-02"
}
}
}
}
# [yesterday, today]
GET product/_search
{
"query": {
"range": {
"date": {
"gte": "now-1d/d",
"lte": "now/d"
}
}
}
}
GET product/_search
{
"query": {
"range": {
"date": {
"time_zone": "+08:00",
"gte": "2021-06-01",
"lte": "2021-06-02"
}
}
}
}
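VIII. Filter
Official documentation: https://www.elastic.co/guide/en/elasticsearch/reference/7.17/query-filter-context.html
filter: no relevance score is computed and results are not sorted by relevance. In addition, the most frequently used filters are cached automatically, so performance is good.
query: a relevance score is computed and results are sorted by it; the results cannot be cached.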
GET product/_search
{
"query": {
"constant_score": {
"filter": {
"term": {
"name": "phone"
}
},
"boost": 1.2
}
}
}
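IX. Boolean query
Official documentation: https://www.elastic.co/guide/en/elasticsearch/reference/7.17/query-dsl-bool-query.html
The bool query is the most commonly used compound query. It combines sub-queries, and Elasticsearch returns a document only when the document satisfies the combination of its clauses.
Clauses supported by bool:
- must: the clause (query) must appear in matching documents and contributes to the score.
- filter: the clause (query) must appear in matching documents but, unlike must, its score is ignored. Filter clauses run in filter context, which means scoring is skipped and the clause is considered for caching.
- should: the clause (query) should appear in matching documents (an OR relation).
- must_not: the clause (query) must not appear in matching documents (a NOT relation). It runs in filter context, which means scoring is skipped and the clause is considered for caching. Because scoring is skipped, a score of 0 is returned for every document.
👻 Data preparation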
PUT xiongchumo/_doc/1
{
"name":"熊大",
"age":20,
"from": "树林",
"desc": "反应灵敏,伸手敏捷",
"tags": ["灵敏", "敏捷"]
}
PUT xiongchumo/_doc/2
{
"name":"熊二",
"age":19,
"from":"树林",
"desc":"娇憨可爱,吃货",
"tags":["可爱", "吃"]
}
PUT xiongchumo/_doc/3
{
"name":"吉吉国王",
"age":18,
"from":"森林",
"desc":"看见香蕉走不动道,时不时头脑灵敏,但大多数是憨憨",
"tags":["香蕉", "憨"]
}
PUT xiongchumo/_doc/4
{
"name":"光头强",
"age": 32,
"from":"房子",
"desc":"砍树赚钱,地中海,被老板熊",
"tags":["砍树", "光头", "挨熊"]
}
🚩 must
Equivalent to SQL: xxx = xxx
# must
GET xiongchumo/_search
{
"query": {
"bool": {
"must": [
{
"match": {
"from": "森林"
}
}
]
}
}
}
GET xiongchumo/_search
{
"query": {
"bool": {
"must": [
{
"match": {
"from": "森林"
}
},
{
"multi_match": {
"query": "香蕉",
"fields": ["tags"]
}
}
]
}
}
}
🚩 should
Equivalent to SQL: xxx = xxx or yyy = xxx
# should
GET xiongchumo/_search
{
"query": {
"bool": {
"should": [
{
"match": {
"from": "森林"
}
},
{
"match": {
"from": "房子"
}
}
]
}
}
}
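🚩 must_not
The clause (query) must not appear in matching documents (a NOT relation). It runs in filter context, so scoring is skipped and the clause is considered for caching. Because scoring is skipped, a score of 0 is returned for every document.
Equivalent to SQL: xxx not in ()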
# must_not
GET xiongchumo/_search
# 熊二 is analyzed into the tokens 熊 and 二, so 熊大 (which also contains 熊) is filtered out as well
GET xiongchumo/_search
{
"query": {
"bool": {
"must_not": [
{
"match": {
"from": "房子"
}
},
{
"match": {
"name": "熊二"
}
}
]
}
}
}
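🚩 filter
The clause (query) must appear in matching documents but, unlike must, its score is ignored. Filter clauses run in filter context, so scoring is skipped and the clause is considered for caching.
A range filter is equivalent to SQL: xxx >= xxx and xxx <= xxx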
# filter
GET xiongchumo/_search?sort=age:asc
{
"query": {
"bool": {
"filter": [
{
"range": {
"age": {
"gte": 18,
"lte": 20
}
}
}
]
}
}
}
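🚩 minimum_should_match
minimum_should_match: specifies the number or percentage of should clauses that a returned document must match. If the bool query contains at least one should clause and no must or filter clause, the default is 1; otherwise the default is 0.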
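The default rule is easy to misread, so here it is written out as a tiny Python function. `default_minimum_should_match` is a hypothetical helper that takes the body of a bool query as a dict and applies exactly the rule stated above; it does not handle percentages or explicitly configured values.

```python
def default_minimum_should_match(bool_query):
    """Apply the default rule: 1 if there is a should clause and no must/filter, else 0."""
    has_should = bool(bool_query.get("should"))
    has_must_or_filter = bool(bool_query.get("must")) or bool(bool_query.get("filter"))
    return 1 if has_should and not has_must_or_filter else 0


only_should = {"should": [{"match": {"from": "森林"}}, {"match": {"from": "房子"}}]}
with_filter = {
    "filter": [{"range": {"age": {"gte": 18, "lte": 20}}}],
    "should": [{"match": {"from": "森林"}}],
}
print(default_minimum_should_match(only_should), default_minimum_should_match(with_filter))
```

In practice this means a should clause next to a must or filter clause becomes optional: it can raise the score of a hit but no longer decides whether the document matches, unless minimum_should_match is set explicitly.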
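🔓 Summary
- must: AND relation, equivalent to and in a relational database.
- should: OR relation, equivalent to or in a relational database.
- must_not: NOT relation, equivalent to not in a relational database.
- filter: filter condition.
- range: range condition.
- gt: greater than, equivalent to > in a relational database.
- gte: greater than or equal to, equivalent to >= in a relational database.
- lt: less than, equivalent to < in a relational database.
- lte: less than or equal to, equivalent to <= in a relational database.
- The clauses supported by bool can be combined freely.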
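To show how the four clause types combine, here is a minimal Python sketch of the matching logic only. It rests on several assumptions: `bool_query` is a hypothetical helper, each clause is a plain Python predicate using exact comparisons (so text analysis is deliberately ignored), and no scores are computed.

```python
# A trimmed copy of the xiongchumo sample data.
DOCS = [
    {"name": "熊大", "age": 20, "from": "树林"},
    {"name": "熊二", "age": 19, "from": "树林"},
    {"name": "吉吉国王", "age": 18, "from": "森林"},
    {"name": "光头强", "age": 32, "from": "房子"},
]


def bool_query(docs, must=(), should=(), must_not=(), filter_=(),
               minimum_should_match=None):
    """Hypothetical helper: names of the documents accepted by the clause combination."""
    if minimum_should_match is None:
        # Default rule: 1 if there is a should clause and no must/filter, else 0.
        minimum_should_match = 1 if should and not (must or filter_) else 0
    hits = []
    for doc in docs:
        if not all(clause(doc) for clause in must):
            continue
        if not all(clause(doc) for clause in filter_):
            continue
        if any(clause(doc) for clause in must_not):
            continue
        if sum(1 for clause in should if clause(doc)) < minimum_should_match:
            continue
        hits.append(doc["name"])
    return hits


either_place = bool_query(
    DOCS,
    should=[lambda d: d["from"] == "森林", lambda d: d["from"] == "房子"],
)
young_outdoors = bool_query(
    DOCS,
    filter_=[lambda d: 18 <= d["age"] <= 20],
    must_not=[lambda d: d["from"] == "房子"],
)
print(either_place, young_outdoors)
```

The first call behaves like an SQL or across two conditions; the second combines a range filter with a negated condition, which is the typical shape of a filter-only bool query.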