Elasticsearch 6 Getting Started Tutorial: Query Syntax (Queries in Detail)


Author: 胡老汉


Article link: https://blog.csdn.net/qq_27559331/article/details/103215698


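Articles in this series

Elasticsearch 6 Getting Started Tutorial: Elasticsearch Overview
Elasticsearch 6 Getting Started Tutorial: Installing Elasticsearch
Elasticsearch 6 Getting Started Tutorial: Inverted Indexes and Analyzers in Elasticsearch
Elasticsearch 6 Getting Started Tutorial: CRUD with the Elasticsearch API
Elasticsearch 6 Getting Started Tutorial: What Is a Mapping
Elasticsearch 6 Getting Started Tutorial: Query Syntax (Queries in Detail)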

Data preparation

Create the lib3 index with 3 primary shards, no replicas, and an explicit mapping for the user type:

PUT /lib3
{
    "settings": {
        "number_of_shards": 3,
        "number_of_replicas": 0
    },
    "mappings": {
        "user": {
            "properties": {
                "name": {"type": "text"},
                "address": {"type": "text"},
                "age": {"type": "integer"},
                "interests": {"type": "text"},
                "birthday": {"type": "date"}
            }
        }
    }
}

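Simple searches can also be sent as URI parameters: q holds the query and sort holds the sort order.

GET /lib3/user/_search?q=name:lisi

GET /lib3/user/_search?q=name:zhaoliu&sort=age:desc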

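Queries

Date and numeric fields are not analyzed, so they are queried by exact value. keyword fields are not analyzed either and also match only the exact stored value.
text fields are analyzed into terms, so they support full-text matching on individual words.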

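GET _search  search all documents
GET /lib/_search  search all documents in the lib index
GET /lib,lib3/_search  search all documents in the lib and lib3 indices
GET /*3,*4/_search  search all documents in indices whose names end in 3 or 4 (* is a wildcard)
GET /lib/user/_search  search all documents of type user in the lib index
GET /lib,lib4/user,items/_search  search all documents of types user and items in the lib and lib4 indices
GET /_all/_search  search all documents in every index in the cluster
GET /_all/user,items/_search  search all documents of types user and items in every index in the cluster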

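Understanding the search response

took: how long the query took, in milliseconds
timed_out: whether the query timed out
_shards: how many shards the request was sent to (total), and how many of them succeeded or failed
hits.total: the total number of matching documents
hits.hits: the returned documents; if no size is given, the first 10 are returned
max_score: the highest relevance score in this result set. The better a document matches the query, the higher its _score and the higher it ranks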
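As a rough illustration of where these fields live, here is a Python sketch that pulls them out of a response body already parsed into a dict (for example with json.loads). The function name summarize_response is hypothetical, and the sketch assumes the 6.x response layout, in which hits.total is a plain integer.

```python
from typing import Any, Dict


def summarize_response(resp: Dict[str, Any]) -> Dict[str, Any]:
    """Collect the response fields described above from a parsed _search body.

    Assumes the Elasticsearch 6.x layout, where hits.total is an integer
    (7.x changed it to an object with "value" and "relation").
    """
    hits = resp["hits"]
    return {
        "took_ms": resp["took"],
        "timed_out": resp["timed_out"],
        "shards_total": resp["_shards"]["total"],
        "total_hits": hits["total"],
        "max_score": hits["max_score"],
        # The documents themselves are in hits.hits, each with its own _id and _score.
        "returned_ids": [h["_id"] for h in hits["hits"]],
    }
```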

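term and terms queries

A term query looks up the exact term in the inverted index. It does not know about analyzers, so the search value is not analyzed. It is best suited to keyword, numeric, and date fields.

term: find documents whose field contains a given term
GET /lib3/user/_search
{
    "query": {
        "term": {"interests": "changge"}
    }
}
terms: find documents whose field contains any of several terms
GET /lib3/user/_search
{
    "query": {
        "terms": {
            "interests": ["hejiu", "changge"]
        }
    }
}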
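To see why term does not analyze its input, here is a Python sketch of one field's inverted index as a dict from term to document IDs. The index contents and function names are hypothetical; the point is that term is an exact lookup and terms is the union (OR) of several exact lookups.

```python
from typing import Dict, List, Set

# Hypothetical inverted index for a single field: term -> IDs of documents containing it.
InvertedIndex = Dict[str, Set[str]]


def term_query(index: InvertedIndex, term: str) -> Set[str]:
    # The value is looked up exactly as given; no analyzer runs on it.
    return set(index.get(term, set()))


def terms_query(index: InvertedIndex, terms: List[str]) -> Set[str]:
    # A document matches if it contains ANY of the listed terms (logical OR).
    result: Set[str] = set()
    for t in terms:
        result |= index.get(t, set())
    return result
```

Because the standard analyzer lowercases text at index time, a term query for "Changge" finds nothing in such an index even if the original document text was "Changge".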

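Controlling the number of results with from and size

from: the offset of the first document to return (starting at 0)

size: how many documents to return

This is similar to LIMIT 0,10 in MySQL.
GET /lib3/user/_search
{
    "from": 0,
    "size": 10,
    "query": {
        "terms": {
            "interests": ["hejiu", "changge"]
        }
    }
}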
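The usual pagination arithmetic can be sketched in Python: page n with page_size results per page starts at from = (n - 1) * page_size, just like a MySQL offset. The helper name is hypothetical. The 10000 limit reflects the default value of the index.max_result_window setting, which caps from + size.

```python
def page_to_from_size(page: int, page_size: int = 10) -> dict:
    """Turn a 1-based page number into the from/size pair of a search body."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be positive")
    body = {"from": (page - 1) * page_size, "size": page_size}
    # With the default index.max_result_window of 10000, from + size may not exceed 10000.
    if body["from"] + body["size"] > 10_000:
        raise ValueError("from + size exceeds the default max_result_window of 10000")
    return body
```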

version: return the version number of each hit

GET /lib3/user/_search
{
    "version": true,
    "query": {
        "terms": {
            "interests": ["hejiu", "changge"]
        }
    }
}

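match query

A match query is aware of analyzers: it analyzes the query text with the field's analyzer and then searches for the resulting terms. On a field that is not analyzed, such as the integer age, it matches the exact value.
GET /lib3/user/_search
{
    "query": {
        "match": {"name": "zhaoliu"}
    }
}
GET /lib3/user/_search
{
    "query": {
        "match": {"age": 20}
    }
}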
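The difference from term can be sketched in Python: the query text goes through the same analyzer used at index time, and a document matches if it contains any of the resulting terms (the default operator of match is or). The analyzer here is an assumption that only lowercases and splits on non-word characters; Elasticsearch's standard analyzer uses Unicode text segmentation and does more.

```python
import re
from typing import Dict, List, Set


def simple_analyze(text: str) -> List[str]:
    # Simplified stand-in for the standard analyzer: lowercase, split on non-word chars.
    return re.findall(r"\w+", text.lower())


def build_index(docs: Dict[str, str]) -> Dict[str, Set[str]]:
    # Index time: analyze each document and record which documents contain each term.
    index: Dict[str, Set[str]] = {}
    for doc_id, text in docs.items():
        for token in simple_analyze(text):
            index.setdefault(token, set()).add(doc_id)
    return index


def match_query(index: Dict[str, Set[str]], query: str) -> Set[str]:
    # Query time: analyze the query with the same analyzer, then OR the terms together.
    result: Set[str] = set()
    for token in simple_analyze(query):
        result |= index.get(token, set())
    return result
```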

match_all: match all documents

GET /lib3/user/_search
{
    "query": {
        "match_all": {}
    }
}

multi_match: search several fields at once

GET /lib3/user/_search
{
    "query": {
        "multi_match": {
            "query": "lvyou",
            "fields": ["interests", "name"]  // search both the interests and name fields
        }
    }
}

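match_phrase: phrase matching

Elasticsearch first analyzes the query string and builds a phrase query from the resulting terms. A document matches only if it contains all of the terms, in the same relative positions as in the query:
GET /lib3/user/_search
{
    "query": {
        "match_phrase": {
            "interests": "duanlian，shuoxiangsheng"
        }
    }
}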
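A minimal Python sketch of the position check behind match_phrase with the default slop of 0: the analyzed query terms must appear next to each other, in the same order. It uses the same simplified lowercase-and-split analyzer assumption, and the function names are hypothetical.

```python
import re
from typing import List


def analyze(text: str) -> List[str]:
    # Simplified analyzer (assumption): lowercase, then split on non-word characters.
    return re.findall(r"\w+", text.lower())


def phrase_matches(field_text: str, phrase: str) -> bool:
    """match_phrase with slop 0: all phrase terms, adjacent and in the same order."""
    doc_terms = analyze(field_text)
    phrase_terms = analyze(phrase)
    n = len(phrase_terms)
    if n == 0:
        return False
    # Slide a window of n term positions over the document.
    return any(doc_terms[i:i + n] == phrase_terms for i in range(len(doc_terms) - n + 1))
```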

_source: specify which fields to return

GET /lib3/user/_search
{
    "_source": ["address", "name"],
    "query": {
        "match": {"interests": "changge"}
    }
}

Controlling which source fields are loaded

includes: fields to include
excludes: fields to exclude

GET /lib3/user/_search
{
    "query": {
        "match_all": {}
    },
    "_source": {
        "includes": ["name", "address"],
        "excludes": ["age", "birthday"]
    }
}
Wildcards can be used to match field names:
GET /lib3/user/_search
{
    "_source": {
        "includes": "addr*",
        "excludes": ["name", "bir*"]
    },
    "query": {
        "match_all": {}
    }
}
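The include/exclude rules can be sketched in Python, using fnmatch.fnmatchcase for the wildcard matching: with no includes every field is kept, otherwise a field must match one include pattern, and any field matching an exclude pattern is dropped. This is a simplification that only looks at top-level field names; also, fnmatch understands [abc] character classes, which Elasticsearch field patterns do not. The function names are hypothetical.

```python
from fnmatch import fnmatchcase
from typing import Any, Dict, List, Optional, Union

Patterns = Union[str, List[str]]


def _as_list(patterns: Optional[Patterns]) -> List[str]:
    # Elasticsearch accepts either a single pattern or an array of patterns.
    if not patterns:
        return []
    return [patterns] if isinstance(patterns, str) else list(patterns)


def filter_source(source: Dict[str, Any], includes: Optional[Patterns] = None,
                  excludes: Optional[Patterns] = None) -> Dict[str, Any]:
    inc, exc = _as_list(includes), _as_list(excludes)
    result: Dict[str, Any] = {}
    for field, value in source.items():
        # With no includes every field is a candidate; otherwise one pattern must match.
        if inc and not any(fnmatchcase(field, p) for p in inc):
            continue
        # A field matching an exclude pattern is dropped even if it was included.
        if any(fnmatchcase(field, p) for p in exc):
            continue
        result[field] = value
    return result
```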

sort: sorting

Use sort to order the results: asc for ascending, desc for descending.

GET /lib3/user/_search
{
    "query": {
        "match_all": {}
    },
    "sort": [
        {"age": {"order": "asc"}}
    ]
}

GET /lib3/user/_search
{
    "query": {
        "match_all": {}
    },
    "sort": [
        {"age": {"order": "desc"}}
    ]
}

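match_phrase_prefix: phrase prefix matching

Works like match_phrase, except that the last term of the query is treated as a prefix:

GET /lib3/user/_search
{
    "query": {
        "match_phrase_prefix": {
            "name": {"query": "zhao"}
        }
    }
}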
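A Python sketch of that rule under the same simplified-analyzer assumption: the leading query terms must match as a phrase, and the last one only has to be a prefix of the next term in the document. Unlike Elasticsearch, the sketch does not cap how many terms the prefix may expand to (max_expansions, which defaults to 50). The function names are hypothetical.

```python
import re
from typing import List


def analyze(text: str) -> List[str]:
    # Simplified analyzer (assumption): lowercase, then split on non-word characters.
    return re.findall(r"\w+", text.lower())


def phrase_prefix_matches(field_text: str, query: str) -> bool:
    doc_terms = analyze(field_text)
    query_terms = analyze(query)
    if not query_terms:
        return False
    head, last = query_terms[:-1], query_terms[-1]
    n = len(query_terms)
    for i in range(len(doc_terms) - n + 1):
        # Leading terms must match exactly and in order; the last one only as a prefix.
        if doc_terms[i:i + n - 1] == head and doc_terms[i + n - 1].startswith(last):
            return True
    return False
```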

range: range queries

range restricts a field to a range of values.

Parameters: from, to, include_lower, include_upper, boost

from: the lower bound of the range
include_lower: whether the lower bound is included; defaults to true
to: the upper bound of the range
include_upper: whether the upper bound is included; defaults to true
boost: the weight of the query
GET /lib3/user/_search
{
    "query": {
        "range": {
            "birthday": {
                "from": "1990-10-10",
                "to": "2018-05-01"
            }
        }
    }
}
GET /lib3/user/_search
{
    "query": {
        "range": {
            "age": {
                "from": 20,
                "to": 25,
                "include_lower": true,
                "include_upper": false
            }
        }
    }
}
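The bound flags correspond directly to the gt, gte, lt, and lte operators: a lower bound with include_lower true becomes gte (otherwise gt), and an upper bound with include_upper true becomes lte (otherwise lt). A Python sketch of that mapping plus a check of a single value; the helper names are hypothetical, and the date strings compare correctly here only because they are all in yyyy-MM-dd form.

```python
import operator
from typing import Any, Dict, Optional

# The comparison behind each range parameter.
OPS = {"gt": operator.gt, "gte": operator.ge, "lt": operator.lt, "lte": operator.le}


def to_comparison_params(frm: Optional[Any] = None, to: Optional[Any] = None,
                         include_lower: bool = True, include_upper: bool = True) -> Dict[str, Any]:
    """Translate from/to/include_lower/include_upper into gt/gte/lt/lte."""
    params: Dict[str, Any] = {}
    if frm is not None:  # named frm because from is a Python keyword
        params["gte" if include_lower else "gt"] = frm
    if to is not None:
        params["lte" if include_upper else "lt"] = to
    return params


def in_range(value: Any, params: Dict[str, Any]) -> bool:
    # Every bound that is present has to hold.
    return all(OPS[name](value, bound) for name, bound in params.items())
```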

wildcard query

Allows querying with the wildcards * and ?:

* matches zero or more characters
? matches exactly one character
GET /lib3/user/_search
{
    "query": {
        "wildcard": {"name": "zhao*"}
    }
}
GET /lib3/user/_search
{
    "query": {
        "wildcard": {"name": "li?i"}
    }
}
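The wildcard pattern is compared with the terms stored in the inverted index, and the pattern itself is not analyzed. A Python sketch that translates the two wildcards into a regular expression that must match the whole term; the function names are hypothetical. Patterns that begin with * force Elasticsearch to examine a large number of terms and can be slow.

```python
import re


def wildcard_to_regex(pattern: str):
    """Compile a * / ? wildcard pattern into a regular expression."""
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append(".*")           # zero or more characters
        elif ch == "?":
            parts.append(".")            # exactly one character
        else:
            parts.append(re.escape(ch))  # everything else is literal
    return re.compile("".join(parts), re.DOTALL)


def wildcard_matches(term: str, pattern: str) -> bool:
    # The whole term must match, hence fullmatch rather than search.
    return wildcard_to_regex(pattern).fullmatch(term) is not None
```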

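fuzzy: fuzzy matching (somewhat slower than other queries)

value: the term to search for

boost: the weight of the query; defaults to 1.0

min_similarity: the minimum similarity required for a match, a parameter from older Elasticsearch versions. It defaulted to 0.5; for strings it ranged from 0 to 1 inclusive, for numbers it could be greater than 1, and for dates it took values such as 1d or 1m (1d means one day). In Elasticsearch 6 it is replaced by fuzziness, the maximum edit distance, which defaults to AUTO.

prefix_length: the number of leading characters that must match exactly and are not fuzzified; defaults to 0

max_expansions: the maximum number of terms the fuzzy query can expand to; defaults to 50 in Elasticsearch 6
GET /lib3/user/_search
{
    "query": {
        "fuzzy": {
            "interests": "chagge"
        }
    }
}
GET /lib3/user/_search
{
    "query": {
        "fuzzy": {
            "interests": {"value": "chagge"}
        }
    }
}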
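Fuzzy matching is based on edit distance. Below is a Python sketch under simplifying assumptions: it uses plain Levenshtein distance (Elasticsearch can also count a swap of two adjacent characters as one edit) and the default fuzziness of AUTO, which allows 0 edits for terms of 1 to 2 characters, 1 edit for 3 to 5 characters, and 2 edits for longer terms. prefix_length and max_expansions are ignored, and the function names are hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, or substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = cur
    return prev[-1]


def auto_fuzziness(term: str) -> int:
    # fuzziness AUTO: 0 edits for 1-2 chars, 1 edit for 3-5 chars, 2 edits for 6 or more.
    n = len(term)
    if n <= 2:
        return 0
    if n <= 5:
        return 1
    return 2


def fuzzy_matches(indexed_term: str, query_term: str) -> bool:
    return levenshtein(indexed_term, query_term) <= auto_fuzziness(query_term)
```

Here the misspelled "chagge" is one deletion away from "changge", well within the 2 edits AUTO allows for a 6-character term.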

highlight: highlighting search results

GET /lib3/user/_search
{
    "query": {
        "match": {"interests": "changge"}
    },
    "highlight": {
        "fields": {"interests": {}}
    }
}
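By default the highlighter wraps each matched term in <em> and </em> tags (configurable with pre_tags and post_tags) and returns the result in a separate highlight section of each hit. A Python sketch of just the wrapping step, assuming the same lowercase-and-split analysis; the function name is hypothetical, and real highlighters also cut long fields into fragments.

```python
import re


def highlight(text: str, query: str, pre_tag: str = "<em>", post_tag: str = "</em>") -> str:
    """Wrap every word of text whose lowercase form is one of the query terms."""
    terms = set(re.findall(r"\w+", query.lower()))

    def wrap(m):
        word = m.group(0)
        return f"{pre_tag}{word}{post_tag}" if word.lower() in terms else word

    # Punctuation and spacing between words are left untouched.
    return re.sub(r"\w+", wrap, text)
```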

Filters

A filter does not calculate relevance scores, and its results can be cached. As a result, filtering is generally faster than a scoring query.

Simple filters

GET /lib4/items/_search
{
    "query": {
        "bool": {
            "filter": [
                {"term": {"price": 40}}
            ]
        }
    }
}
GET /lib4/items/_search
{
    "query": {
        "bool": {
            "filter": [
                {"terms": {"price": [25, 40]}}  // price is 25 or 40, not a range from 25 to 40
            ]
        }
    }
}
GET /lib4/items/_search
{
    "query": {
        "bool": {
            "filter": [
                {"term": {"itemID": "ID100123"}}
            ]
        }
    }
}
The same filters written with post_filter:

GET /lib4/items/_search
{
    "post_filter": {"term": {"price": 40}}
}
GET /lib4/items/_search
{
    "post_filter": {"terms": {"price": [25, 40]}}
}
GET /lib4/items/_search
{
    "post_filter": {"term": {"itemID": "ID100123"}}
}

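By default, dynamic mapping maps the string ID100123 to a text field (with a keyword sub-field, itemID.keyword), and text fields are analyzed. The standard analyzer lowercases the value to id100123, so a term query for "ID100123" finds nothing, while a term query for "id100123" does.

Check how the field was mapped (to see the actual terms an analyzer produces, use the _analyze API):

GET /lib4/_mapping

If you don't want the item ID to be analyzed, delete the index and recreate it with an explicit mapping that makes itemID a keyword field. ("type": "text" with "index": false would make the field unsearchable rather than unanalyzed.)

DELETE lib4

PUT /lib4
{
    "mappings": {
        "items": {
            "properties": {
                "itemID": {"type": "keyword"}
            }
        }
    }
}

With this mapping, term queries must use the exact stored value, such as "ID100123"; the lowercase "id100123" in the bool examples below relies on the default analyzed text mapping.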

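bool: compound filters

bool combines several filters into one.

Format:
{ "bool": { "must": [], "should": [], "must_not": [] } }

must: conditions that must all be satisfied (AND)
should: conditions combined with OR; in a filter, at least one of them has to be satisfied
must_not: conditions that must not be satisfied (NOT)

GET /lib4/items/_search
{
    "post_filter": {
        "bool": {
            "should": [
                {"term": {"price": 25}},
                {"term": {"itemID": "id100123"}}
            ],
            "must_not": {"term": {"price": 30}}
        }
    }
}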

Nested bool:

GET /lib4/items/_search
{
    "post_filter": {
        "bool": {
            "should": [
                {"term": {"itemID": "id100123"}},
                {"bool": {
                    "must": [
                        {"term": {"itemID": "id100124"}},
                        {"term": {"price": 40}}
                    ]
                }}
            ]
        }
    }
}
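The logic of both bool examples can be modeled in Python with predicate functions. This sketch uses hypothetical helper names, with term standing in for an exact-value term filter. It follows the Elasticsearch 6 rule for bool in a filter context (which includes post_filter): if should clauses are present, at least one of them must match.

```python
from typing import Any, Callable, Dict, Sequence

Doc = Dict[str, Any]
Clause = Callable[[Doc], bool]


def term(field: str, value: Any) -> Clause:
    # Stand-in for a term filter: exact comparison with the stored value.
    return lambda doc: doc.get(field) == value


def bool_filter(must: Sequence[Clause] = (), should: Sequence[Clause] = (),
                must_not: Sequence[Clause] = ()) -> Clause:
    def check(doc: Doc) -> bool:
        if not all(c(doc) for c in must):       # AND
            return False
        if any(c(doc) for c in must_not):       # NOT
            return False
        # OR: in filter context, at least one should clause must match if any are given.
        if should and not any(c(doc) for c in should):
            return False
        return True
    return check  # a bool is itself a clause, so it can be nested
```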

gt, lt, gte, lte: range filters

gt: > greater than
lt: < less than
gte: >= greater than or equal to
lte: <= less than or equal to
GET /lib4/items/_search
{
    "post_filter": {
        "range": {
            "price": {"gt": 25, "lt": 50}
        }
    }
}

exists: match documents in which a field has a non-null value

GET /lib4/items/_search
{
    "query": {
        "bool": {
            "filter": {"exists": {"field": "price"}}
        }
    }
}

GET /lib4/items/_search
{
    "query": {
        "constant_score": {
            "filter": {"exists": {"field": "price"}}
        }
    }
}
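Which values count as present can be sketched in Python. The rules below follow the Elasticsearch 6 exists documentation: null, an empty array, and an array containing only nulls do not count, while any other value does, including an empty string. The sketch checks a top-level field of a hypothetical document dict; a null_value setting in the mapping would change the outcome.

```python
from typing import Any, Dict


def exists(doc: Dict[str, Any], field: str) -> bool:
    value = doc.get(field)  # a missing field behaves like null
    if value is None:
        return False
    if isinstance(value, list):
        # An empty array, or one holding only nulls, indexes nothing.
        return any(v is not None for v in value)
    return True  # any other value, even "", counts as present
```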

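Filter cache

Elasticsearch provides a special cache, the filter cache, for storing the results of filters. Cached filters do not need much memory, because they only record which documents match the filter, and they can be reused by every later query that uses the same filter, which greatly improves query performance.

Note: the rules below apply to Elasticsearch 1.x, where not every filter was cached by default. The following filters were not cached by default:

   numeric_range
   script
   geo_bbox
   geo_distance
   geo_distance_range
   geo_polygon
   geo_shape
   and
   or
   not

exists, missing, range, term, and terms were cached by default, and caching could be turned on for a filter by adding "_cache": true to it. Since Elasticsearch 2.0 the _cache option has been removed: the node query cache decides automatically which filters to cache, based on how often they are used.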
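The core idea, keeping only the set of matching document IDs per filter and evicting the least recently used entry when the cache is full, can be sketched in Python. The class and the key format are hypothetical; Elasticsearch's node query cache actually stores per-segment bitsets and applies its own rules about what is worth caching.

```python
from collections import OrderedDict
from typing import Any, Callable, Dict, FrozenSet, Hashable


class FilterCache:
    """Toy LRU cache: filter key -> frozenset of matching document IDs."""

    def __init__(self, capacity: int = 128) -> None:
        self.capacity = capacity
        self._entries: "OrderedDict[Hashable, FrozenSet[str]]" = OrderedDict()
        self.misses = 0

    def matching_ids(self, key: Hashable, docs: Dict[str, Dict[str, Any]],
                     predicate: Callable[[Dict[str, Any]], bool]) -> FrozenSet[str]:
        if key in self._entries:
            self._entries.move_to_end(key)  # mark as most recently used
            return self._entries[key]
        self.misses += 1
        # Only the IDs are stored, not the documents or any scores.
        ids = frozenset(doc_id for doc_id, doc in docs.items() if predicate(doc))
        self._entries[key] = ids
        if len(self._entries) > self.capacity:
            self._entries.popitem(last=False)  # evict the least recently used entry
        return ids
```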

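post_filter

post_filter is introduced in the aggregations chapter, where its role is described as filtering only the search hits, not the aggregation results.
If you only run a query without aggregations, post_filter behaves much like an ordinary filter. However, post_filter runs only after the query has executed, so the documents it removes are still matched and scored by the query first, and it does not bring the performance benefits of a filter clause. In ordinary searches, do not use post_filter in place of filter.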
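The order of operations can be sketched in Python: the aggregation is computed over every document the query matched, and post_filter then trims only the hits that are returned. The function and field names are hypothetical, and a simple count per value stands in for a terms aggregation.

```python
from collections import Counter
from typing import Any, Callable, Dict, List, Tuple

Doc = Dict[str, Any]


def search_with_post_filter(docs: List[Doc], query: Callable[[Doc], bool],
                            post_filter: Callable[[Doc], bool],
                            agg_field: str) -> Tuple[List[Doc], Dict[Any, int]]:
    query_hits = [d for d in docs if query(d)]
    # Aggregations see every document the query matched...
    agg = Counter(d[agg_field] for d in query_hits if agg_field in d)
    # ...while post_filter removes documents from the returned hits only.
    hits = [d for d in query_hits if post_filter(d)]
    return hits, dict(agg)
```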

Aggregations: sum, min, max, avg, cardinality, terms

Setting size to 0 skips the search hits and returns only the aggregation results.

1) sum

GET /lib4/items/_search
{
    "size": 0,
    "aggs": {
        "price_of_sum": {
            "sum": {"field": "price"}
        }
    }
}

2) min

GET /lib4/items/_search
{
    "size": 0,
    "aggs": {
        "price_of_min": {
            "min": {"field": "price"}
        }
    }
}

3) max

GET /lib4/items/_search
{
    "size": 0,
    "aggs": {
        "price_of_max": {
            "max": {"field": "price"}
        }
    }
}

4) avg

GET /lib4/items/_search
{
    "size": 0,
    "aggs": {
        "price_of_avg": {
            "avg": {"field": "price"}
        }
    }
}

5) cardinality: the number of distinct values (the result is approximate)

GET /lib4/items/_search
{
    "size": 0,
    "aggs": {
        "price_of_cardi": {
            "cardinality": {"field": "price"}
        }
    }
}

6) terms: group documents by value (one bucket per distinct value)

GET /lib4/items/_search
{
    "size": 0,
    "aggs": {
        "price_group_by": {
            "terms": {"field": "price"}
        }
    }
}
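For reference, here are the same six results computed over a plain Python list of prices; the function name is hypothetical and the sketch assumes the list is not empty. Two differences from Elasticsearch are worth keeping in mind: the cardinality aggregation is approximate (it is based on HyperLogLog++), whereas counting distinct values here is exact, and the terms aggregation returns at most 10 buckets by default, ordered by document count descending.

```python
from collections import Counter
from typing import Any, Dict, List


def price_aggs(prices: List[float]) -> Dict[str, Any]:
    """Plain-Python counterparts of the six aggregations above (assumes prices is non-empty)."""
    counts = Counter(prices)
    return {
        "price_of_sum": sum(prices),
        "price_of_min": min(prices),
        "price_of_max": max(prices),
        "price_of_avg": sum(prices) / len(prices),
        # Exact distinct count; Elasticsearch's cardinality result is approximate.
        "price_of_cardi": len(counts),
        # terms: one bucket per distinct value, largest doc_count first, at most 10 buckets.
        "price_group_by": [{"key": k, "doc_count": c} for k, c in counts.most_common(10)],
    }
```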

Group the users who are interested in singing (changge) by age, ordering the groups by their average age in descending order

GET /lib3/user/_search
{
    "query": {
        "match": {"interests": "changge"}
    },
    "size": 0,
    "aggs": {
        "age_group_by": {
            "terms": {
                "field": "age",
                "order": {"avg_of_age": "desc"}
            },
            "aggs": {
                "avg_of_age": {
                    "avg": {"field": "age"}
                }
            }
        }
    }
}
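The same grouping can be sketched in Python: filter the users, bucket them by age, compute each bucket's avg_of_age, and sort the buckets by that value. Since every bucket holds users of a single age, avg_of_age always equals the bucket key, so in this particular example the ordering is effectively by age descending. The helper name and user records are hypothetical, and the interest match is simplified to a lowercase word check.

```python
import re
from collections import defaultdict
from typing import Any, Dict, List


def age_groups_by_avg(users: List[Dict[str, Any]], interest: str) -> List[Dict[str, Any]]:
    buckets: Dict[int, List[int]] = defaultdict(list)
    for user in users:
        # Query part: keep users whose interests contain the word (simplified match).
        if interest in re.findall(r"\w+", user["interests"].lower()):
            buckets[user["age"]].append(user["age"])
    result = [
        {"key": age, "doc_count": len(ages), "avg_of_age": {"value": sum(ages) / len(ages)}}
        for age, ages in buckets.items()
    ]
    # "order": {"avg_of_age": "desc"} sorts buckets by the sub-aggregation's value.
    result.sort(key=lambda b: b["avg_of_age"]["value"], reverse=True)
    return result
```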

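Compound queries

A compound query combines several basic queries into a single query.
This is done with the bool query,
which accepts the following parameters:

must: documents must match these clauses to be included. (AND)
must_not: documents must not match these clauses to be included. (NOT)
should: if a document matches any of these clauses, its _score increases (OR); otherwise they have no effect. They are mainly used to refine each document's relevance score.

filter: documents must match these clauses, but they run in non-scoring filter mode. They contribute nothing to the score; they only include or exclude documents according to the filter criteria.

How relevance scores are combined: each clause calculates its own relevance score for a document. Once those scores are calculated, the bool query merges them and returns a single score that represents the whole boolean operation.

The following query finds documents whose title field matches "how to make millions" and that are not tagged as spam.
Documents tagged as starred, or dated on or after 2014-01-01, rank higher than the others. Documents that satisfy both rank higher still: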
{ 
    "bool": { 
        "must": { 
            "match": { "title": "how to make millions" }
        }, 
        "must_not": { 
            "match": { "tag": "spam" }
        }, 
        "should": [ 
            { "match": { "tag": "starred" }}, 
            { "range": { "date": { "gte": "2014-01-01" }}} 
        ] 
    } 
}
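If there are no must clauses (and no filter clause), at least one should clause must match. If there is at least one must clause, no should clause is required to match.
If we don't want a document's date to affect its score, we can rewrite the previous example using a filter clause: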
{ 
    "bool": { 
        "must": { 
            "match": { "title": "how to make millions" }
        }, 
        "must_not": { 
            "match": { "tag": "spam" }
        }, 
        "should": [ 
            { "match": { "tag": "starred" }} 
        ], 
        "filter": { 
            "range": { "date": { "gte": "2014-01-01" }} 
        } 
    } 
}
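By moving the range query into the filter clause, we turn it into a non-scoring query, so it no longer affects the relevance ranking. Because it is now non-scoring,
it can benefit from all the optimizations available to filter queries.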
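A simplified Python model of how the clauses combine, with hypothetical names: each scoring clause returns a score when it matches (None otherwise), the bool score is the sum of the matching must and should scores, and must_not and filter clauses only include or exclude documents without adding to the score. Real clause scores come from BM25; the sketch simply adds whatever numbers the clauses return, which is enough to show that moving the date condition from should to filter removes its contribution.

```python
from typing import Callable, Dict, Optional, Sequence

Doc = Dict[str, object]
Scorer = Callable[[Doc], Optional[float]]  # a score if the clause matches, else None
Predicate = Callable[[Doc], bool]


def bool_score(doc: Doc, must: Sequence[Scorer] = (), should: Sequence[Scorer] = (),
               must_not: Sequence[Predicate] = (),
               filters: Sequence[Predicate] = ()) -> Optional[float]:
    """Return the document's bool score, or None if the document does not match."""
    score = 0.0
    for clause in must:
        s = clause(doc)
        if s is None:
            return None  # every must clause has to match
        score += s
    if any(p(doc) for p in must_not) or not all(p(doc) for p in filters):
        return None  # excluded; these clauses never add to the score
    should_scores = [s for s in (c(doc) for c in should) if s is not None]
    if should and not must and not filters and not should_scores:
        return None  # no must/filter: at least one should clause must match
    return score + sum(should_scores)
```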

The bool query itself can also be used as a non-scoring query. Simply place it inside a filter clause and build the boolean logic there:
{ 
    "bool": { 
        "must": { 
            "match": { "title": "how to make millions" }
        }, 
        "must_not": {     
            "match": { "tag": "spam" }
        }, 
        "should": [ 
            { "match": { "tag": "starred" }} 
        ], 
        "filter": { 
            "bool": { 
                "must": [ { 
                    "range": {
                        "date": { "gte": "2014-01-01" }
                    } 
                }, 
                { 
                    "range": { 
                        "price": { "lte": 29.99 }
                    }
                } ], 
                "must_not": [ { 
                    "term": { "category": "ebooks" }
                } ] 
            }
        } 
    } 
}

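constant_score query

constant_score applies the same constant score to every matching document. It is often used when you only need to run a filter and no other (scoring) query.
{
    "constant_score": {
        "filter": {
            "term": {"category": "ebooks"}
        }
    }
}
The term query is placed inside constant_score and turned into a non-scoring filter. This can be used instead of a bool query that has only a filter clause.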
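In other words, every document that passes the filter receives the same score, the query's boost, which defaults to 1.0. A one-function Python sketch with hypothetical names:

```python
from typing import Any, Callable, Dict, List, Tuple


def constant_score(docs: Dict[str, Dict[str, Any]],
                   flt: Callable[[Dict[str, Any]], bool],
                   boost: float = 1.0) -> List[Tuple[str, float]]:
    # The filter only decides membership; every hit receives the same score.
    return [(doc_id, boost) for doc_id, doc in docs.items() if flt(doc)]
```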