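Elasticsearch Query DSL: Query Context and Filter Context
Elasticsearch provides a full-featured, JSON-based Query DSL (Domain Specific Language) for defining queries. This article covers query context, filter context, and queries that combine the two.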
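1. Query context
A query clause used in query context answers the question "How well does this document match this query clause?" Besides deciding whether a document matches, it computes a relevance score, reported as _score, which expresses how well the document matches relative to other documents. The results list all matching documents, sorted by relevance score.
Query context is in effect whenever a query clause is passed to a query parameter, such as the query parameter of the search API. The following query-context example returns all courses whose description contains the word science.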
GET /courses/_search
{
  "query": {
    "match": {
      "course_description": "science"
    }
  }
}
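Outside the Kibana console, the same request body can be sent from any HTTP client. Below is a minimal Python sketch using only the standard library. It assumes an unsecured cluster at http://localhost:9200; ES_URL and build_search_request are hypothetical names chosen for illustration. It uses POST, which the _search endpoint accepts as well as GET, and it only builds the request, leaving the network call commented out.

```python
import json
import urllib.request

# Assumption: a local, unsecured cluster at this address; adjust for your setup.
ES_URL = "http://localhost:9200"


def build_search_request(index: str, body: dict) -> urllib.request.Request:
    """Build (but do not send) a _search request carrying a JSON query body."""
    data = json.dumps(body).encode("utf-8")
    return urllib.request.Request(
        f"{ES_URL}/{index}/_search",
        data=data,
        headers={"Content-Type": "application/json"},
        method="POST",  # _search accepts a body with either GET or POST
    )


# The query-context example above, expressed as a Python dict.
match_query = {"query": {"match": {"course_description": "science"}}}
req = build_search_request("courses", match_query)

# To run it against a live cluster:
# with urllib.request.urlopen(req) as resp:
#     for hit in json.load(resp)["hits"]["hits"]:
#         print(hit["_score"], hit["_source"]["name"])
```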
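2. Filter context
Filter context can be thought of as a binary (0/1) tool: where query context answers "how well does it match?", filter context simply answers "yes or no?".
Filter context is mostly used to filter structured data, for example range checks (does the date fall within a given range?) and status checks. Elasticsearch automatically caches frequently used filters, which speeds up queries.
Filter context is in effect whenever a query clause is passed to a filter parameter, such as the filter or must_not parameters of a bool query, the filter parameter of a constant_score query, or the filter aggregation. The following filter-context query returns all course documents with at least 33 enrolled students.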
GET /courses/_search
{
  "query": {
    "bool": {
      "filter": {
        "range": { "students_enrolled": { "gte": 33 } }
      }
    }
  }
}
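To make the yes/no nature of filter context concrete, here is a toy Python model (not how Elasticsearch executes filters) that applies the same gte check to the students_enrolled values of the sample documents listed in section 3. Each document is simply kept or dropped; nothing is scored. None of the four sample courses has 33 or more students, so against that sample alone the query above returns no hits.

```python
# Toy model, not Elasticsearch: a filter clause is a yes/no predicate with no score.
# students_enrolled values come from the sample documents in section 3.
students_enrolled = {
    "Marketing 101": 18,
    "Accounting 101": 27,
    "Tax Accounting 200": 17,
    "Capital Markets 350": 13,
}


def range_gte(value: int, gte: int) -> bool:
    """Mimic {"range": {"students_enrolled": {"gte": ...}}}: keep or drop, nothing more."""
    return value >= gte


at_least_33 = [name for name, n in students_enrolled.items() if range_gte(n, 33)]
at_least_16 = [name for name, n in students_enrolled.items() if range_gte(n, 16)]
print(at_least_33)
print(at_least_16)
```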
Note: the basic difference between the two is that query context is tied to _score (the relevance score), while filter context is tied to a binary true/false result.
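3. Query examples
This section works through several examples to deepen understanding. To verify the results, some sample data is provided below; you can bulk-insert it into the courses index for testing. The entries are shown as search hits: the _source object of each hit is the actual document.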
[
  {
    "_index" : "courses",
    "_type" : "_doc",
    "_id" : "7G4TN3ABnUeCEegtv7VW",
    "_score" : 1.0,
    "_source" : {
      "name" : "Marketing 101",
      "room" : "E4",
      "professor" : {
        "name" : "William Smith",
        "department" : "finance",
        "facutly_type" : "part-time",
        "email" : "wills@onuni.com"
      },
      "students_enrolled" : 18,
      "course_publish_date" : "2015-06-21",
      "course_description" : "Mkt 101 is a course from the business school on the introduction to marketing that teaches students the fundamentals of market analysis, customer retention and online advertisements"
    }
  },
  {
    "_index" : "courses",
    "_type" : "_doc",
    "_id" : "7W4TN3ABnUeCEegtv7VW",
    "_score" : 1.0,
    "_source" : {
      "name" : "Accounting 101",
      "room" : "E3",
      "professor" : {
        "name" : "Thomas Baszo",
        "department" : "finance",
        "facutly_type" : "part-time",
        "email" : "baszot@onuni.com"
      },
      "students_enrolled" : 27,
      "course_publish_date" : "2015-01-19",
      "course_description" : "Act 101 is a course from the business school on the introduction to accounting that teaches students how to read and compose basic financial statements"
    }
  },
  {
    "_index" : "courses",
    "_type" : "_doc",
    "_id" : "7m4TN3ABnUeCEegtv7VW",
    "_score" : 1.0,
    "_source" : {
      "name" : "Tax Accounting 200",
      "room" : "E7",
      "professor" : {
        "name" : "Thomas Baszo",
        "department" : "finance",
        "facutly_type" : "part-time",
        "email" : "baszot@onuni.com"
      },
      "students_enrolled" : 17,
      "course_publish_date" : "2016-06-15",
      "course_description" : "Tax Act 200 is an intermediate course covering various aspects of tax law"
    }
  },
  {
    "_index" : "courses",
    "_type" : "_doc",
    "_id" : "724UN3ABnUeCEegtkLUq",
    "_score" : 1.0,
    "_source" : {
      "name" : "Capital Markets 350",
      "room" : "E3",
      "professor" : {
        "name" : "Thomas Baszo",
        "department" : "finance",
        "facutly_type" : "part-time",
        "email" : "baszot@onuni.com"
      },
      "students_enrolled" : 13,
      "course_publish_date" : "2016-01-11",
      "course_description" : "This is an advanced course teaching crucial topics related to raising capital and bonds, shares and other long-term equity and debt financial instruments"
    }
  }
]
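The entries above are search hits rather than a bulk request. A minimal Python sketch that turns their _source bodies into the newline-delimited body expected by the _bulk API (one action line followed by one source line per document, ending with a newline) is below. to_bulk_ndjson is a hypothetical helper name; the field name facutly_type is kept exactly as spelled in the sample data and in the queries of this section. Leaving _id out of the action lines lets Elasticsearch generate new IDs.

```python
import json

# _source bodies of the four sample hits above.
docs = [
    {"name": "Marketing 101", "room": "E4",
     "professor": {"name": "William Smith", "department": "finance",
                   "facutly_type": "part-time", "email": "wills@onuni.com"},
     "students_enrolled": 18, "course_publish_date": "2015-06-21",
     "course_description": "Mkt 101 is a course from the business school on the introduction to marketing that teaches students the fundamentals of market analysis, customer retention and online advertisements"},
    {"name": "Accounting 101", "room": "E3",
     "professor": {"name": "Thomas Baszo", "department": "finance",
                   "facutly_type": "part-time", "email": "baszot@onuni.com"},
     "students_enrolled": 27, "course_publish_date": "2015-01-19",
     "course_description": "Act 101 is a course from the business school on the introduction to accounting that teaches students how to read and compose basic financial statements"},
    {"name": "Tax Accounting 200", "room": "E7",
     "professor": {"name": "Thomas Baszo", "department": "finance",
                   "facutly_type": "part-time", "email": "baszot@onuni.com"},
     "students_enrolled": 17, "course_publish_date": "2016-06-15",
     "course_description": "Tax Act 200 is an intermediate course covering various aspects of tax law"},
    {"name": "Capital Markets 350", "room": "E3",
     "professor": {"name": "Thomas Baszo", "department": "finance",
                   "facutly_type": "part-time", "email": "baszot@onuni.com"},
     "students_enrolled": 13, "course_publish_date": "2016-01-11",
     "course_description": "This is an advanced course teaching crucial topics related to raising capital and bonds, shares and other long-term equity and debt financial instruments"},
]


def to_bulk_ndjson(index, sources):
    """Render documents as a _bulk body: an action line, then the document line."""
    lines = []
    for src in sources:
        lines.append(json.dumps({"index": {"_index": index}}))  # no _id: let ES assign one
        lines.append(json.dumps(src))
    return "\n".join(lines) + "\n"  # a _bulk body must end with a newline


body = to_bulk_ndjson("courses", docs)
print(body)
```

Paste the printed output under POST /_bulk in the Kibana console, or send it with the header Content-Type: application/x-ndjson.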
1) Query context only
GET /courses/_search
{
  "query": {
    "match": {
      "course_description": "science"
    }
  }
}
The response includes _score, the relevance score of each document.
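To illustrate what query context adds over a plain yes/no check, here is a toy Python ranking over the course_description values of the sample documents. Its score is a naive count of matching terms, not Elasticsearch's BM25 _score, and the regex tokenizer only roughly approximates the standard analyzer (lowercase, split on non-word characters). It also shows that none of the four sample descriptions contains science, so against that sample alone the query above returns no hits; toy_match is a hypothetical name.

```python
import re

# course_description values of the sample documents
DESCRIPTIONS = {
    "Marketing 101": "Mkt 101 is a course from the business school on the introduction to marketing that teaches students the fundamentals of market analysis, customer retention and online advertisements",
    "Accounting 101": "Act 101 is a course from the business school on the introduction to accounting that teaches students how to read and compose basic financial statements",
    "Tax Accounting 200": "Tax Act 200 is an intermediate course covering various aspects of tax law",
    "Capital Markets 350": "This is an advanced course teaching crucial topics related to raising capital and bonds, shares and other long-term equity and debt financial instruments",
}


def tokens(text):
    # Rough stand-in for the standard analyzer: lowercase, split on non-word characters.
    return re.findall(r"\w+", text.lower())


def toy_match(query, docs):
    """Return (score, name) pairs for documents sharing a term with the query, best first.

    The score is a naive count of matching terms, NOT Elasticsearch's BM25 _score.
    """
    q = set(tokens(query))
    scored = []
    for name, text in docs.items():
        score = sum(1 for t in tokens(text) if t in q)
        if score > 0:
            scored.append((score, name))
    return sorted(scored, key=lambda pair: (-pair[0], pair[1]))


print(toy_match("science", DESCRIPTIONS))  # no sample description contains "science"
print(toy_match("tax financial", DESCRIPTIONS))
```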
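2) Query context with a filter placeholder
Use bool to combine several match clauses. The filter parameter, which holds the filter context, is left empty here.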
GET /courses/_search
{
  "query": {
    "bool": {
      "must": [
        { "match": { "professor.facutly_type": "part-time" } },
        { "match": { "professor.department": "finance" } }
      ],
      "filter": [
      ]
    }
  }
}
All clauses inside must have to match, which works like a logical AND; in query context they also contribute to the score.
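A toy Python model of the two must clauses above, with a regex tokenizer as a rough stand-in for the standard analyzer. A match query uses the or operator by default, so a clause matches when the field shares any term with the query text; must then requires every clause to match. One consequence worth knowing: part-time is analyzed into the terms part and time, so a full-time value would also satisfy the first clause. Assuming default dynamic mapping, a term query on the professor.facutly_type.keyword sub-field avoids this.

```python
import re


def tokens(text):
    # Rough stand-in for the standard analyzer: lowercase, split on non-word characters.
    return re.findall(r"\w+", text.lower())


def match(field_value, query_text):
    # match uses operator "or" by default: one shared term is enough.
    return bool(set(tokens(field_value)) & set(tokens(query_text)))


# professor fields of the four sample documents (identical in all four)
professors = {
    name: {"facutly_type": "part-time", "department": "finance"}
    for name in ["Marketing 101", "Accounting 101", "Tax Accounting 200", "Capital Markets 350"]
}

must = [("facutly_type", "part-time"), ("department", "finance")]

# must behaves like AND: a document is kept only if every clause matches.
hits = [
    name
    for name, prof in professors.items()
    if all(match(prof[field], text) for field, text in must)
]
print(hits)
print(match("full-time", "part-time"))  # shares the term "time"
```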
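3) Query context with a filter
This adds a filter condition to the query. The range filter removes documents that do not meet the condition from the results.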
GET /courses/_search
{
  "query": {
    "bool": {
      "must": [
        { "match": { "professor.facutly_type": "part-time" } },
        { "match": { "professor.department": "finance" } }
      ],
      "filter": [
        { "range": { "students_enrolled": { "gte": 16 } } }
      ]
    }
  }
}
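Applied to the sample data, the must clauses keep all four courses and the range filter then drops Capital Markets 350, the only one with fewer than 16 students. A toy Python check of that reasoning (a simplification, not how Elasticsearch executes the query; passes_must and passes_filter are hypothetical names):

```python
# Toy model, not Elasticsearch internals. Values come from the sample documents.
courses = [
    {"name": "Marketing 101", "facutly_type": "part-time", "department": "finance", "students_enrolled": 18},
    {"name": "Accounting 101", "facutly_type": "part-time", "department": "finance", "students_enrolled": 27},
    {"name": "Tax Accounting 200", "facutly_type": "part-time", "department": "finance", "students_enrolled": 17},
    {"name": "Capital Markets 350", "facutly_type": "part-time", "department": "finance", "students_enrolled": 13},
]


def passes_must(course):
    # For these sample values, exact comparison gives the same answer as the two match clauses.
    return course["facutly_type"] == "part-time" and course["department"] == "finance"


def passes_filter(course):
    # The range filter keeps documents that satisfy it and drops the rest; it adds no score.
    return course["students_enrolled"] >= 16


hits = [c["name"] for c in courses if passes_must(c) and passes_filter(c)]
print(hits)
```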
4) Using the must_not clause
The must_not clause removes documents that match its condition from the results.
GET /courses/_search
{
  "query": {
    "bool": {
      "must": [
        { "match": { "professor.facutly_type": "part-time" } },
        { "match": { "professor.department": "finance" } }
      ],
      "must_not": [
        { "match": { "course_description": "business" } }
      ],
      "filter": [
        { "range": { "students_enrolled": { "gte": 16 } } }
      ]
    }
  }
}
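must_not works like a logical NOT: documents must not match these clauses. Like filter, it runs in filter context, so it does not affect scoring.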
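Against the sample data, must_not removes Marketing 101 and Accounting 101 (both descriptions mention the business school), and the range filter removes Capital Markets 350, leaving only Tax Accounting 200. A toy Python version of that reasoning, with a regex tokenizer as a rough stand-in for the standard analyzer; the must clauses are omitted because all four sample courses satisfy them.

```python
import re


def tokens(text):
    # Rough stand-in for the standard analyzer: lowercase, split on non-word characters.
    return re.findall(r"\w+", text.lower())


# (name, students_enrolled, course_description) from the sample documents
courses = [
    ("Marketing 101", 18, "Mkt 101 is a course from the business school on the introduction to marketing that teaches students the fundamentals of market analysis, customer retention and online advertisements"),
    ("Accounting 101", 27, "Act 101 is a course from the business school on the introduction to accounting that teaches students how to read and compose basic financial statements"),
    ("Tax Accounting 200", 17, "Tax Act 200 is an intermediate course covering various aspects of tax law"),
    ("Capital Markets 350", 13, "This is an advanced course teaching crucial topics related to raising capital and bonds, shares and other long-term equity and debt financial instruments"),
]

hits = [
    name
    for name, enrolled, description in courses
    if "business" not in tokens(description)  # must_not: drop documents that match
    and enrolled >= 16                        # filter: keep documents with >= 16 students
]
print(hits)
```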
5) multi_match
Matches the query text against several fields:
GET /courses/_search
{
  "query": {
    "multi_match": {
      "query": "computer",
      "fields": ["name", "professor.department"]
    }
  }
}
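multi_match runs the match query on each listed field. With the default best_fields type, a document matches if any of the fields matches, and its score comes from the best-matching field. The toy Python sketch below models only the matching part (no scoring) on the name and professor.department values of the sample documents. None of them contains computer, so the query above finds nothing in the sample; accounting is used to show a non-empty result. multi_match here is a hypothetical Python function, not a client API.

```python
import re


def tokens(text):
    # Rough stand-in for the standard analyzer: lowercase, split on non-word characters.
    return re.findall(r"\w+", text.lower())


# name and professor.department of the sample documents
docs = [
    {"name": "Marketing 101", "professor.department": "finance"},
    {"name": "Accounting 101", "professor.department": "finance"},
    {"name": "Tax Accounting 200", "professor.department": "finance"},
    {"name": "Capital Markets 350", "professor.department": "finance"},
]


def multi_match(query, fields, docs):
    """Return names of documents where at least one field shares a term with the query."""
    q = set(tokens(query))
    return [d["name"] for d in docs if any(q & set(tokens(d[f])) for f in fields)]


fields = ["name", "professor.department"]
print(multi_match("computer", fields, docs))
print(multi_match("accounting", fields, docs))
```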
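6) match_phrase
match_phrase requires the search phrase to match exactly: its terms must appear in the same order and next to each other. Partial or broken-up phrases do not match.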
GET /courses/_search
{
  "query": {
    "match_phrase": {
      "course_description": "computer science introduction teaching"
    }
  }
}
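A toy Python model of phrase matching over the sample descriptions: the query terms must appear consecutively and in order, which corresponds to the match_phrase default slop of 0. The regex tokenizer is a rough stand-in for the standard analyzer, and phrase_match and search_phrase are hypothetical names. Note that reordering the terms breaks the match.

```python
import re

# course_description values of the sample documents
DESCRIPTIONS = {
    "Marketing 101": "Mkt 101 is a course from the business school on the introduction to marketing that teaches students the fundamentals of market analysis, customer retention and online advertisements",
    "Accounting 101": "Act 101 is a course from the business school on the introduction to accounting that teaches students how to read and compose basic financial statements",
    "Tax Accounting 200": "Tax Act 200 is an intermediate course covering various aspects of tax law",
    "Capital Markets 350": "This is an advanced course teaching crucial topics related to raising capital and bonds, shares and other long-term equity and debt financial instruments",
}


def tokens(text):
    # Rough stand-in for the standard analyzer: lowercase, split on non-word characters.
    return re.findall(r"\w+", text.lower())


def phrase_match(text, phrase):
    # slop 0: the phrase terms must occur as one contiguous, ordered run of tokens.
    doc, q = tokens(text), tokens(phrase)
    return any(doc[i:i + len(q)] == q for i in range(len(doc) - len(q) + 1))


def search_phrase(phrase):
    return [name for name, text in DESCRIPTIONS.items() if phrase_match(text, phrase)]


print(search_phrase("computer science introduction teaching"))
print(search_phrase("introduction to marketing"))
print(search_phrase("marketing introduction"))
```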
7) match_phrase_prefix
match_phrase_prefix works like match_phrase, except that the last term of the query is treated as a prefix.
GET /courses/_search
{
  "query": {
    "match_phrase_prefix": {
      "course_description": "computer science"
    }
  }
}
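A toy Python model of match_phrase_prefix over the sample descriptions: every term except the last must match as a phrase, and the last term only needs to be a prefix of the next token. Elasticsearch additionally caps how many terms the prefix may expand to (max_expansions, default 50), which this sketch ignores. The regex tokenizer is a rough stand-in for the standard analyzer, and the function names are hypothetical.

```python
import re

# course_description values of the sample documents
DESCRIPTIONS = {
    "Marketing 101": "Mkt 101 is a course from the business school on the introduction to marketing that teaches students the fundamentals of market analysis, customer retention and online advertisements",
    "Accounting 101": "Act 101 is a course from the business school on the introduction to accounting that teaches students how to read and compose basic financial statements",
    "Tax Accounting 200": "Tax Act 200 is an intermediate course covering various aspects of tax law",
    "Capital Markets 350": "This is an advanced course teaching crucial topics related to raising capital and bonds, shares and other long-term equity and debt financial instruments",
}


def tokens(text):
    # Rough stand-in for the standard analyzer: lowercase, split on non-word characters.
    return re.findall(r"\w+", text.lower())


def phrase_prefix_match(text, query):
    doc, q = tokens(text), tokens(query)
    *head, last = q
    n = len(q)
    for i in range(len(doc) - n + 1):
        # Leading terms must match exactly and in order; the last term is a prefix.
        if doc[i:i + n - 1] == head and doc[i + n - 1].startswith(last):
            return True
    return False


def search_phrase_prefix(query):
    return [name for name, text in DESCRIPTIONS.items() if phrase_prefix_match(text, query)]


print(search_phrase_prefix("computer science"))
print(search_phrase_prefix("introduction to acc"))
print(search_phrase_prefix("tax l"))
```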
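8) Range clause
gte means greater than or equal to, and lte means less than or equal to. The other options are gt (greater than) and lt (less than).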
GET /courses/_search
{
  "query": {
    "range": {
      "students_enrolled": {
        "gte": 20,
        "lte": 30
      }
    }
  }
}
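The four bounds combine as a conjunction: every bound that is supplied must hold. A toy Python model over the students_enrolled values of the sample documents (in_range and names are hypothetical helpers); with gte 20 and lte 30, only Accounting 101 (27 students) qualifies.

```python
# students_enrolled values of the sample documents
students_enrolled = {
    "Marketing 101": 18,
    "Accounting 101": 27,
    "Tax Accounting 200": 17,
    "Capital Markets 350": 13,
}


def in_range(value, gte=None, gt=None, lte=None, lt=None):
    """Each bound is optional; all supplied bounds must hold."""
    if gte is not None and not value >= gte:
        return False
    if gt is not None and not value > gt:
        return False
    if lte is not None and not value <= lte:
        return False
    if lt is not None and not value < lt:
        return False
    return True


def names(**bounds):
    return [name for name, n in students_enrolled.items() if in_range(n, **bounds)]


print(names(gte=20, lte=30))
print(names(gt=17, lt=27))
```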
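9) should
should clauses are typically used to favor the most relevant documents: each should clause a document matches adds to its score. When the bool query also contains a must or filter clause, as here, should clauses are optional by default, so removing minimum_should_match also returns documents that match none of them (they simply rank lower). Setting minimum_should_match to 1 keeps only documents that match at least one should clause.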
GET /courses/_search
{
  "query": {
    "bool": {
      "must": [
        { "match": { "name": "101" } }
      ],
      "must_not": [
        { "match": { "room": "e7" } }
      ],
      "should": [
        {
          "range": {
            "students_enrolled": {
              "gte": 10,
              "lte": 20
            }
          }
        }
      ],
      "minimum_should_match": 1
    }
  }
}
should works like a logical OR. minimum_should_match sets the minimum number of should clauses a document must match.
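A toy Python model of this query on the sample data: must keeps the two 101 courses, must_not would drop room E7 (Tax Accounting 200, already excluded by must), and the should range matches only Marketing 101 (18 students). With minimum_should_match set to 1 only Marketing 101 is returned; with the default of 0, which applies here because the query also has a must clause, Accounting 101 is returned as well but ranked lower. The ranking is a simplification of Elasticsearch's scoring, the regex tokenizer roughly stands in for the standard analyzer, and run is a hypothetical name.

```python
import re


def tokens(text):
    # Rough stand-in for the standard analyzer: lowercase, split on non-word characters.
    return re.findall(r"\w+", text.lower())


def match(field_value, query_text):
    # match uses operator "or" by default: one shared term is enough.
    return bool(set(tokens(field_value)) & set(tokens(query_text)))


# name, room and students_enrolled of the sample documents
docs = [
    {"name": "Marketing 101", "room": "E4", "students_enrolled": 18},
    {"name": "Accounting 101", "room": "E3", "students_enrolled": 27},
    {"name": "Tax Accounting 200", "room": "E7", "students_enrolled": 17},
    {"name": "Capital Markets 350", "room": "E3", "students_enrolled": 13},
]


def run(minimum_should_match=0):
    hits = []
    for d in docs:
        if not match(d["name"], "101"):  # must
            continue
        if match(d["room"], "e7"):  # must_not
            continue
        # should: a single clause here, 10 <= students_enrolled <= 20
        matched_should = int(10 <= d["students_enrolled"] <= 20)
        if matched_should < minimum_should_match:
            continue
        hits.append((matched_should, d["name"]))
    # Matching should clauses raise the score, so those documents come first (stable sort).
    hits.sort(key=lambda h: -h[0])
    return [name for _, name in hits]


print(run(minimum_should_match=1))
print(run())
```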
4. Summary
We have walked through the Elasticsearch Query DSL and used examples to show query context, filter context, and how to combine the two.