ES Basics 10: Querying with the Elasticsearch Search API

勤径苦舟

Article link: https://blog.csdn.net/zhou920786312/article/details/131524404

1. Introduction

References

https://www.knowledgedict.com/tutorial/elasticsearch-query.html

1.1 Query categories

1.1.1 Full-text queries

match query
match_phrase query
match_phrase_prefix query
multi_match query
query_string query 
simple_query_string

1.1.2 Term-level queries

term query
terms query
range query
exists query
prefix query
wildcard query
regexp query
fuzzy query
type query
ids query

1.1.3 Compound queries

bool query
boosting query
constant_score query
dis_max query
function_score query
indices query

1.1.4 Joining queries

nested query
has_child query
has_parent query

1.1.5 Geo queries

geo_distance query
geo_bounding_box query
geo_polygon query
geo_shape query

1.1.6 Specialized queries

more_like_this query
script query
percolate query

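1.2 Two query mechanisms

1.2.1 Mechanism 1: scored queries

A ranking model computes how relevant each document is to the user's query terms, and results are returned in descending order of score.

1.2.2 Mechanism 2: filters

Filters only check whether a document satisfies the filter conditions. No score is computed, so filtering is comparatively fast.

2. Full-text queries

Full-text queries are mainly used on text fields.
Their main concern is the relevance of each document to the query string.

2.1 match query

Searches a single field.
The query string is analyzed into terms, and a document is returned if any one of those terms matches. By default this is equivalent to combining the analyzed terms with an or operator.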

GET data_record/_search
{
  "query": {
    "match": {
      "fieldTitle": {
        "query": "力量 全国"
      }
    }
  }
}

This is equivalent to specifying the or operator explicitly:

GET data_record/_search
{
  "query": {
    "match": {
      "fieldTitle": {
        "query": "力量 全国",
        "operator": "or"
      }
    }
  }
}

To return only documents that match every term, use the and operator:

GET data_record/_search
{
  "query": {
    "match": {
      "fieldTitle": {
        "query": "力量 全国",
        "operator": "and"
      }
    }
  }
}
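
The difference between the two operators can be sketched in a few lines of Python. This is only an illustration: the analyze function below is a stand-in that lowercases and splits on whitespace (an assumption; real analyzers, especially for Chinese text, tokenize differently), and the sample documents are made up.

```python
def analyze(text):
    # Stand-in analyzer (assumption): lowercase and split on whitespace.
    return text.lower().split()


def match(field_value, query, operator="or"):
    """Return True if the field matches the query under the given operator."""
    doc_terms = set(analyze(field_value))
    hits = [term in doc_terms for term in analyze(query)]
    # "and" requires every query term, "or" requires at least one.
    return all(hits) if operator == "and" else any(hits)


# Hypothetical field values, already separated by spaces for the stand-in analyzer.
docs = ["力量 训练", "全国 力量 大赛", "全国 新闻"]

print([d for d in docs if match(d, "力量 全国")])         # or: any term is enough
print([d for d in docs if match(d, "力量 全国", "and")])  # and: all terms required
```

With or, all three sample values match; with and, only the value containing both terms does.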

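2.2 match_phrase query

The query text is first analyzed into terms (the analyzer can be customized). A document is returned only if it also meets both of the following conditions:
1. Every analyzed term appears in the field (equivalent to an and operation).
2. The terms appear in the field in the same order as in the query.

Take the three documents below. A match_phrase query for "what a wonderful life" matches only document 1. The request that follows searches for "what life" instead: with the default slop of 0 the two terms must be adjacent, so documents 1 and 2 match only once slop is raised (2 is enough for both), and document 3 has the terms in the reverse order.
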
PUT data_record/test_tp/1
{ "desc": "what a wonderful life" }

PUT data_record/test_tp/2
{ "desc": "what a life"}


PUT data_record/test_tp/3
{ "desc": "life is what"}


GET data_record/test_tp/_search
{
  "query": {
    "match_phrase": {
      "desc": "what life"
    }
  }
}
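
The effect of term order and slop can be illustrated with a small Python sketch. It is a simplified model rather than Lucene's actual phrase scorer: tokens are split on whitespace (an assumption), and slop is modelled as the allowed spread between term positions after each position is shifted by the term's offset in the phrase.

```python
from itertools import product


def phrase_match(text, phrase, slop=0):
    """Simplified phrase matching with slop (not Lucene's exact algorithm)."""
    tokens = text.lower().split()
    terms = phrase.lower().split()
    # Positions at which each phrase term occurs in the document.
    positions = [[i for i, tok in enumerate(tokens) if tok == term] for term in terms]
    if any(not p for p in positions):
        return False  # every term must be present
    for combo in product(*positions):
        # Shift each position by the term's offset in the phrase; an exact
        # phrase yields identical shifted values, i.e. a spread of 0.
        shifted = [pos - offset for offset, pos in enumerate(combo)]
        if max(shifted) - min(shifted) <= slop:
            return True
    return False


docs = ["what a wonderful life", "what a life", "life is what"]
print([d for d in docs if phrase_match(d, "what life")])          # slop 0
print([d for d in docs if phrase_match(d, "what life", slop=2)])  # slop 2
```

Under this model no document matches "what life" with slop 0, and the first two match with slop 2.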

2.3 match_phrase_prefix query

Similar to match_phrase, except that match_phrase_prefix treats the last term as a prefix.

GET data_record/test_tp/_search
{
  "query": {
    "match_phrase_prefix": {
      "desc": "what li"
    }
  }
}

2.4 multi_match query

multi_match extends match to search several fields at once.

The following request searches for "java 编程" in the title and description fields:

GET books/_search
{
  "query": {
    "multi_match": {
      "query": "java 编程",
      "fields": ["title", "description"]
    }
  }
}

multi_match accepts wildcards in the field names, for example:

GET books/_search
{
  "query": {
    "multi_match": {
      "query": "java 编程",
      "fields": ["title", "*_name"]
    }
  }
}

multi_match uses the caret (^) notation to weight individual fields. The following request makes a match in title count three times as much as a match in description:

GET books/_search
{
  "query": {
    "multi_match": {
      "query": "java 编程",
      "fields": ["title^3", "description"]
    }
  }
}

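2.5 query_string query

query_string is tightly coupled to the Lucene query syntax. It lets a single query string combine several conditions with operators such as AND, OR, and NOT across multiple fields. It is best suited to users who already know the Lucene query syntax.

2.6 simple_query_string

simple_query_string is suitable for exposing directly to end users. It supports a simpler, more limited operator syntax than query_string, and it does not throw an exception when the query string fails to parse; the invalid parts are ignored. For example: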

GET books/_search
{
  "query": {
    "simple_query_string": {
      "query": "\"fried eggs\" +(eggplant | potato) -frittata",
      "analyzer": "snowball",
      "fields": ["body^5", "_all"],
      "default_operator": "and"
    }
  }
}

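3. Term-level queries

Full-text queries analyze the query string before executing, whereas term-level queries match exactly against the terms stored in the inverted index.
Term-level queries are typically used for structured data such as numbers, dates, and enumerated values.

3.1 term query

The term query finds documents whose field contains the given term. The query value is not analyzed, so a document is returned only if the value matches an indexed term exactly. Typical uses are exact lookups such as personal names or place names.
Avoid using the term query on text fields:
1. By default Elasticsearch analyzes the values of text fields, which makes exact matching against the original value difficult.
2. To search text field values, use the match query instead.

The following request looks for books whose title field contains the term "思想":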

GET books/_search
{
  "query": {
    "term": {
      "title": "思想"
    }
  }
}
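
Why term queries behave unexpectedly on text fields can be shown with a toy inverted index. This is a sketch under stated assumptions: the analyzer simply lowercases and splits on whitespace, and the two sample titles are invented.

```python
def analyze(text):
    # Assumed stand-in for an analyzer: lowercase, split on whitespace.
    return text.lower().split()


def build_index(docs):
    """Map each analyzed token to the set of document IDs containing it."""
    index = {}
    for doc_id, text in docs.items():
        for token in analyze(text):
            index.setdefault(token, set()).add(doc_id)
    return index


def term_query(index, value):
    # No analysis: the value is looked up exactly as given.
    return index.get(value, set())


def match_query(index, text):
    # The query text goes through the same analyzer as the documents.
    result = set()
    for token in analyze(text):
        result |= index.get(token, set())
    return result


index = build_index({1: "Java Programming", 2: "Python Basics"})
print(term_query(index, "Java"))   # the index holds "java", not "Java"
print(term_query(index, "java"))
print(match_query(index, "Java"))
```

The term lookup for "Java" finds nothing because only the lowercased token was indexed, while the match lookup analyzes the query first and finds document 1.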

3.2 terms query

terms extends term: it returns documents that contain any of several terms.
For example, to find documents whose title field contains "java" or "python":

GET books/_search
{
  "query": {
    "terms": {
      "title": ["java", "python"]
    }
  }
}

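3.3 range query

The range query matches documents whose numeric, date, or string field falls within a given range, for example:
1. Books priced between 50 and 100.
2. Books published between 2015 and 2019.
A single range query applies to one field only; it cannot span several fields.
The range query supports the following parameters:
1. gt: greater than
2. gte: greater than or equal to
3. lt: less than
4. lte: less than or equal to

For example, to find books priced above 50 and at most 70, that is 50 < price <= 70:

GET books/_search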
{
  "query": {
    "range": {
      "price": {
        "gt": 50,
        "lte": 70
      }
    }
  }
}

To find books published between January 1, 2015 and December 31, 2019, run a range query on the publish_time field:

GET books/_search
{
  "query": {
    "range": {
      "publish_time": {
        "gte": "2015-01-01",
        "lte": "2019-12-31",
        "format": "yyyy-MM-dd"
      }
    }
  }
}

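If publish_time is mapped as a string rather than a date, the query above returns no results; query the .keyword sub-field instead. A keyword range compares strings lexicographically, which gives the expected result here only because yyyy-MM-dd strings sort in date order. The format parameter is intended for date fields.

GET books/_search
{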
  "query": {
    "range": {
      "publish_time.keyword": {
        "gte": "2015-01-01",
        "lte": "2019-12-31",
        "format": "yyyy-MM-dd"
      }
    }
  }
}
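
The reliance on string ordering is easy to check in Python. The helper below is a hypothetical illustration that compares plain strings the way a keyword range does; the sample dates are made up.

```python
def in_range(value, gte, lte):
    # Plain string comparison, which is how keyword values are ordered.
    return gte <= value <= lte


# ISO-style dates sort correctly as strings.
print(in_range("2017-06-30", "2015-01-01", "2019-12-31"))
print(in_range("2021-03-15", "2015-01-01", "2019-12-31"))

# Day-first dates do not: a 2021 date falls "inside" a 2015-2019 range.
print(in_range("15/03/2021", "01/01/2015", "31/12/2019"))
```

The last call returns True even though the date lies outside the intended range, which is why a string range is only safe for formats whose text order equals date order.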

3.4 exists query

The exists query returns documents that have at least one non-null value in the given field.

3.4.1 Example

{
  "query": {
    "exists": {
      "field": "user"
    }
  }
}

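3.4.1.1 Documents that match the query above

{ "user" : "jane" } has a user field with a non-null value.
{ "user" : "" } has a user field; an empty string counts as a value.
{ "user" : "-" } has a user field with a non-null value.
{ "user" : [ "jane" ] } has a user field with a non-null value.
{ "user" : [ "jane", null ] } has a user field; one non-null value is enough.

3.4.1.2 Documents that do not match

{ "user" : null } has a user field, but its value is null.
{ "user" : [] } has a user field, but the array is empty.
{ "user" : [null] } has a user field, but its only value is null.
{ "foo" : "bar" } has no user field.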
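
The matching rules listed above can be condensed into a short Python predicate. It is a sketch of the rules as stated in this section, with JSON null represented by Python's None; it does not account for mapping options such as null_value.

```python
def exists(doc, field):
    """True if the field holds at least one non-null value."""
    if field not in doc:
        return False
    value = doc[field]
    # Treat a single value as a one-element list.
    values = value if isinstance(value, list) else [value]
    return any(v is not None for v in values)


matching = [{"user": "jane"}, {"user": ""}, {"user": "-"},
            {"user": ["jane"]}, {"user": ["jane", None]}]
non_matching = [{"user": None}, {"user": []}, {"user": [None]}, {"foo": "bar"}]

print(all(exists(d, "user") for d in matching))
print(any(exists(d, "user") for d in non_matching))
```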

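3.5 prefix query

The prefix query returns documents in which a field contains a term that starts with the given prefix. For example, searching title with the prefix java matches documents containing java, javascript, javaee, or any other term that begins with java.

The following request looks for documents whose description field contains a term starting with win: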

GET books/_search
{
  "query": {
    "prefix": {
      "description": "win"
    }
  }
}

If nothing is returned, try the .keyword sub-field of description:

GET books/_search
{
  "query": {
    "prefix": {
      "description.keyword": "win"
    }
  }
}

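3.6 wildcard query

The wildcard query supports single-character and multi-character wildcards:
1. ? matches exactly one arbitrary character.
2. * matches zero or more characters.
Like the prefix query, the wildcard query is not particularly fast and consumes a fair amount of CPU.
Example:
1. H?tland matches Hatland, Hbtland, and so on, but not Htland, because ? stands for exactly one character. H*tland matches Htland, Habctland, and so on, because * stands for zero or more characters.

Suppose you are looking for books by an author whose full name you have forgotten and you remember only the first two characters. A wildcard query handles this: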

GET books/_search
{
  "query": {
    "wildcard": {
      "author": "李永*"
    }
  }
}

If nothing is returned, try the .keyword sub-field of author:

GET books/_search
{
  "query": {
    "wildcard": {
      "author.keyword": "李永*"
    }
  }
}
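
The ? and * semantics can be tried out with Python's standard fnmatch module, which happens to use the same two wildcards. This is only an approximation for experimenting: fnmatch also gives special meaning to square brackets, which the wildcard query does not, and the author name below is a made-up example.

```python
from fnmatch import fnmatchcase


def wildcard(term, pattern):
    # fnmatchcase is case-sensitive and matches the whole term.
    return fnmatchcase(term, pattern)


print(wildcard("Hatland", "H?tland"))    # ? stands for exactly one character
print(wildcard("Htland", "H?tland"))     # nothing for ? to consume
print(wildcard("Htland", "H*tland"))     # * may match zero characters
print(wildcard("Habctland", "H*tland"))  # or several
print(wildcard("李永强", "李永*"))         # hypothetical author name
```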

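3.7 regexp query

Regular expression query.
The regexp query returns documents in which the given field contains a term matching the regular expression.
1. "a.c.e" and "ab..." both match "abcde".
2. a{3}b{3}, a{2,3}b{2,4}, and a{2,}b{2,} all match the string "aaabbb".

For example, to match postcodes that start with W followed by a digit: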

GET books/_search
{
  "query": {
    "regexp": {
      "postcode": "W[0-9].+"
    }
  }
}

If nothing is returned, try the .keyword sub-field of postcode:

GET books/_search
{
  "query": {
    "regexp": {
      "postcode.keyword": "W[0-9].+"
    }
  }
}
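
The patterns above can be checked with Python's re module, as a rough stand-in. Lucene regular expressions must match the whole term, so re.fullmatch is used here; Lucene's syntax differs from Python's in other respects, so this sketch is only valid for simple patterns like these. The postcode values are invented.

```python
import re


def regexp(term, pattern):
    # Anchored match: the pattern has to cover the entire term.
    return re.fullmatch(pattern, term) is not None


print(regexp("abcde", "a.c.e"), regexp("abcde", "ab..."))
print(all(regexp("aaabbb", p) for p in ["a{3}b{3}", "a{2,3}b{2,4}", "a{2,}b{2,}"]))

# W, one digit, then at least one more character.
for postcode in ["W1A1AA", "W1", "WC2N"]:
    print(postcode, regexp(postcode, "W[0-9].+"))
```

Note that "W1" does not match, because .+ requires at least one character after the digit.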

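3.8 fuzzy query

Edit distance, also known as Levenshtein distance, is the minimum number of edit operations needed to turn one string into another. The permitted operations are substituting one character for another, inserting a character, and deleting a character.
The fuzzy query finds results by computing the edit distance between the query term and the terms in the documents. It is resource-intensive and comparatively slow.
It suits scenarios that call for approximate matching.

For example, suppose a user mistypes "javascript" as "javascritp". A fuzzy query still finds documents containing "javascript" despite the misspelling:

GET books/_search
{
  "query": {
    "fuzzy": {
      "title": "javascritp"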
    }
  }
}
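
Edit distance itself is straightforward to compute with dynamic programming. The sketch below is a textbook implementation, not Elasticsearch's internal one; the optional transpositions flag additionally counts a swap of two adjacent characters as a single edit, which, as far as I know, is how the fuzzy query treats transpositions by default.

```python
def edit_distance(a, b, transpositions=True):
    """Edit distance between a and b; adjacent swaps optionally cost 1."""
    # dist[i][j] = edits needed to turn a[:i] into b[:j]
    dist = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        dist[i][0] = i
    for j in range(len(b) + 1):
        dist[0][j] = j
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,         # delete
                dist[i][j - 1] + 1,         # insert
                dist[i - 1][j - 1] + cost,  # substitute (or keep)
            )
            if (transpositions and i > 1 and j > 1
                    and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]):
                # swap two neighbouring characters
                dist[i][j] = min(dist[i][j], dist[i - 2][j - 2] + 1)
    return dist[len(a)][len(b)]


print(edit_distance("javascritp", "javascript"))                        # 1
print(edit_distance("javascritp", "javascript", transpositions=False))  # 2
```

Either way the misspelling stays within two edits of the intended term.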

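3.9 type query

The type query returns documents of the given mapping type (mapping types were removed in later Elasticsearch versions, so this query applies only to older clusters). For example, to find documents whose type is computer: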

GET books/_search
{
  "query": {
    "type": {
      "value": "computer"
    }
  }
}

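3.10 ids query

The ids query returns documents with the given IDs.
The type is optional: it can be omitted or given as an array. If no type is specified, all types in the index are searched.

For example, the request below fetches documents of type computer with IDs 1, 3, and 5. It is essentially a query on the document _id, so the values are strings: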

GET books/_search
{
  "query": {
    "ids": {
      "type": "computer",
      "values": ["1", "3", "5"]
    }
  }
}

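To exclude a list of IDs from a search, combine the ids query with the must_not clause of a bool query.

4. Compound queries

Compound queries combine simple queries to meet more complex search requirements.
They can also control the behavior of another query.

4.1 bool query

The bool query combines any number of simple queries. The must, should, must_not, and filter clauses express the logic between them, and each clause may appear zero or more times.
Controlling precision
1. must clauses have to match, but how many should clauses need to match? By default none of them has to, with one exception: when there is no must clause, at least one should clause must match.
2. The minimum_should_match parameter controls how many should clauses are required to match.
Scoring rules
1. The bool query follows a "more matches is better" approach, so the scores of the matching must and should clauses are combined.
2. Clauses under filter contribute a score of 0.
3. must_not clauses do not affect the score; their purpose is to exclude documents that would otherwise be included.
4. As noted above, the score of a bool query comes mainly from the must and should clauses: the scores of all matching must and should clauses are added up, multiplied by the number of matching clauses, and divided by the total number of clauses. (This last step is the coordination factor of older versions; Lucene 7 and later no longer apply it.)

4.1.1 Clauses

must: the document must match the queries under must. Equivalent to a logical AND, and contributes to the relevance score.
should: the document may or may not match the queries under should. Equivalent to a logical OR, and contributes to the relevance score.
must_not: the opposite of must; documents matching these queries are not returned. must_not clauses do not affect scoring; they only exclude unwanted documents.
filter: like must, only documents matching the queries under filter are returned, but filter does not score and acts purely as a filter. It is the counterpart of must_not.

4.1.2 Example

The request below looks for books whose status is 1, whose title contains java, and whose price is below 70 (must_not with gte excludes a price of exactly 70 as well). Because minimum_should_match is 1, the description must also contain 虚拟机; remove that parameter to make the should clause optional: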

GET books/_search
{
  "query": {
    "bool": {
      "filter": {
        "term": {
          "status": 1
        }
      },
      "must_not": {
        "range": {
          "price": {
            "gte": 70
          }
        }
      },
      "must": {
        "match": {
          "title": "java"
        }
      },
      "should": [
        {
          "match": {
            "description": "虚拟机"
          }
        }
      ],
      "minimum_should_match": 1
    }
  }
}
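
The matching logic of the four clause types can be sketched as a small Python function. This models only whether a document is returned, not its score. Clauses are plain Python predicates, the sample books are invented, and the default for minimum_should_match follows the rule described in this article (0 when a must clause is present, otherwise 1); the exact default in a given Elasticsearch version may differ.

```python
def bool_match(doc, must=(), should=(), must_not=(), filters=(),
               minimum_should_match=None):
    """Decide whether doc passes a bool query built from predicate clauses."""
    if not all(clause(doc) for clause in list(must) + list(filters)):
        return False
    if any(clause(doc) for clause in must_not):
        return False
    if minimum_should_match is None:
        # Default as described in the text (an assumption for real clusters).
        minimum_should_match = 0 if must else (1 if should else 0)
    matched_should = sum(1 for clause in should if clause(doc))
    return matched_should >= minimum_should_match


# Predicates mirroring the request above.
query = dict(
    filters=[lambda d: d["status"] == 1],
    must_not=[lambda d: d["price"] >= 70],
    must=[lambda d: "java" in d["title"].lower()],
    should=[lambda d: "虚拟机" in d["description"]],
)

books = [
    {"title": "Java 虚拟机", "price": 60, "status": 1, "description": "深入理解 Java 虚拟机"},
    {"title": "Java 入门", "price": 70, "status": 1, "description": "虚拟机 简介"},
    {"title": "Java 编程", "price": 50, "status": 1, "description": "语法 基础"},
]

print([bool_match(b, minimum_should_match=1, **query) for b in books])
print([bool_match(b, **query) for b in books])
```

With minimum_should_match set to 1 only the first book passes; without it the third book passes as well, because the should clause becomes optional.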

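4.2 boosting query

The boosting query is for cases where the scores of two queries need to be balanced against each other: it wraps both queries and lowers the score of documents matching one of them.
It has three parts: positive, negative, and negative_boost.
1. Scores from the positive query are left unchanged.
2. Documents that also match the negative query have their score reduced.
3. negative_boost is the factor applied to those documents.

To lower the score of books published before 2015, build a boosting query as follows: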

GET books/_search
{
	"query": {
		"boosting": {
			"positive": {
				"match": {
					"title": "python"
				}
			},
			"negative": {
				"range": {
					"publish_time": {
						"lte": "2015-01-01"
					}
				}
			},
			"negative_boost": 0.2
		}
	}
}

The query sets a negative_boost of 0.2. Documents whose publish_time is after 2015-01-01 keep their score, while documents whose publish_time is on or before 2015-01-01 receive 0.2 times their original score.
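
The score adjustment amounts to one multiplication, as the following Python sketch shows. The positive scores and publication dates are made-up values; only the 0.2 factor and the cut-off date come from the request above.

```python
from datetime import date


def boosting_score(positive_score, matches_negative, negative_boost):
    # Documents that match the negative query are demoted, not excluded.
    return positive_score * negative_boost if matches_negative else positive_score


CUTOFF = date(2015, 1, 1)

# (hypothetical positive score, publish_time)
hits = [(2.0, date(2018, 5, 1)), (2.0, date(2012, 3, 9))]

for score, published in hits:
    final = boosting_score(score, published <= CUTOFF, negative_boost=0.2)
    print(published, final)
```

Both books match the positive query equally well, but the older one ends up with a fifth of the score and therefore ranks lower.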

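4.3 constant_score query

The constant_score query wraps a filter query and returns the documents that match it, each with a relevance score equal to the boost parameter.

The request below returns documents whose title field contains the term elasticsearch; every document scores 1.8: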

GET books/_search
{
  "query": {
    "constant_score": {
      "filter": {
        "term": {
          "title": "elasticsearch"
        }
      },
      "boost": 1.8
    }
  }
}

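4.4 dis_max query

The dis_max query is related to the bool query but differs from it. It accepts several sub-queries and returns every document that matches at least one of them.
Whereas the bool query combines the scores of all matching queries, dis_max uses only the score of the best-matching sub-query (plus tie_breaker times the scores of the other matching sub-queries, if tie_breaker is set).

Consider the following example: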

GET books/_search
{
	"query": {
		"dis_max": {
			"tie_breaker": 0.7,
			"boost": 1.2,
			"queries": [{
					"term": {
						"age": 34
					}
				},
				{
					"term": {
						"age": 35
					}
				}
			]
		}
	}
}
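
How tie_breaker and boost enter the final score can be sketched in Python. The sub-query scores passed in are hypothetical; None stands for a sub-query that did not match the document.

```python
def dis_max_score(sub_scores, tie_breaker=0.0, boost=1.0):
    """Best sub-query score plus tie_breaker times the remaining scores."""
    matching = [s for s in sub_scores if s is not None]
    if not matching:
        return None  # no sub-query matched, so the document is not returned
    best = max(matching)
    others = sum(matching) - best
    return boost * (best + tie_breaker * others)


print(dis_max_score([2.0, 1.0]))                   # best score only
print(dis_max_score([2.0, 1.0], tie_breaker=0.5))  # others count for half
print(dis_max_score([2.0, None], tie_breaker=0.7, boost=1.2))
```

With tie_breaker at 0 the result is the plain maximum; at 1 it would equal the sum of all matching scores.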

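4.5 function_score query

The function_score query modifies the scores of the documents a query returns.
It is useful, for example, when computing a relevance score is expensive: a filter combined with custom scoring functions can replace the conventional scoring.
To use it, define one query and one or more scoring functions; each function computes a score for every document the query returns.

4.5.1 Example 1

The request below returns every document in the books index. Each document receives a randomly generated score of at most 5, and the function score is combined with the query score by multiplication.

GET books/_search
{
  "query": {
    "function_score": {
      "query": {
        "match_all": {}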
      },
      "boost": "5",
      "random_score": {},
      "boost_mode": "multiply"
    }
  }
}

4.5.2 Example 2

A script can define a custom scoring formula. Here each document's function score is the square root of one tenth of its price:

GET books/_search
{
  "query": {
    "function_score": {
      "query": {
        "match": {
          "title": "java"
        }
      },
      "script_score": {
        "script": {
          "inline": "Math.sqrt(doc['price'].value/10)"
        }
      }
    }
  }
}

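4.6 indices query

The indices query is for searching across several indices: it takes a list of index names and an inner query. (It was deprecated in Elasticsearch 5.0 and removed in 6.0.)
It has two parts, query and no_match_query:
1. query is applied to documents in the listed indices.
2. no_match_query is applied to documents in all other indices.

The request below searches the books and books2 indices for documents whose title contains javascript, and every other index for documents whose title contains basketball: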

GET books/_search
{
	"query": {
		"indices": {
			"indices": ["books", "books2"],
			"query": {
				"match": {
					"title": "javascript"
				}
			},
			"no_match_query": {
				"term": {
					"title": "basketball"
				}
			}
		}
	}
}

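5. Joining queries

Running full SQL-style joins in a distributed system such as Elasticsearch is prohibitively expensive. To stay horizontally scalable, Elasticsearch instead offers two forms of join:

nested query
has_child query and has_parent query

5.1 nested query

Documents may contain fields of the nested type. These fields index arrays of objects, and each object can be queried as an independent document by means of the nested query.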

PUT /my_index
{
	"mappings": {
		"type1": {
			"properties": {
				"obj1": {
					"type": "nested"
				}
			}
		}
	}
}

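5.2 has_child query

A parent-child relationship can exist between documents of two types within a single index. The has_child query returns parent documents whose children match a given query.
The parent-child relationship is declared in the mapping when the index is created.

Take employees (employee) and the branches they work at (branch) as an example. They are different types, comparable to two tables in a database. To link employees to their branch, Elasticsearch has to be told about the parent-child relationship: here employee is the child type and branch is the parent type. Declare this in the mapping: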

PUT /company
{
	"mappings": {
		"branch": {},
		"employee": {
			"parent": { "type": "branch" }
		}
	}
}

Index the branch documents with the bulk API:

POST company/branch/_bulk
{ "index": { "_id": "london" }}
{ "name": "London Westminster","city": "London","country": "UK" }
{ "index": { "_id": "liverpool" }}
{ "name": "Liverpool Central","city": "Liverpool","country": "UK" }
{ "index": { "_id": "paris" }}
{ "name": "Champs Elysees","city": "Paris","country": "France" }

Add the employee data:

POST company/employee/_bulk
{ "index": { "_id": 1,"parent":"london" }}
{ "name": "Alice Smith","dob": "1970-10-24","hobby": "hiking" }
{ "index": { "_id": 2,"parent":"london" }}
{ "name": "Mark Tomas","dob": "1982-05-16","hobby": "diving" }
{ "index": { "_id": 3,"parent":"liverpool" }}
{ "name": "Barry Smith","dob": "1979-04-01","hobby": "hiking" }
{ "index": { "_id": 4,"parent":"paris" }}
{ "name": "Adrien Grand","dob": "1987-05-11","hobby": "horses" }

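Use the has_child query to find parents through their children. For example, to find the branches of employees born in 1980 or later: those employees are Mark Tomas and Adrien Grand, who work at london and paris respectively. The following request verifies this: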

GET company/branch/_search
{
	"query": {
		"has_child": {
			"type": "employee",
			"query": {
				"range": { "dob": { "gte": "1980-01-01" } }
			}
		}
	}
}

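To find which branches have an employee named "Alice Smith": because the match query analyzes the text into "Alice" and "Smith", the branches of both Alice Smith and Barry Smith match. The following request verifies this: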

GET company/branch/_search
{
	"query": {
		"has_child": {
			"type": "employee",
			"score_mode": "max",
			"query": {
				"match": { "name": "Alice Smith" }
			}
		}
	}
}

min_children sets the minimum number of matching child documents. For example, to find branches with at least two employees:

GET company/branch/_search?pretty
{
	"query": {
		"has_child": {
			"type": "employee",
			"min_children": 2,
			"query": {
				"match_all": {}
			}
		}
	}
}

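5.3 has_parent query

A parent-child relationship can exist between documents of two types within a single index. The has_parent query returns child documents whose parent matches a given query.
The parent-child relationship is declared in the mapping when the index is created.

Use the has_parent query to find children through their parent. For example, to find all employees working in the UK:

GET company/employee/_search
{
	"query": {
		"has_parent": {
			"parent_type": "branch",
			"query": {
				"match": { "country": "UK" }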
			}
		}
	}
}

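6. Geo queries

Supported field types:
1. geo_point: a geographic point
2. geo_shape: a geographic shape

6.1 Test data

The test data consists of the coordinates of several cities. Each document has two fields: the city name and its coordinates.

First save the following content to a file named geo.json: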


{"index":{ "_index":"geo","_type":"city","_id":"1" }}
{"name":"北京","location":"39.9088145109,116.3973999023"}
{"index":{ "_index":"geo","_type":"city","_id": "2" }}
{"name":"乌鲁木齐","location":"43.8266300000,87.6168800000"}
{"index":{ "_index":"geo","_type":"city","_id":"3" }}
{"name":"西安","location":"34.3412700000,108.9398400000"}
{"index":{ "_index":"geo","_type":"city","_id":"4" }}
{"name":"郑州","location":"34.7447157466,113.6587142944"}
{"index":{ "_index":"geo","_type":"city","_id":"5" }}
{"name":"杭州","location":"30.2294080260,120.1492309570"}
{"index":{ "_index":"geo","_type":"city","_id":"6" }}
{"name":"济南","location":"36.6518400000,117.1200900000"}

Create an index and set its mapping:

PUT geo
{
	"mappings": {
		"city": {
			"properties": {
				"name": {
					"type": "keyword"
				},
				"location": {
					"type": "geo_point"
				}
			}
		}
	}
}

Then run the bulk import:

curl -XPOST "http://localhost:9200/_bulk?pretty" --data-binary @geo.json

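6.2 geo_distance query

The geo_distance query finds geo-point documents within a given distance of a center point.

The following request finds cities within 200 km of Tianjin; the results include Beijing: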

GET geo/_search
{
	"query": {
		"bool": {
			"must": {
				"match_all": {}
			},
			"filter": {
				"geo_distance": {
					"distance": "200km",
					"location": {
						"lat": 39.0851000000,
						"lon": 117.1993700000
					}
				}
			}
		}
	}
}

Sort the cities by their distance from Beijing:

GET geo/_search
{
  "query": {
    "match_all": {}
  },
  "sort": [{
    "_geo_distance": {
      "location": "39.9088145109,116.3973999023",
      "unit": "km",
      "order": "asc",
      "distance_type": "plane"
    }
  }]
}

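location is the field that holds the coordinates.
unit set to km means the distance is written to each hit's sort key in kilometers.
distance_type set to plane selects the faster but slightly less accurate plane calculation.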
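
To get a feel for the distances involved, the great-circle distance can be computed with the haversine formula. This is an independent sketch, not the calculation Elasticsearch performs internally: it assumes a spherical Earth with a mean radius of 6371 km and rounds the coordinates from the test data.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (an approximation)


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two lat/lon points."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlam = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


# City coordinates from the test data, as (lat, lon).
cities = {
    "北京": (39.9088, 116.3974),
    "乌鲁木齐": (43.8266, 87.6169),
    "西安": (34.3413, 108.9398),
    "郑州": (34.7447, 113.6587),
    "杭州": (30.2294, 120.1492),
    "济南": (36.6518, 117.1201),
}
tianjin = (39.0851, 117.19937)

nearby = [name for name, (lat, lon) in cities.items()
          if haversine_km(*tianjin, lat, lon) <= 200]
print(nearby)
```

Only Beijing lies within 200 km of Tianjin in this data set, in line with the query result described above.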

6.3 geo_bounding_box query

Finds geo points that fall inside a given rectangle.

Two points define the rectangle, and the query returns the documents located within it.

GET geo/_search
{
	"query": {
		"bool": {
			"must": {
				"match_all": {}
			},
			"filter": {
				"geo_bounding_box": {
					"location": {
						"top_left": {
							"lat": 38.4864400000,
							"lon": 106.2324800000
						},
						"bottom_right": {
							"lat": 28.6820200000,
							"lon": 115.8579400000
						}
					}
				}
			}
		}
	}
}
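
The containment test behind a bounding box is a pair of range checks, sketched below in Python. It assumes the box does not cross the 180° meridian and uses rounded coordinates from the test data.

```python
def in_bounding_box(lat, lon, top_left, bottom_right):
    """True if (lat, lon) lies inside the box; corners are (lat, lon) pairs."""
    top, left = top_left
    bottom, right = bottom_right
    # Assumes the box does not wrap around the 180° meridian.
    return bottom <= lat <= top and left <= lon <= right


# City coordinates from the test data, as (lat, lon).
cities = {
    "北京": (39.9088, 116.3974),
    "乌鲁木齐": (43.8266, 87.6169),
    "西安": (34.3413, 108.9398),
    "郑州": (34.7447, 113.6587),
    "杭州": (30.2294, 120.1492),
    "济南": (36.6518, 117.1201),
}
top_left = (38.48644, 106.23248)
bottom_right = (28.68202, 115.85794)

inside = [name for name, (lat, lon) in cities.items()
          if in_bounding_box(lat, lon, top_left, bottom_right)]
print(inside)
```

For this data, Xi'an and Zhengzhou fall inside the rectangle.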

6.4 geo_polygon query

Finds geo points that fall inside a given polygon.

Hohhot, Chongqing, and Shanghai form a triangle. The following request finds the cities located inside it:

GET geo/_search
{
	"query": {
		"bool": {
			"must": {
				"match_all": {}
			},
			"filter": {
				"geo_polygon": {
					"location": {
						"points": [{
							"lat": 40.8414900000,
							"lon": 111.7519900000
						}, {
							"lat": 29.5647100000,
							"lon": 106.5507300000
						}, {
							"lat": 31.2303700000,
							"lon": 121.4737000000
						}]
					}
				}
			}
		}
	}
}

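6.5 geo_shape query

Queries data of the geo_shape type. The supported spatial relations between shapes are intersects, within (containment), and disjoint.

Create a new index for testing, with the location field mapped as geo_shape: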

PUT geoshape
{
	"mappings": {
		"city": {
			"properties": {
				"name": {
					"type": "keyword"
				},
				"location": {
					"type": "geo_shape"
				}
			}
		}
	}
}

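Note the order of latitude and longitude:
1. For geo_point fields written as a string, latitude comes first and longitude second.
2. For points inside a geo_shape, longitude comes first and latitude second.

Index the line from Xi'an to Zhengzhou: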

POST geoshape/city/1
{
	"name": "西安-郑州",
	"location": {
		"type": "linestring",
		"coordinates": [
			[108.9398400000, 34.3412700000],
			[113.6587142944, 34.7447157466]
		]
	}
}

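The request below looks for shapes contained in the rectangle whose diagonal runs from Yinchuan to Nanchang. The line from Xi'an to Zhengzhou lies inside that rectangle, so it is returned: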

GET geoshape/_search
{
	"query": {
		"bool": {
			"must": {
				"match_all": {}
			},
			"filter": {
				"geo_shape": {
					"location": {
						"shape": {
							"type": "envelope",
							"coordinates": [
								[106.23248, 38.48644],
								[115.85794, 28.68202]
							]
						},
						"relation": "within"
					}
				}
			}
		}
	}
}

7. Specialized queries

7.1 more_like_this query

The more_like_this query finds documents similar to the supplied text. It is commonly used for recommending similar content. For example:

GET books/_search
{
	"query": {
		"more_like_this": {
			"fields": ["title", "description"],
			"like": "java virtual machine",
			"min_term_freq": 1,
			"max_query_terms": 12
		}
	}
}

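The optional parameters and their values are:

fields:
1. The fields to match against.
2. Defaults to the _all field.
like: the text to match.
min_term_freq:
1. The minimum frequency of a term in the input text.
2. Defaults to 2; terms that occur less often are ignored.
max_query_terms:
1. The maximum number of terms the query may contain.
2. Defaults to 25.
min_doc_freq:
1. The minimum document frequency.
2. Defaults to 5.
max_doc_freq: the maximum document frequency.
min_word_length: the minimum word length.
max_word_length: the maximum word length.
stop_words: a list of stop words.
analyzer: the analyzer to use.
minimum_should_match:
1. The minimum number of terms a document must match.
2. Defaults to 30% of the terms in the analyzed query.
boost_terms: the term weight.
include: whether the input documents are returned in the results.
boost:
1. The weight of the whole query.
2. Defaults to 1.0.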

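7.2 script query

Elasticsearch supports queries written as scripts. For example, to find documents with a price above 180:

GET books/_search
{
  "query": {
    "script": {
      "script": {
        "inline": "doc['price'].value > 180",
        "lang": "painless"
      }
    }
  }
}

7.3 percolate query

Normally documents are indexed into Elasticsearch first and then searched with queries. The percolate query works the other way round: queries are registered first, and a document is then used to find the queries that match it.
It suits scenarios such as data classification, data routing, event monitoring, and alerting.

For example, suppose the my-index index has a laptop type whose documents have price and name fields. Declare a query field of type percolator in the mapping: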

PUT my-index
{
	"mappings": {
		"laptop": {
			"properties": {
				"price": { "type": "long" },
				"name": { "type": "text" }
			}
		},
		"queries": {
			"properties": {
				"query": { "type": "percolator" }
			}
		}
	}
}

Register a bool query that contains a range query requiring price to be at most 10000 and a match query requiring name to contain macbook:

PUT /my-index/queries/1?refresh
{
	"query": {
		"bool": {
			"must": [{
				"range": { "price": { "lte": 10000 } }
			}, {
				"match": { "name": "macbook" }
			}]
		}
	}
}

Find the matching queries for a document:

GET /my-index/_search
{
  "query": {
    "percolate": {
      "field": "query",
      "document_type": "laptop",
      "document": {
        "price": 9999,
        "name": "macbook pro on sale"
      }
    }
  }
}

The document satisfies the registered conditions, so the bool query registered above is returned in the results.
