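These study notes are based on Elasticsearch 7.10. Query features already deprecated in older versions are not covered for now and will be added later if they come up.
Official documentation used as reference: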
https://www.elastic.co/guide/en/elasticsearch/reference/7.10/geo-queries.html
https://www.elastic.co/guide/en/elasticsearch/reference/7.10/specialized-queries.html
1. Geo queries
1.1 Preparing the data
Create an index:
PUT geo
{
"mappings": {
"properties": {
"name":{
"type": "keyword"
},
"location":{
"type": "geo_point"
}
}
}
}
Create a file named geo.json with the content below. Note that the file must end with a newline, so leave the last line empty:
{"index":{"_index":"geo","_id":1}}
{"name":"西安","location":"34.288991865037524,108.9404296875"}
{"index":{"_index":"geo","_id":2}}
{"name":"北京","location":"39.926588421909436,116.43310546875"}
{"index":{"_index":"geo","_id":3}}
{"name":"上海","location":"31.240985378021307,121.53076171875"}
{"index":{"_index":"geo","_id":4}}
{"name":"天津","location":"39.13006024213511,117.20214843749999"}
{"index":{"_index":"geo","_id":5}}
{"name":"杭州","location":"30.259067203213018,120.21240234375001"}
{"index":{"_index":"geo","_id":6}}
{"name":"武汉","location":"30.581179257386985,114.3017578125"}
{"index":{"_index":"geo","_id":7}}
{"name":"合肥","location":"31.840232667909365,117.20214843749999"}
{"index":{"_index":"geo","_id":8}}
{"name":"重庆","location":"29.592565403314087,106.5673828125"}
Finally, run the bulk import command:
curl -XPOST "http://localhost:9200/geo/_bulk?pretty" -H "content-type:application/json" --data-binary @geo.json
A tool site that may come in handy for picking coordinates: http://geojson.io/#map=6/32.741/116.521
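The bulk file can also be generated instead of typed by hand. The following is a minimal sketch, assuming Python 3 and only the standard library; the helper names `build_bulk_body` and `write_bulk_file` are made up for illustration. It writes one action line plus one source line per document and makes sure the body ends with a newline, which the bulk API requires.

```python
import json

# City names and "lat,lon" strings copied from the sample data above.
CITIES = [
    ("西安", "34.288991865037524,108.9404296875"),
    ("北京", "39.926588421909436,116.43310546875"),
    ("上海", "31.240985378021307,121.53076171875"),
    ("天津", "39.13006024213511,117.20214843749999"),
    ("杭州", "30.259067203213018,120.21240234375001"),
    ("武汉", "30.581179257386985,114.3017578125"),
    ("合肥", "31.840232667909365,117.20214843749999"),
    ("重庆", "29.592565403314087,106.5673828125"),
]


def build_bulk_body(cities, index="geo"):
    """Return an NDJSON bulk body: an action line followed by a source line per document."""
    lines = []
    for doc_id, (name, location) in enumerate(cities, start=1):
        lines.append(json.dumps({"index": {"_index": index, "_id": doc_id}}))
        lines.append(json.dumps({"name": name, "location": location}, ensure_ascii=False))
    # The trailing newline is what "leave the last line empty" refers to.
    return "\n".join(lines) + "\n"


def write_bulk_file(path, cities):
    """Write the bulk body to a UTF-8 file that curl can send with --data-binary."""
    with open(path, "w", encoding="utf-8") as f:
        f.write(build_bulk_body(cities))
```

Calling `write_bulk_file("geo.json", CITIES)` should produce a file equivalent to the one shown above.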
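1.2 geo_distance
The geo_distance query takes a center point and a distance, and returns the documents that lie within the circle centered on that point with that distance as its radius: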
GET geo/_search
{
"query": {
"bool": {
"must": [
{
"match_all": {}
}
],
"filter": [
{
"geo_distance": {
"distance": "600km",
"location": {
"lat": 34.288991865037524,
"lon": 108.9404296875
}
}
}
]
}
}
}
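To get a feel for what this filter does, the distance check can be reproduced outside Elasticsearch. The sketch below is only an approximation: it uses the haversine formula with a mean Earth radius of 6371 km (an assumption, not necessarily the exact constant Elasticsearch uses), and the coordinates are the sample data rounded to four decimal places.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, an approximation

# (lat, lon) pairs from the sample data, rounded to 4 decimal places.
CITIES = {
    "西安": (34.2890, 108.9404),
    "北京": (39.9266, 116.4331),
    "上海": (31.2410, 121.5308),
    "天津": (39.1301, 117.2021),
    "杭州": (30.2591, 120.2124),
    "武汉": (30.5812, 114.3018),
    "合肥": (31.8402, 117.2021),
    "重庆": (29.5926, 106.5674),
}


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def within_distance(cities, center, radius_km):
    """Names of the cities whose distance from center is at most radius_km."""
    lat0, lon0 = center
    return [name for name, (lat, lon) in cities.items()
            if haversine_km(lat0, lon0, lat, lon) <= radius_km]


# Same center and radius as the query above.
nearby = within_distance(CITIES, CITIES["西安"], 600)
```

With the sample data, `nearby` contains 西安 itself (distance 0) and 重庆 (roughly 568 km away); 武汉 is about 650 km away and falls outside the circle.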
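1.3 geo_bounding_box
The geo_bounding_box query takes a top-left point and a bottom-right point, and returns all documents inside the rectangle those two points define: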
GET geo/_search
{
"query": {
"bool": {
"must": [
{
"match_all": {}
}
],
"filter": [
{
"geo_bounding_box": {
"location": {
"top_left": {
"lat": 32.0639555946604,
"lon": 118.78967285156249
},
"bottom_right": {
"lat": 29.98824461550903,
"lon": 122.20642089843749
}
}
}
}
]
}
}
}
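The rectangle test itself is a pair of range comparisons. Here is a small sketch of it, assuming the box does not cross the antimeridian (the 180° longitude line); the function name `in_bounding_box` is illustrative and the coordinates are the sample data rounded to four decimal places.

```python
# (lat, lon) pairs from the sample data, rounded to 4 decimal places.
CITIES = {
    "西安": (34.2890, 108.9404),
    "北京": (39.9266, 116.4331),
    "上海": (31.2410, 121.5308),
    "天津": (39.1301, 117.2021),
    "杭州": (30.2591, 120.2124),
    "武汉": (30.5812, 114.3018),
    "合肥": (31.8402, 117.2021),
    "重庆": (29.5926, 106.5674),
}


def in_bounding_box(lat, lon, top_left, bottom_right):
    """True if (lat, lon) is inside the box; assumes the box does not cross the antimeridian."""
    top, left = top_left
    bottom, right = bottom_right
    return bottom <= lat <= top and left <= lon <= right


# Same corners as the query above, rounded.
TOP_LEFT = (32.0640, 118.7897)
BOTTOM_RIGHT = (29.9882, 122.2064)

inside = [name for name, (lat, lon) in CITIES.items()
          if in_bounding_box(lat, lon, TOP_LEFT, BOTTOM_RIGHT)]
```

For the sample data, `inside` ends up holding 上海 and 杭州, the two cities the query above is expected to return.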
1.4 geo_polygon
The geo_polygon query takes at least three points and returns all documents inside the polygon they form:
GET geo/_search
{
"query": {
"bool": {
"must": [
{
"match_all": {}
}
],
"filter": [
{
"geo_polygon": {
"location": {
"points": [
{
"lat": 31.793755581217674,
"lon": 113.8238525390625
},
{
"lat": 30.007273923504556,
"lon":114.224853515625
},
{
"lat": 30.007273923504556,
"lon":114.8345947265625
}
]
}
}
}
]
}
}
}
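The containment check can be sketched with the classic ray-casting test. This version treats latitude and longitude as flat plane coordinates, which is a simplification and not necessarily how Elasticsearch evaluates the polygon internally; for a small triangle like this one the difference should not matter. The function name `point_in_polygon` is illustrative.

```python
def point_in_polygon(lat, lon, polygon):
    """Ray-casting test on a flat lat/lon plane; polygon is a list of (lat, lon) vertices."""
    inside = False
    n = len(polygon)
    for i in range(n):
        lat1, lon1 = polygon[i]
        lat2, lon2 = polygon[(i + 1) % n]
        # Only edges that straddle the point's latitude can be crossed by the ray.
        if (lat1 > lat) != (lat2 > lat):
            # Longitude at which this edge crosses the point's latitude.
            cross_lon = lon1 + (lat - lat1) * (lon2 - lon1) / (lat2 - lat1)
            # Count crossings to the east of the point; an odd count means inside.
            if lon < cross_lon:
                inside = not inside
    return inside


# The three vertices from the query above, rounded to 4 decimal places.
TRIANGLE = [(31.7938, 113.8239), (30.0073, 114.2249), (30.0073, 114.8346)]

wuhan_inside = point_in_polygon(30.5812, 114.3018, TRIANGLE)
hefei_inside = point_in_polygon(31.8402, 117.2021, TRIANGLE)
```

With these inputs 武汉 falls inside the triangle and 合肥 does not, so the query above should return only 武汉.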
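1.5 geo_shape
The geo_shape query searches shapes and works on fields of type geo_shape. The possible relations between two shapes include intersects, within, and disjoint.
First, create a new index: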
PUT geo_shape
{
"mappings": {
"properties": {
"name": {
"type": "keyword"
},
"location": {
"type": "geo_shape"
}
}
}
}
Add a line:
PUT geo_shape/_doc/1
{
"name": "西安-郑州",
"location": {
"type": "linestring",
"coordinates": [
[108.9404296875, 34.279914398549934],
[113.66455078125, 34.768691457552706]
]
}
}
Next, query whether a given shape contains that line:
GET geo_shape/_search
{
"query": {
"bool": {
"must": [
{
"match_all": {}
}
],
"filter": [
{
"geo_shape": {
"location": {
"shape": {
"type": "envelope",
"coordinates": [
[
106.5234375,
36.80928470205937
],
[
115.33447265625,
32.24997445586331
]
]
},
"relation": "within"
}
}
}
]
}
}
}
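The relation property describes the relation between the indexed shape and the query shape:
- within: the indexed shape lies entirely inside the query shape
- intersects: the two shapes intersect
- disjoint: the two shapes do not intersect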
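The within relation from the example can be checked by hand. Because an envelope is a convex rectangle, a line string lies within it exactly when every vertex of the line does. The sketch below relies on that property and handles only this envelope case; the function names are illustrative, and `SMALLER_ENVELOPE` is a made-up shape added for contrast. Note that geo_shape coordinates are written as [lon, lat], the opposite of the "lat,lon" strings used for geo_point earlier.

```python
def point_in_envelope(lon, lat, envelope):
    """envelope is [[min_lon, max_lat], [max_lon, min_lat]]: top-left corner, then bottom-right."""
    (min_lon, max_lat), (max_lon, min_lat) = envelope
    return min_lon <= lon <= max_lon and min_lat <= lat <= max_lat


def line_within_envelope(line, envelope):
    """A line string is within a convex rectangle when all of its vertices are inside it."""
    return all(point_in_envelope(lon, lat, envelope) for lon, lat in line)


# The indexed line and the query envelope from the example, in [lon, lat] order.
LINE = [[108.9404296875, 34.279914398549934], [113.66455078125, 34.768691457552706]]
ENVELOPE = [[106.5234375, 36.80928470205937], [115.33447265625, 32.24997445586331]]

# A hypothetical narrower envelope that stops at longitude 112, west of the line's second vertex.
SMALLER_ENVELOPE = [[106.5, 36.8], [112.0, 32.2]]

matches_example = line_within_envelope(LINE, ENVELOPE)
matches_smaller = line_within_envelope(LINE, SMALLER_ENVELOPE)
```

Here `matches_example` is true, which is why the query above finds the document, while the narrower envelope would no longer satisfy within.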
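2. Specialized queries
2.1 more_like_this
The more_like_this query enables content-based recommendation: given an article, it finds content similar to that article.
Some of the more_like_this parameters:
- fields: the fields to match against; more than one may be given.
- like: the text to match.
- min_term_freq: the minimum term frequency, 2 by default. Note that this refers to the terms in the like text.
- max_query_terms: the maximum number of terms taken from the analyzed like text to build the query.
- min_doc_freq: the minimum document frequency. A term from the like text must appear in at least this many documents; otherwise the term is ignored.
- max_doc_freq: the maximum document frequency.
- analyzer: the analyzer to use; defaults to the analyzer of the field.
- stop_words: a list of stop words.
- minimum_should_match: the minimum number of terms a document must match; documents matching fewer are ignored.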
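To show how these parameters fit together, here is a minimal sketch that builds a more_like_this request body as a Python dict and serializes it to JSON. The field names `name` and `info` and the sample text are assumptions made for illustration, as is the helper name `more_like_this_body`; only the parameter names come from the list above.

```python
import json


def more_like_this_body(fields, like_text, min_term_freq=1, max_query_terms=12):
    """Build a search request body that contains a single more_like_this query."""
    return {
        "query": {
            "more_like_this": {
                "fields": fields,            # fields to match against
                "like": like_text,           # text whose terms drive the query
                "min_term_freq": min_term_freq,
                "max_query_terms": max_query_terms,
            }
        }
    }


# Hypothetical fields and text; adjust to the mapping of the target index.
body = more_like_this_body(["name", "info"], "distributed search engine")
request_json = json.dumps(body, ensure_ascii=False, indent=2)
```

The resulting `request_json` could be sent as the body of a `_search` request against an index that has those fields. Lowering `min_term_freq` to 1 is a common choice for short like texts, since few terms in a short text repeat.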
2.2 script
The script query filters documents with a script. For example, find all books priced above 200:
GET books/_search
{
"query": {
"bool": {
"filter": [
{
"script": {
"script": {
"lang": "painless",
"source": "if(doc['price'].size()!=0){doc['price'].value > 200}"
}
}
}
]
}
}
}
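2.3 percolate
The percolate query, also called a reverse query, works on a field of type percolator; that field is not analyzed.
- Normal query: find the matching documents for a given query, that is, query >> document.
- percolate query: for a given document, return the stored queries that match it, that is, document >> query.
Typical use cases:
- Price monitoring
- Stock-level alerts
- Stock market alerts
- …
For example, consider a threshold alerting scenario: raise an alert when a given field exceeds the threshold 10.
Define the index: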
PUT log
{
"mappings": {
"properties": {
"threshold":{
"type": "long"
},
"count":{
"type": "long"
},
"query":{
"type":"percolator"
}
}
}
}
Insert a document that stores the query:
PUT log/_doc/1
{
"threshold":10,
"query":{
"bool":{
"must":{
"range":{
"count":{
"gt":10
}
}
}
}
}
}
Use the percolate query to check several documents at once; the documents that exceed the threshold trigger the alert:
GET log/_search
{
"query": {
"percolate": {
"field": "query",
"documents": [
{
"count":3
},
{
"count":6
},
{
"count":90
},
{
"count":12
},
{
"count":15
}
]
}
}
}
The _percolator_document_slot field in the result gives the positions of the matching documents in the documents array, counting from 0.
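The slot numbering can be illustrated without a cluster. The sketch below stands in for the stored range query with a plain comparison and reports which positions in the documents array would match; the function names are illustrative and this is not how Elasticsearch evaluates percolator queries internally.

```python
def stored_query_matches(doc, threshold=10):
    """Stand-in for the stored range query: the document must have count > threshold."""
    return "count" in doc and doc["count"] > threshold


def matching_slots(documents):
    """Zero-based positions of the documents that the stored query matches."""
    return [slot for slot, doc in enumerate(documents) if stored_query_matches(doc)]


# Same documents, in the same order, as in the percolate request above.
documents = [{"count": 3}, {"count": 6}, {"count": 90}, {"count": 12}, {"count": 15}]
slots = matching_slots(documents)
```

For these five documents `slots` is `[2, 3, 4]`: the documents with count 90, 12, and 15 exceed the threshold, and those are the positions expected in _percolator_document_slot.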
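Copyright notice:
This article only records notes from learning Elasticsearch. If it infringes on any rights, please get in touch to have it removed.
For more content, see the original author: 江南一点雨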