Elasticsearch: Relationships & Geo Queries


xueshijun666


Tags: elasticsearch, big data, search engine

Article link: https://blog.csdn.net/xueshijun666/article/details/127254145


---Relationships---

Using the has_child query

Using the has_parent query

Using the nested query

----geo-----

Using the geo_bounding_box query

Using the geo_shape query

Using the geo_distance query

---Relationships---



"""
DELETE /mybooks

PUT /mybooks
{
  "mappings": {
    "properties": {
      "join_field": {
        "type": "join",
        "relations": {
          "order": "item"
        }
      },
      "position": {
        "type": "integer",
        "store": true
      },
      "uuid": {
        "store": true,
        "type": "keyword"
      },
      "date": {
        "type": "date"
      },
      "quantity": {
        "type": "integer"
      },
      "price": {
        "type": "double"
      },
      "description": {
        "term_vector": "with_positions_offsets",
        "store": true,
        "type": "text"
      },
      "title": {
        "term_vector": "with_positions_offsets",
        "store": true,
        "type": "text",
        "fielddata": true,
        "fields": {
          "keyword": {
            "type": "keyword",
            "ignore_above": 256
          }
        }
      }
    }
  }
}

POST _bulk?refresh
{"index":{"_index":"mybooks", "_id":"1"}}
{"uuid":"11111","position":1,"title":"Joe Tester","description":"Joe Testere nice guy","date":"2015-10-22","price":4.3,"quantity":50}
{"index":{"_index":"mybooks", "_id":"2"}}
{"uuid":"22222","position":2,"title":"Bill Baloney","description":"Bill Testere nice guy","date":"2016-06-12","price":5,"quantity":34}
{"index":{"_index":"mybooks", "_id":"3"}}
{"uuid":"33333","position":3,"title":"Bill Klingon","description":"Bill is not\n                nice guy","date":"2017-09-21","price":6,"quantity":33}


PUT /mybooks-join
{
  "mappings": {
      "properties": {
        "join": { "type": "join", "relations": { "book": "author"  } },

        "uuid": { "store": true, "type": "keyword" },
        "position": { "type": "integer", "store": true },
        "title": {
          "term_vector": "with_positions_offsets", "store": true, "type": "text",
          "fielddata": true,
          "fields": { "keyword": { "type": "keyword", "ignore_above": 256 } }
        },
        "description": { "term_vector": "with_positions_offsets", "store": true, "type": "text" },
        "date": { "type": "date" },
        "price": { "type": "double" },
        "quantity": { "type": "integer" },
        "versions": {
          "type": "nested", "properties": {  "color": {  "type": "keyword"  },  "size": { "type": "integer" } }
        },


        "rating": { "type": "double" },
        "name": {  "type": "text", "fields": { "keyword": { "type": "keyword", "ignore_above": 256 } } },
        "surname": { "type": "text", "fields": { "keyword": { "type": "keyword", "ignore_above": 256 } } }
      }}}

POST _bulk?refresh
{"index":{"_index":"mybooks-join", "_id":"1"}}
{"uuid":"11111","position":1,"title":"Joe Tester","description":"Joe Testere nice guy","date":"2015-10-22","price":4.3,"quantity":50,"join": {"name": "book"}, "versions":[{"color":"yellow", "size":5},{"color":"blue", "size":15}]}
{"index":{"_index":"mybooks-join", "_id":"a11", "routing":"1"}}
{"name":"Peter","surname":"Doyle","rating":4.5, "join": {"name": "author", "parent":"1"}}
{"index":{"_index":"mybooks-join", "_id":"a12", "routing":"1"}}
{"name":"Mark","surname":"Twain","rating":4.2, "join": {"name": "author", "parent":"1"}}
{"index":{"_index":"mybooks-join", "_id":"2"}}
{"uuid":"22222","position":2,"title":"Bill Baloney","description":"Bill Testere nice guy","date":"2016-06-12","price":5,"quantity":34,"join": {"name": "book"}, "versions":[{"color":"red", "size":2},{"color":"blue", "size":10}]}
{"index":{"_index":"mybooks-join", "_id":"a2", "routing":"2"}}
{"name":"Agatha","surname":"Princeton","rating":2.1, "join": {"name": "author", "parent":"2"}}
{"index":{"_index":"mybooks-join", "_id":"3"}}
{"uuid":"33333","position":3,"title":"Bill Klingon","description":"Bill is not\n    nice guy","date":"2017-09-21","price":6,"quantity":33,"join": {"name": "book"}, "versions":[{"color":"red", "size":2}]}
{"index":{"_index":"mybooks-join", "_id":"a3", "routing":"3"}}
{"name":"Martin","surname":"Twisted","rating":3.2,"join": {"name": "author", "parent":"3"}}

POST /mybooks-join/_refresh

"""

Using the has_child query

1. We want to find the parents (book) whose children (author) contain the term martin in the name field. We can create this kind of query using the following code:

POST /mybooks-join/_search
{ "query": {
    "has_child": {
      "type": "author",
      "query": { "term": { "name": "martin" } },
      "inner_hits" : {}
    } } }

The parameters that are used to control this process are as follows:
•   The type parameter names the child relation to search. Children live in the same index as the parent; the value is the child name declared in the relations of the join field (author in this example).
•   The query parameter is the query that is executed to select the children. Any kind of query can be used.
•   The score_mode parameter (none by default; available values are none, avg, max, min, and sum) controls how the scores of the matching children are aggregated into the parent's score.
•   min_children and max_children are optional parameters. They set the minimum and maximum number of matching children that a parent document must have to be returned.
•   ignore_unmapped (false by default), when set to true, ignores unmapped types instead of failing. This is very useful when executing a query on multiple indices where some types are missing. The default behavior is to throw an exception if there is a mapping error.
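
The selection logic described above can be sketched in plain Python. This is only a toy in-memory model of the sample documents, not the Elasticsearch API: the has_child function and the lowercase comparison (standing in for the standard analyzer) are assumptions made for illustration.

```python
# Toy model of the sample data: parent books and their child authors.
books = {"1": "Joe Tester", "2": "Bill Baloney", "3": "Bill Klingon"}
authors = [
    {"name": "Peter", "parent": "1"},
    {"name": "Mark", "parent": "1"},
    {"name": "Agatha", "parent": "2"},
    {"name": "Martin", "parent": "3"},
]


def has_child(child_matches, min_children=1, max_children=None):
    """Return the IDs of parents whose count of matching children is in bounds."""
    counts = {}
    for author in authors:
        if child_matches(author):
            counts[author["parent"]] = counts.get(author["parent"], 0) + 1
    return sorted(
        parent_id
        for parent_id, n in counts.items()
        if n >= min_children and (max_children is None or n <= max_children)
    )


# "Martin" is indexed in lowercase, so the term "martin" matches it.
martin_parents = has_child(lambda a: a["name"].lower() == "martin")
# Every author matches here; only book 1 has at least two children.
two_or_more = has_child(lambda a: True, min_children=2)
print(martin_parents, two_or_more)
```

In this model, the term query on martin selects book 3, while min_children set to 2 keeps only book 1, which is the only parent with two authors.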

Using the has_parent query

POST /mybooks-join/_search
{ "query": {
    "has_parent": {
      "parent_type": "book",
      "query": { "term": {"description": "bill" }}}}}

Using the nested query

Nested objects are indexed in a special way in Elasticsearch.

POST /mybooks-join/_search
{ "query": {
    "nested": {
      "path": "versions",  "score_mode": "avg",
      "query": {
        "bool": {
          "must": [
            { "term": { "versions.color": "blue" } },
            { "range":{ "versions.size": { "gt": 10 }}
            } ] } } } } }

Elasticsearch manages nested objects in a special way. During indexing, each nested object is extracted from the main document and indexed as a separate hidden document, which is stored in the same Lucene block as the main document.
The nested query first executes its inner query on the nested documents and then uses the matching results to select the main documents that contain them. The parameters that are used to control this process are as follows:
•   path: This is the path, within the parent document, of the field that contains the nested objects.
•   query: This is the query that is executed to select the nested objects. Any kind of query can be used.
•   score_mode: The default value is avg. The valid values are avg, sum, min, max, and none, which control how the scores of the matching nested documents contribute to the score of the main document.
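
To see why nested objects need to be matched as separate documents, the following sketch compares per-object matching with what happens when an array of objects is treated as independent lists of values. It is a plain Python model of the sample versions data, not Elasticsearch code; the function names and the second query (red with size greater than 5) are hypothetical and only serve the comparison.

```python
# The "versions" arrays of the three sample books, keyed by book ID.
books = {
    "1": [{"color": "yellow", "size": 5}, {"color": "blue", "size": 15}],
    "2": [{"color": "red", "size": 2}, {"color": "blue", "size": 10}],
    "3": [{"color": "red", "size": 2}],
}


def nested_match(versions, color, min_size):
    # Both conditions must hold on the same nested object.
    return any(v["color"] == color and v["size"] > min_size for v in versions)


def flattened_match(versions, color, min_size):
    # Without "nested", the array behaves like independent value lists,
    # so the two conditions can be satisfied by different objects.
    colors = [v["color"] for v in versions]
    sizes = [v["size"] for v in versions]
    return color in colors and any(s > min_size for s in sizes)


# The query from the text: color blue and size greater than 10.
blue_gt_10 = [i for i, v in books.items() if nested_match(v, "blue", 10)]
# Hypothetical query: color red and size greater than 5.
red_nested = [i for i, v in books.items() if nested_match(v, "red", 5)]
red_flat = [i for i, v in books.items() if flattened_match(v, "red", 5)]
print(blue_gt_10, red_nested, red_flat)
```

Only book 1 has a blue version larger than 10. For the hypothetical red query, per-object matching finds nothing, while the flattened view wrongly returns book 2 because its red and its size 10 belong to different versions.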

----geo-----

"""

DELETE /mygeo-index

PUT /mygeo-index
{
  "mappings": {
    "properties": {
      "pin": {
        "properties": {
          "location": {
            "type": "geo_point"
          }
        }
      }
    }
  }
}

PUT /mygeo-index/_doc/1
{"pin": {"location": {"lat": 40.12, "lon": -71.34}}}

PUT /mygeo-index/_doc/2
{"pin": {"location": {"lat": 40.12, "lon": 71.34}}}

POST /mygeo-index/_refresh
"""

Using the geo_bounding_box query

One of the most common operations in geo-localization is searching for points inside a box (a rectangle).
The box is usually an approximation of the shape of a shop, a building, or a city.
This kind of query can be used in a percolator for real-time monitoring of whether users, documents, or events are entering a special place.

POST /mygeo-index/_search?pretty
{ "query": {
    "geo_bounding_box": {
      "pin.location": {
        "bottom_right": { "lat": 40.03, "lon": 72 },
        "top_left": { "lat": 40.717, "lon": 70.99 }
      } } } }

Elasticsearch has a lot of optimizations to facilitate searching for a box shape. Latitude and longitude are indexed for fast range checks, so this kind of filter is executed very quickly.
The parameters that control a geo bounding box filter are the following:
•   top_left is the geo point of the top-left corner of the box (required).
•   bottom_right is the geo point of the bottom-right corner of the box (required).
•   validation_method (STRICT by default) is used for validating the geo point. The valid values are as follows:
o   IGNORE_MALFORMED is used to accept invalid values for latitude and longitude.
o   COERCE is used to try to correct wrong values.
o   STRICT is used to reject invalid values.
•   type (memory by default) defines whether the query should be executed in memory or on indexed values (indexed).
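
The range checks behind the box filter can be illustrated with a minimal Python sketch over the two sample points. The in_bounding_box helper is hypothetical and assumes a box that does not cross the antimeridian; it is not how Elasticsearch implements the query internally.

```python
# The two sample documents as (lat, lon) pairs, keyed by document ID.
points = {"1": (40.12, -71.34), "2": (40.12, 71.34)}


def in_bounding_box(lat, lon, top_left, bottom_right):
    """Check a point against a box given as (lat, lon) corner tuples."""
    top, left = top_left
    bottom, right = bottom_right
    # Two independent range checks: one on latitude, one on longitude.
    return bottom <= lat <= top and left <= lon <= right


hits = [
    doc_id
    for doc_id, (lat, lon) in points.items()
    if in_bounding_box(lat, lon, top_left=(40.717, 70.99), bottom_right=(40.03, 72))
]
print(hits)
```

With the corners used in the query above, only document 2 falls inside the box; document 1 has the same latitude but a negative longitude.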

Using the geo_shape query

POST /mygeo-index/_search
{ "query": {
    "bool": {
      "must": { "match_all": {} },
      "filter": {
        "geo_shape": {
          "pin.location": {
            "shape": {
              "type": "polygon",
              "coordinates": [
                [[-30,50],[-80,30],[-90,80],[-30,50]]
              ] },
            "relation": "within"} } } } } }
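
The polygon coordinates in this query are given in GeoJSON order, that is, [lon, lat]. As a rough check of which sample document lies within the polygon, here is a ray-casting sketch in Python. It treats the coordinates as a flat plane, which is an approximation, and the point_in_polygon helper is hypothetical rather than part of any Elasticsearch client.

```python
def point_in_polygon(lon, lat, ring):
    """Ray casting on a closed ring of (lon, lat) vertices (planar approximation)."""
    inside = False
    for (x1, y1), (x2, y2) in zip(ring, ring[1:]):
        # Does this edge straddle the horizontal line through the point?
        if (y1 > lat) != (y2 > lat):
            # Longitude at which the edge crosses that line.
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


# The polygon from the query, in [lon, lat] order; first and last vertex are equal.
ring = [(-30, 50), (-80, 30), (-90, 80), (-30, 50)]
# The sample documents store lat/lon, so the values are swapped into (lon, lat) here.
doc1_inside = point_in_polygon(-71.34, 40.12, ring)
doc2_inside = point_in_polygon(71.34, 40.12, ring)
print(doc1_inside, doc2_inside)
```

Under this approximation, document 1 is inside the triangle and document 2 is not.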

Using the geo_distance query

Typical scenarios for this query are the following:
• Finding the nearest restaurant within a distance of 20 km
• Finding my nearest friends within a range of 10 km

GET /mygeo-index/_search
{ "query": {
    "geo_distance": {
      "pin.location": { "lat": 40, "lon": 70 },
      "distance": "200km" } } }

The distance query executes a distance calculation between a given geo point and the points in the documents, returning hits that satisfy the distance requirement.
The parameters that control the distance query are as follows:
•   The field and the point of reference used to calculate the distance. In the preceding example, we have pin.location and (40,70).
•   distance defines the maximum distance to be considered. It is usually expressed as a string made of a number plus a unit.
•   unit (optional) is the unit of the distance value, if the distance is defined as a number. The valid values are as follows:
   o   in or inch
   o   yd or yards
   o   mi or miles
   o   km or kilometers
   o   m or meters
   o   mm or millimeters
   o   cm or centimeters
•   distance_type (arc by default; valid choices are arc, which considers the roundness of the globe, or plane, which simplifies the distance in a linear way) defines the type of algorithm used to calculate the distance.
•   validation_method (STRICT by default) is used for validating the geo point. The valid values are as follows:
   o   IGNORE_MALFORMED is used to accept invalid values for latitude and longitude.
   o   COERCE is used to try to correct wrong values.
   o   STRICT is used to reject invalid values.
•   ignore_unmapped is used to safely execute the query in the case of multi-indices, which can have a missing definition of a geo point.
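
The distance check can be approximated with the haversine formula, which is one common way to compute a great-circle (arc-style) distance. The sketch below is a standalone Python illustration over the two sample points; the helper name and the Earth radius of 6371 km are assumptions, and the result may differ slightly from what Elasticsearch computes.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, assumed for this sketch


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (
        math.sin(dphi / 2) ** 2
        + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    )
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# The two sample documents as (lat, lon) pairs, keyed by document ID.
points = {"1": (40.12, -71.34), "2": (40.12, 71.34)}
center = (40.0, 70.0)

hits = [
    doc_id
    for doc_id, (lat, lon) in points.items()
    if haversine_km(center[0], center[1], lat, lon) <= 200
]
print(hits)
```

Document 2 is a little over 100 km from the reference point, so it is inside the 200km radius, while document 1 is on the other side of the globe.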