ES12 - Term-level queries

powerx_yc

Original link: https://my.oschina.net/u/3100849/blog/1858871

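1. Introduction to term-level queries

Full-text queries analyze the query string before they run, whereas term-level queries operate on the exact terms stored in the inverted index. Term-level queries are usually used for structured data such as numbers, dates, and enums rather than for full-text fields. They also let you craft low-level queries that bypass the analysis process.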
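To make the distinction concrete, here is a minimal Python sketch of an inverted index. The analyzer (lowercasing plus whitespace splitting), the sample documents, and the helper names `analyze`, `build_index`, `term_lookup`, and `match_lookup` are all assumptions made for illustration. They are not Elasticsearch APIs, and real analyzers do considerably more.

```python
def analyze(text):
    """Toy analyzer (assumption): lowercase the text, then split on whitespace."""
    return text.lower().split()


def build_index(docs):
    """Map each analyzed term to the set of document ids that contain it."""
    index = {}
    for doc_id, text in docs.items():
        for term in analyze(text):
            index.setdefault(term, set()).add(doc_id)
    return index


def term_lookup(index, term):
    """Term-level behaviour: look the term up exactly as given, with no analysis."""
    return index.get(term, set())


def match_lookup(index, query):
    """Full-text behaviour: analyze the query string first, then look up each term."""
    hits = set()
    for term in analyze(query):
        hits |= index.get(term, set())
    return hits


# Hypothetical sample documents.
docs = {1: "Quick Brown Fox", 2: "lazy dog"}
index = build_index(docs)

print(term_lookup(index, "Quick"))   # set()  (the stored term is "quick")
print(match_lookup(index, "Quick"))  # {1}    (the query is lowercased first)
```

The exact lookup misses because the index holds the analyzed form of the text, which is why term-level queries suit fields whose values are stored unchanged.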

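2. term query

The term query searches for an exact term. It was covered in the previous chapter and is not repeated here.

3. terms query

A term query is useful for finding a single value, but we often want to search for several values at once. For that, use a single terms query (note the trailing s), which is simply the plural form of term. A document matches if the field contains any one of the listed terms.

The following query finds documents whose "title" contains any of the three terms "河北", "长生", or "碧桂园".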

GET telegraph/_search
{
  "query": {
    "terms": {
      "title": ["河北","长生","碧桂园"]
    }
  }
}

{
  "took": 7,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 3,
    "max_score": 1,
    "hits": [
      {
        "_index": "telegraph",
        "_type": "msg",
        "_id": "A5etp2QBW8hrYY3zGJk7",
        "_score": 1,
        "_source": {
          "title": "碧桂园集团副主席杨惠妍",
          "content": "杨惠妍分别于7月10日、11日买入碧桂园1000万股、1500万股",
          "author": "小财注",
          "pubdate": "2018-07-17T16:12:55"
        }
      },
      {
        "_index": "telegraph",
        "_type": "msg",
        "_id": "Apetp2QBW8hrYY3zGJk7",
        "_score": 1,
        "_source": {
          "title": "长生生物再次跌停 三机构抛售近1000万元",
          "content": "长生生物再次一字跌停，报收19.89元，成交1432万元",
          "author": "长生生物",
          "pubdate": "2018-07-17T10:03:11"
        }
      },
      {
        "_index": "telegraph",
        "_type": "msg",
        "_id": "BJetp2QBW8hrYY3zGJk7",
        "_score": 1,
        "_source": {
          "title": "河北聚焦十大行业推进国际产能合作",
          "content": "河北省政府近日出台积极参与“一带一路”建设推进国际产能合作实施方案",
          "author": "财联社",
          "pubdate": "2018-07-17T14:14:55"
        }
      }
    ]
  }
}

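4. terms_set query

Finds documents that match one or more of the specified terms. The number of terms that must match is determined per document, either by a minimum-should-match field or by a minimum-should-match script.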
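The original gives no example for terms_set, so the following Python sketch builds a request body and imitates the matching rule locally. `terms` and `minimum_should_match_field` are real terms_set parameters. The field names `skills` and `required_matches`, the sample documents, and the helper functions are hypothetical, and the local check is only a simplified model that assumes every document carries the minimum-should-match field.

```python
import json


def terms_set_query(field, terms, min_match_field):
    """Build a terms_set request body that reads the required count from a document field."""
    return {
        "query": {
            "terms_set": {
                field: {
                    "terms": terms,
                    "minimum_should_match_field": min_match_field,
                }
            }
        }
    }


def terms_set_matches(doc, field, terms, min_match_field):
    """Simplified local model: count matching terms and compare with the per-document minimum."""
    values = doc.get(field, [])
    if not isinstance(values, list):
        values = [values]
    matched = len(set(values) & set(terms))
    return matched >= doc[min_match_field]


# Hypothetical documents: each one states how many terms it requires.
docs = [
    {"name": "sean", "skills": ["java", "python", "go"], "required_matches": 2},
    {"name": "dean", "skills": ["python"], "required_matches": 2},
    {"name": "lily", "skills": ["rust"], "required_matches": 1},
]
terms = ["python", "go", "rust"]

print(json.dumps(terms_set_query("skills", terms, "required_matches"), indent=2))
hits = [d["name"] for d in docs if terms_set_matches(d, "skills", terms, "required_matches")]
print(hits)  # ['sean', 'lily']
```

Because the threshold lives in each document, the same query can demand two matching terms from one document and only one from another.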

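5. range query

The range query matches documents whose numeric, date, or string field falls within a given range.

Range query on a date field

The following example finds documents whose publication time "pubdate" lies between "2018-07-17T12:00:00" and "2018-07-17T16:30:00", inclusive.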

GET telegraph/_search
{
  "query": {
    "range": {
      "pubdate": {
        "gte": "2018-07-17T12:00:00",
        "lte": "2018-07-17T16:30:00"
      }
    }
  }
}

Query result

{
  "took": 6,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 3,
    "max_score": 1,
    "hits": [
      {
        "_index": "telegraph",
        "_type": "msg",
        "_id": "AZetp2QBW8hrYY3zGJk7",
        "_score": 1,
        "_source": {
          "title": "周五召开董事会会议 审议及批准更新后的一季报",
          "content": "以审议及批准更新后的2018年第一季度报告",
          "author": "中兴通讯",
          "pubdate": "2018-07-17T12:33:11"
        }
      },
      {
        "_index": "telegraph",
        "_type": "msg",
        "_id": "A5etp2QBW8hrYY3zGJk7",
        "_score": 1,
        "_source": {
          "title": "碧桂园集团副主席杨惠妍",
          "content": "杨惠妍分别于7月10日、11日买入碧桂园1000万股、1500万股",
          "author": "小财注",
          "pubdate": "2018-07-17T16:12:55"
        }
      },
      {
        "_index": "telegraph",
        "_type": "msg",
        "_id": "BJetp2QBW8hrYY3zGJk7",
        "_score": 1,
        "_source": {
          "title": "河北聚焦十大行业推进国际产能合作",
          "content": "河北省政府近日出台积极参与“一带一路”建设推进国际产能合作实施方案",
          "author": "财联社",
          "pubdate": "2018-07-17T14:14:55"
        }
      }
    ]
  }
}

Range query on a numeric field

Create a new index and add data

DELETE my_person

PUT my_person

PUT my_person/stu/1
{
  "name":"sean",
  "age":20
}

PUT my_person/stu/2
{
  "name":"sum",
  "age":25
}

PUT  my_person/stu/3
{
  "name":"dean",
  "age":30
}

PUT my_person/stu/4
{
  "name":"kastel",
  "age":35
}

Find people whose "age" is between 20 and 30, inclusive

GET my_person/_search
{
  "query": {
    "range": {
      "age": {
        "gte": 20,
        "lte": 30
      }
    }
  }
}

Query result

{
  "took": 6,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 3,
    "max_score": 1,
    "hits": [
      {
        "_index": "my_person",
        "_type": "stu",
        "_id": "2",
        "_score": 1,
        "_source": {
          "name": "sum",
          "age": 25
        }
      },
      {
        "_index": "my_person",
        "_type": "stu",
        "_id": "1",
        "_score": 1,
        "_source": {
          "name": "sean",
          "age": 20
        }
      },
      {
        "_index": "my_person",
        "_type": "stu",
        "_id": "3",
        "_score": 1,
        "_source": {
          "name": "dean",
          "age": 30
        }
      }
    ]
  }
}
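The range query accepts four bound operators: gte and lte are inclusive, gt and lt are exclusive. The Python sketch below imitates that comparison on the four ages indexed above. The helper names `in_range` and `range_hits` are hypothetical, and the code only models numeric comparison, not date math or string ranges.

```python
import operator

# Map each range operator name to the comparison it stands for.
OPS = {
    "gt": operator.gt,
    "gte": operator.ge,
    "lt": operator.lt,
    "lte": operator.le,
}


def in_range(value, bounds):
    """Return True when the value satisfies every bound in the range clause."""
    return all(OPS[name](value, limit) for name, limit in bounds.items())


# Document id -> age, copied from the my_person documents above.
people = {"1": 20, "2": 25, "3": 30, "4": 35}


def range_hits(bounds):
    """Return the sorted ids of the documents whose age falls inside the bounds."""
    return sorted(doc_id for doc_id, age in people.items() if in_range(age, bounds))


print(range_hits({"gte": 20, "lte": 30}))  # ['1', '2', '3']
print(range_hits({"gt": 20, "lt": 30}))    # ['2']
```

Switching from gte/lte to gt/lt drops the documents that sit exactly on the boundaries, which is why the second call returns only document 2.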

6. exists query

Finds documents in which the field contains at least one non-null value.

Create the index and add data

DELETE my_person

PUT my_person

PUT my_person/stu/1
{
  "name":"sean",
  "hobby":"running"
}

PUT my_person/stu/2
{
  "name":"Jhon",
  "hobby":""
}

PUT my_person/stu/3
{
  "name":"sum",
  "hobby":["swimming",null]
}

PUT my_person/stu/4
{
  "name":"lily",
  "hobby":[null,null]
}

PUT my_person/stu/5
{
  "name":"lucy"
}

Find documents whose "hobby" is not null

GET my_person/_search
{
  "query": {
    "exists":{
      "field":"hobby"
    }
  }
}

Query result

{
  "took": 12,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 3,
    "max_score": 1,
    "hits": [
      {
        "_index": "my_person",
        "_type": "stu",
        "_id": "2",
        "_score": 1,
        "_source": {
          "name": "Jhon",
          "hobby": ""
        }
      },
      {
        "_index": "my_person",
        "_type": "stu",
        "_id": "1",
        "_score": 1,
        "_source": {
          "name": "sean",
          "hobby": "running"
        }
      },
      {
        "_index": "my_person",
        "_type": "stu",
        "_id": "3",
        "_score": 1,
        "_source": {
          "name": "sum",
          "hobby": [
            "swimming",
            null
          ]
        }
      }
    ]
  }
}

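Matching notes:

"hobby":"running"------the value is not null (matches)
"hobby":""------the value is an empty string, which is not null (matches)
"hobby":["swimming",null]------the array contains a non-null value (matches)
"hobby":[null,null]------every value in the array is null (does not match)
"name":"lucy"------there is no hobby field (does not match)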
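The matching rules above can be written as a small Python predicate. This is a sketch of the rule as described in this section, not Elasticsearch code. The function name `field_exists` is hypothetical, and the documents are the five indexed above, with JSON null written as Python `None`.

```python
def field_exists(doc, field):
    """Return True when the field holds at least one non-null value."""
    if field not in doc:
        return False  # the field is missing entirely
    value = doc[field]
    if value is None:
        return False  # an explicit null
    if isinstance(value, list):
        # An array counts only if at least one element is non-null.
        return any(item is not None for item in value)
    return True  # any other value, including the empty string


docs = {
    "1": {"name": "sean", "hobby": "running"},
    "2": {"name": "Jhon", "hobby": ""},
    "3": {"name": "sum", "hobby": ["swimming", None]},
    "4": {"name": "lily", "hobby": [None, None]},
    "5": {"name": "lucy"},
}

hits = [doc_id for doc_id, doc in docs.items() if field_exists(doc, "hobby")]
print(hits)  # ['1', '2', '3']
```

The result agrees with the query output: documents 1, 2, and 3 match, while 4 and 5 do not.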

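7. prefix query

Finds documents that contain a term beginning with the given string. The following query finds documents whose "hobby" starts with "sw". Note that the results in this and the following sections include a document with _id 6 (name "deak", hobby "swimming") that is not among the documents created above; it was presumably indexed separately.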

GET my_person/_search
{
  "query": {
    "prefix": {
      "hobby": {
        "value": "sw"
      }
    }
  }
}

Query result

{
  "took": 11,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 2,
    "max_score": 1,
    "hits": [
      {
        "_index": "my_person",
        "_type": "stu",
        "_id": "6",
        "_score": 1,
        "_source": {
          "name": "deak",
          "hobby": "swimming"
        }
      },
      {
        "_index": "my_person",
        "_type": "stu",
        "_id": "3",
        "_score": 1,
        "_source": {
          "name": "sum",
          "hobby": [
            "swimming",
            null
          ]
        }
      }
    ]
  }
}

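8. wildcard query

A wildcard query matches terms against a pattern in which * stands for any sequence of characters (including none) and ? stands for any single character. The following query finds documents whose "hobby" matches "*ing".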

GET my_person/_search
{
  "query": {
    "wildcard": {
      "hobby": {
        "value": "*ing"
      }
    }
  }
}

Query result

{
  "took": 27,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 3,
    "max_score": 1,
    "hits": [
      {
        "_index": "my_person",
        "_type": "stu",
        "_id": "6",
        "_score": 1,
        "_source": {
          "name": "deak",
          "hobby": "swimming"
        }
      },
      {
        "_index": "my_person",
        "_type": "stu",
        "_id": "1",
        "_score": 1,
        "_source": {
          "name": "sean",
          "hobby": "running"
        }
      },
      {
        "_index": "my_person",
        "_type": "stu",
        "_id": "3",
        "_score": 1,
        "_source": {
          "name": "sum",
          "hobby": [
            "swimming",
            null
          ]
        }
      }
    ]
  }
}
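As a rough model of how a wildcard pattern is applied to a whole term, the Python sketch below translates the pattern into a regular expression. It covers only the * and ? operators, the helper names `wildcard_to_regex` and `wildcard_match` are hypothetical, and Elasticsearch itself does not evaluate wildcards this way internally.

```python
import re


def wildcard_to_regex(pattern):
    """Translate a wildcard pattern into a compiled regular expression."""
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append(".*")  # any sequence of characters, including none
        elif ch == "?":
            parts.append(".")   # exactly one character
        else:
            parts.append(re.escape(ch))  # everything else is literal
    return re.compile("".join(parts), re.DOTALL)


def wildcard_match(pattern, term):
    """Return True when the pattern matches the entire term."""
    return wildcard_to_regex(pattern).fullmatch(term) is not None


hobbies = ["swimming", "running", "singer", ""]
print([h for h in hobbies if wildcard_match("*ing", h)])  # ['swimming', 'running']
print(wildcard_match("sw?mming", "swimming"))             # True
print(wildcard_match("sw*", "swimming"))                  # True
```

The last call shows that a prefix query for "sw" behaves like the wildcard pattern "sw*". Patterns that begin with * tend to be the expensive ones, since no leading characters narrow down the terms to inspect.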

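9. regexp query

The performance of a regexp query depends heavily on the chosen regular expression. Expressions such as .* that match anything are very slow, as are lookaround expressions. Where possible, use a long prefix before the regular expression starts. Wildcard matchers such as .*?+ will mostly lower performance. Most regular expression engines let you match any part of a string, so a pattern that should start at the beginning or finish at the end of the string has to be anchored explicitly with ^ or $. Lucene's patterns, by contrast, are always anchored: the pattern must match the entire term.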

Metacharacter	Meaning	Description	Example
`.`	Match any character	The period "." represents any single character.	`ab.` matches "abc" and "ab1"
`+`	One-or-more	The plus sign "+" repeats the preceding shortest pattern one or more times.	`a+b+` matches "aaabbb"
`*`	Zero-or-more	The asterisk "*" matches the preceding shortest pattern zero or more times.	`a*b*` matches "aaabbb"
`?`	Zero-or-one	The question mark "?" makes the preceding shortest pattern optional. It matches zero or one times.	`aaa?bbbb?` matches "aaabbb"
`{m}`, `{m,n}`	Min-to-max	Curly brackets "{}" specify a minimum and (optionally) a maximum number of times the preceding shortest pattern can repeat.	`a{3}b{3}` and `a{2,4}b{2,4}` match "aaabbb"
`()`	Grouping	Parentheses "()" form sub-patterns.	`(ab)+` matches "ababab"
`|`	Alternation	The pipe symbol "|" acts as an OR operator.	`aabb|bbaa` matches "aabb"
`[]`	Character classes	Ranges of potential characters may be represented as character classes by enclosing them in square brackets "[]". A leading ^ negates the character class.	`[abc]` matches "a", "b", or "c"
`~`	Complement	The shortest pattern that follows a tilde "~" is negated. For example, `ab~cd` means: starts with "a", followed by "b", followed by a string of any length that is not "c", and ends with "d".	`ab~df` and `a~(cb)def` match "abcdef"; `ab~cdef` and `a~(bc)def` do not
`<>`	Interval	The interval option enables numeric ranges, enclosed in angle brackets "<>".	`foo<1-100>` matches "foo80"
`&`	Intersection	The ampersand "&" joins two patterns so that both of them have to match.	`aaa.+&.+bbb` matches "aaabbb"
`@`	Any string	The at sign "@" matches any string in its entirety.	`@&~(foo.+)` matches anything except strings beginning with "foo"
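Lucene regular expressions are always anchored, so a pattern has to match the whole term rather than part of it. The Python sketch below imitates that behaviour with `re.fullmatch` and contrasts it with the unanchored `re.search`. Python's `re` module is not Lucene's regexp engine (operators such as `~`, `<>`, `&`, and `@` from the table have no direct equivalent here), so only the shared operators are used, and the helper names are hypothetical.

```python
import re


def lucene_style_match(pattern, term):
    """Anchored matching: the pattern must cover the entire term."""
    return re.fullmatch(pattern, term) is not None


def unanchored_match(pattern, term):
    """Unanchored matching: the pattern may match any part of the term."""
    return re.search(pattern, term) is not None


terms = ["swimming", "running", "sw"]

# "sw.+" needs "sw" followed by at least one more character.
print([t for t in terms if lucene_style_match("sw.+", t)])  # ['swimming']

# A bare substring matches only when unanchored.
print(unanchored_match("imm", "swimming"))        # True
print(lucene_style_match("imm", "swimming"))      # False

# To match a substring under anchored semantics, pad the pattern explicitly.
print(lucene_style_match(".*imm.*", "swimming"))  # True
```

The padded form in the last call works, but it is exactly the kind of leading `.*` that the performance advice in this section warns against.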

Find documents whose "hobby" field value matches the regular expression "sw.+"

GET my_person/_search
{
  "query": {
    "regexp":{
      "hobby":"sw.+"
    }
  }
}

Query result

{
  "took": 5,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 2,
    "max_score": 1,
    "hits": [
      {
        "_index": "my_person",
        "_type": "stu",
        "_id": "6",
        "_score": 1,
        "_source": {
          "name": "deak",
          "hobby": "swimming"
        }
      },
      {
        "_index": "my_person",
        "_type": "stu",
        "_id": "3",
        "_score": 1,
        "_source": {
          "name": "sum",
          "hobby": [
            "swimming",
            null
          ]
        }
      }
    ]
  }
}

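10. fuzzy query

A fuzzy query matches terms that are similar to the search term, where similarity is measured by edit distance. The following query searches "title" for terms close to "十大".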

GET telegraph/_search
{
  "query": {
    "fuzzy": {
      "title": "十大"
    }
  }
}

Query result

{
  "took": 4,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 0.99277425,
    "hits": [
      {
        "_index": "telegraph",
        "_type": "msg",
        "_id": "BJetp2QBW8hrYY3zGJk7",
        "_score": 0.99277425,
        "_source": {
          "title": "河北聚焦十大行业推进国际产能合作",
          "content": "河北省政府近日出台积极参与“一带一路”建设推进国际产能合作实施方案",
          "author": "财联社",
          "pubdate": "2018-07-17T14:14:55"
        }
      }
    ]
  }
}
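To illustrate what "similar" means here, the following Python sketch computes plain Levenshtein distance and applies the length-based thresholds of the AUTO fuzziness setting (terms of 0 to 2 characters must match exactly, 3 to 5 characters allow one edit, longer terms allow two). It is a simplified model: the helper names are hypothetical, transpositions are counted as two edits here, and options such as prefix_length and max_expansions are ignored.

```python
def levenshtein(a, b):
    """Return the number of single-character inserts, deletes, and substitutions turning a into b."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete a character from a
                current[j - 1] + 1,      # insert a character into a
                previous[j - 1] + cost,  # substitute, or keep if equal
            ))
        previous = current
    return previous[-1]


def auto_fuzziness(term):
    """Maximum allowed edits for a term under the AUTO rule described above."""
    length = len(term)
    if length <= 2:
        return 0
    if length <= 5:
        return 1
    return 2


def fuzzy_match(query_term, indexed_term):
    """Return True when the indexed term is within the allowed edit distance."""
    return levenshtein(query_term, indexed_term) <= auto_fuzziness(query_term)


print(levenshtein("swiming", "swimming"))  # 1
print(fuzzy_match("swiming", "swimming"))  # True
print(fuzzy_match("十大", "十大"))          # True
print(fuzzy_match("十大", "十一"))          # False
```

Under these thresholds a two-character term such as "十大" allows no edits at all, so for such short terms a fuzzy query behaves like an exact term lookup.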

11. ids query

Finds documents by a given list of document IDs.

GET my_person/_search
{
  "query": {
    "ids": {
      "values": ["1","3","5"]
    }
  }
}

Query result

{
  "took": 4,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 3,
    "max_score": 1,
    "hits": [
      {
        "_index": "my_person",
        "_type": "stu",
        "_id": "5",
        "_score": 1,
        "_source": {
          "name": "lucy"
        }
      },
      {
        "_index": "my_person",
        "_type": "stu",
        "_id": "1",
        "_score": 1,
        "_source": {
          "name": "sean",
          "hobby": "running"
        }
      },
      {
        "_index": "my_person",
        "_type": "stu",
        "_id": "3",
        "_score": 1,
        "_source": {
          "name": "sum",
          "hobby": [
            "swimming",
            null
          ]
        }
      }
    ]
  }
}

Reposted from: https://my.oschina.net/u/3100849/blog/1858871