Common Elasticsearch Queries

Latest recommended article published on 2024-09-15 01:19:00

chunjiangbi4176

Tags: big data, json, database

Original link: https://my.oschina.net/tianyuliang/blog/2209002

Common queries

Elasticsearch (ES) offers two ways to search: the request-parameter style and the request-body style.

Request-parameter style:
curl 'localhost:9200/bank/_search?q=*&pretty'    
Here bank is the index name, and q is followed by the search condition: q=* matches every document.

Request-body style (recommended):
curl -XPOST 'localhost:9200/bank/_search?pretty' -H 'Content-Type: application/json' -d '
{
  "query": { "match_all": {} }
}'
This style puts the query in the request body. That adds a little overhead, but the query is easier to read.

The JSON format used in the request body is called the Query DSL (domain-specific language). All of the examples below use it.
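
For readers who call ES from code rather than curl, the two styles can be sketched in Python with only the standard library. This is a minimal sketch that builds the requests without sending them; the host localhost:9200 and the index bank come from the curl examples, while the helper names uri_search and body_search are made up for illustration.

```python
import json
from urllib.parse import urlencode

BASE_URL = "http://localhost:9200"  # assumed local node, as in the curl examples


def uri_search(index, q):
    """Request-parameter style: the condition travels in the URL."""
    return f"{BASE_URL}/{index}/_search?" + urlencode({"q": q})


def body_search(index, query):
    """Request-body style: the condition travels as a JSON document."""
    url = f"{BASE_URL}/{index}/_search"
    body = json.dumps({"query": query})
    return url, body


url_only = uri_search("bank", "*")
url, body = body_search("bank", {"match_all": {}})
print(url_only)
print(url, body)
```

The body form keeps the URL stable and moves all of the query logic into one JSON document, which is why it scales better to the compound queries shown later.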

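This article only covers a few common, simple queries. For more, see the official documentation. PS: the Chinese edition of the official guide targets version 2.x, but it is still a useful reference and saves you the headache of reading it in English.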

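Query all records

Send a GET request to /Index/Type/_search (the Type segment is optional) to match every record. By default only the first 10 hits are returned.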

GET user/admin/_search
Response:
{
  "took": 6,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 1,
    "hits": [
      {
        "_index": "user",
        "_type": "admin",
        "_id": "1",
        "_score": 1,
        "_source": {
          "user": "李斯特",
          "title": "工程师",
          "desc": "数据库管理"
        }
      }
    ]
  }
}

The fields in the response have the following meanings:

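took: the time the operation took, in milliseconds.
timed_out: whether the search timed out.
hits: the matched records. Its sub-fields are:
- total: the total number of matching records, 1 in this example.
- max_score: the highest relevance score, 1.0 in this example.
- hits: an array of the returned records. Each record has a _score field holding its relevance score, and records are sorted by it in descending order by default.
- _source: the original document data of each record.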
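
To make the layout concrete, here is a small Python sketch that reads these fields out of a trimmed copy of the sample response. It assumes the response shape shown in this article, where hits.total is a plain number; in ES 7 and later it is an object instead, so the code would need adjusting there.

```python
import json

# Trimmed copy of the sample response from this article.
response_text = """
{
  "took": 6,
  "timed_out": false,
  "hits": {
    "total": 1,
    "max_score": 1,
    "hits": [
      {"_index": "user", "_type": "admin", "_id": "1", "_score": 1,
       "_source": {"user": "李斯特", "title": "工程师", "desc": "数据库管理"}}
    ]
  }
}
"""

response = json.loads(response_text)
total = response["hits"]["total"]
# Each element of hits.hits is one record; _source holds the stored document.
documents = [hit["_source"] for hit in response["hits"]["hits"]]

print(response["took"], response["timed_out"], total)
print(documents[0]["user"])
```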

By the way, the request above is equivalent to:

GET user/admin/_search
{
  "query": {
    "match_all": {}
  }
}

Pagination

By default ES returns 10 results per request. The size parameter changes this.

GET /my_index/my_type/_search
{
  "size": 20,
  "query": { "match_all": {} }
}

The from parameter then sets the offset of the first result to return (default 0).

GET /my_index/my_type/_search
{
  "size": 2,
  "from": 2,
  "query": { "match_all": {} }
}

This works the same way as LIMIT offset, size in MySQL.
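
Applications usually think in page numbers rather than offsets. The conversion can be sketched as follows; page_body is a hypothetical helper name, and the 1-based page numbering is an assumption of this sketch, not something ES defines.

```python
def page_body(page, size=10):
    """Build a search body for a 1-based page number."""
    if page < 1 or size < 1:
        raise ValueError("page and size must be positive")
    return {
        "from": (page - 1) * size,  # number of hits to skip
        "size": size,               # number of hits to return
        "query": {"match_all": {}},
    }


second_page = page_body(page=2, size=2)
print(second_page)
```

With size 2, page 2 skips the first two hits, which matches the from/size pair used in the request above.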

Full-text search

Before starting, we need some sample data. PS: this is copied from the official site.

DELETE /my_index 

PUT /my_index
{ "settings": { "number_of_shards": 1 }} 

POST /my_index/my_type/_bulk
{ "index": { "_id": 1 }}
{ "title": "The quick brown fox" }
{ "index": { "_id": 2 }}
{ "title": "The quick brown fox jumps over the lazy dog" }
{ "index": { "_id": 3 }}
{ "title": "The quick brown fox jumps over the quick dog" }
{ "index": { "_id": 4 }}
{ "title": "Brown fox brown dog" }

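match (loose full-text match)

The match query is mainly used for full-text search. It behaves somewhat like a fuzzy lookup, because the query text is analyzed before it is matched.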

GET /my_index/my_type/_search
{
  "query": {
    "match": {
      "title": "QUICK!"
    }
  }
}

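In this example we search for all records whose title field contains the string QUICK!. Because the query text is analyzed first, it is reduced to the term quick, so documents 1, 2 and 3 match.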
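
The effect of analysis can be imitated with a toy analyzer in Python. This is only a rough model, assuming the default analyzer lowercases the text and drops punctuation; it is not how ES is implemented, and it ignores scoring entirely. The names analyze and match are local to this sketch.

```python
import re

titles = {
    1: "The quick brown fox",
    2: "The quick brown fox jumps over the lazy dog",
    3: "The quick brown fox jumps over the quick dog",
    4: "Brown fox brown dog",
}


def analyze(text):
    """Toy analyzer: lowercase, then keep runs of letters and digits."""
    return re.findall(r"[a-z0-9]+", text.lower())


def match(query):
    """Return ids of titles that share at least one term with the query."""
    terms = set(analyze(query))
    return sorted(i for i, t in titles.items() if terms & set(analyze(t)))


print(analyze("QUICK!"))
print(match("QUICK!"))
```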

match_phrase (phrase match)

A match query analyzes the query text before searching.
Suppose we want all records whose title field contains the string quick dog. match first analyzes quick dog into:

quick
dog

It then matches on those terms, so every document is returned here, because each one contains quick or dog.

GET /my_index/my_type/_search
{
  "query": {
    "match": {
      "title": "quick dog"
    }
  }
}

The hits in the response:

"hits": [
      {
        "_index": "my_index",
        "_type": "my_type",
        "_id": "3",
        "_score": 0.71083367,
        "_source": {
          "title": "The quick brown fox jumps over the quick dog"
        }
      },
      {
        "_index": "my_index",
        "_type": "my_type",
        "_id": "2",
        "_score": 0.5774314,
        "_source": {
          "title": "The quick brown fox jumps over the lazy dog"
        }
      },
      {
        "_index": "my_index",
        "_type": "my_type",
        "_id": "1",
        "_score": 0.42327404,
        "_source": {
          "title": "The quick brown fox"
        }
      },
      {
        "_index": "my_index",
        "_type": "my_type",
        "_id": "4",
        "_score": 0.42327404,
        "_source": {
          "title": "Brown fox brown dog"
        }
      }
    ]

Clearly this is not what we want, because we only want records that contain the phrase quick dog. Now let's look at match_phrase:

GET /my_index/my_type/_search
{
  "query": {
    "match_phrase": {
      "title": "quick dog"
    }
  }
}
The hits in the response:
"hits": [
      {
        "_index": "my_index",
        "_type": "my_type",
        "_id": "3",
        "_score": 0.5774314,
        "_source": {
          "title": "The quick brown fox jumps over the quick dog"
        }
      }
    ]

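That gives the result we wanted. Like the match query, match_phrase first analyzes the query string into a list of terms and then searches for them, but it keeps only the documents that contain all of the terms, in the same positions relative to each other as in the query. Pay particular attention to that last condition.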
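
The difference between the two queries comes down to term positions, which a few lines of Python can illustrate. This is a toy model over the four sample titles with a simplified analyzer (lowercase, split on punctuation and spaces); it is not the ES implementation and produces no scores.

```python
import re

titles = {
    1: "The quick brown fox",
    2: "The quick brown fox jumps over the lazy dog",
    3: "The quick brown fox jumps over the quick dog",
    4: "Brown fox brown dog",
}


def analyze(text):
    """Toy analyzer: lowercase, then keep runs of letters and digits."""
    return re.findall(r"[a-z0-9]+", text.lower())


def match(query):
    """Ids of titles containing at least one of the query terms."""
    terms = set(analyze(query))
    return sorted(i for i, t in titles.items() if terms & set(analyze(t)))


def match_phrase(query):
    """Ids of titles containing all query terms at consecutive positions."""
    terms = analyze(query)
    if not terms:
        return []
    n = len(terms)
    found = []
    for doc_id, title in titles.items():
        tokens = analyze(title)
        # Slide a window of n tokens over the title and compare it to the phrase.
        if any(tokens[i:i + n] == terms for i in range(len(tokens) - n + 1)):
            found.append(doc_id)
    return found


print(match("quick dog"))
print(match_phrase("quick dog"))
```

Only document 3 has dog immediately after quick, which is why it is the single phrase hit.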

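term (exact match)

The term query can handle numbers, Booleans, dates and text. It is similar to = in SQL. Note that when term is used on a text value, the field being queried must not be analyzed. The syntax is: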

GET /my_index/my_type/_search
{
  "query": {
    "term": {
      "title.keyword": {
        "value": "Brown fox brown dog"
      }
    }
  }
}

title.keyword can be thought of as a sub-field of title. ES supports indexing the same field as several different types.
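
Why the sub-field matters can be shown with a toy model in Python. It assumes title is an analyzed text field whose terms are lowercased words, and that title.keyword stores the original string untouched; the function names are made up for this sketch.

```python
import re

title = "Brown fox brown dog"

# Toy model of the two ways the same value is indexed.
analyzed_terms = set(re.findall(r"[a-z0-9]+", title.lower()))  # the text field
keyword_value = title                                          # the keyword sub-field


def term_on_text(value):
    """term against the analyzed field: compares with individual indexed terms."""
    return value in analyzed_terms


def term_on_keyword(value):
    """term against the keyword sub-field: compares with the whole original string."""
    return value == keyword_value


print(term_on_keyword("Brown fox brown dog"))
print(term_on_text("Brown fox brown dog"))
print(term_on_text("brown"))
```

In this model the full sentence never appears as a single indexed term of the analyzed field, so an exact lookup only succeeds against the keyword sub-field.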

bool (compound query)

It is similar to WHERE A = 'a' AND B = 'b' OR C = 'c' in SQL. A bool query is made up of three parts:

{
   "bool" : {
      "must" :     [],
      "should" :   [],
      "must_not" : []
   }
}

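must: every clause must match; equivalent to AND and =.
must_not: no clause may match; equivalent to NOT and !=.
should: at least one clause should match; equivalent to OR. When must clauses are also present, the should clauses are optional and only raise the score of the documents that match them.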

For example:

GET /my_index/my_type/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "title": "quick"
          }
        }
      ],
      "must_not": [
        {
          "match_phrase": {
            "title": "brown dog"
          }
        }
      ],
      "should": [
        {
          "match_phrase": {
            "title": "brown fox"
          }
        }
      ]
    }
  }
}
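
How the three clause lists combine can be sketched in Python against the four sample titles. This is a toy evaluator under simplifying assumptions: clauses are plain predicates over a token list, and the "score" is just the number of matching should clauses, which is not how ES scores documents.

```python
import re

titles = {
    1: "The quick brown fox",
    2: "The quick brown fox jumps over the lazy dog",
    3: "The quick brown fox jumps over the quick dog",
    4: "Brown fox brown dog",
}


def analyze(text):
    return re.findall(r"[a-z0-9]+", text.lower())


def has_term(term):
    return lambda tokens: term in tokens


def has_phrase(phrase):
    terms = analyze(phrase)
    n = len(terms)
    return lambda tokens: any(
        tokens[i:i + n] == terms for i in range(len(tokens) - n + 1)
    )


def bool_query(must=(), must_not=(), should=()):
    """Return (doc_id, matched_should_count) pairs for documents that pass."""
    results = []
    for doc_id, title in titles.items():
        tokens = analyze(title)
        if not all(clause(tokens) for clause in must):
            continue  # a must clause failed
        if any(clause(tokens) for clause in must_not):
            continue  # a must_not clause matched
        bonus = sum(1 for clause in should if clause(tokens))
        if not must and should and bonus == 0:
            continue  # without must, at least one should clause has to match
        results.append((doc_id, bonus))
    return results


results = bool_query(
    must=[has_term("quick")],
    must_not=[has_phrase("brown dog")],
    should=[has_phrase("brown fox")],
)
print(results)
```

Document 4 fails the must clause, and it would also be rejected by must_not; the remaining three documents all contain brown fox, so each gets the same bonus in this model.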

range (range query)

We can use it to find documents that fall within a range, for example products priced from $20 to $40 inclusive.

GET /index/type/_search
{
  "query": {
    "range": {
      "price": {
        "gte": 20,
        "lte": 40
      }
    }
  }
}

The range query supports both inclusive and exclusive bounds. The options below can be combined:

gt: > greater than
lt: < less than
gte: >= greater than or equal to
lte: <= less than or equal to
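
The four options map directly onto comparison operators. The following Python sketch applies a set of bounds to a list of made-up prices to show the inclusive and exclusive cases side by side; in_range and the sample prices are illustrative only.

```python
import operator

# Each range option corresponds to one comparison operator.
OPERATORS = {
    "gt": operator.gt,
    "gte": operator.ge,
    "lt": operator.lt,
    "lte": operator.le,
}


def in_range(value, bounds):
    """True if value satisfies every bound, e.g. {"gte": 20, "lte": 40}."""
    return all(OPERATORS[name](value, limit) for name, limit in bounds.items())


prices = [10, 20, 30, 40, 50]  # hypothetical sample data
inclusive = [p for p in prices if in_range(p, {"gte": 20, "lte": 40})]
exclusive = [p for p in prices if in_range(p, {"gt": 20, "lt": 40})]

print(inclusive)
print(exclusive)
```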

range also works on date ranges and string ranges. For date ranges in particular, it supports date math.
Like this:

"range" : {
    "timestamp" : {
        "gt" : "now-1h"
    }
}

OR

"range" : {
    "timestamp" : {
        "gt" : "2014-01-01",
        "lt" : "2014-01-01||+1y"
    }
}

Note: +1y adds one year to the date that precedes the || separator.
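
To show what an expression like 2014-01-01||+1y evaluates to, here is a deliberately small Python sketch. It handles only a YYYY-MM-DD anchor followed by one optional step in years (y) or days (d); the real date math syntax supports more units, rounding and now, none of which are modelled here. The function name resolve is made up.

```python
import re
from datetime import date, timedelta


def resolve(expression):
    """Resolve 'YYYY-MM-DD' optionally followed by '||' and one +N/-N step in y or d."""
    anchor, _, math = expression.partition("||")
    result = date.fromisoformat(anchor)
    if math:
        step = re.fullmatch(r"([+-])(\d+)([yd])", math)
        if step is None:
            raise ValueError(f"unsupported date math: {math!r}")
        amount = int(step.group(2)) * (1 if step.group(1) == "+" else -1)
        if step.group(3) == "y":
            # Naive year shift: raises ValueError for Feb 29 into a non-leap year.
            result = result.replace(year=result.year + amount)
        else:
            result += timedelta(days=amount)
    return result


upper_bound = resolve("2014-01-01||+1y")
print(upper_bound)
```

With that upper bound, the second range above covers timestamps after 2014-01-01 and before 2015-01-01.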

Reposted from: https://my.oschina.net/tianyuliang/blog/2209002