Elasticsearch 2.2.0 Document APIs: Getting a Document

weixin_33924220

Original article: https://my.oschina.net/secisland/blog/614467

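The Elasticsearch get API lets you retrieve a specific document by its ID. For example, the following request fetches the document with ID 1 and type log from the secilog index:

Request: GET http://localhost:9200/secilog/log/1?pretty

Response: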

{
  "_index" : "secilog",
  "_type" : "log",
  "_id" : "1",
  "_version" : 2,
  "found" : true,
  "_source" : {
    "collect_type" : "syslog",
    "collect_date" : "2016-01-11T09:32:12",
    "message" : "Failed password for root from 192.168.21.2 port 50790 ssh2"
  }
}

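The response carries useful metadata (index, type, ID, version, and whether the document was found) together with the original document, which is returned in the _source field.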
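
As a rough sketch of how a client might issue this request, the Python snippet below uses only the standard library. The helper names get_url and get_document and the default base address are illustrative assumptions, not part of Elasticsearch; the URL layout and the found and _source fields come from the example above.

```python
import json
import urllib.error
import urllib.request
from urllib.parse import quote, urlencode


def get_url(index, doc_type, doc_id, base="http://localhost:9200", **params):
    """Build the get-by-ID URL /{index}/{type}/{id}, plus optional query parameters."""
    path = "/".join(quote(str(part), safe="") for part in (index, doc_type, doc_id))
    query = urlencode(params)
    return f"{base}/{path}" + (f"?{query}" if query else "")


def get_document(index, doc_type, doc_id, **params):
    """Return the document's _source, or None when the response is not found=true."""
    try:
        with urllib.request.urlopen(get_url(index, doc_type, doc_id, **params)) as resp:
            body = json.load(resp)
    except urllib.error.HTTPError as err:
        if err.code != 404:
            raise
        # Elasticsearch answers a missing document with HTTP 404 and a JSON body.
        body = json.load(err)
    return body.get("_source") if body.get("found") else None
```

With a node listening on localhost:9200, get_document("secilog", "log", 1) would return the dictionary stored under _source, and None for an ID that does not exist.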

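By default the get API is realtime and is not affected by the index refresh rate. To disable realtime reads, set the realtime parameter to false on the request, or set action.get.realtime to false globally.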

Note: the _type segment of the path is optional. If you do not want to name a specific type, use _all in its place.
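
The two options above only change the request URL. A small illustration, assuming the same local node and document as the earlier examples (the variable names are made up for this sketch):

```python
from urllib.parse import urlencode

BASE = "http://localhost:9200"  # assumed local node, as in the examples above

# Per-request opt-out of realtime reads via the realtime parameter.
non_realtime_url = f"{BASE}/secilog/log/1?" + urlencode({"realtime": "false"})

# _all in place of the type segment: look the ID up without naming a type.
any_type_url = f"{BASE}/secilog/_all/1"

print(non_realtime_url)
print(any_type_url)
```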

By default a get returns the _source field. You can turn this off by passing _source=false. For example:

Request: GET http://localhost:9200/secilog/log/1?_source=false&pretty

Response:

{
  "_index" : "secilog",
  "_type" : "log",
  "_id" : "1",
  "_version" : 2,
  "found" : true
}

If you only need part of _source, use _source_include or _source_exclude to include or filter out specific fields. For example:

Request: GET http://localhost:9200/secilog/log/1?_source_include=message&pretty

Response:

{
  "_index" : "secilog",
  "_type" : "log",
  "_id" : "1",
  "_version" : 2,
  "found" : true,
  "_source" : {
    "message" : "Failed password for root from 192.168.21.2 port 50790 ssh2"
  }
}

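When a document is large, including or excluding fields can noticeably reduce network overhead. Multiple fields can be separated by commas, and the * wildcard is supported. For example:

Request: GET http://localhost:9200/secilog/log/1?_source_include=message,collect_date&pretty

Response: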

{
  "_index" : "secilog",
  "_type" : "log",
  "_id" : "1",
  "_version" : 2,
  "found" : true,
  "_source" : {
    "message" : "Failed password for root from 192.168.21.2 port 50790 ssh2",
    "collect_date" : "2016-01-11T09:32:12"
  }
}
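
To make the comma-separated form concrete, here is a minimal sketch that builds the query string from Python lists. The function name source_filter_query is hypothetical; only the _source_include and _source_exclude parameter names come from the text above.

```python
from urllib.parse import urlencode


def source_filter_query(include=(), exclude=()):
    """Build the query string for _source_include / _source_exclude."""
    params = {}
    if include:
        params["_source_include"] = ",".join(include)
    if exclude:
        params["_source_exclude"] = ",".join(exclude)
    # safe=",*" keeps commas and wildcards readable instead of percent-encoding them.
    return urlencode(params, safe=",*")


print(source_filter_query(include=["message", "collect_date"]))
```

Appending the result to http://localhost:9200/secilog/log/1? reproduces the request shown above (without the pretty flag). A wildcard pattern such as "collect_*" can be passed in either list.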

The fields parameter retrieves a set of stored fields. For example:

Request: GET http://localhost:9200/secilog/log/1?fields=message,collect_date&pretty

Response:

{
  "_index" : "secilog",
  "_type" : "log",
  "_id" : "1",
  "_version" : 2,
  "found" : true,
  "fields" : {
    "message" : [ "Failed password for root from 192.168.21.2 port 50790 ssh2" ],
    "collect_date" : [ "2016-01-11T09:32:12" ]
  }
}

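Note: as the response shows, field values fetched this way are returned as arrays, whereas the _routing and _parent metadata fields are never returned as arrays. Only leaf fields can be retrieved through fields; object fields cannot.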
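
Because every value under fields arrives wrapped in an array, client code often unwraps single-element arrays before use. The sketch below does this for the response shown above; unwrap_fields is a hypothetical helper, not an Elasticsearch or client-library function.

```python
import json

# The response from the fields example above, with pretty-printing condensed.
response = json.loads("""
{
  "_index": "secilog", "_type": "log", "_id": "1", "_version": 2, "found": true,
  "fields": {
    "message": ["Failed password for root from 192.168.21.2 port 50790 ssh2"],
    "collect_date": ["2016-01-11T09:32:12"]
  }
}
""")


def unwrap_fields(resp):
    """Collapse single-element arrays under "fields" into plain values."""
    return {
        name: values[0] if isinstance(values, list) and len(values) == 1 else values
        for name, values in resp.get("fields", {}).items()
    }


flat = unwrap_fields(response)
print(flat["collect_date"])  # 2016-01-11T09:32:12
```

Multi-valued fields are left as lists, so callers can still tell one value from several.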

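If a document has been indexed but not yet refreshed, the get API reads it from the transaction log. Some fields, however, are only generated at index time, and requesting them in this situation throws an exception. Set ignore_errors_on_generated_fields=true to ignore such fields.

Getting only the document source

Use the /{index}/{type}/{id}/_source endpoint to get only the content of the document. For example:

Request: GET http://localhost:9200/secilog/log/1/_source?pretty

Response: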

{
  "collect_type" : "syslog",
  "collect_date" : "2016-01-11T09:32:12",
  "message" : "Failed password for root from 192.168.21.2 port 50790 ssh2"
}

Likewise, requests to this endpoint can use the source filtering parameters described earlier to select specific fields.
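
A brief sketch of combining the _source endpoint with source filtering; source_only_url and its base default are illustrative assumptions, while the path layout and the _source_include parameter come from the text above.

```python
from urllib.parse import urlencode


def source_only_url(index, doc_type, doc_id, base="http://localhost:9200", **params):
    """Build /{index}/{type}/{id}/_source, optionally with source-filtering parameters."""
    url = f"{base}/{index}/{doc_type}/{doc_id}/_source"
    # Keep commas and wildcards unescaped so field lists stay readable.
    return url + ("?" + urlencode(params, safe=",*") if params else "")


print(source_only_url("secilog", "log", 1, _source_include="message,collect_date"))
```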

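Shard selection (routing)

You can specify a routing value when getting a document. If the routing value does not match the one used at index time, the document is not returned. This example assumes the document was previously indexed with routing=secisland:

Request: GET http://localhost:9200/secilog/log/1?routing=secisland&pretty

Response: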

{
  "_index" : "secilog",
  "_type" : "log",
  "_id" : "1",
  "_version" : 1,
  "_routing" : "secisland",
  "found" : true,
  "_source" : {
    "collect_type" : "syslog",
    "collect_date" : "2016-01-11T09:32:12",
    "message" : "Failed password for root from 192.168.21.2 port 50790 ssh2"
  }
}

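The preference parameter controls whether the get is executed on the primary shard or on a replica:

_primary: execute the operation only on the primary shard;

_local: execute the operation on a locally allocated shard when possible.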
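
Routing and shard preference are both plain query parameters; in the Elasticsearch 2.x get API the shard choice is passed through a parameter named preference. A minimal sketch, where routed_get_url and its defaults are hypothetical names introduced for illustration:

```python
from urllib.parse import urlencode


def routed_get_url(index, doc_type, doc_id, routing=None, preference=None,
                   base="http://localhost:9200"):
    """Build a get URL with optional routing and preference parameters."""
    params = {}
    if routing is not None:
        params["routing"] = routing          # must match the value used at index time
    if preference is not None:
        params["preference"] = preference    # e.g. "_primary" or "_local"
    query = urlencode(params)
    return f"{base}/{index}/{doc_type}/{doc_id}" + (f"?{query}" if query else "")


print(routed_get_url("secilog", "log", 1, routing="secisland", preference="_primary"))
```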

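Refresh parameter:

Setting the refresh parameter to true refreshes the relevant shard before the get operation, so that the document is immediately visible to search. This consumes system resources, so leave it unset unless you really need it.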

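Secisland (赛克蓝德) will continue to analyze the features of the latest Elasticsearch releases in upcoming posts, so stay tuned. You are also welcome to follow the secisland official WeChat account.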
