A Simple Introduction to Elasticsearch: Exploring Your Data

Translated from: https://www.elastic.co/guide/en/elasticsearch/reference/5.6/_exploring_your_data.html

If you are interested, read the English documentation on the official site and learn from the primary source.

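Sample Dataset

Now that we have covered some of the basics, let's try a more realistic dataset. I have prepared a sample of fictitious JSON documents holding customer bank account information. Each document has the following schema: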

{
    "account_number": 0,
    "balance": 16623,
    "firstname": "Bradshaw",
    "lastname": "Mckenzie",
    "age": 29,
    "gender": "F",
    "address": "244 Columbus Place",
    "employer": "Euron",
    "email": "bradshawmckenzie@euron.com",
    "city": "Hobucken",
    "state": "CO"
}

For the curious, this data was generated with www.json-generator.com/, so please ignore the actual values and semantics of the data; they are all randomly generated.

Loading the Sample Dataset

You can download the sample dataset (accounts.json) from https://github.com/elastic/elasticsearch/blob/master/docs/src/test/resources/accounts.json?raw=true. Extract it to the current directory and load it into the cluster as follows:

curl -H "Content-Type: application/json" -XPOST "192.168.101.118:9200/bank/account/_bulk?pretty&refresh" --data-binary "@accounts.json"
curl "192.168.101.118:9200/_cat/indices?v"

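My local cluster runs on three virtual machines. The official documentation uses localhost, but on my machines a request to localhost is refused (most likely because the node is bound to its network IP rather than the loopback address), while the node's IP address works: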

[es@zzf root]$ curl "localhost:9200/_cat/indices?v"
curl: (7) Failed connect to localhost:9200; Connection refused

[es@zzf root]$ curl "192.168.101.119:9200/_cat/indices?v"
health status index    uuid                   pri rep docs.count docs.deleted store.size pri.store.size
green  open   .kibana  72rFZhHGQFWc6u5FW993HA   1   1          2            0     13.6kb          6.8kb
green  open   bank     nMgf9iznQx-R1Qh_bwSQiQ   5   1       1000            0      1.2mb          648kb
green  open   customer 7fTSnqrGSaWAuCfiJt3epQ   5   1          2            0     14.8kb          7.4kb

This means we have just successfully bulk-indexed 1000 documents into the bank index (under the account type).
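To make the bulk format more concrete, here is a minimal Python sketch of the body that the _bulk endpoint expects: one action line followed by one source line per document, each terminated by a newline. The helper name to_bulk_body and the choice of account_number as the _id are assumptions made for illustration; the downloaded accounts.json should already be in this newline-delimited shape, so no conversion is needed for the curl command above.

```python
import json

def to_bulk_body(docs, id_field="account_number"):
    """Build a newline-delimited _bulk request body from a list of dicts.

    Hypothetical helper for illustration only.
    """
    lines = []
    for doc in docs:
        # Action line: index this document under the given _id.
        lines.append(json.dumps({"index": {"_id": str(doc[id_field])}}))
        # Source line: the document itself, serialized on a single line.
        lines.append(json.dumps(doc))
    # The bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n"

sample = [
    {"account_number": 0, "balance": 16623, "firstname": "Bradshaw"},
    {"account_number": 1, "balance": 39225, "firstname": "Amber"},
]
body = to_bulk_body(sample)
print(body, end="")
```

Each document contributes exactly two lines, which is why the payload is sent with --data-binary: that option preserves the newlines.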

1. The Search API

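Now let's start with some simple searches. There are two basic ways to run a search: one is to send the search parameters through the REST request URI, and the other is to send them through the REST request body. The request body method is more expressive and lets you define the search in a more readable JSON format.

We will try one example of the request URI method, but for the remainder of this tutorial we will use the request body method exclusively.

The REST API for search is accessible from the _search endpoint. This example returns all documents in the bank index: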

GET /bank/_search?q=*&sort=account_number:asc&pretty

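Let's first dissect the search call. We are searching (the _search endpoint) in the bank index, and the q=* parameter instructs Elasticsearch to match all documents in the index. The sort=account_number:asc parameter indicates that the results should be sorted in ascending order by the account_number field of each document.

As before, the pretty parameter simply tells Elasticsearch to return pretty-printed JSON results.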
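As a rough sketch of how the same URI search could be assembled from a script, the snippet below builds the query string with Python's standard library. The function name build_search_url and the host localhost:9200 are placeholders chosen for illustration; substitute the address of one of your own nodes.

```python
from urllib.parse import urlencode

def build_search_url(host, index, **params):
    """Assemble a request-URI search URL (hypothetical helper)."""
    # safe="*:" keeps the wildcard and the field:order separator readable
    # instead of percent-encoding them.
    query = urlencode(params, safe="*:")
    return f"http://{host}/{index}/_search?{query}"

url = build_search_url(
    "localhost:9200", "bank",
    q="*", sort="account_number:asc", pretty="true",
)
print(url)
# http://localhost:9200/bank/_search?q=*&sort=account_number:asc&pretty=true
```

The resulting URL can be passed to curl or opened with urllib.request.urlopen once a cluster is reachable.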

The response to this search (partially shown):

{
  "took" : 63,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 1000,
    "max_score" : null,
    "hits" : [ {
      "_index" : "bank",
      "_type" : "account",
      "_id" : "0",
      "sort": [0],
      "_score" : null,
      "_source" : {"account_number":0,"balance":16623,"firstname":"Bradshaw","lastname":"Mckenzie","age":29,"gender":"F","address":"244 Columbus Place","employer":"Euron","email":"bradshawmckenzie@euron.com","city":"Hobucken","state":"CO"}
    }, {
      "_index" : "bank",
      "_type" : "account",
      "_id" : "1",
      "sort": [1],
      "_score" : null,
      "_source" : {"account_number":1,"balance":39225,"firstname":"Amber","lastname":"Duke","age":32,"gender":"M","address":"880 Holmes Lane","employer":"Pyrami","email":"amberduke@pyrami.com","city":"Brogan","state":"IL"}
    }, ...
    ]
  }
}

As for the response, we see the following parts:

took - time in milliseconds for Elasticsearch to execute the search
timed_out - tells us whether the search timed out
_shards - tells us how many shards were searched, as well as a count of the successful/failed searched shards
hits - the search results
hits.total - total number of documents matching our search criteria
hits.hits - the actual array of search results (defaults to the first 10 documents)
hits.sort - the sort key for each result (missing if sorting by score)
hits._score and max_score - ignore these fields for now
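The fields listed above can be read out of a parsed response as in the following sketch. The JSON text is a trimmed copy of the response shown earlier (only two _source fields are kept), and summarize is a hypothetical helper name. It assumes the 5.6 response shape, where hits.total is a plain integer; newer Elasticsearch versions report it as an object instead.

```python
import json

# Trimmed copy of the response shown earlier in this post.
response_text = """
{
  "took": 63,
  "timed_out": false,
  "_shards": {"total": 5, "successful": 5, "skipped": 0, "failed": 0},
  "hits": {
    "total": 1000,
    "max_score": null,
    "hits": [
      {"_index": "bank", "_type": "account", "_id": "0", "sort": [0],
       "_score": null, "_source": {"account_number": 0, "balance": 16623}},
      {"_index": "bank", "_type": "account", "_id": "1", "sort": [1],
       "_score": null, "_source": {"account_number": 1, "balance": 39225}}
    ]
  }
}
"""

def summarize(response):
    """Pick out the response fields discussed above (hypothetical helper)."""
    hits = response["hits"]
    return {
        "took_ms": response["took"],
        "timed_out": response["timed_out"],
        "failed_shards": response["_shards"]["failed"],
        "total": hits["total"],                      # all matching documents
        "ids": [h["_id"] for h in hits["hits"]],     # only the returned page
        "sort_keys": [h["sort"][0] for h in hits["hits"]],
    }

summary = summarize(json.loads(response_text))
print(summary)
```

Note the difference between total (every matching document) and the length of hits.hits (only the documents actually returned).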

Here is the exact same search as above, using the alternative request body method:

GET /bank/_search
{
  "query": { "match_all": {} },
  "sort": [
    { "account_number": "asc" }
  ]
}

The difference here is that instead of passing q=* in the URI, we provide a JSON-style query request body to the _search API.
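Outside of the Kibana console, a request body search might be prepared as in this sketch, which uses only Python's standard library. The helper build_search_request and the host localhost:9200 are illustrative assumptions. The sketch uses POST, which the _search endpoint accepts as well as GET, because some HTTP clients do not send a body with GET.

```python
import json
from urllib.request import Request

def build_search_request(host, index, body):
    """Prepare (but do not send) a request-body search (hypothetical helper)."""
    data = json.dumps(body).encode("utf-8")
    return Request(
        f"http://{host}/{index}/_search",
        data=data,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

body = {
    "query": {"match_all": {}},
    "sort": [{"account_number": "asc"}],
}
req = build_search_request("localhost:9200", "bank", body)
print(req.get_method(), req.full_url)

# To actually send it (requires a reachable cluster):
# from urllib.request import urlopen
# with urlopen(req) as resp:
#     print(resp.read().decode("utf-8"))
```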

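It is important to understand that once you get your search results back, Elasticsearch is completely done with the request and does not maintain any kind of server-side resources or open cursors into your results. This is in stark contrast to many other platforms, such as SQL databases, where you may initially get a partial subset of your query results up front and then have to keep going back to the server to fetch (or page through) the rest of the results using some kind of stateful server-side cursor.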

2. Introducing the Query Language

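Elasticsearch provides a JSON-style domain-specific language that you can use to execute queries. This is referred to as the Query DSL.

The query language is quite comprehensive and can be intimidating at first glance, but the best way to actually learn it is to start with a few basic examples.

Going back to our last example, we executed this query: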

GET /bank/_search
{
  "query": { "match_all": {} }
}

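Dissecting the above, the query part tells us what the query definition is, and the match_all part is simply the type of query that we want to run. The match_all query is simply a search for all documents in the specified index.

In addition to the query parameter, we can also pass other parameters to influence the search results. In the previous section we passed in sort; here we pass in size: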

GET /bank/_search
{
  "query": { "match_all": {} },
  "size": 1
}

Note that if size is not specified, it defaults to 10.

This example does a match_all and returns documents 10 through 19 (counting from 0):

GET /bank/_search
{
  "query": { "match_all": {} },
  "from": 10,
  "size": 10
}

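The from parameter (0-based) specifies which document index to start from, and the size parameter specifies how many documents to return starting at the from parameter. This feature is useful when implementing paging of search results. Note that if from is not specified, it defaults to 0.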
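Since each page is just a fresh request with a different from value, the arithmetic for paging can be sketched as follows. The helper page_body and its 1-based page numbering are assumptions made for illustration, not part of the Elasticsearch API.

```python
def page_body(page, page_size=10):
    """Build a match_all search body for a 1-based page number.

    Hypothetical helper: Elasticsearch itself only knows from and size.
    """
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be at least 1")
    return {
        "query": {"match_all": {}},
        "from": (page - 1) * page_size,  # 0-based offset of the first hit
        "size": page_size,               # number of hits to return
    }

print(page_body(1))  # from 0: the first ten documents
print(page_body(2))  # from 10: the same body as the example above
```

Page 2 with the default page size yields from 10 and size 10, which matches the request shown above.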

This example does a match_all, sorts the results by account balance in descending order, and returns the top 10 (default size) documents.

GET /bank/_search
{
  "query": { "match_all": {} },
  "sort": { "balance": { "order": "desc" } }
}
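To illustrate what sort, from, and size mean together, the following sketch emulates them on a plain Python list. This is only a local model of the semantics, not how Elasticsearch executes a search; emulate_search is a hypothetical name, and the two documents are the ones shown in the response earlier.

```python
def emulate_search(docs, sort_field, order="asc", from_=0, size=10):
    """Local model of match_all + sort + from/size (illustration only)."""
    ordered = sorted(docs, key=lambda d: d[sort_field], reverse=(order == "desc"))
    # from_ skips that many hits; size caps how many are returned.
    return ordered[from_:from_ + size]

docs = [
    {"account_number": 0, "balance": 16623},
    {"account_number": 1, "balance": 39225},
]
top = emulate_search(docs, "balance", order="desc")
print([d["account_number"] for d in top])  # [1, 0]
```

With a descending sort on balance, account 1 (39225) comes before account 0 (16623).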