Elasticsearch 6.x Official Documentation Study Notes: Getting Started

Original article: https://blog.csdn.net/qq_33872191/article/details/82745464

Getting Started

These notes cover the following:

1) Basic concepts.

2) Basic index operations.

3) Queries, filters, and aggregations.

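Basic Concepts

1) Near real time: Elasticsearch is a near-real-time search platform. This means there is a slight delay (normally one second) between the time you index a document and the time it becomes searchable.

2) Cluster: a cluster is a collection of one or more nodes that together hold your data. A cluster is identified by a unique name, which defaults to "elasticsearch".

3) Node: a node is a single server that is part of a cluster. It stores your data and takes part in the cluster's indexing and search capabilities. A node joins a specific cluster by that cluster's name.

4) Index: an index is a collection of documents that have somewhat similar characteristics.

5) Type: a type is a logical category/partition of an index that was meant to let you store different kinds of documents in the same index, for example one type for users and another for blog posts. Note that an index created in 6.x can contain only a single type, and the concept is deprecated.

6) Document: a document is the basic unit of information that can be indexed. For example, you can have one document for a single customer, another for a single product, and yet another for a single order. A document is expressed in JSON (JavaScript Object Notation), a ubiquitous data interchange format on the web.

7) Shards & Replicas: an index is split into shards that are stored on different nodes, and each shard can have replica copies. The number of shards and the number of replicas can be specified when the index is created.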
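To make the shard and replica settings concrete, here is a minimal Python sketch that builds the settings body one could send when creating an index. The helper name `index_settings` is hypothetical. `number_of_shards` and `number_of_replicas` are the standard index settings, and the defaults of 5 and 1 are assumed from 6.x behavior (they match the pri/rep columns in the index listing later in these notes).

```python
import json

def index_settings(shards=5, replicas=1):
    """Build a create-index body with explicit shard and replica counts."""
    if shards < 1 or replicas < 0:
        raise ValueError("need at least 1 shard and a non-negative replica count")
    return {
        "settings": {
            "index": {
                "number_of_shards": shards,      # chosen at index creation
                "number_of_replicas": replicas,  # can be adjusted later
            }
        }
    }

body = index_settings(shards=3, replicas=2)
print(json.dumps(body, indent=2))

# Total shard copies in the cluster = primaries * (1 + replicas).
total_copies = 3 * (1 + 2)
print(total_copies)
```

The printed JSON could be passed as the request body of the create-index call (PUT on the index name) shown later in these notes.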

Installation

Skipped.

You can install elasticsearch-head to inspect ES through a web UI. Skipped.

Working with ES

1) Check the cluster health:

curl -XGET 'http://mini1:9200/_cat/health?v'

Result:

epoch      timestamp cluster       status node.total node.data shards pri relo init unassign pending_tasks max_task_wait_time active_shards_percent
1475247709 17:01:49  elasticsearch green           1         1      0   0    0    0        0             0                  -                100.0%

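The status is one of three values:

Green: everything is good (the cluster is fully functional).

Yellow: all data is available, but some replicas are not yet allocated (the cluster is fully functional).

Red: some data is unavailable for whatever reason (the cluster is partially functional).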
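As a rough illustration of reading this output programmatically, the sketch below parses the whitespace-separated `_cat/health?v` text into dictionaries and looks up what the status means. The names `parse_cat_table` and `STATUS_MEANING` are hypothetical helpers for this example, and the sample text is the output shown above.

```python
def parse_cat_table(text):
    """Parse whitespace-separated _cat output (with the ?v header) into dicts."""
    lines = [ln for ln in text.strip().splitlines() if ln.strip()]
    header = lines[0].split()
    return [dict(zip(header, ln.split())) for ln in lines[1:]]

# Short descriptions of the three health states (wording is ours).
STATUS_MEANING = {
    "green": "all primary and replica shards are allocated",
    "yellow": "all primaries are allocated, some replicas are not",
    "red": "at least one primary shard is unallocated",
}

sample = """\
epoch      timestamp cluster       status node.total node.data shards pri relo init unassign pending_tasks max_task_wait_time active_shards_percent
1475247709 17:01:49  elasticsearch green           1         1      0   0    0    0        0             0                  -                100.0%
"""

row = parse_cat_table(sample)[0]
print(row["cluster"], row["status"], "->", STATUS_MEANING[row["status"]])
```

This simple split works here because none of the columns contain spaces; it is not a general parser for every `_cat` endpoint.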

2) List all indices:

curl -XGET 'http://mini1:9200/_cat/indices?v'

Result:

health status index                pri rep docs.count docs.deleted store.size pri.store.size 
green  open   store                  5   1          3            0     23.4kb         11.7kb 
green  open   .kibana                1   1          2            1     25.2kb         12.6kb 
green  open   gorktest1-2018.09.16   5   1        405            0      908kb          454kb

3) Create an index:

curl -XPUT 'http://mini1:9200/store'

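4) Index (insert) a document:

http://localhost:9200/<index>/<type>/[<id>]
index and type are required.
id is optional; if it is omitted, ES generates one automatically.
index and type organize the data into layers, which makes it easier to manage.
As a rough analogy, an index is like a database, a type is like a table, and the id is like the primary key of a row in that table, so it is unique.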
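The URL layout above can be sketched as a small helper. This is illustrative only: `doc_url` and its parameters are hypothetical names, and the default host `mini1:9200` is simply the one used in these notes.

```python
from urllib.parse import quote

def doc_url(index, doc_type, doc_id=None, host="http://mini1:9200"):
    """Build the REST URL <host>/<index>/<type>[/<id>] for a document."""
    if not index or not doc_type:
        raise ValueError("index and type are both required")
    parts = [quote(index, safe=""), quote(doc_type, safe="")]
    if doc_id is not None:
        parts.append(quote(str(doc_id), safe=""))
    return host + "/" + "/".join(parts)

with_id = doc_url("store", "books", 1)   # explicit id: send with PUT
without_id = doc_url("store", "books")   # no id: send with POST, ES generates one
print(with_id)
print(without_id)
```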

# Add some books to the store index

curl -XPUT 'http://mini1:9200/store/books/1' -H 'Content-Type: application/json' -d '{
  "title": "Elasticsearch: The Definitive Guide",
  "name" : {
    "first" : "Zachary",
    "last" : "Tong"
  },
  "publish_date":"2015-02-06",
  "price":"49.99"
}'

5) Retrieve a document:

curl -XGET 'http://mini1:9200/store/books/1?pretty'

Result:

{
  "_index" : "store",
  "_type" : "books",
  "_id" : "1",
  "_version" : 1,
  "found" : true,
  "_source" : {
    "title" : "Elasticsearch: The Definitive Guide",
    "name" : {
      "first" : "Zachary",
      "last" : "Tong"
    },
    "publish_date" : "2015-02-06",
    "price" : "49.99"
  }
}

6) Delete (note that this request deletes the entire store index, not a single document):

curl -XDELETE 'http://mini1:9200/store?pretty'

7) Update:

curl -XPOST 'http://mini1:9200/store/books/1/_update?pretty' -H 'Content-Type: application/json' -d '{
  "doc": {
     "price" : 88.88
  }
}'

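8) Bulk operations:

In addition to indexing, updating, and deleting individual documents, Elasticsearch can perform any of these operations in batches through the _bulk API. This matters because it is a very efficient way to run many operations as quickly as possible with as few network round trips as possible.

# Add another book
curl -XPUT 'http://mini1:9200/store/books/2' -H 'Content-Type: application/json' -d '{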
  "title": "Elasticsearch Blueprints",
  "name" : {
    "first" : "Vineeth",
    "last" : "Mohan"
  },
  "publish_date":"2015-06-06",
  "price":"35.99"
}'

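Bulk indexing (the index action replaces each document as a whole, so after this request _source contains only price):

curl -XPOST 'http://mini1:9200/store/books/_bulk' -H 'Content-Type: application/x-ndjson' -d '
{"index":{"_id":"1"}}
{"price": 10.99 }
{"index":{"_id":"2"}}
{"price": "111.22" }
'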
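The bulk body is newline-delimited JSON: one action line followed by one source line per document, with a trailing newline at the end. A minimal sketch of generating such a body is shown below; `bulk_index_body` is a hypothetical helper, not part of any client library.

```python
import json

def bulk_index_body(docs_by_id):
    """Return an NDJSON body: an action line plus a source line per document."""
    lines = []
    for doc_id, source in docs_by_id.items():
        lines.append(json.dumps({"index": {"_id": str(doc_id)}}))
        lines.append(json.dumps(source))
    # The bulk API expects the body to end with a newline.
    return "\n".join(lines) + "\n"

body = bulk_index_body({"1": {"price": 10.99}, "2": {"price": "111.22"}})
print(body, end="")
```

Each JSON object must stay on a single line, which is why the body is built with `json.dumps` without indentation.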

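Query Operations

URL approach:

curl -XGET 'http://mini1:9200/store/books/_search?q=*&sort=price:asc&pretty'

Explanation:

The q=* parameter tells Elasticsearch to match every document in the index. The sort=price:asc parameter sorts the results in ascending order by each document's price field. The pretty parameter simply asks Elasticsearch to return pretty-printed JSON. Note that price was indexed in these notes as a mix of strings and numbers, and the sort values in the result are strings, so they are compared as text rather than as numbers.

Result: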

{
  "took" : 19,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 2,
    "max_score" : null,
    "hits" : [ {
      "_index" : "store",
      "_type" : "books",
      "_id" : "1",
      "_score" : null,
      "_source" : {
        "price" : 10.99
      },
      "sort" : [ "10.99" ]
    }, {
      "_index" : "store",
      "_type" : "books",
      "_id" : "2",
      "_score" : null,
      "_source" : {
        "price" : "111.22"
      },
      "sort" : [ "111.22" ]
    } ]
  }
}

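Explanation:

took - the time in milliseconds Elasticsearch took to execute the search
timed_out - whether the search timed out
_shards - how many shards were searched, and the count of successful/failed shards
hits - the search results
hits.total - the total number of documents matching the search criteria
hits.hits - the actual array of search results (the first 10 documents by default)
hits.sort - the sort key for each result (absent when sorting by score)
hits._score and max_score - ignore these fields for now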
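To show how these fields are typically read, here is a small sketch that parses a response shaped like the one above and pulls out the fields just described. The `summarize` function and its output keys are hypothetical, and the embedded JSON is a condensed copy of the sample result.

```python
import json

response_text = """
{
  "took": 19,
  "timed_out": false,
  "_shards": {"total": 5, "successful": 5, "failed": 0},
  "hits": {
    "total": 2,
    "max_score": null,
    "hits": [
      {"_index": "store", "_type": "books", "_id": "1", "_score": null,
       "_source": {"price": 10.99}, "sort": ["10.99"]},
      {"_index": "store", "_type": "books", "_id": "2", "_score": null,
       "_source": {"price": "111.22"}, "sort": ["111.22"]}
    ]
  }
}
"""

def summarize(response):
    """Collect the commonly inspected fields of a 6.x-style search response."""
    hits = response["hits"]
    return {
        "took_ms": response["took"],
        "timed_out": response["timed_out"],
        "failed_shards": response["_shards"]["failed"],
        "total": hits["total"],  # a plain integer in this response format
        "ids": [hit["_id"] for hit in hits["hits"]],
    }

summary = summarize(json.loads(response_text))
print(summary)
```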

JSON approach:

curl -XGET 'http://mini1:9200/store/books/_search?pretty' -H 'Content-Type: application/json' -d'
{
  "query": { "match_all": {} },
  "sort": [
    { "price": "asc" }
  ]
}'

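Explanation:

Looking at the request above: the query part defines the query, and match_all is the type of query we want to run. A match_all query simply searches for all documents in the specified index, and sort specifies which fields to sort on and in which direction (asc or desc).

curl -X GET "localhost:9200/store/_search" -H 'Content-Type: application/json' -d'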
{
  "query": { "match_all": {} },
  "from": 10,
  "size": 10
}
'

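Explanation:

The from parameter (0-based) specifies which document offset to start from, and the size parameter specifies how many documents to return starting at from. This is very useful for paginating search results. Note that if from is not specified, it defaults to 0.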
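The pagination arithmetic can be sketched as follows. `page_body` is a hypothetical helper that turns a 1-based page number into the `from` and `size` values of a search body.

```python
def page_body(page, page_size=10):
    """Build a match_all search body for a 1-based page number."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be positive")
    return {
        "query": {"match_all": {}},
        "from": (page - 1) * page_size,  # from is a 0-based offset
        "size": page_size,
    }

first_page = page_body(1)
second_page = page_body(2)  # same from/size as the request shown above
print(first_page)
print(second_page)
```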

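Running searches:

Now that we have seen a few basic search parameters, let's dig further into the Query DSL. First, look at the returned document fields. By default, the full JSON document is returned as part of every search. This is referred to as the source (the _source field in the search hits). If we don't want the whole source document returned, we can request only a few of its fields.

curl -X GET "http://mini1:9200/store/_search?pretty" -H 'Content-Type: application/json' -d'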
{
  "query": { "match_all": {} },
  "_source": ["name.first", "price"]
}
'

In SQL terms, roughly:

select name.first, price from books
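As a rough local model of what source filtering returns, the sketch below keeps only the listed dotted paths from a document while preserving the nesting. `filter_source` is a hypothetical function written for this example; real source filtering also supports wildcards and include/exclude lists, which are not modeled here.

```python
def filter_source(source, paths):
    """Keep only the dotted paths listed, preserving the nested structure."""
    result = {}
    for path in paths:
        keys = path.split(".")
        value = source
        for key in keys:
            if not isinstance(value, dict) or key not in value:
                break  # path is absent from this document, skip it
            value = value[key]
        else:
            target = result
            for key in keys[:-1]:
                target = target.setdefault(key, {})
            target[keys[-1]] = value
    return result

book = {
    "title": "Elasticsearch: The Definitive Guide",
    "name": {"first": "Zachary", "last": "Tong"},
    "publish_date": "2015-02-06",
    "price": "49.99",
}
filtered = filter_source(book, ["name.first", "price"])
print(filtered)
```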

match query:

curl -X GET "http://mini1:9200/store/_search?pretty" -H 'Content-Type: application/json' -d'
{
  "query": { "match": { "price": "111.22" } }
}'

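Explanation:

The match query can be thought of as a basic fielded search query (a search run against a specific field or set of fields). It matches analyzed terms rather than arbitrary substrings, so the SQL LIKE below is only a loose analogy.

In SQL terms, roughly:

select price from books where price like '%111.22%'

The examples below are taken directly from the official documentation:

This example returns all accounts whose address contains the term "mill" or "lane":

curl -X GET "localhost:9200/bank/_search" -H 'Content-Type: application/json' -d'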
{
  "query": { "match": { "address": "mill lane" } }
}
'

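This example is a variant of match (match_phrase) that returns all accounts whose address contains the phrase "mill lane":

curl -X GET "localhost:9200/bank/_search" -H 'Content-Type: application/json' -d'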
{
  "query": { "match_phrase": { "address": "mill lane" } }
}
'
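The difference between match and match_phrase can be illustrated with a deliberately simplified local model. This is not how Elasticsearch is implemented: the `tokens` function below is a crude stand-in for an analyzer (lowercasing and whitespace splitting only), and the sample addresses are made up for the example.

```python
def tokens(text):
    """Crude stand-in for an analyzer: lowercase and split on whitespace."""
    return text.lower().split()

def match(field_value, query):
    """match with the default OR operator: any query term appears in the field."""
    field_terms = set(tokens(field_value))
    return any(term in field_terms for term in tokens(query))

def match_phrase(field_value, query):
    """match_phrase: the query terms appear adjacent and in the same order."""
    field_terms, query_terms = tokens(field_value), tokens(query)
    n = len(query_terms)
    return any(field_terms[i:i + n] == query_terms
               for i in range(len(field_terms) - n + 1))

addresses = ["990 Mill Road", "198 Mill Lane", "671 Bristol Street"]
any_term = [a for a in addresses if match(a, "mill lane")]
phrase = [a for a in addresses if match_phrase(a, "mill lane")]
print(any_term)
print(phrase)
```

In this model the match query accepts any address containing either term, while match_phrase accepts only the address where the two terms occur next to each other.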

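bool query:

This example composes two match queries and returns all accounts whose address contains both "mill" and "lane":

curl -X GET "localhost:9200/bank/_search" -H 'Content-Type: application/json' -d'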
{
  "query": {
    "bool": {
      "must": [
        { "match": { "address": "mill" } },
        { "match": { "address": "lane" } }
      ]
    }
  }
}
'

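In the example above, the bool must clause specifies all the queries that must be true for a document to be considered a match. must corresponds to a logical AND.

In contrast, this example composes two match queries and returns all accounts whose address contains "mill" or "lane":

curl -X GET "localhost:9200/bank/_search" -H 'Content-Type: application/json' -d'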
{
  "query": {
    "bool": {
      "should": [
        { "match": { "address": "mill" } },
        { "match": { "address": "lane" } }
      ]
    }
  }
}
'

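This example composes two match queries and returns all accounts whose address contains neither "mill" nor "lane":

curl -X GET "localhost:9200/bank/_search" -H 'Content-Type: application/json' -d'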
{
  "query": {
    "bool": {
      "must_not": [
        { "match": { "address": "mill" } },
        { "match": { "address": "lane" } }
      ]
    }
  }
}
'

This example returns all accounts of anybody who is 40 years old but doesn't live in Idaho (ID):

curl -X GET "localhost:9200/bank/_search" -H 'Content-Type: application/json' -d'
{
  "query": {
    "bool": {
      "must": [
        { "match": { "age": "40" } }
      ],
      "must_not": [
        { "match": { "state": "ID" } }
      ]
    }
  }
}
'
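The way must, should, and must_not combine can be modeled locally with plain predicates. The sketch below is a simplified illustration rather than the engine's implementation: `bool_matches` is a hypothetical function, scoring and the filter clause are ignored, and the sample accounts are invented.

```python
def bool_matches(doc, must=(), should=(), must_not=()):
    """Evaluate bool-style clauses, each clause being a predicate on doc."""
    if not all(clause(doc) for clause in must):
        return False              # every must clause has to match (AND)
    if any(clause(doc) for clause in must_not):
        return False              # no must_not clause may match (NOT)
    if should and not must:
        # With no must clause, at least one should clause has to match (OR).
        return any(clause(doc) for clause in should)
    return True

accounts = [
    {"age": 40, "state": "ID"},
    {"age": 40, "state": "TX"},
    {"age": 35, "state": "TX"},
]

def is_40(doc):
    return doc["age"] == 40

def in_idaho(doc):
    return doc["state"] == "ID"

selected = [a for a in accounts if bool_matches(a, must=[is_40], must_not=[in_idaho])]
print(selected)
```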

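filter:

A filter mainly narrows down the data set, which reduces query time and improves efficiency.

The bool query introduced in the previous section also supports filter clauses, which let you use a query to restrict the documents matched by the other clauses without changing how scores are computed. As an example, let's introduce the range query, which filters documents by a range of values. It is generally used for numeric or date filtering.

This example uses a bool query to return all accounts with balances between 20000 and 30000, inclusive. In other words, we want accounts whose balance is greater than or equal to 20000 and less than or equal to 30000.

curl -X GET "localhost:9200/bank/_search" -H 'Content-Type: application/json' -d'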
{
  "query": {
    "bool": {
      "must": { "match_all": {} },
      "filter": {
        "range": {
          "balance": {
            "gte": 20000,
            "lte": 30000
          }
        }
      }
    }
  }
}
'

The equivalent SQL:

select * from bank where balance>=20000 and balance<=30000
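For illustration, the request body above can be generated by a small helper, alongside a local check that mirrors the inclusive bounds. Both `range_filter_query` and `in_range` are hypothetical names used only in this sketch.

```python
import json

def range_filter_query(field, gte=None, lte=None):
    """Build a bool query that matches everything, filtered by a range."""
    bounds = {}
    if gte is not None:
        bounds["gte"] = gte
    if lte is not None:
        bounds["lte"] = lte
    return {
        "query": {
            "bool": {
                "must": {"match_all": {}},
                "filter": {"range": {field: bounds}},
            }
        }
    }

def in_range(value, gte=None, lte=None):
    """Local equivalent of the range check; both bounds are inclusive."""
    return (gte is None or value >= gte) and (lte is None or value <= lte)

query = range_filter_query("balance", gte=20000, lte=30000)
print(json.dumps(query, indent=2))
print([b for b in (19999, 20000, 30000, 30001) if in_range(b, 20000, 30000)])
```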

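Aggregations:

This example groups all accounts by state, then returns the top 10 (the default) states sorted by count in descending order:

curl -X GET "localhost:9200/bank/_search" -H 'Content-Type: application/json' -d'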
{
  "size": 0,
  "aggs": {
    "group_by_state": {
      "terms": {
        "field": "state.keyword"
      }
    }
  }
}
'

SQL:

SELECT state, COUNT(*) FROM bank GROUP BY state ORDER BY COUNT(*) DESC LIMIT 10;

Result (partial):

{
  "took": 29,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped" : 0,
    "failed": 0
  },
  "hits" : {
    "total" : 1000,
    "max_score" : 0.0,
    "hits" : [ ]
  },
  "aggregations" : {
    "group_by_state" : {
      "doc_count_error_upper_bound": 20,
      "sum_other_doc_count": 770,
      "buckets" : [ {
        "key" : "ID",
        "doc_count" : 27
      }, {
        "key" : "TX",
        "doc_count" : 27
      }, {
        "key" : "AL",
        "doc_count" : 25
      }, {
        "key" : "MD",
        "doc_count" : 25
      }, {
        "key" : "TN",
        "doc_count" : 23
      }, {
        "key" : "MA",
        "doc_count" : 21
      }, {
        "key" : "NC",
        "doc_count" : 21
      }, {
        "key" : "ND",
        "doc_count" : 21
      }, {
        "key" : "ME",
        "doc_count" : 20
      }, {
        "key" : "MO",
        "doc_count" : 20
      } ]
    }
  }
}

Note that we set size=0 so that no search hits are shown, because we only want to see the aggregation results in the response.
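What the terms aggregation computes can be approximated locally with a counter. This is only a sketch on in-memory data: `terms_agg` is a hypothetical function, the sample accounts are invented, and breaking ties by key is an assumption chosen to match the ordering seen in the sample response.

```python
from collections import Counter

def terms_agg(docs, field, size=10):
    """Count documents per field value and keep the top `size` buckets."""
    counts = Counter(doc[field] for doc in docs if field in doc)
    # Highest count first; ties are broken by key in ascending order.
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return {
        "buckets": [{"key": k, "doc_count": c} for k, c in ordered[:size]],
        "sum_other_doc_count": sum(c for _, c in ordered[size:]),
    }

accounts = [{"state": "ID"}, {"state": "TX"}, {"state": "ID"}, {"state": "AL"}]
result = terms_agg(accounts, "state", size=2)
print(result)
```

The `sum_other_doc_count` value plays the same role as in the response above: it is the number of documents that fell into buckets outside the returned top list.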

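Building on the previous aggregation, this example uses the avg aggregation to calculate the average account balance by state (again, only for the top 10 states sorted by count in descending order):

curl -X GET "localhost:9200/bank/_search" -H 'Content-Type: application/json' -d'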
{
  "size": 0,
  "aggs": {
    "group_by_state": {
      "terms": {
        "field": "state.keyword"
      },
      "aggs": {
        "average_balance": {
          "avg": {
            "field": "balance"
          }
        }
      }
    }
  }
}
'

Note how the average_balance aggregation is nested inside the group_by_state aggregation. This is a common pattern for all aggregations.
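The nesting can be pictured as "group first, then compute a metric inside each group". The sketch below models that on in-memory data; `avg_by` is a hypothetical function, the accounts are invented, and the output key `average_balance` simply mirrors the aggregation name used above.

```python
from collections import defaultdict

def avg_by(docs, group_field, value_field):
    """Group docs by one field, then average another field within each group."""
    groups = defaultdict(list)
    for doc in docs:
        groups[doc[group_field]].append(doc[value_field])
    return {
        key: {"doc_count": len(values), "average_balance": sum(values) / len(values)}
        for key, values in groups.items()
    }

accounts = [
    {"state": "ID", "balance": 100},
    {"state": "ID", "balance": 300},
    {"state": "TX", "balance": 50},
]
per_state = avg_by(accounts, "state", "balance")
print(per_state)

# Ordering the groups by the nested metric, highest average first.
by_average = sorted(per_state, key=lambda k: per_state[k]["average_balance"], reverse=True)
print(by_average)
```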

Building on the previous aggregation, let’s now sort on the average balance in descending order:

curl -X GET "localhost:9200/bank/_search" -H 'Content-Type: application/json' -d'
{
  "size": 0,
  "aggs": {
    "group_by_state": {
      "terms": {
        "field": "state.keyword",
        "order": {
          "average_balance": "desc"
        }
      },
      "aggs": {
        "average_balance": {
          "avg": {
            "field": "balance"
          }
        }
      }
    }
  }
}
'

This example shows how to group by age bracket (ages 20-29, 30-39, and 40-49), then by gender, and finally get the average account balance per age bracket, per gender:

curl -X GET "localhost:9200/bank/_search" -H 'Content-Type: application/json' -d'
{
  "size": 0,
  "aggs": {
    "group_by_age": {
      "range": {
        "field": "age",
        "ranges": [
          {
            "from": 20,
            "to": 30
          },
          {
            "from": 30,
            "to": 40
          },
          {
            "from": 40,
            "to": 50
          }
        ]
      },
      "aggs": {
        "group_by_gender": {
          "terms": {
            "field": "gender.keyword"
          },
          "aggs": {
            "average_balance": {
              "avg": {
                "field": "balance"
              }
            }
          }
        }
      }
    }
  }
}
'
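A local model of this three-level aggregation is sketched below. It relies on the range aggregation including the from value and excluding the to value, which is why from 20 and to 30 covers ages 20-29. The function name, the bucket labels such as "20-30", and the sample accounts are all hypothetical choices for this example.

```python
from collections import defaultdict

AGE_RANGES = [(20, 30), (30, 40), (40, 50)]

def age_gender_average(docs, ranges=AGE_RANGES):
    """Average balance per (age range, gender), mirroring the nested aggs."""
    sums = defaultdict(lambda: [0.0, 0])  # (range label, gender) -> [sum, count]
    for doc in docs:
        for low, high in ranges:
            if low <= doc["age"] < high:  # from is inclusive, to is exclusive
                entry = sums[(f"{low}-{high}", doc["gender"])]
                entry[0] += doc["balance"]
                entry[1] += 1
    return {key: total / count for key, (total, count) in sums.items()}

accounts = [
    {"age": 25, "gender": "F", "balance": 1000},
    {"age": 29, "gender": "F", "balance": 3000},
    {"age": 30, "gender": "M", "balance": 500},
    {"age": 50, "gender": "M", "balance": 900},  # outside every range
]
averages = age_gender_average(accounts)
print(averages)
```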

OK, that wraps up Getting Started and gives a rough overall picture of ES. I'll continue with the docs tomorrow. I still have an hour-and-a-half ride home, ugh.