Elasticsearch for Java Developers: Elasticsearch from the Command Line

Latest recommended article published on 2023-08-31 11:41:36

dnc8371

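This article is part of our Academy Course titled Elasticsearch Tutorial for Java Developers.

In this course, we provide a series of tutorials so that you can develop your own Elasticsearch-based applications. We cover a wide range of topics, from installation and operations to Java API integration and reporting. With our straightforward tutorials, you will be able to get your own projects up and running in minimum time. Check it out here!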

1. Introduction

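In the previous part of the tutorial we got a good grasp of what Elasticsearch is, its basic concepts, and the search capabilities it can bring to our applications. In this part we jump straight into the battle and apply that knowledge in practice. Throughout, curl and/or http (the HTTPie command-line client) will be the only tools we use to make friends with Elasticsearch.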

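As we are going to see, working with an Elasticsearch cluster has quite a few subtleties compared to a standalone instance, and it is better to be prepared for them. Hopefully you still remember from the previous part how to start Elasticsearch, because that is the only prerequisite: a cluster up and running. With that, let us get started!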

2. Is My Cluster Healthy?

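The first thing you need to know about your Elasticsearch cluster before doing anything with it is its health. There are a couple of ways to gather this information, but arguably the easiest and most convenient one is the Cluster APIs, in particular the cluster health endpoint.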

$ http http://localhost:9200/_cluster/health

HTTP/1.1 200 OK
content-encoding: gzip
content-type: application/json; charset=UTF-8
transfer-encoding: chunked

{
    "active_primary_shards": 0,
    "active_shards": 0,
    "active_shards_percent_as_number": 100.0,
    "cluster_name": "es-catalog",
    "delayed_unassigned_shards": 0,
    "initializing_shards": 0,
    "number_of_data_nodes": 3,
    "number_of_in_flight_fetch": 0,
    "number_of_nodes": 3,
    "number_of_pending_tasks": 0,
    "relocating_shards": 0,
    "status": "green",
    "task_max_waiting_in_queue_millis": 0,
    "timed_out": false,
    "unassigned_shards": 0
}

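Among these details, the one we are looking for is the status indicator, which should be set to green, meaning that all shards are allocated and the cluster is in good operational shape.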
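When the check is scripted rather than eyeballed, the status value can be mapped to its meaning. The following is a minimal sketch that works on a trimmed copy of the response above; the helper name `describe_health` is illustrative and not part of any Elasticsearch client.

```python
import json

# Trimmed copy of the cluster health response shown above.
HEALTH_JSON = """
{
    "cluster_name": "es-catalog",
    "status": "green",
    "number_of_nodes": 3,
    "unassigned_shards": 0
}
"""

# What each status value of the cluster health endpoint stands for.
STATUS_MEANING = {
    "green": "all primary and replica shards are allocated",
    "yellow": "all primaries are allocated, but some replicas are not",
    "red": "at least one primary shard is not allocated",
}


def describe_health(body: str) -> str:
    """Return a one-line summary of a cluster health response body."""
    health = json.loads(body)
    status = health["status"]
    return f'{health["cluster_name"]}: {status} ({STATUS_MEANING[status]})'


summary = describe_health(HEALTH_JSON)
print(summary)
```

In a real script the body would come from the HTTP call instead of a constant; the interpretation step stays the same.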

3. All About Indices

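Our Elasticsearch cluster is all green and ready to rock. The next logical step is to create a catalog index with the mapping types and settings we outlined before. But before doing that, let us check whether any indices have already been created, this time using the Indices APIs.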

$ http http://localhost:9200/_stats

HTTP/1.1 200 OK
content-encoding: gzip
content-type: application/json; charset=UTF-8
transfer-encoding: chunked

{
    "_all": {
        "primaries": {},
        "total": {}
    },
    "_shards": {
        "failed": 0,
        "successful": 0,
        "total": 0
    },
    "indices": {}
}

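As expected, our cluster has nothing in it yet, so we are good to go with creating the index for our book catalog. As we know, Elasticsearch speaks JSON, but manipulating more or less complex JSON documents from the command line is somewhat cumbersome. Let us instead store the catalog settings and mappings in the catalog-index.json document.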

{ 
  "settings": {
    "index" : {
      "number_of_shards" : 5, 
      "number_of_replicas" : 2 
    }
  },
  "mappings": {
    "books": {
      "_source" : {
        "enabled": true
      },
      "properties": {
        "title": { "type": "text" },
        "categories" : {
          "type": "nested",
          "properties" : {
            "name": { "type": "text" }
          }
        },
        "publisher": { "type": "keyword" },
        "description": { "type": "text" },
        "published_date": { "type": "date" },
        "isbn": { "type": "keyword" },
        "rating": { "type": "byte" }
      }
    },
    "authors": {
      "properties": {
        "first_name": { "type": "keyword" },
        "last_name": { "type": "keyword" }
      },
      "_parent": {
        "type": "books"
      }
    }
  }
}

And use this document as the input to the create index API.

$ http PUT http://localhost:9200/catalog < catalog-index.json

HTTP/1.1 200 OK
content-encoding: gzip
content-type: application/json; charset=UTF-8
transfer-encoding: chunked

{
    "acknowledged": true,
    "shards_acknowledged": true
}

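A few words should be said about the acknowledged response property, which appears in most Elasticsearch APIs, especially the ones that apply mutations. In general, this value simply indicates whether the operation completed before the timeout (true) or may take effect sometime soon (false). We will see more examples of its usage in different contexts later on.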

That's it: we have brought the catalog index to life. To make sure this is really the case, we can ask Elasticsearch to return the catalog index settings.

$ http http://localhost:9200/catalog/_settings

HTTP/1.1 200 OK
content-encoding: gzip
content-type: application/json; charset=UTF-8
transfer-encoding: chunked

{
    "catalog": {
        "settings": {
            "index": {
                "creation_date": "1487428863824",
                "number_of_replicas": "2",
                "number_of_shards": "5",
                "provided_name": "catalog",
                "uuid": "-b63dCesROC5UawbHz8IYw",
                "version": {
                    "created": "5020099"
                }
            }
        }
    }
}

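Awesome, exactly what we ordered. You might wonder how Elasticsearch would react if we tried to update the index settings by increasing the number of shards (as we know, not all index settings can be updated once the index has been created).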

$ echo '{"index":{"number_of_shards":6}}' | http PUT http://localhost:9200/catalog/_settings

HTTP/1.1 400 Bad Request
content-encoding: gzip
content-type: application/json; charset=UTF-8
transfer-encoding: chunked

{
    "error": {
        "reason": "can't change the number of shards for an index",
        "root_cause": [
            ...
        ],
        "type": "illegal_argument_exception"
    },
    "status": 400
}

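The error response comes as no surprise (please note that the response details have been trimmed for illustration purposes only). Along with the settings, it is just as easy to get the mapping types of a particular index, for example: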

$ http http://localhost:9200/catalog/_mapping

HTTP/1.1 200 OK
content-encoding: gzip
content-type: application/json; charset=UTF-8
transfer-encoding: chunked

{
    "catalog": {
        "mappings": {
            "authors": {
                ...
            },
            "books": {
                ...
            }
        }
    }
}

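In general, the index mappings for existing fields cannot be updated; however, there are some exceptions to this rule. One of the greatest features of the Indices APIs is the ability to run the analysis process against a particular index mapping type and field without actually sending any documents.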

$ http http://localhost:9200/catalog/_analyze field=books.title text="Elasticsearch: The Definitive Guide. A Distributed Real-Time Search and Analytics Engine"

HTTP/1.1 200 OK
content-encoding: gzip
content-type: application/json; charset=UTF-8
transfer-encoding: chunked

{
    "tokens": [
        {
            "end_offset": 13,
            "position": 0,
            "start_offset": 0,
            "token": "elasticsearch",
            "type": "<ALPHANUM>"
        },
        {
            "end_offset": 18,
            "position": 1,
            "start_offset": 15,
            "token": "the",
            "type": "<ALPHANUM>"
        },
        
        ...

        {
            "end_offset": 88,
            "position": 11,
            "start_offset": 82,
            "token": "engine",
            "type": "<ALPHANUM>"
        }
    ]
}

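This feature is especially useful when you would like to validate the parameters of your mapping types before throwing a bunch of data at Elasticsearch for indexing.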
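To get a feel for what the analysis step produces, here is a rough local approximation in Python. It is only a sketch: the real standard analyzer follows Unicode text segmentation rules, whereas this version simply lowercases runs of word characters, and the function name `rough_standard_analyze` is made up for illustration. For this particular title it yields the same tokens, offsets and positions as the response above.

```python
import re


def rough_standard_analyze(text):
    """Lowercase word tokens with character offsets and positions.

    An approximation only: the real analyzer applies Unicode text
    segmentation, which differs from a simple word regex for many inputs.
    """
    return [
        {
            "token": match.group().lower(),
            "start_offset": match.start(),
            "end_offset": match.end(),
            "position": position,
        }
        for position, match in enumerate(re.finditer(r"\w+", text))
    ]


title = ("Elasticsearch: The Definitive Guide. "
         "A Distributed Real-Time Search and Analytics Engine")
tokens = rough_standard_analyze(title)
print(tokens[0])
print(tokens[-1])
```

Note how "Real-Time" becomes two tokens, which is why the last token sits at position 11.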

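Last but not least, there is one important detail about index states. Any particular index can be in the opened state (fully operational) or the closed state (blocked for read/write operations; archived would be a good analogy). As for everything else, Elasticsearch provides APIs for that.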

$ http POST http://localhost:9200/catalog/_open

HTTP/1.1 200 OK
content-encoding: gzip
content-type: application/json; charset=UTF-8
transfer-encoding: chunked

{
    "acknowledged": true
}

4. Documents, More Documents, …

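An empty index without documents is not very useful, so let us switch gears from the Indices APIs to another great one, the Document APIs. We will start exploring it with the simplest single-document operations, relying on the following book.json document: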

{
  "title": "Elasticsearch: The Definitive Guide. A Distributed Real-Time Search and Analytics Engine",
  "categories": [
      { "name": "analytics" },
      { "name": "search" },
      { "name": "database store" }
  ],
  "publisher": "O'Reilly",
  "description": "Whether you need full-text search or real-time analytics of structured data—or both—the Elasticsearch distributed search engine is an ideal way to put your data to work. This practical guide not only shows you how to search, analyze, and explore data with Elasticsearch, but also helps you deal with the complexities of human language, geolocation, and relationships.", 
  "published_date": "2015-02-07",
  "isbn": "978-1449358549",
  "rating": 4
}

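Before sending this JSON to Elasticsearch, it would be good to talk about document identification. Every document in Elasticsearch has a unique identifier, stored in the special _id field. You may provide one while uploading the document to Elasticsearch (as we do in the example below using the isbn, since it is a great example of a natural identifier); otherwise Elasticsearch generates and assigns one.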

$ http PUT http://localhost:9200/catalog/books/978-1449358549 < book.json

HTTP/1.1 201 Created
Location: /catalog/books/978-1449358549
content-encoding: gzip
content-type: application/json; charset=UTF-8
transfer-encoding: chunked

{
    "_id": "978-1449358549",
    "_index": "catalog",
    "_shards": {
        "failed": 0,
        "successful": 3,
        "total": 3
    },
    "_type": "books",
    "_version": 1,
    "created": true,
    "result": "created"
}
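The _shards section reports a total of 3 because every document is written to one primary shard and to each of its replicas, and the catalog index was created with number_of_replicas set to 2. The arithmetic is sketched below; the function name `shard_copies` is illustrative only.

```python
def shard_copies(number_of_shards: int, number_of_replicas: int) -> dict:
    """Derive copy counts from the two index settings."""
    # One primary plus its replicas hold each document.
    copies_per_document = 1 + number_of_replicas
    return {
        "copies_per_document": copies_per_document,
        # Every primary shard exists copies_per_document times in the cluster.
        "total_shards": number_of_shards * copies_per_document,
    }


# Settings of the catalog index: 5 primary shards, 2 replicas each.
layout = shard_copies(number_of_shards=5, number_of_replicas=2)
print(layout)
```

With three nodes, each of the three copies of a shard can live on a different node, which is why all three writes succeed here.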

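Our first document has just gone into the catalog index, under the books type. But we also have the authors type, which is in a parent/child relationship with books. Let us complement the book with its authors from the authors.json document.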

[
  {
    "first_name": "Clinton",
    "last_name": "Gormley",
    "_parent": "978-1449358549"
  },
  {
    "first_name": "Zachary",
    "last_name": "Tong",
    "_parent": "978-1449358549"
  }
]

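The book has more than one author, so we could still use the single-document APIs and index each author document one by one. Let us not do that, however, but switch to the bulk document API and transform our authors.json document into a format compatible with it.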

{ "index" : { "_index" : "catalog", "_type" : "authors", "_id": "1", "_parent": "978-1449358549" } }
{ "first_name": "Clinton", "last_name": "Gormley" }
{ "index" : { "_index" : "catalog", "_type" : "authors", "_id": "2", "_parent": "978-1449358549" } }
{ "first_name": "Zachary", "last_name": "Tong" }
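For more than a handful of documents this conversion is easier to script than to do by hand. Below is a minimal sketch in Python; the helper name `to_bulk` and the choice of sequential identifiers starting at 1 are assumptions made to mirror the listing above, not something the bulk API requires.

```python
import json

# Content of authors.json as shown earlier.
authors = [
    {"first_name": "Clinton", "last_name": "Gormley", "_parent": "978-1449358549"},
    {"first_name": "Zachary", "last_name": "Tong", "_parent": "978-1449358549"},
]


def to_bulk(docs, index, doc_type):
    """Turn a list of documents into a newline-delimited bulk request body."""
    lines = []
    for doc_id, doc in enumerate(docs, start=1):
        source = dict(doc)               # copy, so the input stays untouched
        parent = source.pop("_parent")   # the parent id belongs in the action line
        action = {"index": {"_index": index, "_type": doc_type,
                            "_id": str(doc_id), "_parent": parent}}
        lines.append(json.dumps(action))
        lines.append(json.dumps(source))
    # The bulk API expects the body to end with a newline.
    return "\n".join(lines) + "\n"


bulk_body = to_bulk(authors, "catalog", "authors")
print(bulk_body, end="")
```

Each document contributes exactly two lines, the action metadata followed by the document source, and neither may contain embedded newlines.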

Once that is done, let us save this document as authors-bulk.json and feed it directly into the bulk document API endpoint.

$ http POST http://localhost:9200/_bulk < authors-bulk.json

HTTP/1.1 200 OK
content-encoding: gzip
content-type: application/json; charset=UTF-8
transfer-encoding: chunked

{
    "errors": false,
    "items": [
        {
            "index": {
                "_id": "1",
                "_index": "catalog",
                "_shards": {
                    "failed": 0,
                    "successful": 3,
                    "total": 3
                },
                "_type": "authors",
                "_version": 5,
                "created": false,
                "result": "updated",
                "status": 200
            }
        },
        {
            "index": {
                "_id": "2",
                "_index": "catalog",
                "_shards": {
                    "failed": 0,
                    "successful": 3,
                    "total": 3
                },
                "_type": "authors",
                "_version": 2,
                "created": true,
                "result": "created",
                "status": 201
            }
        }
    ],
    "took": 105
}

And here we are, with book and author documents as the first citizens of the catalog index! It is time to fetch those documents back.

$ http http://localhost:9200/catalog/books/978-1449358549

HTTP/1.1 200 OK
content-encoding: gzip
content-type: application/json; charset=UTF-8
transfer-encoding: chunked

{
    "_id": "978-1449358549",
    "_index": "catalog",
    "_source": {
        "categories": [
            { "name": "analytics" },
            { "name": "search"},
            { "name": "database store" }
        ],
        "description": "...",
        "isbn": "978-1449358549",
        "published_date": "2015-02-07",
        "publisher": "O'Reilly",
        "rating": 4,
        "title": "Elasticsearch: The Definitive Guide. A Distributed Real-Time Search and Analytics Engine"
    },
    "_type": "books",
    "_version": 1,
    "found": true
}

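Easy! However, to fetch documents from the authors collection, which are children of their respective documents in the books collection, we have to provide the parent identifier along with the document's own identifier, for example: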

$ http http://localhost:9200/catalog/authors/1?parent=978-1449358549

HTTP/1.1 200 OK
content-encoding: gzip
content-type: application/json; charset=UTF-8
transfer-encoding: chunked

{
    "_id": "1",
    "_index": "catalog",
    "_parent": "978-1449358549",
    "_routing": "978-1449358549",
    "_source": {
        "first_name": "Clinton",
        "last_name": "Gormley"
    },
    "_type": "authors",
    "_version": 1,
    "found": true
}

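This is one of the specifics of working with parent/child relations in Elasticsearch. As already mentioned, you could model such relationships in a simpler way, but our goal is to learn how to deal with them should you choose to go down this route in your applications.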
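The reason the parent identifier is needed is routing: a document lands on the shard given by a hash of its routing value modulo the number of primary shards, and a child is routed by its parent's identifier so that both end up on the same shard. The sketch below illustrates the idea; `zlib.crc32` is only a stand-in for the hash function Elasticsearch actually uses, so the shard numbers it prints are not the real ones.

```python
import zlib

NUMBER_OF_SHARDS = 5  # number_of_shards of the catalog index


def shard_for(routing: str, shards: int = NUMBER_OF_SHARDS) -> int:
    """Pick a shard as hash(routing) % number_of_primary_shards."""
    # crc32 is a stand-in here; Elasticsearch uses a different hash.
    return zlib.crc32(routing.encode("utf-8")) % shards


book_id = "978-1449358549"
book_shard = shard_for(book_id)         # a book is routed by its own _id
child_shard = shard_for(book_id)        # an author is routed by its parent's _id
lookup_without_parent = shard_for("1")  # routing by the author's _id alone

print(book_shard, child_shard, lookup_without_parent)
```

The same formula also explains the earlier error: changing the number of primary shards would change the result of the modulo for existing documents, so the setting is fixed at index creation.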

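The delete and update APIs are pretty straightforward, so we will just mention them; please note that the same rules for identifying child documents apply. You may be surprised, but deleting a parent document does not automatically delete its children, so keep that in mind. We will see how to work around this later on.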

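To finish up, let us take a look at the term vectors API, which returns all the details and statistics about the terms in the fields of a document, for example (only a small part of the response is shown):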

$ http http://localhost:9200/catalog/books/978-1449358549/_termvectors?fields=description

HTTP/1.1 200 OK
content-encoding: gzip
content-type: application/json; charset=UTF-8
transfer-encoding: chunked

{
    "_id": "978-1449358549",
    "_index": "catalog",
    "_type": "books",
    "_version": 1,
    "found": true,
    "term_vectors": {
        "description": {
            "field_statistics": {
                "doc_count": 1,
                "sum_doc_freq": 46,
                "sum_ttf": 60
            },
            "terms": {
                "analyze": {
                    "term_freq": 1,
                    "tokens": [ ... ]
                },
                "and": {
                    "term_freq": 2,
                    "tokens": [ ... ]

                },
                "complexities": {
                    "term_freq": 1,
                    "tokens": [ ... ]

                },
                "data": {
                    "term_freq": 3,
                    "tokens": [ ... ]

                },
                ...
            }
        }
    },
    "took": 5
}

You may not use the term vectors API often, but it is a terrific tool for troubleshooting why certain documents do not show up in search results.
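The statistics in the response can be reproduced by hand for this single document, which helps in reading them. The sketch below uses a simple word regex as an approximation of the analyzer (an assumption, as the real analysis chain is more elaborate): sum_ttf is the total number of term occurrences and, with only one document, sum_doc_freq equals the number of distinct terms.

```python
import re
from collections import Counter

# The description field of book.json (\u2014 is the long dash in the text).
description = (
    "Whether you need full-text search or real-time analytics of "
    "structured data\u2014or both\u2014the Elasticsearch distributed search "
    "engine is an ideal way to put your data to work. This practical "
    "guide not only shows you how to search, analyze, and explore data "
    "with Elasticsearch, but also helps you deal with the complexities "
    "of human language, geolocation, and relationships."
)

# Approximate analysis: lowercase runs of word characters.
terms = [term.lower() for term in re.findall(r"\w+", description)]
term_freq = Counter(terms)

stats = {
    "sum_ttf": len(terms),           # all term occurrences in the field
    "sum_doc_freq": len(term_freq),  # distinct terms, since there is one document
}
print(stats)
print(term_freq["data"], term_freq["and"])
```

The per-term counts line up with the term_freq values in the response, for instance 3 for data and 2 for and.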

5. What if My Mapping Types Are Suboptimal

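Over time you may often discover that your mapping types are not optimal and could be made better. However, Elasticsearch supports only limited modifications of existing mapping types. Luckily, Elasticsearch provides a dedicated reindex API, for example: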

$ echo '{"source": {"index": "catalog"}, "dest": {"index": "catalog-v2"}}' | http POST http://localhost:9200/_reindex

HTTP/1.1 200 OK
content-encoding: gzip
content-type: application/json; charset=UTF-8
transfer-encoding: chunked

{
    "batches": 0,
    "created": 200,
    "deleted": 0,
    "failures": [],
    "noops": 0,
    "requests_per_second": -1.0,
    "retries": {
        "bulk": 0,
        "search": 0
    },
    "throttled_millis": 0,
    "throttled_until_millis": 0,
    "timed_out": false,
    "took": 265,
    "total": 200,
    "updated": 0,
    "version_conflicts": 0
}

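The trick here is to create a new index with the updated mapping types, catalog-v2, then simply ask Elasticsearch to fetch all the documents from the old index (catalog) and put them into the new one (catalog-v2), and finally swap the indices. Please note that this works not only for local indices but for remote ones as well.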

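Although simple, this API is still considered experimental and may not fit every case, for example if your index is really large, or if your Elasticsearch cluster is under heavy load and application requests should take priority.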

6. The Search Time

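We have learned how to create indices and mapping types and how to index documents, all important but not really exciting topics. Search, however, is definitely the heart and soul of Elasticsearch, so let us get to know it right away.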

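To demonstrate the different search features we need a few more documents; please upload them from books-and-authors-bulk.json into your Elasticsearch cluster using our friend, the bulk document API.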

$ http POST http://localhost:9200/_bulk < books-and-authors-bulk.json

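With a few documents in our collections, we can start issuing search queries against them using the most accessible form of the search API, which accepts the search criteria in the URI via the query string. For example, let us search for the term engine (with the phrase search engine in mind).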

$ http POST http://localhost:9200/catalog/books/_search?q=engine

HTTP/1.1 200 OK
content-encoding: gzip
content-type: application/json; charset=UTF-8
transfer-encoding: chunked

{
    "_shards": {
        "failed": 0,
        "successful": 5,
        "total": 5
    },
    "hits": {
        "hits": [
            {
                "_id": "978-1449358549",
                "_index": "catalog",
                "_score": 0.7503276,
                "_source": {
                    "categories": [
                        { "name": "analytics" },
                        { "name": "search" },
                        { "name": "database store" }
                    ],
                    "description": " Whether you need full-text search or real-time ...",
                    "isbn": "978-1449358549",
                    "published_date": "2015-02-07",
                    "publisher": "O'Reilly",
                    "rating": 4,
                    "title": " Elasticsearch: The Definitive Guide. ..."
                },
                "_type": "books"
            }
        ],
        "max_score": 0.7503276,
        "total": 1
    },
    "timed_out": false,
    "took": 22
}

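Indeed, this is a good starting point, and this API is quite useful for quick and shallow searches, but its capabilities are very limited. Search using the request body API is a completely different beast that reveals the full power of Elasticsearch. It is built on top of the JSON-based Query DSL, a concise and intuitive language for constructing arbitrarily complex search queries.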

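The Query DSL can describe quite a lot of query types, each with its own syntax and parameters. However, there is a set of common parameters, such as sort, from, size and stored_fields (the list is actually quite long), which are independent of the query type and can be applied to any of them.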
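Because these parameters sit next to the query rather than inside it, they are easy to add programmatically. Here is a minimal sketch that wraps any query in a paginated, sorted request body; the helper name `search_page` and the choice of sorting by rating are illustrative assumptions.

```python
import json


def search_page(query: dict, page: int, page_size: int = 10) -> dict:
    """Wrap a query with the common from/size/sort parameters."""
    return {
        "from": page * page_size,                 # offset of the first hit to return
        "size": page_size,                        # maximum number of hits per page
        "sort": [{"rating": {"order": "desc"}}],  # best rated books first
        "query": query,
    }


# Third page (pages are counted from 0) of a match_all query.
body = search_page({"match_all": {}}, page=2)
print(json.dumps(body, indent=2))
```

The printed JSON can be sent as the request body of a _search call as is, whatever query type is passed in.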

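For the next couple of sections we are going to switch from http to curl, since the latter is a bit more convenient when dealing with JSON payloads.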

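The first query type we are going to try with the Query DSL is the match all query. To some extent it is not really a query, because it simply matches all documents. As such it may return a lot of results, so as a general rule, always annotate your queries with reasonable size limits, as in this example: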

$ curl -i http://localhost:9200/catalog/books/_search?pretty -d '
{
    "size": 10,
    "query": {
        "match_all" : {
        }
    }
}'

HTTP/1.1 200 OK
content-type: application/json; charset=UTF-8
content-length: 3112
{
  "took" : 13,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 3,
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "catalog",
        "_type" : "books",
        "_id" : "978-1449358549",
        "_score" : 1.0,
        "_source" : {
          "title" : "Elasticsearch: The Definitive Guide ...",
          "categories" : [
            { "name" : "analytics" },
            { "name" : "search" },
            { "name" : "database store" }
          ],
          "publisher" : "O'Reilly",
          "description" : "Whether you need full-text ...",
          "published_date" : "2015-02-07",
          "isbn" : "978-1449358549",
          "rating" : 4
        }
      },
      ...
    ]
  }
}

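The next one is a real query type, belonging to the class of full text queries, which search against full text document fields (probably the most widely used kind). In its basic form it matches against a single document field, such as the book's description.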

$ curl -i http://localhost:9200/catalog/books/_search?pretty -d '
{
    "query": {
        "match" : {
            "description" : "engine"
        }
    }
}'

HTTP/1.1 200 OK
content-type: application/json; charset=UTF-8
content-length: 1271
{
  "took" : 17,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 1,
    "max_score" : 0.28004453,
    "hits" : [
      {
        "_index" : "catalog",
        "_type" : "books",
        "_id" : "978-1449358549",
        "_score" : 0.28004453,
        "_source" : {
          "title" : "Elasticsearch: The Definitive Guide. ...",
          "categories" : [
            { "name" : "analytics" },
            { "name" : "search" },
            { "name" : "database store" }
          ],
          "publisher" : "O'Reilly",
          "description" : "Whether you need full-text ...",
          "published_date" : "2015-02-07",
          "isbn" : "978-1449358549",
          "rating" : 4
        }
      }
    ]
  }
}

But full text queries are much more powerful than that and come in quite a few other variations, including match_phrase, match_phrase_prefix, multi_match, common_terms, query_string and simple_query_string.

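Moving on, we enter the world of term level queries, which operate on exact terms and are usually used for field types such as numbers, dates and keywords. The publisher field of the book is a good candidate to try it out.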

$ curl -i http://localhost:9200/catalog/books/_search?pretty -d '
{
   "size": 10,
   "_source": [ "title" ],
   "query": {
        "term" : {
            "publisher" : "Manning"
        }
    }
}'  

HTTP/1.1 200 OK
content-type: application/json; charset=UTF-8
content-length: 675

{
  "took" : 21,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 2,
    "max_score" : 0.18232156,
    "hits" : [
      {
        "_index" : "catalog",
        "_type" : "books",
        "_id" : "978-1617291623",
        "_score" : 0.18232156,
        "_source" : {
          "title" : "Elasticsearch in Action"
        }
      },
      {
        "_index" : "catalog",
        "_type" : "books",
        "_id" : "978-1617292774",
        "_score" : 0.18232156,
        "_source" : {
          "title" : "Relevant Search: With applications ..."
        }
      }
    ]
  }
}

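Please notice how we limited the properties of the document _source so that only the title field is returned. Other variations of term level queries include terms, range, exists, prefix, wildcard, regexp, fuzzy, type and ids.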

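Joining queries are the really interesting ones in the context of our book catalog index. These queries allow searching against nested objects or documents with a parent/child relationship. For example, let us find all the books in the analytics category.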

$ curl -i http://localhost:9200/catalog/books/_search?pretty -d '
{
   "size": 10,
   "_source": [ "title", "categories" ],
   "query": {
        "nested": {
            "path": "categories",
            "query" : {
                "match": {
                    "categories.name" : "analytics"
                }
            }
       }
    }
}'

HTTP/1.1 200 OK
content-type: application/json; charset=UTF-8
content-length: 1177

{
  "took" : 45,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 2,
    "max_score" : 1.3112576,
    "hits" : [
      {
        "_index" : "catalog",
        "_type" : "books",
        "_id" : "978-1617291623",
        "_score" : 1.3112576,
        "_source" : {
          "categories" : [
            { "name" : "analytics" },
            { "name" : "search" },
            { "name" : "database store" }
          ],
          "title" : "Elasticsearch in Action"
        }
      },
      {
        "_index" : "catalog",
        "_type" : "books",
        "_id" : "978-1449358549",
        "_score" : 1.0925692,
        "_source" : {
          "categories" : [
            { "name" : "analytics" },
            { "name" : "search" },
            { "name" : "database store" }
          ],
          "title" : "Elasticsearch: The Definitive Guide ..."
        }
      }
    ]
  }
}

Similarly, we can search for all the books authored by Clinton Gormley, leveraging the parent/child relationship between the books and authors collections.

$ curl -i http://localhost:9200/catalog/books/_search?pretty -d '
{
   "size": 10,
   "_source": [ "title" ],
   "query": {
       "has_child" : {
            "type" : "authors",
            "inner_hits" : {
                "size": 5
            },
            "query" : {
                "term" : {
                    "last_name" : "Gormley"
                }
            }
        }
    }
}'

HTTP/1.1 200 OK
content-type: application/json; charset=UTF-8
content-length: 1084

{
  "took" : 38,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 1,
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "catalog",
        "_type" : "books",
        "_id" : "978-1449358549",
        "_score" : 1.0,
        "_source" : {
          "title" : "Elasticsearch: The Definitive Guide ..."
        },
        "inner_hits" : {
          "authors" : {
            "hits" : {
              "total" : 1,
              "max_score" : 0.6931472,
              "hits" : [
                {
                  "_type" : "authors",
                  "_id" : "1",
                  "_score" : 0.6931472,
                  "_routing" : "978-1449358549",
                  "_parent" : "978-1449358549",
                  "_source" : {
                    "first_name" : "Clinton",
                    "last_name" : "Gormley"
                  }
                }
              ]
            }
          }
        }
      }
    ]
  }
}

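Please pay attention to the presence of the inner_hits query parameter, which causes the search results to include the inner documents that matched the joining criteria.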
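Conceptually, has_child with inner_hits behaves like a join that returns each matching parent together with the children that made it match. The following sketch mimics that behaviour on in-memory data taken from the examples above; it says nothing about how Elasticsearch executes the query internally, and the function name is only a label.

```python
# Parents: book ids and (shortened) titles from the catalog index.
books = {
    "978-1449358549": "Elasticsearch: The Definitive Guide",
    "978-1617291623": "Elasticsearch in Action",
    "978-1617292774": "Relevant Search",
}

# Children: the two author documents indexed earlier.
authors = [
    {"_id": "1", "_parent": "978-1449358549",
     "first_name": "Clinton", "last_name": "Gormley"},
    {"_id": "2", "_parent": "978-1449358549",
     "first_name": "Zachary", "last_name": "Tong"},
]


def has_child(parents, children, predicate):
    """Map each parent id to its matching children (the "inner hits")."""
    hits = {}
    for child in children:
        if child["_parent"] in parents and predicate(child):
            hits.setdefault(child["_parent"], []).append(child)
    return hits


result = has_child(books, authors, lambda a: a["last_name"] == "Gormley")
for book_id, inner_hits in result.items():
    print(books[book_id], [a["first_name"] for a in inner_hits])
```

Without inner_hits only the parent side of this mapping would come back, which is often enough when the children serve purely as a filter.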

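The other query types, such as geo queries, specialized queries and span queries, work in a very similar way, so we will skip over them and finish up by looking into compound queries. The examples we have seen so far included queries with just one search criterion, but the Query DSL also has a way of constructing compound queries. Let us take a look at an example that uses the bool query, composed of some of the query types we have already seen.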

$ curl -i http://localhost:9200/catalog/books/_search?pretty -d '
{
   "size": 10,
   "_source": [ "title", "publisher" ],
   "query": {
       "bool" : {
          "must" : [
              {
                  "range" : {
                      "rating" : { "gte" : 4 }
                  }
              },
              {
                  "has_child" : {
                      "type" : "authors",
                      "query" : {
                          "term" : {
                              "last_name" : "Gormley"
                          }
                      }
                  }
              },
              {
                  "nested": {
                      "path": "categories",
                      "query" : {
                          "match": {
                              "categories.name" : "search"
                          }
                      }
                  }
              }
          ]
       }
    }
}'

HTTP/1.1 200 OK
content-type: application/json; charset=UTF-8
content-length: 531

{
  "took" : 79,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 1,
    "max_score" : 3.0925694,
    "hits" : [
      {
        "_index" : "catalog",
        "_type" : "books",
        "_id" : "978-1449358549",
        "_score" : 3.0925694,
        "_source" : {
          "publisher" : "O'Reilly",
          "title" : "Elasticsearch: The Definitive Guide.  ..."
        }
      }
    ]
  }
}

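It would be fair to say that the search APIs of Elasticsearch, powered by the Query DSL, are extremely flexible, easy to use and expressive. It is also worth mentioning that, besides queries, the search APIs support the concept of filters, which offer another option for excluding documents from the search results.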
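Since a bool query is just nested JSON, compound queries are convenient to assemble in code. Below is a minimal sketch of such a builder; the function `bool_query` is hypothetical, and moving the rating condition into the filter clause is a choice made here to show a filter, which restricts the matches without contributing to the score.

```python
import json


def bool_query(must=(), filters=(), must_not=()):
    """Assemble a bool query, leaving out the clauses that are empty."""
    clauses = {
        "must": list(must),          # must match, contributes to the score
        "filter": list(filters),     # must match, does not affect the score
        "must_not": list(must_not),  # must not match
    }
    return {"bool": {name: items for name, items in clauses.items() if items}}


query = bool_query(
    must=[{"nested": {"path": "categories",
                      "query": {"match": {"categories.name": "search"}}}}],
    filters=[{"range": {"rating": {"gte": 4}}}],
)
print(json.dumps({"size": 10, "query": query}, indent=2))
```

Conditions that are plain yes/no checks, such as ranges or exact terms, are natural candidates for the filter clause.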

7. Mutations by Query

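Surprisingly (or not), Elasticsearch can use queries to perform mutations, such as updates or deletes, on the documents in an index. For example, the following snippet removes from our catalog all the low-rated books published by Manning.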

$ curl -i http://localhost:9200/catalog/books/_delete_by_query?pretty -d '
{
   "query": {
      "bool": {
          "must": [
              { "range" : { "rating" : { "lt" : 3 } } }
          ],
          "filter": [
             { "term" :  { "publisher" : "Manning" } }
          ]
      }
   }
}'

HTTP/1.1 200 OK
content-type: application/json; charset=UTF-8
content-length: 296

{
  "took" : 12,
  "timed_out" : false,
  "total" : 0,
  "deleted" : 0,
  "batches" : 0,
  "version_conflicts" : 0,
  "noops" : 0,
  "retries" : {
    "bulk" : 0,
    "search" : 0
  },
  "throttled_millis" : 0,
  "requests_per_second" : -1.0,
  "throttled_until_millis" : 0,
  "failures" : [ ]
}

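It uses the same Query DSL and, to illustrate how filtering can be used, includes a filter as part of the query. Instead of returning the matched documents, however, it applies the update or delete modification to them.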

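The delete by query API can be used to overcome the limitation of the regular delete API and remove child documents in case their parent has been deleted.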

8. Know Your Queries Better

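Sometimes you may find that your search queries return documents in an order you do not expect, ranking some documents higher than others. To help you out, Elasticsearch provides two very helpful APIs. One of them is the explain API, which computes a score explanation for a query (and for a specific document, if needed).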

The explanation can be received by specifying the explain parameter as part of the query:

$ curl -i http://localhost:9200/catalog/books/_search?pretty -d '
{
   "size": 10,
   "explain": true,
   "query": {
        "term" : {
            "publisher" : "Manning"
        }
    }
}'

Or by using the dedicated explain API endpoint with a specific document, for example:

$ curl -i http://localhost:9200/catalog/books/978-1617292774/_explain?pretty -d '
{
   "query": {
        "term" : {
            "publisher" : "Manning"
        }
    }
}'

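The responses are deliberately left out here because they carry a lot of (useful) detail. Another very helpful feature of Elasticsearch is the validate API, which validates a query without actually executing it, for example: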

$ curl -i http://localhost:9200/catalog/books/_validate/query?pretty -d ' {
   "query": {
        "term" : {
            "publisher" : "Manning"
        }
    }                            
}'

HTTP/1.1 200 OK
content-type: application/json; charset=UTF-8
content-length: 98

{
  "valid" : true,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "failed" : 0
  }
}

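Both of these APIs are very useful for troubleshooting relevance or for analyzing a potentially expensive search query without executing it on a live Elasticsearch cluster.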
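To give an idea of the arithmetic a score explanation walks through, here is a sketch of the BM25 formula that Elasticsearch 5.x uses by default, applied to the term query on publisher. The input statistics are an assumption: the source does not show them, but a shard where both of its 2 documents carry the term Manning would yield the score of 0.18232156 reported earlier. Length normalization is left out on the assumption that norms are disabled for keyword fields.

```python
import math


def bm25_idf(doc_freq: int, doc_count: int) -> float:
    """Inverse document frequency as defined by BM25."""
    return math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))


def bm25_tf_no_norms(freq: float, k1: float = 1.2) -> float:
    """Term frequency factor when field length normalization is disabled."""
    return freq * (k1 + 1) / (freq + k1)


# Assumed shard statistics: 2 documents, both containing the term.
score = bm25_idf(doc_freq=2, doc_count=2) * bm25_tf_no_norms(freq=1)
print(round(score, 8))
```

The explain output lists these same ingredients (docFreq, docCount, termFreq and so on) with their actual values, so the guesswork above is not needed against a real cluster.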

9. From Search to Insights

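Often you may find yourself in a situation where search alone is not enough and you need some kind of aggregation on top of the matches. A great example is faceting (or terms aggregations, as Elasticsearch calls it), where the search results are grouped into buckets.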

$ curl -i http://localhost:9200/catalog/books/_search?pretty -d '
{
   "query": {
        "match" : {
            "description" : "elasticsearch"
        }
    },
    "aggs" : {
        "publisher" : {
            "terms" : { "field" : "publisher" }
        }
    }
}'

HTTP/1.1 200 OK
content-type: application/json; charset=UTF-8
content-length: 3447

{
  "took" : 176,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 3,
    "max_score" : 0.38828257,
    "hits" : [
      {
          ...
      }
    ]
  },
  "aggregations" : {
    "publisher" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [
        {
          "key" : "Manning",
          "doc_count" : 2
        },
        {
          "key" : "O'Reilly",
          "doc_count" : 1
        }
      ]
    }
  }
}

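In this example, along with the search query, we asked Elasticsearch to count the documents by publisher. In general, the search query may be omitted completely and only aggregations sent in the request body, for example: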

$ curl -i http://localhost:9200/catalog/books/_search?pretty -d '
{
  "aggs" : {
      "authors": {
        "children": {
          "type" : "authors"
        },
        "aggs": {
          "top-authors": {
            "terms": {
              "script" : {
                "inline": "doc[\"first_name\"].value + \" \" + doc[\"last_name\"].value",
                "lang": "painless"
              },
              "size": 10
            }
          }
        }
      }
  }
}'

HTTP/1.1 200 OK
content-type: application/json; charset=UTF-8
content-length: 1031
{
  "took": 381,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 5,
    "max_score": 1,
    "hits": [
      ...
    ]
  },
  "aggregations": {
    "authors": {
      "doc_count": 6,
      "top-authors": {
        "doc_count_error_upper_bound": 0,
        "sum_other_doc_count": 0,
        "buckets": [
          {
            "key": "Clinton Gormley",
            "doc_count": 1
          },
          {
            "key": "Doug Turnbull",
            "doc_count": 1
          },
          {
            "key": "Matthew Lee Hinman",
            "doc_count": 1
          },
          {
            "key": "Radu Gheorghe",
            "doc_count": 1
          },
          {
            "key": "Roy Russo",
            "doc_count": 1
          },
          {
            "key": "Zachary Tong",
            "doc_count": 1
          }
        ]
      }
    }
  }
}

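In this slightly more complicated example we bucket the top authors, using Elasticsearch scripting support to compose the term from the author's first and last name: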

"script" : {                           
  "inline": "doc['first_name'].value + ' ' + doc['last_name'].value",  
  "lang": "painless"                 
}

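The list of supported aggregations is really impressive and includes bucket aggregations (some of which we have already tried), metrics aggregations, pipeline aggregations and matrix aggregations. Covering even one of these classes would need a tutorial of its own, so please look through them to get a deeper understanding of the purpose of each.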
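At its core, a terms aggregation is a group-and-count over the values of a field. The sketch below reproduces the publisher buckets from the first aggregation example on in-memory data; it is a conceptual model only, since the real aggregation runs per shard and merges the partial results, which is where fields such as doc_count_error_upper_bound come from.

```python
from collections import Counter

# The three books matched by the search, with shortened titles.
hits = [
    {"title": "Elasticsearch: The Definitive Guide", "publisher": "O'Reilly"},
    {"title": "Elasticsearch in Action", "publisher": "Manning"},
    {"title": "Relevant Search", "publisher": "Manning"},
]


def terms_aggregation(docs, field, size=10):
    """Count documents per distinct value of a field, largest buckets first."""
    counts = Counter(doc[field] for doc in docs)
    return [{"key": key, "doc_count": count}
            for key, count in counts.most_common(size)]


buckets = terms_aggregation(hits, "publisher")
print(buckets)
```

Metrics aggregations follow the same pattern but compute a value such as an average or a maximum per bucket instead of a count.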

10. Watch Your Cluster Breathing

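Elasticsearch clusters are living “creatures” and should be closely watched and monitored in order to spot problems proactively and react swiftly. The cluster health endpoint we have seen before is the easiest way to get an overall high-level status of the cluster.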

$ http http://localhost:9200/_cluster/health

HTTP/1.1 200 OK
content-encoding: gzip
content-type: application/json; charset=UTF-8
transfer-encoding: chunked

{
    "active_primary_shards": 5,
    "active_shards": 5,
    "active_shards_percent_as_number": 20.0,
    "cluster_name": "es-catalog",
    "delayed_unassigned_shards": 0,
    "initializing_shards": 0,
    "number_of_data_nodes": 1,
    "number_of_in_flight_fetch": 0,
    "number_of_nodes": 1,
    "number_of_pending_tasks": 0,
    "relocating_shards": 0,
    "status": "red",
    "task_max_waiting_in_queue_millis": 0,
    "timed_out": false,
    "unassigned_shards": 20
}

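If your cluster goes red (as in the example above), there is certainly a problem to fix. To help you there, Elasticsearch provides the cluster stats API, the cluster state API, the cluster node-level stats API and the cluster node indices stats API.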
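The numbers in the red response above are consistent with each other, and checking that by hand is a useful habit when reading health output. The sketch below assumes that the percentage is the share of active shards among all active, initializing and unassigned shards; this matches the sample but is an inference, not a quote from the Elasticsearch source.

```python
# Relevant fields of the red cluster health response shown above.
health = {
    "active_shards": 5,
    "initializing_shards": 0,
    "unassigned_shards": 20,
}


def active_shards_percent(h: dict) -> float:
    """Share of shards that are active, in percent."""
    total = h["active_shards"] + h["initializing_shards"] + h["unassigned_shards"]
    return 100.0 * h["active_shards"] / total


percent = active_shards_percent(health)
print(percent)
```

Only a fifth of the shards are assigned here, which is what one would expect after a three-node cluster has shrunk to a single node.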

Standing a bit aside is another group of very important APIs, the cat APIs. They differ in the sense that their output is not JSON but text, compact and aligned, which suits terminals.

11. Conclusions

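In this part of the tutorial we explored quite a lot of Elasticsearch features through its RESTful APIs, using command-line tools only. All in all, it is just a tiny fraction of what Elasticsearch offers through its APIs, and the official documentation is a great place to learn about the rest. Hopefully at this point we are comfortable enough with Elasticsearch and know how to work with it.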

12. What's Next

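In the next part of the tutorial we are going to learn about the several kinds of native APIs that Elasticsearch offers to Java/JVM developers. These APIs are the essential building blocks of any Java/JVM application that leverages Elasticsearch capabilities.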

Translated from: https://www.javacodegeeks.com/2017/02/elasticsearch-java-developers-elasticsearch-command-line.html
