Elasticsearch Quick Start

feixiang2039

Article link: https://blog.csdn.net/feixiang2039/article/details/100609936

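0. Quick Overview

Elasticsearch is an open-source distributed search engine built on Apache Lucene. Using Lucene directly means writing code against its API, which is cumbersome. Elasticsearch wraps Lucene and exposes a REST API that any language can call.

ES can build full-text indexes over large volumes of data, and it can be used directly as a NoSQL database. Alternatively, it can pull data from an existing system's database and provide search on top of it. It also works together with existing tools, such as the well-known ELK stack.

This article starts from scratch and explains how to set up your own Elasticsearch search engine. It uses version 7.3.1, the latest release at the time of writing.

To stay compatible with earlier versions, the API calls used in this article may differ slightly from the latest API. In particular, the examples put a type name in the URL (/index/type/id), which 7.x still accepts but deprecates in favor of /index/_doc/id. The main goal here is to understand the concepts of Elasticsearch; the latest REST API will be covered later.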

1. Installation

macOS: elasticsearch-7.3.1-darwin-x86_64.tar.gz

Linux: elasticsearch-7.3.1-linux-x86_64.tar.gz

Windows: elasticsearch-7.3.1-windows-x86_64.zip

After downloading, extract the archive and start elasticsearch from the bin directory:

$ tar -xvf elasticsearch-7.3.1-darwin-x86_64.tar.gz
$ cd elasticsearch-7.3.1/bin
$ ./elasticsearch

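On macOS you can also install with brew, although the version brew installs by default is not the latest.

Other installation methods are described in the official Elasticsearch documentation.

A single-node Elasticsearch instance is now running. Use a browser or curl to access localhost:9200 and view the node information: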

$ curl localhost:9200
{
  "name" : "192.168.0.185",
  "cluster_name" : "elasticsearch",
  "cluster_uuid" : "lekApH34R1G1h02NfdFCBw",
  "version" : {
    "number" : "7.3.1",
    "build_flavor" : "default",
    "build_type" : "tar",
    "build_hash" : "4749ba6",
    "build_date" : "2019-08-19T20:19:25.651794Z",
    "build_snapshot" : false,
    "lucene_version" : "8.1.0",
    "minimum_wire_compatibility_version" : "6.8.0",
    "minimum_index_compatibility_version" : "6.0.0-beta1"
  },
  "tagline" : "You Know, for Search"
}

Check the cluster status:

$ curl -X GET "localhost:9200/_cat/health?v&pretty"
epoch      timestamp cluster       status node.total node.data shards pri relo init unassign pending_tasks max_task_wait_time active_shards_percent
1567861029 12:57:09  elasticsearch yellow          1         1      1   1    0    0        1             0                  -                 50.0%
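The status is yellow rather than green because this is a single node: the primary shard is assigned, but its replica cannot be placed on the same node, so one shard is unassigned and only 50% of shards are active. If you want to read these columns from a script, the following is a minimal sketch. The helper name parse_cat_table is made up for this example, and it assumes every cell is non-empty and contains no spaces, which holds for the output above.

```python
def parse_cat_table(text):
    """Parse the whitespace-aligned table printed by a _cat API called with ?v."""
    lines = [line for line in text.strip().splitlines() if line.strip()]
    header = lines[0].split()
    # Pair each header name with the cell in the same position of every row.
    return [dict(zip(header, line.split())) for line in lines[1:]]


# Sample copied from the output shown above.
SAMPLE = """\
epoch      timestamp cluster       status node.total node.data shards pri relo init unassign pending_tasks max_task_wait_time active_shards_percent
1567861029 12:57:09  elasticsearch yellow          1         1      1   1    0    0        1             0                  -                 50.0%
"""

health = parse_cat_table(SAMPLE)[0]
print(health["status"], health["unassign"], health["active_shards_percent"])
```

For anything beyond a quick check, asking the _cat API for JSON with format=json is likely a more robust choice than splitting on whitespace.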

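2. Basic Concepts

A single Elasticsearch instance is called a node. Multiple nodes that share the same cluster name form a cluster. A client can connect to any node to read and write data.

An Index is the unit in which Elasticsearch manages data, roughly equivalent to a single database. Documents are stored in an Index, and each Index has its own settings.

As in MongoDB, the data stored in Elasticsearch is JSON. Each piece of data is called a Document, roughly equivalent to a row in a database. Because the data is JSON, it can be nested to multiple levels, and a Document is not required to contain every field: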

{
  "name": "Elasticsearch Denver", 
  "organizer": "Lee", 
  "location": {
    "name": "Denver, Colorado, USA",
    "geolocation": "39.7392, -104.9847" 
  }
}

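Documents in the same Index are not required to share a structure, but keeping the structure consistent is recommended because it helps search efficiency.

A Type is a category of Documents, roughly equivalent to a table in a database. Unlike an Index, it is not a separate store on disk but a logical, virtual grouping. For example, a log Index could contain two Types, info and warn.

The definition of all fields in a Type is called the mapping. If an inserted Document contains a new field, Elasticsearch infers the type of that field automatically and adds it to the mapping. Note that an Index created in 6.x or later may contain only one Type, and 7.x deprecates Types.

The data of an Index is stored in one or more primary shards, and each primary shard can have multiple replica shards. When a primary shard fails, one of its replica shards is promoted to primary.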

3. Index Management

To create an Index by hand, send a PUT request. The following creates an Index named new-index:

$ curl -XPUT 'localhost:9200/new-index'
{"acknowledged":true,"shards_acknowledged":true,"index":"new-index"}
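The create request above sends no body, so the Index gets the defaults (in 7.x, one primary shard with one replica, which matches the pri and rep columns shown later). As a rough sketch of how the shard counts could be set explicitly, the snippet below builds the same PUT request with a settings body using Python's standard library. The function name and the default host are assumptions for this example; number_of_shards and number_of_replicas are the index settings that control the counts. The request is only built, not sent.

```python
import json
import urllib.request


def build_create_index_request(index, shards=1, replicas=1,
                               host="http://localhost:9200"):
    """Build (but do not send) a PUT request that creates an index."""
    body = {
        "settings": {
            "number_of_shards": shards,      # primary shards
            "number_of_replicas": replicas,  # replica copies per primary
        }
    }
    return urllib.request.Request(
        url=f"{host}/{index}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


req = build_create_index_request("new-index", shards=1, replicas=0)
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
# To send it to a running node: urllib.request.urlopen(req)
```

With replicas=0 there is nothing left unassigned on a single node, so such an Index should report green instead of yellow.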

List all indices on the current node:

$ curl -XGET 'http://localhost:9200/_cat/indices?v'
health status index        uuid                   pri rep docs.count docs.deleted store.size pri.store.size
yellow open   get-together eaONKb_vQnCgcwswbQjOqw   1   1          1            0      4.1kb          4.1kb
yellow open   new-index    Knz7TScFRneOGWLNRTV8TQ   1   1          0            0       283b           283b

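Under the data directory you can find a folder named after the uuid of each Index; it holds the data of that Index.

Inserting a new Document into an Index that does not exist creates the Index automatically, as described below.

Delete an Index: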

$ curl -XDELETE 'localhost:9200/new-index'
{"acknowledged" : true}

4. Data Management

Adding a Record

Send a PUT request to /Index/Type/ID to add a record to the Index.

$ curl -XPUT -H 'Content-type:application/json' 'localhost:9200/get-together/group/1?pretty' -d '{ 
  "name": "Elasticsearch Denver", 
  "organizer": "Lee" 
}'
{
  "_index" : "get-together",
  "_type" : "group",
  "_id" : "1",
  "_version" : 1,
  "result" : "created",
  "_shards" : {
    "total" : 2,
    "successful" : 1,
    "failed" : 0
  },
  "_seq_no" : 0,
  "_primary_term" : 1
}

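In the response, _version is the version number, which is incremented on every modification.

Adding the record above also created the get-together Index automatically and generated a mapping for the group Type.

View the automatically created mapping: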

$ curl 'localhost:9200/get-together/_mapping?pretty'
{
  "get-together" : {
    "mappings" : {
      "properties" : {
        "name" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        },
        "organizer" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        }
      }
    }
  }
}
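The mapping above follows a simple pattern: each JSON string became a text field with a keyword sub-field. The sketch below is a simplified model of that inference, not Elasticsearch's actual implementation. It covers only booleans, integers, floats, strings and nested objects, and it deliberately ignores date detection, arrays and null values. The function name infer_mapping is made up for this example.

```python
def infer_mapping(value):
    """Simplified model of the default dynamic mapping for one JSON value."""
    # bool must be tested before int: in Python, bool is a subclass of int.
    if isinstance(value, bool):
        return {"type": "boolean"}
    if isinstance(value, int):
        return {"type": "long"}
    if isinstance(value, float):
        return {"type": "float"}
    if isinstance(value, str):
        # Strings are indexed as analyzed text plus an exact-match keyword sub-field.
        return {
            "type": "text",
            "fields": {"keyword": {"type": "keyword", "ignore_above": 256}},
        }
    if isinstance(value, dict):
        return {"properties": {k: infer_mapping(v) for k, v in value.items()}}
    raise TypeError(f"not covered by this sketch: {type(value).__name__}")


doc = {"name": "Elasticsearch Denver", "organizer": "Lee"}
mapping = infer_mapping(doc)
print(mapping["properties"]["name"])
```

For the document indexed earlier, the result has the same shape as the properties section returned by the _mapping request.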

You can also add a record without specifying an ID, in which case a POST request is required:

$ curl -XPOST -H 'Content-type:application/json' 'localhost:9200/get-together/group?pretty' -d '{
  "name": "Elastic Engineer",
  "organizer": "Kyle", 
  "location":"China"
}'
{
  "_index" : "get-together",
  "_type" : "group",
  "_id" : "VNW1C20BXEA3tJj8u70A",
  "_version" : 1,
  "result" : "created",
  "_shards" : {
    "total" : 2,
    "successful" : 1,
    "failed" : 0
  },
  "_seq_no" : 1,
  "_primary_term" : 1
}

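In the response, _id is the automatically generated string.

The request above also introduced a new location field, which Elasticsearch adds to the mapping automatically.

Viewing a Record

Look up a record by ID: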
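The two ways of adding a record (PUT with an ID, POST without one) can be captured in one small helper. This is only a sketch: the function name and default host are assumptions, and it targets the typeless /index/_doc endpoint that 7.x recommends rather than the /index/group path used in this article. The request is built but not sent.

```python
import json
import urllib.request


def build_index_request(index, document, doc_id=None,
                        host="http://localhost:9200"):
    """Build (but do not send) a request that adds one document."""
    if doc_id is None:
        # No ID given: POST lets Elasticsearch generate one.
        url, method = f"{host}/{index}/_doc", "POST"
    else:
        # Explicit ID: PUT creates the document or replaces an existing one.
        url, method = f"{host}/{index}/_doc/{doc_id}", "PUT"
    return urllib.request.Request(
        url,
        data=json.dumps(document).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method=method,
    )


with_id = build_index_request(
    "get-together", {"name": "Elasticsearch Denver", "organizer": "Lee"}, doc_id=1)
without_id = build_index_request(
    "get-together",
    {"name": "Elastic Engineer", "organizer": "Kyle", "location": "China"})
print(with_id.get_method(), with_id.full_url)
print(without_id.get_method(), without_id.full_url)
```

Passing either request to urllib.request.urlopen against a running node should return a JSON body like the ones shown above, with _type reported as _doc.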

$ curl 'localhost:9200/get-together/group/1?pretty'
{
  "_index" : "get-together",
  "_type" : "group",
  "_id" : "1",
  "_version" : 1,
  "_seq_no" : 0,
  "_primary_term" : 1,
  "found" : true,
  "_source" : {
    "name" : "Elasticsearch Denver",
    "organizer" : "Lee"
  }
}

If the record does not exist, found is false:

$ curl 'localhost:9200/get-together/group/2?pretty'
{
  "_index" : "get-together",
  "_type" : "group",
  "_id" : "2",
  "found" : false
}
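Client code usually only cares about the stored document, so a common pattern is to check found and return _source. A minimal sketch, with a made-up helper name and the two response bodies above (shortened) as sample input:

```python
import json


def extract_source(response_text):
    """Return the stored document from a GET-by-ID response, or None if absent."""
    body = json.loads(response_text)
    return body["_source"] if body.get("found") else None


FOUND = """{
  "_index": "get-together", "_type": "group", "_id": "1", "_version": 1,
  "found": true,
  "_source": {"name": "Elasticsearch Denver", "organizer": "Lee"}
}"""
MISSING = '{"_index": "get-together", "_type": "group", "_id": "2", "found": false}'

print(extract_source(FOUND))
print(extract_source(MISSING))
```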

Updating a Record

Updating works the same way as adding: send another PUT request to the same ID and the record is updated:

$ curl -XPUT -H 'Content-type:application/json' 'localhost:9200/get-together/group/1?pretty' -d '{
  "name": "Elasticsearch Denver",
  "organizer": "Matt"
}'
{
  "_index" : "get-together",
  "_type" : "group",
  "_id" : "1",
  "_version" : 2,
  "result" : "updated",
  "_shards" : {
    "total" : 2,
    "successful" : 1,
    "failed" : 0
  },
  "_seq_no" : 2,
  "_primary_term" : 1
}

After the update, _version is incremented and result is updated.

Deleting a Record

Send a DELETE request to remove the corresponding record:
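To make the created/updated behaviour concrete, here is a toy in-memory model of what a PUT to a fixed ID does. The TinyIndex class is hypothetical and has nothing to do with how Elasticsearch stores data; it only mimics the _version and result fields seen in the responses, and it does not model deletes. One point it illustrates: a PUT replaces the whole document, so fields left out of the second request are gone.

```python
class TinyIndex:
    """Toy model of PUT-by-ID semantics: whole-document replace plus a version counter."""

    def __init__(self):
        self._docs = {}

    def put(self, doc_id, source):
        previous = self._docs.get(doc_id)
        version = 1 if previous is None else previous["_version"] + 1
        # The new source replaces the old one entirely; nothing is merged.
        self._docs[doc_id] = {"_version": version, "_source": dict(source)}
        return {
            "_id": doc_id,
            "_version": version,
            "result": "created" if previous is None else "updated",
        }

    def get(self, doc_id):
        doc = self._docs.get(doc_id)
        return None if doc is None else doc["_source"]


index = TinyIndex()
first = index.put("1", {"name": "Elasticsearch Denver", "organizer": "Lee"})
second = index.put("1", {"name": "Elasticsearch Denver", "organizer": "Matt"})
print(first["result"], first["_version"])
print(second["result"], second["_version"])
print(index.get("1"))
```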

$ curl -XDELETE 'localhost:9200/get-together/group/1'

5. Querying Data

With no parameters, a search returns all records:

$ curl 'localhost:9200/get-together/group/_search?pretty'
{
  "took" : 3,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 2,
      "relation" : "eq"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "get-together",
        "_type" : "group",
        "_id" : "VNW1C20BXEA3tJj8u70A",
        "_score" : 1.0,
        "_source" : {
          "name" : "Elastic Engineer",
          "organizer" : "Kyle",
          "location" : "China"
        }
      },
      {
        "_index" : "get-together",
        "_type" : "group",
        "_id" : "1",
        "_score" : 1.0,
        "_source" : {
          "name" : "Elasticsearch Denver",
          "organizer" : "Matt"
        }
      }
    ]
  }
}

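took is the time spent in milliseconds, timed_out tells whether the search timed out, and _shards reports how many shards were queried. The matching Documents are listed under hits.hits, and hits.total.value is the number of matches.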
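When a script consumes search results, it typically pulls out just these fields. The following sketch shows one way to do it; the helper name and the keys of the returned dictionary are made up for this example, and the sample is the response above with the _source bodies trimmed. It assumes the 7.x response format, in which hits.total is an object with value and relation.

```python
import json


def summarize_search(response_text):
    """Pick the commonly used fields out of a 7.x search response."""
    body = json.loads(response_text)
    hits = body["hits"]
    return {
        "took_ms": body["took"],
        "timed_out": body["timed_out"],
        "shards_queried": body["_shards"]["total"],
        "total": hits["total"]["value"],
        "ids": [hit["_id"] for hit in hits["hits"]],
    }


SAMPLE = """{
  "took": 3,
  "timed_out": false,
  "_shards": {"total": 1, "successful": 1, "skipped": 0, "failed": 0},
  "hits": {
    "total": {"value": 2, "relation": "eq"},
    "max_score": 1.0,
    "hits": [
      {"_index": "get-together", "_type": "group", "_id": "VNW1C20BXEA3tJj8u70A", "_score": 1.0},
      {"_index": "get-together", "_type": "group", "_id": "1", "_score": 1.0}
    ]
  }
}"""

summary = summarize_search(SAMPLE)
print(summary)
```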

To search for a keyword, pass it in the q parameter; size limits the number of hits returned:

$ curl "localhost:9200/get-together/group/_search?q=elasticsearch&size=1&pretty"
{
  "took" : 5,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1,
      "relation" : "eq"
    },
    "max_score" : 0.6931472,
    "hits" : [
      {
        "_index" : "get-together",
        "_type" : "group",
        "_id" : "1",
        "_score" : 0.6931472,
        "_source" : {
          "name" : "Elasticsearch Denver",
          "organizer" : "Matt"
        }
      }
    ]
  }
}

To restrict the search to one field, for example to find records whose name contains elasticsearch, use q=name:elasticsearch.

Omitting the type searches across all types in the Index:

$ curl 'localhost:9200/get-together/_search?q=elasticsearch&pretty'

Search across several indices:

$ curl 'localhost:9200/get-together,new-index/_search?q=lee&pretty'

Search across all indices:

$ curl 'localhost:9200/_search?q=elasticsearch&pretty'
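All of the URL variants above differ only in the index part of the path and in the query parameters. As a small sketch of that pattern, the helper below assembles such URLs with urllib.parse.urlencode, which also takes care of escaping characters such as the colon in name:elasticsearch. The function name and default host are assumptions for this example.

```python
from urllib.parse import urlencode


def build_search_url(indices=None, query=None, size=None,
                     host="http://localhost:9200"):
    """Build a URI-search URL for zero, one or several indices."""
    # No index list means "search every index"; several are joined with commas.
    path = "/_search" if not indices else "/" + ",".join(indices) + "/_search"
    params = {}
    if query is not None:
        params["q"] = query
    if size is not None:
        params["size"] = size
    return host + path + ("?" + urlencode(params) if params else "")


print(build_search_url(["get-together"], "name:elasticsearch", size=1))
print(build_search_url(["get-together", "new-index"], "lee"))
print(build_search_url(query="elasticsearch"))
```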

The searches above are URL-based. The alternative is to send the query as a JSON request body:

$ curl -H 'Content-type:application/json' 'localhost:9200/get-together/group/_search?pretty' -d '{ 
	"query": { 
		"query_string": { 
			"query": "elasticsearch" 
		} 
	} 
}'
{
  "took" : 5,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1,
      "relation" : "eq"
    },
    "max_score" : 0.6931472,
    "hits" : [
      {
        "_index" : "get-together",
        "_type" : "group",
        "_id" : "1",
        "_score" : 0.6931472,
        "_source" : {
          "name" : "Elasticsearch Denver",
          "organizer" : "Matt"
        }
      }
    ]
  }
}

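By default all fields are searched. To choose the field to search, set "default_field": "name" inside query_string.

When the query contains several words, the default is to match any one of them. To require all of them, set "default_operator": "AND" inside query_string:

$ curl -H 'Content-type:application/json' 'localhost:9200/get-together/group/_search?pretty' -d '{ 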
	"query": { 
		"query_string": { 
			"query": "elasticsearch san francisco", 
			"default_field": "name", 
			"default_operator": "AND" 
		} 
	} 
}'

The same condition can be written directly in query: "query": "name:elasticsearch AND name:san AND name:francisco"

query_string is very powerful; its syntax is inherited from Lucene.

A term query can be used as well:
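The two forms describe the same search: one sets default_field and default_operator, the other spells out the field and the AND in the query text. A short sketch that builds both request bodies follows; the helper names are made up for this example, and the output is only the JSON body to pass to curl with -d.

```python
import json


def query_string_body(query, default_field=None, default_operator=None):
    """Build the JSON body of a query_string search."""
    options = {"query": query}
    if default_field is not None:
        options["default_field"] = default_field
    if default_operator is not None:
        options["default_operator"] = default_operator
    return {"query": {"query_string": options}}


def all_terms_in_field(field, terms):
    """Write 'every term must appear in field' in query_string syntax."""
    return " AND ".join(f"{field}:{term}" for term in terms)


with_options = query_string_body(
    "elasticsearch san francisco", default_field="name", default_operator="AND")
inline = query_string_body(
    all_terms_in_field("name", ["elasticsearch", "san", "francisco"]))
print(json.dumps(with_options, indent=2))
print(json.dumps(inline, indent=2))
```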

$ curl -H 'Content-type:application/json' 'localhost:9200/get-together/group/_search?pretty' -d '{ 
	"query": { 
		"term": { 
			"name": "elasticsearch" 
		} 
	} 
}'
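A term query looks up the exact term it is given and does not analyze it, whereas the name field is a text field whose values were analyzed at index time; the default standard analyzer lowercases its tokens. That is why the lowercase elasticsearch matches the stored "Elasticsearch Denver". The sketch below imitates this with a rough stand-in for the analyzer (split on non-alphanumeric characters, then lowercase); the real standard analyzer handles Unicode text and much more, so treat this only as an illustration.

```python
import re


def simple_analyze(text):
    """Rough stand-in for the standard analyzer: tokenize, then lowercase."""
    return [token.lower() for token in re.findall(r"[A-Za-z0-9]+", text)]


def term_matches(field_value, term):
    """A term query compares the term as given against the indexed tokens."""
    return term in simple_analyze(field_value)


print(simple_analyze("Elasticsearch Denver"))
print(term_matches("Elasticsearch Denver", "elasticsearch"))
print(term_matches("Elasticsearch Denver", "Elasticsearch"))
```

Under this model the capitalized term finds nothing, which mirrors what a term query for "Elasticsearch" on the name field would do.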

Elasticsearch provides many more query APIs, which will be covered in detail later.
