Elasticsearch 7.1.x Study Notes

醉殇无痕

Article link: https://blog.csdn.net/qq_40235064/article/details/90766512

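This article covers installing and configuring Elasticsearch 7.1.x, REST operations, core concepts, and the Java client. Topics include common problems in a single-node installation, installing the head plugin, the GET, PUT, POST, and DELETE operations of the RESTful interface, creating and updating indices, shards and replicas, the Java High Level REST Client API, search types, aggregations, pagination, and multi-index queries, as used in big-data scenarios.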

Software versions
jdk-8u192-linux-x64.tar.gz
elasticsearch-7.1.0-linux-x86_64.tar.gz

1. Single-Node Installation

Download
	# wget https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-7.1.0-linux-x86_64.tar.gz
	
Extract
	# tar -zxvf elasticsearch-7.1.0-linux-x86_64.tar.gz -C /opt/tools/elk/
	# tar -zxvf jdk-8u192-linux-x64.tar.gz -C /opt/tools/

Configure environment variables (as root)
	# vi /etc/profile

	# set jdk path
	export JAVA_HOME=/opt/tools/jdk1.8.0
	
	# set es path
	export ES_HOME=/opt/tools/elk/elasticsearch-7.1.0
	
	export PATH=$PATH:$JAVA_HOME/bin:$ES_HOME/bin

Apply the configuration
	# source /etc/profile

Verify
	# echo $JAVA_HOME
	/opt/tools/jdk1.8.0
	
	# echo $ES_HOME
	/opt/tools/elk/elasticsearch-7.1.0

Edit the configuration file $ES_HOME/config/elasticsearch.yml

network.host: 192.168.93.252    # IP address of the current host

Start

# bin/elasticsearch
# bin/elasticsearch -d    (run in the background)

Errors

Problem 1

[WARN ][o.e.b.ElasticsearchUncaughtExceptionHandler] [bigdatademo] uncaught exception in thread [main]
org.elasticsearch.bootstrap.StartupException: java.lang.RuntimeException: can not run elasticsearch as root

Cause: Elasticsearch refuses to start under the root account.
Solution: create a regular user.

# useradd yskang
# passwd yskang
Changing password for user yskang.
New password: 
BAD PASSWORD: it is too simplistic/systematic
BAD PASSWORD: is too simple
Retype new password: 
passwd: all authentication tokens updated successfully.

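Grant ownership:
# chown  -R yskang:yskang /opt/*
Then switch to the yskang user and start Elasticsearch again.

To give the regular user root privileges through sudo (use which to find where a command lives),
edit the sudoers file as root with visudo:

  # visudo
  Add the following:
  ## Allow root to run any commands anywhere
  root      ALL=(ALL)       ALL
  yskang    ALL=(ALL)       NOPASSWD: ALL
  Format: user  host-or-network=(run-as identity; optional, defaults to root)  allowed commands
  
  Usage
  $ sudo service iptables status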

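Problem 2

java.lang.UnsupportedOperationException: seccomp unavailable: requires kernel 3.5+ with CONFIG_SECCOMP and CONFIG_SECCOMP_FILTER compiled in
	at org.elasticsearch.bootstrap.SystemCallFilter

Cause: the long stack trace looks alarming, but it is only a warning, caused mainly by a Linux kernel that is too old.
Solution: (1) install a newer Linux release; or (2) ignore the warning, which does not affect normal use.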

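Problem 3

ERROR:  bootstrap checks failed
[1]: max file descriptors [4096] for elasticsearch process is too low, increase to at least [65535]

Cause: the maximum number of file descriptors the user may open is too low.

Solution:
Switch to root, edit /etc/security/limits.conf, and append the following (log in again as the regular user for the new limits to apply):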

# vi /etc/security/limits.conf
* soft nofile 65536
* hard nofile 262144
* soft nproc 32000
* hard nproc 32000

Problem 4

[2]:  max number of threads [1024] for user [yskang] is too low, increase to at least [4096]

Cause: the maximum number of threads the user may create is too low.
Solution: switch to root and edit /etc/security/limits.d/90-nproc.conf

# vi /etc/security/limits.d/90-nproc.conf
Find
* soft nproc 1024 
and change it to:
* soft nproc 4096

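Problem 5

[3]: max virtual memory areas vm.max_map_count [65530] is too low, increase to at least [262144]

Cause: the limit on virtual memory areas (vm.max_map_count) is too low.
Solution: switch to root, edit /etc/sysctl.conf, and append the following:

vm.max_map_count=655360

After saving, run the command below to apply the setting, then start ES again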

# sysctl -p
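
The three limits raised in Problems 3 to 5 can be checked before ES is started. The following is a minimal Python sketch, not part of ES: the thresholds are the ones quoted in the error messages above, and `failed_checks` and `read_current_limits` are hypothetical helper names. The authoritative checks remain the bootstrap checks that ES runs itself.

```python
# Minimum values quoted in the bootstrap-check error messages above.
REQUIRED = {
    "max file descriptors": 65535,
    "max number of threads": 4096,
    "vm.max_map_count": 262144,
}

def failed_checks(current):
    """Return one message per limit in `current` that is below its minimum.

    A value of -1 means "unlimited" (resource.RLIM_INFINITY) and always passes.
    """
    problems = []
    for name, minimum in REQUIRED.items():
        value = current.get(name)
        if value is None or value == -1:
            continue
        if value < minimum:
            problems.append(
                f"{name} [{value}] is too low, increase to at least [{minimum}]"
            )
    return problems

def read_current_limits():
    """Read the current user's soft limits (Linux only)."""
    import resource  # Unix-only module, imported lazily
    nofile, _ = resource.getrlimit(resource.RLIMIT_NOFILE)
    nproc, _ = resource.getrlimit(resource.RLIMIT_NPROC)
    with open("/proc/sys/vm/max_map_count") as f:
        max_map_count = int(f.read().strip())
    return {
        "max file descriptors": nofile,
        "max number of threads": nproc,
        "vm.max_map_count": max_map_count,
    }
```

Running `print(failed_checks(read_current_limits()))` as the user that will start ES lists whichever limits still need raising.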

Problem 6

[4]: system call filters failed to install; check the logs and fix your configuration or disable system call filters at your own risk

Cause: CentOS 6 does not support seccomp, while ES sets bootstrap.system_call_filter to true by default and checks for it. The check fails, and ES cannot start.
Solution: add the following to elasticsearch.yml

$ vi elasticsearch.yml 
bootstrap.memory_lock: false   # do not lock memory (the ES process may be swapped)
bootstrap.system_call_filter: false   # disable system call filters

Problem 7

[5]: the default discovery settings are unsuitable for production use; at least one of [discovery.seed_hosts, discovery.seed_providers, cluster.initial_master_nodes] must be configured

Cause: the default discovery settings are not suitable for production; at least one of discovery.seed_hosts, discovery.seed_providers, or cluster.initial_master_nodes must be configured.
Solution: edit elasticsearch.yml

$ vi elasticsearch.yml
Take the line #cluster.initial_master_nodes: ["node-1", "node-2"] 
remove the leading # and change it to 
cluster.initial_master_nodes: ["bigdatademo"]
Note: bigdatademo is the name of the current node (by default its hostname). Remember to save the file.

After startup, verify that the service is running

$ curl http://192.168.93.252:9200
{
 "name" : "bigdatademo",
 "cluster_name" : "elasticsearch",
 "cluster_uuid" : "_na_",
 "version" : {
   "number" : "7.1.0",
   "build_flavor" : "default",
   "build_type" : "tar",
   "build_hash" : "606a173",
   "build_date" : "2019-05-16T00:43:15.323135Z",
   "build_snapshot" : false,
   "lucene_version" : "8.0.0",
   "minimum_wire_compatibility_version" : "6.8.0",
   "minimum_index_compatibility_version" : "6.0.0-beta1"
 },
 "tagline" : "You Know, for Search"
}

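2. Installing the head Plugin for ES

Install nodejs and npm
# yum -y install nodejs npm

Running yum install -y nodejs directly may fail because the nodejs package cannot be found.

In that case, set up the NodeSource repository first and then run yum install -y nodejs
$ curl --silent --location https://rpm.nodesource.com/setup_10.x | sudo bash -

Then
$ sudo yum -y install nodejs
npm is installed along with it.

Check the versions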
$ node -v
v10.16.0
$ npm -v
6.9.0


Install git
$ sudo yum -y install git

Download head
$ git clone https://github.com/mobz/elasticsearch-head.git
$ cd elasticsearch-head
$ npm install
Error: node npm install Error:CERT_UNTRUSTED
This is an SSL verification problem; turning off strict SSL checking with the command below works around it
npm config set strict-ssl false

Configure the head plugin
Edit Gruntfile.js and add the setting hostname: '*'

$ vi Gruntfile.js

connect: {
    server: {
        options: {
            port: 9100,
            base: '.',
            keepalive: true,
            hostname: '*'
        }
    }
}

Edit head/_site/app.js
Change the address head uses to connect to ES (replace localhost with the host's IP address)

$ vi _site/app.js
this.base_uri = this.config.base_uri || this.prefs.get("app-base_uri") || "http://192.168.93.252:9200";

ES configuration
Edit elasticsearch.yml and add the CORS settings (ES must be restarted for them to take effect)

$ vi config/elasticsearch.yml
http.cors.enabled: true
http.cors.allow-origin: "*"

Start the head plugin

$ cd elasticsearch-head/node_modules/grunt/
$ bin/grunt server &
Check the process
$ netstat -ntlp

View the ES cluster status in head
http://192.168.93.252:9100
head connects to ES successfully

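3. Basic Elasticsearch REST Operations

Introduction to REST

REST definition
REST (Representational State Transfer) is a software architectural style that Dr. Roy Fielding proposed in his doctoral dissertation in 2000.
It is an approach to designing and developing networked applications that reduces development complexity and improves scalability.
REST refers to a set of architectural constraints and principles; an application or design that satisfies them is RESTful.
The most important REST principle for web applications is that the interaction between client and server is stateless between requests. Every request from client to server must contain all the information needed to understand it. If the server restarts at any point between requests, the client is not notified. Stateless requests can also be answered by any available server, which suits environments such as cloud computing. Clients can cache data to improve performance.
On the server side, application state and functionality are divided into resources. Each resource gets a unique address through a URI (Uniform Resource Identifier). All resources share a uniform interface for transferring state between client and server, using the standard HTTP methods such as GET, PUT, POST, and DELETE.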

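REST resources

Resource	GET	PUT	POST	DELETE
Collection URL, e.g. http://example.com/products/	List the URLs of the members	Replace the whole collection with the given set of resources	Create or append a new resource in the collection	Delete the whole collection
Single-resource URL, e.g. http://example.com/products/1234	Get the details of the resource	Replace or create the resource	Create or append a new element under the resource	Delete the resource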

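Basic REST operations

Method	Purpose
GET	Get the current state of an object
PUT	Change the state of an object
POST	Create an object
DELETE	Delete an object
HEAD	Get header information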

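Common built-in ES REST endpoints

URL	Description
/index/_search	Search the data in the given index
/_aliases	Get or manage index aliases
/index/	View details of the given index
/index/type/	Create or operate on a type
/index/_mapping	Create or operate on the mapping
/index/_settings	Create or operate on the settings
/index/_open	Open the given index
/index/_close	Close the given index
/index/_refresh	Refresh the index (makes newly added content visible to search; does not guarantee the data is written to disk)
/index/_flush	Flush the index (triggers a Lucene commit)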

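cURL
Think of it as a tool for accessing URLs from the command line.
curl is an open-source file transfer tool that works on the command line using URL syntax; it makes common GET/POST requests easy.

Using cURL

-X  the HTTP request method: GET, POST, PUT, DELETE
-d  the data to send

Creating an index with cURL

Example, for an index named test:
$ curl -XPUT 'http://192.168.93.252:9200/test/'
Use PUT here; ES 7.x rejects a POST to a bare index name.
The following response means the index was created
{"acknowledged":true,"shards_acknowledged":true,"index":"test"}

Create a document
$ curl -XPOST http://192.168.93.252:9200/test/user/1 -d'{"name":"jack","age":26}'
{"error":"Content-Type header [application/x-www-form-urlencoded] is not supported","status":406}

Newer ES versions (6.0 and later) require an explicit Content-Type header and reject the request without it; older versions did not.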
$ curl -H "Content-Type: application/json" -XPOST http://192.168.93.252:9200/test/user/1 -d'{
"name":"jack","age":26}'
{"_index":"test","_type":"user","_id":"1","_version":1,"result":"created","_shards":{"total":2,"successful":1,"failed":0},"_seq_no":0,"_primary_term":1}
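
The same request can be prepared from a program. The sketch below uses only the Python standard library and builds the request without sending it, so the required header is easy to inspect. `build_index_request` is a hypothetical helper name, and the `doc_type` default of `_doc` is an assumption taken from the endpoint that the ES 7.1 deprecation warnings recommend.

```python
import json
import urllib.request

def build_index_request(base_url, index, doc_id, doc, doc_type="_doc"):
    """Build (but do not send) a PUT request that indexes `doc` under `doc_id`."""
    url = f"{base_url}/{index}/{doc_type}/{doc_id}"
    body = json.dumps(doc).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="PUT",
        # Without this header ES answers 406, as in the curl example above.
        headers={"Content-Type": "application/json"},
    )
```

Passing the returned object to `urllib.request.urlopen` would send it to a running node.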

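Differences between PUT and POST
PUT is idempotent and POST is not, so PUT suits updates and POST suits creation.
PUT and DELETE are idempotent: no matter how many times the operation is repeated, the result is the same.
POST is not idempotent, so repeated POSTs are a problem: sending the same POST request several times creates several resources.
Creation can use either POST or PUT. The difference is that POST acts on a collection resource (/articles), while PUT acts on a specific resource (/articles/123). Many resources use a database auto-increment key as their identifier, and only the server can say what identifier a new resource receives; in that case POST is required.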
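
The difference can be illustrated with a small in-memory model. This is only a sketch of the idempotency argument, not of how ES stores documents; the `Collection` class and its method names are hypothetical.

```python
import uuid

class Collection:
    """In-memory stand-in for a resource collection such as /articles."""

    def __init__(self):
        self.items = {}

    def put(self, item_id, body):
        """PUT /articles/<item_id>: create or replace; repeating it changes nothing."""
        self.items[item_id] = body
        return item_id

    def post(self, body):
        """POST /articles: the server picks a new identifier on every call."""
        item_id = uuid.uuid4().hex
        self.items[item_id] = body
        return item_id
```

Calling `put("123", doc)` twice leaves one item in the collection, while calling `post(doc)` twice leaves two.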

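Notes on creating an index
The index name must be all lowercase, must not start with an underscore, and must not contain commas
If no document ID is given explicitly, ES generates a random ID; this requires the POST method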

$ curl -H "Content-Type: application/json" -XPOST http://192.168.93.252:9200/test/user/ -d'{
"name":"john","age":18}'

Two ways to create a brand-new document
(1) Use an auto-generated ID (POST)

$ curl -H "Content-Type: application/json" -XPOST http://192.168.93.252:9200/test/user/ -d'{
"name":"john","age":18}'

(2) Add a parameter to the URL

$ curl -H "Content-Type: application/json" -XPUT http://192.168.93.252:9200/test/user/2?op_type=create -d'{
"name":"lucy","age":15}'

$ curl -H "Content-Type: application/json" -XPUT http://192.168.93.252:9200/test/user/3/_create -d'{
"name":"alan","age":58}'

Querying - GET

(1) Query by ID

$ curl -XGET http://192.168.93.252:9200/test/user/1

{"_index":"test","_type":"user","_id":"1","_version":1,"_seq_no":0,"_primary_term":1,"found":true,"_source":{"name":"jack","age":26}}

(2) Adding the pretty parameter to any query string makes ES return easy-to-read JSON

(1) Retrieve part of a document, when only specific fields are needed
$ curl -XGET 'http://192.168.93.252:9200/test/user/1?_source=name&pretty'
{
  "_index" : "test",
  "_type" : "user",
  "_id" : "1",
  "_version" : 1,
  "_seq_no" : 0,
  "_primary_term" : 1,
  "found" : true,
  "_source" : {
    "name" : "jack"
  }
}

(2) Query all documents of a given type in a given index
$ curl -XGET http://192.168.93.252:9200/test/user/_search?pretty
{
  "took" : 80,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 4,
      "relation" : "eq"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "test",
        "_type" : "user",
        "_id" : "1",
        "_score" : 1.0,
        "_source" : {
          "name" : "jack",
          "age" : 26
        }
      },
      {
        "_index" : "test",
        "_type" : "user",
        "_id" : "zeQEIWsBWJbm70w3S4EC",
        "_score" : 1.0,
        "_source" : {
          "name" : "john",
          "age" : 18
        }
      },
      {
        "_index" : "test",
        "_type" : "user",
        "_id" : "2",
        "_score" : 1.0,
        "_source" : {
          "name" : "lucy",
          "age" : 15
        }
      },
      {
        "_index" : "test",
        "_type" : "user",
        "_id" : "3",
        "_score" : 1.0,
        "_source" : {
          "name" : "alan",
          "age" : 58
        }
      }
    ]
  }
}

(3) Query by condition

$ curl -XGET 'http://192.168.93.252:9200/test/user/_search?q=name:john&pretty=true'
or
$ curl -XGET 'http://192.168.93.252:9200/test/user/_search?q=name:john&pretty'
{
  "took" : 5,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1,
      "relation" : "eq"
    },
    "max_score" : 1.2039728,
    "hits" : [
      {
        "_index" : "test",
        "_type" : "user",
        "_id" : "zeQEIWsBWJbm70w3S4EC",
        "_score" : 1.2039728,
        "_source" : {
          "name" : "john",
          "age" : 18
        }
      }
    ]
  }
}

DSL queries

DSL (Domain Specific Language)
First add a new document

$ curl -H "Content-Type: application/json" -XPUT http://192.168.93.252:9200/test/user/4/_create -d'{"name":"zhangsan","age":18}'
{"_index":"test","_type":"user","_id":"4","_version":1,"result":"created","_shards":{"total":2,"successful":1,"failed":0},"_seq_no":4,"_primary_term":1}

$ curl -H "Content-Type: application/json" -XGET http://192.168.93.252:9200/test/user/_search -d'{"query":{"match":{"name":"zhangsan"}}}'
{"took":2,"timed_out":false,"_shards":{"total":1,"successful":1,"skipped":0,"failed":0},"hits":{"total":{"value":1,"relation":"eq"},"max_score":1.3862944,"hits":[{"_index":"test","_type":"user","_id":"4","_score":1.3862944,"_source":{"name":"zhangsan","age":18}}]}}

MGET queries

Use the mget API to fetch multiple documents
First create a new index, test2

$ curl -XPUT 'http://192.168.93.252:9200/test2/'

$ curl -H "Content-Type: application/json" -XPOST http://192.168.93.252:9200/test2/user/1 -d'{"name":"marry","age":16}'

$ curl -H "Content-Type: application/json" -XGET http://192.168.93.252:9200/_mget?pretty -d'{"docs":[{"_index":"test","_type":"user","_id":2,"_source":"name"},{"_index":"test2","_type":"user","_id":1}]}'
{
  "docs" : [
    {
      "_index" : "test",
      "_type" : "user",
      "_id" : "2",
      "_version" : 1,
      "_seq_no" : 2,
      "_primary_term" : 1,
      "found" : true,
      "_source" : {
        "name" : "lucy"
      }
    },
    {
      "_index" : "test2",
      "_type" : "user",
      "_id" : "1",
      "_version" : 1,
      "_seq_no" : 0,
      "_primary_term" : 1,
      "found" : true,
      "_source" : {
        "name" : "marry",
        "age" : 16
      }
    }
  ]
}

If the documents you need are in the same _index or the same _type, you can specify a default /_index or /_index/_type in the URL

$ curl -H "Content-Type: application/json" -XGET http://192.168.93.252:9200/test/user/_mget?pretty -d'{
"docs":[{"_id":1},{"_id":2}]}'
{
  "docs" : [
    {
      "_index" : "test",
      "_type" : "user",
      "_id" : "2",
      "_version" : 1,
      "_seq_no" : 2,
      "_primary_term" : 1,
      "found" : true,
      "_source" : {
        "name" : "lucy",
        "age" : 15
      }
    },
    {
      "_index" : "test",
      "_type" : "user",
      "_id" : "1",
      "_version" : 1,
      "_seq_no" : 0,
      "_primary_term" : 1,
      "found" : true,
      "_source" : {
        "name" : "jack",
        "age" : 26
      }
    }
  ]
}

If all the documents share the same _index and _type, simply pass an ids array in the request

$ curl -H "Content-Type: application/json" -XGET http://192.168.93.252:9200/test/user/_mget?pretty -d'{
"ids":["1","2"]}'
{
  "docs" : [
    {
      "_index" : "test",
      "_type" : "user",
      "_id" : "1",
      "_version" : 1,
      "_seq_no" : 0,
      "_primary_term" : 1,
      "found" : true,
      "_source" : {
        "name" : "jack",
        "age" : 26
      }
    },
    {
      "_index" : "test",
      "_type" : "user",
      "_id" : "2",
      "_version" : 1,
      "_seq_no" : 2,
      "_primary_term" : 1,
      "found" : true,
      "_source" : {
        "name" : "lucy",
        "age" : 15
      }
    }
  ]
}

Using HEAD

To check only whether a document exists, use HEAD instead of GET; only the HTTP headers are returned

$ curl -i -XHEAD http://192.168.93.252:9200/test/user/1
HTTP/1.1 200 OK
Warning: 299 Elasticsearch-7.1.0-606a173 "[types removal] Specifying types in document get requests is deprecated, use the /{index}/_doc/{id} endpoint instead."
content-type: application/json; charset=UTF-8
content-length: 133

$ curl -i -XHEAD http://192.168.93.252:9200/test/user/5
HTTP/1.1 404 Not Found
Warning: 299 Elasticsearch-7.1.0-606a173 "[types removal] Specifying types in document get requests is deprecated, use the /{index}/_doc/{id} endpoint instead."
content-type: application/json; charset=UTF-8
content-length: 56

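Updating documents in ES

ES can update a document (full replacement) with PUT or POST; if a document with the given ID already exists, the request is treated as an update
Note: when ES performs an update, it first marks the old document as deleted and then adds the new one. The old document does not disappear immediately, but it can no longer be accessed. ES cleans up documents marked as deleted in the background as you continue to add data

Partial update: add new fields or change existing ones (must use POST)

$ curl -H "Content-Type: application/json" -XPOST http://192.168.93.252:9200/test/user/1/_update -d'{"doc":{"name":"baby","age":18}}'
{"_index":"test","_type":"user","_id":"1","_version":2,"result":"updated","_shards":{"total":2,"successful":1,"failed":0},"_seq_no":5,"_primary_term":1}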

$ curl -XGET http://192.168.93.252:9200/test/user/1?pretty
{
  "_index" : "test",
  "_type" : "user",
  "_id" : "1",
  "_version" : 2,
  "_seq_no" : 5,
  "_primary_term" : 1,
  "found" : true,
  "_source" : {
    "name" : "baby",
    "age" : 18
  }
}

Deleting documents in ES

ES can delete a document with DELETE

$ curl -XDELETE http://192.168.93.252:9200/test/user/1
{"_index":"test","_type":"user","_id":"1","_version":3,"result":"deleted","_shards":{"total":2,"successful":1,"failed":0},"_seq_no":6,"_primary_term":1}
Note: if the document exists, result is deleted and _version is incremented by 1

$ curl -XDELETE http://192.168.93.252:9200/test/user/1
{"_index":"test","_type":"user","_id":"1","_version":4,"result":"not_found","_shards":{"total":2,"successful":1,"failed":0},"_seq_no":11,"_primary_term":1}
Note: if the document does not exist, result is not_found, but _version is still incremented by 1. This is part of the internal bookkeeping that ensures operations across multiple nodes are ordered correctly

$ curl -XGET http://192.168.93.252:9200/test/user/1
{"_index":"test","_type":"user","_id":"1","found":false}

Note: a delete does not physically remove the document right away either; the document is only marked as deleted. ES cleans up deleted content in the background as you index more data later

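ES bulk operations - bulk

The bulk API performs multiple requests in one call
Format:

action: index/create/update/delete
metadata: _index, _type, _id
request body: _source (not needed for delete)
	{action:{metadata}}
	{request body}
	{action:{metadata}}
	{request body}

Difference between create and index
If the document already exists, create fails with a "document already exists" error, while index succeeds

Using a file
Create a file named requests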

$ vi requests
{"index":{"_index":"test","_type":"user","_id":"6"}}
{"name":"mayun","age":51}
{"update":{"_index":"test","_type":"user","_id":"6"}}
{"doc":{"age":52}}

Run the bulk request

$ curl -H "Content-Type: application/json" -XPOST http://192.168.93.252:9200/_bulk --data-binary @requests
{"took":31,"errors":false,"items":[{"index":{"_index":"test","_type":"user","_id":"6","_version":1,"result":"created","_shards":{"total":2,"successful":1,"failed":0},"_seq_no":12,"_primary_term":1,"status":201}},{"update":{"_index":"test","_type":"user","_id":"6","_version":2,"result":"updated","_shards":{"total":2,"successful":1,"failed":0},"_seq_no":13,"_primary_term":1,"status":200}}]}

$ curl -XGET http://192.168.93.252:9200/test/user/6?pretty
{
  "_index" : "test",
  "_type" : "user",
  "_id" : "6",
  "_version" : 2,
  "_seq_no" : 13,
  "_primary_term" : 1,
  "found" : true,
  "_source" : {
    "name" : "mayun",
    "age" : 52
  }
}

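A bulk request can also name /_index or /_index/_type in the URL
How much data can one bulk request handle?
(1) bulk loads the data to be processed into memory, so the amount is limited
(2) There is no single best size; it depends on your hardware, your document size and complexity, and your indexing and search load
(3) A common recommendation is 1000-5000 documents per request, fewer if the documents are large. A request size of 5-15 MB is suggested, and by default a request cannot exceed 100 MB. This limit can be changed in the ES configuration file: http.max_content_length: 100mb
http.max_content_length: The max content of an HTTP request. Defaults to 100mb.
(4) Official documentation: https://www.elastic.co/guide/en/elasticsearch/reference/current/modules-http.html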
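
The body format and the batching advice can be combined in a short helper. This is a sketch under stated assumptions: `bulk_lines` and `bulk_batches` are hypothetical names, and the default limits (1000 items, 5 MB) are simply the lower ends of the ranges suggested above, not values mandated by ES. Each yielded string is a newline-delimited body ending in a newline, which the bulk API expects.

```python
import json

def bulk_lines(action, meta, source=None):
    """Return the lines for one bulk item: the action line plus an optional body line."""
    lines = [json.dumps({action: meta})]
    if source is not None:  # delete items carry no request body
        lines.append(json.dumps(source))
    return lines

def bulk_batches(items, max_docs=1000, max_bytes=5 * 1024 * 1024):
    """Yield bulk bodies holding at most max_docs items and roughly max_bytes bytes each.

    `items` is an iterable of (action, metadata, source_or_None) tuples.
    """
    batch, count, size = [], 0, 0
    for action, meta, source in items:
        lines = bulk_lines(action, meta, source)
        item_size = sum(len(line.encode("utf-8")) + 1 for line in lines)
        # Close the current batch before it would exceed either limit.
        if batch and (count + 1 > max_docs or size + item_size > max_bytes):
            yield "\n".join(batch) + "\n"
            batch, count, size = [], 0, 0
        batch.extend(lines)
        count += 1
        size += item_size
    if batch:
        yield "\n".join(batch) + "\n"
```

Each yielded body could be written to a file and posted with `--data-binary`, as in the requests example above.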

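ES version control

(1) Traditional relational databases use pessimistic concurrency control (PCC)
Before modifying data, the row is locked so that only the thread that read the data can modify it
(2) ES uses optimistic concurrency control (OCC)
ES does not block access to the data. If the underlying data changes between our read and our write, the update fails, and the application decides how to handle the conflict: it can re-read the new data and retry the update, or report the situation to the user
(3) How ES implements version control (using the internal version number)
First fetch the document to be modified and note its version number (_version)

$ curl -XGET http://192.168.93.252:9200/test/user/2
{"_index":"test","_type":"user","_id":"2","_version":1,"_seq_no":2,"_primary_term":1,"found":true,"_source":{"name":"lucy","age":15}}

Pass the version number along with the update

$ curl -H "Content-Type: application/json" -XPOST http://192.168.93.252:9200/test/user/2/_update?version=1 -d'{"doc":{"age":30}}'
{"_index":"test","_type":"user","_id":"2","_version":2,"result":"updated","_shards":{"total":2,"successful":1,"failed":0},"_seq_no":14,"_primary_term":1}

$ curl -H "Content-Type: application/json" -XPUT http://192.168.93.252:9200/test/user/2?version=2 -d'{"name":"joy","age":20}'

If the version passed in does not match the current version of the document, the update fails. In 7.x the recommended way to express the same check is the if_seq_no and if_primary_term parameters, whose values appear as _seq_no and _primary_term in the responses above.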
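
The version check can be modelled in a few lines. The sketch below is an in-memory illustration of optimistic concurrency control, not ES code; `VersionedStore` and `VersionConflict` are hypothetical names.

```python
class VersionConflict(Exception):
    """Raised when the caller's expected version does not match the stored one."""

class VersionedStore:
    """Keeps (version, source) per document ID and checks versions on write."""

    def __init__(self):
        self._docs = {}

    def get(self, doc_id):
        return self._docs[doc_id]

    def index(self, doc_id, source, expected_version=None):
        """Write `source`; fail if `expected_version` is given and is stale."""
        current_version = self._docs.get(doc_id, (0, None))[0]
        if expected_version is not None and expected_version != current_version:
            raise VersionConflict(
                f"current version [{current_version}] differs from "
                f"the one provided [{expected_version}]"
            )
        new_version = current_version + 1
        self._docs[doc_id] = (new_version, source)
        return new_version
```

A writer that read version 1 and loses the race gets a `VersionConflict` and can then re-read the document and retry, which is the choice the text leaves to the application.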

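4. Elasticsearch Core Concepts

Cluster

(1) A cluster contains multiple nodes, one of which is the master. The master is chosen by election, and the master/non-master distinction only matters inside the cluster. Decentralization is one of the design ideas of ES.

(2) The master node manages the cluster state, including the state of shards and replicas and the discovery and removal of nodes.

(3) Note: the master role does not handle create, read, update, or delete requests for data; it only maintains cluster state information.

Check the cluster state: http://192.168.93.252:9200/_cluster/health?pretty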

{
	cluster_name: "elasticsearch",
	status: "yellow",
	timed_out: false,
	number_of_nodes: 1,
	number_of_data_nodes: 1,
	active_primary_shards: 2,
	active_shards: 2,
	relocating_shards: 0,
	initializing_shards: 0,
	unassigned_shards: 2,
	delayed_unassigned_shards: 0,
	number_of_pending_tasks: 0,
	number_of_in_flight_fetch: 0,
	task_max_waiting_in_queue_millis: 0,
	active_shards_percent_as_number: 50
}

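Shards

(1) An index shard. ES can split a complete index into multiple shards, so that a large index is partitioned horizontally and distributed across nodes, forming a distributed search that improves performance and throughput

(2) The number of shards is set when the index is created and cannot be changed in place afterwards.

curl -H "Content-Type:application/json" -XPUT 'http://192.168.93.252:9200/test3/' -d'{"settings":{"number_of_shards":3}}'

Before 7.0 an index had 5 shards by default; in this version, 7.1.0, the default is one shard and one replica
Each shard can hold at most 2,147,483,519 documents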
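
One way to see why the shard count is fixed at creation is to look at how a document is assigned to a shard. The sketch below uses the simplified formula shard = hash(routing) % number_of_primary_shards, where the routing value defaults to the document ID. It is an illustration only: ES uses a Murmur3 hash and extra routing settings, whereas `zlib.crc32` here is a stand-in, and `shard_for` is a hypothetical name.

```python
import zlib

def shard_for(routing, number_of_primary_shards):
    """Pick a shard for a routing value (by default the document ID)."""
    # crc32 is a stand-in for the hash function ES really uses.
    hashed = zlib.crc32(routing.encode("utf-8"))
    return hashed % number_of_primary_shards

def moved_documents(doc_ids, old_shards, new_shards):
    """Count documents whose shard changes when the shard count changes."""
    return sum(
        1 for doc_id in doc_ids
        if shard_for(doc_id, old_shards) != shard_for(doc_id, new_shards)
    )
```

Because the modulus is the shard count, changing it would send lookups for many existing documents to a shard that does not hold them, which is why a new shard count means rewriting the index.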

Official documentation: https://www.elastic.co/guide/en/elasticsearch/reference/current/getting-started-concepts.html

To summarize, each index can be split into multiple shards. An index can also be replicated zero (meaning no replicas) or more times. Once replicated, each index will have primary shards (the original shards that were replicated from) and replica shards (the copies of the primary shards).

The number of shards and replicas can be defined per index at the time the index is created. After the index is created, you may also change the number of replicas dynamically anytime. You can change the number of shards for an existing index using the _shrink and _split APIs, however this is not a trivial task and pre-planning for the correct number of shards is the optimal approach.

By default, each index in Elasticsearch is allocated one primary shard and one replica which means that if you have at least two nodes in your cluster, your index will have one primary shard and another replica shard (one complete replica) for a total of two shards per index.

Each Elasticsearch shard is a Lucene index. There is a maximum number of documents you can have in a single Lucene index. As of LUCENE-5843, the limit is 2,147,483,519 (= Integer.MAX_VALUE - 128) documents. You can monitor shard sizes using the _cat/shards API.

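Replicas

An index replica. ES can configure replicas for the shards of an index

Replicas serve two purposes:
First, fault tolerance: when a shard on some node is damaged or lost, it can be recovered from a replica
Second, query efficiency: ES automatically load-balances search requests across shard copies

The number of replicas can be changed at any time, and can also be set when the index is created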

curl -H "Content-Type:application/json" -XPUT 'http://192.168.93.252:9200/test3/' -d'{"settings":{"number_of_replicas":3}}'

By default each shard has 1 replica -> index.number_of_replicas: 1

Note: a primary shard and its replicas are never placed on the same node
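
This placement rule explains the yellow status in the cluster health output shown in the Cluster section: on a single node there is nowhere to put the replicas. The sketch below estimates the health figures from the node count and the per-index settings. It is a simplification that assumes every node is a data node with enough capacity; `cluster_health` is a hypothetical function, not the ES API.

```python
def cluster_health(number_of_data_nodes, indices):
    """Estimate health for `indices`, a list of (number_of_shards, number_of_replicas)."""
    active_primary = active = unassigned = 0
    for shards, replicas in indices:
        wanted = 1 + replicas                        # primary plus its replicas
        placed = min(wanted, number_of_data_nodes)   # at most one copy per node
        if placed >= 1:
            active_primary += shards
        active += shards * placed
        unassigned += shards * (wanted - placed)
    total_primaries = sum(shards for shards, _ in indices)
    if active_primary < total_primaries:
        status = "red"       # some primary has no home
    elif unassigned > 0:
        status = "yellow"    # all primaries active, some replicas unassigned
    else:
        status = "green"
    total = active + unassigned
    percent = 100.0 * active / total if total else 100.0
    return {
        "status": status,
        "active_primary_shards": active_primary,
        "active_shards": active,
        "unassigned_shards": unassigned,
        "active_shards_percent_as_number": percent,
    }
```

With one node and two indices that each have one shard and one replica, the sketch reports 2 active shards, 2 unassigned shards, and 50 percent active, matching the health output above; adding a second node would turn the status green.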

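Recovery

Data recovery, also called data redistribution. When nodes join or leave, ES reallocates index shards according to machine load; a failed node also recovers its data when it restarts

Gateway

The persistent storage mechanism for ES indices. By default ES first keeps the index in memory and persists it to disk when memory fills up. When the cluster is shut down and restarted, index data is read back from the gateway.
ES has supported several gateway types (the shared types date from early releases; current versions use the local file system):
Local file system (default)
Distributed file system
Hadoop HDFS and Amazon S3 cloud storage

Discovery.zen

The automatic node discovery mechanism of ES. ES is a p2p-based system: it first looks for existing nodes by broadcast, then communicates between nodes using a multicast protocol, and also supports point-to-point interaction. (Multicast discovery belongs to early ES releases; in 7.x, nodes are listed in discovery.seed_hosts, the setting named in Problem 7 above.)

How do nodes on different network segments form an ES cluster?
Disable automatic (multicast) discovery
discovery.zen.ping.multicast.enabled: false

Set the list of master nodes that a new node can discover on startup
discovery.zen.ping.unicast.hosts: ["192.168.93.252","192.168.93.251","192.168.93.250"]

Transport

How nodes inside ES, or the cluster and its clients, communicate. Internally TCP is used by default; HTTP (JSON) is also supported, and older releases could add other transports such as thrift, servlet, memcached, and zeroMQ through plugins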

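Create Index

Official documentation: https://www.elastic.co/guide/en/elasticsearch/reference/current/indices-create-index.html

The Create Index API is used to create an index in Elasticsearch manually. Every document in Elasticsearch is stored in one index or another.

The most basic command is:

PUT twitter

This creates an index named twitter with all default settings

Index naming restrictions
(1) Lowercase only
(2) Cannot include \, /, *, ?, ", <, >, |, ` ` (space character), comma, or #
(3) Indices created before 7.0 could contain a colon (:), but this is deprecated and not supported in 7.0+
(4) Cannot start with -, _, or +
(5) Cannot be . or ..
(6) Cannot be longer than 255 bytes (bytes, not characters, so multi-byte characters reach the limit faster)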
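
The restrictions above translate directly into a pre-flight check. The sketch below is a client-side convenience under the rules as listed here; `index_name_errors` is a hypothetical helper, and ES itself remains the authority on which names it accepts.

```python
# Characters rule (2) forbids, plus the colon from rule (3).
FORBIDDEN_CHARS = set('\\/*?"<>| ,#:')

def index_name_errors(name):
    """Return a list of the naming rules that `name` violates (empty if none)."""
    errors = []
    if name != name.lower():
        errors.append("must be lowercase")
    if any(ch in FORBIDDEN_CHARS for ch in name):
        errors.append("contains a forbidden character")
    if name.startswith(("-", "_", "+")):
        errors.append("must not start with -, _ or +")
    if name in (".", ".."):
        errors.append("must not be . or ..")
    if len(name.encode("utf-8")) > 255:  # the limit is in bytes, not characters
        errors.append("must not be longer than 255 bytes")
    return errors
```

For example, `index_name_errors("twitter")` returns an empty list, while a name such as `_hidden` or `a,b` returns one violation each.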

Index Settings
Each index can be created with specific settings associated with it, for example:

PUT twitter
{
    "settings" : {
        "index" : {
            "number_of_shards" : 3,
            "number_of_replicas" : 2
        }
    }
}

Number of shards: number_of_shards defaults to 1
Number of replicas: number_of_replicas defaults to 1

This can be shortened to

PUT twitter
{
    "settings" : {
        "number_of_shards" : 3,
        "number_of_replicas" : 2
    }
}

Note: you do not have to specify the index section explicitly inside settings

View the settings of an index

curl -XGET http://192.168.93.252:9200/test/_settings?pretty

For an index that does not exist yet (create)

curl -H "Content-Type:application/json" -XPUT 'http://192.168.93.252:9200/test4/' -d'{"settings":{"number_of_shards":3,"number_of_replicas":2}}'

For an index that already exists (modify)

curl -H "Content-Type:application/json" -XPUT 'http://192.168.93.252:9200/test4/_settings' -d'{"index":{"number_of_replicas":2}}'

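Mapping

Official documentation: https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping.html

Mapping is the process of defining how a document and the fields it contains are stored and indexed. For example, use mappings to define:
(1) which string fields should be treated as full-text fields.
(2) which fields contain numbers, dates, or geolocations.
(3) the format of date values.
(4) custom rules that control the mapping of dynamically added fields.

Mapping Type

Each index has one mapping type, which determines how documents are indexed.

A mapping type has:
(1) Meta-fields
Meta-fields customize how a document's associated metadata is treated. Examples include the document's _index, _type, _id, and _source fields.

(2) Fields or properties
A mapping type contains a list of fields or properties pertinent to the document.

Each field has a data type, which can be:
(1) a simple type such as text, keyword, date, long, double, boolean, or ip.
(2) a type that supports the hierarchical nature of JSON, such as object or nested.
(3) or a specialised type such as geo_point, geo_shape, or completion.

A mapping can be specified when the index is created: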

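PUT my_index
{
  "mappings": {
    "properties": {
      "title":    { "type": "text"  },
      "name":     { "type": "text"  },
      "age":      { "type": "integer" },
      "created":  {
        "type":   "date",
        "format": "strict_date_optional_time||epoch_millis"
      }
    }
  }
}
Notes:
	Creates an index named my_index
	Specifies the fields or properties in the mapping
	Specifies that the title field contains text values
	Specifies that the name field contains text values
	Specifies that the age field contains integer values
	Specifies that the created field contains date values in either of two possible formats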
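
When indices are created from a program, the settings and mappings bodies shown above can be assembled as a plain dictionary and serialized to JSON. A minimal sketch follows; `create_index_body` is a hypothetical helper, and its defaults of one shard and one replica are the 7.x defaults stated earlier.

```python
import json

def create_index_body(number_of_shards=1, number_of_replicas=1, properties=None):
    """Build the JSON body for a create-index (PUT <index>) request."""
    body = {
        "settings": {
            "number_of_shards": number_of_shards,
            "number_of_replicas": number_of_replicas,
        }
    }
    if properties:
        # Typeless mapping, as in the my_index example above.
        body["mappings"] = {"properties": properties}
    return body

body = create_index_body(3, 2, {
    "name": {"type": "text"},
    "age": {"type": "integer"},
})
payload = json.dumps(body)  # string to send with Content-Type: application/json
```

The resulting `payload` can be passed to curl with `-d` or sent by any HTTP client.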

View the mapping of an index

curl -XGET http://192.168.93.252:9200/test/_mapping?pretty

For an index that does not exist yet (create)

curl -H "Content-Type:application/json" -XPUT 'http://192.168.93.252:9200/test5/' -d'{"mappings": {"properties": { "name": { "type": "text"},"age": { "type": "integer" }}}}'

For an index that already exists (modify)

curl -H "Content-Type:application/json" -XPUT 'http://192.168.93.252:9200/test5/_mapping' -d'{"properties": { "name": { "type": "text"},"age": { "type": "integer" }}}'

Note: in 7.x, mapping requests are typeless by default. To keep a type name such as user in the URL or the request body, add the parameter include_type_name=true.

5. Elasticsearch Java Client

Java High Level REST Client

Java API 7.1, official documentation
https://www.elastic.co/guide/en/elasticsearch/client/java-api/current/index.html

Add the Maven dependencies

<!-- junit5 -->
<dependency>
    <groupId>org.junit.jupiter</groupId>
    <artifactId>junit-jupiter-api</artifactId>
    <version>5.4.2</version>
    <scope>test</scope>
</dependency>
