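I. ES Basics
Near Realtime (NRT)
Elasticsearch is a near-real-time search platform: there is a slight latency, normally about one second, between the time a document is indexed and the time it becomes searchable.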
Cluster
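A cluster is a collection of one or more nodes that together hold the data and provide federated indexing and search across all nodes. A cluster is identified by a unique name (the default is elasticsearch), and a node can join only one cluster.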
Node
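A node is a single server in the cluster that stores data and takes part in the cluster's indexing and search. At startup a node is given a random UUID by default as its identity within the cluster; the name can also be set manually in the configuration file.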
Index
Elasticsearch | RDBMS |
Indices(indices1,indices2,indices3...) | Database(database1,database2,database3...) |
Types | Tables |
Documents | Rows |
Fields | Columns |
Elasticsearch 5.5 download
https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-5.5.0.tar.gz
JDK 1.8 download
II. Installing Elasticsearch
1. Create a user and configure its environment variables
[root@sht-sgmhadoopdn-04 ~]# groupadd -r dba
[root@sht-sgmhadoopdn-04 ~]# useradd -r elsearch -g dba -d /home/elsearch
[root@sht-sgmhadoopdn-04 ~]# cat /home/elsearch/.bash_profile
# .bash_profile
# Get the aliases and functions
if [ -f ~/.bashrc ]; then
    . ~/.bashrc
fi
# User specific environment and startup programs
export JAVA_HOME=/usr/java/jdk1.8.0_111
export ES_HOME=/usr/local/elasticsearch
export PATH=$JAVA_HOME/bin:$ES_HOME/bin:$PATH
2. Tune system and kernel parameters
[root@sht-sgmhadoopdn-04 ~]# echo "elsearch soft nproc 8192" >> /etc/security/limits.conf
[root@sht-sgmhadoopdn-04 ~]# echo "elsearch hard nproc 16384" >> /etc/security/limits.conf
[root@sht-sgmhadoopdn-04 ~]# echo "elsearch soft nofile 4096" >> /etc/security/limits.conf
[root@sht-sgmhadoopdn-04 ~]# echo "elsearch hard nofile 65536" >> /etc/security/limits.conf
[root@sht-sgmhadoopdn-04 ~]# echo "elsearch soft memlock unlimited" >> /etc/security/limits.conf
[root@sht-sgmhadoopdn-04 ~]# echo "elsearch hard memlock unlimited" >> /etc/security/limits.conf
[root@sht-sgmhadoopdn-04 ~]# echo "vm.max_map_count=655360" >> /etc/sysctl.conf
[root@sht-sgmhadoopdn-04 ~]# sysctl -p
Note: without this tuning, Elasticsearch may fail at startup with errors like the following
[2018-09-07T09:55:27,564][INFO ][o.e.b.BootstrapChecks ] [sht-sgmhadoopdn-04] bound or publishing to a non-loopback or non-link-local address, enforcing bootstrap checks
ERROR: [2] bootstrap checks failed
[1]: max file descriptors [4096] for elasticsearch process is too low, increase to at least [65536]
[2]: max virtual memory areas vm.max_map_count [65530] is too low, increase to at least [262144]
[2018-09-07T09:55:27,572][INFO ][o.e.n.Node ] [sht-sgmhadoopdn-04] stopping ...
[2018-09-07T09:55:27,604][INFO ][o.e.n.Node ] [sht-sgmhadoopdn-04] stopped
[2018-09-07T09:55:27,604][INFO ][o.e.n.Node ] [sht-sgmhadoopdn-04] closing ...
[2018-09-07T09:55:27,613][INFO ][o.e.n.Node ] [sht-sgmhadoopdn-04] closed
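The two checks that fail in the log above can be verified before starting the node. Here is a minimal Python sketch of such a pre-flight check. It is not part of Elasticsearch: the function names are hypothetical, the thresholds 65536 and 262144 are copied from the error messages above, reading /proc/sys/vm/max_map_count assumes Linux, and looking at the hard nofile limit (rather than the soft one) is an assumption of this sketch.

```python
import resource

# Thresholds copied from the bootstrap-check error messages above.
MIN_NOFILE = 65536
MIN_MAX_MAP_COUNT = 262144


def find_limit_problems(nofile, max_map_count):
    """Return a list of messages for limits that are below the thresholds."""
    problems = []
    unlimited = nofile == resource.RLIM_INFINITY
    if not unlimited and nofile < MIN_NOFILE:
        problems.append(
            "max file descriptors [%d] is too low, increase to at least [%d]"
            % (nofile, MIN_NOFILE)
        )
    if max_map_count < MIN_MAX_MAP_COUNT:
        problems.append(
            "vm.max_map_count [%d] is too low, increase to at least [%d]"
            % (max_map_count, MIN_MAX_MAP_COUNT)
        )
    return problems


def read_current_limits():
    """Read the hard nofile limit of this process and vm.max_map_count (Linux only)."""
    _soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    with open("/proc/sys/vm/max_map_count") as handle:
        return hard, int(handle.read().strip())
```

Run as the elsearch user, print(find_limit_problems(*read_current_limits())) should print an empty list once the limits.conf and sysctl changes have taken effect (a new login session is needed for limits.conf).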
Note: it is best to run Elasticsearch as the elsearch user. I tried running it as the mysqladmin user, but it failed with a Java version mismatch error. Java was configured in mysqladmin's environment variables and was in effect, yet Elasticsearch still picked up the system-wide /bin/java. I do not know the cause yet, so for now I treat it as an unexplained bug.
[mysqladmin@sht-sgmhadoopdn-04 bin]$ ./elasticsearch --help
Elasticsearch requires at least Java 8 but your Java version from /bin/java does not meet this requirement
[mysqladmin@sht-sgmhadoopdn-04 bin]$ java -version
java version "1.8.0_111"
Java(TM) SE Runtime Environment (build 1.8.0_111-b14)
Java HotSpot(TM) 64-Bit Server VM (build 25.111-b14, mixed mode)
[mysqladmin@sht-sgmhadoopdn-04 bin]$ which java
/usr/java/jdk1.8.0_111/bin/java
[mysqladmin@sht-sgmhadoopdn-04 bin]$ /bin/java -version
java version "1.7.0_40"
Java(TM) SE Runtime Environment (build 1.7.0_40-b43)
Java HotSpot(TM) 64-Bit Server VM (build 24.0-b56, mixed mode)
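When two JVMs live on the same host, as in the transcript above, it helps to check which major version a given binary path actually reports. The following Python sketch parses the banner printed by java -version; the function names are hypothetical, and the regular expression only assumes the banner formats shown above plus the newer scheme without the leading "1.".

```python
import re
import subprocess


def parse_java_major(banner):
    """Extract the major Java version from the banner printed by `java -version`."""
    match = re.search(r'version "(\d+)(?:\.(\d+))?', banner)
    if match is None:
        raise ValueError("no version string found in banner")
    first, second = match.group(1), match.group(2)
    # Versions up to Java 8 are reported as 1.x, for example 1.8.0_111.
    if first == "1" and second is not None:
        return int(second)
    return int(first)


def java_major_at(java_path):
    """Run the given java binary and return its major version."""
    # `java -version` writes its banner to stderr, not stdout.
    result = subprocess.run(
        [java_path, "-version"],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
    )
    return parse_java_major(result.stderr)
```

Fed the two banners from the transcript, parse_java_major returns 8 for the JDK under /usr/java and 7 for /bin/java, which is consistent with the error message Elasticsearch printed.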
3. Edit the Elasticsearch configuration file
[elsearch@sht-sgmhadoopdn-04 config]$ cat elasticsearch.yml | grep -v "^$" | grep -v "^#"
cluster.name: mycluster
node.name: sht-sgmhadoopdn-04
node.attr.rack: r1
path.data: /usr/local/elasticsearch/data
path.logs: /usr/local/elasticsearch/logs
network.host: 172.16.101.54
http.port: 9200
[elsearch@sht-sgmhadoopdn-04 config]$ mkdir /usr/local/elasticsearch/{data,logs}
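The grep pipeline above is only meant to hide blank lines and comments so that the active settings stand out. The same filter can be sketched in Python as follows; the function name is hypothetical, and unlike grep -v "^#" it also drops comments that are indented.

```python
def active_settings(config_text):
    """Return the lines of a config file that are neither blank nor comments."""
    lines = []
    for line in config_text.splitlines():
        stripped = line.strip()
        if stripped and not stripped.startswith("#"):
            lines.append(line.rstrip())
    return lines


# Abbreviated sample in the style of elasticsearch.yml.
SAMPLE = """# ---------------- Cluster ----------------
#
cluster.name: mycluster

# ---------------- Network ----------------
network.host: 172.16.101.54
http.port: 9200
"""

for setting in active_settings(SAMPLE):
    print(setting)
```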
4. Start Elasticsearch
[elsearch@sht-sgmhadoopdn-04 ~]$ cd /usr/local/elasticsearch/bin
[elsearch@sht-sgmhadoopdn-04 bin]$ ./elasticsearch
[2018-09-07T10:08:08,689][INFO ][o.e.n.Node ] [sht-sgmhadoopdn-04] initializing ...
[2018-09-07T10:08:08,812][INFO ][o.e.e.NodeEnvironment ] [sht-sgmhadoopdn-04] using [1] data paths, mounts [[/ (rootfs)]], net usable_space [67.6gb], net total_space [76.4gb], spins? [unknown], types [rootfs]
[2018-09-07T10:08:08,812][INFO ][o.e.e.NodeEnvironment ] [sht-sgmhadoopdn-04] heap size [1.9gb], compressed ordinary object pointers [true]
[2018-09-07T10:08:08,813][INFO ][o.e.n.Node ] [sht-sgmhadoopdn-04] node name [sht-sgmhadoopdn-04], node ID [DI0-2k8sTlevPq1S5uPb2Q]
[2018-09-07T10:08:08,814][INFO ][o.e.n.Node ] [sht-sgmhadoopdn-04] version[5.5.0], pid[23592], build[260387d/2017-06-30T23:16:05.735Z], OS[Linux/3.10.0-514.el7.x86_64/amd64], JVM[Oracle Corporation/Java HotSpot(TM) 64-Bit Server VM/1.8.0_111/25.111-b14]
[2018-09-07T10:08:08,814][INFO ][o.e.n.Node ] [sht-sgmhadoopdn-04] JVM arguments [-Xms2g, -Xmx2g, -XX:+UseConcMarkSweepGC, -XX:CMSInitiatingOccupancyFraction=75, -XX:+UseCMSInitiatingOccupancyOnly, -XX:+DisableExplicitGC, -XX:+AlwaysPreTouch, -Xss1m, -Djava.awt.headless=true, -Dfile.encoding=UTF-8, -Djna.nosys=true, -Djdk.io.permissionsUseCanonicalPath=true, -Dio.netty.noUnsafe=true, -Dio.netty.noKeySetOptimization=true, -Dio.netty.recycler.maxCapacityPerThread=0, -Dlog4j.shutdownHookEnabled=false, -Dlog4j2.disable.jmx=true, -Dlog4j.skipJansi=true, -XX:+HeapDumpOnOutOfMemoryError, -Des.path.home=/usr/local/elasticsearch]
[2018-09-07T10:08:09,836][INFO ][o.e.p.PluginsService ] [sht-sgmhadoopdn-04] loaded module [aggs-matrix-stats]
[2018-09-07T10:08:09,836][INFO ][o.e.p.PluginsService ] [sht-sgmhadoopdn-04] loaded module [ingest-common]
[2018-09-07T10:08:09,836][INFO ][o.e.p.PluginsService ] [sht-sgmhadoopdn-04] loaded module [lang-expression]
[2018-09-07T10:08:09,836][INFO ][o.e.p.PluginsService ] [sht-sgmhadoopdn-04] loaded module [lang-groovy]
[2018-09-07T10:08:09,837][INFO ][o.e.p.PluginsService ] [sht-sgmhadoopdn-04] loaded module [lang-mustache]
[2018-09-07T10:08:09,837][INFO ][o.e.p.PluginsService ] [sht-sgmhadoopdn-04] loaded module [lang-painless]
[2018-09-07T10:08:09,837][INFO ][o.e.p.PluginsService ] [sht-sgmhadoopdn-04] loaded module [parent-join]
[2018-09-07T10:08:09,837][INFO ][o.e.p.PluginsService ] [sht-sgmhadoopdn-04] loaded module [percolator]
[2018-09-07T10:08:09,837][INFO ][o.e.p.PluginsService ] [sht-sgmhadoopdn-04] loaded module [reindex]
[2018-09-07T10:08:09,837][INFO ][o.e.p.PluginsService ] [sht-sgmhadoopdn-04] loaded module [transport-netty3]
[2018-09-07T10:08:09,837][INFO ][o.e.p.PluginsService ] [sht-sgmhadoopdn-04] loaded module [transport-netty4]
[2018-09-07T10:08:09,837][INFO ][o.e.p.PluginsService ] [sht-sgmhadoopdn-04] no plugins loaded
[2018-09-07T10:08:11,723][INFO ][o.e.d.DiscoveryModule ] [sht-sgmhadoopdn-04] using discovery type [zen]
[2018-09-07T10:08:12,377][INFO ][o.e.n.Node ] [sht-sgmhadoopdn-04] initialized
[2018-09-07T10:08:12,377][INFO ][o.e.n.Node ] [sht-sgmhadoopdn-04] starting ...
[2018-09-07T10:08:12,520][INFO ][o.e.t.TransportService ] [sht-sgmhadoopdn-04] publish_address {172.16.101.54:9300}, bound_addresses {172.16.101.54:9300}
[2018-09-07T10:08:12,531][INFO ][o.e.b.BootstrapChecks ] [sht-sgmhadoopdn-04] bound or publishing to a non-loopback or non-link-local address, enforcing bootstrap checks
[2018-09-07T10:08:15,589][INFO ][o.e.c.s.ClusterService ] [sht-sgmhadoopdn-04] new_master {sht-sgmhadoopdn-04}{DI0-2k8sTlevPq1S5uPb2Q}{K1mYnJZ-Q1mnFg3iZCnEbw}{172.16.101.54}{172.16.101.54:9300}{rack=r1}, reason: zen-disco-elected-as-master ([0] nodes joined)
[2018-09-07T10:08:15,634][INFO ][o.e.h.n.Netty4HttpServerTransport] [sht-sgmhadoopdn-04] publish_address {172.16.101.54:9200}, bound_addresses {172.16.101.54:9200}
[2018-09-07T10:08:15,634][INFO ][o.e.n.Node ] [sht-sgmhadoopdn-04] started
[2018-09-07T10:08:15,673][INFO ][o.e.g.GatewayService ] [sht-sgmhadoopdn-04] recovered [0] indices into cluster_state
Note: on CentOS 6, startup reports the following error
[2018-09-10T10:35:24,090][WARN ][o.e.b.JNANatives ] unable to install syscall filter:
java.lang.UnsupportedOperationException: seccomp unavailable: requires kernel 3.5+ with CONFIG_SECCOMP and CONFIG_SECCOMP_FILTER compiled in
    at org.elasticsearch.bootstrap.SystemCallFilter.linuxImpl(SystemCallFilter.java:350) ~[elasticsearch-5.5.3.jar:5.5.3]
    at org.elasticsearch.bootstrap.SystemCallFilter.init(SystemCallFilter.java:638) ~[elasticsearch-5.5.3.jar:5.5.3]
    at org.elasticsearch.bootstrap.JNANatives.tryInstallSystemCallFilter(JNANatives.java:245) [elasticsearch-5.5.3.jar:5.5.3]
    at org.elasticsearch.bootstrap.Natives.tryInstallSystemCallFilter(Natives.java:113) [elasticsearch-5.5.3.jar:5.5.3]
    at org.elasticsearch.bootstrap.Bootstrap.initializeNatives(Bootstrap.java:111) [elasticsearch-5.5.3.jar:5.5.3]
    at org.elasticsearch.bootstrap.Bootstrap.setup(Bootstrap.java:194) [elasticsearch-5.5.3.jar:5.5.3]
    at org.elasticsearch.bootstrap.Bootstrap.init(Bootstrap.java:351) [elasticsearch-5.5.3.jar:5.5.3]
    at org.elasticsearch.bootstrap.Elasticsearch.init(Elasticsearch.java:123) [elasticsearch-5.5.3.jar:5.5.3]
    at org.elasticsearch.bootstrap.Elasticsearch.execute(Elasticsearch.java:114) [elasticsearch-5.5.3.jar:5.5.3]
    at org.elasticsearch.cli.EnvironmentAwareCommand.execute(EnvironmentAwareCommand.java:67) [elasticsearch-5.5.3.jar:5.5.3]
    at org.elasticsearch.cli.Command.mainWithoutErrorHandling(Command.java:122) [elasticsearch-5.5.3.jar:5.5.3]
    at org.elasticsearch.cli.Command.main(Command.java:88) [elasticsearch-5.5.3.jar:5.5.3]
    at org.elasticsearch.bootstrap.Elasticsearch.main(Elasticsearch.java:91) [elasticsearch-5.5.3.jar:5.5.3]
    at org.elasticsearch.bootstrap.Elasticsearch.main(Elasticsearch.java:84) [elasticsearch-5.5.3.jar:5.5.3]
https://github.com/elastic/elasticsearch/issues/22899
Workaround: add the following to the elasticsearch.yml configuration file
bootstrap.memory_lock: false
bootstrap.system_call_filter: false
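Note: with no arguments, Elasticsearch starts in the foreground. To run it as a background process, use either of the following commands (the -d option daemonizes the process)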
[elsearch@sht-sgmhadoopdn-04 bin]$ ./elasticsearch &
[elsearch@sht-sgmhadoopdn-04 bin]$ ./elasticsearch -d
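III. Basic Elasticsearch Cluster Operations
The REST API
ES exposes a comprehensive REST API for interacting with the cluster. Through it you can:
- Check the health, status, and statistics of the cluster, nodes, and indices
- Administer the cluster, nodes, index data, and metadata
- Perform CRUD (Create, Read, Update, and Delete) and search operations against indices
- Execute advanced searches such as paging, sorting, filtering, scripting, and aggregations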
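Every operation in the list above is an HTTP request with a method, a path, and optionally a JSON body, so curl is only one possible client. As a rough sketch, the same requests can be built with Python's standard library. The helper names build_request and send are hypothetical, and ES_URL is the node address used in this article, so adjust it for your environment.

```python
import json
import urllib.request

# Node address used throughout this article; change it for your own cluster.
ES_URL = "http://172.16.101.54:9200"


def build_request(method, path, body=None):
    """Build (but do not send) an HTTP request for the Elasticsearch REST API."""
    data = None
    headers = {}
    if body is not None:
        data = json.dumps(body).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(
        ES_URL + path, data=data, headers=headers, method=method
    )


def send(request, timeout=10):
    """Send a prepared request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

For example, send(build_request("GET", "/_cluster/health")) returns the cluster health as a dict, provided the node is reachable. Keeping the request construction separate from the network call makes the former easy to inspect without a running cluster.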
1. Check cluster health
- GET /_cluster/health?pretty
http://172.16.101.54:9200/_cluster/health?pretty
curl -XGET http://172.16.101.54:9200/_cluster/health?pretty
$ curl -XGET http://172.16.101.54:9200/_cluster/health?pretty
{
  "cluster_name" : "pna",
  "status" : "green",
  "timed_out" : false,
  "number_of_nodes" : 1,
  "number_of_data_nodes" : 1,
  "active_primary_shards" : 0,
  "active_shards" : 0,
  "relocating_shards" : 0,
  "initializing_shards" : 0,
  "unassigned_shards" : 0,
  "delayed_unassigned_shards" : 0,
  "number_of_pending_tasks" : 0,
  "number_of_in_flight_fetch" : 0,
  "task_max_waiting_in_queue_millis" : 0,
  "active_shards_percent_as_number" : 100.0
}
- GET /_cat/health?v
http://172.16.101.54:9200/_cat/health?v
curl -X GET "172.16.101.54:9200/_cat/health?v"
epoch      timestamp cluster   status node.total node.data shards pri relo init unassign pending_tasks max_task_wait_time active_shards_percent
1536304482 15:14:42  mycluster green           1         1      0   0    0    0        0             0                  -                100.0%
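About the status column
- green: everything is fine and the cluster is fully functional
- yellow: all data is available, but some replicas have not been allocated; the cluster is still fully functional
- red: some data is unavailable for whatever reason. The cluster is only partially functional (it can still serve search requests from the shards that are available), and the cause should be investigated immediately.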
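For monitoring scripts it is convenient to turn the three status values described above into a number and a one-line explanation. The following is a small sketch under that assumption: the function names and the 0/1/2 ranking are my own convention, not something Elasticsearch defines, and the sample dict is an abbreviated copy of a _cluster/health response.

```python
# Ranking of the three status values; the numbers are a convention of this sketch.
SEVERITY = {"green": 0, "yellow": 1, "red": 2}


def health_severity(health):
    """Map a _cluster/health response (as a dict) to 0, 1 or 2."""
    return SEVERITY[health["status"]]


def describe_health(health):
    """Return a one-line explanation of the cluster status."""
    status = health["status"]
    if status == "green":
        return "all primary and replica shards are allocated"
    if status == "yellow":
        return "%d shard(s) unassigned, all primaries available" % health["unassigned_shards"]
    return "some primary shards are unavailable, investigate immediately"


# Abbreviated copy of a health response from a fresh single-node cluster.
SAMPLE_HEALTH = {
    "status": "green",
    "number_of_nodes": 1,
    "active_primary_shards": 0,
    "unassigned_shards": 0,
}

print(health_severity(SAMPLE_HEALTH), describe_health(SAMPLE_HEALTH))
```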
Note: port 9200 serves the RESTful API over HTTP
As shown above, the cluster currently has a single node, and since it holds no data yet, the shard counts are 0. The following API lists the nodes in the cluster
[elsearch@sht-sgmhadoopdn-04 ~]$ curl -X GET "http://172.16.101.54:9200/_cat/nodes?v"
ip            heap.percent ram.percent cpu load_1m load_5m load_15m node.role master name
172.16.101.54            7          68   0    0.00    0.01     0.05 mdi       *      sht-sgmhadoopdn-04
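The _cat output is a plain-text table, which is easy to read but awkward to use in scripts. A minimal Python sketch for turning it into dicts is shown here. The function name is hypothetical, and the parser assumes that the request used ?v (so the first line is a header) and that no cell is empty or contains spaces.

```python
def parse_cat_table(text):
    """Turn whitespace-separated _cat output (with a ?v header) into a list of dicts."""
    lines = [line for line in text.splitlines() if line.strip()]
    if not lines:
        return []
    header = lines[0].split()
    return [dict(zip(header, line.split())) for line in lines[1:]]


# Output of _cat/nodes?v copied from the transcript above.
SAMPLE_NODES = """ip            heap.percent ram.percent cpu load_1m load_5m load_15m node.role master name
172.16.101.54            7          68   0    0.00    0.01     0.05 mdi       *      sht-sgmhadoopdn-04
"""

for node in parse_cat_table(SAMPLE_NODES):
    is_master = node["master"] == "*"
    print(node["name"], node["node.role"], "master" if is_master else "not master")
```

All values come back as strings; convert them (for example int(node["heap.percent"])) where numbers are needed.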
2. Indices
2.1 List indices
GET /_cat/indices?v
[elsearch@sht-sgmhadoopdn-04 ~]$ curl -XGET http://172.16.101.54:9200/_cat/indices?v
health status index uuid pri rep docs.count docs.deleted store.size pri.store.size
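The cluster currently has no indices
2.2 Create an index
Create an index named customer. The pretty parameter asks Elasticsearch to pretty-print the JSON response.
[elsearch@sht-sgmhadoopdn-04 ~]$ curl -X PUT "http://172.16.101.54:9200/customer?pretty"
{
  "acknowledged" : true,
  "shards_acknowledged" : true
}
List the indices again
[elsearch@sht-sgmhadoopdn-04 ~]$ curl -X GET "http://172.16.101.54:9200/_cat/indices?v"
health status index    uuid                   pri rep docs.count docs.deleted store.size pri.store.size
yellow open   customer Lnb4kTx9Ql-giLg4tj24Dw   5   1          0            0       810b           810b
The output shows that the cluster now has one index named "customer" with 5 primary shards and 1 replica (the defaults in this version) and no documents. Its health is yellow because the replicas are unassigned: we have only one node, and for high availability Elasticsearch never places a replica on the same node as its primary. Once another node joins the cluster, the replicas will be allocated to it and the status will turn green.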
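The reasoning behind the yellow status can be written down as a tiny model. This is a deliberately simplified sketch with hypothetical function names: it only encodes the rule that two copies of the same shard never share a node, and it ignores everything else that influences allocation (disk watermarks, allocation filters, node roles, and so on).

```python
def total_shards(primaries, replicas):
    """Number of shard copies an index needs: each primary plus its replicas."""
    return primaries * (1 + replicas)


def expected_index_health(data_nodes, replicas):
    """Simplified rule: every copy of a shard needs its own data node."""
    copies_per_shard = 1 + replicas
    return "green" if data_nodes >= copies_per_shard else "yellow"


# The customer index above: 5 primaries, 1 replica, on a single node.
print(total_shards(5, 1))
print(expected_index_health(data_nodes=1, replicas=1))
print(expected_index_health(data_nodes=2, replicas=1))
```

With one node the model reports yellow, and with two nodes green, which matches the behaviour described above.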
2.3 Delete an index
[elsearch@sht-sgmhadoopdn-04 ~]$ curl -X DELETE "http://172.16.101.54:9200/customer?pretty"
{
  "acknowledged" : true
}
Listing the indices again confirms that the index has been deleted
[elsearch@sht-sgmhadoopdn-04 ~]$ curl -X GET "http://172.16.101.54:9200/_cat/indices?v"
health status index uuid pri rep docs.count docs.deleted store.size pri.store.size
3. Documents
3.1 Create a document
As introduced earlier, a document is stored in a type, and a type corresponds to a table in a MySQL database. Below we index a document into a type (table) called external and give it the ID 1, which can be thought of as the primary key of a MySQL table.
[elsearch@sht-sgmhadoopdn-04 ~]$ curl -X PUT "172.16.101.54:9200/customer/external/1?pretty" -H 'Content-Type: application/json' -d'{"name": "John Doe"}'
{
  "_index" : "customer",
  "_type" : "external",
  "_id" : "1",
  "_version" : 1,
  "result" : "created",
  "_shards" : {
    "total" : 2,
    "successful" : 1,
    "failed" : 0
  },
  "created" : true
}
Note: the index does not have to exist before you create a document. If the index named in the request does not exist, Elasticsearch creates it first and then indexes the document into it.
3.2 Retrieve a document
[elsearch@sht-sgmhadoopdn-04 ~]$ curl -X GET "http://172.16.101.54:9200/customer/external/1?pretty"
{
  "_index" : "customer",
  "_type" : "external",
  "_id" : "1",
  "_version" : 1,
  "found" : true,
  "_source" : {
    "name" : "John Doe"
  }
}
Check the index
[elsearch@sht-sgmhadoopdn-04 ~]$ curl -X GET "http://172.16.101.54:9200/_cat/indices?v"
health status index    uuid                   pri rep docs.count docs.deleted store.size pri.store.size
yellow open   customer Lnb4kTx9Ql-giLg4tj24Dw   5   1          1            0        4kb            4kb
3.3 Update a document
Change "name" : "John Doe" to "name" : "Bruce Li"
$ curl -XPOST "http://172.16.101.54:9200/customer/external/1/_update?pretty" -H 'Content-Type: application/json' -d '{ "doc": { "name": "Bruce Li" } }'
{
  "_index" : "customer",
  "_type" : "external",
  "_id" : "1",
  "_version" : 2,
  "result" : "updated",
  "_shards" : {
    "total" : 2,
    "successful" : 1,
    "failed" : 0
  }
}
$ curl -XGET http://172.16.101.54:9200/customer/external/1?pretty
{
  "_index" : "customer",
  "_type" : "external",
  "_id" : "1",
  "_version" : 2,
  "found" : true,
  "_source" : {
    "name" : "Bruce Li"
  }
}
If we change the document ID to 2 and index again, a new document is created
$ curl -XPOST "http://172.16.101.54:9200/customer/external/2?pretty" -H 'Content-Type: application/json' -d '{"name": "Bruce Li"}'
{
  "_index" : "customer",
  "_type" : "external",
  "_id" : "2",
  "_version" : 1,
  "result" : "created",
  "_shards" : {
    "total" : 2,
    "successful" : 1,
    "failed" : 0
  },
  "created" : true
}
$ curl -XGET http://172.16.101.54:9200/customer/external/2?pretty
{
  "_index" : "customer",
  "_type" : "external",
  "_id" : "2",
  "_version" : 1,
  "found" : true,
  "_source" : {
    "name" : "Bruce Li"
  }
}
Check the document count again
[elsearch@sht-sgmhadoopcm-01 ~]$ curl -XGET http://172.16.101.54:9200/_cat/indices?v
health status index    uuid                   pri rep docs.count docs.deleted store.size pri.store.size
yellow open   customer Xo2EIjbuQf2-UZCW3J1RmQ   5   1          2            0     10.3kb         10.3kb
If no ID is specified when creating a document, ES generates one automatically
$ curl -X POST "http://172.16.101.54:9200/customer/external?pretty" -H 'Content-Type: application/json' -d '{"name": "Jane Doe"}'
{
  "_index" : "customer",
  "_type" : "external",
  "_id" : "AWfWhhNL_tXMWA8-Nj1B",
  "_version" : 1,
  "result" : "created",
  "_shards" : {
    "total" : 2,
    "successful" : 1,
    "failed" : 0
  },
  "created" : true
}
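The operations in this section behave differently depending on whether the ID already exists and on whether the request is an index or an _update call. The toy in-memory model below is only meant to make those semantics concrete; it is not how Elasticsearch is implemented. The function names are hypothetical, and the shallow dict merge is a simplification of the partial update, which wraps the changed fields in "doc".

```python
def index_document(store, doc_id, source):
    """Index a document: the stored source is replaced as a whole."""
    version = store.get(doc_id, {}).get("_version", 0) + 1
    store[doc_id] = {"_version": version, "_source": dict(source)}
    return "created" if version == 1 else "updated"


def update_document(store, doc_id, partial):
    """Partial update: merge fields into the existing source (KeyError if missing)."""
    entry = store[doc_id]
    entry["_source"].update(partial)
    entry["_version"] += 1
    return "updated"


store = {}
print(index_document(store, "1", {"name": "John Doe"}))   # new ID
print(update_document(store, "1", {"name": "Bruce Li"}))  # existing ID, merged
print(index_document(store, "2", {"name": "Bruce Li"}))   # another new ID
print(store["1"])
```

The practical difference shows up with documents that have several fields: indexing to an existing ID discards every field that is not in the new body, while a partial update keeps them.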
3.4 Bulk-modify documents
_bulk
$ curl -XGET "http://172.16.101.54:9200/customer/_search?pretty"
{
  "took" : 2,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 3,
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "customer",
        "_type" : "external",
        "_id" : "AWfWhhNL_tXMWA8-Nj1B",
        "_score" : 1.0,
        "_source" : {
          "name" : "Jane Doe"
        }
      },
      {
        "_index" : "customer",
        "_type" : "external",
        "_id" : "2",
        "_score" : 1.0,
        "_source" : {
          "name" : "Bruce Li"
        }
      },
      {
        "_index" : "customer",
        "_type" : "external",
        "_id" : "1",
        "_score" : 1.0,
        "_source" : {
          "name" : "Bruce Li"
        }
      }
    ]
  }
}
For the documents with IDs 1 and 2, change "name" : "Bruce Li" to "name" : "Bruce Lee"
$ curl -X POST "http://172.16.101.54:9200/customer/external/_bulk?pretty" -H 'Content-Type: application/json' -d '
> {"index":{"_id":"1"}}
> {"name": "Bruce Lee" }
> {"index":{"_id":"2"}}
> {"name": "Bruce Lee" }
> '
{
  "took" : 41,
  "errors" : false,
  "items" : [
    {
      "index" : {
        "_index" : "customer",
        "_type" : "external",
        "_id" : "1",
        "_version" : 7,
        "result" : "updated",
        "_shards" : {
          "total" : 2,
          "successful" : 1,
          "failed" : 0
        },
        "created" : false,
        "status" : 200
      }
    },
    {
      "index" : {
        "_index" : "customer",
        "_type" : "external",
        "_id" : "2",
        "_version" : 4,
        "result" : "updated",
        "_shards" : {
          "total" : 2,
          "successful" : 1,
          "failed" : 0
        },
        "created" : false,
        "status" : 200
      }
    }
  ]
}
Pay attention to the format of the request body, otherwise you get the error below. It typically appears with bulk requests when the body is malformed: every line must be terminated by a newline, and that includes the last line.
{
  "error" : {
    "root_cause" : [
      {
        "type" : "action_request_validation_exception",
        "reason" : "Validation Failed: 1: no requests added;"
      }
    ],
    "type" : "action_request_validation_exception",
    "reason" : "Validation Failed: 1: no requests added;"
  },
  "status" : 400
}
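Because the newline rule is easy to get wrong by hand, it is safer to generate the bulk body programmatically. A minimal Python sketch follows; build_bulk_body is a hypothetical helper, and it assumes each operation is given as an (action, source) pair where source is None for actions such as delete that take no second line.

```python
import json


def build_bulk_body(operations):
    """Serialize (action, source) pairs into the newline-delimited bulk format."""
    lines = []
    for action, source in operations:
        lines.append(json.dumps(action))
        if source is not None:
            lines.append(json.dumps(source))
    # Every line, including the last one, must be terminated by a newline.
    return "".join(line + "\n" for line in lines)


body = build_bulk_body([
    ({"index": {"_id": "1"}}, {"name": "Bruce Lee"}),
    ({"index": {"_id": "2"}}, {"name": "Bruce Lee"}),
])
print(body, end="")
```

The resulting string can be written to a file and sent with curl --data-binary @file, which, unlike -d @file, does not strip the newlines.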
For the document with ID 1, change "name" : "Bruce Lee" to "name" : "Jack Chen", and delete the document with ID 2
$ curl -X POST "http://172.16.101.54:9200/customer/external/_bulk?pretty" -H 'Content-Type: application/json' -d'
> {"update":{"_id":"1"}}
> {"doc": { "name": "Jack Chen" } }
> {"delete":{"_id":"2"}}
> '
{
  "took" : 58,
  "errors" : false,
  "items" : [
    {
      "update" : {
        "_index" : "customer",
        "_type" : "external",
        "_id" : "1",
        "_version" : 8,
        "result" : "updated",
        "_shards" : {
          "total" : 2,
          "successful" : 1,
          "failed" : 0
        },
        "status" : 200
      }
    },
    {
      "delete" : {
        "found" : true,
        "_index" : "customer",
        "_type" : "external",
        "_id" : "2",
        "_version" : 5,
        "result" : "deleted",
        "_shards" : {
          "total" : 2,
          "successful" : 1,
          "failed" : 0
        },
        "status" : 200
      }
    }
  ]
}
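A bulk request reports the outcome of each operation separately, so the HTTP call itself can succeed while individual items fail. The sketch below scans a bulk response for failed items, relying only on the "errors" flag and the per-item "status" field visible in the response above. The function name is hypothetical, and the failing sample is an invented response for illustration, not output from this cluster.

```python
def failed_items(bulk_response):
    """Return (operation, _id, status) for every item with an HTTP status >= 400."""
    failures = []
    if not bulk_response.get("errors"):
        return failures
    for item in bulk_response["items"]:
        # Each item is a dict with a single key: the operation name.
        for operation, result in item.items():
            if result.get("status", 0) >= 400:
                failures.append((operation, result.get("_id"), result["status"]))
    return failures


# Abbreviated copy of the successful response shown above.
OK_RESPONSE = {
    "errors": False,
    "items": [
        {"update": {"_id": "1", "result": "updated", "status": 200}},
        {"delete": {"_id": "2", "result": "deleted", "status": 200}},
    ],
}

# Hypothetical response in which one update failed.
FAILED_RESPONSE = {
    "errors": True,
    "items": [
        {"update": {"_id": "9", "status": 404}},
        {"delete": {"_id": "2", "result": "deleted", "status": 200}},
    ],
}

print(failed_items(OK_RESPONSE))
print(failed_items(FAILED_RESPONSE))
```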