ES version: 7.10.0
1. Download the ES and IK analyzer packages
ES : https://www.elastic.co/cn/downloads/past-releases#elasticsearch
IK : https://github.com/medcl/elasticsearch-analysis-ik/releases/tag/v7.10.0
2. Install ES
(1) Extract ES
[root@hecs-36968 soft]# tar -zxvf elasticsearch-7.10.0-linux-x86_64.tar.gz
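(2) Configure ES
Edit two files under config/: jvm.options and elasticsearch.yml.
In jvm.options, set the JVM heap to 512m by changing the default -Xms1g and -Xmx1g lines to -Xms512m and -Xmx512m. Replace the existing lines rather than adding new ones: when a flag appears twice, the JVM uses the last occurrence. The ps output in step 5 shows both pairs, so that node actually ran with a 1g heap.
elasticsearch.yml: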
cluster.name: my-application #cluster name
node.name: node-1 #node name
#data and log directories
path.data: /www/server/es/data
path.logs: /www/server/es/logs
#bind address; 0.0.0.0 lets any machine reach this node
network.host: 0.0.0.0
http.port: 9200 #HTTP port
#initial master-eligible nodes, listed by the node.name set above (the default name also works); this is a single machine, so one node is enough
cluster.initial_master_nodes: ["node-1"]
3. Start ES
[root@hecs-36968 bin]# sh elasticsearch -d
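Error 1:
./elasticsearch-env: line 126: syntax error near unexpected token `<'
This usually happens when the script is launched with sh: bin/elasticsearch-env uses bash process substitution (done < <(env)), which sh does not accept. Starting ES with ./elasticsearch -d (or bash elasticsearch -d) should avoid the error. Alternatively, edit line 126 of elasticsearch-env and replace the < <(env) redirection with a here-string (<<<), for example done <<< "$(env)", then start ES again.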
Vim tip:
:set nu shows line numbers, which makes line 126 easy to find.
Error 2:
java.lang.RuntimeException: can not run elasticsearch as root
ES refuses to run as root, so create a dedicated user (es here) and give it ownership of the ES installation and its data/log directories, for example:
useradd es
chown -R es:es /opt/soft/elasticsearch-7.10.0 /www/server/es
Switch to the es user:
[root@hecs-36968 bin]# su es
4. Verify the installation
[es@hecs-36968 bin]$ curl 127.0.0.1:9200
{
  "name" : "hecs-36968",
  "cluster_name" : "elasticsearch",
  "cluster_uuid" : "_na_",
  "version" : {
    "number" : "7.10.0",
    "build_flavor" : "default",
    "build_type" : "tar",
    "build_hash" : "51e9d6f22758d0374a0f3f5c6e8f3a7997850f96",
    "build_date" : "2020-11-09T21:30:33.964949Z",
    "build_snapshot" : false,
    "lucene_version" : "8.7.0",
    "minimum_wire_compatibility_version" : "6.8.0",
    "minimum_index_compatibility_version" : "6.0.0-beta1"
  },
  "tagline" : "You Know, for Search"
}
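The same check can be scripted. Below is a minimal Python sketch that uses only the standard library. It assumes the node answers on http://127.0.0.1:9200 without authentication, as in the curl call above. `fetch_es_info` and `parse_es_info` are hypothetical helper names.

```python
import json
import urllib.request


def parse_es_info(body: str) -> dict:
    """Pick a few fields out of the JSON that GET / returns on an ES node."""
    info = json.loads(body)
    return {
        "node": info["name"],
        "cluster": info["cluster_name"],
        "version": info["version"]["number"],
        "lucene": info["version"]["lucene_version"],
    }


def fetch_es_info(base_url: str = "http://127.0.0.1:9200", timeout: float = 5.0) -> dict:
    """Send GET / (the same request as `curl 127.0.0.1:9200`) and parse the reply."""
    with urllib.request.urlopen(base_url + "/", timeout=timeout) as resp:
        return parse_es_info(resp.read().decode("utf-8"))
```

On the node above, calling fetch_es_info() should return a dict whose "version" is "7.10.0".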
5. Stop ES
[es@hecs-36968 bin]$ ps -ef | grep elastic
es 626267 1 4 14:00 pts/0 00:00:30 /opt/soft/jdk1.8.0_321/bin/java -Xshare:auto -Des.networkaddress.cache.ttl=60 -Des.networkaddress.cache.negative.ttl=10 -XX:+AlwaysPreTouch -Xss1m -Djava.awt.headless=true -Dfile.encoding=UTF-8 -Djna.nosys=true -XX:-OmitStackTraceInFastThrow -Dio.netty.noUnsafe=true -Dio.netty.noKeySetOptimization=true -Dio.netty.recycler.maxCapacityPerThread=0 -Dio.netty.allocator.numDirectArenas=0 -Dlog4j.shutdownHookEnabled=false -Dlog4j2.disable.jmx=true -Djava.locale.providers=SPI,JRE -Xms512m -Xmx512m -Xms1g -Xmx1g -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly -Djava.io.tmpdir=/tmp/elasticsearch-1310430730704506222 -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=data -XX:ErrorFile=logs/hs_err_pid%p.log -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintTenuringDistribution -XX:+PrintGCApplicationStoppedTime -Xloggc:logs/gc.log -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=32 -XX:GCLogFileSize=64m -XX:MaxDirectMemorySize=536870912 -Des.path.home=/opt/soft/elasticsearch-7.10.0 -Des.path.conf=/opt/soft/elasticsearch-7.10.0/config -Des.distribution.flavor=default -Des.distribution.type=tar -Des.bundled_jdk=true -cp /opt/soft/elasticsearch-7.10.0/lib/* org.elasticsearch.bootstrap.Elasticsearch -d
es 626372 626267 0 14:00 pts/0 00:00:00 /opt/soft/elasticsearch-7.10.0/modules/x-pack-ml/platform/linux-x86_64/bin/controller
es 652947 623366 0 14:12 pts/0 00:00:00 grep --color=auto elastic
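[es@hecs-36968 bin]$ kill -9 626267
kill -9 (SIGKILL) bypasses Elasticsearch's clean shutdown. A plain kill 626267 (SIGTERM) is usually the better choice.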
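A plain `kill` sends SIGTERM. The sketch below is a hypothetical helper, not an ES tool. It picks the Elasticsearch JVM out of `ps -ef` output by its bootstrap class (org.elasticsearch.bootstrap.Elasticsearch), which skips the x-pack-ml controller and the grep line shown above, and then sends SIGTERM. Another option is to start ES with ./elasticsearch -d -p pid. This writes the PID to a file, so you do not need to parse ps at all.

```python
import os
import signal

ES_MAIN_CLASS = "org.elasticsearch.bootstrap.Elasticsearch"


def find_es_pids(ps_output: str) -> list:
    """Return the PIDs of Elasticsearch JVMs found in `ps -ef` output."""
    pids = []
    for line in ps_output.splitlines():
        fields = line.split()
        # ps -ef columns: UID PID PPID C STIME TTY TIME CMD [args...]
        if len(fields) < 8:
            continue
        command = fields[7:]
        # Keep only the JVM whose arguments include the ES bootstrap class;
        # this skips the x-pack-ml controller and the grep process itself.
        if ES_MAIN_CLASS in command and not command[0].endswith("grep"):
            pids.append(int(fields[1]))
    return pids


def stop_gracefully(pid: int) -> None:
    """Send SIGTERM (what a plain `kill PID` sends) so ES can shut down cleanly."""
    os.kill(pid, signal.SIGTERM)
```

Usage: pass the output of subprocess.run(["ps", "-ef"], capture_output=True, text=True).stdout to find_es_pids, then call stop_gracefully on each returned PID.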
6. Install the IK analyzer
[root@hecs-36968 plugins]# pwd
/opt/soft/elasticsearch-7.10.0/plugins
[root@hecs-36968 plugins]# mkdir ik
[root@hecs-36968 plugins]# chmod 755 ik/
[root@hecs-36968 plugins]# cd ik
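[root@hecs-36968 ik]# unzip ../elasticsearch-analysis-ik-7.10.0.zip
The IK release must match the ES version exactly (7.10.0 here). Restart ES afterwards so it loads the plugin. Also move the zip out of plugins/, because ES may refuse to start if stray files are left in that directory.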
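7. VM parameters
Edit /etc/sysctl.conf and add the vm parameter below. Elasticsearch's bootstrap checks require vm.max_map_count to be at least 262144: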
[root@hecs-36968 ~]# vim /etc/sysctl.conf
vm.max_map_count=262144
Then run sysctl -p to apply the change without a reboot.
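To confirm the value before starting ES, a small Python check can read it back from /proc. This is a Linux-only sketch. The helper names are hypothetical, and 262144 is the minimum mentioned above.

```python
from pathlib import Path

# Minimum value Elasticsearch's bootstrap check asks for.
REQUIRED_MAX_MAP_COUNT = 262144


def read_max_map_count(path: str = "/proc/sys/vm/max_map_count") -> int:
    """Read the kernel's current vm.max_map_count (Linux only)."""
    return int(Path(path).read_text().strip())


def max_map_count_ok(current: int, required: int = REQUIRED_MAX_MAP_COUNT) -> bool:
    """True if the current value meets the requirement."""
    return current >= required
```

On the server, max_map_count_ok(read_max_map_count()) should return True once the setting is applied. An untuned kernel typically reports 65530.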
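JDK settings. ES 7.10 bundles its own JDK, but when JAVA_HOME is set it uses that JDK instead, as the ps output in step 5 shows (jdk1.8.0_321):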
[root@hecs-36968 ~]# vim /etc/profile
export JAVA_HOME=/opt/soft/jdk1.8.0_321
export JRE_HOME=${JAVA_HOME}/jre
export CLASSPATH=.:${JAVA_HOME}/lib:${JRE_HOME}/lib
export PATH=${JAVA_HOME}/bin:${PATH}
[root@hecs-36968 ~]# source /etc/profile
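8. Shard strategy
Choose the number of shards carefully:
The number of primary shards cannot simply be changed after an index is created, so set it correctly when you create the index.
Shards on the same node compete for resources. Too many shards also lowers match quality, because relevance is scored with per-shard statistics.
Keep each shard at or below about 32 GB: number of shards = total data size / 32 GB
Number of nodes <= primary shards * (replicas + 1)
Number of shards <= 3 * number of nodes
- Plenty of resources: with 3 primary shards and 2 replicas, how many nodes do you need? Ideally 9 machines, at minimum 1.
Max nodes = 3 * (2 + 1) = 9
Min nodes = 3 / 3 = 1 (on a single node the replicas stay unassigned, because ES never places a replica on the same node as its primary)
- Limited resources: with 2 machines? At most 6 shards, with 1 to 2 replicas.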
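The rules of thumb above can be written as a few functions. This sketch encodes the article's own heuristics (about 32 GB per shard, at most 3 shards per node). It is not an official Elasticsearch formula, and the function names are hypothetical.

```python
import math

# The article's rule of thumb: keep each shard at or below ~32 GB.
SHARD_SIZE_GB = 32


def primary_shards_for(total_data_gb: float, shard_size_gb: float = SHARD_SIZE_GB) -> int:
    """Shard count = total data / shard size, rounded up (at least 1)."""
    return max(1, math.ceil(total_data_gb / shard_size_gb))


def max_useful_nodes(primaries: int, replicas: int) -> int:
    """nodes <= primaries * (replicas + 1): beyond this, extra nodes hold no shard copy."""
    return primaries * (replicas + 1)


def min_nodes(primaries: int, shards_per_node: int = 3) -> int:
    """shards <= 3 * nodes, so nodes >= shards / 3, rounded up."""
    return max(1, math.ceil(primaries / shards_per_node))
```

For the "plenty of resources" case, max_useful_nodes(3, 2) gives 9 and min_nodes(3) gives 1, matching the numbers above. For 100 GB of data, primary_shards_for(100) rounds 100 / 32 up to 4 shards.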
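Delay shard allocation: when a node leaves, wait for it to rejoin before reallocating its shards. This avoids needless shard reshuffling (the default delay is 1m):
PUT _all/_settings
{
  "settings": {
    "index.unassigned.node_left.delayed_timeout": "5m"
  }
}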
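9. Routing optimization
By default, a document is stored on shard hash(_id) % number_of_primary_shards (the routing value defaults to the document _id).
Query optimization:
- Without a routing parameter, the search goes to every shard, which is slower.
- With a routing parameter, the request goes straight to the one shard that holds the data, which is faster.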
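In the console, routing is passed as a URL parameter: for example, PUT idx_cc/_doc/1?routing=user1 when indexing and GET idx_cc/_search?routing=user1 when searching (user1 is a made-up value). The Python sketch below only simulates the formula above. ES hashes with Murmur3, and zlib.crc32 is a stand-in here, so the shard numbers will not match a real cluster.

```python
import zlib


def shard_for(routing: str, num_primary_shards: int) -> int:
    """shard = hash(routing) % number_of_primary_shards.

    zlib.crc32 stands in for ES's Murmur3 hash; what matters is that the
    same routing value always lands on the same shard.
    """
    return zlib.crc32(routing.encode("utf-8")) % num_primary_shards


def shards_to_search(num_primary_shards: int, routing=None) -> list:
    """Without routing a search fans out to every shard; with routing, to just one."""
    if routing is None:
        return list(range(num_primary_shards))
    return [shard_for(routing, num_primary_shards)]
```

With 3 primary shards, shards_to_search(3) returns [0, 1, 2], while shards_to_search(3, "user1") returns a single shard. This is why routed queries are cheaper.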
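10. Write-speed optimization
Scenario: query performance matters less, while writes need high throughput.
- Raise the translog flush threshold so flushes happen less often
- Increase the index refresh interval
- Tune the bulk (write) thread pool and queue
- Optimize how tasks are distributed across nodes
- Optimize index building at the Lucene level
- Disable replicas during the load, then re-enable them afterwards so they are rebuilt from the primaries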
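Several items in this list map to dynamic index settings. The Python sketch below builds the request bodies for PUT idx_cc/_settings before and after a bulk load. The setting names (refresh_interval, number_of_replicas, translog.durability, translog.flush_threshold_size) are real index settings, but the values are example assumptions. async durability can lose the last few seconds of writes if a node crashes. Thread-pool tuning is a node-level setting and is not covered here.

```python
def bulk_load_settings() -> dict:
    """Index settings to apply before a large write load (example values)."""
    return {
        "index": {
            "refresh_interval": "-1",    # no periodic refresh during the load
            "number_of_replicas": 0,     # write primaries only; replicas rebuilt later
            "translog": {
                "durability": "async",          # fsync translog in the background
                "flush_threshold_size": "1gb",  # flush less often (default 512mb)
            },
        }
    }


def restore_settings(replicas: int = 1, refresh_interval: str = "1s") -> dict:
    """Settings to put back afterwards; pass the values the index had before."""
    return {
        "index": {
            "refresh_interval": refresh_interval,
            "number_of_replicas": replicas,
            "translog": {"durability": "request"},  # the default: fsync per request
        }
    }
```

Serialize each dict with json.dumps and send it as the body of PUT idx_cc/_settings. Send the first one before the load and the second one after it.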
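11. Memory settings (-Xms / -Xmx)
- At most 50% of physical RAM (Lucene uses the rest through the filesystem cache)
- Below 32 GB, so the JVM can keep using compressed object pointers
- Set -Xms and -Xmx to the same value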
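The heap rules above reduce to a small calculation. In this Python sketch, the 30 GB cap is an assumption chosen to stay safely under the compressed-oops cutoff just below 32 GB. The helper names are hypothetical.

```python
# Assumption: a cap safely below the ~32 GB compressed-oops cutoff.
HEAP_CAP_MB = 30 * 1024


def recommended_heap_mb(physical_ram_mb: int, cap_mb: int = HEAP_CAP_MB) -> int:
    """Heap = min(50% of physical RAM, cap), in MB."""
    return min(physical_ram_mb // 2, cap_mb)


def jvm_options_lines(heap_mb: int) -> list:
    """jvm.options lines with -Xms equal to -Xmx."""
    return [f"-Xms{heap_mb}m", f"-Xmx{heap_mb}m"]
```

For a 1 GB machine, recommended_heap_mb(1024) gives 512, which matches the 512m used in step 2. A 128 GB machine is capped at 30720 MB.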
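The words the IK analyzer produces are called terms.
Inverted index: maps each term to the documents (records) that contain it.
An inverted index consists of a term dictionary and posting lists (the inverted file).
The term dictionary is kept in memory, and the posting lists are stored on disk.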
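To make the idea concrete, here is a toy inverted index in Python. It is an illustration, not how Lucene stores data. A lowercase whitespace split stands in for the IK analyzer, and the dict of sorted doc IDs plays the role of the term dictionary plus posting lists.

```python
from collections import defaultdict


def tokenize(text: str) -> list:
    """Stand-in analyzer: lowercase + whitespace split (IK would segment Chinese words)."""
    return text.lower().split()


def build_inverted_index(docs: dict) -> dict:
    """Map each term to the sorted list of doc IDs that contain it (its posting list)."""
    postings = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            postings[term].add(doc_id)
    # Sorting by term gives an ordered term dictionary.
    return {term: sorted(ids) for term, ids in sorted(postings.items())}


def search(index: dict, query: str) -> list:
    """AND all query terms by intersecting their posting lists."""
    result = None
    for term in tokenize(query):
        ids = set(index.get(term, []))
        result = ids if result is None else result & ids
    return sorted(result or [])
```

For docs {1: "quick brown fox", 2: "quick red fox", 3: "lazy dog"}, the term "quick" maps to [1, 2], and searching "brown fox" returns [1]. The lookup touches only the posting lists of the query terms, not every document.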
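12. Index operations
(1) Create an index. Use PUT: POST is not idempotent, and ES rejects it for this request. age is mapped as integer so that range queries and sorting compare numbers rather than strings. The date format matches values such as "2022-12-12 12:12:12":
PUT idx_cc
{
  "settings": {
    "number_of_shards": 3,
    "number_of_replicas": 2
  },
  "mappings": {
    "properties": {
      "age": {
        "type": "integer"
      },
      "birth": {
        "type": "date",
        "format": "yyyy-MM-dd HH:mm:ss"
      },
      "name": {
        "type": "text"
      }
    }
  }
}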
(2) Get index information:
GET idx_cc
(3) List all indices:
GET _cat/indices?v
(4) Delete an index:
DELETE idx_cc
(5) Update shard settings (the number of primary shards cannot be changed; only the replica count can):
PUT idx_cc/_settings
{
  "number_of_replicas": 2
}
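13. Document operations
(1) Add a document. If ID 5 already exists, the whole document is replaced and its _version is incremented:
POST idx_cc/_doc/5
{
  "name": "cc",
  "age": 33,
  "birth": "2022-12-12 12:12:12"
}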
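(2) Bulk-add documents. Each action line is followed by its source line, and mapping types are deprecated in 7.x, so no _type is needed:
PUT idx_cc/_bulk
{"create": {"_index": "idx_cc", "_id": 4}}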
{"name":"刘一刀","age":12,"birth":"2022-12-12 12:12:12"}
(3) Update a document (partial update: only the fields under doc change):
POST idx_cc/_update/5
{
  "doc": {
    "name": "张亮亮"
  }
}
(4) Delete a document:
DELETE idx_cc/_doc/222
(5) Search all documents in the index:
GET idx_cc/_search
(6) Get a document by ID:
GET idx_cc/_doc/5
14. Search operations
(1) match query:
GET idx_cc/_search
{
  "query": {
    "match": {
      "name": "张"
    }
  }
}
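(2) match_phrase (phrase matching):
GET idx_cc/_search
{
  "query": {
    "match_phrase": {
      "name": {
        "query": "because is",
        "slop": 1
      }
    }
  }
}
slop is how many positions apart the terms may be while still matching. With slop 1, "because" and "is" still match when one word sits between them.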
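(3) Paginated query (from is the offset of the first hit and size is the page size; hits are sorted by age in descending order):
GET idx_cc/_search
{
  "query": {
    "match": {
      "name": "张"
    }
  },
  "from": 0,
  "size": 1,
  "sort": [
    {
      "age": {
        "order": "desc"
      }
    }
  ]
}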
(4) Return only selected fields:
GET idx_cc/_search
{
  "query": {
    "match": {
      "name": "张"
    }
  },
  "_source": ["age"]
}
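(5) must query (AND semantics)
Avoid putting must and should side by side at the same level. When a bool query has a must or filter clause, its should clauses become optional and only affect scoring. To express "A AND (B OR C)", nest a bool with should inside must, as below. Likewise, should can nest must.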
GET idx_cc/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "age": 12
          }
        },
        {
          "bool": {
            "should": [
              {
                "match": {
                  "name": "赵"
                }
              },
              {
                "match": {
                  "name": "三"
                }
              }
            ]
          }
        }
      ]
    }
  }
}
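When such queries are built from application code, a small helper keeps the nesting correct. This Python sketch produces exactly the body above. `match_clause` and `must_and_any` are hypothetical names, not part of any client library.

```python
def match_clause(field: str, value) -> dict:
    """A single match clause, e.g. {"match": {"name": "赵"}}."""
    return {"match": {field: value}}


def must_and_any(required: list, any_of: list) -> dict:
    """A AND (B OR C): required clauses in must, alternatives in a nested bool.should.

    Inside its own bool (with no must/filter), at least one should clause has
    to match; placed next to must at the same level, should would only affect
    scoring.
    """
    return {
        "query": {
            "bool": {
                "must": required + [{"bool": {"should": any_of}}]
            }
        }
    }
```

To send it, serialize the result with json.dumps(query, ensure_ascii=False) and use it as the body of GET idx_cc/_search.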
(6) should query (OR semantics)
GET idx_cc/_search
{
  "query": {
    "bool": {
      "should": [
        {
          "match": {
            "name": "三"
          }
        },
        {
          "match": {
            "name": "文"
          }
        }
      ]
    }
  }
}
(7) Range query (the range condition sits in a filter, which does not affect scoring):
GET idx_cc/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "bool": {
            "filter": [
              {
                "range": {
                  "age": {
                    "gt": 12
                  }
                }
              }
            ]
          }
        },
        {
          "bool": {
            "should": [
              {
                "match": {
                  "name": "大"
                }
              },
              {
                "match": {
                  "name": "cc"
                }
              }
            ]
          }
        }
      ]
    }
  }
}
(8) match_phrase for a whole word (the analyzed terms must appear consecutively and in order):
GET idx_cc/_search
{
  "query": {
    "match_phrase": {
      "name": "刘一刀"
    }
  }
}
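(9) term query (exact match). term does not analyze its input. name is an analyzed text field, and with the default standard analyzer Chinese text is split into single characters, so "刘一刀" is not an indexed term and this query returns no hits. Use term on keyword fields, or map name with an IK analyzer such as ik_max_word:
GET idx_cc/_search
{
  "query": {
    "term": {
      "name": "刘一刀"
    }
  }
}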
(10) Highlighting:
GET idx_cc/_search
{
  "query": {
    "match_phrase": {
      "name": "刘一刀"
    }
  },
  "highlight": {
    "fields": {
      "name": {
        "pre_tags": ["<em style='color:red'>"],
        "post_tags": ["</em>"]
      }
    }
  }
}
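(11) Aggregations (grouping, averages, counts, ...). This example groups documents by age with a terms aggregation, and "size": 0 skips returning the hits themselves:
GET idx_cc/_search
{
  "aggs": {
    "avgs": {
      "terms": {
        "field": "age"
      }
    }
  },
  "size": 0
}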
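(12) Scroll pagination (for iterating over large result sets):
GET idx_cc/_search?scroll=8m
{
  "size": 1
}
Fetch the next batch by passing the _scroll_id from the previous response:
GET _search/scroll
{
  "scroll": "8m",
  "scroll_id": "FGluY2x1ZGVfY29udGV4dF91dWlkDnF1ZXJ5VGhlbkZldGNoAxYybk12R2FoMlJnYUhmeUg3amtEanVBAAAAAAAAD6gWRXdXNmFJMHVUMmFGc1ZIRE00YVYwQRYybk12R2FoMlJnYUhmeUg3amtEanVBAAAAAAAAD6cWRXdXNmFJMHVUMmFGc1ZIRE00YVYwQRZjWmdsTjVpVFI0bW9DT0hPeWR0ZU5nAAAAAAAABs4WRldUdXd5TE5RaXlEcTZLaWd6cTBNdw=="
}
When you are done, release the search context with DELETE _search/scroll, passing the same scroll_id in the body.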
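Here is a client-side scroll loop as a Python sketch. `send(method, path, body)` is a hypothetical transport function that sends a request to ES and returns the decoded JSON response; plug in any HTTP client. Sorting by "_doc" is the cheapest order when you only need to iterate. The loop stops on an empty page and always clears the scroll context.

```python
def scroll_all(send, index: str, page_size: int = 1000, keep_alive: str = "8m"):
    """Yield every hit of `index` using the scroll API.

    `send(method, path, body)` is a hypothetical transport returning decoded JSON.
    """
    resp = send("POST", f"/{index}/_search?scroll={keep_alive}",
                {"size": page_size, "sort": ["_doc"]})
    scroll_id = resp.get("_scroll_id")
    try:
        while resp["hits"]["hits"]:
            yield from resp["hits"]["hits"]
            resp = send("POST", "/_search/scroll",
                        {"scroll": keep_alive, "scroll_id": scroll_id})
            # ES may return a new scroll id; always use the latest one.
            scroll_id = resp.get("_scroll_id", scroll_id)
    finally:
        # Free the server-side search context even if iteration stops early.
        if scroll_id:
            send("DELETE", "/_search/scroll", {"scroll_id": scroll_id})
```

Because it is a generator, hits can be processed one at a time without holding the whole result set in memory.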
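(13) search_after pagination (recommended beyond 10,000 hits, because from + size is capped by index.max_result_window, which defaults to 10,000).
First page:
GET idx_cc/_search
{
  "size": 1,
  "sort": [
    {
      "age": {
        "order": "desc"
      }
    }
  ]
}
Next page: pass the sort values of the previous page's last hit in search_after:
GET idx_cc/_search
{
  "size": 1,
  "sort": [
    {
      "age": {
        "order": "desc"
      }
    }
  ],
  "search_after": [
    "33"
  ]
}
Several documents can share the same age. In practice, add a unique tiebreaker field to sort; otherwise hits with equal sort values may be skipped or repeated.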
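The following Python sketch simulates search_after over an in-memory list to show why a tiebreaker matters. Documents are ordered by age descending, then by a unique field. That field is called uid here, a hypothetical keyword field; in a real index, use a unique field with doc values rather than sorting on _id. Each page starts strictly after the previous page's last sort values.

```python
def sort_key(doc: dict) -> tuple:
    """Sort by age descending, then by the unique uid ascending (the tiebreaker)."""
    return (-doc["age"], doc["uid"])


def page_after(docs: list, size: int, after=None) -> list:
    """Return the next page, like a search with "search_after": after.

    `after` holds the sort values [age, uid] of the previous page's last hit.
    """
    ordered = sorted(docs, key=sort_key)
    if after is not None:
        last = (-after[0], after[1])
        ordered = [d for d in ordered if sort_key(d) > last]
    return ordered[:size]


def iterate_all(docs: list, size: int) -> list:
    """Walk every page, feeding each page's last sort values into the next call."""
    result, after = [], None
    while True:
        page = page_after(docs, size, after)
        if not page:
            return result
        result.extend(page)
        after = [page[-1]["age"], page[-1]["uid"]]
```

Two documents with age 33 are both returned exactly once, because uid breaks the tie. With age alone, the page after [33] would skip the second one.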
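15. Set up an ES cluster
The three nodes below run on one host (192.168.0.109), so each node needs its own HTTP port, transport port, and data/log paths.
Node 1 (initial master):
#cluster name
cluster.name: cluster-es
#node name
node.name: node-1
#initial master-eligible node(s) used to bootstrap the cluster
cluster.initial_master_nodes: ["node-1"]
#master-eligible (can be elected master)
node.master: true
#holds data
node.data: true
#data path
path.data: /opt/es-cluster/node1/data
#log path
path.logs: /opt/es-cluster/node1/logs
#bind address ("0.0.0.0" accepts connections from any host)
network.host: 0.0.0.0
#HTTP port for clients
http.port: 9200
#transport port for node-to-node traffic (default 9300); 7.x name for transport.tcp.port
transport.port: 9301
#seed hosts for discovery (7.x name for discovery.zen.ping.unicast.hosts)
discovery.seed_hosts: ["192.168.0.109:9301", "192.168.0.109:9302", "192.168.0.109:9303"]
#discovery.zen.minimum_master_nodes is ignored in 7.x; the master quorum is managed automatically
#enable CORS for the es-head plugin
http.cors.enabled: true
http.cors.allow-origin: "*"
Node 2:
#cluster name
cluster.name: cluster-es
#node name
node.name: node-2
#initial master node used at first startup
cluster.initial_master_nodes: ["node-1"]
#master-eligible (can be elected master)
node.master: true
#holds data
node.data: true
#data path
path.data: /opt/es-cluster/node2/data
#log path
path.logs: /opt/es-cluster/node2/logs
#bind address ("0.0.0.0" accepts connections from any host)
network.host: 0.0.0.0
#HTTP port for clients (must differ from the other nodes on this host)
http.port: 9201
#transport port for node-to-node traffic (default 9300)
transport.port: 9302
#seed hosts for discovery
discovery.seed_hosts: ["192.168.0.109:9301", "192.168.0.109:9302", "192.168.0.109:9303"]
#discovery.zen.minimum_master_nodes is ignored in 7.x; the master quorum is managed automatically
#enable CORS for the es-head plugin
http.cors.enabled: true
http.cors.allow-origin: "*"
Node 3:
#cluster name
cluster.name: cluster-es
#node name
node.name: node-3
#initial master node used at first startup
cluster.initial_master_nodes: ["node-1"]
#master-eligible (can be elected master)
node.master: true
#holds data
node.data: true
#data path
path.data: /opt/es-cluster/node3/data
#log path
path.logs: /opt/es-cluster/node3/logs
#bind address ("0.0.0.0" accepts connections from any host)
network.host: 0.0.0.0
#HTTP port for clients (must differ from the other nodes on this host)
http.port: 9202
#transport port for node-to-node traffic (default 9300)
transport.port: 9303
#seed hosts for discovery
discovery.seed_hosts: ["192.168.0.109:9301", "192.168.0.109:9302", "192.168.0.109:9303"]
#discovery.zen.minimum_master_nodes is ignored in 7.x; the master quorum is managed automatically
#enable CORS for the es-head plugin
http.cors.enabled: true
http.cors.allow-origin: "*"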