Elasticsearch Explained


Jerry00713


Category: k8s    Tags: elasticsearch

Article link: https://blog.csdn.net/Jerry00713/article/details/113867977


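Introduction to Elasticsearch:

1. What Elasticsearch does

(1) Distributed search and analytics engine
(2) Full-text search, structured search, and data analysis
(3) Near-real-time processing of massive data sets
(4) Log collection, display, and analysis (the main use for operations teams)
Distributed: ES automatically spreads massive amounts of data across multiple servers for storage and retrieval.
Massive data processing: once the system is distributed, many servers can store and search the data, so large data volumes can be handled. Lucene, by contrast, is a single-machine library: it runs on one server only and can handle at most the amount of data that one server can process.
Near real time: data can be searched and analyzed within seconds.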

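2. Where Elasticsearch is used
(1) Wikipedia: full-text search, keyword highlighting, search suggestions
(2) News sites: analysis of user logs plus social network data
(3) Stack Overflow (a Q&A forum for programming problems): full-text search for related questions and answers
(4) GitHub (open-source code hosting): searching hundreds of millions of lines of code
(5) E-commerce sites: product search
(6) Log analysis: Logstash collects the logs and ES runs complex analysis on them
(7) Price-monitoring sites: a user sets a price threshold for a product and is notified when the price drops below it
(8) BI systems: ES performs data analysis and mining, Kibana visualizes the data
(9) In China: site search (e-commerce, recruitment, portals)
(10) BI systems (business intelligence): analyzing consumption trends and the makeup of user groups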

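3. Characteristics of Elasticsearch
(1) It can run as a large distributed cluster (hundreds of servers) that handles petabytes of data for large companies, or on a single machine for small ones.
(2) Elasticsearch is not a brand-new technology. It combines full-text search (Lucene), data analysis, and distributed technology into one product, and that combination is what makes ES unique.
(3) For users it works out of the box. For small and medium applications where the data volume is modest and the operations are not too complex, ES can be deployed in about three minutes and used as a production system.
(4) Databases fall short in many areas. Their strengths are transactions and online transaction processing; features such as full-text search, synonym handling, relevance ranking, complex data analysis, and near-real-time processing of massive data are where Elasticsearch supplements a traditional database with functions the database cannot provide.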

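4. Core concepts of Elasticsearch
(1) Near Realtime (NRT): near real time has two meanings. First, there is a small delay (about 1 second) between writing data and being able to search it. Second, searches and analysis complete within seconds.
(2) Cluster: a cluster contains multiple nodes. Which cluster a node belongs to is decided by one setting, the cluster name (default: elasticsearch). For small and medium applications it is perfectly normal to start with a single-node cluster.
(3) Node: one member of a cluster. A node also has a name (assigned randomly by default), and the name matters when carrying out operations and maintenance tasks. By default a node joins the cluster named "elasticsearch", so a group of nodes started with default settings will form one elasticsearch cluster automatically, and a single node can form an elasticsearch cluster on its own.
(4) Document: the smallest unit of data in ES. A document can be one customer record, one product-category record, or one order record, and is usually represented as JSON. Each type under an index can store many documents; a document has multiple fields, and each field is one data field.
(5) Index: an index holds a collection of documents with a similar structure, for example a customer index, a product-category index, or an order index. An index has a name and contains many documents; one index represents one class of similar or identical documents. For example, a product index might hold all product data, that is, all product documents.
(6) Type: a logical data category inside an index. Documents under one type share the same fields. A blog system, for example, could use one index with a user type, a post type, and a comment type. Note that this multi-type model comes from older releases: an index created in Elasticsearch 6.x (the version installed below) can contain only one type, and later versions remove types altogether.
A product index stores all product data as product documents. Products fall into many categories, though, and the fields of each category may differ: an appliance may carry a special field such as the after-sales service period, while fresh food may carry a field such as shelf life.
(7) Shard: a single machine cannot store a very large amount of data, so ES splits the data of an index into multiple shards stored on multiple servers. Shards allow horizontal scaling: more data can be stored, and search and analysis work is spread across servers, which raises throughput and performance. Each shard is a Lucene index.
(8) Replica: any server can fail or go down at any time, and its shards would then be lost. Each shard can therefore have one or more replica copies. A replica takes over when its primary shard fails so that no data is lost, and multiple replicas also raise search throughput and performance. The number of primary shards is set once when the index is created and cannot be changed afterwards (default 5 in ES 6.x); the number of replicas can be changed at any time (default 1). By default each index therefore has 10 shards: 5 primary shards and 5 replica shards. The smallest highly available setup is 2 servers.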
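
The shard arithmetic in item (8) can be checked with a few lines of Python. This is only an illustrative sketch: the function name total_shards is made up here, and it does nothing more than multiply out the numbers given in the text.

```python
def total_shards(primaries: int, replicas: int) -> int:
    """Return primary shards plus all replica copies for one index.

    `replicas` is the number of replica copies per primary shard
    (the number_of_replicas setting).
    """
    if primaries < 1 or replicas < 0:
        raise ValueError("primaries must be >= 1 and replicas >= 0")
    return primaries * (1 + replicas)


if __name__ == "__main__":
    # ES 6.x defaults from the text: 5 primaries, 1 replica
    print(total_shards(5, 1))
```

With the 6.x defaults this gives the 10 shards mentioned above; an index with 3 primaries and 2 replicas would have 9.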

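5. Elasticsearch core concepts vs. database core concepts
Elasticsearch    MySQL       Notes
=================================================
Document         Row         a MySQL row is called a document in ES
Type             Table       a MySQL table is called a type in ES
Index            Database    a MySQL database is called an index in ES
Field            Column      a MySQL column is called a field in ES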

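6. Installation methods

Method: docker
  Pros: 1. easy to deploy  2. works out of the box  3. starts quickly
  Cons: 1. requires Docker knowledge  2. changing the configuration is awkward and requires rebuilding the image  3. data storage needs a mounted directory

Method: tar
  Pros: 1. flexible deployment  2. small footprint on the system
  Cons: 1. you have to write the startup and management scripts yourself  2. the directory layout has to be planned in advance

Method: rpm | deb (deb is the Debian/Ubuntu counterpart of rpm)
  Pros: 1. easy to deploy  2. startup scripts are ready to use after installation  3. standardized directory layout
  Cons: 1. the components are scattered across different directories  2. uninstalling may leave files behind  3. the default configuration needs to be changed

Method: ansible
  Pros: 1. extremely flexible  2. every feature you might want is available  3. fast batch deployment
  Cons: 1. requires learning Ansible syntax and rules  2. all standards have to be planned in advance  3. needs a dedicated maintainer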

I. Using ES on a single node

Prepare one machine: 192.168.75.125

1. Installing ES from the rpm package

[root@75-125 ~]#  yum install -y java-1.8.0-openjdk.x86_64
[root@75-125 ~]#  mkdir -p /opt/installpag;cd /opt/installpag
[root@75-125 installpag ]#  wget https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-6.6.0.rpm
[root@75-125 installpag]# ll
-rw-r--r--. 1 root root 114059630 1月  29 2019 elasticsearch-6.6.0.rpm

[root@75-125 installpag]# rpm -ivh elasticsearch-6.6.0.rpm 

The installer prints the following:
### NOT starting on installation, please execute the following statements to configure elasticsearch service to start automatically using systemd
 sudo systemctl daemon-reload
 sudo systemctl enable elasticsearch.service
### You can start elasticsearch service by executing
 sudo systemctl start elasticsearch.service
Created elasticsearch keystore in /etc/elasticsearch

1.2. Start the service and check its status


[root@75-125 installpag]# systemctl daemon-reload
[root@75-125 installpag]# systemctl enable elasticsearch.service
Created symlink from /etc/systemd/system/multi-user.target.wants/elasticsearch.service to /usr/lib/systemd/system/elasticsearch.service.

[root@75-125 installpag]# systemctl start elasticsearch.service

Verifying the startup, method 1:
[root@75-125 installpag]# netstat -tulpn |grep 9200   # Note: Elasticsearch starts slowly; if port 9200 is not listening yet, check the log
[root@75-125 installpag]# tail -f /var/log/elasticsearch/elasticsearch.log    # no errors in the log
[2021-02-20T10:32:18,475][INFO ][o.e.g.GatewayService     ] [rSX_fym] recovered [0] indices into cluster_state
[2021-02-20T10:32:18,975][INFO ][o.e.c.m.MetaDataIndexTemplateService] [rSX_fym] adding template [.triggered_watches] for index patterns [.triggered_watches*]
[2021-02-20T10:32:19,135][INFO ][o.e.c.m.MetaDataIndexTemplateService] [rSX_fym] adding template [.watch-history-9] for index patterns [.watcher-history-9*]
[2021-02-20T10:32:19,194][INFO ][o.e.c.m.MetaDataIndexTemplateService] [rSX_fym] adding template [.watches] for index patterns [.watches*]
[2021-02-20T10:32:19,288][INFO ][o.e.c.m.MetaDataIndexTemplateService] [rSX_fym] adding template [.monitoring-logstash] for index patterns [.monitoring-logstash-6-*]
[2021-02-20T10:32:19,359][INFO ][o.e.c.m.MetaDataIndexTemplateService] [rSX_fym] adding template [.monitoring-es] for index patterns [.monitoring-es-6-*]
[2021-02-20T10:32:19,429][INFO ][o.e.c.m.MetaDataIndexTemplateService] [rSX_fym] adding template [.monitoring-beats] for index patterns [.monitoring-beats-6-*]
[2021-02-20T10:32:19,477][INFO ][o.e.c.m.MetaDataIndexTemplateService] [rSX_fym] adding template [.monitoring-alerts] for index patterns [.monitoring-alerts-6]
[2021-02-20T10:32:19,524][INFO ][o.e.c.m.MetaDataIndexTemplateService] [rSX_fym] adding template [.monitoring-kibana] for index patterns [.monitoring-kibana-6-*]
[2021-02-20T10:32:19,741][INFO ][o.e.l.LicenseService     ] [rSX_fym] license [53c1c120-a2d9-4505-a081-a48f60c48fe3] mode [basic] - valid

[root@75-125 installpag]# netstat -tulpn |grep 9200   # port 9200 is now listening
tcp6       0      0 127.0.0.1:9200          :::*                    LISTEN      1569/java           
tcp6       0      0 ::1:9200                :::*                    LISTEN      1569/java  

Verifying the startup, method 2:
[root@75-125 installpag]# curl 127.0.0.1:9200
{
  "name" : "rSX_fym",
  "cluster_name" : "elasticsearch",
  "cluster_uuid" : "cxsIpPbWRPukZSTBpc1Wvg",
  "version" : {
    "number" : "6.6.0",
    "build_flavor" : "default",
    "build_type" : "rpm",
    "build_hash" : "a9861f4",
    "build_date" : "2019-01-24T11:27:09.439740Z",
    "build_snapshot" : false,
    "lucene_version" : "7.6.0",
    "minimum_wire_compatibility_version" : "5.6.0",
    "minimum_index_compatibility_version" : "5.0.0"
  },
  "tagline" : "You Know, for Search"
}

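2. Configuring ES

2.1. The configuration files

List the Elasticsearch configuration files; the -c option of rpm -q lists a package's configuration files.
[root@75-125 installpag]# rpm -qc elasticsearch
/etc/elasticsearch/elasticsearch.yml    # main ES configuration file
/etc/elasticsearch/jvm.options          # JVM settings (ES runs inside the Java virtual machine, which is what lets Java code run on any platform)
/etc/init.d/elasticsearch               # init script
/etc/sysconfig/elasticsearch            # environment variables for ES
/usr/lib/sysctl.d/elasticsearch.conf    # kernel parameter: maximum number of memory map areas
/usr/lib/systemd/system/elasticsearch.service   # systemd unit file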

[root@75-125 installpag]# rpm -ql elasticsearch |grep /var/
/var/lib/elasticsearch    # data directory
/var/log/elasticsearch    # log directory
/var/run/elasticsearch    # pid directory


[root@75-125 installpag]# cat /usr/lib/sysctl.d/elasticsearch.conf
vm.max_map_count=262144
This setting is needed by the Java-based ES process: it caps the number of VMAs (virtual memory areas) that a single process may own.

The installation also creates an elasticsearch account:
[root@75-125 elasticsearch]# tail -1 /etc/passwd
elasticsearch:x:998:996:elasticsearch user:/nonexistent:/sbin/nologin

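2.2. Editing the main ES configuration file

[root@75-125 installpag]# vi /etc/elasticsearch/elasticsearch.yml
:set nu
     17 #cluster.name: my-application    # cluster-related setting
     23 node.name: node-1  # node name; optional. If left commented out, ES generates a random name (rSX_fym in the output above)
     33 path.data: /data/elasticsearch    # changed
     37 path.logs: /var/log/elasticsearch   # unchanged
     43 bootstrap.memory_lock: true     # enable memory locking, which keeps the JVM memory in RAM so that it is never swapped out. It works together with the heap size in /etc/elasticsearch/jvm.options: the heap configured there is locked as soon as ES starts
     55 network.host: 192.168.0.1   # bind address; if left commented out it defaults to 127.0.0.1, so other machines cannot connect
     59 http.port: 9200   # optional; defaults to 9200 when left commented out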

[root@75-125 installpag]# cat /etc/elasticsearch/elasticsearch.yml |egrep -v ^#
node.name: node-1
path.data: /data/elasticsearch
path.logs: /var/log/elasticsearch
bootstrap.memory_lock: true
network.host: 192.168.75.125,127.0.0.1
http.port: 9200

[root@75-125 elasticsearch]# mkdir -p /data/elasticsearch
[root@75-125 elasticsearch]# chown -R elasticsearch:elasticsearch /data/elasticsearch/

2.3. Configuring the JVM

[root@75-125 ~]# vi /etc/elasticsearch/jvm.options    # adjust -Xms512m and -Xmx512m to suit your machine
# Xms represents the initial size of total heap space
# Xmx represents the maximum size of total heap space

-Xms512m
-Xmx512m

[root@75-125 ~]# systemctl restart elasticsearch
[root@75-125 ~]# systemctl status elasticsearch  # the service failed to start
● elasticsearch.service - Elasticsearch
   Loaded: loaded (/usr/lib/systemd/system/elasticsearch.service; enabled; vendor preset: disabled)
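   Active: failed (Result: exit-code) since Sat 2021-02-20 11:43:30 CST; 3min 29s ago

The failure comes from bootstrap.memory_lock: true: under the default systemd limits the process is not allowed to lock its memory.

Solution:
[root@75-125 ~]# systemctl edit elasticsearch    # enter the following, then save with :wq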
[Service]
LimitMEMLOCK=infinity

[root@75-125 ~]# systemctl daemon-reload
[root@75-125 ~]# systemctl restart elasticsearch
[root@75-125 ~]# systemctl status elasticsearch  # now running normally


How it works: systemctl edit elasticsearch creates the file /etc/systemd/system/elasticsearch.service.d/override.conf with the following content:
[Service]
LimitMEMLOCK=infinity

Explanation:

[root@75-125 ~]# vi /etc/elasticsearch/jvm.options
# Xms represents the initial size of total heap space    (initial, i.e. minimum, heap)
# Xmx represents the maximum size of total heap space    (maximum heap)

-Xms1g
-Xmx1g

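Notes:
1. -Xms1g together with -Xmx1g means the heap is fixed at 1 GB from startup.
2. Why fix it: suppose the machine has 2 GB in total and other services already use 1.3 GB. With -Xms512m and -Xmx1g, startup is fine, but once the Java heap grows to 700 MB the total reaches 700 MB + 1.3 GB = 2 GB. The heap then keeps growing towards 1 GB, memory runs out, and the other services start failing as well.
3. Keep Xmx below 32 GB; beyond that, performance drops instead of improving.
4. Set the minimum and maximum heap to the same value.
5. Give the heap at most 50% of the server's physical memory.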
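
The last three rules can be folded into one small calculation. The Python sketch below is only an illustration: the function names and the 31 GB cap (a round figure chosen here to stay safely under the 32 GB limit from rule 3) are assumptions, not values taken from Elasticsearch itself.

```python
def recommended_heap_mb(physical_ram_mb: int, cap_mb: int = 31 * 1024) -> int:
    """Suggest one value to use for both -Xms and -Xmx, in megabytes.

    Half of physical RAM, but never more than cap_mb (an assumed cap,
    chosen to stay under the 32 GB limit from the notes).
    """
    if physical_ram_mb <= 0:
        raise ValueError("physical_ram_mb must be positive")
    return min(physical_ram_mb // 2, cap_mb)


def jvm_heap_options(physical_ram_mb: int) -> list:
    """Return the two jvm.options lines with identical min and max heap."""
    heap = recommended_heap_mb(physical_ram_mb)
    return [f"-Xms{heap}m", f"-Xmx{heap}m"]


if __name__ == "__main__":
    for line in jvm_heap_options(2048):  # a machine with 2 GB of RAM
        print(line)
```

For a 2 GB machine this suggests a 1 GB heap, which matches the -Xms1g / -Xmx1g example; the result still has to be weighed against what other services on the machine need.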

Check the status:

[root@75-125 ~]# netstat -tulpn |grep 9200
tcp6       0      0 192.168.75.125:9200     :::*                    LISTEN      1603/java           
[root@75-125 ~]# curl 192.168.75.125:9200
{
  "name" : "node-1",
  "cluster_name" : "elasticsearch",
  "cluster_uuid" : "QptcbgQhSJWB8y8_yob84g",
  "version" : {
    "number" : "6.6.0",
    "build_flavor" : "default",
    "build_type" : "rpm",
    "build_hash" : "a9861f4",
    "build_date" : "2019-01-24T11:27:09.439740Z",
    "build_snapshot" : false,
    "lucene_version" : "7.6.0",
    "minimum_wire_compatibility_version" : "5.6.0",
    "minimum_index_compatibility_version" : "5.0.0"
  },
  "tagline" : "You Know, for Search"
}
[root@75-125 ~]#

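3. Installing the ES-head plugin

3.1. How to interact with Elasticsearch: concepts

The RESTful API is a protocol-like interface through which parameters are passed. Programs in any language can talk to Elasticsearch through the RESTful API on port 9200, a web client can be used to access Elasticsearch, and even the curl command works.
An Elasticsearch request is made up of the same parts as any HTTP request: curl -X<VERB> '<PROTOCOL>://<HOST>:<PORT>/<PATH>?<QUERY_STRING>' -d '<BODY>'
VERB: the appropriate HTTP method or verb: GET, POST, PUT, HEAD, or DELETE.
PROTOCOL: http or https (if there is an https proxy in front of Elasticsearch).
HOST: the hostname of any node in the Elasticsearch cluster, or localhost for a node on the local machine.
PORT: the port of the Elasticsearch HTTP service, 9200 by default.
PATH: the API endpoint (for example, _count returns the number of documents in the cluster). The path may have several components, such as _cluster/stats or _nodes/stats/jvm.
QUERY_STRING: any optional query-string parameters (for example, ?pretty formats the JSON response so that it is easier to read).
BODY: a JSON-encoded request body (if the request needs one).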
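
To make the request anatomy concrete, here is a minimal Python sketch that assembles the URL part from the pieces listed above. The helper name build_es_url and its parameter names are invented for this example; only the standard library is used, and nothing is sent over the network.

```python
from urllib.parse import urlencode


def build_es_url(host: str, path: str, port: int = 9200,
                 protocol: str = "http", query: dict = None) -> str:
    """Assemble <PROTOCOL>://<HOST>:<PORT>/<PATH>?<QUERY_STRING>."""
    url = f"{protocol}://{host}:{port}/{path.lstrip('/')}"
    if query:
        # urlencode turns {"pretty": "true"} into "pretty=true"
        url += "?" + urlencode(query)
    return url


if __name__ == "__main__":
    print(build_es_url("localhost", "_cluster/stats", query={"pretty": "true"}))
```

The resulting string is what would be passed to curl (or any HTTP client) together with the verb and, where needed, a JSON body.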

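3.2. Ways to interact with Elasticsearch

1. The curl command: the most tedious, most complicated, and most error-prone option, but nothing has to be installed apart from curl itself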

[root@75-125 ~]# curl 192.168.75.125:9200/_cat/nodes
192.168.75.125 23 93 1 0.00 0.01 0.05 mdi * node-1

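2. The es-head plugin: convenient for viewing data and relatively easy to operate; requires a Node.js environment

3. Kibana (the K in ELK): rich data views and report formats, very easy to operate; requires installing and configuring Kibana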

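3.3. Installing ES-head

Before 5.0, ES-head was installed as a plugin inside Elasticsearch. From 5.0 on, the installation method changed: it needs a Node.js environment, or you can use a ready-made Docker image.

Once it is running, fill in the connection information and click connect to display the cluster.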

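4. Using the ES API with curl

4.1. Creating an index (a database in MySQL terms)

HTTP uses methods such as GET, PUT, and POST; -XPUT tells curl to send a PUT request. vipinfo is the name of the index. Taken together, the command is the equivalent of creating a MySQL database named vipinfo.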

[root@75-125 installpag]# curl -XPUT 192.168.75.125:9200/vipinfo
{"acknowledged":true,"shards_acknowledged":true,"index":"vipinfo"}[root@75-125 installpag]# 

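Note: the response is returned as JSON and can be displayed with any JSON viewer.

Create an index named test; appending ?pretty prints the JSON in a human-readable layout.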

[root@75-125 installpag]# curl -XPUT 192.168.75.125:9200/test?pretty
{
  "acknowledged" : true,
  "shards_acknowledged" : true,
  "index" : "test"
}

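The view in es-head:

The ES default (in 6.x) is 5 shards and 1 replica.
1. The shards listed after node-1 are the primary (data) shards. Sharding means the data of one index is split into 5 pieces (0 to 4) that can be stored in different places.
2. The shards listed after Unassigned are the replica shards. A replica is a backup copy of a primary shard, and a backup must not sit on the same node as the shard it copies. There is no other machine yet, so the default of 5 shards and 1 replica cannot be satisfied. That is why the status is yellow: the default configuration is not met, but the data is intact; it just has no backup.

Data location: /data/elasticsearch/nodes/0/indices/ contains the directory 4XhyGNeqQMOYiaR2TTrfew, which in turn contains 0 1 2 3 4 _state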
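
Why all replica shards end up under Unassigned on a single node can be worked out with a simplified model. The Python sketch below assumes only the rule stated above (a copy never shares a node with another copy of the same shard) and ignores everything else that influences real allocation, such as disk usage; the function name is hypothetical.

```python
def unassigned_replica_shards(primaries: int, replicas: int, data_nodes: int) -> int:
    """Count replica shards that cannot be placed (simplified model).

    Each primary can have at most data_nodes - 1 replicas assigned,
    because every copy of a shard needs a node of its own.
    """
    assignable_per_primary = min(replicas, max(data_nodes - 1, 0))
    return primaries * (replicas - assignable_per_primary)


if __name__ == "__main__":
    # single node, 5 primaries, 1 replica: every replica stays unassigned
    print(unassigned_replica_shards(5, 1, 1))
```

With one node and the 5/1 default, all 5 replicas are unassigned (yellow); adding a second node brings the count to 0.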

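4.2. Adding data from the command line with curl

There is no data yet.

user: creates a new type (table) and uses it
1?: the 1 is the document ID, comparable to a primary key in MySQL, and the ? starts the query string. The first document is inserted as 1, the second as 2. With PUT the ID cannot be omitted.
-d: followed by the data to insert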

[root@75-125 installpag]# curl -XPUT '192.168.75.125:9200/vipinfo/user/1?pretty' -H 'Content-Type: application/json' -d'
{    
      "first_name" : "John",
      "last_name": "Smith",
      "age" : 25,  
      "about" : "I love to go rock climbing", "interests": [ "sports", "music" ]
}
'

[root@75-125 installpag]# curl -XPUT 'localhost:9200/vipinfo/user/2?pretty' -H 'Content-Type: application/json' -d'
{    
      "first_name" : "tece",
      "last_name": "Fir",
      "age" : 35,
      "about" : "I love to go rock sports", "interests": [ "forestry" ]
}
'

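Note: data can be inserted even if the index does not exist yet; ES creates it automatically. PUT needs an explicit document ID in the path:
[root@75-125 installpag]# curl -XPUT 'localhost:9200/linux/teacher/1?pretty' -H 'Content-Type: application/json' -d'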
{    
      "teacher" : "old456",
      "num" : 21,
      "age" : 35
}
'


Result (for the first insert, vipinfo/user/1):
{
  "_index" : "vipinfo",
  "_type" : "user",
  "_id" : "1",
  "_version" : 1,
  "result" : "created",
  "_shards" : {
    "total" : 2,
    "successful" : 1,
    "failed" : 0
  },
  "_seq_no" : 0,
  "_primary_term" : 1
}

The value is stored as entered: an age of 25 is displayed as 25.

Generating a random ID: use curl -XPOST without an ID in the path

[root@75-125 ~]# curl -XPOST 'localhost:9200/linux/teacher/?pretty' -H 'Content-Type: application/json' -d'
{    
      "teacher" : "oldzhang",
      "num" : 35,
      "age" : 35
}
'

{
  "_index" : "linux",
  "_type" : "teacher",
  "_id" : "Q8ffvncBwkahPw2N-6tM",
  "_version" : 1,
  "result" : "created",
  "_shards" : {
    "total" : 2,
    "successful" : 1,
    "failed" : 0
  },
  "_seq_no" : 0,
  "_primary_term" : 1
}
[root@75-125 ~]#

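Random IDs vs. explicit IDs:

Explicit IDs: you have to keep track of the IDs you have used, and every insert has to check whether the ID already exists, which costs performance.
Random IDs: nothing has to be tracked, but then how do you find a document? Apart from the ID, other values may repeat (for example, 10,000 documents with age 19).
Solution: add an sid field of your own. It changes with every insert, so even if values do collide, only a few documents share one.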

[root@75-125 ~]# curl -XPOST 'localhost:9200/usertest/utest/?pretty' -H 'Content-Type: application/json' -d'
{    
      "sid" : 10,
      "teacher" : "oldzhang",
      "num" : 35,
      "age" : 35
}
'

[root@75-125 ~]#  curl -XPOST 'localhost:9200/usertest/utest/?pretty' -H 'Content-Type: application/json' -d'
{    
      "sid" : 10,
      "teacher" : "oldssszhang",
      "num" : 25,
      "age" : 35
}
'

{
  "_index" : "usertest",
  "_type" : "utest",
  "_id" : "RMf0vncBwkahPw2NxKtL",
  "_version" : 1,
  "result" : "created",
  "_shards" : {
    "total" : 2,
    "successful" : 1,
    "failed" : 0
  },
  "_seq_no" : 0,
  "_primary_term" : 1
}
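
The difference between the two ways of storing a document comes down to the HTTP method and the path. A minimal Python sketch, with a made-up helper name, that returns the pair used in the curl commands of this section:

```python
def index_request(index: str, doc_type: str, doc_id=None) -> tuple:
    """Return (HTTP method, path) for storing one document.

    With an explicit ID the document is addressed directly via PUT;
    without one, POST lets Elasticsearch generate the ID.
    """
    if doc_id is None:
        return ("POST", f"/{index}/{doc_type}/")
    return ("PUT", f"/{index}/{doc_type}/{doc_id}")


if __name__ == "__main__":
    print(index_request("vipinfo", "user", 1))
    print(index_request("usertest", "utest"))
```

The /index/type/id layout shown here is the one used by the ES 6.x examples in this article.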

4.3. Querying data

Query all types (tables) under vipinfo

[root@75-125 ~]# curl -XGET localhost:9200/vipinfo/_search?pretty
{
  "took" : 63,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 2,
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "vipinfo",
        "_type" : "user",
        "_id" : "2",
        "_score" : 1.0,
        "_source" : {
          "first_name" : "tece",
          "last_name" : "Fir",
          "age" : 35,
          "about" : "I love to go rock sports",
          "interests" : [
            "forestry"
          ]
        }
      },
      {
        "_index" : "vipinfo",
        "_type" : "user",
        "_id" : "1",
        "_score" : 1.0,
        "_source" : {
          "first_name" : "John",
          "last_name" : "Smith",
          "age" : 25,
          "about" : "I love to go rock climbing",
          "interests" : [
            "sports",
            "music"
          ]
        }
      }
    ]
  }
}

Query the document with id=1 in user under vipinfo

[root@75-125 ~]# curl -XGET localhost:9200/vipinfo/user/1?pretty
{
  "_index" : "vipinfo",
  "_type" : "user",
  "_id" : "1",
  "_version" : 1,
  "_seq_no" : 0,
  "_primary_term" : 1,
  "found" : true,
  "_source" : {
    "first_name" : "John",
    "last_name" : "Smith",
    "age" : 25,
    "about" : "I love to go rock climbing",
    "interests" : [
      "sports",
      "music"
    ]
  }
}

Query by condition instead of by ID; _search?q works much like a WHERE clause

[root@75-125 ~]# curl -XGET 'localhost:9200/vipinfo/user/_search?q=last_name:Smith&pretty'
{
  "took" : 401,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 1,
    "max_score" : 0.2876821,
    "hits" : [
      {
        "_index" : "vipinfo",
        "_type" : "user",
        "_id" : "1",
        "_score" : 0.2876821,
        "_source" : {
          "first_name" : "John",
          "last_name" : "Smith",
          "age" : 25,
          "about" : "I love to go rock climbing",
          "interests" : [
            "sports",
            "music"
          ]
        }
      }
    ]
  }
}

The same query written as a request body; a single match clause covers one condition only

[root@75-125 ~]# curl -XGET 'localhost:9200/vipinfo/user/_search?pretty' -H 'Content-Type: application/json' -d'
{
   "query" : {
       "match" : {
              "last_name" : "Smith" 
        }
    }
}
'

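Filtering on several conditions (a match clause plus a range filter, combined in a bool query)

[root@75-125 ~]# curl -XGET 'localhost:9200/vipinfo/user/_search?pretty' -H 'Content-Type: application/json' -d'
{
   "query" : {
       "bool" : {
           "must" : {
               "match" : { "last_name" : "Smith" }
           },
           "filter" : {
               "range" : { "age" : { "gt" : 30 } }
           }
       }
   }
}
'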
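
A search that combines a match condition with a range condition nests both clauses under a bool query. The Python sketch below builds such a request body as a dictionary and prints it as JSON; the function name bool_query is an invention for this example, and no request is actually sent.

```python
import json


def bool_query(match: dict, range_filter: dict) -> dict:
    """Build a search body: full-text match plus a non-scoring range filter."""
    return {
        "query": {
            "bool": {
                "must": {"match": match},
                "filter": {"range": range_filter},
            }
        }
    }


if __name__ == "__main__":
    body = bool_query({"last_name": "Smith"}, {"age": {"gt": 30}})
    print(json.dumps(body, indent=2))
```

The printed JSON can be pasted after -d in a curl _search call. Further conditions would be added by turning must or filter into lists.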

Querying in the tool (es-head): + adds another condition

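4.4. Updating data

With PUT the complete document must be supplied, because the stored document is replaced as a whole. Note: each request changes one document only.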
[root@75-125 ~]# curl -XPUT 'localhost:9200/vipinfo/user/1?pretty' -H 'Content-Type: application/json' -d'
{
      "first_name" : "John",
      "last_name": "Smith",
      "age" : 29,  
      "about" : "I love to go rock climbing", "interests": [ "sports", "music" ]
}
'

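A POST to the same document URL can carry just the fields being changed, but note that the document is still replaced as a whole: every field left out is gone afterwards. To change some fields and keep the rest, ES 6.x offers the _update endpoint (POST vipinfo/user/1/_update with the changes wrapped in a "doc" object).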
[root@75-125 ~]# curl -XPOST 'localhost:9200/vipinfo/user/1?pretty' -H 'Content-Type: application/json' -d'
{
      "first_name" : "Johna",
      "age" : 22
}
'
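
The two behaviours (replacing the whole document versus merging a few changed fields) can be modelled on plain dictionaries. This Python sketch only imitates the semantics described in this section and does not talk to Elasticsearch; both function names are hypothetical, and the merge is shallow, which is a simplification.

```python
def replace_document(stored: dict, body: dict) -> dict:
    """Model PUT/POST on the document URL: the body becomes the whole document."""
    return dict(body)


def partial_update(stored: dict, changes: dict) -> dict:
    """Model a partial update: changed fields are merged into the stored document.

    Shallow merge only; nested objects are not merged field by field here.
    """
    merged = dict(stored)
    merged.update(changes)
    return merged


if __name__ == "__main__":
    stored = {"first_name": "John", "last_name": "Smith", "age": 29}
    changes = {"first_name": "Johna", "age": 22}
    print(replace_document(stored, changes))  # last_name is lost
    print(partial_update(stored, changes))    # last_name is kept
```

The first call shows why fields that are left out disappear; the second shows what a partial update preserves.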

Using the tool (es-head)

4.5. Deleting data

[root@75-125 ~]# curl -XDELETE 'localhost:9200/vipinfo/user/1?pretty' -H 'Content-Type: application/json' -d' {}'
{
  "_index" : "vipinfo",
  "_type" : "user",
  "_id" : "1",
  "_version" : 8,
  "result" : "deleted",
  "_shards" : {
    "total" : 2,
    "successful" : 1,
    "failed" : 0
  },
  "_seq_no" : 7,
  "_primary_term" : 3
}

Alternatively, delete in the tool: Actions → Delete → type the confirmation word → OK

4.6. Closing an index

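II. ES cluster

1. Introduction

Elasticsearch scales horizontally to hundreds (even thousands) of server nodes and can handle petabytes of data.
Elasticsearch is distributed by nature, and its design hides the complexity of distributed systems as far as possible.

2. Installation and deployment overview

Prepare three machines: 10.4.7.2, 10.4.7.3, 10.4.7.4

Deployment is the same as for a single node; the difference lies in /etc/elasticsearch/elasticsearch.yml:

#cluster.name: my-application    cluster name: must be identical on every member of the cluster
#node.name: node-1    node name: must be different on every member of the cluster
#network.host: 10.4.7.2,127.0.0.1    the node's own addresses
#discovery.zen.ping.unicast.hosts: ["host1", "host2"]    list all nodes of the cluster here
#discovery.zen.minimum_master_nodes:    the minimum number of master-eligible nodes that must be visible before a master is elected; the recommended value is (number of master-eligible nodes / 2) + 1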
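
The value for discovery.zen.minimum_master_nodes is normally a quorum of the master-eligible nodes, that is, more than half of them. A short Python sketch of that rule (the function name is chosen for this example and is not part of any ES tooling):

```python
def minimum_master_nodes(master_eligible_nodes: int) -> int:
    """Quorum of master-eligible nodes: floor(n / 2) + 1."""
    if master_eligible_nodes < 1:
        raise ValueError("need at least one master-eligible node")
    return master_eligible_nodes // 2 + 1


if __name__ == "__main__":
    for n in (1, 2, 3, 5):
        print(n, "nodes ->", minimum_master_nodes(n))
```

For the three machines prepared here the result is 2, which is the value used in the configuration files below.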

2.1. Deploy 10.4.7.2 first:

[root@7-2 ]# yum install -y java-1.8.0-openjdk.x86_64
[root@7-2 ]# mkdir -p /opt/installpag;cd /opt/installpag
[root@7-2 installpag ]# rz elasticsearch-6.6.0.rpm 
[root@7-2 installpag ]# rpm -ivh elasticsearch-6.6.0.rpm 
[root@7-2 ]# systemctl daemon-reload;systemctl enable elasticsearch.service;systemctl start elasticsearch.service
[root@7-2 ]# vi /etc/elasticsearch/elasticsearch.yml
[root@7-2 installpag]# grep "^[a-zA-Z]" /etc/elasticsearch/elasticsearch.yml 
cluster.name: Linux
node.name: node-1
path.data: /data/elasticsearch
path.logs: /var/log/elasticsearch
bootstrap.memory_lock: true
network.host: 10.4.7.2,127.0.0.1
http.port: 9200
discovery.zen.ping.unicast.hosts: ["10.4.7.2", "10.4.7.3", "10.4.7.4"]
discovery.zen.minimum_master_nodes: 2


[root@7-2 ]# mkdir -p /data/elasticsearch;chown -R elasticsearch:elasticsearch /data/elasticsearch/
[root@7-2 ~]# vi /etc/elasticsearch/jvm.options    
-Xms512m
-Xmx512m

[root@7-2 ]# systemctl restart elasticsearch;systemctl edit elasticsearch
[Service]
LimitMEMLOCK=infinity

[root@7-2 ]# systemctl daemon-reload;systemctl restart elasticsearch;systemctl status elasticsearch

[root@7-2 ]# netstat -tulpn |grep 9200
tcp6       0      0 10.4.7.2:9200           :::*                    LISTEN      3404/java           
tcp6       0      0 127.0.0.1:9200          :::*                    LISTEN      3404/java

Note: on the single node we read /var/log/elasticsearch/elasticsearch.log. The cluster is named Linux, so the log file is now /var/log/elasticsearch/Linux.log

[root@7-2 elasticsearch]# tail -f /var/log/elasticsearch/Linux.log 

If you see this error, the permissions on the data directory named in the message are wrong:
Caused by: java.nio.file.AccessDeniedException: /date/elasticsearch

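If you see this error, "10.4.7.3" and "10.4.7.4" have not been detected yet. Note: nodes discover each other on the transport port 9300 (visible in the log line), not on the HTTP port 9200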
[2021-02-21T13:50:31,871][WARN ][o.e.d.z.ZenDiscovery     ] [node-1] not enough master nodes discovered during pinging (found [[Candidate{node={node-1}{uZ_U4U9BTfiW6Lyy9KIydg}{8rYhiQaTTQmHbC1dXfO3cQ}{10.4.7.2}{10.4.7.2:9300}{ml.machine_memory=1907724288, xpack.installed=true, ml.max_open_jobs=20, ml.enabled=true}, clusterStateVersion=-1}]], but needed [2]), pinging again

2.2. Deploy 10.4.7.3:

[root@7-3 ]# yum install -y java-1.8.0-openjdk.x86_64
[root@7-3 ]# mkdir -p /opt/installpag;cd /opt/installpag
[root@7-3 installpag ]# rz elasticsearch-6.6.0.rpm 
[root@7-3 installpag ]# rpm -ivh elasticsearch-6.6.0.rpm 
[root@7-3 ]# systemctl daemon-reload;systemctl enable elasticsearch.service;systemctl start elasticsearch.service
[root@7-3 ]# vi /etc/elasticsearch/elasticsearch.yml
[root@7-3 installpag]# grep "^[a-zA-Z]" /etc/elasticsearch/elasticsearch.yml 
cluster.name: Linux
node.name: node-2
path.data: /data/elasticsearch
path.logs: /var/log/elasticsearch
bootstrap.memory_lock: true
network.host: 10.4.7.3,127.0.0.1
http.port: 9200
discovery.zen.ping.unicast.hosts: ["10.4.7.2", "10.4.7.3", "10.4.7.4"]
discovery.zen.minimum_master_nodes: 2


[root@7-3 ]# mkdir -p /data/elasticsearch;chown -R elasticsearch:elasticsearch /data/elasticsearch/
[root@7-3  ~]# vi /etc/elasticsearch/jvm.options    
-Xms512m
-Xmx512m

[root@7-3 ]# systemctl restart elasticsearch;systemctl edit elasticsearch
[Service]
LimitMEMLOCK=infinity

[root@7-3 ]# systemctl daemon-reload;systemctl restart elasticsearch;systemctl status elasticsearch

[root@7-3 ]# netstat -tulpn |grep 9200
tcp6       0      0 10.4.7.3:9200           :::*                    LISTEN      3404/java           
tcp6       0      0 127.0.0.1:9200          :::*                    LISTEN      3404/java

2.3. Deploy 10.4.7.4:

[root@7-4 ]# yum install -y java-1.8.0-openjdk.x86_64
[root@7-4 ]# mkdir -p /opt/installpag;cd /opt/installpag
[root@7-4 installpag ]# rz elasticsearch-6.6.0.rpm 
[root@7-4 installpag ]# rpm -ivh elasticsearch-6.6.0.rpm 
[root@7-4 ]# systemctl daemon-reload;systemctl enable elasticsearch.service;systemctl start elasticsearch.service
[root@7-4 ]# vi /etc/elasticsearch/elasticsearch.yml
[root@7-4 installpag]# grep "^[a-zA-Z]" /etc/elasticsearch/elasticsearch.yml 
cluster.name: Linux
node.name: node-3
path.data: /data/elasticsearch
path.logs: /var/log/elasticsearch
bootstrap.memory_lock: true
network.host: 10.4.7.4,127.0.0.1
http.port: 9200
discovery.zen.ping.unicast.hosts: ["10.4.7.2", "10.4.7.3", "10.4.7.4"]
discovery.zen.minimum_master_nodes: 2


[root@7-4 ]# mkdir -p /data/elasticsearch;chown -R elasticsearch:elasticsearch /data/elasticsearch/
[root@7-4  ~]# vi /etc/elasticsearch/jvm.options    
-Xms512m
-Xmx512m

[root@7-4 ]# systemctl restart elasticsearch;systemctl edit elasticsearch
[Service]
LimitMEMLOCK=infinity

[root@7-4 ]# systemctl daemon-reload;systemctl restart elasticsearch;systemctl status elasticsearch

[root@7-4 ]# netstat -tulpn |grep 9200
tcp6       0      0 10.4.7.4:9200           :::*                    LISTEN      3404/java           
tcp6       0      0 127.0.0.1:9200          :::*                    LISTEN      3404/java

Troubleshooting:

[1] bootstrap checks failed
[1]: memory locking requested for elasticsearch process but memory is not locked
[2021-02-21T14:20:19,030][INFO ][o.e.n.Node               ] [node-3] stopping ...
[2021-02-21T14:20:19,149][INFO ][o.e.n.Node               ] [node-3] stopped
[2021-02-21T14:20:19,149][INFO ][o.e.n.Node               ] [node-3] closing ...
[2021-02-21T14:20:19,247][INFO ][o.e.n.Node               ] [node-3] closed
[2021-02-21T14:20:19,258][INFO ][o.e.x.m.p.NativeController] [node-3] Native controller process has stopped - no new native processes can be started

Solutions:

Option 1 (turn bootstrap.memory_lock off; this costs performance):
# vim /etc/elasticsearch/elasticsearch.yml          # ES starts normally once this is set to false
bootstrap.memory_lock: false

Option 2 (keep bootstrap.memory_lock on):
1. In /etc/elasticsearch/elasticsearch.yml keep the setting below. The error above is what appears once it is enabled, so further system configuration files have to be changed as well.
bootstrap.memory_lock: true
2. Append the following to /etc/security/limits.conf:
*		 soft	 nofile		 65536
*		 hard	 nofile		 65536
*		 soft	 nproc		 32000
*		 hard	 nproc		 32000
*		 hard	 memlock	 unlimited
*		 soft	 memlock	 unlimited

3. Edit /etc/sysctl.conf:
vm.max_map_count = 655360

4. In /etc/systemd/system.conf, change the following settings:
DefaultLimitNOFILE=65536
DefaultLimitNPROC=32000
DefaultLimitMEMLOCK=infinity

5. Apply the changes above, or reboot the system:
[root@localhost security]# sysctl -p
vm.max_map_count = 655360

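Checking the cluster status:

Insert an index and the view shows how the cluster distributes the data by default (5 shards, 1 replica: 5 primary shards plus 1 replica set, where one replica set is a copy of all 5 primary shards)

Cluster health: _cluster/health
Detailed cluster state: _cluster/state?human&pretty
Cluster settings: _cluster/settings?include_defaults=true&human&pretty
curl -XGET 'http://localhost:9200/_nodes/process?human&pretty'    # process information for each node; the response also includes the cluster name
curl -XGET 'http://localhost:9200/_nodes/_all/info/jvm,process?human&pretty'    # full details: IP addresses, memory, versions
curl -XGET 'http://localhost:9200/_cat/nodes?human&pretty'    # important: lists all current nodes
curl -XGET 'http://localhost:9200/_cluster/health?pretty'
curl -XGET 'localhost:9200/_cat/indices?pretty'    # lists the indices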

{
"cluster_name": "Linux",
"status": "green",   # cluster status
"timed_out": false,
"number_of_nodes": 3,            # number of nodes
"number_of_data_nodes": 3,    # number of data nodes
"active_primary_shards": 5,      # total number of primary shards
"active_shards": 10,                  # total number of active shards (primary + replica)
"relocating_shards": 0,
"initializing_shards": 0,
"unassigned_shards": 0,
"delayed_unassigned_shards": 0,
"number_of_pending_tasks": 0,
"number_of_in_flight_fetch": 0,
"task_max_waiting_in_queue_millis": 0,
"active_shards_percent_as_number": 100
}

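Node roles:

Master node: coordinates the cluster and returns results

Data (worker) node: processes the data

By default:

1. Every node is a data node

2. The master node both coordinates and processes data

3. Which node becomes master depends on which one starts first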

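3. Merging after a split brain

Prepare two machines: 10.4.7.2 and 10.4.7.3. Stop ES on 10.4.7.4 so that 10.4.7.2 and 10.4.7.3 form the cluster.

Approach:

1. On 10.4.7.2 and 10.4.7.3, comment out discovery.zen.minimum_master_nodes: 2 and change the host list to discovery.zen.ping.unicast.hosts: ["10.4.7.2", "10.4.7.3"]. Stop ES on 10.4.7.4. Now 10.4.7.2 and 10.4.7.3 form a cluster, and the cluster can also start with only one node.

2. Stop ES on 10.4.7.2 and 10.4.7.3, wipe the data on 10.4.7.3 (rm -rf /data/elasticsearch/*), start 10.4.7.3, and insert data there. Repeat in the other direction: stop both nodes, wipe 10.4.7.2, start 10.4.7.2, and insert data there. Finally restart both nodes. Because of the cluster configuration they join into one cluster again, and the cluster still reports green.

3. Now the question: what happens to the two data sets, are they merged or does one overwrite the other?
Answer: data that exists on 10.4.7.2 but not on 10.4.7.3 is merged. If the data conflicts, for example both 10.4.7.2 and 10.4.7.3 hold a row with id=1 in table B of database A, both versions are kept: one refresh shows the data from 10.4.7.2 and the next refresh shows the data from 10.4.7.3.

Summary: if this happens, fix it as follows.
With two masters, the data of one node has to be wiped so that the other node becomes the single source of truth.
Procedure: wipe the data on one node (rm -rf /data/elasticsearch/*) and restart it. Because that node is now empty, it synchronizes the data from the other master after it joins the cluster.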
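
Why commenting out discovery.zen.minimum_master_nodes makes this experiment possible can be shown with a tiny model: a group of nodes that can see each other elects a master only if the group is at least as large as the setting, and when the setting is left out its default is 1. The Python sketch below is an illustration under that simplification; the function name is invented.

```python
def partitions_able_to_elect(partition_sizes: list, minimum_master_nodes: int) -> list:
    """Return the sizes of the node groups that can still elect a master.

    A group can elect a master only if it sees at least
    minimum_master_nodes master-eligible nodes.
    """
    return [size for size in partition_sizes if size >= minimum_master_nodes]


if __name__ == "__main__":
    # two isolated nodes, setting left at 1: both elect a master (split brain)
    print(partitions_able_to_elect([1, 1], 1))
    # same situation with the setting at 2: neither node can elect a master
    print(partitions_able_to_elect([1, 1], 2))
```

With the setting at 1, each isolated node becomes its own master; with 2, neither does, and in a three-node cluster split 2 + 1 only the larger side keeps a master.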

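4. ES shards and replicas explained

1. Default data layout (5 shards, 1 replica: 5 primary shards plus 1 replica set, where one replica set is a copy of all 5 primary shards)

2. ES spreads the data: with 3 nodes, if one node fails, no data is lost (shards 0 to 4 still exist on the other nodes)

3. With 3 nodes, when one node fails the remaining two nodes immediately re-replicate the shards between themselves. The benefit: if another node fails later, the data is still not lost (shards 0 to 4 still exist on the remaining node)

Note: with 3 nodes, if two nodes are lost at the same time (meaning the contents of /data/elasticsearch/* on both nodes are gone), part of the data is lost. If the nodes were only stopped, they synchronize again right after they start.

4. Cluster status colors:
Green: all conditions are met; the data is complete and all replicas are in place
Yellow: the data is complete, but the replicas are not all in place
Red: some index data is incomplete (for example, 3 nodes in the cluster and 2 go down at the same time, losing data). Note: once the status is red, indices that are missing shards cannot accept writes, while indices that are not missing shards still can
Purple: shards are being synchronized (3 nodes in the cluster, 2 go down one after the other without data loss; when they start again, the cluster decides which node holds each primary shard, and the shards are shown in purple during that time). Purple is a display color in the tool; the health API itself only reports green, yellow, or red

5. How to monitor the cluster: monitoring green alone is not enough, because with 3 nodes the remaining 2 can still report green after one node stops. So the number of nodes has to be monitored as well. The alert rule combines the two checks (status is not green, node count is below the expected number) with OR: alert as soon as either one is true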

[root@7-3 ~]# curl -XGET 'http://localhost:9200/_cat/nodes?human&pretty'
10.4.7.2 21 95  5 0.05 0.06 0.13 mdi - node-1
10.4.7.3 27 93 92 6.33 4.59 4.26 mdi - node-2
10.4.7.4 24 93  0 0.00 0.01 0.05 mdi * node-3
[root@7-3 ~]# curl -XGET 'http://localhost:9200/_cat/nodes?human&pretty' |wc -l
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:--  0:00:03 --:--:--     0
100   138  100   138    0     0     29      0  0:00:04  0:00:04 --:--:--    29
3
[root@7-3 ~]# 
[root@7-3 ~]# curl -s -XGET 'http://localhost:9200/_cat/nodes?human&pretty' |wc -l
3

Explanation:
When its output is piped, curl prints transfer statistics such as % Total % Received %
curl -s suppresses the statistics
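
The OR rule for alerting can be written down in a few lines. The Python sketch below is a minimal illustration: both function names are made up, the status string and the _cat/nodes text would come from the curl calls shown in this article, and the expected node count is something the operator has to supply.

```python
def count_nodes(cat_nodes_output: str) -> int:
    """Count non-empty lines of _cat/nodes output (one line per node)."""
    return sum(1 for line in cat_nodes_output.splitlines() if line.strip())


def should_alert(status: str, node_count: int, expected_nodes: int) -> bool:
    """Alert when the cluster is not green OR a node is missing."""
    return status != "green" or node_count < expected_nodes


if __name__ == "__main__":
    sample = (
        "10.4.7.2 21 95  5 0.05 0.06 0.13 mdi - node-1\n"
        "10.4.7.3 27 93 92 6.33 4.59 4.26 mdi - node-2\n"
    )
    # green, but only 2 of 3 nodes are present: the alert still fires
    print(should_alert("green", count_nodes(sample), expected_nodes=3))
```

Counting lines in Python replaces the wc -l step; the decision itself is the single OR expression.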

6. The default is 5 shards and 1 replica; how to change it

Creating an index with the default shard settings:

[root@7-3 ~]# curl -XPUT 'localhost:9200/index1?pretty'
{
  "acknowledged" : true,
  "shards_acknowledged" : true,
  "index" : "index1"
}

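Specifying shards and replicas when creating an index (number_of_shards sets the primary shards, number_of_replicas the replicas):

[root@7-3 ~]# curl -XPUT 'localhost:9200/index2?pretty' -H 'Content-Type: application/json' -d'
{
   "settings" : {
   "number_of_shards" : 3,
   "number_of_replicas" : 1
  }
}'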

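Changing an existing index:
For an index that already exists, the number of primary shards cannot be changed, but the number of replicas can. Reason: each document is routed to a primary shard based on the number of primaries, so changing that number would misplace the existing data; replicas are only copies of the primaries.

[root@7-3 ~]# curl -XPUT 'localhost:9200/index2/_settings?pretty' -H 'Content-Type: application/json' -d'
{
   "settings" : {
   "number_of_replicas" : 2
  }
}'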

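Question: the cluster now has 3 nodes and two indices, one with 5 shards and 1 replica and one with 3 shards and 2 replicas. If one node is stopped, is the cluster still green? Answer: no, it is yellow. The index with 2 replicas needs 3 nodes to place all copies; after one node stops, 2 replicas can no longer be built, so the status is yellow: data complete, replicas not satisfied.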
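
The answer can be reproduced with a simplified model in Python. It assumes that all primary shards are assigned (so the status is never red) and that the only constraint is one node per copy of a shard; both function names are hypothetical.

```python
def index_status(replicas: int, data_nodes: int) -> str:
    """Green if every replica can sit on its own node, else yellow.

    Simplified: assumes all primaries are assigned, so never red.
    """
    return "green" if data_nodes >= replicas + 1 else "yellow"


def cluster_status(replicas_per_index: list, data_nodes: int) -> str:
    """The cluster is only as healthy as its least healthy index."""
    statuses = [index_status(r, data_nodes) for r in replicas_per_index]
    return "yellow" if "yellow" in statuses else "green"


if __name__ == "__main__":
    indices = [1, 2]  # index1 has 1 replica, index2 has 2 replicas
    print(cluster_status(indices, 3))  # all three nodes running
    print(cluster_status(indices, 2))  # one node stopped
```

With 3 nodes both indices fit; with 2 nodes the index that has 2 replicas drags the whole cluster to yellow.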
