Elasticsearch

S Y H

Last modified: 2022-10-13 21:47:53

Category: Databases. Tags: elasticsearch, big data

First published: 2021-12-03 19:26:51

Article link: https://blog.csdn.net/asddasddeedd/article/details/121704762

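Part 1. What is Elasticsearch

Elasticsearch (ES for short) is a highly scalable, open-source full-text search and analytics engine that stores, searches, and analyzes large volumes of data in near real time. For example, in an online store where users search for products, you can keep all product and inventory information in ES, and a user only has to type "空调" (air conditioner) to find the matching products.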

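Part 2. Single-node installation of elasticsearch-7.16.1 on Linux

(1) A JDK environment is required (the 7.x Linux archive bundles its own JDK, so a separate install is only needed if you want to use a different one)

(2) Download

Download Elasticsearch | Elastic

(3) Upload the archive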

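# Elasticsearch refuses to start as root, so create a dedicated user, youtwo
useradd youtwo   # create the new Linux user youtwo
passwd youtwo    # set the user's password

# The new user gets a home directory, /home/youtwo

# Upload the archive to /home/youtwo; it is still owned by root, so change its owner and group
chown youtwo:youtwo elasticsearch-7.16.1-linux-x86_64.tar.gz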

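(4) Extract the archive

# Set the archive's permissions
chmod 755 elasticsearch-7.16.1-linux-x86_64.tar.gz

# Switch to the new user before extracting
su - youtwo
tar -zxvf elasticsearch-7.16.1-linux-x86_64.tar.gz

# Rename the extracted directory
mv elasticsearch-7.16.1 elasticsearch

# If any files ended up owned by root, switch back to root and fix the ownership;
# -R applies the change to everything under the directory
# su - root
chown -R youtwo:youtwo /home/youtwo/elasticsearch

# Switch back to youtwo
su - youtwo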

(5) Edit the configuration file

# Edit the configuration file
vim /home/youtwo/elasticsearch/config/elasticsearch.yml

# Create the data directory referenced by path.data; the logs directory already exists
mkdir /home/youtwo/elasticsearch/data

# ======================== Elasticsearch Configuration =========================
#
# NOTE: Elasticsearch comes with reasonable defaults for most settings.
#       Before you set out to tweak and tune the configuration, make sure you
#       understand what are you trying to accomplish and the consequences.
#
# The primary way of configuring a node is via this file. This template lists
# the most important settings you may want to configure for a production cluster.
#
# Please consult the documentation for further information on configuration options:
# https://www.elastic.co/guide/en/elasticsearch/reference/index.html
#
# ---------------------------------- Cluster -----------------------------------
#
# Use a descriptive name for your cluster:
#
#cluster.name: my-application
#
# ------------------------------------ Node ------------------------------------
#
# Use a descriptive name for the node:
#
node.name: node-1
#
# Add custom attributes to the node:
#
#node.attr.rack: r1
#
# ----------------------------------- Paths ------------------------------------
#
# Path to directory where to store the data (separate multiple locations by comma):
#
path.data: /home/youtwo/elasticsearch/data
#
# Path to log files:
#
path.logs: /home/youtwo/elasticsearch/logs
#
# ----------------------------------- Memory -----------------------------------
#
# Lock the memory on startup:
#
#bootstrap.memory_lock: true
#
# Make sure that the heap size is set to about half the memory available
# on the system and that the owner of the process is allowed to use this
# limit.
#
# Elasticsearch performs poorly when the system is swapping the memory.
#
# ---------------------------------- Network -----------------------------------
#
# Set the bind address to a specific IP (IPv4 or IPv6):
#
network.host: 0.0.0.0

http.cors.enabled: true
http.cors.allow-origin: "*"

#
# Set a custom port for HTTP:
#
http.port: 9200
#
# For more information, consult the network module documentation.
#
# --------------------------------- Discovery ----------------------------------
#
# Pass an initial list of hosts to perform discovery when this node is started:
# The default list of hosts is ["127.0.0.1", "[::1]"]
#
#discovery.seed_hosts: ["host1", "host2"]
#
# Bootstrap the cluster using an initial set of master-eligible nodes:
#
cluster.initial_master_nodes: ["node-1"]
#
# For more information, consult the discovery and cluster formation module documentation.
#
# ---------------------------------- Gateway -----------------------------------
#
# Block initial recovery after a full cluster restart until N nodes are started:
#
#gateway.recover_after_nodes: 3
#
# For more information, consult the gateway module documentation.
#
# ---------------------------------- Various -----------------------------------
#
# Require explicit names when deleting indices:
#
#action.destructive_requires_name: true

(7) Starting Elasticsearch at this point fails the bootstrap checks, so some system settings have to be changed first

ERROR: [2] bootstrap checks failed
[1]: max file descriptors [4096] for elasticsearch process is too low, increase to at least [65536]

[2]: max virtual memory areas vm.max_map_count [65530] is too low, increase to at least [262144]

# Edit /etc/security/limits.conf and append:
* soft nofile 65536
* hard nofile 65536

# Edit /etc/sysctl.conf and append:
vm.max_map_count=655360

# After saving, apply the sysctl change:
sysctl -p

# Reboot the virtual machine
reboot
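
The two bootstrap checks above compare operating-system limits against fixed minimums. The following is a minimal sketch of that comparison, not Elasticsearch's own code: the class and method names are hypothetical, and the thresholds are the ones quoted in the error output. The current values on a machine can be read with `ulimit -n` (as the youtwo user) and `sysctl vm.max_map_count`.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper that mirrors the two checks reported in the error output.
final class BootstrapLimitsSketch {
    // Minimums quoted in the error messages above.
    static final long MIN_FILE_DESCRIPTORS = 65_536L;
    static final long MIN_MAX_MAP_COUNT = 262_144L;

    // Returns one message per limit that is still too low; an empty list means both pass.
    static List<String> failures(long fileDescriptors, long maxMapCount) {
        List<String> failed = new ArrayList<>();
        if (fileDescriptors < MIN_FILE_DESCRIPTORS) {
            failed.add("nofile " + fileDescriptors + " < " + MIN_FILE_DESCRIPTORS);
        }
        if (maxMapCount < MIN_MAX_MAP_COUNT) {
            failed.add("vm.max_map_count " + maxMapCount + " < " + MIN_MAX_MAP_COUNT);
        }
        return failed;
    }

    public static void main(String[] args) {
        System.out.println(failures(4096, 65_530));     // values from the error output
        System.out.println(failures(65_536, 655_360));  // values after the changes above
    }
}
```

With the values set in limits.conf and sysctl.conf above, both comparisons pass.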

(8) Run

# From the bin directory
cd /home/youtwo/elasticsearch/bin
./elasticsearch

(9) Check
http://192.168.147.133:9200/

http://192.168.147.133:9200/_cat/nodes

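Part 3. The elasticsearch-head plugin

Download: mirrors / mobz / elasticsearch-head · GitCode

The zip download can be unpacked locally and used directly.

The elasticsearch-head project provides an intuitive interface for viewing the cluster, shards, data, and so on. The simplest way to install elasticsearch-head is as a Chrome browser extension.

Installing the extension:

1. Install the extension in the browser

2. Click the elasticsearch-head extension in the browser to open the head interface, then connect to http://192.168.64.181:9200/

Once connected, head displays the cluster overview.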

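Using the IK Chinese analyzer in Elasticsearch

1. Install the ik analyzer

Download the ik analyzer package from its project repository; the version must match the Elasticsearch version:
https://github.com/medcl/elasticsearch-analysis-ik

Alternatively, use the gitee mirror:
https://gitee.com/mirrors/elasticsearch-analysis-ik

Download elasticsearch-analysis-ik-7.9.3.zip and copy it to the /root/ directory

2. Install the ik analyzer on the three nodes (the commands assume three Elasticsearch 7.9.3 Docker containers named node1, node2, and node3)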

cd ~/

# Copy the ik analyzer into the three ES containers
docker cp elasticsearch-analysis-ik-7.9.3.zip node1:/root/
docker cp elasticsearch-analysis-ik-7.9.3.zip node2:/root/
docker cp elasticsearch-analysis-ik-7.9.3.zip node3:/root/

# Install the ik analyzer in node1
docker exec -it node1 elasticsearch-plugin install file:///root/elasticsearch-analysis-ik-7.9.3.zip

# Install the ik analyzer in node2
docker exec -it node2 elasticsearch-plugin install file:///root/elasticsearch-analysis-ik-7.9.3.zip

# Install the ik analyzer in node3
docker exec -it node3 elasticsearch-plugin install file:///root/elasticsearch-analysis-ik-7.9.3.zip

# Restart the three ES containers
docker restart node1 node2 node3

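3. Check the installation

Open http://192.168.64.181:9200/_cat/plugins in a browser

4. Test ik tokenization

The ik plugin provides two analyzers: ik_max_word and ik_smart

ik_max_word:

Splits the text at the finest granularity. For example, “中华人民共和国国歌” is split into “中华人民共和国,中华人民,中华,华人,人民共和国,人民,人,民,共和国,共和,和,国国,国歌”, exhausting every possible combination; suited to Term queries.

ik_smart:

Splits the text at the coarsest granularity. For example, “中华人民共和国国歌” is split into “中华人民共和国,国歌”; suited to Phrase queries.

`ik_max_word` tokenization test

Run the following test in head:
Send a POST request to http://192.168.64.181:9200/_analyze with this JSON in the request body: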

{
  "analyzer":"ik_max_word",
  "text":"中华人民共和国国歌"
}

`ik_smart` tokenization test

Run the following test in head:
Send a POST request to http://192.168.64.181:9200/_analyze with this JSON in the request body:

{
  "analyzer":"ik_smart",
  "text":"中华人民共和国国歌"
}

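Using Kibana with ES

1. What is Kibana

Kibana is a graphical tool for working with Elasticsearch
It is built on Node.js and is used through a web browser
It offers helpful suggestions when making API calls
It can generate many kinds of charts

2. Pull the Kibana image

docker pull kibana:7.9.3

3. Start the Kibana container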

docker run \
-d \
--name kibana \
--net es-net \
-p 5601:5601 \
-e ELASTICSEARCH_HOSTS='["http://node1:9200","http://node2:9200","http://node3:9200"]' \
--restart=always \
kibana:7.9.3

4. Open Kibana in a browser and go to `Dev Tools`

http://192.168.64.181:5601/

Indices, shards, and replicas

1. Index

An Elasticsearch index stores the data we want to search, organized as an inverted index.
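
To make the term "inverted index" concrete, here is a toy sketch: each term maps to the ids of the documents that contain it, so a search is a lookup by term instead of a scan over documents. The class name is hypothetical, splitting on whitespace stands in for a real analyzer, and this is not how Lucene lays out its index on disk.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Toy inverted index: term -> ids of the documents containing that term.
final class InvertedIndexSketch {
    private final Map<String, Set<Integer>> postings = new TreeMap<>();

    // Assumption: lower-casing and splitting on whitespace stands in for a real analyzer.
    void add(int docId, String text) {
        for (String term : text.toLowerCase().split("\\s+")) {
            if (!term.isEmpty()) {
                postings.computeIfAbsent(term, t -> new TreeSet<>()).add(docId);
            }
        }
    }

    // Looks up one term; returns an empty set when no document contains it.
    Set<Integer> search(String term) {
        return postings.getOrDefault(term.toLowerCase(), Set.of());
    }

    public static void main(String[] args) {
        InvertedIndexSketch index = new InvertedIndexSketch();
        index.add(1, "wireless speaker");
        index.add(2, "wireless headphones");
        System.out.println(index.search("wireless"));
        System.out.println(index.search("speaker"));
    }
}
```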

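For example, to search product data, create a product index that holds all the product data for us to search.

When an index holds a large amount of data, the heavy disk I/O lowers overall search performance, and the data then needs to be split into shards.

2. Index shards

Storing a large amount of data in a single index degrades performance, so the data can be stored in shards.

Create one shard of the index on each node and spread the data across those shards; reducing the amount of data per shard improves I/O performance.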
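
How a document ends up on a particular shard can be sketched as follows. Conceptually, Elasticsearch hashes a routing value (the document `_id` by default) and maps the hash onto the primary shards. In this sketch `String.hashCode` is an assumed stand-in for the hash Elasticsearch really uses, so the shard numbers it prints will not match a real cluster; the class name is hypothetical.

```java
import java.util.List;

// Simplified document-to-shard routing.
final class ShardRoutingSketch {
    // Assumption: String.hashCode replaces Elasticsearch's real hash function.
    // floorMod keeps the result in [0, primaryShards) even for negative hashes.
    static int shardFor(String routing, int primaryShards) {
        return Math.floorMod(routing.hashCode(), primaryShards);
    }

    public static void main(String[] args) {
        int primaryShards = 3; // one primary shard per node, as described above
        for (String id : List.of("10033", "10034", "10035", "10036", "10037")) {
            System.out.println(id + " -> shard " + shardFor(id, primaryShards));
        }
    }
}
```

Because the result depends on the number of primary shards, the same id would land elsewhere if that number changed, which is one reason the primary shard count is chosen when the index is created.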

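3. Index replicas

Each shard is a self-contained index, and the data is spread across the shards, so each shard holds different data. A search queries multiple shards at the same time and merges the results.

If a node goes down and its shard becomes unavailable, part of the data can no longer be searched.

To solve this, create replicas of each shard: even if one node goes down, the replica shards on the other nodes keep working and no data becomes unavailable.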

4. Create an index with Kibana

Create an index named products to store product data.

# Create an index named products
PUT /products
{
  "settings": {
    "number_of_shards": 3, 
    "number_of_replicas": 2
  }
}

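Shard and replica parameters:
# number_of_shards: number of primary shards; the default is 1 since Elasticsearch 7.0 (it was 5 in earlier versions)
# number_of_replicas: number of replicas per primary shard; the default is 1
# We have three nodes, so one primary shard is created on each node, and each shard gets one replica on each of the other two nodes.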

Filter by index name to view the products index.

Note: in head, thick borders mark primary shards and thin borders mark replica shards
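
The arithmetic behind these two settings can be sketched as follows (hypothetical class name, plain arithmetic only): with 3 primary shards and 2 replicas each, the cluster holds 9 shard copies, so each of the three nodes ends up with one copy of every shard.

```java
// Arithmetic for number_of_shards and number_of_replicas.
final class ShardCopiesSketch {
    // Every primary shard exists once plus once per replica.
    static int totalCopies(int primaries, int replicas) {
        return primaries * (1 + replicas);
    }

    // Copies per node, assuming the copies are spread evenly over the nodes.
    static int copiesPerNode(int primaries, int replicas, int nodes) {
        return totalCopies(primaries, replicas) / nodes;
    }

    public static void main(String[] args) {
        System.out.println(totalCopies(3, 2));      // 3 primaries + 6 replicas
        System.out.println(copiesPerNode(3, 2, 3)); // copies held by each of 3 nodes
    }
}
```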

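5. Mapping (data structure)

Like a database table schema, the data in an index is divided into fields, each of which needs a data type and other attributes.

A mapping is the definition and description of the field structure of an index.

I. Field data types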

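Type reference:

Field data types | Elasticsearch Guide [7.15] | Elastic

Numeric types:

byte, short, integer, long
float, double
unsigned_long

String types:

text: analyzed (split into tokens)
keyword: not analyzed; suitable for emails, host names, postal codes, and the like

Date and time types:

date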

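II. Create a mapping

Mapping reference:

Mapping | Elasticsearch Guide [7.15] | Elastic

Create the mapping in the products index:

Analyzer settings:

analyzer: when a document is added to the index, text fields are tokenized by this analyzer before being written to the inverted index
search_analyzer: when searching by keyword, the keyword is tokenized by this analyzer
At query time, the analyzer set in search_analyzer takes precedence; if search_analyzer is not set, the analyzer setting is used.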

# Define the mapping (data structure)
PUT /products/_mapping
{
  "properties": {
    "id": {
      "type": "long"
    },
    "title": {
      "type": "text",
      "analyzer": "ik_max_word",
      "search_analyzer": "ik_smart"
    },
    "category": {
      "type": "text",
      "analyzer": "ik_smart",
      "search_analyzer": "ik_smart"
    },
    "price": {
      "type": "float"
    },
    "city": {
      "type": "text",
      "analyzer": "ik_smart",
      "search_analyzer": "ik_smart"
    },
    "barcode": {
      "type": "keyword"
    }
  }
}
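
The fallback rule between `analyzer` and `search_analyzer` described above can be written down in a few lines. This is only a sketch of the rule as stated in the text; the class and method names are hypothetical, and `null` is used to mean "search_analyzer not set".

```java
// Which analyzer applies at index time and at search time for one text field.
final class AnalyzerChoiceSketch {
    // At index time the field's analyzer is always used.
    static String indexTimeAnalyzer(String analyzer, String searchAnalyzer) {
        return analyzer;
    }

    // At search time search_analyzer wins; without it, analyzer is used.
    static String searchTimeAnalyzer(String analyzer, String searchAnalyzer) {
        return searchAnalyzer != null ? searchAnalyzer : analyzer;
    }

    public static void main(String[] args) {
        // title: analyzer ik_max_word, search_analyzer ik_smart
        System.out.println(searchTimeAnalyzer("ik_max_word", "ik_smart"));
        // a field that only sets analyzer
        System.out.println(searchTimeAnalyzer("ik_smart", null));
    }
}
```

For the title field this means documents are indexed with the fine-grained ik_max_word tokens while search keywords are split with the coarser ik_smart.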

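III. View the mapping

GET /products/_mapping

Adding documents (adding data)

Each added document has a document id named _id. It can be generated automatically or specified by hand; the data's own id is commonly used as the document id.

# Add documents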
PUT /products/_doc/10033
{
"id":"10033",
"title":"SONOS PLAY:5(gen2) 新一代PLAY:5无线智能音响系统 WiFi音箱家庭,潮酷数码会场",
"category":"潮酷数码会场",
"price":"3980.01",
"city":"上海",
"barcode":"527848718459"
}

PUT /products/_doc/10034
{
"id":"10034",
"title":"天猫魔盒 M13网络电视机顶盒高清电视盒子wifi 64位硬盘播放器",
"category":"潮酷数码会场",
"price":"398.00",
"city":"浙江杭州",
"barcode":"522994634119"
}

PUT /products/_doc/10035
{
"id":"10035",
"title":"BOSE SoundSport耳塞式运动耳机重低音入耳式防脱降噪音乐耳机",
"category":"潮酷数码会场",
"price":"860.00",
"city":"浙江杭州",
"barcode":"526558749068"
}

PUT /products/_doc/10036
{
"id":"10036",
"title":"【送支架】Beats studio Wireless 2.0无线蓝牙录音师头戴式耳机",
"category":"潮酷数码会场",
"price":"2889.00",
"city":"上海",
"barcode":"37147009748"
}

PUT /products/_doc/10037
{
"id":"10037",
"title":"SONOS PLAY:1无线智能音响系统美国原创WiFi连接家庭桌面音箱",
"category":"潮酷数码会场",
"price":"1580.01",
"city":"上海",
"barcode":"527783392239"
}

The _id value can also be generated automatically (it will not be purely numeric):

POST /products/_doc
{
"id":"10027",
"title":"vivo X9前置双摄全网通4G美颜自拍超薄智能手机大屏vivox9",
"category":"手机会场",
"price":"2798.00",
"city":"广东东莞",
"barcode":"541396973568"
}

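1. View a document:

GET /products/_doc/10037

View the tokens produced for the title field of a given document:

GET /products/_termvectors/10037?fields=title

2. Modify a document

The underlying index data cannot be modified in place; modifying a document actually deletes it and adds it again.

There are two ways to modify a document:

PUT: replaces the whole document
POST: can modify a subset of the fields

Change the value of the price field:

# Modify a document: full replacement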
PUT /products/_doc/10037
{
  "id":"10037",
  "title":"SONOS PLAY:1无线智能音响系统 美国原创WiFi连接 家庭桌面音箱",
  "category":"潮酷数码会场",
  "price":"9999.99",
  "city":"上海",
  "barcode":"527783392239"
}

View the document:

GET /products/_doc/10037

Change the values of the price and city fields:

# Modify a document: update some of the fields
POST /products/_update/10037
{
  "doc": {
    "price":"8888.88",
    "city":"深圳"
  }
}

View the document:

GET /products/_doc/10037
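
The difference between the two update styles can be modelled on plain maps. This is a minimal sketch under stated assumptions: the class and method names are hypothetical, documents are flat maps, and the merge is shallow (nested objects are not merged here).

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Full replacement (PUT) versus partial update (POST _update with "doc").
final class UpdateSemanticsSketch {
    // PUT: the stored document becomes exactly the new body; old fields are gone.
    static Map<String, Object> replace(Map<String, Object> stored, Map<String, Object> body) {
        return new LinkedHashMap<>(body);
    }

    // POST _update: fields in the partial document overwrite, all other fields are kept.
    static Map<String, Object> partialUpdate(Map<String, Object> stored, Map<String, Object> doc) {
        Map<String, Object> merged = new LinkedHashMap<>(stored);
        merged.putAll(doc);
        return merged;
    }

    public static void main(String[] args) {
        Map<String, Object> stored = new LinkedHashMap<>();
        stored.put("title", "SONOS PLAY:1");
        stored.put("price", "9999.99");
        stored.put("city", "Shanghai");

        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("price", "8888.88");
        doc.put("city", "Shenzhen");

        System.out.println(partialUpdate(stored, doc)); // title kept, price and city changed
        System.out.println(replace(stored, doc));       // only price and city remain
    }
}
```

This is why the PUT request above repeats every field of the document, while the POST request lists only price and city.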

3. Delete documents

Delete the document with a given id

DELETE /products/_doc/10037

Delete all documents:

POST /products/_delete_by_query
{
  "query": {
    "match_all": {}
  }
}

4. Delete the index

# Delete the products index
DELETE /products

Search

1. Create the index and mapping

PUT /pditems
{
  "settings": {
    "number_of_shards": 3,
    "number_of_replicas": 2
  },
  "mappings": {
    "properties": {
      "id": {
        "type": "long"
      },
      "brand": {
        "type": "text",
        "analyzer": "ik_smart"
      },
      "title": {
        "type": "text",
        "analyzer": "ik_max_word"
      },
      "sell_point": {
        "type": "text",
        "analyzer": "ik_max_word",
        "search_analyzer": "ik_smart"
      },
      "price": {
        "type": "float"
      },
      "image": {
        "type": "keyword"
      },
      "cid": {
        "type": "long"
      },
      "status": {
        "type": "byte"
      },
      "created": {
        "type": "date",
        "format": "yyyy-MM-dd HH:mm:ss"
      },
      "updated": {
        "type": "date",
        "format": "yyyy-MM-dd HH:mm:ss"
      }
    }
  }
}

View the index with head:

2. Import data

Format of the data to import:

{ "index": {"_index": "pditems", "_id": "536563"}}
{ "id":"536563","brand":"联想","title":"联想(Lenovo)小新Air13 Pro 13.3英寸14.8mm超轻薄笔记本电脑","sell_point":"清仓！仅北京，武汉仓有货！","price":"6688.0","barcode":"","image":"/images/server/images/portal/air13/little4.jpg","cid":"163","status":"1","created":"2015-03-08 21:33:18","updated":"2015-04-11 20:38:38"}

The data is saved in a file whose name ends in .json
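
The two-line format shown above is what the `_bulk` endpoint expects: one action line, then the document on the next line, each terminated by a newline (including the last one). A minimal sketch of producing such a body is given below. The class and method names are hypothetical, and the sketch assumes that ids and index names need no JSON escaping and that each source document is already a single line of JSON.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Builds a newline-delimited body for the _bulk endpoint.
final class BulkBodySketch {
    // One document = action line + source line, each ending in '\n'.
    static String indexAction(String index, String id, String sourceJson) {
        return "{\"index\":{\"_index\":\"" + index + "\",\"_id\":\"" + id + "\"}}\n"
                + sourceJson + "\n";
    }

    // Concatenates the line pairs for all documents, in insertion order.
    static String bulkBody(String index, Map<String, String> sourceById) {
        StringBuilder body = new StringBuilder();
        for (Map.Entry<String, String> e : sourceById.entrySet()) {
            body.append(indexAction(index, e.getKey(), e.getValue()));
        }
        return body.toString();
    }

    public static void main(String[] args) {
        Map<String, String> docs = new LinkedHashMap<>();
        docs.put("536563", "{\"id\":\"536563\",\"brand\":\"Lenovo\"}");
        System.out.print(bulkBody("pditems", docs));
    }
}
```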

Upload the pditems.json file from the archive to the server

On the server, change to the directory containing pditems.json and run the bulk import:

curl -XPOST 'localhost:9200/pditems/_bulk' \
    -H 'Content-Type:application/json' \
    --data-binary @pditems.json

3. View the data

Search all 3160 documents in the pditems index:

GET /pditems/_search
{
  "query": {
    "match_all": {}
  },
  "size": 3160
}

4. Search documents

I. Search all documents

# Search all data in the pditems index
POST /pditems/_search
{
  "query": {
    "match_all": {}
  }
}

II. Keyword search

# Find products in the pditems index whose title contains "电脑" (computer)
POST /pditems/_search
{
  "query": {
    "match": {
      "title": "电脑"
    }
  }
}

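III. Filtering search results

# bool combines several conditions:
#   must:   conditions that have to match (here: title matches "电脑")
#   filter: filters applied to the results (here: a range filter on the price field,
#           gte = greater than or equal to 2000)
POST /pditems/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "title": "电脑"
          }
        }
      ],

      "filter": [
        {
          "range": {
            "price": {
              "gte": "2000"
            }
          }
        }
      ]
    }
  }
}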
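
What the combination of `must` and `filter` selects can be mimicked on an in-memory list. This is only an illustration of the selection logic, not of how Elasticsearch executes the query: the class names are hypothetical, a substring test stands in for analyzed full-text matching, and no relevance score is computed.

```java
import java.util.ArrayList;
import java.util.List;

// In-memory imitation of a bool query with one must clause and one range filter.
final class BoolFilterSketch {
    static final class Item {
        final String title;
        final double price;

        Item(String title, double price) {
            this.title = title;
            this.price = price;
        }
    }

    // Keeps items whose title contains the keyword (must)
    // and whose price is at least minPrice (filter, gte is inclusive).
    static List<Item> search(List<Item> items, String keyword, double minPrice) {
        List<Item> hits = new ArrayList<>();
        for (Item item : items) {
            boolean must = item.title.contains(keyword);
            boolean filter = item.price >= minPrice;
            if (must && filter) {
                hits.add(item);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<Item> items = List.of(
                new Item("laptop computer", 6688.0),
                new Item("tablet computer", 1500.0),
                new Item("phone", 3000.0));
        for (Item hit : search(items, "computer", 2000.0)) {
            System.out.println(hit.title + " " + hit.price);
        }
    }
}
```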

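IV. Highlighting search results

# Highlight search results (annotated; the same request without comments follows)
POST /pditems/_search
{
	"query": {
		"multi_match": {   # search across multiple fields
			"query": "手机",  # search term
			"fields": ["title", "sell_point"]  # fields to search
		}
	},
	"highlight" : { # highlight settings
        "pre_tags" : ["<i class=\"highlight\">"], # default tags for all fields
        "post_tags" : ["</i>"], 
        "fields" : {
            "title" : {},
            "sell_point" : {
              "pre_tags": "<em>",  # tags for this field only
              "post_tags": "</em>"
            }
        }
    }
}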

POST /pditems/_search
{
	"query": {
		"multi_match":{
			"query": "手机",
			"fields": ["title", "sell_point"]
		}
	},
	"highlight" : {
        "pre_tags" : ["<i class=\"highlight\">"],
        "post_tags" : ["</i>"],
        "fields" : {
            "title" : {},
            "sell_point" : {
              "pre_tags": "<em>",
              "post_tags": "</em>"
            }
        }
    }
}
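
The effect of `pre_tags` and `post_tags` can be illustrated with a simple string operation. This is a sketch only: the class name is hypothetical, and a literal substring replacement stands in for the real highlighter, which works on analyzed terms and returns fragments of the field.

```java
// Wraps matches in a pair of tags, the way pre_tags and post_tags wrap hits.
final class HighlightSketch {
    // Surrounds every literal occurrence of keyword in text with the given tags.
    static String highlight(String text, String keyword, String preTag, String postTag) {
        if (keyword.isEmpty()) {
            return text; // nothing to highlight
        }
        return text.replace(keyword, preTag + keyword + postTag);
    }

    public static void main(String[] args) {
        // Default tags, as used for the title field
        System.out.println(highlight("smart phone case", "phone",
                "<i class=\"highlight\">", "</i>"));
        // Field-specific tags, as used for sell_point
        System.out.println(highlight("phone in stock", "phone", "<em>", "</em>"));
    }
}
```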

Integrating ES with Spring

Steps

1. Add the dependency

        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-data-elasticsearch</artifactId>
        </dependency>

2. Configure the yml

spring:
  elasticsearch:
    rest:
      uris:
        - http://192.168.64.181:9200
        - http://192.168.64.181:9201
        - http://192.168.64.181:9202

logging:
  level:
    tracer: trace   # log the requests sent to ES

3. Create an entity class that maps to the ES index

import lombok.AllArgsConstructor;
import lombok.Data;
import lombok.NoArgsConstructor;
import org.springframework.data.annotation.Id;
import org.springframework.data.elasticsearch.annotations.Document;
import org.springframework.data.elasticsearch.annotations.Field;


@Data
@NoArgsConstructor
@AllArgsConstructor
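// indexName: the ES index this class maps to; shards: number of shards; replicas: number of replicas (for high availability). Do not rely on the API to create the index automatically, though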
@Document(indexName = "students" ,shards = 3,replicas = 2)
public class Student {
    @Id
    private Long id;
    private String name;
    private Character gender;
    @Field("birthDate")  // field name in ES; can be omitted when it matches the property name
    private String birthDate;
}

4. Create the Repository interface StudentRepository (it plays the same role as a Mapper: both access the data in the underlying store)

import cn.tedu.es.entity.Student;
import org.springframework.data.domain.Pageable;
import org.springframework.data.elasticsearch.repository.ElasticsearchRepository;

import java.util.List;

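/*
The Repository convention defined by the Spring Data API:
defining the interface is enough to get create, read, update, and delete operations.
The type parameters are the mapped entity type and the type of its id (the id of Student).
 */
public interface StudentRepository extends ElasticsearchRepository<Student,Long> {

    // Search for a keyword in the name field
    List<Student> findByName(String key);

    // Search the name field or the birth date field, with paging (via the Pageable parameter)
    List<Student> findByNameOrBirthDate(String key, String birthDate, Pageable pageable);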

}

5. Work with the data

import cn.tedu.es.entity.Student;
import cn.tedu.es.repo.StudentRepository;
import org.junit.jupiter.api.Test;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.test.context.SpringBootTest;
import org.springframework.data.domain.PageRequest;
import org.springframework.data.domain.Pageable;

import java.util.List;
import java.util.Optional;

@SpringBootTest
class EsApplicationTests {

    @Autowired
    private StudentRepository repository;

    // Test adding data
    @Test
    void contextLoads() {
        repository.save(new Student(9527L, "唐伯虎", '男', "2021-01-18"));  // save() adds or updates
        repository.save(new Student(9528L, "唐伯虎1", '男', "2021-01-18"));
        repository.save(new Student(9529L, "唐伯虎2", '男', "2021-01-19"));
        repository.save(new Student(9530L, "唐伯虎3", '男', "2021-01-19"));
    }

    // Test modifying data (saving with an existing id overwrites the document)
    @Test
    void contextLoads1() {
        repository.save(new Student(9527L, "唐伯虎111", '男', "2021-01-18"));
        repository.save(new Student(9528L, "唐伯虎121", '男', "2021-01-18"));
        repository.save(new Student(9529L, "唐伯虎231", '男', "2021-01-08"));
        repository.save(new Student(9530L, "唐伯虎341", '男', "2021-01-18"));
    }

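    // Query data
    @Test
    void contextLoads2() {

        /* Optional is a wrapper type from the JDK; it avoids a NullPointerException when the id does not exist */
        Optional<Student> op = repository.findById(9527L); // fetch one document by id

        if (op.isPresent()) {  // check whether a Student was found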
            System.out.println(op.get());
        }

        System.out.println("========================================");

        Iterable<Student> all = repository.findAll();
        for (Student s : all) {
            System.out.println(s);
        }
    }

    // Delete data
    @Test
    void contextLoads3() {

        // Delete by id
       repository.deleteById(9530L);

    }


    // Search by keyword in a single field
    @Test
    void contextLoads4() {

        List<Student> list = repository.findByName("伯虎");

        System.out.println(list);
    }

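    // Search by keyword in two fields, with paging
    // (for fields that are not analyzed, the full value must be given)
    @Test
    void contextLoads6() {

        /* Pageable wraps the paging parameters:
        *               page number, starting at 0
        *               page size
        * */
        Pageable pageable = PageRequest.of(0,2); // first page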
        List<Student> list = repository.findByNameOrBirthDate("伯虎","2021-01-19",pageable);


        System.out.println(list);
    }

}
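
The paging parameters used in the last test come down to simple arithmetic: a zero-based page number and a page size determine the offset of the first hit. The sketch below shows that arithmetic on an in-memory list; it uses only the JDK, the class and method names are hypothetical, and it does not call Spring Data.

```java
import java.util.List;

// Zero-based paging arithmetic, as used by PageRequest.of(page, size).
final class PagingSketch {
    // Offset of the first element on the given page.
    static long offset(int page, int size) {
        return (long) page * size;
    }

    // Returns the elements of one page; pages past the end are empty.
    static <T> List<T> page(List<T> all, int page, int size) {
        int from = (int) Math.min(offset(page, size), all.size());
        int to = Math.min(from + size, all.size());
        return all.subList(from, to);
    }

    public static void main(String[] args) {
        List<String> names = List.of("a", "b", "c", "d", "e");
        System.out.println(page(names, 0, 2)); // first page
        System.out.println(page(names, 2, 2)); // last, partially filled page
    }
}
```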