Elasticsearch Notes


Last modified 2022-02-28 15:54:28; first published 2022-02-09 22:13:44

Column: docker environment setup. Tags: elasticsearch, big data, search engine

Original article: https://blog.csdn.net/weixin_45576271/article/details/122850238

Contents

Setting up an ES cluster with docker
Basic usage

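Setting up an ES cluster with docker

Disable the firewall

# Stop the firewall
systemctl stop firewalld.service

# Keep the firewall from starting at boot
systemctl disable firewalld.service

Set the kernel parameter

Elasticsearch needs vm.max_map_count to be at least 262144. Appending the line to /etc/sysctl.conf makes the setting persistent; sysctl -p loads it immediately without a reboot.

echo 'vm.max_map_count=262144' >>/etc/sysctl.conf
sysctl -p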
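To confirm the setting took effect, the current value can be read back from /proc. The following is a minimal Java sketch, assuming a Linux host where the kernel exposes the value at /proc/sys/vm/max_map_count; the class name MaxMapCountCheck is made up for illustration.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper: checks vm.max_map_count against the 262144 minimum Elasticsearch expects.
// Assumes a Linux host, where the live value is exposed in /proc/sys/vm/max_map_count.
public class MaxMapCountCheck {
    static final long REQUIRED = 262144L;

    // Parses the raw file content (e.g. "262144\n") and compares it with the minimum
    static boolean isSufficient(String raw) {
        return Long.parseLong(raw.trim()) >= REQUIRED;
    }

    public static void main(String[] args) throws IOException {
        Path p = Path.of("/proc/sys/vm/max_map_count");
        if (!Files.exists(p)) {
            System.out.println("Not a Linux host, nothing to check");
            return;
        }
        String raw = Files.readString(p);
        System.out.println("vm.max_map_count = " + raw.trim() + ", sufficient: " + isSufficient(raw));
    }
}
```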

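Prepare the network and the mount directories

# Create a user-defined network; containers attached to it can reach each other by container name
docker network create es-net

# Mount directories for node1
mkdir -p -m 777 /var/lib/es/node1/plugins
mkdir -p -m 777 /var/lib/es/node1/data

# Mount directories for node2
mkdir -p -m 777 /var/lib/es/node2/plugins
mkdir -p -m 777 /var/lib/es/node2/data

# Mount directories for node3
mkdir -p -m 777 /var/lib/es/node3/plugins
mkdir -p -m 777 /var/lib/es/node3/data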

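Start the three nodes

Parameter explanation (for reading only: a comment line inside a backslash-continued command ends the command at that point, so run the uncommented versions below)

docker run -d \
  --name=node1 \
  --restart=always \
  # attach to the user-defined network
  --net es-net \
  # port 9200: HTTP traffic with clients
  -p 9200:9200 \
  # port 9300: transport traffic between the cluster nodes
  -p 9300:9300 \
  # plugin directory
  -v /var/lib/es/node1/plugins:/usr/share/elasticsearch/plugins \
  # data directory
  -v /var/lib/es/node1/data:/usr/share/elasticsearch/data \
  # unique node name
  -e node.name=node1 \
  # whether this node is eligible to be elected master
  -e node.master=true \
  # address this node binds to and publishes
  -e network.host=node1 \
  # addresses of the nodes to contact when discovering the cluster
  -e discovery.seed_hosts=node1,node2,node3 \
  # master-eligible node(s) used to bootstrap the cluster the first time it starts
  -e cluster.initial_master_nodes=node1 \
  # cluster name; must be identical on every node of the same cluster
  -e cluster.name=es-cluster \
  # JVM heap size (minimum and maximum)
  -e "ES_JAVA_OPTS=-Xms256m -Xmx256m" \
  # image
  elasticsearch:7.9.3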

node1:

docker run -d \
  --name=node1 \
  --restart=always \
  --net es-net \
  -p 9200:9200 \
  -p 9300:9300 \
  -v /var/lib/es/node1/plugins:/usr/share/elasticsearch/plugins \
  -v /var/lib/es/node1/data:/usr/share/elasticsearch/data \
  -e node.name=node1 \
  -e node.master=true \
  -e network.host=node1 \
  -e discovery.seed_hosts=node1,node2,node3 \
  -e cluster.initial_master_nodes=node1 \
  -e cluster.name=es-cluster \
  -e "ES_JAVA_OPTS=-Xms256m -Xmx256m" \
  elasticsearch:7.9.3

node2:

docker run -d \
  --name=node2 \
  --restart=always \
  --net es-net \
  -p 9201:9200 \
  -p 9301:9300 \
  -v /var/lib/es/node2/plugins:/usr/share/elasticsearch/plugins \
  -v /var/lib/es/node2/data:/usr/share/elasticsearch/data \
  -e node.name=node2 \
  -e node.master=true \
  -e network.host=node2 \
  -e discovery.seed_hosts=node1,node2,node3 \
  -e cluster.initial_master_nodes=node1 \
  -e cluster.name=es-cluster \
  -e "ES_JAVA_OPTS=-Xms256m -Xmx256m" \
  elasticsearch:7.9.3

node3:

docker run -d \
  --name=node3 \
  --restart=always \
  --net es-net \
  -p 9202:9200 \
  -p 9302:9300 \
  -v /var/lib/es/node3/plugins:/usr/share/elasticsearch/plugins \
  -v /var/lib/es/node3/data:/usr/share/elasticsearch/data \
  -e node.name=node3 \
  -e node.master=true \
  -e network.host=node3 \
  -e discovery.seed_hosts=node1,node2,node3 \
  -e cluster.initial_master_nodes=node1 \
  -e cluster.name=es-cluster \
  -e "ES_JAVA_OPTS=-Xms256m -Xmx256m" \
  elasticsearch:7.9.3

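Check the result (192.168.64.181 is the docker host's IP; replace it with your own)

http://192.168.64.181:9200/

List the nodes (append ?v to include column headers; the elected master is marked with * in the master column)

http://192.168.64.181:9200/_cat/nodes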
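The node list can also be checked from code. A minimal sketch using the JDK's java.net.http client (Java 11+); it assumes the default _cat/nodes column layout, in which the last column is the node name and the one before it is * for the elected master and - otherwise. CatNodes, nodeNames and masterName are hypothetical names.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.ArrayList;
import java.util.List;

public class CatNodes {
    // Node names from _cat/nodes text output; skips blank lines and the ?v header line.
    // Assumes the default columns, where the node name is the last column.
    static List<String> nodeNames(String catOutput) {
        List<String> names = new ArrayList<>();
        for (String line : catOutput.split("\n")) {
            String[] cols = line.trim().split("\\s+");
            if (cols.length < 2 || cols[0].equals("ip")) {
                continue;
            }
            names.add(cols[cols.length - 1]);
        }
        return names;
    }

    // Name of the elected master: the row whose second-to-last column is "*"; null if none found
    static String masterName(String catOutput) {
        for (String line : catOutput.split("\n")) {
            String[] cols = line.trim().split("\\s+");
            if (cols.length >= 2 && cols[cols.length - 2].equals("*")) {
                return cols[cols.length - 1];
            }
        }
        return null;
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest req = HttpRequest.newBuilder(URI.create("http://192.168.64.181:9200/_cat/nodes?v"))
                .GET()
                .build();
        String body = client.send(req, HttpResponse.BodyHandlers.ofString()).body();
        System.out.println(body);
        System.out.println("nodes: " + nodeNames(body) + ", master: " + masterName(body));
    }
}
```

With all three containers running, nodeNames should list node1, node2 and node3.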

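Browser plugin

Download the Chrome extension from the elasticsearch-head repository:
https://github.com/mobz/elasticsearch-head/raw/master/crx/es-head.crx
Change the file extension to .zip
Unzip it
In Chrome, open "More tools" > "Extensions"
Make sure "Developer mode" is switched on
Click "Load unpacked"
Select the unzipped extension directory
Click the elasticsearch-head icon to open the head UI, and connect to http://192.168.64.181:9200/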

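Basic usage

Elasticsearch is a document store and search engine. Text is split into terms (analyzed) when it is indexed, and searches look up those terms, which is what makes full-text queries fast.

Inverted index

Each term produced by analysis is mapped to the documents that contain it, so a query only has to look up its terms instead of scanning every document.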
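A toy version of the idea in plain Java, assuming whitespace tokenization in place of a real analyzer; the class and the documents are made up for illustration and say nothing about how Lucene actually stores its postings.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeSet;

public class InvertedIndex {
    // term -> ids of the documents that contain the term
    private final Map<String, SortedSet<Integer>> postings = new HashMap<>();

    // Index a document: split it into terms and record the document id under each term.
    // Lower-cased whitespace splitting stands in for a real analyzer such as IK.
    void add(int docId, String text) {
        for (String term : text.toLowerCase().split("\\s+")) {
            if (!term.isEmpty()) {
                postings.computeIfAbsent(term, t -> new TreeSet<>()).add(docId);
            }
        }
    }

    // A search is a direct lookup of the term, not a scan over all documents
    SortedSet<Integer> search(String term) {
        return postings.getOrDefault(term.toLowerCase(), new TreeSet<>());
    }

    public static void main(String[] args) {
        InvertedIndex index = new InvertedIndex();
        index.add(1, "wireless speaker WiFi");
        index.add(2, "WiFi set-top box");
        index.add(3, "wireless earphones");
        System.out.println(index.search("wifi"));     // [1, 2]
        System.out.println(index.search("wireless")); // [1, 3]
        System.out.println(index.search("tv"));       // []
    }
}
```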

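Analyzers

ngram_analyzer

Splits text into every substring within a length range; well suited to personal names (a custom analyzer, defined in the students index later in this article)

CJK

Splits text into overlapping two-character pieces (bigrams); the results are not very good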
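Both ngram-style and CJK-style splitting come down to cutting text into fixed-length pieces. The hypothetical helper below generates every substring between minGram and maxGram characters (ordered by start position, shorter first); with (1, 30) it corresponds to the ngram_analyzer settings of the students index, and with (2, 2) it yields overlapping two-character pieces like the CJK analyzer. It assumes the text contains only letters and digits (the real tokenizer first splits on token_chars) and only BMP characters.

```java
import java.util.ArrayList;
import java.util.List;

public class NGrams {
    // Every substring whose length lies in [minGram, maxGram], grouped by start position
    static List<String> ngrams(String text, int minGram, int maxGram) {
        List<String> out = new ArrayList<>();
        for (int start = 0; start < text.length(); start++) {
            for (int len = minGram; len <= maxGram && start + len <= text.length(); len++) {
                out.add(text.substring(start, start + len));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // ngram_analyzer-like settings: single characters are indexed too
        System.out.println(ngrams("安妮", 1, 30));     // [安, 安妮, 妮]
        // CJK-like bigrams
        System.out.println(ngrams("搜索引擎", 2, 2));  // [搜索, 索引, 引擎]
    }
}
```

Because single characters are indexed, a search for 龙 can match both 龙王 and 龙女, which is why this style suits names.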

IK

IK uses a dictionary: it matches the characters of the text against the words in its dictionary
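IK's real segmentation is more involved. The sketch below only illustrates the dictionary idea, using forward maximum matching, a classic and much simpler technique swapped in for illustration. segment keeps the longest dictionary word at each position (roughly the spirit of coarse-grained splitting), while allWords lists every dictionary word found, overlaps included (closer in spirit to fine-grained splitting). The tiny dictionary is made up.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

public class ForwardMaxMatch {
    // Forward maximum matching: at each position take the longest dictionary word,
    // falling back to a single character when nothing matches
    static List<String> segment(String text, Set<String> dict, int maxWordLen) {
        List<String> out = new ArrayList<>();
        int i = 0;
        while (i < text.length()) {
            int end = Math.min(text.length(), i + maxWordLen);
            while (end > i + 1 && !dict.contains(text.substring(i, end))) {
                end--;
            }
            out.add(text.substring(i, end));
            i = end;
        }
        return out;
    }

    // Every dictionary word occurring anywhere in the text, overlapping matches included
    static List<String> allWords(String text, Set<String> dict, int maxWordLen) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i < text.length(); i++) {
            for (int end = i + 1; end <= Math.min(text.length(), i + maxWordLen); end++) {
                if (dict.contains(text.substring(i, end))) {
                    out.add(text.substring(i, end));
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Set<String> dict = Set.of("搜索", "引擎", "搜索引擎");
        System.out.println(segment("搜索引擎很快", dict, 4));  // [搜索引擎, 很, 快]
        System.out.println(allWords("搜索引擎很快", dict, 4)); // [搜索, 搜索引擎, 引擎]
    }
}
```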

Install the IK analyzer

The IK plugin version must match the Elasticsearch version exactly (7.9.3 here).

Copy the IK plugin zip into the three ES containers

docker cp elasticsearch-analysis-ik-7.9.3.zip node1:/root/
docker cp elasticsearch-analysis-ik-7.9.3.zip node2:/root/
docker cp elasticsearch-analysis-ik-7.9.3.zip node3:/root/

Install the plugin in each container

# Install the IK analyzer in node1
docker exec -it node1 elasticsearch-plugin install file:///root/elasticsearch-analysis-ik-7.9.3.zip

# Install the IK analyzer in node2
docker exec -it node2 elasticsearch-plugin install file:///root/elasticsearch-analysis-ik-7.9.3.zip

# Install the IK analyzer in node3
docker exec -it node3 elasticsearch-plugin install file:///root/elasticsearch-analysis-ik-7.9.3.zip

Restart

# Restart the three ES containers so they load the plugin
docker restart node1 node2 node3

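Analysis modes

ik_smart

Coarsest-grained split
Send a POST request to http://192.168.64.181:9200/_analyze
[Screenshot: ik_smart request and result]

ik_max_word

Finest-grained split; it exhausts every possible word combination
Send a POST request to http://192.168.64.181:9200/_analyze
[Screenshot: ik_max_word request and result]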
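The request bodies were only shown in screenshots. A sketch of sending the same kind of request from Java (java.net.http, Java 11+), assuming the standard _analyze body shape {"analyzer": ..., "text": ...}; the sample text 中华人民共和国国歌 and the class name AnalyzeRequest are assumptions, not taken from the original screenshots.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class AnalyzeRequest {
    // Builds the _analyze request body. Only backslashes and double quotes are escaped,
    // which is enough for plain text without control characters.
    static String body(String analyzer, String text) {
        return "{\"analyzer\":\"" + escape(analyzer) + "\",\"text\":\"" + escape(text) + "\"}";
    }

    private static String escape(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    public static void main(String[] args) throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        for (String analyzer : new String[] {"ik_smart", "ik_max_word"}) {
            HttpRequest req = HttpRequest.newBuilder(URI.create("http://192.168.64.181:9200/_analyze"))
                    .header("Content-Type", "application/json")
                    .POST(HttpRequest.BodyPublishers.ofString(body(analyzer, "中华人民共和国国歌")))
                    .build();
            // The JSON response lists the tokens the analyzer produced
            System.out.println(analyzer + ": " + client.send(req, HttpResponse.BodyHandlers.ofString()).body());
        }
    }
}
```

Comparing the two responses shows the difference: ik_max_word returns more, overlapping tokens than ik_smart.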

kibana

Pull the image (the Kibana version must match the Elasticsearch version)

docker pull kibana:7.9.3

Start the container

docker run \
-d \
--name kibana \
--net es-net \
-p 5601:5601 \
-e ELASTICSEARCH_HOSTS='["http://node1:9200","http://node2:9200","http://node3:9200"]' \
--restart=always \
kibana:7.9.3

Open http://192.168.64.181:5601/ in a browser

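Index shards and replicas

An index can be split into shards stored on several servers, so a search runs on all of them in parallel (distributed search).
Replica shards are copies of the primary shards; they keep the data available when a server goes down.

The node that receives a request acts as the coordinating node: it sends the request to the shards, then gathers and merges their results.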
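How a document ends up on a particular shard can be sketched with the default routing rule, shard = hash(_routing) % number_of_primary_shards, where _routing defaults to the document _id. This is a simplification: Elasticsearch uses a Murmur3 hash and also factors in number_of_routing_shards, whereas the sketch uses String.hashCode, so its shard numbers will not match a real cluster. The rule also explains why the primary shard count cannot be changed in place: a different divisor would send existing ids to different shards.

```java
// Simplified routing sketch; String.hashCode stands in for Elasticsearch's Murmur3 hash
public class ShardRouting {
    static int shardFor(String routing, int numberOfShards) {
        // floorMod keeps the result non-negative even when hashCode() is negative
        return Math.floorMod(routing.hashCode(), numberOfShards);
    }

    // Every primary shard has numberOfReplicas extra copies
    static int totalShardCopies(int numberOfShards, int numberOfReplicas) {
        return numberOfShards * (1 + numberOfReplicas);
    }

    public static void main(String[] args) {
        for (String id : new String[] {"10033", "10034", "10035", "10036", "10037"}) {
            System.out.println("doc " + id + " -> shard " + shardFor(id, 3));
        }
        // 3 shards with 2 replicas each: 9 shard copies, 3 per node on a 3-node cluster
        System.out.println(totalShardCopies(3, 2));
    }
}
```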

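Create an index

number_of_shards: number of primary shards; the default is 1 since Elasticsearch 7.0 (it was 5 before), and it cannot be changed in place after the index is created
number_of_replicas: number of replicas per primary shard; the default is 1, and it can be changed at any time

# Create an index named products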
PUT /products
{
  "settings": {
    "number_of_shards": 3, 
    "number_of_replicas": 2
  }
}

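[Screenshot: the products index in elasticsearch-head]

Bold boxes are primary shards, thin boxes are replica shards
[Screenshot: shard layout in elasticsearch-head]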

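Mappings (data structure)

Like a database table schema, the data in an index is divided into fields, and each field needs a data type and other attributes.

A mapping defines and describes the structure of the fields in an index.

Common data types

Numeric types
- byte, short, integer, long
- float, double
- unsigned_long (added in Elasticsearch 7.10, so not available in the 7.9.3 used here)
String types
- text: analyzed (split into terms)
- keyword: not analyzed; suitable for email addresses, host names, postal codes, etc.
Date/time types
- date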

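Create the mapping

analyzer: when a document is indexed, the text field is split by this analyzer and the terms are written to the inverted index
search_analyzer: when searching, the query keywords are split by this analyzer (if omitted, analyzer is used for searching too)

# Define the mapping (data structure)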
PUT /products/_mapping
{
  "properties": {
    "id": {
      "type": "long"
    },
    "title": {
      "type": "text",
      "analyzer": "ik_max_word",
      "search_analyzer": "ik_smart"
    },
    "category": {
      "type": "text",
      "analyzer": "ik_smart",
      "search_analyzer": "ik_smart"
    },
    "price": {
      "type": "float"
    },
    "city": {
      "type": "text",
      "analyzer": "ik_smart",
      "search_analyzer": "ik_smart"
    },
    "barcode": {
      "type": "keyword"
    }
  }
}

View the mapping

GET /products/_mapping

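Add data

Each document has a document id called _id. It can be generated automatically or set by hand; usually the record's own id is used. PUT /products/_doc/<id> stores the document under the given id, while POST /products/_doc without an id lets Elasticsearch generate one.

# Add a document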
PUT /products/_doc/10033
{
  "id":"10033",
  "title":"SONOS PLAY:5(gen2) 新一代PLAY:5无线智能音响系统 WiFi音箱家庭,潮酷数码会场",
  "category":"潮酷数码会场",
  "price":"3980.01",
  "city":"上海",
  "barcode":"527848718459"
}


PUT /products/_doc/10034
{
  "id":"10034",
  "title":"天猫魔盒 M13网络电视机顶盒 高清电视盒子wifi 64位硬盘播放器",
  "category":"潮酷数码会场",
  "price":"398.00",
  "city":"浙江杭州",
  "barcode":"522994634119"
}



PUT /products/_doc/10035
{
  "id":"10035",
  "title":"BOSE SoundSport耳塞式运动耳机 重低音入耳式防脱降噪音乐耳机",
  "category":"潮酷数码会场",
  "price":"860.00",
  "city":"浙江杭州",
  "barcode":"526558749068"
}



PUT /products/_doc/10036
{
  "id":"10036",
  "title":"【送支架】Beats studio Wireless 2.0无线蓝牙录音师头戴式耳机",
  "category":"潮酷数码会场",
  "price":"2889.00",
  "city":"上海",
  "barcode":"37147009748"
}


PUT /products/_doc/10037
{
  "id":"10037",
  "title":"SONOS PLAY:1无线智能音响系统 美国原创WiFi连接 家庭桌面音箱",
  "category":"潮酷数码会场",
  "price":"1580.01",
  "city":"上海",
  "barcode":"527783392239"
}

View a document

GET /products/_doc/10037

View how the title field of a document was analyzed

GET /products/_doc/10035/_termvectors?fields=title

[Screenshot: term vector output]

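Update a document

Indexed data cannot be changed in place: an update marks the old document as deleted and indexes a new version.

There are two ways to update, and both delete and re-add underneath:

PUT: replaces the whole document
The complete document must be sent

# Update a document: full replacement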
PUT /products/_doc/10037
{
  "id":"10037",
  "title":"SONOS PLAY:1无线智能音响系统 美国原创WiFi连接 家庭桌面音箱",
  "category":"潮酷数码会场",
  "price":"9999.99",
  "city":"上海",
  "barcode":"527783392239"
}

POST to _update: changes only some fields

# Update a document: only the listed fields
POST /products/_update/10037
{
  "doc": {
    "price":"8888.88",
    "city":"深圳"
  }
}
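Conceptually, a partial update reads the stored _source, merges the fields in doc over it, and reindexes the whole merged document. A sketch of just the merge step in plain Java; it is a shallow, top-level merge only, and the class PartialUpdate is hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class PartialUpdate {
    // Returns a new map: fields from doc overwrite, all other stored fields are kept
    static Map<String, Object> merge(Map<String, ?> source, Map<String, ?> doc) {
        Map<String, Object> merged = new LinkedHashMap<>(source);
        merged.putAll(doc);
        return merged;
    }

    public static void main(String[] args) {
        Map<String, Object> source = new LinkedHashMap<>();
        source.put("id", "10037");
        source.put("price", "9999.99");
        source.put("city", "上海");
        // Same fields as the _update request above
        Map<String, Object> merged = merge(source, Map.of("price", "8888.88", "city", "深圳"));
        System.out.println(merged); // {id=10037, price=8888.88, city=深圳}
    }
}
```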

Delete a document

DELETE /products/_doc/10037

Delete all documents (the index and its mapping are kept)

POST /products/_delete_by_query
{
  "query": {
    "match_all": {}
  }
}

Delete the index

# Delete the products index
DELETE /products

Search

Add data

Create the index and mapping

PUT /pditems
{
  "settings": {
    "number_of_shards": 3, 
    "number_of_replicas": 2
  },
  "mappings": {
    "properties": {
      "id": {
        "type": "long"
      },
      "brand": {
        "type": "text",
        "analyzer": "ik_smart"
      },
      "title": {
        "type": "text",
        "analyzer": "ik_max_word"
      },
      "sell_point": {
        "type": "text",
        "analyzer": "ik_max_word",
        "search_analyzer": "ik_smart"
      },
      "price": {
        "type": "float"
      },
      "image": {
        "type": "keyword"
      },
      "cid": {
        "type": "long"
      },
      "status": {
        "type": "byte"
      },
      "created": {
        "type": "date",
        "format": "yyyy-MM-dd HH:mm:ss"
      },
      "updated": {
        "type": "date",
        "format": "yyyy-MM-dd HH:mm:ss"
      }
    } 
  }
}

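Import the data from pditems.json, a file in bulk format (described under Bulk insert at the end of this article)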

curl -XPOST 'localhost:9200/pditems/_bulk' \
    -H 'Content-Type:application/json' \
    --data-binary @pditems.json

Search data

Search all documents

# Search all documents in the pditems index
POST /pditems/_search
{
  "query": {
    "match_all": {}
  }
}

Search by keyword

# Products in the pditems index whose title contains "电脑" (computer)
POST /pditems/_search
{
  "query": {
    "match": {
      "title": "电脑"
    }
  }
}

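Filter data

# Products priced at 2000 or more whose title contains "电脑"; the filter clause narrows the results without affecting the score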
POST /pditems/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "title": "电脑"
          }
        }
      ],

      "filter": [
        {
          "range": {
            "price": {
              "gte": "2000"
            }
          }
        }
      ]
    }
  }
}
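A toy Java model of what the two clauses do, assuming in-memory items: the filter clause is a plain yes/no condition that does not affect ranking, while the must clause has to match and also contributes to the score. The occurrence count used as the score is only a stand-in for Elasticsearch's real relevance scoring (BM25 over analyzed terms), and the items are made up.

```java
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

public class BoolQuerySketch {
    static final class Item {
        final String title;
        final double price;

        Item(String title, double price) {
            this.title = title;
            this.price = price;
        }

        @Override
        public String toString() {
            return title + " (" + price + ")";
        }
    }

    // Stand-in score: how many times the keyword occurs in the title
    static int score(String title, String keyword) {
        int count = 0;
        for (int i = title.indexOf(keyword); i >= 0; i = title.indexOf(keyword, i + keyword.length())) {
            count++;
        }
        return count;
    }

    static List<Item> search(List<Item> items, String keyword, double minPrice) {
        return items.stream()
                .filter(it -> it.price >= minPrice)          // filter clause: yes/no, no scoring
                .filter(it -> score(it.title, keyword) > 0)  // must clause: has to match
                .sorted(Comparator.comparingInt((Item it) -> score(it.title, keyword)).reversed())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Item> items = List.of(
                new Item("联想小新Air13 Pro 笔记本电脑", 6688.0),
                new Item("电脑包", 129.0),
                new Item("游戏电脑 台式电脑主机", 4999.0));
        System.out.println(search(items, "电脑", 2000));
    }
}
```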

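Highlighting search results

multi_match: searches several fields at once. The top-level pre_tags/post_tags apply to every highlighted field; sell_point overrides them with its own tags.

POST /pditems/_search
{
  "query": {
    "multi_match": {
      "query": "手机",
      "fields": ["title", "sell_point"]
    }
  },
  "highlight": {
    "pre_tags": ["<i class=\"highlight\">"],
    "post_tags": ["</i>"],
    "fields": {
      "title": {},
      "sell_point": {
        "pre_tags": ["<em>"],
        "post_tags": ["</em>"]
      }
    }
  }
}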
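What the highlighter returns can be approximated as the field text with each matched term wrapped in the configured tags. A minimal sketch, assuming the matched term is already known; it uses plain substring matching rather than the real highlighter, which works from the analyzed tokens. HighlightSketch is a made-up name.

```java
public class HighlightSketch {
    // Wraps every occurrence of term in preTag/postTag
    static String highlight(String text, String term, String preTag, String postTag) {
        if (term.isEmpty()) {
            return text;
        }
        StringBuilder sb = new StringBuilder();
        int from = 0;
        for (int i = text.indexOf(term); i >= 0; i = text.indexOf(term, from)) {
            sb.append(text, from, i).append(preTag).append(term).append(postTag);
            from = i + term.length();
        }
        return sb.append(text.substring(from)).toString();
    }

    public static void main(String[] args) {
        System.out.println(highlight("华为手机 全网通手机", "手机", "<em>", "</em>"));
        // 华为<em>手机</em> 全网通<em>手机</em>
    }
}
```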

Spring Data Elasticsearch API

Configuration file (YAML)

spring:
  elasticsearch:
    rest:
      uris:
      - http://192.168.64.181:9200
      - http://192.168.64.181:9201
      - http://192.168.64.181:9202

logging:
  level:
    # log the requests and responses exchanged with Elasticsearch
    tracer: TRACE

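POJO class

@Document annotation

The @Document annotation configures the index settings.

In the code below, the students index is given 3 shards and 2 replicas.

@Id annotation

When a document is created in Elasticsearch, the field annotated with @Id is used as the document's _id.

@Field annotation

@Field sets a field's data type and other attributes.

Text types: text and keyword
text is analyzed.

keyword is not analyzed.

analyzer: choose the analyzer

The analyzer attribute selects the analyzer, for example ik_smart, ik_max_word, or the custom ngram_analyzer defined in the students index below.

package cn.tedu.entity;

import lombok.AllArgsConstructor;
import lombok.Data;
import lombok.NoArgsConstructor;
import org.springframework.data.annotation.Id;
import org.springframework.data.elasticsearch.annotations.DateFormat;
import org.springframework.data.elasticsearch.annotations.Document;
import org.springframework.data.elasticsearch.annotations.Field;
import org.springframework.data.elasticsearch.annotations.FieldType;

// Maps to the students index: 3 primary shards, 2 replicas
@Document(indexName = "students", shards = 3, replicas = 2)
@Data
@NoArgsConstructor
@AllArgsConstructor
public class Student {
    // Used as the document _id
    @Id
    private Long id;

    // Analyzed with the custom ngram_analyzer defined in the students index settings
    @Field(analyzer = "ngram_analyzer", type = FieldType.Text)
    private String name;

    // keyword: stored as-is, not analyzed
    @Field(type = FieldType.Keyword)
    private Character gender;

    // Same date format as the index mapping (yyyy-MM-dd)
    @Field(type = FieldType.Date, format = DateFormat.custom, pattern = "yyyy-MM-dd")
    private String birthDate;
}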

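Using the Repository interface

Methods named according to the Spring Data Repository naming conventions are turned into the corresponding queries automatically.

package cn.tedu.repo;

import cn.tedu.entity.Student;
import org.springframework.data.elasticsearch.repository.ElasticsearchRepository;

import java.util.List;

/**
 * Repository interface: declarative, no implementation class needs to be written.
 * To access the ES server, extend Spring Data ES's ElasticsearchRepository<entity type, id type>.
 */
public interface StudentRepository extends ElasticsearchRepository<Student, Long> {
    // Queries are derived from the method names
    // Query by name
    List<Student> findByName(String k);

    // Query by name or birth date
    List<Student> findByNameOrBirthDate(String name, String birthDate);
}

Create the index

Run this request before the application first starts: Spring Data Elasticsearch creates a missing index automatically by default, and an index created that way would not contain ngram_analyzer. index.max_ngram_diff has to be raised because max_gram - min_gram (29) exceeds its default of 1.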

PUT /students
{
  "settings": {
    "number_of_shards": 3,
    "number_of_replicas": 2,
    "index.max_ngram_diff":30,
    "analysis": {
      "analyzer": {
        "ngram_analyzer": {
          "tokenizer": "ngram_tokenizer"
        }
      },
      "tokenizer": {
        "ngram_tokenizer": {
          "type": "ngram",
          "min_gram": 1,
          "max_gram": 30,
          "token_chars": [
            "letter",
            "digit"
          ]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "id": {
        "type": "long"
      },
      "name": {
        "type": "text",
        "analyzer": "ngram_analyzer"
      },
      "gender": {
        "type": "keyword"
      },
      "birthDate": {
        "type": "date",
        "format": "yyyy-MM-dd"
      }
    }
  }
}

Basic operations

package cn.tedu.es;

import cn.tedu.entity.Student;
import cn.tedu.repo.StudentRepository;
import org.junit.jupiter.api.Test;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.test.context.SpringBootTest;

import java.util.Optional;

@SpringBootTest
public class StudentRepositoryTest {
    @Autowired
    private StudentRepository studentRepository;

    // Add data; save() is also used for updates (saving an existing id overwrites the document)
    @Test
    public void test1() {
        Student s1 = new Student(9527L, "安妮", '女', "1997-10-19");
        Student s2 = new Student(9528L, "发条", '女', "1998-10-19");
        Student s3 = new Student(9529L, "龙王", '男', "1996-10-19");
        Student s4 = new Student(9526L, "龙女", '女', "1999-10-19");
        Student s5 = new Student(9525L, "卡牌", '男', "1995-10-19");
        Student s6 = new Student(9524L, "安妮", '女', "1994-10-19");
        Student s7 = new Student(9523L, "亚索", '男', "1993-10-19");
        studentRepository.save(s1);
        studentRepository.save(s2);
        studentRepository.save(s3);
        studentRepository.save(s4);
        studentRepository.save(s5);
        studentRepository.save(s6);
        studentRepository.save(s7);
    }

    // Query data
    @Test
    public void test3() {
        // Optional avoids a NullPointerException when the id does not exist
        Optional<Student> op = studentRepository.findById(9527L);
        if (op.isPresent()) { // check whether a student with this id was found
            Student student = op.get();
            System.out.println(student);
        }
        System.out.println("-----------------------------");
        // Query all students
        Iterable<Student> iterable = studentRepository.findAll();
        for (Student student : iterable) {
            System.out.println(student);
        }
    }

    // Delete data
    @Test
    public void test4() {
        studentRepository.deleteById(9526L);
    }
}

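Paging query results

Paging types

Pageable: wraps the paging parameters sent to the server: page number and page size
Page: (optional return type) wraps one page of search results returned by the server, together with the paging properties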
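The arithmetic behind Pageable and Page, sketched in plain Java under the assumption of zero-based page numbers (as in PageRequest.of(0, 2)): a request skips page * size hits (the Elasticsearch "from" parameter), and the page count is the total rounded up. PagingMath and its methods are hypothetical helpers, not Spring Data API.

```java
public class PagingMath {
    // Number of hits to skip before this page starts
    static long from(int page, int size) {
        return (long) page * size;
    }

    // Ceiling division: 7 hits with 2 per page need 4 pages
    static int totalPages(long totalElements, int size) {
        return (int) ((totalElements + size - 1) / size);
    }

    static boolean hasNext(int page, long totalElements, int size) {
        return page + 1 < totalPages(totalElements, size);
    }

    static boolean hasPrevious(int page) {
        return page > 0;
    }

    public static void main(String[] args) {
        System.out.println(totalPages(7, 2)); // 4
        System.out.println(from(1, 2));       // 2
        System.out.println(hasNext(3, 7, 2)); // false
    }
}
```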

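Method in the repository interface

    // Query by name or birth date, one page at a time
    // (the interface also needs imports for org.springframework.data.domain.Page and Pageable)
    Page<Student> findByNameOrBirthDate(String name, String birthDate, Pageable pageable);

Test method

    // Needs imports for org.springframework.data.domain.Page, PageRequest and Pageable
    @org.junit.jupiter.api.Test
    public void test5() {
        Pageable p = PageRequest.of(0, 2); // page index 0 (the first page), 2 students per page
        Page<Student> page = studentRepository.findByNameOrBirthDate("龙", "1997-10-09", p);
        System.out.println(page);
        // Is there a next page?
        System.out.println(page.hasNext());
        // Is there a previous page?
        System.out.println(page.hasPrevious());
        // Total number of matching students
        System.out.println(page.getTotalElements());
        // Total number of pages
        System.out.println(page.getTotalPages());
        // The students on this page
        System.out.println(page.getContent());
    }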

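Highlighting query results

An annotation has to be added to the repository method
[Screenshot of the annotated repository method, not preserved]
and the results then have to be processed

Building queries with Criteria

The searcher class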

package cn.tedu.repo;

import cn.tedu.entity.Student;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.data.domain.Pageable;
import org.springframework.data.elasticsearch.core.ElasticsearchOperations;
import org.springframework.data.elasticsearch.core.SearchHit;
import org.springframework.data.elasticsearch.core.SearchHits;
import org.springframework.data.elasticsearch.core.query.Criteria;
import org.springframework.data.elasticsearch.core.query.CriteriaQuery;
import org.springframework.stereotype.Component;

import java.util.List;
import java.util.stream.Collectors;

/**
 * Criteria: holds the search conditions
 * CriteriaQuery: wraps the Criteria together with the paging parameters
 * ElasticsearchOperations: the object that executes the query
 */
@Component
public class StudentSearcher {
    @Autowired
    private ElasticsearchOperations operations;

    // Search for a keyword in the name field
    public List<Student> findByName(String k) {
        Criteria criteria = new Criteria("name");
        // Match names containing the keyword
        criteria.is(k);
        return exec(criteria, null);
    }

    private List<Student> exec(Criteria criteria, Pageable pageable) {
        CriteriaQuery criteriaQuery = new CriteriaQuery(criteria);
        if (pageable != null) {
            criteriaQuery.setPageable(pageable);
        }
        // SearchHits is a collection of SearchHit objects; each one wraps a Student plus hit metadata
        SearchHits<Student> shs = operations.search(criteriaQuery, Student.class);
        // Unwrap the Student objects into a list; equivalent to the loop
        // for (SearchHit<Student> sh : shs) { list.add(sh.getContent()); }
        return shs.stream().map(SearchHit::getContent).collect(Collectors.toList());
    }

    // Range query on birth date (both ends inclusive)
    public List<Student> findByBirthDate(String from, String to, Pageable pageable) {
        Criteria criteria = new Criteria("birthDate");
        criteria.between(from, to);
        return exec(criteria, pageable);
    }
}

Test

package cn.tedu.es;

import cn.tedu.entity.Student;
import cn.tedu.repo.StudentSearcher;
import org.junit.jupiter.api.Test;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.test.context.SpringBootTest;
import org.springframework.data.domain.PageRequest;
import org.springframework.data.domain.Pageable;

import java.util.List;

// Tests for the ElasticsearchOperations-based StudentSearcher.
// @SpringBootTest is required; without it @Autowired is not processed and studentSearcher stays null.
@SpringBootTest
public class Test2 {
    @Autowired
    private StudentSearcher studentSearcher;

    @Test
    public void test1() {
        List<Student> list = studentSearcher.findByName("龙");
        System.out.println(list);
    }

    @Test
    public void test2() {
        // First page, 2 students per page; "1997-10-00" is not a valid yyyy-MM-dd date, so the range starts at 10-01
        Pageable p = PageRequest.of(0, 2);
        List<Student> students = studentSearcher.findByBirthDate("1997-10-01", "1997-10-20", p);
        System.out.println(students);
    }
}

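Bulk insert

Put the data into a file in the bulk format below, then submit the file.

Data format

Each document takes two lines: an action line followed by the document source. Every line must be a complete JSON object on a single line, and the file must end with a newline.
_index: the target index
_id: the document id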

{ "index": {"_index": "pditems", "_id": "536563"}}
{ "id":"536563","brand":"联想","title":"联想(Lenovo)小新Air13 Pro 13.3英寸14.8mm超轻薄笔记本电脑","sell_point":"清仓！仅北京，武汉仓有货！","price":"6688.0","barcode":"","image":"/images/server/images/portal/air13/little4.jpg","cid":"163","status":"1","created":"2015-03-08 21:33:18","updated":"2015-04-11 20:38:38"}

Submit the file

curl -XPOST 'localhost:9200/pditems/_bulk' \
    -H 'Content-Type:application/json' \
    --data-binary @pditems.json
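When the bulk file is generated by a program, the two-line layout and the trailing newline are easy to get wrong. A minimal Java sketch that writes such a file, assuming each source is already single-line JSON and that the index name and ids need no JSON escaping; BulkFileWriter is a made-up name, and the sample source is a shortened version of the document above.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.LinkedHashMap;
import java.util.Map;

public class BulkFileWriter {
    // For each document: an action line, then its source line.
    // Every line ends with '\n', including the last one, which the _bulk API requires.
    static String toBulk(String index, Map<String, String> sourcesById) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : sourcesById.entrySet()) {
            if (e.getValue().contains("\n")) {
                throw new IllegalArgumentException("source of " + e.getKey() + " must be on a single line");
            }
            sb.append("{ \"index\": {\"_index\": \"").append(index)
              .append("\", \"_id\": \"").append(e.getKey()).append("\"}}\n");
            sb.append(e.getValue()).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        Map<String, String> docs = new LinkedHashMap<>(); // keeps insertion order
        docs.put("536563", "{\"id\":\"536563\",\"brand\":\"联想\",\"price\":\"6688.0\"}");
        String bulk = toBulk("pditems", docs);
        // Written to the current directory, ready for the curl command above
        Files.writeString(Path.of("pditems.json"), bulk, StandardCharsets.UTF_8);
        System.out.print(bulk);
    }
}
```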
