Elasticsearch Quick Start: This One Article Is All You Need!

Husp0707

Last modified 2023-10-17 15:20:16

First published 2023-03-19 16:45:05

Article link: https://blog.csdn.net/weixin_68021935/article/details/129651073

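1. Introduction to ES

Elasticsearch is the distributed, RESTful search and analytics engine at the core of the Elastic Stack. You can use Elasticsearch to store, search, and analyze data for use cases such as logs, metrics, search backends, application monitoring, and endpoint security. Latest version at the time of writing: 8.6.2.

Kibana lets you easily send requests to Elasticsearch and interactively analyze, visualize, and manage your data.

Apache Lucene is a high-performance, full-featured text search engine library written in Java and built around inverted indexes.

The Elastic Stack provider lets you use Terraform to manage and configure the Elastic Stack (Elasticsearch, Kibana, etc.) as code.

Inverted index vs. forward index

Inverted index: built from documents and terms. Given a term, it finds the documents that contain it. A forward index works in the opposite direction: given a document, it returns that document's content, so finding a term means scanning the documents one by one.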
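The difference between the two lookups can be sketched in a few lines of Python. This is only a toy illustration, not how Lucene stores its index: the sample documents are made up, and terms are split on spaces where a real analyzer would do the splitting.

```python
from collections import defaultdict

# Hypothetical sample documents: document ID -> text.
docs = {
    1: "xiaomi phone",
    2: "huawei phone",
    3: "xiaomi charger",
}


def build_inverted_index(documents):
    """Map each term to the sorted list of document IDs that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for term in text.split():
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}


def forward_search(documents, term):
    """Forward-style lookup: scan every document and check for the term."""
    return sorted(
        doc_id for doc_id, text in documents.items() if term in text.split()
    )


inverted = build_inverted_index(docs)
print(inverted["phone"])              # one dictionary lookup
print(forward_search(docs, "phone"))  # a scan over all documents
```

Both calls return the same document IDs; the inverted index gets there with a single lookup by term, while the forward-style search has to visit every document.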

ES vs. MySQL

Concept	ES	MySQL
Structure	Index	Table
Stored record	Document	Row
Field	Field	Column
Constraints	Mapping	Schema
Query language	DSL	SQL
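To make the Row/Document and Column/Field correspondence concrete, here is a small Python sketch that turns one table row into the kind of JSON object ES stores as a document. The table, its columns, and the values are hypothetical.

```python
import json

# Hypothetical MySQL table "user": its column names and one row.
columns = ("id", "name", "age")
row = (1, "Zhang San", 21)


def row_to_document(column_names, values):
    """Pair each column name with its value; columns become document fields."""
    return dict(zip(column_names, values))


document = row_to_document(columns, row)
print(json.dumps(document, ensure_ascii=False))
```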

2. Installing ES

Single-node ES deployment

So that the es and kibana containers can reach each other, first create a Docker network:

docker network create es-hu-net

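Pulling the ES image from Docker Hub with docker pull can take a long time, so the image archive is also available at this link:

Link: https://pan.baidu.com/s/1hDBkrpFRn1z5y7Wjc06Vaw?pwd=f62o
Extraction code: f62o

Upload es.tar and kibana.tar to the virtual machine and run them.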

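# 1. Create the directory /usr/local/es and change into it
mkdir -p /usr/local/es
cd /usr/local/es
# 2. Load es.tar (the image is elasticsearch:7.12.1)
docker load -i es.tar
# 3. Run the container to deploy a single-node ES. Flags, in order:
#    -d              run in the background
#    --name es       name the container
#    ES_JAVA_OPTS    JVM heap size
#    discovery.type  run mode (single node)
#    -v ...          mount the named volumes for data and plugins
#    --privileged    grant the container the privileges to access the volumes
#    --network       join the network named es-hu-net
#    -p 9200:9200    HTTP port
#    -p 9300:9300    transport port for node-to-node communication
docker run -d \
    --name es \
    -e "ES_JAVA_OPTS=-Xms512m -Xmx512m" \
    -e "discovery.type=single-node" \
    -v es-data:/usr/share/elasticsearch/data \
    -v es-plugins:/usr/share/elasticsearch/plugins \
    --privileged \
    --network es-hu-net \
    -p 9200:9200 \
    -p 9300:9300 \
    elasticsearch:7.12.1
# 4. Check the running containers
docker ps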

Access ES from a browser outside the virtual machine, on port 9200 of the VM's address.
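The same check can be scripted instead of done in a browser. The Python sketch below is an illustration under assumptions: the function names are made up for this example, and the sample body is an abbreviated stand-in for a real reply, which contains more fields.

```python
import json
from urllib.request import urlopen


def es_root_url(host, port=9200):
    """Build the URL of the ES root endpoint (9200 is the mapped HTTP port)."""
    return f"http://{host}:{port}/"


def version_from_root(body):
    """Extract the version number from the JSON returned by GET /."""
    return json.loads(body)["version"]["number"]


def fetch_es_version(host, port=9200, timeout=5):
    """Query a running ES node and return its version string."""
    with urlopen(es_root_url(host, port), timeout=timeout) as response:
        return version_from_root(response.read().decode("utf-8"))


# Abbreviated, illustrative response body; no network call is made here.
sample_body = '{"version": {"number": "7.12.1"}, "tagline": "You Know, for Search"}'
print(es_root_url("localhost"))
print(version_from_root(sample_body))
```

Calling fetch_es_version with the VM's IP address sends the real request; against the container started above it would be expected to report the image version.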

Kibana deployment

The Kibana image is also provided at the link below, so there is no need to pull it.

Link: https://pan.baidu.com/s/11597LVpniTwb6Tg-AdfQbQ?pwd=xdm7
Extraction code: xdm7

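# 1. Create the directory /usr/local/kibana and change into it
mkdir -p /usr/local/kibana
cd /usr/local/kibana
# 2. Load kibana.tar (output: "Loaded image: kibana:7.12.1")
docker load -i kibana.tar
# 3. Run the container to deploy Kibana. Flags, in order:
#    -d                   run in the background
#    --name kibana        name the container
#    ELASTICSEARCH_HOSTS  address of ES ("es" is the ES container's name)
#    --network            join es-hu-net, the same network as elasticsearch
#    -p 5601:5601         port mapping
docker run -d \
    --name kibana \
    -e ELASTICSEARCH_HOSTS=http://es:9200 \
    --network=es-hu-net \
    -p 5601:5601 \
    kibana:7.12.1
# 4. Check the running containers
docker ps
# 5. Follow the startup log
docker logs -f kibana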

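Access Kibana from a browser outside the virtual machine, on port 5601; it is now connected to ES.

Write DSL statements in Kibana to send REST requests to ES.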
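A DSL statement is just a JSON body sent over HTTP, so the same request can be built outside Kibana. The following Python sketch shows the shape of such a request using the _analyze endpoint with the built-in standard analyzer. The base URL is an assumed address and the helper names are invented for this example.

```python
import json
from urllib.request import Request, urlopen


def build_analyze_request(base_url, text, analyzer="standard"):
    """Build a POST /_analyze request that carries a JSON body."""
    body = json.dumps({"text": text, "analyzer": analyzer}).encode("utf-8")
    return Request(
        url=f"{base_url.rstrip('/')}/_analyze",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request, timeout=5):
    """Send the request to a running ES node and decode the JSON reply."""
    with urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# http://localhost:9200 is an assumed address; nothing is sent until
# send(request) is called.
request = build_analyze_request("http://localhost:9200", "hello world")
print(request.get_method(), request.full_url)
```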

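3. Analyzers

An analyzer is what splits document text into terms, and performs semantic analysis on it, when the inverted index is created.

The IK analyzer handles Chinese word segmentation:

https://github.com/medcl/elasticsearch-analysis-ik

The IK analyzer is provided at the link below; extract it, then upload it to Linux.

Link: https://pan.baidu.com/s/1MDnxVmISDqpkl5yWbpOZFg?pwd=ot6j
Extraction code: ot6j

Installing the IK analyzer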

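# 1. Find the host directory of the es-plugins volume and change into it
docker volume inspect es-plugins
cd /var/lib/docker/volumes/es-plugins/_data
# 2. Upload the extracted IK analyzer folder into this directory
#    (dragging it in with your file-transfer tool is enough).
#    If you uploaded the archive instead, extract it here:
# tar -xvf elasticsearch-analysis-ik-7.12.1.tar
# 3. Restart the container
docker restart es
# 4. Follow the ES log
docker logs -f es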

Segmentation modes: ik_smart and ik_max_word

# IK analyzer, ik_smart mode: coarsest segmentation (fewest splits)
POST /_analyze
{
  "text": "合肥师范学院计算机学院",  
  "analyzer": "ik_smart"
}

Result in ik_smart mode (fewest splits):

{
  "tokens" : [
    {
      "token" : "合肥",
      "start_offset" : 0,
      "end_offset" : 2,
      "type" : "CN_WORD",
      "position" : 0
    },
    {
      "token" : "师范学院",
      "start_offset" : 2,
      "end_offset" : 6,
      "type" : "CN_WORD",
      "position" : 1
    },
    {
      "token" : "计算机",
      "start_offset" : 6,
      "end_offset" : 9,
      "type" : "CN_WORD",
      "position" : 2
    },
    {
      "token" : "学院",
      "start_offset" : 9,
      "end_offset" : 11,
      "type" : "CN_WORD",
      "position" : 3
    }
  ]
}

# IK analyzer, ik_max_word mode: finest segmentation (most splits)
POST /_analyze
{
  "text": "合肥师范学院计算机学院",  
  "analyzer": "ik_max_word"
}

Result in ik_max_word mode (most splits):

{
  "tokens" : [
    {
      "token" : "合肥",
      "start_offset" : 0,
      "end_offset" : 2,
      "type" : "CN_WORD",
      "position" : 0
    },
    {
      "token" : "师范学院",
      "start_offset" : 2,
      "end_offset" : 6,
      "type" : "CN_WORD",
      "position" : 1
    },
    {
      "token" : "师范",
      "start_offset" : 2,
      "end_offset" : 4,
      "type" : "CN_WORD",
      "position" : 2
    },
    {
      "token" : "学院",
      "start_offset" : 4,
      "end_offset" : 6,
      "type" : "CN_WORD",
      "position" : 3
    },
    {
      "token" : "计算机",
      "start_offset" : 6,
      "end_offset" : 9,
      "type" : "CN_WORD",
      "position" : 4
    },
    {
      "token" : "计算",
      "start_offset" : 6,
      "end_offset" : 8,
      "type" : "CN_WORD",
      "position" : 5
    },
    {
      "token" : "算机",
      "start_offset" : 7,
      "end_offset" : 9,
      "type" : "CN_WORD",
      "position" : 6
    },
    {
      "token" : "学院",
      "start_offset" : 9,
      "end_offset" : 11,
      "type" : "CN_WORD",
      "position" : 7
    }
  ]
}
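One way to see the difference between the two results above is to look for tokens whose character ranges overlap. The Python sketch below does that for the span "计算机" (offsets 6 to 9), with the offsets copied from the two results; the helper function is written for this illustration only.

```python
def overlapping_pairs(tokens):
    """Return the pairs of tokens whose character ranges overlap."""
    pairs = []
    for i, first in enumerate(tokens):
        for second in tokens[i + 1:]:
            if (first["start_offset"] < second["end_offset"]
                    and second["start_offset"] < first["end_offset"]):
                pairs.append((first["token"], second["token"]))
    return pairs


# ik_smart emits a single token for this span.
smart_tokens = [{"token": "计算机", "start_offset": 6, "end_offset": 9}]

# ik_max_word emits the full word plus the shorter words inside it.
max_word_tokens = [
    {"token": "计算机", "start_offset": 6, "end_offset": 9},
    {"token": "计算", "start_offset": 6, "end_offset": 8},
    {"token": "算机", "start_offset": 7, "end_offset": 9},
]

print(overlapping_pairs(smart_tokens))
print(overlapping_pairs(max_word_tokens))
```

The ik_smart tokens never overlap, while every pair of ik_max_word tokens in this span does, which is why ik_max_word produces more terms for the same text.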

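Choose between ik_smart and ik_max_word by weighing memory usage, search efficiency, and the probability of a match: as the results above show, ik_max_word emits more terms for the same text, so it takes more space in the index but gives a search more terms to hit.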

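4. Analyzer dictionaries

As the internet evolves, new words are coined more and more often, and many of them do not exist in the original vocabulary list. The analyzer's dictionaries therefore need to be extended with new words, and they should also filter out meaningless words and block sensitive ones.

The IKAnalyzer.cfg.xml configuration file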

cd /var/lib/docker/volumes/es-plugins/_data/ik-7.12.1/config

vi IKAnalyzer.cfg.xml
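The file is a Java properties XML file. In the IK plugin, the ext_dict entry names the extension dictionary and the ext_stopwords entry names the stop-word dictionary. The Python sketch below renders what an edited configuration and its dictionary files might look like. The file names ext.dic and stopword.dic and the word lists are assumptions chosen for this illustration, not values taken from the original setup; dictionary files hold one word per line and should be saved as UTF-8 in the same config directory.

```python
import xml.etree.ElementTree as ET

CONFIG_TEMPLATE = """<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE properties SYSTEM "http://java.sun.com/dtd/properties.dtd">
<properties>
    <comment>IK Analyzer extension configuration</comment>
    <!-- extension dictionary: words to add -->
    <entry key="ext_dict">{ext_dict}</entry>
    <!-- extension stop-word dictionary: words to drop -->
    <entry key="ext_stopwords">{ext_stopwords}</entry>
</properties>
"""


def render_config(ext_dict, ext_stopwords):
    """Fill the two dictionary file names into the configuration template."""
    return CONFIG_TEMPLATE.format(ext_dict=ext_dict, ext_stopwords=ext_stopwords)


def render_dictionary(words):
    """Render a dictionary file body: one word per line."""
    return "\n".join(words) + "\n"


# Assumed file names and word lists, for illustration only.
config_text = render_config("ext.dic", "stopword.dic")
ext_dic_text = render_dictionary(["躺平", "摆烂", "内卷"])
stopword_text = render_dictionary(["的", "了", "么"])

# Sanity check: parse the rendered XML and list its entries.
root = ET.fromstring(config_text.encode("utf-8"))
entries = {entry.get("key"): entry.text for entry in root.iter("entry")}
print(entries)
```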

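docker restart es  # restart the ES container so the dictionaries are reloaded
docker logs -f es  # follow the log

Testing the extended dictionary

# Test the analyzer dictionaries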
POST /_analyze
{
  "text": "合肥师范学院计算机学院的2022届毕业生，入行IT程序员事业难了，生活压迫躺平么，迷茫就摆烂么, IT开发哥们慢慢内卷，一份工作不能摆烂了，摆烂就芭比Q了",  
  "analyzer": "ik_smart"
}

Result:

{
  "tokens" : [
    {
      "token" : "合肥",
      "start_offset" : 0,
      "end_offset" : 2,
      "type" : "CN_WORD",
      "position" : 0
    },
    {
      "token" : "师范学院",
      "start_offset" : 2,
      "end_offset" : 6,
      "type" : "CN_WORD",
      "position" : 1
    },
    {
      "token" : "计算机",
      "start_offset" : 6,
      "end_offset" : 9,
      "type" : "CN_WORD",
      "position" : 2
    },
    {
      "token" : "学院",
      "start_offset" : 9,
      "end_offset" : 11,
      "type" : "CN_WORD",
      "position" : 3
    },
    {
      "token" : "2022届",
      "start_offset" : 12,
      "end_offset" : 17,
      "type" : "TYPE_CQUAN",
      "position" : 4
    },
    {
      "token" : "毕业生",
      "start_offset" : 17,
      "end_offset" : 20,
      "type" : "CN_WORD",
      "position" : 5
    },
    {
      "token" : "入行",
      "start_offset" : 21,
      "end_offset" : 23,
      "type" : "CN_WORD",
      "position" : 6
    },
    {
      "token" : "程序员",
      "start_offset" : 25,
      "end_offset" : 28,
      "type" : "CN_WORD",
      "position" : 7
    },
    {
      "token" : "事业",
      "start_offset" : 28,
      "end_offset" : 30,
      "type" : "CN_WORD",
      "position" : 8
    },
    {
      "token" : "难了",
      "start_offset" : 30,
      "end_offset" : 32,
      "type" : "CN_WORD",
      "position" : 9
    },
    {
      "token" : "生活",
      "start_offset" : 33,
      "end_offset" : 35,
      "type" : "CN_WORD",
      "position" : 10
    },
    {
      "token" : "压迫",
      "start_offset" : 35,
      "end_offset" : 37,
      "type" : "CN_WORD",
      "position" : 11
    },
    {
      "token" : "躺平",
      "start_offset" : 37,
      "end_offset" : 39,
      "type" : "CN_WORD",
      "position" : 12
    },
    {
      "token" : "迷茫",
      "start_offset" : 41,
      "end_offset" : 43,
      "type" : "CN_WORD",
      "position" : 13
    },
    {
      "token" : "就",
      "start_offset" : 43,
      "end_offset" : 44,
      "type" : "CN_CHAR",
      "position" : 14
    },
    {
      "token" : "摆烂",
      "start_offset" : 44,
      "end_offset" : 46,
      "type" : "CN_WORD",
      "position" : 15
    },
    {
      "token" : "开发",
      "start_offset" : 51,
      "end_offset" : 53,
      "type" : "CN_WORD",
      "position" : 16
    },
    {
      "token" : "哥们",
      "start_offset" : 53,
      "end_offset" : 55,
      "type" : "CN_WORD",
      "position" : 17
    },
    {
      "token" : "慢慢",
      "start_offset" : 55,
      "end_offset" : 57,
      "type" : "CN_WORD",
      "position" : 18
    },
    {
      "token" : "内卷",
      "start_offset" : 57,
      "end_offset" : 59,
      "type" : "CN_WORD",
      "position" : 19
    },
    {
      "token" : "一份",
      "start_offset" : 60,
      "end_offset" : 62,
      "type" : "CN_WORD",
      "position" : 20
    },
    {
      "token" : "工作",
      "start_offset" : 62,
      "end_offset" : 64,
      "type" : "CN_WORD",
      "position" : 21
    },
    {
      "token" : "不能",
      "start_offset" : 64,
      "end_offset" : 66,
      "type" : "CN_WORD",
      "position" : 22
    },
    {
      "token" : "摆",
      "start_offset" : 66,
      "end_offset" : 67,
      "type" : "CN_CHAR",
      "position" : 23
    },
    {
      "token" : "烂了",
      "start_offset" : 67,
      "end_offset" : 69,
      "type" : "CN_WORD",
      "position" : 24
    },
    {
      "token" : "摆烂",
      "start_offset" : 70,
      "end_offset" : 72,
      "type" : "CN_WORD",
      "position" : 25
    },
    {
      "token" : "就",
      "start_offset" : 72,
      "end_offset" : 73,
      "type" : "CN_CHAR",
      "position" : 26
    },
    {
      "token" : "芭比",
      "start_offset" : 73,
      "end_offset" : 75,
      "type" : "CN_WORD",
      "position" : 27
    },
    {
      "token" : "q",
      "start_offset" : 75,
      "end_offset" : 76,
      "type" : "ENGLISH",
      "position" : 28
    }
  ]
}
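In this result, words such as "躺平", "摆烂", and "内卷" come back as single CN_WORD tokens, while "的", "IT", "么", and the final "了" have no token at all, and "芭比Q" is still split into "芭比" and "q". A quick way to list what the analyzer dropped is to collect the characters that no token covers. The Python sketch below does this for the tail of the test sentence; the helper is written for this illustration, and the offsets are shifted so that "摆烂就芭比Q了" starts at 0 (it starts at offset 70 in the result above).

```python
def dropped_spans(text, tokens):
    """Return the pieces of text that no token covers, in order."""
    covered = [False] * len(text)
    for token in tokens:
        for i in range(token["start_offset"], token["end_offset"]):
            covered[i] = True

    spans, current = [], ""
    for char, is_covered in zip(text, covered):
        if is_covered:
            if current:
                spans.append(current)
            current = ""
        else:
            current += char
    if current:
        spans.append(current)
    return spans


text = "摆烂就芭比Q了"
tokens = [
    {"token": "摆烂", "start_offset": 0, "end_offset": 2},
    {"token": "就", "start_offset": 2, "end_offset": 3},
    {"token": "芭比", "start_offset": 3, "end_offset": 5},
    {"token": "q", "start_offset": 5, "end_offset": 6},
]

print(dropped_spans(text, tokens))
```

Run against the full sentence and token list, the same function would also report the punctuation, since the analyzer emits no tokens for it.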