Getting Started with Elasticsearch

Latest recommended article published on 2022-06-18 12:46:53

浪里白条^

Category: Full-Text Search. Tags: java, elasticsearch

Article link: https://blog.csdn.net/weixin_42513762/article/details/109308341

Content source: Lagou Education Java High-Salary Training Camp

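Elasticsearch

1. Quick deployment of Elasticsearch in Single-Node Mode

Elasticsearch is a distributed full-text search engine that can be deployed in Single-Node Mode or Cluster Mode. For the workloads of a small company, Single-Node Mode is usually enough. This course starts with a Single-Node Mode deployment and covers Cluster Mode separately later.

Preparing the virtual machine
Prepare one virtual machine
Operating system: CentOS 7.x 64 bit
Client connection tool: XShell
Turn off the virtual machine's firewall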

systemctl stop firewalld.service # stop firewalld
systemctl disable firewalld.service # keep firewalld from starting at boot
firewall-cmd --state # check the firewall state

Deploying Elasticsearch in Single-Node Mode
We deploy a Single-Node Mode Elasticsearch on the virtual machine.

Download Elasticsearch
Latest version: https://www.elastic.co/cn/downloads/elasticsearch
Past release used in this article: https://www.elastic.co/cn/downloads/past-releases/elasticsearch-7.3…0

Choose the Linux build to download.

Install the JDK

Extract the tar.gz archives

tar -zxvf jdk-8u171-linux-x64.tar.gz
tar -zxvf elasticsearch-7.3.0-linux-x86_64.tar.gz

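Move the extracted directories to their installation paths (the later steps assume the JDK ends up at /usr/java and Elasticsearch at /usr/elasticsearch)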

mv /root/jdk1.8.0_171 /usr/java/
mv /root/elasticsearch-7.3.0 /usr/elasticsearch/

Configure the JDK environment variables

vim /etc/profile

JAVA_HOME=/usr/java
JRE_HOME=/usr/java/jre
CLASS_PATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar:$JRE_HOME/lib
PATH=$JAVA_HOME/bin:$JRE_HOME/bin:$PATH
export JAVA_HOME JRE_HOME CLASS_PATH PATH

Apply the configuration changes

source /etc/profile

Check the JDK

java -version

Configure Elasticsearch

Edit /usr/elasticsearch/config/elasticsearch.yml. Note that every colon must be followed by a space.
```
vim /usr/elasticsearch/config/elasticsearch.yml
```
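For a single-node install, uncomment node.name: node-1; otherwise Elasticsearch fails to start.
Set the network host and port, uncomment the master node setting, and keep only one node for a single-node install.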
```
node.name: node-1
network.host: 192.168.211.136
#
# Set a custom port for HTTP:
#
http.port: 9200
cluster.initial_master_nodes: ["node-1"]
```
Adjust the memory settings in /usr/elasticsearch/config/jvm.options as needed
```
vim /usr/elasticsearch/config/jvm.options
```
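Set the heap size to fit the machine. Both values default to 1 GB. On a machine with 1 GB of RAM, Elasticsearch takes 700 MB or more at startup, and once Kibana is installed it can barely run: it dies with an out-of-memory error shortly after starting. Setting the heap larger than physical memory also makes startup fail with an error.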
```
-Xms1g
-Xmx1g
```
Add a dedicated es user. By default Elasticsearch refuses to start as root, so it must run under another account.
```
useradd esuser
# set the password
passwd esuser
```
Change the owner of the Elasticsearch directory
```
chown -R esuser /usr/elasticsearch/
```
Edit /etc/sysctl.conf
```
vim /etc/sysctl.conf
```
Append at the end: vm.max_map_count=655360

Run sysctl -p to apply the change
```
sysctl -p
```

Edit /etc/security/limits.conf

vim /etc/security/limits.conf

Append at the end:

* soft nofile 65536 
* hard nofile 65536 
* soft nproc 4096 
* hard nproc 4096

Start Elasticsearch
Switch to the newly created user

su esuser

Start command

/usr/elasticsearch/bin/elasticsearch

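Access test

Configuration is complete: open node-1 in a browser to test it (with the settings above, http://192.168.211.136:9200).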

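2. Installing Kibana

Download Kibana 7.3.0 and upload it to the server: https://artifacts.elastic.co/downloads/kibana/kibana-7.3.0-linux-x86_64.tar.gz

Extract it into /usr/local/kibana. The directory must exist first, and --strip-components=1 drops the archive's top-level folder so that bin/ and config/ sit directly under /usr/local/kibana, as the later steps expect.

mkdir -p /usr/local/kibana
tar zxvf kibana-7.3.0-linux-x86_64.tar.gz -C /usr/local/kibana/ --strip-components=1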

Change the owner of /usr/local/kibana to the esuser account

chown -R esuser /usr/local/kibana/

Set access permissions

chmod -R 777 /usr/local/kibana

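Edit the configuration file

vim /usr/local/kibana/config/kibana.yml

Change the following entries (elasticsearch.hosts must point at the address and port Elasticsearch actually listens on):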

server.port: 5601
server.host: "0.0.0.0"
elasticsearch.hosts: ["http://192.168.10.154:9200"]

Startup test

Switch to the esuser account and start Kibana.

cd /usr/local/kibana
su esuser
./bin/kibana

Open http://192.168.31.199:5601 in a browser (substitute your own server's IP) to see the result.

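3. Integrating the IK analyzer with Elasticsearch

1. Integrating the IK analyzer

IKAnalyzer is an open-source, lightweight Chinese word segmentation toolkit written in Java. Since version 1.0 was released in December 2006, IKAnalyzer has gone through three major versions. It started as a Chinese segmentation component built around the open-source project Lucene, combining dictionary-based segmentation with grammar analysis. IKAnalyzer 3.0 became a general-purpose segmentation component for Java, independent of Lucene, while still providing a default implementation optimized for Lucene.
Features of IK analyzer 3.0:
1) It uses its own "forward iterative finest-granularity segmentation algorithm" and processes up to 600,000 characters per second.
2) It uses a multi-sub-processor analysis mode that handles English letters (IP addresses, email, URLs), numbers (dates, common Chinese quantifiers, Roman numerals, scientific notation), and Chinese words (person and place names).
3) Optimized dictionary storage for individual entries, with a smaller memory footprint.
4) Support for user-defined dictionary extensions.
5) IKQueryParser, a query analyzer optimized for Lucene full-text search, uses an ambiguity analysis algorithm to optimize the permutations of query keywords, which can greatly improve Lucene's hit rate.

Method 1

1) Run the following command from the Elasticsearch bin directory. The Elasticsearch plugin manager installs the plugin automatically; wait for it to finish: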

/usr/elasticsearch/bin/elasticsearch-plugin install https://github.com/medcl/elasticsearch-analysis-ik/releases/download/v7.3.0/elasticsearch-analysis-ik-7.3.0.zip

2) When the download finishes you are asked "Continue with installation?"; enter y to complete the installation.
3) Restart Elasticsearch and Kibana

Method 2

Download the IK analyzer package for Elasticsearch 7.3.0: https://github.com/medcl/elasticsearch-analysis-ik/releases/download/v7.3.0/elasticsearch-analysis-ik-7.3.0.zip

Create an analysis-ik directory under the plugins directory of the Elasticsearch installation

mkdir -p /usr/elasticsearch/plugins/analysis-ik

Extract the analyzer into the new directory

unzip elasticsearch-analysis-ik-7.3.0.zip -d /usr/elasticsearch/plugins/analysis-ik/

Restart Elasticsearch and Kibana

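Test cases

The IK analyzer has two segmentation modes: ik_max_word and ik_smart.
1) ik_max_word (commonly used)
Splits the text at the finest granularity

2) ik_smart
Splits the text at the coarsest granularity
Never mind the syntax for now; enter the following requests in Kibana to try it out: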

POST _analyze
{
"analyzer": "ik_max_word",
"text": "南京市长江大桥"
}

POST _analyze
{
"analyzer": "ik_smart",
"text": "南京市长江大桥"
}
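
Outside Kibana, the same _analyze call is an ordinary HTTP request. The sketch below builds it (without sending it) using Java 11's java.net.http. The class name AnalyzeRequestSketch, its helper methods, and the base URL http://localhost:9200 are illustrative assumptions, and the hand-rolled JSON escaping only covers quotes and backslashes.

```java
import java.net.URI;
import java.net.http.HttpRequest;

// Hypothetical helper: builds the _analyze request shown above as a plain HTTP request.
class AnalyzeRequestSketch {

    // Escapes backslashes and double quotes only; control characters are not handled.
    static String jsonEscape(String value) {
        return value.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    // Produces the JSON body {"analyzer": ..., "text": ...}.
    static String analyzeBody(String analyzer, String text) {
        return "{\"analyzer\":\"" + jsonEscape(analyzer)
                + "\",\"text\":\"" + jsonEscape(text) + "\"}";
    }

    // Builds a POST request to <baseUrl>/_analyze; nothing is sent here.
    static HttpRequest analyzeRequest(String baseUrl, String analyzer, String text) {
        return HttpRequest.newBuilder(URI.create(baseUrl + "/_analyze"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(analyzeBody(analyzer, text)))
                .build();
    }

    public static void main(String[] args) {
        // Assumed address; replace with the host and port of your own node.
        HttpRequest request = analyzeRequest("http://localhost:9200", "ik_smart", "南京市长江大桥");
        System.out.println(request.method() + " " + request.uri());
        System.out.println(analyzeBody("ik_smart", "南京市长江大桥"));
    }
}
```

To actually run the analysis, the request would be passed to HttpClient.newHttpClient().send(...) with a string body handler.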

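2 Using an extension dictionary

Extension words are words that should not be split apart, so that each is kept as a single token, for example 江大桥 from the text above.
Custom extension dictionary
1) Go to config/analysis-ik/ (plugin command installation) or plugins/analysis-ik/config (package installation) and create a custom dictionary

vim lagou_ext_dict.dic

Enter: 江大桥
2) Add the custom extension dictionary file to the IKAnalyzer.cfg.xml configuration
vim IKAnalyzer.cfg.xml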

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE properties SYSTEM "http://java.sun.com/dtd/properties.dtd">
<properties>
<comment>IK Analyzer extension configuration</comment>
<!-- Configure your own extension dictionary here -->
<entry key="ext_dict">lagou_ext_dict.dic</entry>
<!-- Configure your own extension stop-word dictionary here -->
<entry key="ext_stopwords">lagou_stop_dict.dic</entry>
<!-- Configure a remote extension dictionary here -->
<!-- <entry key="remote_ext_dict">http://192.168.211.130:8080/tag.dic</entry> -->
<!-- Configure a remote extension stop-word dictionary here -->
<!-- <entry key="remote_ext_stopwords">words_location</entry> -->
</properties>

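3) Restart Elasticsearch

3 Using a stop-word dictionary

Stop words are words that occur very often in text but contribute little to its meaning, such as a, an, the, and of in English, or 的, 了, and 呢 in Chinese. Stop words are usually filtered out and not indexed. At search time, any stop words in the user's query are filtered out automatically as well. Filtering stop words speeds up indexing and reduces the size of the index files.
Custom stop-word dictionary
1) Go to config/analysis-ik/ (plugin command installation) or plugins/analysis-ik/config (package installation) and create a custom dictionary

vim lagou_stop_dict.dic

Enter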

的 
了 
啊

2) Add the custom stop-word dictionary file to the IKAnalyzer.cfg.xml configuration
3) Restart Elasticsearch
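
To make the effect concrete, here is a minimal sketch of what stop-word filtering does to a list of tokens. It is a plain-Java illustration that assumes the tokens arrive as strings; it is not how the IK plugin is implemented, and StopWordSketch is a made-up name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Illustration only: drops every token that appears in the stop-word set.
class StopWordSketch {

    // Returns the tokens that are not stop words, preserving their order.
    static List<String> removeStopWords(List<String> tokens, Set<String> stopWords) {
        List<String> kept = new ArrayList<>();
        for (String token : tokens) {
            if (!stopWords.contains(token)) {
                kept.add(token);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // The same three entries as in lagou_stop_dict.dic above.
        Set<String> stopWords = Set.of("的", "了", "啊");
        System.out.println(removeStopWords(List.of("我", "的", "书"), stopWords));
    }
}
```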

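4 Using a synonym dictionary

Languages are rich, and many words share the same meaning; we call them synonyms, for example 番茄 and 西红柿 (both "tomato"), or 馒头 and 馍 (both "steamed bun"). A user may search for 番茄, but documents containing 西红柿 should be returned too. This is called synonym search.
Note: extension words and stop words are applied at index time, while synonyms are applied at search time.
Configuring IK synonyms
Elasticsearch ships with a synonym token filter named synonym. To make IK and synonym work together, we define a new analyzer that uses IK as the tokenizer and synonym as a filter. It sounds complicated, but it only takes one piece of configuration.
1) Create the file /config/analysis-ik/synonym.txt, enter some synonyms, and save it as UTF-8. For example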

lagou,拉勾
china,中国
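
Each line of the file lists terms that should be treated as equivalent, separated by commas. The sketch below reads such lines into a lookup table so you can see which terms end up grouped together. It only illustrates the file format under simple assumptions (comma-separated groups, blank lines and lines starting with # skipped, no "=>" mappings); the real matching is done by Elasticsearch's synonym filter, and SynonymFileSketch is a hypothetical name.

```java
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Illustration only: groups the comma-separated terms of each line.
class SynonymFileSketch {

    // Maps every term to the set of all terms listed on the same line (itself included).
    static Map<String, Set<String>> parse(List<String> lines) {
        Map<String, Set<String>> groups = new HashMap<>();
        for (String line : lines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty() || trimmed.startsWith("#")) {
                continue; // skip blank lines and comments
            }
            Set<String> group = new LinkedHashSet<>();
            for (String term : trimmed.split(",")) {
                String cleaned = term.trim();
                if (!cleaned.isEmpty()) {
                    group.add(cleaned);
                }
            }
            for (String term : group) {
                groups.computeIfAbsent(term, key -> new LinkedHashSet<>()).addAll(group);
            }
        }
        return groups;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> groups = parse(List.of("lagou,拉勾", "china,中国"));
        System.out.println(groups.get("lagou"));
        System.out.println(groups.get("中国"));
    }
}
```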

2) Use the synonym configuration when creating the index. Example template:

PUT /<index-name>
{
    "mappings": {
        "properties": {
            "<field-name>": {
                "analyzer": "ik_sync_smart",
                "search_analyzer": "ik_sync_smart",
                "type": "<field-type>"
            }
        }
    },
    "settings": {
        "analysis": {
            "analyzer": {
                "ik_sync_max_word": {
                    "filter": [
                        "word_sync"
                    ],
                    "tokenizer": "ik_max_word",
                    "type": "custom"
                },
                "ik_sync_smart": {
                    "filter": [
                        "word_sync"
                    ],
                    "tokenizer": "ik_smart",
                    "type": "custom"
                }
            },
            "filter": {
                "word_sync": {
                    "synonyms_path": "analysis-ik/synonym.txt",
                    "type": "synonym"
                }
            }
        }
    }
}

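The configuration above defines two new analyzers, ik_sync_max_word and ik_sync_smart, matching IK's ik_max_word and ik_smart segmentation strategies. Both apply the synonym filter to perform synonym conversion.

3) The synonym configuration in the index template is now complete. At search time, specify ik_sync_max_word or ik_sync_smart as the analyzer.
4) Example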

PUT /lagou-es-synonym
{
    "mappings": {
        "properties": {
            "name": {
                "analyzer": "ik_sync_max_word",
                "search_analyzer": "ik_sync_max_word",
                "type": "text"
            }
        }
    },
    "settings": {
        "analysis": {
            "analyzer": {
                "ik_sync_max_word": {
                    "filter": [
                        "word_sync"
                    ],
                    "tokenizer": "ik_max_word",
                    "type": "custom"
                },
                "ik_sync_smart": {
                    "filter": [
                        "word_sync"
                    ],
                    "tokenizer": "ik_smart",
                    "type": "custom"
                }
            },
            "filter": {
                "word_sync": {
                    "synonyms_path": "analysis-ik/synonym.txt",
                    "type": "synonym"
                }
            }
        }
    }
}

Insert data

POST /lagou-es-synonym/_doc/1
{
"name":"拉勾是中国专业的互联网招聘平台"
}

Search using the synonym "lagou" or "china"

POST /lagou-es-synonym/_search
{
    "query": {
        "match": {
            "name": "lagou"
        }
    }
}

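4 Index operations (create, view, delete)

1. Creating an index

Elasticsearch exposes a REST-style API, so every API call is an HTTP request, and you can use any tool to send it.

Syntax

PUT /<index-name>
{
    "settings": {
    	"<setting-name>": "<setting-value>"
    }
}

settings: the index settings, where properties such as the number of shards and the number of replicas can be defined. For now we can leave them unset and use the defaults.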
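
As a concrete instance of the settings block, the sketch below assembles a request body that sets the shard and replica counts. number_of_shards and number_of_replicas are the standard index settings for those two values; the class and method names are hypothetical, and the values used in main are arbitrary.

```java
// Hypothetical helper: builds the JSON body for "PUT /<index-name>" with explicit settings.
class IndexSettingsSketch {

    // Returns {"settings":{"number_of_shards":N,"number_of_replicas":M}}.
    static String settingsBody(int shards, int replicas) {
        if (shards < 1 || replicas < 0) {
            throw new IllegalArgumentException("need shards >= 1 and replicas >= 0");
        }
        return "{\"settings\":{\"number_of_shards\":" + shards
                + ",\"number_of_replicas\":" + replicas + "}}";
    }

    public static void main(String[] args) {
        // One primary shard and no replicas is a reasonable fit for a single node.
        System.out.println(settingsBody(1, 0));
    }
}
```

On a single node, replicas have nowhere to be allocated, so setting number_of_replicas to 0 keeps the index from showing up as yellow.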

Example
PUT /lagou-company-index

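2. Checking whether an index exists

Syntax

HEAD /<index-name>

Example: HEAD /lagou-company-index

3. Viewing indices

A GET request returns the properties of an index. Format:

Viewing a single index

Syntax

GET /<index-name>

Example: GET /lagou-company-index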

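Viewing several indices

Syntax

GET /<index-name-1>,<index-name-2>,<index-name-3>,...

Example: GET /lagou-company-index,lagou-employee-index

Viewing all indices

Method 1

GET _all

Method 2

GET /_cat/indices?v

The health column of the output uses these colors:

Green: all shards of the index are allocated.
Yellow: at least one replica shard is not allocated.
Red: at least one primary shard is not allocated.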
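
The three colors follow a simple precedence, which the sketch below spells out: an unallocated primary shard always means red, otherwise an unallocated replica means yellow. The class name, method name, and the idea of passing in two counters are assumptions made for illustration; Elasticsearch computes the status itself.

```java
// Illustration only: derives the health color from counts of unallocated shards.
class IndexHealthSketch {

    // Red takes precedence over yellow, and yellow over green.
    static String health(int unassignedPrimaries, int unassignedReplicas) {
        if (unassignedPrimaries > 0) {
            return "red";
        }
        if (unassignedReplicas > 0) {
            return "yellow";
        }
        return "green";
    }

    public static void main(String[] args) {
        // A single node with the default one replica cannot place the replica copy.
        System.out.println(health(0, 1));
    }
}
```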

4. Opening an index

Syntax

POST /<index-name>/_open

5. Closing an index

Syntax

POST /<index-name>/_close

6. Deleting an index

Indices are deleted with a DELETE request

Syntax

DELETE /<index-name-1>,<index-name-2>,<index-name-3>...

After the deletion, viewing the index again returns an error saying the index does not exist.

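5 Mapping operations

Once an index is created, we have the equivalent of a database in a relational system. Elasticsearch 7.x removed the type setting on indices: a type can no longer be specified and defaults to _doc. Fields still exist, though, and we need to define their constraints, which is called field mapping (mapping).
Field constraints include, but are not limited to:

The field's data type
Whether it is stored
Whether it is indexed
The analyzer

Let's look at the creation syntax.

1. Creating mapped fields

Syntax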

Importing data into MySQL

Use a GUI tool to import position.sql into the database at 192.168.31.199:3306.

Query implementation

Create a new springboot-es project.

Maven dependencies

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
	xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 https://maven.apache.org/xsd/maven-4.0.0.xsd">
	<modelVersion>4.0.0</modelVersion>
	<parent>
		<groupId>org.springframework.boot</groupId>
		<artifactId>spring-boot-starter-parent</artifactId>
		<version>2.3.3.RELEASE</version>
		<relativePath/> <!-- lookup parent from repository -->
	</parent>
	<groupId>com.lagou.edu</groupId>
	<artifactId>springboot_es</artifactId>
	<version>0.0.1-SNAPSHOT</version>
	<name>springboot_es</name>
	<description>Elasticsearch project for Spring Boot</description>

	<properties>
		<java.version>11</java.version>
		<elasticsearch.version>7.3.0</elasticsearch.version>
	</properties>

	<dependencies>
		<dependency>
			<groupId>org.springframework.boot</groupId>
			<artifactId>spring-boot-starter-thymeleaf</artifactId>
		</dependency>
		<dependency>
			<groupId>org.springframework.boot</groupId>
			<artifactId>spring-boot-starter-web</artifactId>
		</dependency>
		<!-- High-level REST client; its transitive elasticsearch core is excluded
		     so that the explicitly versioned artifact below is used instead -->
		<dependency>
			<groupId>org.elasticsearch.client</groupId>
			<artifactId>elasticsearch-rest-high-level-client</artifactId>
			<version>${elasticsearch.version}</version>
			<exclusions>
				<exclusion>
					<groupId>org.elasticsearch</groupId>
					<artifactId>elasticsearch</artifactId>
				</exclusion>
			</exclusions>
		</dependency>
		<dependency>
			<groupId>org.elasticsearch</groupId>
			<artifactId>elasticsearch</artifactId>
			<version>${elasticsearch.version}</version>
		</dependency>
		<!-- HttpClient -->
		<dependency>
			<groupId>org.apache.httpcomponents</groupId>
			<artifactId>httpclient</artifactId>
			<version>4.5.3</version>
		</dependency>
		<dependency>
			<groupId>org.apache.commons</groupId>
			<artifactId>commons-lang3</artifactId>
			<version>3.9</version>
		</dependency>
		<dependency>
			<groupId>mysql</groupId>
			<artifactId>mysql-connector-java</artifactId>
			<scope>runtime</scope>
		</dependency>
		<dependency>
			<groupId>org.springframework.boot</groupId>
			<artifactId>spring-boot-devtools</artifactId>
			<scope>runtime</scope>
			<optional>true</optional>
		</dependency>
		<dependency>
			<groupId>org.projectlombok</groupId>
			<artifactId>lombok</artifactId>
			<optional>true</optional>
		</dependency>
		<dependency>
			<groupId>org.springframework.boot</groupId>
			<artifactId>spring-boot-starter-test</artifactId>
			<scope>test</scope>
			<exclusions>
				<exclusion>
					<groupId>org.junit.vintage</groupId>
					<artifactId>junit-vintage-engine</artifactId>
				</exclusion>
			</exclusions>
		</dependency>
	</dependencies>

	<build>
		<plugins>
			<plugin>
				<groupId>org.springframework.boot</groupId>
				<artifactId>spring-boot-maven-plugin</artifactId>
			</plugin>
		</plugins>
	</build>

</project>

Configuration file

Mostly the Elasticsearch configuration.

spring:
  devtools:
    restart:
      enabled: true # enable hot reload
      additional-paths: src/main/java # directories watched for restarts
      exclude: WEB-INF/**
  freemarker:
    cache: false # do not cache pages, so changes take effect immediately
  elasticsearch:
    rest:
      uris: 192.168.31.199:9200
  thymeleaf:
    cache: false
logging:
  level:
    root: info
    com.xdclass.search: debug

The PositionServiceImpl class

Write the PositionService implementation class and implement the query method.

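// Imports needed by the method below (Elasticsearch 7.3.0 client packages)
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.TimeUnit;
import org.elasticsearch.action.search.SearchRequest;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.common.unit.TimeValue;
import org.elasticsearch.index.query.QueryBuilder;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.search.SearchHit;
import org.elasticsearch.search.builder.SearchSourceBuilder;

/**
     * Paged search.
     * @param keyword  search keyword, matched against positionName
     * @param pageNo   page number, starting at 1
     * @param pageSize number of results per page
     * @return the matching positions as source maps
     * @throws IOException if the request to Elasticsearch fails
     */
@Override
public List<Map<String, Object>> searchPos(String keyword, int pageNo, int pageSize) throws IOException {
  if (pageNo <= 1) {
    pageNo = 1;
  }

  SearchRequest searchRequest = new SearchRequest(POSITIOIN_INDEX);
  SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
  // Pagination: start offset = (current page - 1) * page size
  searchSourceBuilder.from((pageNo - 1) * pageSize);
  searchSourceBuilder.size(pageSize);
  // Full-text match on the position name (matchQuery analyzes the keyword; it is not an exact term match)
  QueryBuilder builder = QueryBuilders.matchQuery("positionName", keyword);
  searchSourceBuilder.query(builder);
  searchSourceBuilder.timeout(new TimeValue(60, TimeUnit.SECONDS));
  // Run the search
  searchRequest.source(searchSourceBuilder);
  SearchResponse searchResponse = client.search(searchRequest, RequestOptions.DEFAULT);
  ArrayList<Map<String, Object>> list = new ArrayList<>();
  SearchHit[] hits = searchResponse.getHits().getHits();
  System.out.println(hits.length);
  for (SearchHit hit : hits) {
    list.add(hit.getSourceAsMap());
  }

  // Fewer than 5 results: top up with a fallback query on positionAdvantage
  if (list.size() < 5) {
    QueryBuilder supBuilder = QueryBuilders.matchQuery("positionAdvantage", "美女多、员工福利好");
    searchSourceBuilder.query(supBuilder);
    searchSourceBuilder.timeout(new TimeValue(60, TimeUnit.SECONDS));
    // Run the fallback search (it reuses the same from/size as the first query)
    searchRequest.source(searchSourceBuilder);
    SearchResponse supSearchResponse = client.search(searchRequest, RequestOptions.DEFAULT);
    SearchHit[] supHits = supSearchResponse.getHits().getHits();
    for (SearchHit hit : supHits) {
      if (list.size() >= 5) {
        break;
      }
      Map<String, Object> sourceAsMap = hit.getSourceAsMap();
      // Skip positions that are already in the list
      if (!list.contains(sourceAsMap)) {
        list.add(sourceAsMap);
      }
    }
  }

  return list;
}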
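
The two pieces of logic in searchPos that do not depend on Elasticsearch are the page-to-offset calculation and the "top up to five results without duplicates" step. The sketch below isolates them in plain Java so they can be tried without a cluster. SearchPagingSketch, fromOffset, and topUp are hypothetical names, and strings stand in for the source maps.

```java
import java.util.ArrayList;
import java.util.List;

// Illustration only: the paging arithmetic and the top-up step, without any Elasticsearch calls.
class SearchPagingSketch {

    // Page numbers below 1 are treated as page 1, as in searchPos.
    static int fromOffset(int pageNo, int pageSize) {
        int page = Math.max(pageNo, 1);
        return (page - 1) * pageSize;
    }

    // Appends fallback items to the primary results until minSize is reached, skipping duplicates.
    static <T> List<T> topUp(List<T> primary, List<T> fallback, int minSize) {
        List<T> result = new ArrayList<>(primary);
        for (T candidate : fallback) {
            if (result.size() >= minSize) {
                break;
            }
            if (!result.contains(candidate)) {
                result.add(candidate);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(fromOffset(3, 10));
        System.out.println(topUp(List.of("a", "b"), List.of("b", "c", "d", "e", "f"), 5));
    }
}
```

List.contains is a linear scan, which is fine for five items; for larger result sets a set of document ids would be the more usual choice.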

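Search page

Building on the earlier assignment, the page now uses Vue to call the backend endpoint and shows the results without reloading the page.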

<!DOCTYPE html>
<html lang="en" xmlns:th="http://www.thymeleaf.org">
<head>
    <meta charset="UTF-8">
    <title>Title</title>
    <style>
        .position-item {
            width: 200px;
            height: 122px;
            margin: 10px;
            padding: 5px;
            border: cornflowerblue 1px solid;
            float: left;
            list-style:none;
        }
    </style>
    <script src="https://cdn.jsdelivr.net/npm/vue@2/dist/vue.js"></script>
    <script src="https://unpkg.com/axios/dist/axios.min.js"></script>
</head>
<body>
<div id="app">
    <input name="positionName" v-model="keyword" @keyup.enter="searchPosition">
    <button @click="searchPosition">Search</button>
    <h1>Search results</h1>
    <ul>
        <li v-for="item in results" class="position-item">
            <div>
                <span>{{item.positionName}}</span>
                <span>{{item.salary}}</span>
            </div>
            <div>
                <span>{{item.workYear}}</span>
                <span>{{item.education}}</span>
            </div>
            <div>
                <span>{{item.positionAdvantage}}</span>
            </div>
            <div>
                <span>{{item.companyName}}</span>
            </div>
            <div>
                <span>{{item.workAddress}}</span>
            </div>
        </li>
    </ul>
</div>
</body>
<script>
    new Vue({
        el: "#app",
        data: {
            keyword : "", // search keyword
            results : [] // search results
        },
        methods: {
            searchPosition(){
                var keyword = this.keyword;
                console.log(keyword);
                axios.get('/search/' + encodeURIComponent(keyword) + '/1/10').then(response=>{
                    console.log(response.data);
                    this.results = response.data;
                });
            }
        }
    });
</script>
</html>