Contents
No. | Title | Link |
---|---|---|
1 | Integrating Spring Boot with Elasticsearch 7.6.1 | https://blog.csdn.net/miaomiao19971215/article/details/105106783 |
2 | How Elasticsearch filters are executed | https://blog.csdn.net/miaomiao19971215/article/details/105487446 |
3 | Elasticsearch inverted index and reindexing | https://blog.csdn.net/miaomiao19971215/article/details/105487532 |
4 | How Elasticsearch writes documents | https://blog.csdn.net/miaomiao19971215/article/details/105487574 |
5 | Elasticsearch relevance scoring algorithm | https://blog.csdn.net/miaomiao19971215/article/details/105487656 |
6 | Elasticsearch doc values | https://blog.csdn.net/miaomiao19971215/article/details/105487676 |
7 | Elasticsearch search techniques in depth | https://blog.csdn.net/miaomiao19971215/article/details/105487711 |
8 | Elasticsearch aggregation techniques in depth | https://blog.csdn.net/miaomiao19971215/article/details/105487885 |
9 | Elasticsearch memory usage | https://blog.csdn.net/miaomiao19971215/article/details/105605379 |
10 | Elasticsearch document data modeling in detail | https://blog.csdn.net/miaomiao19971215/article/details/105720737 |
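1. Overview
This article records how to integrate Spring Boot with Elasticsearch. The Spring Boot version is 2.1.9.RELEASE and the Elasticsearch version is 7.6.1.
Reference: the official documentation
If you need the source code of the project used in this article, leave a comment.
2. Integration
2.1 Add the Maven dependencies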
<dependency>
<groupId>org.elasticsearch.client</groupId>
<artifactId>elasticsearch-rest-high-level-client</artifactId>
<version>7.6.1</version>
</dependency>
<dependency>
<groupId>org.elasticsearch</groupId>
<artifactId>elasticsearch</artifactId>
<version>7.6.1</version>
</dependency>
<dependency>
<groupId>org.elasticsearch.client</groupId>
<artifactId>elasticsearch-rest-client</artifactId>
<version>7.6.1</version>
</dependency>
<dependency>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-web</artifactId>
</dependency>
<!-- Lombok is included only to make the later code shorter; it has nothing to do with the integration and can be left out -->
<dependency>
<groupId>org.projectlombok</groupId>
<artifactId>lombok</artifactId>
<optional>true</optional>
<scope>provided</scope>
<version>1.18.8</version>
</dependency>
spring-boot-starter-parent can be brought in either by declaring it as the parent, or by importing it as a pom inside dependencyManagement. Either of the two forms below works.
<parent>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-parent</artifactId>
<version>2.1.9.RELEASE</version>
<relativePath/>
</parent>
<dependencyManagement>
    <dependencies>
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-parent</artifactId>
            <version>2.1.9.RELEASE</version>
            <type>pom</type>
            <scope>import</scope>
        </dependency>
    </dependencies>
</dependencyManagement>
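The ES client needs log4j in order to start. Because we inherit from spring-boot-starter-parent, and that parent POM already covers log4j2, it does not have to be declared explicitly. (Conversely, when integrating ES into a plain Spring project, log4j has to be declared as an explicit dependency.)
Note: the client this article uses to connect to ES is the REST client, so all of the configuration and API calls below revolve around the REST clients. The official documentation also offers Jest for connecting to ES; open the page and search for the keyword "Connecting to Elasticsearch by Using Jest".
2.2 Configuration files
Choose one of three ways: yaml, properties, or a configuration class.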
2.3.1 yaml
server:
  port: 8080
spring:
  application:
    name: spring-boot-es-demo
  elasticsearch:
    rest:
      username: user
      password: 123456
      uris: https://127.0.0.1:9200
2.3.2 properties
server.port=8080
spring.application.name=spring-boot-es-demo
spring.elasticsearch.rest.username=user
spring.elasticsearch.rest.password=123456
spring.elasticsearch.rest.uris=https://127.0.0.1:9200
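2.3.3 Configuration class
- Custom settings (these go in the yaml or properties file)
#============================================================================
# Elasticsearch core settings
#============================================================================
# HTTP connect timeout (milliseconds)
elasticsearch.connectTimeout=1000
# Socket timeout (milliseconds)
elasticsearch.socketTimeout=30000
# Timeout for obtaining a connection from the pool (milliseconds)
elasticsearch.connectionRequestTimeout=500
# Maximum total number of connections
elasticsearch.maxConnTotal=100
# Maximum number of connections per route
elasticsearch.maxConnPerRoute=100
# Maximum time a task may run (unit: hours)
elasticsearch.executeTimeout=8
# Username
elasticsearch.username=admin
# Password
elasticsearch.password=123456
# Elasticsearch HTTP address(es) as host:port, comma-separated for a cluster.
# Example value only; it is read through ESProperties.httpHost by the configuration class below.
elasticsearch.httpHost=127.0.0.1:9200
- ESProperties maps the ES-related settings onto fields
(PS: you do not have to use this class at all; the settings can live in Disconf or Apollo, or be fetched from a configuration center of your own.)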
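import lombok.Getter;
import lombok.Setter;
import org.springframework.boot.context.properties.ConfigurationProperties;
import org.springframework.context.annotation.Configuration;

@Getter
@Setter
@ConfigurationProperties(prefix = "elasticsearch")
@Configuration
public class ESProperties {
    /**
     * HTTP connect timeout
     */
    private String connectTimeout;
    /**
     * Socket timeout
     */
    private String socketTimeout;
    /**
     * Timeout for obtaining a connection
     */
    private String connectionRequestTimeout;
    /**
     * Maximum total number of connections
     */
    private String maxConnTotal;
    /**
     * Maximum number of connections per route
     */
    private String maxConnPerRoute;
    /**
     * Username
     */
    private String username;
    /**
     * Password
     */
    private String password;
    /**
     * Elasticsearch HTTP address(es), in host:port form
     */
    private String httpHost;
}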
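- Elasticsearch configuration class
import lombok.RequiredArgsConstructor;
import org.apache.http.HttpHost;
import org.apache.http.auth.AuthScope;
import org.apache.http.auth.UsernamePasswordCredentials;
import org.apache.http.client.CredentialsProvider;
import org.apache.http.impl.client.BasicCredentialsProvider;
import org.elasticsearch.client.RestClient;
import org.elasticsearch.client.RestClientBuilder;
import org.elasticsearch.client.RestHighLevelClient;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@RequiredArgsConstructor(onConstructor = @__(@Autowired))
@Configuration
public class ElasticsearchConfig {
    private final ESProperties esProperties;

    @Bean
    public RestHighLevelClient clientDev() {
        final CredentialsProvider credentialsProvider = new BasicCredentialsProvider();
        credentialsProvider.setCredentials(AuthScope.ANY, new UsernamePasswordCredentials(
                esProperties.getUsername(), esProperties.getPassword()
        ));
        // Create the builder for the ES client
        RestClientBuilder builder = RestClient.builder(httpHostHandlerDev());
        // Request configuration of the underlying asynchronous HTTP client
        builder.setRequestConfigCallback(builder1 -> {
            // Connect timeout in milliseconds (RequestConfig's own default is -1)
            builder1.setConnectTimeout(Integer.parseInt(esProperties.getConnectTimeout()));
            // Socket timeout in milliseconds
            builder1.setSocketTimeout(Integer.parseInt(esProperties.getSocketTimeout()));
            // Timeout for obtaining a connection, in milliseconds (RequestConfig's own default is -1)
            builder1.setConnectionRequestTimeout(Integer.parseInt(esProperties.getConnectionRequestTimeout()));
            return builder1;
        });
        // Connection limits of the underlying asynchronous HTTP client
        builder.setHttpClientConfigCallback(httpAsyncClientBuilder -> {
            // Maximum total number of connections
            httpAsyncClientBuilder.setMaxConnTotal(Integer.parseInt(esProperties.getMaxConnTotal()));
            // Maximum number of connections per route
            httpAsyncClientBuilder.setMaxConnPerRoute(Integer.parseInt(esProperties.getMaxConnPerRoute()));
            // Attach the credentials
            httpAsyncClientBuilder.setDefaultCredentialsProvider(credentialsProvider);
            return httpAsyncClientBuilder;
        });
        return new RestHighLevelClient(builder);
    }

    /**
     * To support a clustered ES deployment, the comma-separated host:port list
     * is turned into an array of HttpHost
     */
    private HttpHost[] httpHostHandlerDev() {
        String[] hosts = esProperties.getHttpHost().split(",");
        HttpHost[] httpHosts = new HttpHost[hosts.length];
        for (int i = 0; i < hosts.length; i++) {
            String ip = hosts[i].split(":")[0];
            int port = Integer.parseInt(hosts[i].split(":")[1]);
            httpHosts[i] = new HttpHost(ip, port, "http");
        }
        return httpHosts;
    }
}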
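The host-list handling in httpHostHandlerDev() assumes every entry is a well-formed host:port pair. As a rough illustration of the same split-on-comma, split-on-colon idea with a little more tolerance, here is a minimal standalone sketch. The class name HostListParser, the Endpoint holder and the fallback to port 9200 are all assumptions made for this example and are not part of the original project.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: parses "host1:9200,host2:9201" into host/port pairs. */
final class HostListParser {

    /** Simple holder for one parsed endpoint. */
    static final class Endpoint {
        final String host;
        final int port;

        Endpoint(String host, int port) {
            this.host = host;
            this.port = port;
        }
    }

    // Assumption for this sketch: entries without a port fall back to 9200.
    static final int DEFAULT_PORT = 9200;

    static List<Endpoint> parse(String hostList) {
        if (hostList == null || hostList.trim().isEmpty()) {
            throw new IllegalArgumentException("host list must not be empty");
        }
        List<Endpoint> endpoints = new ArrayList<>();
        for (String entry : hostList.split(",")) {
            String trimmed = entry.trim();
            if (trimmed.isEmpty()) {
                continue; // tolerate stray commas
            }
            int colon = trimmed.lastIndexOf(':');
            if (colon < 0) {
                endpoints.add(new Endpoint(trimmed, DEFAULT_PORT));
            } else {
                String host = trimmed.substring(0, colon);
                int port = Integer.parseInt(trimmed.substring(colon + 1));
                endpoints.add(new Endpoint(host, port));
            }
        }
        return endpoints;
    }

    public static void main(String[] args) {
        for (Endpoint e : parse("10.0.0.1:9200, 10.0.0.2:9201,10.0.0.3")) {
            System.out.println(e.host + " -> " + e.port);
        }
    }
}
```

Each parsed pair could then be passed to new HttpHost(host, port, "http") in the same way the configuration class does.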
3. API calls
3.1 Check whether an index exists
public boolean existIndex(String indexName) throws IOException {
    GetIndexRequest request = new GetIndexRequest(indexName);
    return esClient.indices().exists(request, RequestOptions.DEFAULT);
}
3.2 Create an index
public void createIndex(String indexName, int numberOfShards, int numberOfReplicas) throws IOException {
    if (!existIndex(indexName)) {
        CreateIndexRequest request = new CreateIndexRequest(indexName);
        // Settings section
        request.settings(Settings.builder()
                // Number of primary shards allocated when the index is created
                .put("index.number_of_shards", numberOfShards)
                // Number of replica shards allocated for each primary shard
                .put("index.number_of_replicas", numberOfReplicas)
        );
        // Mapping section: besides a JSON string, a Map or an XContentBuilder can also be used
        request.mapping("{\n" +
                "  \"properties\": {\n" +
                "    \"message\": {\n" +
                "      \"type\": \"text\"\n" +
                "    }\n" +
                "  }\n" +
                "}", XContentType.JSON);
        // Create the index (synchronously)
        // CreateIndexResponse response = esClient.indices().create(request, RequestOptions.DEFAULT);
        // Create the index (asynchronously)
        esClient.indices().createAsync(request, RequestOptions.DEFAULT, new ActionListener<CreateIndexResponse>() {
            @Override
            public void onResponse(CreateIndexResponse createIndexResponse) {
                log.debug("Result: " + createIndexResponse);
            }
            @Override
            public void onFailure(Exception e) {
                log.error("Failure reason: " + e.getMessage());
            }
        });
    }
}
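The comment in createIndex mentions that the mapping can also be supplied as a Map instead of a JSON string. The sketch below builds the same single-field mapping as nested maps using only the JDK; the class name MessageMappingExample is made up for this illustration. In the method above, the resulting map would presumably replace the JSON string argument, for example request.mapping(buildMapping()).

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical example class: the "message" text mapping expressed as nested maps. */
final class MessageMappingExample {

    /** Builds { "properties": { "message": { "type": "text" } } }. */
    static Map<String, Object> buildMapping() {
        Map<String, Object> message = new LinkedHashMap<>();
        message.put("type", "text");

        Map<String, Object> properties = new LinkedHashMap<>();
        properties.put("message", message);

        Map<String, Object> mapping = new LinkedHashMap<>();
        mapping.put("properties", properties);
        return mapping;
    }

    public static void main(String[] args) {
        // Prints the nested structure using Map.toString()
        System.out.println(buildMapping()); // {properties={message={type=text}}}
    }
}
```

A map tends to be easier to assemble programmatically than a concatenated string, at the cost of being less readable for larger mappings.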
3.3 Update an index's settings
public void updateIndexSettings(String indexName) throws IOException {
    UpdateSettingsRequest request = new UpdateSettingsRequest(indexName);
    String settingKey = "index.number_of_replicas";
    int settingValue = 2;
    Settings.Builder settingsBuilder = Settings.builder().put(settingKey, settingValue);
    request.settings(settingsBuilder);
    // Whether settings that already exist on the index are left unchanged (default false).
    // With true, an existing value is preserved rather than overwritten, so false is used here
    // to make sure the new replica count is actually applied.
    request.setPreserveExisting(false);
    // Update the settings (synchronously)
    // esClient.indices().putSettings(request, RequestOptions.DEFAULT);
    // Update the settings (asynchronously)
    esClient.indices().putSettingsAsync(request, RequestOptions.DEFAULT, new ActionListener<AcknowledgedResponse>() {
        @Override
        public void onResponse(AcknowledgedResponse acknowledgedResponse) {
            log.debug("Result: " + acknowledgedResponse);
        }
        @Override
        public void onFailure(Exception e) {
            log.error("Failure reason: " + e.getMessage());
        }
    });
}
3.4 Update an index's mapping
public void putIndexMapping(String indexName) throws IOException {
    PutMappingRequest request = new PutMappingRequest(indexName);
    XContentBuilder builder = XContentFactory.jsonBuilder();
    builder.startObject();
    {
        builder.startObject("properties");
        {
            builder.startObject("new_parameter");
            {
                builder.field("type", "text");
                builder.field("analyzer", "ik_max_word");
            }
            builder.endObject();
        }
        builder.endObject();
    }
    builder.endObject();
    request.source(builder);
    // Add the mapping (synchronously)
    // AcknowledgedResponse putMappingResponse = esClient.indices().putMapping(request, RequestOptions.DEFAULT);
    // Add the mapping (asynchronously)
    esClient.indices().putMappingAsync(request, RequestOptions.DEFAULT, new ActionListener<AcknowledgedResponse>() {
        @Override
        public void onResponse(AcknowledgedResponse acknowledgedResponse) {
            log.debug("Result: " + acknowledgedResponse);
        }
        @Override
        public void onFailure(Exception e) {
            log.error("Failure reason: " + e.getMessage());
        }
    });
}
3.5 Add a document
Using a JSON string
public void addDocument1(String indexName) throws IOException {
    IndexRequest request = new IndexRequest(indexName);
    request.id("1");
    String jsonString = "{" +
            "\"user\":\"kimchy\"," +
            "\"postDate\":\"2020-03-28\"," +
            "\"message\":\"trying out Elasticsearch\"" +
            "}";
    request.source(jsonString, XContentType.JSON);
    request.routing("routing");
    esClient.index(request, RequestOptions.DEFAULT);
}
Using a Map
public void addDocument2(String indexName) throws IOException {
    Map<String, Object> jsonMap = new HashMap<>();
    jsonMap.put("user", "kimchy");
    jsonMap.put("postDate", new Date());
    jsonMap.put("message", "trying out Elasticsearch");
    IndexRequest indexRequest = new IndexRequest(indexName).id("1").source(jsonMap);
    indexRequest.routing("routing");
    esClient.indexAsync(indexRequest, RequestOptions.DEFAULT, new ActionListener<IndexResponse>() {
        @Override
        public void onResponse(IndexResponse indexResponse) {
            log.debug("Result: " + indexResponse);
        }
        @Override
        public void onFailure(Exception e) {
            log.error("Failure reason: " + e.getMessage());
        }
    });
}
3.6 Update a document
public void updateDocument(String indexName) throws IOException {
    // Pass the index name and the id of the document to update
    UpdateRequest request = new UpdateRequest(indexName, "1");
    // The update is merged into the existing document: fields that already exist are
    // overwritten, fields that do not exist yet are added.
    // The update content can be assembled in four ways: JSON string, Map, XContentBuilder, key-value pairs
    // JSON string
    // String jsonString = "{" +
    //         "\"updated\":\"2020-03-29\"," +
    //         "\"reason\":\"daily update\"" +
    //         "}";
    // request.doc(jsonString, XContentType.JSON);
    // Map
    // Map<String, Object> jsonMap = new HashMap<>();
    // jsonMap.put("updated", new Date());
    // jsonMap.put("reason", "daily update");
    // request.doc(jsonMap);
    // XContentBuilder
    // XContentBuilder builder = XContentFactory.jsonBuilder();
    // builder.startObject();
    // builder.timeField("updated", new Date());
    // builder.field("reason", "daily update");
    // builder.endObject();
    // request.doc(builder);
    // Key-value pairs
    request.doc("updated", new Date(), "reason", "daily update");
    // Send the update request synchronously
    esClient.update(request, RequestOptions.DEFAULT);
}
3.7 Delete documents
public void deleteDocument(String indexName) throws IOException {
    // Restrict the delete-by-query to the given index
    DeleteByQueryRequest deleteByQueryRequest = new DeleteByQueryRequest(indexName);
    // Condition that the documents to delete must satisfy
    deleteByQueryRequest.setQuery(new TermQueryBuilder("user", "kimchy"));
    // Ignore version conflicts
    deleteByQueryRequest.setConflicts("proceed");
    esClient.deleteByQuery(deleteByQueryRequest, RequestOptions.DEFAULT);
}
3.8 Batch operations with the bulk API
public void bulkDocument(String indexName) throws IOException {
    BulkRequest request = new BulkRequest();
    // Delete operation
    request.add(new DeleteRequest(indexName, "3"));
    // Update operation
    request.add(new UpdateRequest(indexName, "2")
            .doc(XContentType.JSON, "other", "test"));
    // Plain PUT operation, equivalent to a full replace or an insert
    request.add(new IndexRequest(indexName).id("4")
            .source(XContentType.JSON, "field", "baz"));
    esClient.bulk(request, RequestOptions.DEFAULT);
}
3.9 Search for documents whose description contains dubbo, filtered to ages between 15 and 40
public void searchDocument(String indexName) throws IOException {
    SearchRequest searchRequest = new SearchRequest(indexName);
    BoolQueryBuilder booleanQueryBuilder = QueryBuilders.boolQuery();
    // Filter: keep only documents whose age is between 15 and 40
    booleanQueryBuilder.filter(QueryBuilders.rangeQuery("age").from(15).to(40));
    // bool must clause: find documents whose description field contains Dubbo
    booleanQueryBuilder.must(QueryBuilders.matchQuery("description", "Dubbo"));
    SearchSourceBuilder sourceBuilder = new SearchSourceBuilder();
    sourceBuilder.query(booleanQueryBuilder);
    sourceBuilder.from(0);
    sourceBuilder.size(5);
    sourceBuilder.timeout(new TimeValue(60, TimeUnit.SECONDS));
    searchRequest.source(sourceBuilder);
    // Send the request synchronously
    esClient.search(searchRequest, RequestOptions.DEFAULT);
}
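The from(0) and size(5) calls in searchDocument select the first page of five hits. As a small illustration of how the from value might be derived for later pages, here is a sketch that assumes 1-based page numbers; the class and method names are hypothetical and do not come from the original project.

```java
/** Hypothetical helper: converts a 1-based page number into a zero-based "from" offset. */
final class PageOffsets {

    static int fromOffset(int page, int size) {
        if (page < 1 || size < 1) {
            throw new IllegalArgumentException("page and size must both be at least 1");
        }
        // multiplyExact throws ArithmeticException instead of silently overflowing
        return Math.multiplyExact(page - 1, size);
    }

    public static void main(String[] args) {
        System.out.println(fromOffset(1, 5)); // 0
        System.out.println(fromOffset(3, 5)); // 10
    }
}
```

The computed value would be passed to sourceBuilder.from(...) together with the same size.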
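4. Extensions
4.1 IK analyzer
4.1.1 Download the IK analyzer
Download IK from its project page, choosing a version that matches your Elasticsearch version according to the IK/Elasticsearch version compatibility table.
4.1.2 Install the IK analyzer
Compile and package the downloaded project. The packaged artifact, elasticsearch-analysis-ik-7.6.1.zip, is in the target/release directory.
Find the plugins directory under the ES installation directory, manually create a subdirectory named ik, unzip elasticsearch-analysis-ik-7.6.1.zip into the ik directory, and restart ES.
4.1.3 Updating IK hot words
If new terms are defined directly in a local custom dictionary file, ES has to be restarted every time terms are added before they take effect, which has a large impact on the whole system. IK supports loading remote dictionaries. The approach used here works as follows: when ES starts and loads IK, IK's initialization method starts a new thread that periodically polls a MySQL database and syncs the hot words from the DB into ES. Note: hot words can only be added or updated dynamically; existing hot words cannot be deleted.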
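To make the polling idea concrete before going through the actual changes to IK, here is a minimal standalone sketch that uses only the JDK. It is not IK code: the class HotWordPollingSketch, the in-memory Set that stands in for IK's dictionary and the Supplier that stands in for the MySQL query are all assumptions made for illustration. It also shows why removing a word from the source does not remove it from memory, since each polling round only adds.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

/** Hypothetical sketch of "poll a source on a schedule and merge new words". */
final class HotWordPollingSketch {

    // Stand-in for the in-memory dictionary: words are only ever added.
    private final Set<String> dictionary = ConcurrentHashMap.newKeySet();
    // Stand-in for the database query that returns the current hot words.
    private final Supplier<List<String>> wordSource;

    HotWordPollingSketch(Supplier<List<String>> wordSource) {
        this.wordSource = wordSource;
    }

    /** One polling round: merge whatever the source returns into the dictionary. */
    void reload() {
        for (String word : wordSource.get()) {
            dictionary.add(word.trim());
        }
    }

    boolean contains(String word) {
        return dictionary.contains(word);
    }

    public static void main(String[] args) throws InterruptedException {
        HotWordPollingSketch sketch =
                new HotWordPollingSketch(() -> Arrays.asList("dubbo", "elasticsearch"));
        ScheduledExecutorService pool = Executors.newScheduledThreadPool(1);
        // No initial delay and a 1-second period, chosen only so the demo finishes quickly
        pool.scheduleAtFixedRate(sketch::reload, 0, 1, TimeUnit.SECONDS);
        Thread.sleep(500);
        pool.shutdownNow();
        System.out.println(sketch.contains("dubbo"));
    }
}
```

In this sketch a word that disappears from the source simply stays in the set, which mirrors the add-only limitation noted above.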
- Edit pom.xml so that the ES version in the dependencies matches the actual one.
<properties>
    <!-- Keep this identical to the ES version in your environment -->
    <elasticsearch.version>7.6.1</elasticsearch.version>
    ... other settings omitted
</properties>
- In org.wltea.analyzer.dic.Dictionary, add a scheduled-task thread pool and modify the IK initialization method initial()
HotDicLoadingTask is defined in step 4
private static ScheduledExecutorService hotDictionaryTaskPool = Executors.newScheduledThreadPool(1);
public static synchronized void initial(Configuration cfg) {
    if (singleton == null) {
        synchronized (Dictionary.class) {
            if (singleton == null) {
                ... code omitted
                // Start a custom task that accesses the DB remotely and queries the hot words.
                // It first runs 10 seconds after startup and then once every 5 seconds.
                // A real project does not need to run it this frequently.
                hotDictionaryTaskPool.scheduleAtFixedRate(new HotDicLoadingTask(),
                        10, 5, TimeUnit.SECONDS);
                ... code omitted
            }
        }
    }
}
- In org.wltea.analyzer.dic.Dictionary, add the following methods:
/**
 * Loads the remote hot words
 */
public void reloadHotDictionary() {
    // Load the custom remote main dictionary; equivalent to configuring an ext_dict dictionary in IKAnalyzer.cfg.xml
    this.loadMainHotDicFromDB();
    // Load the custom remote stop-word dictionary; equivalent to configuring an ext_stopwords dictionary in IKAnalyzer.cfg.xml
    // Stop words are words such as prepositions
    this.loadStopWordDicFromDB();
}
/**
 * Holds the settings loaded from the external configuration file
 */
private static Properties properties = new Properties();
/*
 * A production environment rarely changes its database, so the driver is registered once here
 */
static {
    try {
        logger.info("Register mysql database driver");
        DriverManager.registerDriver(new com.mysql.cj.jdbc.Driver());
    } catch (SQLException e) {
        logger.error("Register mysql database driver error: ", e);
    }
}
/**
 * Accesses the database and loads the custom hot words,
 * filling them into _MainDict as dictionary segments
 */
private void loadMainHotDicFromDB() {
    // Locate the external custom configuration file.
    // getDictRoot() returns IK's base path, which corresponds to $ES_HOME/plugins/ik/config
    Path file = PathUtils.get(getDictRoot(), "hot_dict_db_source.properties");
    // Read the configuration into Properties; try-with-resources closes the stream
    try (FileInputStream in = new FileInputStream(file.toFile())) {
        properties.load(in);
        logger.info("properties information: " + properties);
    } catch (IOException e) {
        logger.error("can not find properties: ", e);
    }
    try (Connection connection = DriverManager.getConnection(properties.getProperty("db.url"),
            properties.getProperty("db.username"),
            properties.getProperty("db.password"));
         Statement statement = connection.createStatement();
         ResultSet resultSet = statement.executeQuery(properties.getProperty("db.reload.mainHotDic.sql"))
    ) {
        while (resultSet.next()) {
            String word = resultSet.getString("word");
            logger.info("hot word from DB: " + word);
            // _MainDict keeps the dictionary in memory; it corresponds to the contents of main.dic
            _MainDict.fillSegment(word.trim().toCharArray());
        }
    } catch (SQLException e) {
        logger.error("execute sql or connect db exception: ", e);
    }
}
/**
 * Accesses the database and loads the custom stop words,
 * filling them into _StopWords as stop-word dictionary segments
 */
private void loadStopWordDicFromDB() {
    Path path = PathUtils.get(getDictRoot(), "hot_dict_db_source.properties");
    try (FileInputStream in = new FileInputStream(path.toFile())) {
        properties.load(in);
    } catch (IOException e) {
        logger.error("can not find properties: ", e);
    }
    try (Connection connection = DriverManager.getConnection(properties.getProperty("db.url"),
            properties.getProperty("db.username"),
            properties.getProperty("db.password"));
         Statement statement = connection.createStatement();
         ResultSet resultSet = statement.executeQuery(properties.getProperty("db.reload.stopWordDic.sql"))
    ) {
        while (resultSet.next()) {
            String word = resultSet.getString("word");
            logger.info("stop word from DB: " + word);
            // _StopWords keeps the stop words in memory; it corresponds to stopword.dic
            _StopWords.fillSegment(word.trim().toCharArray());
        }
    } catch (SQLException e) {
        logger.error("execute sql or connect db exception: ", e);
    }
}
private String getProperty(String key){
    if(props!=null){
        return props.getProperty(key);
    }
    return null;
}
- A custom task class dedicated to updating the hot words
public class HotDicLoadingTask implements Runnable {
    private static final Logger LOGGER = ESPluginLoggerFactory.getLogger(HotDicLoadingTask.class.getName());
    @Override
    public void run() {
        LOGGER.info("====================reload hot dictionary from mysql database====================");
        /*
         Dictionary is a singleton in IK. Its constructors are private, so the instance can only be
         obtained through getSingleton() and can only be initialized through initial().
         reloadHotDictionary() loads the remote hot words.
         */
        Dictionary.getSingleton().reloadHotDictionary();
    }
}
- Add a configuration file, config/hot_dict_db_source.properties, that provides the settings for connecting to MySQL
db.url=jdbc:mysql://127.0.0.1:3306/hotdic?useUnicode=true&characterEncoding=utf8&serverTimezone=GMT%2B8
db.username=root
db.password=123456
db.reload.mainHotDic.sql=select word from tb_main_hot_dic
db.reload.stopWordDic.sql=select word from tb_stop_word_dic
- Add the MySQL driver dependency
Open pom.xml and add the following dependency:
<!-- MySQL driver -->
<dependency>
<groupId>mysql</groupId>
<artifactId>mysql-connector-java</artifactId>
<version>8.0.19</version>
</dependency>
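- Modify the assembly configuration so that the MySQL driver is packaged into the zip when the project is built.
Open src/main/assemblies/plugin.xml and add the following inside the <dependencySets> tag: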
<dependencySet>
<outputDirectory/>
<useProjectArtifact>true</useProjectArtifact>
<useTransitiveFiltering>true</useTransitiveFiltering>
<includes>
<include>mysql:mysql-connector-java</include>
</includes>
</dependencySet>
- MySQL database scripts:
create database hotdic charset=utf8;
use hotdic;
DROP TABLE IF EXISTS `tb_main_hot_dic`;
CREATE TABLE `tb_main_hot_dic` (
`id` bigint(20) NOT NULL AUTO_INCREMENT,
`word` varchar(255) DEFAULT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=5 DEFAULT CHARSET=utf8;
DROP TABLE IF EXISTS `tb_stop_word_dic`;
CREATE TABLE `tb_stop_word_dic` (
`id` bigint(20) NOT NULL AUTO_INCREMENT,
`word` varchar(255) DEFAULT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=2 DEFAULT CHARSET=utf8;
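- Modify the JDK permission configuration
The JDK also restricts what externally launched applications may do. By default, an external application (such as ES) needs the corresponding permission when it uses core JDK components (such as ClassLoader) or uses the JDK's networking to reach other applications (such as Socket connections). This step modifies the local JDK so that locally started applications hold those permissions when they access JDK internals or reach external resources through the JDK, which avoids permission errors.
Open $JAVA_HOME/jre/lib/security/java.policy and add the following inside grant: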
permission java.lang.RuntimePermission "createClassLoader";
permission java.lang.RuntimePermission "getClassLoader";
permission java.net.SocketPermission "127.0.0.1:3306","connect,resolve";
permission java.lang.RuntimePermission "setContextClassLoader";
Finally, package IK and replace the contents of /plugins/ik with the contents of the resulting zip file.