When we use the IK analyzer with Elasticsearch, extending its dictionary normally means adding a dictionary file and restarting ES. The drawbacks are obvious: restarting a service is taboo in production, and production ES usually runs as a cluster, so the file would have to be updated on every node, which is tedious (yes, you could script it with Ansible, but still).
The natural next thought is hot reloading, and the question becomes where the hot dictionary should live: a database, an HTTP endpoint, or a remote configuration store such as Spring Cloud Config. Any of these can work.
But what mechanism do we build on? I checked out the source of elasticsearch-analysis-ik 6.4.3 and found that it already runs scheduled tasks that periodically refresh remote configuration. Here is the relevant code:
/**
 * Dictionary initialization. IK Analyzer initializes its dictionaries through
 * static methods of the Dictionary class, and loading only begins when the
 * class is actually used, which lengthens the first analysis call. This method
 * lets an application initialize the dictionaries during its startup phase.
 *
 * @return Dictionary
 */
public static synchronized Dictionary initial(Configuration cfg) {
    if (singleton == null) {
        synchronized (Dictionary.class) {
            if (singleton == null) {
                singleton = new Dictionary(cfg);
                singleton.loadMainDict();
                singleton.loadSurnameDict();
                singleton.loadQuantifierDict();
                singleton.loadSuffixDict();
                singleton.loadPrepDict();
                singleton.loadStopWordDict();
                new Thread(new HotDictReloadThread()).start();
                if (cfg.isEnableRemoteDict()) {
                    // start the monitor tasks
                    for (String location : singleton.getRemoteExtDictionarys()) {
                        // initial delay 10 s (configurable), then every 60 s
                        pool.scheduleAtFixedRate(new Monitor(location), 10, 60, TimeUnit.SECONDS);
                    }
                    for (String location : singleton.getRemoteExtStopWordDictionarys()) {
                        pool.scheduleAtFixedRate(new Monitor(location), 10, 60, TimeUnit.SECONDS);
                    }
                }
                return singleton;
            }
        }
    }
    return singleton;
}
See the two scheduleAtFixedRate calls? They are polling tasks that keep pulling the remote dictionary configuration on a schedule. (scheduleAtFixedRate itself deserves a dedicated deep dive in a separate post.) Now for the implementation.
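The polling above rests on ScheduledExecutorService.scheduleAtFixedRate. A minimal, self-contained sketch of the same shape, with a stand-in task in place of IK's Monitor and millisecond periods instead of the real 10 s / 60 s:

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class FixedRatePollDemo {

    // Schedules a stand-in for IK's Monitor at a fixed rate and waits until it
    // has fired n times; returns how many runs were observed.
    static int pollNTimes(int n, long periodMs) {
        ScheduledExecutorService pool = Executors.newScheduledThreadPool(1);
        CountDownLatch runs = new CountDownLatch(n);
        // Each tick is where IK would re-fetch the remote dictionary.
        pool.scheduleAtFixedRate(runs::countDown, 0, periodMs, TimeUnit.MILLISECONDS);
        try {
            runs.await(); // block until the task has fired n times
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        pool.shutdownNow();
        return n - (int) runs.getCount();
    }

    public static void main(String[] args) {
        System.out.println("observed runs: " + pollNTimes(3, 50));
    }
}
```

IK's version is the same pattern: one Monitor per remote dictionary location, a shared scheduler pool, and a fixed 60-second period after a 10-second initial delay.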
I. Dynamic refresh backed by MySQL
This approach requires modifying the plugin source. Three steps:
1. Pull the IK source code and have it query the database on each refresh;
2. Package the result and place it in the ES plugins directory;
3. Start ES, add words to MySQL, and check the analysis output.
Modifying the source
Create HotDictReloadThread, which drives the background refresh loop:
package org.ext;

import org.apache.logging.log4j.Logger;
import org.elasticsearch.common.logging.ESLoggerFactory;
import org.wltea.analyzer.dic.Dictionary;

public class HotDictReloadThread implements Runnable {

    private static final Logger logger = ESLoggerFactory.getLogger(HotDictReloadThread.class.getName());

    @Override
    public void run() {
        while (true) {
            logger.info("[==========] reload hot dict from mysql");
            // reLoadMainDict() eventually reaches loadMySQLExtDict(), which
            // sleeps for jdbc.reload.interval ms, so this loop is throttled.
            Dictionary.getSingleton().reLoadMainDict();
        }
    }
}
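One caveat: this run() loop has no sleep of its own; it relies on the Thread.sleep buried inside the loader that reLoadMainDict eventually reaches. If that sleep is ever skipped (say, an exception is thrown before it), the loop becomes a busy spin. A possible alternative (the SafeReloadLoop class and its constructor are hypothetical, not part of IK) keeps the pacing in the loop itself:

```java
import java.util.concurrent.atomic.AtomicInteger;

public class SafeReloadLoop implements Runnable {

    private final Runnable reload;   // e.g. Dictionary.getSingleton()::reLoadMainDict
    private final long intervalMs;

    public SafeReloadLoop(Runnable reload, long intervalMs) {
        this.reload = reload;
        this.intervalMs = intervalMs;
    }

    @Override
    public void run() {
        while (!Thread.currentThread().isInterrupted()) {
            try {
                reload.run();
            } catch (RuntimeException e) {
                // one failed reload should not kill the loop; log and continue
            }
            try {
                Thread.sleep(intervalMs); // pace here, not inside the loader
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // restore flag, exit loop
            }
        }
    }

    // Demo helper: run the loop for roughly durationMs, return reload count.
    static int demo(long intervalMs, long durationMs) {
        AtomicInteger count = new AtomicInteger();
        Thread t = new Thread(new SafeReloadLoop(count::incrementAndGet, intervalMs));
        t.start();
        try {
            Thread.sleep(durationMs);
            t.interrupt();
            t.join(1000);
        } catch (InterruptedException ignored) {
        }
        return count.get();
    }

    public static void main(String[] args) {
        System.out.println("reloads observed: " + demo(10, 100));
    }
}
```

Responding to interruption also gives the thread a clean shutdown path, which the bare while (true) lacks.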
In org.wltea.analyzer.dic.Dictionary#initial, add the line that starts this thread: new Thread(new HotDictReloadThread()).start(); (it already appears in the initial() listing above).
In org.wltea.analyzer.dic.Dictionary#loadMainDict, add a call to this.loadMySQLExtDict(); the method is defined as follows:
/**
 * Load hot-update words from MySQL.
 */
private void loadMySQLExtDict() {
    Connection conn = null;
    Statement stmt = null;
    ResultSet rs = null;
    try {
        Path file = PathUtils.get(getDictRoot(), "jdbc-reload.properties");
        prop.load(new FileInputStream(file.toFile()));
        logger.info("[==========]jdbc-reload.properties");
        for (Object key : prop.keySet()) {
            logger.info("[==========]" + key + "=" + prop.getProperty(String.valueOf(key)));
        }
        logger.info("[==========]query hot dict from mysql, " + prop.getProperty("jdbc.reload.sql") + "......");
        conn = DriverManager.getConnection(
                prop.getProperty("jdbc.url"),
                prop.getProperty("jdbc.user"),
                prop.getProperty("jdbc.password"));
        stmt = conn.createStatement();
        rs = stmt.executeQuery(prop.getProperty("jdbc.reload.sql"));
        while (rs.next()) {
            String theWord = rs.getString("word");
            logger.info("[==========]hot word from mysql: " + theWord);
            _MainDict.fillSegment(theWord.trim().toCharArray());
        }
        // pause for the configured interval so the reload loop is throttled
        Thread.sleep(Integer.valueOf(String.valueOf(prop.get("jdbc.reload.interval"))));
    } catch (Exception e) {
        logger.error("error", e);
    } finally {
        if (rs != null) {
            try {
                rs.close();
            } catch (SQLException e) {
                logger.error("error", e);
            }
        }
        if (stmt != null) {
            try {
                stmt.close();
            } catch (SQLException e) {
                logger.error("error", e);
            }
        }
        if (conn != null) {
            try {
                conn.close();
            } catch (SQLException e) {
                logger.error("error", e);
            }
        }
    }
}
In org.wltea.analyzer.dic.Dictionary#loadStopWordDict, add a call to this.loadMySQLStopwordDict(); the method is defined as follows:
/**
 * Load stop words from MySQL.
 */
private void loadMySQLStopwordDict() {
    Connection conn = null;
    Statement stmt = null;
    ResultSet rs = null;
    try {
        Path file = PathUtils.get(getDictRoot(), "jdbc-reload.properties");
        prop.load(new FileInputStream(file.toFile()));
        logger.info("[==========]jdbc-reload.properties");
        for (Object key : prop.keySet()) {
            logger.info("[==========]" + key + "=" + prop.getProperty(String.valueOf(key)));
        }
        logger.info("[==========]query hot stopword dict from mysql, " + prop.getProperty("jdbc.reload.stopword.sql") + "......");
        conn = DriverManager.getConnection(
                prop.getProperty("jdbc.url"),
                prop.getProperty("jdbc.user"),
                prop.getProperty("jdbc.password"));
        stmt = conn.createStatement();
        rs = stmt.executeQuery(prop.getProperty("jdbc.reload.stopword.sql"));
        while (rs.next()) {
            String theWord = rs.getString("word");
            logger.info("[==========]hot stopword from mysql: " + theWord);
            _StopWords.fillSegment(theWord.trim().toCharArray());
        }
        // pause for the configured interval so the reload loop is throttled
        Thread.sleep(Integer.valueOf(String.valueOf(prop.get("jdbc.reload.interval"))));
    } catch (Exception e) {
        logger.error("error", e);
    } finally {
        if (rs != null) {
            try {
                rs.close();
            } catch (SQLException e) {
                logger.error("error", e);
            }
        }
        if (stmt != null) {
            try {
                stmt.close();
            } catch (SQLException e) {
                logger.error("error", e);
            }
        }
        if (conn != null) {
            try {
                conn.close();
            } catch (SQLException e) {
                logger.error("error", e);
            }
        }
    }
}
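The two loaders are line-for-line duplicates except for the SQL property key and the target trie. A possible refactor (the names MySQLDictLoader, Trie, fetchWords, and fillAll are all hypothetical, not IK APIs) separates the JDBC fetch, here using try-with-resources instead of the manual finally chain, from a pure fill step that can be tested without a database:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.ArrayList;
import java.util.List;

public class MySQLDictLoader {

    /** Minimal stand-in for IK's dictionary trie (_MainDict / _StopWords). */
    interface Trie {
        void fillSegment(char[] chars);
    }

    /** JDBC half: try-with-resources closes rs, stmt, and conn automatically. */
    static List<String> fetchWords(String url, String user, String pass, String sql)
            throws SQLException {
        List<String> words = new ArrayList<>();
        try (Connection conn = DriverManager.getConnection(url, user, pass);
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery(sql)) {
            while (rs.next()) {
                words.add(rs.getString("word")); // both SQLs alias to "word"
            }
        }
        return words;
    }

    /** Pure half: trim each word and insert it into the target trie. */
    static int fillAll(List<String> words, Trie target) {
        int filled = 0;
        for (String w : words) {
            if (w == null || w.trim().isEmpty()) {
                continue; // skip blanks instead of crashing on trim()
            }
            target.fillSegment(w.trim().toCharArray());
            filled++;
        }
        return filled;
    }
}
```

With this split, loadMySQLExtDict and loadMySQLStopwordDict each collapse to one fetchWords call plus one fillAll call, and the null/blank guard also fixes the NullPointerException the original would throw on a NULL word column.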
Add jdbc-reload.properties under the config directory (note the single & between URL parameters; the original had a stray &&):
jdbc.url=jdbc:mysql://localhost:3306/ik?serverTimezone=GMT&useSSL=false
jdbc.user=root
jdbc.password=root
jdbc.reload.sql=select word from hot_words
jdbc.reload.stopword.sql=select stopword as word from hot_stopwords
jdbc.reload.interval=1000
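These keys are read with a plain java.util.Properties. A small sketch (the class name JdbcReloadProps is made up) showing how the file parses and how the interval comes out as milliseconds, fed from a string so it runs without the real file on disk:

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

public class JdbcReloadProps {

    // Parse jdbc-reload.properties content from an in-memory string.
    static Properties parse(String content) {
        Properties prop = new Properties();
        try {
            prop.load(new StringReader(content));
        } catch (IOException e) { // cannot happen for a StringReader
            throw new UncheckedIOException(e);
        }
        return prop;
    }

    public static void main(String[] args) {
        String sample = String.join("\n",
                "jdbc.url=jdbc:mysql://localhost:3306/ik?serverTimezone=GMT&useSSL=false",
                "jdbc.user=root",
                "jdbc.reload.sql=select word from hot_words",
                "jdbc.reload.interval=1000");
        Properties p = parse(sample);
        // The loader sleeps for this many milliseconds between refreshes.
        long intervalMs = Long.parseLong(p.getProperty("jdbc.reload.interval"));
        System.out.println("reload every " + intervalMs + " ms");
    }
}
```

So jdbc.reload.interval=1000 means the loader pauses one second between database polls.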
The MySQL schema script (a Navicat export) is:
/*
Navicat Premium Data Transfer
Source Server : localhost
Source Server Type : MySQL
Source Server Version : 50719
Source Host : localhost:3306
Source Schema : ik
Target Server Type : MySQL
Target Server Version : 50719
File Encoding : 65001
Date: 03/12/2019 11:41:40
*/
SET NAMES utf8mb4;
SET FOREIGN_KEY_CHECKS = 0;
-- ----------------------------
-- Table structure for hot_stopwords
-- ----------------------------
DROP TABLE IF EXISTS `hot_stopwords`;
CREATE TABLE `hot_stopwords` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `stopword` longtext,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=2 DEFAULT CHARSET=utf8mb4;
-- ----------------------------
-- Table structure for hot_words
-- ----------------------------
DROP TABLE IF EXISTS `hot_words`;
CREATE TABLE `hot_words` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `word` longtext,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=4 DEFAULT CHARSET=utf8mb4;
SET FOREIGN_KEY_CHECKS = 1;
Build the plugin with the following command; the packaging succeeds:
mvn clean package -Dmaven.test.skip=true
[INFO] --- maven-assembly-plugin:2.2-beta-5:single (default) @ elasticsearch-analysis-ik ---
[INFO] Reading assembly descriptor: /Users/batman/IdeaProjects/elasticsearch-analysis-ik-6.4.3/src/main/assemblies/plugin.xml
[INFO] Building zip: /Users/batman/IdeaProjects/elasticsearch-analysis-ik-6.4.3/target/releases/elasticsearch-analysis-ik-6.4.3.zip
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 4.937 s
[INFO] Finished at: 2019-12-03T11:44:02+08:00
[INFO] ------------------------------------------------------------------------
Unpack the built zip into the plugins directory, and drop the MySQL JDBC driver jar in alongside it:
(base) batmandeMacBook-Pro:ik batman$ pwd
/Users/batman/Downloads/elasticsearch-6.4.3/plugins/ik
(base) batmandeMacBook-Pro:ik batman$ ls
commons-codec-1.9.jar elasticsearch-analysis-ik-6.4.3.zip plugin-descriptor.properties
commons-logging-1.2.jar httpclient-4.5.2.jar plugin-security.policy
config httpcore-4.4.4.jar
elasticsearch-analysis-ik-6.4.3.jar mysql-connector-java-5.1.47.jar
Start ES and first check the output without any extension words: 靴子 is not tokenized the way we want.
Add 靴 and 子 to the hot_words table in the database.
Check the ES log; it shows the new words were loaded successfully.
Test the ES analysis again to see the effect.
IK source: https://github.com/fafeidou/elasticsearch-analysis-ik-6.4.3