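Whenever we update the IK analyzer's word list, we have to add the new words to the extension dictionary by hand and then restart ES before they take effect. Worse, ES normally runs as a distributed cluster, possibly with hundreds of nodes, and editing the dictionary node by node every time is not practical. The goal here is to let ES pick up new words without downtime: we modify the IK analyzer source so that it reloads the word list from MySQL automatically at a fixed interval.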
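1. Download the source
Download: https://github.com/medcl/elasticsearch-analysis-ik/tree/v7.2.0
The IK analyzer version must match the ES version.
2. Modify the source
- Add a JDBC config file
Add a jdbc-reload.properties file with the content below. The load methods shown later read it from the directory returned by getDictRoot(), and jdbc.reload.interval is in milliseconds because it is passed to Thread.sleep.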
jdbc.url=jdbc:mysql://127.0.0.1:3307/test?serverTimezone=GMT
jdbc.user=root
jdbc.password=abc123456
jdbc.reload.sql=select word from hot_words
jdbc.reload.stopword.sql=select stopword as word from hot_stopwords
jdbc.reload.interval=1000
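The keys above can be read and validated once, up front, so that a typo in the file fails with a clear message. The following is a minimal sketch under that assumption; the class name JdbcReloadConfig and its fields are hypothetical and are not part of the IK source.

```java
import java.io.IOException;
import java.io.Reader;
import java.util.Properties;

/** Hypothetical holder for the values in jdbc-reload.properties. */
final class JdbcReloadConfig {
    final String url;
    final String user;
    final String password;
    final String wordSql;
    final String stopwordSql;
    final long intervalMillis;

    private JdbcReloadConfig(Properties p) {
        this.url = require(p, "jdbc.url");
        this.user = require(p, "jdbc.user");
        this.password = require(p, "jdbc.password");
        this.wordSql = require(p, "jdbc.reload.sql");
        this.stopwordSql = require(p, "jdbc.reload.stopword.sql");
        // The interval is interpreted as milliseconds, as in the article.
        this.intervalMillis = Long.parseLong(require(p, "jdbc.reload.interval"));
        if (this.intervalMillis <= 0) {
            throw new IllegalArgumentException("jdbc.reload.interval must be positive");
        }
    }

    /** Parses the properties text and fails fast on missing keys. */
    static JdbcReloadConfig load(Reader reader) throws IOException {
        Properties p = new Properties();
        p.load(reader);
        return new JdbcReloadConfig(p);
    }

    private static String require(Properties p, String key) {
        String value = p.getProperty(key);
        if (value == null || value.trim().isEmpty()) {
            throw new IllegalArgumentException("missing property: " + key);
        }
        return value.trim();
    }
}
```

With the file on disk, this could be called as JdbcReloadConfig.load(new java.io.FileReader(path)).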
- Add the hot-reload thread
Add the hot-reload thread class HotDictReloadThread. It is simply an infinite loop that keeps calling Dictionary.getSingleton().reLoadMainDict() to reload the dictionaries.
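public class HotDictReloadThread implements Runnable {

    private static final Logger LOGGER = ESPluginLoggerFactory.getLogger(HotDictReloadThread.class.getName());

    @Override
    public void run() {
        // Loop forever. The pause between iterations comes from the Thread.sleep
        // call inside the MySQL load methods of Dictionary, not from this loop.
        while (true) {
            LOGGER.info("reload hot dict from mysql");
            Dictionary.getSingleton().reLoadMainDict();
        }
    }
}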
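HotDictReloadThread has no delay of its own. The pause comes from the Thread.sleep call inside the MySQL load methods shown further down, and that call is skipped whenever the query throws, so a database outage would make the loop spin. A minimal sketch of one alternative is a fixed-delay scheduler that owns the interval. The class name ScheduledReloader is hypothetical; in the plugin the task would presumably be () -> Dictionary.getSingleton().reLoadMainDict(), with the sleep removed from the load methods.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.TimeUnit;

/** Hypothetical replacement for the busy loop: runs a reload task with a fixed delay. */
final class ScheduledReloader implements AutoCloseable {
    private final ScheduledExecutorService executor;

    ScheduledReloader(Runnable reloadTask, long intervalMillis) {
        // Daemon thread so the scheduler never blocks JVM shutdown.
        ThreadFactory daemon = r -> {
            Thread t = new Thread(r, "hot-dict-reload");
            t.setDaemon(true);
            return t;
        };
        this.executor = Executors.newSingleThreadScheduledExecutor(daemon);

        // An uncaught exception would cancel all future runs, so guard the task.
        Runnable guarded = () -> {
            try {
                reloadTask.run();
            } catch (RuntimeException e) {
                System.err.println("reload failed, will retry: " + e);
            }
        };
        // The delay is measured from the end of one run to the start of the next.
        executor.scheduleWithFixedDelay(guarded, 0, intervalMillis, TimeUnit.MILLISECONDS);
    }

    @Override
    public void close() {
        executor.shutdownNow();
    }
}
```

The delay applies whether or not the previous run failed, which is the property the sleep-inside-try version lacks.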
- Modify the Dictionary class
I. In the initial method, create our custom thread and start it
new Thread(new HotDictReloadThread()).start();
II. Add a method that loads the extension dictionary from MySQL
// Needs java.sql.Connection, DriverManager, ResultSet and Statement imported in Dictionary.
private static Properties prop = new Properties();

static {
    try {
        // Register the MySQL JDBC driver. In Connector/J 8.x the class is named
        // com.mysql.cj.jdbc.Driver; the name used here is the older one.
        Class.forName("com.mysql.jdbc.Driver");
    } catch (ClassNotFoundException e) {
        logger.error("error", e);
    }
}

/**
 * Load the extension dictionary from MySQL.
 */
private void loadMySqlExtDict() {
    try {
        // Read the JDBC settings; the stream is closed as soon as they are loaded.
        Path file = PathUtils.get(getDictRoot(), "jdbc-reload.properties");
        try (FileInputStream in = new FileInputStream(file.toFile())) {
            prop.load(in);
        }
        logger.info("jdbc-reload.properties");
        for (Object key : prop.keySet()) {
            logger.info(key + "=" + prop.getProperty(String.valueOf(key)));
        }
        logger.info("query hot dict from mysql," + prop.getProperty("jdbc.reload.sql"));
        // try-with-resources closes the ResultSet, Statement and Connection in reverse order.
        try (Connection conn = DriverManager.getConnection(
                 prop.getProperty("jdbc.url"),
                 prop.getProperty("jdbc.user"),
                 prop.getProperty("jdbc.password"));
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery(prop.getProperty("jdbc.reload.sql"))) {
            while (rs.next()) {
                String word = rs.getString("word");
                logger.info("hot word from mysql:" + word);
                // Add the word to the main dictionary.
                _MainDict.fillSegment(word.trim().toCharArray());
            }
        }
        // Wait jdbc.reload.interval milliseconds before returning to the reload loop.
        Thread.sleep(Integer.valueOf(String.valueOf(prop.get("jdbc.reload.interval"))));
    } catch (Exception e) {
        logger.error("error", e);
    }
}
Then call it from loadMainDict, the method that loads the main and extension dictionaries.
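One fragile spot in the loader is word.trim(): it throws a NullPointerException when the word column is NULL, and the catch block then abandons the rest of that result set. A minimal sketch of a guard, assuming the rows are first collected into a list; the helper HotWordFilter is hypothetical and not part of IK.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical helper: cleans raw rows before they are passed to fillSegment. */
final class HotWordFilter {

    /** Trims each word, drops null and blank entries, and removes duplicates in order. */
    static List<char[]> normalize(List<String> rawWords) {
        Set<String> unique = new LinkedHashSet<>();
        for (String raw : rawWords) {
            if (raw == null) {
                continue; // a NULL column value would otherwise cause an NPE on trim()
            }
            String word = raw.trim();
            if (!word.isEmpty()) {
                unique.add(word);
            }
        }
        List<char[]> result = new ArrayList<>(unique.size());
        for (String word : unique) {
            result.add(word.toCharArray());
        }
        return result;
    }
}
```

Each returned char[] could then be handed to _MainDict.fillSegment, or to _StopWords.fillSegment for stopwords.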
III. Add loadMySQLStopwordDict, a method that loads stopwords from MySQL
/**
 * Load stopwords from MySQL.
 */
private void loadMySQLStopwordDict() {
    try {
        // Read the JDBC settings; the stream is closed as soon as they are loaded.
        Path file = PathUtils.get(getDictRoot(), "jdbc-reload.properties");
        try (FileInputStream in = new FileInputStream(file.toFile())) {
            prop.load(in);
        }
        logger.info("[==========]jdbc-reload.properties");
        for (Object key : prop.keySet()) {
            logger.info("[==========]" + key + "=" + prop.getProperty(String.valueOf(key)));
        }
        logger.info("[==========]query hot stopword dict from mysql, " + prop.getProperty("jdbc.reload.stopword.sql") + "......");
        // try-with-resources closes the ResultSet, Statement and Connection in reverse order.
        try (Connection conn = DriverManager.getConnection(
                 prop.getProperty("jdbc.url"),
                 prop.getProperty("jdbc.user"),
                 prop.getProperty("jdbc.password"));
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery(prop.getProperty("jdbc.reload.stopword.sql"))) {
            while (rs.next()) {
                String theWord = rs.getString("word");
                logger.info("[==========]hot stopword from mysql: " + theWord);
                // Add the word to the stopword dictionary.
                _StopWords.fillSegment(theWord.trim().toCharArray());
            }
        }
        // Wait jdbc.reload.interval milliseconds before returning to the reload loop.
        Thread.sleep(Integer.valueOf(String.valueOf(prop.get("jdbc.reload.interval"))));
    } catch (Exception e) {
        logger.error("error", e);
    }
}
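Then call it from loadStopWordDict, the method that loads the user's extension stopword dictionary.
3. Package
Run mvn package to build the code.
Put the file target\releases\elasticsearch-analysis-ik-7.2.0.zip into the ES plugins directory.
4. Unzip
Unzip the package and put the MySQL driver jar into the ik directory.
5. Restart ES
ES may report a socket permission exception during the restart.
Fix: edit the java.policy file under jdk/jre/lib/security (conf/security on JDK 9 and later) and add the following line at the end of its grant { ... } block, before the closing brace: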
permission java.net.SocketPermission "127.0.0.1:3307","connect,resolve";
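6. Add words and stopwords in MySQL
Before adding the new word
Add the new word
After adding the new word
Before adding the stopword
Add the stopword
After adding the stopword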
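One way to check the before and after states is to send the same text to the ES _analyze API and compare the returned tokens. The sketch below assumes Java 11 or later, an ES node at http://localhost:9200, and the ik_max_word analyzer; the class name AnalyzeCheck and the sample text are hypothetical.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper that asks ES how a piece of text is tokenized. */
final class AnalyzeCheck {

    /** Builds the JSON body for the _analyze API. */
    static String analyzeBody(String analyzer, String text) {
        return "{\"analyzer\":\"" + escape(analyzer) + "\",\"text\":\"" + escape(text) + "\"}";
    }

    // Escapes backslashes and quotes only: enough for plain words,
    // not a general-purpose JSON encoder.
    private static String escape(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    /** Builds a POST request to {baseUrl}/_analyze. */
    static HttpRequest analyzeRequest(String baseUrl, String analyzer, String text) {
        return HttpRequest.newBuilder(URI.create(baseUrl + "/_analyze"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(
                        analyzeBody(analyzer, text), StandardCharsets.UTF_8))
                .build();
    }

    public static void main(String[] args) throws Exception {
        String text = args.length > 0 ? args[0] : "热更新";
        HttpRequest request = analyzeRequest("http://localhost:9200", "ik_max_word", text);
        HttpResponse<String> response =
                HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString());
        // Prints the raw JSON token list returned by ES.
        System.out.println(response.body());
    }
}
```

Running it once before and once after inserting a row into the word table (allowing at least one reload interval to pass) should show whether the new word is emitted as a single token.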