Elasticsearch 同义词(dynamic-synonym插件)远程热词更新

Elasticsearch 同义词(dynamic-synonym)远程热词更新

零、版本说明

  • elasticsearch7.2.0

  • 同义词插件:elasticsearch-analysis-dynamic-synonym

  • elasticsearch-analysis-dynamic-synonym插件官网

    https://github.com/bells/elasticsearch-analysis-dynamic-synonym
    
  • 后面内容为同义词更新词库方法,elasticsearch安装同义词插件不再赘述;

  • 由于服务器jdk版本与es使用版本不一致,以下为es单独指定jdk版本的;

一、同义词本地文件读取方式(可不用插件)

在es内部添加同义词文件,实现同义词查询,es已内置该功能;

官方说明文档地址如下:

https://www.elastic.co/guide/en/elasticsearch/reference/7.2/analysis-synonym-tokenfilter.html

简单使用方法:

1、添加同义词文件

在elasticsearch安装目录下的config目录下(elasticsearch-7.2.0/config)新建synonyms.txt文本;

并在文本内添加同义词如(英文逗号分隔):

美元,美金,美币
苹果,iphone

2、创建索引,并配置同义词过滤

PUT syno_v1
{
  "settings": {
    "index":{
      "number_of_shards": "3",
      "number_of_replicas": "1",
      "max_result_window": "200000",
      "analysis": {
        "filter":{
          "my_syno_filter":{
            "type":"synonym",
            "synonyms_path":"synonyms.txt"
          }
        },
          "ik_max_syno": {
            "type":"custom",
            "tokenizer": "ik_max_word",
            "filter": [
              "lowercase",
              "my_syno_filter"
            ]
          }
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "keyword": {
        "type": "text",
        "analyzer": "ik_max_syno"
      }
    }
  }
}

3、测试效果

dsl:

GET syno_v1/_analyze
{
  "analyzer": "ik_max_syno",
  "text": "苹果"
}

结果:

{
  "tokens" : [
    {
      "token" : "苹果",
      "start_offset" : 0,
      "end_offset" : 2,
      "type" : "CN_WORD",
      "position" : 0
    },
    {
      "token" : "iphone",
      "start_offset" : 0,
      "end_offset" : 2,
      "type" : "SYNONYM",
      "position" : 0
    }
  ]
}

二、同义词插件远程词库调用

1、同义词插件官网说明:

Example:

{
	"index" : {
	    "analysis" : {
	        "analyzer" : {
	            "synonym" : {
	                "tokenizer" : "whitespace",
	                "filter" : ["remote_synonym"]
 	           }
	        },
	        "filter" : {
	            "remote_synonym" : {
	                "type" : "dynamic_synonym",
	                "synonyms_path" : "http://host:port/synonym.txt",
	                "interval": 30
	            },
	            "local_synonym" : {
	                "type" : "dynamic_synonym",
	                "synonyms_path" : "synonym.txt"
	            }
	        }
	    }
	}
}

Configuration:

synonyms_path: A file path relative to the Elastic config file or an URL, mandatory

相对于Elastic配置文件或URL的文件路径(必填)

interval: Refresh interval in seconds for the synonym file, default: 60, optional

同义词文件的刷新间隔(以秒为单位),默认值:60,可选

ignore_case: Ignore case in synonyms file, default: false, optional

忽略同义词文件中的大小写,默认值:false,可选

expand: Expand, default: true, optional

lenient: Lenient on exception thrown when importing a synonym, default: false, optional

format: Synonym file format, default: '', optional. For WordNet structure this can be set to 'wordnet'

2、在服务中实现http请求,并连接数据库实现热词管理实例:

@RestController
@RequestMapping("/synonym")
@Slf4j
public class SynonymController {

    private String lastModified = new Date().toString();
    private String etag = String.valueOf(System.currentTimeMillis());

    @RequestMapping(value = "/word", method = {RequestMethod.GET,RequestMethod.HEAD}, produces="text/html;charset=UTF-8")
    public String getSynonymWord(HttpServletResponse response){
        response.setHeader("Last-Modified",lastModified);
        response.setHeader("ETag",etag);
        //response.setHeader("If-Modified-Since",lastModified);
        Connection conn = null;
        Statement stmt = null;
        ResultSet rs = null;
        StringBuilder words = new StringBuilder();
        try {
            Class.forName("oracle.jdbc.driver.OracleDriver");
            conn = DriverManager.getConnection(
                    "jdbc:oracle:thin:@192.168.114.13:1521:xe",
                    "test",
                    "test"
            );
            stmt = conn.createStatement();
            rs = stmt.executeQuery("select word from SYNONYM_WORD where status=0");

            while(rs.next()) {
                String theWord = rs.getString("word");
                System.out.println("hot word from mysql: " + theWord);
                words.append(theWord);
                words.append("\n");
            }

            return words.toString();

        } catch (Exception e) {
            e.printStackTrace();
        } finally {
            if(rs != null) {
                try {
                    rs.close();
                } catch (SQLException e) {
                    log.error("资源关闭异常:",e);
                }
            }
            if(stmt != null) {
                try {
                    stmt.close();
                } catch (SQLException e) {
                    log.error("资源关闭异常:",e);
                }
            }
            if(conn != null) {
                try {
                    conn.close();
                } catch (SQLException e) {
                    log.error("资源关闭异常:",e);
                }
            }
        }
        return null;
    }

    @RequestMapping(value = "/update", method = RequestMethod.GET)
    public void updateModified(){
        lastModified = new SimpleDateFormat("yyyy-MM-dd hh:mm:ss").format(new Date());
        etag = String.valueOf(System.currentTimeMillis());
    }
}

注:

  • updateModified方法为单独更新lastModified与etag,用于判断ik是否需要重新加载远程词库,具体关联数据库操作代码时自行扩展

3、根据远程请求创建索引:

PUT syno_v2
{
  "settings": {
    "index":{
      "number_of_shards": "3",
      "number_of_replicas": "1",
      "max_result_window": "200000",
      "analysis": {
        "filter":{
          "remote_syno_filter":{
            "type":"dynamic_synonym",
            "synonyms_path":"http://192.168.xx.xx:8080/synonym/word"
          }
        },
        "ik_max_syno": {
            "type":"custom",
            "tokenizer": "ik_max_word",
            "filter": [
              "lowercase",
              "remote_syno_filter"
            ]
          }
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "keyword": {
        "type": "text",
        "analyzer": "ik_max_syno"
      }
    }
  }
}

三、重写同义词插件源码连接mysql/oracle更新词库

1、下载同义词插件

https://github.com/bells/elasticsearch-analysis-dynamic-synonym

2、修改ik插件源码(以oracle为例,mysql对应修改配置即可)

1)、添加jdbc配置文件

在项目根目录下创建config目录并创建config\jdbc-reload.properties配置文件:

jdbc.url=jdbc:oracle:thin:@192.168.xx.xx:1521:xe
jdbc.user=test
jdbc.password=test
jdbc.reload.synonym.sql=SELECT word FROM TEST.SYNONYM_WORD WHERE STATUS = 0
jdbc.lastModified.synonym.sql=SELECT MAX(UPDATE_TIME) AS last_modify_dt FROM TEST.SYNONYM_WORD
jdbc.driver=oracle.jdbc.driver.OracleDriver
2)、创建DBRemoteSynonymFile类

在目录analysis下创建DBRemoteSynonymFile类,

具体位置:

src\main\java\com\bellszhu\elasticsearch\plugin\synonym\analysis\DBRemoteSynonymFile.java

内容:

package com.bellszhu.elasticsearch.plugin.synonym.analysis;

import com.bellszhu.elasticsearch.plugin.DynamicSynonymPlugin;
import org.apache.logging.log4j.LogManager;
import org.apache.logging.log4j.Logger;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.synonym.SynonymMap;
import org.elasticsearch.common.io.PathUtils;
import org.elasticsearch.env.Environment;

import java.io.*;
import java.nio.file.Path;
import java.sql.*;
import java.util.ArrayList;
import java.util.Properties;

/**
 * com.bellszhu.elasticsearch.plugin.synonym.analysis
 * author: Yic.z
 * date: 2020-08-04
 */
public class DBRemoteSynonymFile implements SynonymFile {

    // 配置文件名
    private final static String DB_PROPERTIES = "jdbc-reload.properties";
    private static Logger logger = LogManager.getLogger("dynamic-synonym");
    private String format;

    private boolean expand;

    private boolean lenient;

    private Analyzer analyzer;

    private Environment env;

    // 数据库配置
    private String location;

    private long lastModified;

    private Connection connection = null;

    private Statement statement = null;

    private Properties props;

    private Path conf_dir;


    DBRemoteSynonymFile(Environment env, Analyzer analyzer,
                        boolean expand,boolean lenient, String format, String location) {
        this.analyzer = analyzer;
        this.expand = expand;
        this.lenient = lenient;
        this.format = format;
        this.env = env;
        this.location = location;
        this.props = new Properties();

        //读取当前 jar 包存放的路径
        Path filePath = PathUtils.get(new File(DynamicSynonymPlugin.class.getProtectionDomain().getCodeSource()
                .getLocation().getPath())
                .getParent(), "config")
                .toAbsolutePath();
        this.conf_dir = filePath.resolve(DB_PROPERTIES);

        //判断文件是否存在
        File configFile = conf_dir.toFile();
        InputStream input = null;
        try {
            input = new FileInputStream(configFile);
        } catch (FileNotFoundException e) {
            logger.info("jdbc-reload.properties not find. " + e);
        }
        if (input != null) {
            try {
                props.load(input);
            } catch (IOException e) {
                logger.error("fail to load the jdbc-reload.properties," + e);
            }
        }
        isNeedReloadSynonymMap();
    }

    /**
     * 加载同义词词典至SynonymMap中
     * @return SynonymMap
     */
    @Override
    public SynonymMap reloadSynonymMap() {
        try {
            logger.info("start reload local synonym from {}.", location);
            Reader rulesReader = getReader();
            SynonymMap.Builder parser = RemoteSynonymFile.getSynonymParser(rulesReader, format, expand, lenient, analyzer);
            return parser.build();
        } catch (Exception e) {
            logger.error("reload local synonym {} error!", e, location);
            throw new IllegalArgumentException(
                    "could not reload local synonyms file to build synonyms", e);
        }
    }

    /**
     * 判断是否需要进行重新加载
     * @return true or false
     */
    @Override
    public boolean isNeedReloadSynonymMap() {
        try {
            Long lastModify = getLastModify();
            if (lastModified < lastModify) {
                lastModified = lastModify;
                return true;
            }
        } catch (Exception e) {
            logger.error(e);
        }

        return false;
    }

    /**
     * 获取同义词库最后一次修改的时间
     * 用于判断同义词是否需要进行重新加载
     *
     * @return getLastModify
     */
    public Long getLastModify() {
        ResultSet resultSet = null;
        Long last_modify_long = null;
        try {
            if (connection == null || statement == null) {
                Class.forName(props.getProperty("jdbc.driver"));
                connection = DriverManager.getConnection(
                        props.getProperty("jdbc.url"),
                        props.getProperty("jdbc.user"),
                        props.getProperty("jdbc.password")
                );
                statement = connection.createStatement();
            }
            resultSet = statement.executeQuery(props.getProperty("jdbc.lastModified.synonym.sql"));
            while (resultSet.next()) {
                Timestamp last_modify_dt = resultSet.getTimestamp("last_modify_dt");
                last_modify_long = last_modify_dt.getTime();
            }
        } catch (ClassNotFoundException | SQLException e) {
            logger.error("获取同义词库最后一次修改的时间",e);
        } finally {
            try {
                if (resultSet != null) {
                    resultSet.close();
                }
            } catch (SQLException e) {
                e.printStackTrace();
            }

        }
        return last_modify_long;
    }

    /**
     * 查询数据库中的同义词
     * @return DBData
     */
    public ArrayList<String> getDBData() {
        ArrayList<String> arrayList = new ArrayList<>();
        ResultSet resultSet = null;
        try {
            if (connection == null || statement == null) {
                Class.forName(props.getProperty("jdbc.driver"));
                connection = DriverManager.getConnection(
                        props.getProperty("jdbc.url"),
                        props.getProperty("jdbc.user"),
                        props.getProperty("jdbc.password")
                );
                statement = connection.createStatement();
            }
            resultSet = statement.executeQuery(props.getProperty("jdbc.reload.synonym.sql"));
            while (resultSet.next()) {
                String theWord = resultSet.getString("word");
                arrayList.add(theWord);
            }
        } catch (ClassNotFoundException | SQLException e) {
            logger.error("查询数据库中的同义词异常",e);
        } finally {
            try {
                if (resultSet != null) {
                    resultSet.close();
                }
            } catch (SQLException e) {
                e.printStackTrace();
            }

        }
        return arrayList;
    }

    /**
     * 同义词库的加载
     * @return Reader
     */
    @Override
    public Reader getReader() {
        StringBuffer sb = new StringBuffer();
        try {
            ArrayList<String> dbData = getDBData();
            for (int i = 0; i < dbData.size(); i++) {
                logger.info("load the synonym from db," + dbData.get(i));
                sb.append(dbData.get(i))
                        .append(System.getProperty("line.separator"));
            }
        } catch (Exception e) {
            logger.error("reload synonym from db failed");
        }
        return new StringReader(sb.toString());
    }
}

修改DynamicSynonymTokenFilterFactory类中的getSynonymFile方法:

添加选择远程直连数据库方法

SynonymFile getSynonymFile(Analyzer analyzer) {
        try {
            SynonymFile synonymFile;
            if (location.equals("fromDB")){
                synonymFile = new DBRemoteSynonymFile(environment, analyzer, expand, lenient, format,
                        location);
            } else if (location.startsWith("http://") || location.startsWith("https://")) {
                synonymFile = new RemoteSynonymFile(environment, analyzer, expand, lenient, format,
                        location);
            } else {
                synonymFile = new LocalSynonymFile(environment, analyzer, expand, lenient, format,
                        location);
            }
            if (scheduledFuture == null) {
                scheduledFuture = pool.scheduleAtFixedRate(new Monitor(synonymFile),
                                interval, interval, TimeUnit.SECONDS);
            }
            return synonymFile;
        } catch (Exception e) {
            throw new IllegalArgumentException("failed to get synonyms : " + location, e);
        }
    }
3)、添加数据maven库依赖
  • 在pom.xml文件中添加数据库依赖

    <!-- 新增oracle依赖 -->
    <dependency>
        <groupId>com.oracle.ojdbc</groupId>
        <artifactId>ojdbc8</artifactId>
        <version>19.3.0.0</version>
    </dependency>
    
  • 根据es版本修改ik对应版本

    <version>7.2.0</version>
    
  • 在src\main\assemblies\plugin.xml中添加配置使得数据库相关依赖一并打包

    在中添加:

    <dependencySet>
        <outputDirectory/>
        <useProjectArtifact>true</useProjectArtifact>
        <useTransitiveFiltering>true</useTransitiveFiltering>
        <includes>
        	<include>com.oracle.ojdbc:ojdbc8</include>
        </includes>
    </dependencySet>
    

    空白处添加打包配置文件配置

    <fileSets>
        <fileSet>
        	<directory>${project.basedir}/config</directory>
        	<outputDirectory>config</outputDirectory>
        </fileSet>
    </fileSets>
    
4)、maven打包项目

maven打包地址

3、创建索引

修改后的同义词插件使用例子:

PUT syno_v2
{
  "settings": {
    "index":{
      "number_of_shards": "3",
      "number_of_replicas": "1",
      "max_result_window": "200000",
      "analysis": {
        "filter":{
          "remote_syno_filter":{
            "type":"dynamic_synonym",
            "synonyms_path":"fromDB",
            "interval": 120
          }
        },
        "ik_max_syno": {
            "type":"custom",
            "tokenizer": "ik_max_word",
            "filter": [
              "lowercase",
              "remote_syno_filter"
            ]
          }
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "keyword": {
        "type": "text",
        "analyzer": "ik_max_syno"
      }
    }
  }
}

4、注意事项

如更新ik插件以后,出现报错如下:

java.security.AccessControlException: access denied (java.net.SocketPermission172.16.xxx.xxx:3306 connect,resolve)
报错信息

这是jar的安全策略的错误(具体没有深究),解决方案如下:

1、在ik源码的config中创建文件socketPolicy.policy

grant {
   permission java.net.SocketPermission "business.mysql.youboy.com:3306","connect,resolve";
};

2、在服务器上的es中的config目录文件jvm.option添加如下代码配置上面的文件路径

-Djava.security.policy=/data/elasticsearch-6.5.3/plugins/ik/config/socketPolicy.policy

5、参考文章

https://blog.csdn.net/weixin_43315211/article/details/100144968

ps:ik分词器实现词库热更新文章链接

  • 8
    点赞
  • 27
    收藏
    觉得还不错? 一键收藏
  • 7
    评论
elasticsearch-analysis-dynamic-synonym-master.zip是一个压缩文件,其中包含用于Elasticsearch的动态同义词分析插件Elasticsearch是一个开源的分布式搜索引擎,用于快速检索和分析大量数据。 这个插件的作用是允许用户根据需要动态更新和管理同义词,以提高搜索的准确性和覆盖范围。在搜索引擎中,同义词是指具有相似意义的不同词语,如"汽车"和"车辆"。通过使用同义词分析插件,搜索引擎可以将搜索词与同义词匹配,从而拓宽搜索结果的范围。 要使用这个插件,首先需要将压缩文件解压缩到适当的目录中。然后,将插件添加到Elasticsearch插件目录中,并重新启动Elasticsearch服务。一旦插件安装完成,就可以开始配置和使用动态同义词分析器。 在使用这个插件时,用户可以定义一个同义词文件,其中包含需要用于匹配的同义词对。插件将定期(可以通过设置来调整时间间隔)扫描这个同义词文件,并将其加载到内存中。然后,当用户执行搜索操作时,分析器将根据这些同义词对修改搜索词,并在搜索过程中考虑到它们。 动态同义词分析插件在大型搜索平台和电子商务领域中非常有用。它可以通过改进搜索引擎的召回率和精确度来提供更好的搜索体验。同义词的动态管理也使得系统可以灵活应对新的同义词和词语变化。 总之,elasticsearch-analysis-dynamic-synonym-master.zip是一个用于Elasticsearch的动态同义词分析插件,它可以帮助用户优化搜索结果,提高搜索准确性和广度,并灵活应对同义词和词语变化。

“相关推荐”对你有帮助么?

  • 非常没帮助
  • 没帮助
  • 一般
  • 有帮助
  • 非常有帮助
提交
评论 7
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值