lucene中的分析器与IKAnalyzer第三方分析器的使用代码

最新推荐文章于 2018-12-25 11:32:33 发布

字节律动

最新推荐文章于 2018-12-25 11:32:33 发布

阅读量315

点赞数

分类专栏： lucene 文章标签： lucene IKAnalyzer

本文链接：https://blog.csdn.net/weixin_42613538/article/details/83312396

版权

lucene 专栏收录该内容

3 篇文章 0 订阅

订阅专栏

lucene自带的分析器使用代码:

@Test
public void testTokenStream() throws Exception {
    //创建一个标准分析器对象
    Analyzer analyzer = new StandardAnalyzer();
    //获得tokenStream对象
    //第一个参数：域名，可以随便给一个
    //第二个参数：要分析的文本内容
    TokenStream tokenStream = analyzer.tokenStream("test", "The Spring Framework provides a comprehensive programming and configuration model.");
    //添加一个引用，可以获得每个关键词
    CharTermAttribute charTermAttribute = tokenStream.addAttribute(CharTermAttribute.class);
    //添加一个偏移量的引用，记录了关键词的开始位置以及结束位置
    OffsetAttribute offsetAttribute = tokenStream.addAttribute(OffsetAttribute.class);
    //将指针调整到列表的头部
    tokenStream.reset();
    //遍历关键词列表，通过incrementToken方法判断列表是否结束
    while(tokenStream.incrementToken()) {
        //关键词的起始位置
        System.out.println("start->" + offsetAttribute.startOffset());
        //取关键词
        System.out.println(charTermAttribute);
        //结束位置
        System.out.println("end->" + offsetAttribute.endOffset());
    }
    tokenStream.close();
}

特点：

StandardAnalyzer：

自带的分词器，是按照中文一个字一个字地进行分词。如：“我爱中国”，
效果：“我”、“爱”、“中”、“国”。

SmartChineseAnalyzer

对中文支持较好，但扩展性差，扩展词库，禁用词库和同义词库等不好处理

IKAnalyzer分词器：

使用方法：

第一步：把jar包添加到工程中：IK-Analyzer-1.0-SNAPSHOT.jar

第二步：把配置文件和扩展词典和停用词词典添加到classpath下

IKAnalyzer.cfg.cml配置文件：

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE properties SYSTEM "http://java.sun.com/dtd/properties.dtd">  
<properties>  
	<comment>IK Analyzer 扩展配置</comment>
	<!--用户可以在这里配置自己的扩展字典 -->
	<entry key="ext_dict">hotword.dic;</entry>
	
	<!--用户可以在这里配置自己的扩展停止词字典-->
	<entry key="ext_stopwords">stopword.dic;</entry> 
	
</properties>

其中:拓展词典和停用词词典，按需求添加即可

注意：hotword.dic和ext_stopword.dic文件的格式为UTF-8，注意是无BOM 的UTF-8 编码。

也就是说禁止使用windows记事本编辑扩展词典文件

@Test
public void addDocument() throws Exception {
    //索引库存放路径
    Directory directory = FSDirectory.open(new File("D:\\temp\\index").toPath());
    IndexWriterConfig config = new IndexWriterConfig(new IKAnalyzer());
    //创建一个indexwriter对象
    IndexWriter indexWriter = new IndexWriter(directory, config);
//...
}