solr配置中文分词器

最新推荐文章于 2024-06-15 15:42:06 发布

杨威利de红茶

最新推荐文章于 2024-06-15 15:42:06 发布

阅读量695

点赞数

分类专栏： solr 文章标签： solr

solr 专栏收录该内容

6 篇文章 0 订阅

订阅专栏

一、smartcn（转自http://www.07net01.com/linux/solr4_3zhipeizhizhongwenfencismartcn_455940_1372147112.html）

1、将自带的jar包拷贝到tomcat下

文件：solr-5.1.0/contrib/analysis-extras/lucene-libs/lucene-analyzers-smartcn-5.1.0.jar

拷贝到： /usr/local/tomcat/webapps/solr/WEB-INF/lib/

2、修改schema.xml，添加

<fieldType name="text_smart" class="solr.TextField" positionIncrementGap="100">
    <analyzer type="index">
        <tokenizer class="solr.SmartChineseSentenceTokenizerFactory"/>
        <filter class="solr.SmartChineseWordTokenFilterFactory"/>
    </analyzer>
    <analyzer type="query">
        <tokenizer class="solr.SmartChineseSentenceTokenizerFactory"/>
        <filter class="solr.SmartChineseWordTokenFilterFactory"/>
    </analyzer>
</fieldType>

3、测试

二、mmseg4j

1、和smartcn一样将jar包（mmseg4j-core-1.10.0.jar、mmseg4j-solr-2.3.0.jar）复制到tomcat/webapps/solr/WEB-INF/lib/

2、修改schema.xml，添加以下内容后重启tomcat

<fieldType name="text_zh_wangtest" class="solr.TextField" positionIncrementGap="100">
	<analyzer type="index">
		<tokenizer class="com.chenlb.mmseg4j.solr.MMSegTokenizerFactory" mode="complex" dicPath="dic" />
		<filter class="solr.StopFilterFactory" words="stopwords.txt" />
	</analyzer>
</fieldType>

3、测试

三、索引字段加入中文分词

这里新建了张表articles

CREATE TABLE `articles` (
  `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `title` varchar(200) NOT NULL,
  `author` varchar(50) NOT NULL,
  `last_update_time` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

网上找点数据塞进去

INSERT INTO `articles` VALUES ('1', '3.6亿曼城又被这卧底坑死！无底黑洞让1亿白花', '梓泉', '2016-01-07 09:43:17');
INSERT INTO `articles` VALUES ('2', '瓜帅版曼城豪阵曝光！组最奢华中轴 梅西领4天王', '小九', '2016-01-07 09:43:41');
INSERT INTO `articles` VALUES ('3', '联赛杯-弃将助小魔兽绝杀 曼城丢争议球1-2负 ', '斯科', '2016-01-07 09:43:50');
INSERT INTO `articles` VALUES ('4', '瓜迪奥拉去曼城已定！英媒:他与前巴萨2巨头重逢 ', '兰尼斯特', '2016-01-07 09:44:06');
INSERT INTO `articles` VALUES ('5', '曼城主帅讽皇马:同情贝帅 皇马干出这种事不奇怪 ', '马尔科', '2016-01-07 09:44:18');

solr后台新建core_articles。

修改data-config.xml

<dataConfig>  
  <dataSource name="solrDB" type="JdbcDataSource" driver="com.mysql.jdbc.Driver" url="jdbc:mysql://localhost:3306/wangtest" user="root" password="123456"  />  
  <document>  
    <entity dataSource="solrDB" name="articles"  
            query="select * from articles"  
            deltaImportQuery="select * from articles where id= '${dih.delta.id}'"  
            deltaQuery="select id from articles where last_update_time > '${dataimporter.last_index_time}'">  
      <field column="id" name="id" />  
      <field column="title" name="title" />  
      <field column="author" name="author" />  
      <field column="last_update_time" name="last_update_time" />  
    </entity>  
  </document>  
</dataConfig>

修改schema.xml，添加filedType

<fieldType name="text_mmseg4j" class="solr.TextField" positionIncrementGap="100">  
  <analyzer type="index">  
    <tokenizer class="com.chenlb.mmseg4j.solr.MMSegTokenizerFactory" mode="complex" dicPath="dic" />  
    <filter class="solr.StopFilterFactory" words="stopwords.txt" />  
  </analyzer>  
</fieldType>

再添加相关索引字段

<field name="title" type="text_mmseg4j" indexed="true" stored="true" required="true" />
<field name="author" type="text_mmseg4j" indexed="true" stored="true" required="true" />
<field name="last_update_time" type="date" indexed="true" stored="true" />

重启tomcat后全量导入。

在Query里搜索的结果

对title字段进行关键字搜索（如：巴萨）结果

杨威利de红茶

关注

0
点赞
踩
0

收藏

觉得还不错? 一键收藏
0
评论
solr配置中文分词器

一、smartcn（转自http://www.07net01.com/linux/solr4_3zhipeizhizhongwenfencismartcn_455940_1372147112.html）1、将自带的jar包拷贝到tomcat下文件：solr-5.1.0/contrib/analysis-extras/lucene-libs/lucene-analyzers-smartcn-
复制链接

扫一扫