[ Getting Started with Solr ] - Adding Chinese Word Segmentation (IKAnalyzer) to schema.xml

http://www.cnblogs.com/huangfox/archive/2012/02/08/2342881.html

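The article linked above describes how to deploy Solr in Eclipse. This post builds on that setup and adds IKAnalyzer.

1. Download the IKAnalyzer source code and copy it into the Solr 3.5 project, as shown in the figure below:

[Figure: IKAnalyzer source copied into the Solr 3.5 project]

2. Configure IKAnalyzer in schema.xml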

<!-- IKAnalyzer 3.2.8 Chinese word segmentation -->
<fieldType name="text" class="solr.TextField">
    <!-- Index time: isMaxWordLength="false" selects the finest-grained segmentation -->
    <analyzer type="index">
        <tokenizer class="org.wltea.analyzer.solr.IKTokenizerFactory" isMaxWordLength="false"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
        <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
    <!-- Query time: isMaxWordLength="true" selects maximum-word-length segmentation -->
    <analyzer type="query">
        <tokenizer class="org.wltea.analyzer.solr.IKTokenizerFactory" isMaxWordLength="true"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
        <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
        <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
</fieldType>
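
A field type only takes effect once a field refers to it. As a minimal sketch, the declarations below attach the `text` type defined above to two fields in the `<fields>` section of schema.xml. The field names `title` and `content` are hypothetical and not taken from the original post; only `type="text"` comes from the configuration above.

```xml
<!-- Hypothetical field names; type="text" refers to the IKAnalyzer fieldType above -->
<field name="title" type="text" indexed="true" stored="true"/>
<field name="content" type="text" indexed="true" stored="false"/>
```

Any field declared this way is segmented by IKAnalyzer at both index time and query time.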

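3. Start Solr and verify

On the analysis page, select type in the Field dropdown and enter text (the fieldType name defined above), enter a passage of Chinese in Field value, and click Analyze to see the segmentation result.

The verbose output option shows detailed information about each token.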

 

For details on configuring schema.xml, see the Solr wiki:

http://wiki.apache.org/solr/SchemaXml

Data Types

The <types> section allows you to define a list of <fieldtype> declarations you wish to use in your schema, along with the underlying Solr class that should be used for that type, as well as the default options you want for fields that use that type.

Any subclass of FieldType may be used as a field type class, using either its full package name, or the "solr" alias if it is in the default Solr package. For common numeric types (integer, float, etc...) there are multiple implementations provided depending on your needs. Please see SolrPlugins for information on how to ensure that your own custom Field Types can be loaded into Solr.
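
To illustrate the two ways of naming a class, the sketch below declares the same built-in string type twice, once through the "solr" alias and once through its full package name. The type name `string_full` is made up for this example. Classes outside Solr's own packages, such as `org.wltea.analyzer.solr.IKTokenizerFactory` in the configuration earlier, have to be written with the full package name.

```xml
<!-- "solr" alias: resolved against Solr's default packages -->
<fieldType name="string" class="solr.StrField"/>
<!-- Same class, fully qualified; "string_full" is a hypothetical type name -->
<fieldType name="string_full" class="org.apache.solr.schema.StrField"/>
```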

Common options that field types can have are...
sortMissingLast=true|false
sortMissingFirst=true|false
indexed=true|false
stored=true|false
multiValued=true|false
omitNorms=true|false
omitTermFreqAndPositions=true|false (Solr 1.4)
omitPositions=true|false (Solr 3.4)
positionIncrementGap=N

TextFields can also support Analyzers with highly configurable Tokenizers and Token Filters.
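
As a rough sketch of how the options listed above appear in practice, the fragment below sets defaults on two field types and overrides one option on an individual field. The type and field names (`string`, `text_general`, `tags`) are illustrative choices for this example, not part of the IKAnalyzer setup in this post.

```xml
<!-- Defaults set on the type apply to every field that uses it -->
<fieldType name="string" class="solr.StrField" sortMissingLast="true" omitNorms="true"/>

<!-- positionIncrementGap keeps phrase queries from matching across values of a multiValued field -->
<fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
    <analyzer>
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
</fieldType>

<!-- Options can also be set per field; "tags" is a hypothetical field name -->
<field name="tags" type="text_general" indexed="true" stored="true" multiValued="true"/>
```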

Field types that store text (TextField, StrField) support compression of stored contents:

compressed=true|false
compressThreshold=<integer>

compressThreshold is the minimum length required for text compression to be invoked. This applies only if compressed=true; a common pattern is to set compressThreshold on the field type definition, and turn compression on and off in the individual field definitions.

  

 
