Solr Synonym Search (synonyms)

熊猫家族

Article link: https://blog.csdn.net/a280606790/article/details/84150998

Personal tech blog: http://demi-panda.com

solr.SynonymFilterFactory

Creates a SynonymFilter.

Matches strings of tokens and replaces them with other strings of tokens.

The synonyms parameter names an external file defining the synonyms.
If ignoreCase is true, matching will lowercase before checking equality.
If expand is true, a synonym will be expanded to all equivalent synonyms. If it is false, all equivalent synonyms will be reduced to the first in the list.
The optional tokenizerFactory parameter names a tokenizer factory class used to analyze the synonyms file (see https://issues.apache.org/jira/browse/SOLR-319), which can help with the synonym+stemming problem described in http://search-lucene.com/m/hg9ri2mDvGk1.
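
The effect of expand on a comma-separated equivalence list can be sketched in a few lines of Python. This is only an illustration of the semantics described above, not Solr's implementation: the function name build_synonym_map is hypothetical, and case folding (ignoreCase) and tokenization are deliberately left out.

```python
def build_synonym_map(group, expand=True):
    """Map each term of a comma-separated equivalence group to its replacements.

    Hypothetical helper for illustration only; it mimics the documented
    behaviour of the expand parameter, not Solr's internal data structures.
    """
    terms = [t.strip() for t in group.split(",") if t.strip()]
    if expand:
        # expand="true": every term maps to all equivalent terms in the group
        return {t: list(terms) for t in terms}
    # expand="false": every term is reduced to the first term in the list
    return {t: [terms[0]] for t in terms}


group = "GB,gib,gigabyte,gigabytes"
print(build_synonym_map(group, expand=True)["gib"])   # ['GB', 'gib', 'gigabyte', 'gigabytes']
print(build_synonym_map(group, expand=False)["gib"])  # ['GB']
```

With expand set to true the index (or query) grows by one token per synonym, while false normalizes everything to a single canonical term.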

Example usage in schema:

<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.ChineseTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true"
            expand="true" tokenizerFactory="solr.ChineseTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1"
            catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="0"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.ChineseTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true"
            expand="true" tokenizerFactory="solr.ChineseTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1"
            catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>

Example synonyms.txt:

# Blank lines and lines starting with pound are comments.

#Explicit mappings match any token sequence on the LHS of "=>"
#and replace with all alternatives on the RHS.  These types of mappings
#ignore the expand parameter in the schema.
#Examples:
#-----------------------------------------------------------------------
#some test synonym mappings unlikely to appear in real input text
aaafoo => aaabar
bbbfoo => bbbfoo bbbbar
cccfoo => cccbar cccbaz
fooaaa,baraaa,bazaaa

# Some synonym groups specific to this example
GB,gib,gigabyte,gigabytes
MB,mib,megabyte,megabytes
Television, Televisions, TV, TVs 
#notice we use "gib" instead of "GiB" so any WordDelimiterFilter coming
#after us won't split it into two words.
飞利浦刮胡刀,飞利浦剃须刀

# Synonym mappings can be used for spelling correction too
pixima => pixma

a\,a => b\,b
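
To make the file format above concrete, here is a minimal Python sketch that reads one rule line at a time. It is a simplification under stated assumptions, not Solr's parser: parse_rule and split_terms are hypothetical names, commas are treated as the separator between alternatives, a whitespace-separated phrase such as "cccbar cccbaz" is kept as one multi-token entry, and the only escape handled is the backslash-comma shown in the last rule.

```python
import re


def split_terms(side):
    """Split on commas not preceded by a backslash, then unescape '\\,'."""
    parts = re.split(r"(?<!\\),", side)
    return [p.strip().replace("\\,", ",") for p in parts if p.strip()]


def parse_rule(line):
    """Return (inputs, outputs, explicit) for a rule line.

    Returns None for blank lines and comment lines starting with '#'.
    Hypothetical helper for illustration; not part of Solr.
    """
    line = line.strip()
    if not line or line.startswith("#"):
        return None
    if "=>" in line:
        # Explicit mapping: the left-hand side is replaced by the right-hand side
        lhs, rhs = line.split("=>", 1)
        return split_terms(lhs), split_terms(rhs), True
    # Equivalence group: how it is applied depends on the expand parameter
    terms = split_terms(line)
    return terms, terms, False


for rule in ["pixima => pixma", "GB,gib,gigabyte,gigabytes", r"a\,a => b\,b"]:
    print(parse_rule(rule))
```

The third field distinguishes explicit "=>" mappings, which ignore expand, from equivalence groups, which do not.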
