This feature was implemented mainly with reference to the following blog post:
[url]http://lucien-zzy.iteye.com/admin/blogs/2008291[/url]
-----------------------------------------------------------------------------------
The basic principle behind the suggest feature is as follows.
When a document is indexed, the main text fields are copied into a single text field:
<copyField source="title" dest="text"/>
<copyField source="author" dest="text"/>
<copyField source="description" dest="text"/>
<copyField source="keywords" dest="text"/>
<copyField source="content" dest="text"/>
<copyField source="content_type" dest="text"/>
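Then a separate dictionary index is built from the tokens of the text field. The dictionary index built by Solr's bundled SpellCheckComponent has only one field, word. That is enough for English, but not for suggesting Chinese terms by pinyin or by pinyin initials. The author of that post therefore introduced a second field, key, which indexes the pinyin and the initials of every token (word) taken from text.
For example, for the token 国防部: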
----------------------------------
国
国防
国防部
g
gf
gfb
...
guofangbu
-----------------------------------
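To make the key list above concrete, here is a minimal sketch of how such keys could be generated for one token. It is not the code from the referenced post: the class name SuggestKeys, the method keysFor, and the three-entry pinyin table are all hypothetical, and a real system would use a pinyin library or dictionary. What the "..." in the list stands for is not stated in the source; the sketch assumes it covers the intermediate full-pinyin prefixes (guo, guofang).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class SuggestKeys {
    // Hypothetical toy pinyin table covering only the example token.
    private static final Map<Character, String> PINYIN = new HashMap<>();
    static {
        PINYIN.put('国', "guo");
        PINYIN.put('防', "fang");
        PINYIN.put('部', "bu");
    }

    // Returns the Chinese prefixes, the initial-letter prefixes and the
    // full-pinyin prefixes of a token, in that order, without duplicates.
    static List<String> keysFor(String word) {
        List<String> charKeys = new ArrayList<>();
        List<String> initialKeys = new ArrayList<>();
        List<String> pinyinKeys = new ArrayList<>();
        StringBuilder chars = new StringBuilder();
        StringBuilder initials = new StringBuilder();
        StringBuilder pinyin = new StringBuilder();
        for (char c : word.toCharArray()) {
            // Characters missing from the table are used as their own "pinyin".
            String py = PINYIN.getOrDefault(c, String.valueOf(c));
            chars.append(c);
            initials.append(py.charAt(0));
            pinyin.append(py);
            charKeys.add(chars.toString());
            initialKeys.add(initials.toString());
            pinyinKeys.add(pinyin.toString());
        }
        Set<String> keys = new LinkedHashSet<>(charKeys);
        keys.addAll(initialKeys);
        keys.addAll(pinyinKeys);
        return new ArrayList<>(keys);
    }

    public static void main(String[] args) {
        System.out.println(keysFor("国防部"));
    }
}
```

Under these assumptions, every key a user might type (a Chinese prefix, a run of initials, or a run of whole syllables) maps back to the same token.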
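// One dictionary document per token: "word" is stored for display, "key" is indexed for lookup.
Document doc = new Document();
Field contents = new StringField("word", word, Field.Store.YES);
doc.add(contents);
LOG.info("Adding word: " + word);
// The raw word is passed in here, so the pinyin and initial-letter terms would have to
// come from the analyzer applied to "key" (an assumption; the source does not show it).
Field pys = new TextField("key", word, Field.Store.NO);
doc.add(pys);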
At query time, the Suggester searches the key field of this dictionary index and returns the word field for display:
// Look up the user's input in the "key" field and fetch at most num hits.
String queryString = "key:" + key;
LOG.info("Query: " + queryString);
Query query = parser.parse(queryString);
TopDocs results = searcher.search(query, num);
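The two-field idea (search on key, display word) can be illustrated without Lucene. The following is a minimal in-memory sketch under stated assumptions, not the Suggester from the referenced post: the class KeyLookup, its methods, and the frequency numbers are hypothetical, and ordering by frequency only mimics what the freq comparator in the configuration suggests.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class KeyLookup {
    // A dictionary entry: the word shown to the user and its frequency.
    private static final class Entry {
        final String word;
        final int freq;
        Entry(String word, int freq) {
            this.word = word;
            this.freq = freq;
        }
    }

    // Plays the role of the indexed "key" field: key -> entries reachable by it.
    private final Map<String, List<Entry>> byKey = new HashMap<>();

    void add(String word, int freq, String... keys) {
        Entry entry = new Entry(word, freq);
        for (String key : keys) {
            byKey.computeIfAbsent(key, k -> new ArrayList<>()).add(entry);
        }
    }

    // Returns at most num words matching the key, most frequent first.
    List<String> suggest(String key, int num) {
        List<Entry> hits = new ArrayList<>(byKey.getOrDefault(key, Collections.emptyList()));
        hits.sort(Comparator.comparingInt((Entry e) -> e.freq).reversed());
        List<String> words = new ArrayList<>();
        for (Entry e : hits.subList(0, Math.min(num, hits.size()))) {
            words.add(e.word);
        }
        return words;
    }

    public static void main(String[] args) {
        KeyLookup index = new KeyLookup();
        index.add("国防部", 5, "g", "gf", "gfb", "guofangbu");
        index.add("国防", 9, "g", "gf", "guofang");
        System.out.println(index.suggest("gf", 10));
    }
}
```

The key strings themselves are never returned; only the stored word comes back, which mirrors storing word with Field.Store.YES and key with Field.Store.NO.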
The full configuration is as follows:-------------------------------------------------------------------
<searchComponent name="suggest" class="solr.SpellCheckComponent">
<str name="queryAnalyzerFieldType">string</str>
<lst name="spellchecker">
<str name="name">suggest</str>
<str name="classname">org.apache.solr.spelling.suggest.Suggester</str>
<str name="lookupImpl">org.apache.solr.spelling.suggest.tst.TSTLookup</str>
<str name="field">text</str>
<str name="fieldType">string</str><!-- Optionally specifies how the keyword is analyzed before the tokens are searched -->
<float name="threshold">0.0001</float>
<str name="spellcheckIndexDir">spellchecker</str>
<str name="comparatorClass">freq</str>
<!--<str name="buildOnOptimize">true</str>-->
<str name="buildOnCommit">true</str>
</lst>
<!-- Used for suggestions on pinyin input -->
<lst name="spellchecker">
<str name="name">pysuggest</str>
<str name="classname">shentong.tsearch.spelling.suggest.Suggester</str>
<str name="lookupImpl">shentong.tsearch.spelling.suggest.py.PYLookup</str>
<str name="field">text</str>
<str name="fieldType">string</str><!-- Optionally specifies how the keyword is analyzed before the tokens are searched -->
<float name="threshold">0.0001</float>
<str name="pySuggestIndexDir">suggestIndex</str>
<str name="comparatorClass">freq</str>
<!--<str name="buildOnOptimize">true</str>-->
<str name="buildOnCommit">true</str>
</lst>
</searchComponent>
<requestHandler name="/suggest" class="solr.SearchHandler" startup="lazy">
<lst name="defaults">
<str name="spellcheck">true</str>
<str name="spellcheck.dictionary">pysuggest</str>
<!-- This parameter tells Solr that, when there are more results than the configured count, the ones with more hits are returned -->
<str name="spellcheck.onlyMorePopular">true</str>
<str name="spellcheck.extendedResults">false</str>
<str name="spellcheck.count">10</str>
<str name="spellcheck.collate">true</str>
</lst>
<arr name="components">
<str>suggest</str>
</arr>
</requestHandler>
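With the handler above registered, a suggestion request is an ordinary HTTP GET against /suggest. The sketch below only builds such a URL. The base URL http://localhost:8983/solr/collection1 and the class name SuggestRequest are hypothetical, and it assumes the user's input is passed in the standard q parameter with wt=json selecting a JSON response.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

class SuggestRequest {
    // Builds the request URL for the /suggest handler of a given core.
    static String buildUrl(String coreUrl, String input) {
        try {
            // Percent-encode the input so that Chinese characters survive in the URL.
            return coreUrl + "/suggest?q=" + URLEncoder.encode(input, "UTF-8") + "&wt=json";
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available on the JVM, so this should not happen.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical local Solr core; adjust to the actual deployment.
        System.out.println(buildUrl("http://localhost:8983/solr/collection1", "gfb"));
    }
}
```

Because spellcheck.dictionary defaults to pysuggest in the handler, the request itself does not need to name the dictionary.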
----------------------------------------------------------------------------------
The results are shown in the screenshots below:
[img]http://dl2.iteye.com/upload/attachment/0102/6496/d2b27334-2fae-3b6e-b8ef-01de28c224d8.bmp[/img]
[img]http://dl2.iteye.com/upload/attachment/0102/6498/3aaff67c-8f79-3803-9a31-26234365e174.bmp[/img]
[img]http://dl2.iteye.com/upload/attachment/0102/6500/7539d8ea-9a4b-3a0e-aee1-e4348dd5a8e1.bmp[/img]
[img]http://dl2.iteye.com/upload/attachment/0102/6502/3475ed2f-0238-3a17-8057-82ba784e9c88.bmp[/img]