solr

The beginning

There is one thing you must know first: the Suggest component is not available in Solr 1.4.1 and below. To start using it, you need to download the 3_x branch or the trunk version from the Lucene/Solr SVN repository.

Configuration
Before we get into the index configuration, we need to define a search component. So let’s do it:

<searchComponent name="suggest" class="solr.SpellCheckComponent">
  <lst name="spellchecker">
    <str name="name">suggest</str>
    <str name="classname">org.apache.solr.spelling.suggest.Suggester</str>
    <str name="lookupImpl">org.apache.solr.spelling.suggest.tst.TSTLookup</str>
    <str name="field">name_autocomplete</str>
  </lst>
</searchComponent>

It is worth mentioning that the Suggest component is based on solr.SpellCheckComponent, which is why we can use the above configuration. The configuration has three important attributes:

name – the name of the component, which the request handler later refers to through spellcheck.dictionary.
lookupImpl – the object that will handle the search. At this point there are two implementations to choose from: JaspellLookup and TSTLookup. The latter is the more efficient of the two.
field – the field from which suggestions are generated.
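
To make the role of lookupImpl more concrete, here is a minimal Python sketch of prefix lookup over a sorted list of terms. It only illustrates the idea. It is not how TSTLookup is implemented (that class uses a ternary search tree), it ignores any weighting of suggestions, and the function name prefix_lookup and the sample terms are assumptions made for this example.

```python
from bisect import bisect_left


def prefix_lookup(sorted_terms, prefix, count=10):
    """Return up to `count` terms from a sorted list that start with `prefix`."""
    # Binary search for the first term that is >= prefix.
    start = bisect_left(sorted_terms, prefix)
    matches = []
    for term in sorted_terms[start:]:
        # Terms sharing a prefix are contiguous in a sorted list,
        # so we can stop at the first non-matching term.
        if not term.startswith(prefix) or len(matches) >= count:
            break
        matches.append(term)
    return matches


# Hypothetical dictionary contents, for illustration only.
terms = sorted([
    "keyboard",
    "hard drive",
    "hard drive toshiba",
    "hard drive samsung",
    "hard drive seagate",
])

print(prefix_lookup(terms, "har"))
```

The count argument plays the same role as a limit on the number of returned suggestions.
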
Now let’s add the appropriate handler:

<requestHandler name="/suggest" class="org.apache.solr.handler.component.SearchHandler">
  <lst name="defaults">
    <str name="spellcheck">true</str>
    <str name="spellcheck.dictionary">suggest</str>
    <str name="spellcheck.count">10</str>
  </lst>
  <arr name="components">
    <str>suggest</str>
  </arr>
</requestHandler>

This is a fairly simple configuration. It defines a handler that uses our search component and tells Solr to return at most 10 suggestions and to use the dictionary named suggest, which is exactly the Suggest component we defined above.
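
Because these values sit in the handler's defaults list, a request can override them with its own parameters. The Python sketch below builds request URLs for the /suggest handler. The base URL http://localhost:8983/solr and the helper name suggest_url are assumptions for this example, so adjust them to your own installation.

```python
from urllib.parse import urlencode

# Assumed location of the Solr instance; change it to match your setup.
SOLR_BASE = "http://localhost:8983/solr"


def suggest_url(prefix, count=None, base=SOLR_BASE):
    """Build a URL for the /suggest handler.

    `count`, when given, overrides the handler's default spellcheck.count.
    """
    params = {"q": prefix}
    if count is not None:
        params["spellcheck.count"] = count
    return f"{base}/suggest?{urlencode(params)}"


print(suggest_url("har"))
print(suggest_url("hard dr", count=5))
```

Using urlencode keeps spaces and other special characters in the typed prefix properly escaped.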

Index
Let us assume that our document consists of three fields: id, name and description. We want to generate suggestions from the field that holds the name of the product. Our index definition could look like this:

<field name="id" type="string" indexed="true" stored="true" multiValued="false" required="true"/>
<field name="name" type="text" indexed="true" stored="true" multiValued="false" />
<field name="name_autocomplete" type="text_auto" indexed="true" stored="true" multiValued="false" />
<field name="description" type="text" indexed="true" stored="true" multiValued="false" />

In addition, there is the following copy field definition:

<copyField source="name" dest="name_autocomplete" />

Suggesting single words
To get suggestions of individual words, the text_auto type should be defined as follows:

<fieldType class="solr.TextField" name="text_auto" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

Suggesting phrases
To get suggestions of entire phrases, the text_auto type should be defined as follows:

<fieldType class="solr.TextField" name="text_auto">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

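If you want to suggest phrases, you may also want to define your own query converter, because the default one splits the query into separate tokens and suggestions are then generated for each token.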
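
The difference between the two text_auto definitions is easiest to see by looking at the terms each one would put into the dictionary for the same product name. The Python sketch below is a rough approximation and not Solr's analysis code: it deliberately ignores WordDelimiterFilterFactory, and the function names and the sample product name are assumptions for this example.

```python
def single_word_terms(name):
    """Approximate the single-word analyzer: split on whitespace, then lowercase.

    WordDelimiterFilterFactory is deliberately left out of this sketch.
    """
    return [token.lower() for token in name.split()]


def phrase_terms(name):
    """Approximate the phrase analyzer: keep the whole value as one lowercased term."""
    return [name.lower()]


product_name = "Hard Drive Samsung"  # hypothetical value of the name field

print(single_word_terms(product_name))
print(phrase_terms(product_name))
```

With the first definition a prefix can only match the start of an individual word. With the second it matches the start of the whole name, which is what phrase suggestions require.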

Dictionary building
Before we start using the component, we need to build its index. To do this, send the following command to Solr:

/suggest?spellcheck.build=true

Queries
Now we come to using the component. To show how it works, I decided to suggest whole phrases. An example query could look like this:

/suggest?q=har

After running that query I got the following suggestions:

<?xml version="1.0" encoding="UTF-8"?>
<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">0</int>
  </lst>
  <lst name="spellcheck">
    <lst name="suggestions">
      <lst name="har">
        <int name="numFound">4</int>
        <int name="startOffset">0</int>
        <int name="endOffset">3</int>
        <arr name="suggestion">
          <str>hard drive</str>
          <str>hard drive samsung</str>
          <str>hard drive seagate</str>
          <str>hard drive toshiba</str>
        </arr>
      </lst>
    </lst>
  </lst>
</response>

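On the client side, the suggestions have to be pulled out of this nested structure. The following Python sketch does that with the standard library XML parser. The function name extract_suggestions and the shortened sample response are assumptions for this example, and the code assumes the response has the same layout as the one shown above.

```python
import xml.etree.ElementTree as ET


def extract_suggestions(response_xml):
    """Map each query token to the list of suggestions returned for it."""
    root = ET.fromstring(response_xml)
    result = {}
    # Each child <lst> of "suggestions" describes one token of the query.
    path = "./lst[@name='spellcheck']/lst[@name='suggestions']/lst"
    for token in root.findall(path):
        result[token.get("name")] = [
            s.text for s in token.findall("./arr[@name='suggestion']/str")
        ]
    return result


# Shortened, hypothetical response used only to exercise the function.
SAMPLE = """<response>
  <lst name="spellcheck">
    <lst name="suggestions">
      <lst name="har">
        <int name="numFound">2</int>
        <arr name="suggestion">
          <str>hard drive</str>
          <str>hard drive samsung</str>
        </arr>
      </lst>
    </lst>
  </lst>
</response>"""

print(extract_suggestions(SAMPLE))
```

Returning a dictionary keyed by token keeps the code usable when a query contains more than one token.
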
The end
In the next part of the autocomplete series, I’ll show how to modify the configuration to use a static dictionary and how this can help you get better suggestions. The last part of the series will be a performance comparison in which I’ll try to determine which method is the fastest in various situations.