Introduction to Solr

Introduction to Apache Solr (xiangjiang5011@163.com)

Official Apache Solr site: http://lucene.apache.org/solr/

What Is Solr?

Solr is the popular, blazing fast open source enterprise search platform from the Apache Lucene project. Its major features include powerful full-text search, hit highlighting, faceted search, dynamic clustering, database integration, rich document (e.g., Word, PDF) handling, and geospatial search. Solr is highly scalable, providing distributed search and index replication, and it powers the search and navigation features of many of the world's largest internet sites.

Solr is written in Java and runs as a standalone full-text search server within a servlet container such as Tomcat. Solr uses the Lucene Java search library at its core for full-text indexing and search, and has REST-like HTTP/XML and JSON APIs that make it easy to use from virtually any programming language. Solr's powerful external configuration allows it to be tailored to almost any type of application without Java coding, and it has an extensive plugin architecture when more advanced customization is required.

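Solr has now reached version 3.5, but because of Chinese word-segmenter support (many Chinese segmenters do not support the newer Solr versions), Solr 1.4.1 is still the version most widely used inside companies, since it works with almost every Chinese segmenter. To cover all aspects of Solr, and to meet our company's needs, these notes are all about Solr 1.4.1.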

Installing a single Solr instance in Tomcat on Windows

Recommended reading: http://wiki.apache.org/solr/SolrTomcat

1. Prepare the directory layout

Create the directory D:/solrworkspace/

Download Solr 1.4.1 to D:/solrworkspace/apache-solr-1.4.1.zip

Download Tomcat 6.0 to D:/solrworkspace/apache-tomcat-6.0.30.zip

Unzip D:/solrworkspace/apache-solr-1.4.1.zip in place, giving D:/solrworkspace/apache-solr-1.4.1 ($APACHE_SOLR_HOME)

Unzip D:/solrworkspace/apache-tomcat-6.0.30.zip in place, giving D:/solrworkspace/apache-tomcat-6.0.30 ($TOMCAT_HOME)

Create the directory D:/solrworkspace/solr ($SOLR_HOME)

Create the directory $TOMCAT_HOME/conf/Catalina/localhost
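The preparation steps above can also be scripted. The following is only a minimal Python sketch, assuming both archives are already in the workspace and that each one unpacks into a single top-level folder named after the archive; `prepare_workspace` is a hypothetical helper, not part of Solr or Tomcat.

```python
import zipfile
from pathlib import Path


def prepare_workspace(root):
    """Unpack the Solr and Tomcat archives and create the extra directories.

    Hypothetical helper. Assumes root already holds apache-solr-1.4.1.zip and
    apache-tomcat-6.0.30.zip, each with one top-level folder inside.
    """
    root = Path(root)
    for archive in ("apache-solr-1.4.1.zip", "apache-tomcat-6.0.30.zip"):
        with zipfile.ZipFile(root / archive) as zf:
            zf.extractall(root)  # yields $APACHE_SOLR_HOME / $TOMCAT_HOME
    solr_home = root / "solr"  # $SOLR_HOME
    solr_home.mkdir(exist_ok=True)
    # Tomcat reads per-application context files (solr.xml) from here
    context_dir = root / "apache-tomcat-6.0.30" / "conf" / "Catalina" / "localhost"
    context_dir.mkdir(parents=True, exist_ok=True)
    return solr_home, context_dir
```

With the layout used in this article it would be called as `prepare_workspace("D:/solrworkspace")`.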

2. Deploy the WAR and JAR files

Copy $APACHE_SOLR_HOME/dist/apache-solr-1.4.1.war to D:/solrworkspace/apache-solr-1.4.1.war and rename it to D:/solrworkspace/solr.war

Copy $APACHE_SOLR_HOME/dist/apache-solr-*.jar into $TOMCAT_HOME\lib
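The two copy steps can be combined into one small Python sketch, assuming the directory names used above and that $TOMCAT_HOME/lib already exists; `deploy_war_and_jars` is a hypothetical helper. The WAR is copied and renamed in a single step.

```python
import shutil
from pathlib import Path


def deploy_war_and_jars(solr_dist, workspace, tomcat_lib):
    """Copy the Solr WAR as <workspace>/solr.war and the apache-solr-*.jar
    files into Tomcat's lib directory (hypothetical helper).

    solr_dist  -> $APACHE_SOLR_HOME/dist
    workspace  -> D:/solrworkspace
    tomcat_lib -> $TOMCAT_HOME/lib
    """
    solr_dist, workspace, tomcat_lib = Path(solr_dist), Path(workspace), Path(tomcat_lib)
    war = workspace / "solr.war"
    # copy and rename in one step
    shutil.copy2(solr_dist / "apache-solr-1.4.1.war", war)
    copied = []
    # the *.jar pattern does not match the .war file
    for jar in sorted(solr_dist.glob("apache-solr-*.jar")):
        shutil.copy2(jar, tomcat_lib / jar.name)
        copied.append(jar.name)
    return war, copied
```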

3. Edit the configuration files

Edit $TOMCAT_HOME/conf/tomcat-users.xml and add the following inside <tomcat-users>:

<role rolename="manager"/>
<role rolename="admin"/>
<user username="tomcat" password="tomcat" roles="manager,admin"/>
 
To support Chinese characters in request URLs, set URIEncoding="UTF-8" on the HTTP Connector in $TOMCAT_HOME/conf/server.xml:
<Server >
 <Service >
   <Connector port="8080" protocol="HTTP/1.1" 
 connectionTimeout="20000" 
 redirectPort="8443" URIEncoding="UTF-8" /> 
    ...
 </Service>
</Server>
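A quick way to see what URIEncoding changes: a client sends Chinese query text as UTF-8 percent-escapes, and Tomcat 6 decodes those escapes as ISO-8859-1 when URIEncoding is not set. The Python sketch below only simulates both decodings with `urllib.parse`; `decode_query` is a hypothetical name and no request is sent to Tomcat.

```python
from urllib.parse import quote, unquote


def decode_query(raw_query, uri_encoding):
    """Decode percent-escapes the way a server would, using the given charset."""
    return unquote(raw_query, encoding=uri_encoding)


# The client sends the query text as UTF-8 percent-escapes
raw = quote("中文分词", encoding="utf-8")  # "%E4%B8%AD%E6%96%87%E5%88%86%E8%AF%8D"

# Without URIEncoding, Tomcat 6 falls back to ISO-8859-1: 12 garbled characters
garbled = decode_query(raw, "iso-8859-1")

# With URIEncoding="UTF-8" the original text comes back intact
correct = decode_query(raw, "utf-8")
```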

In $TOMCAT_HOME/conf/Catalina/localhost, create solr.xml with the following content:

<?xml version="1.0" encoding="UTF-8"?>
<Context docBase="D:\solrworkspace\solr.war" debug="0" crossContext="true" >
<Environment name="solr/home" type="java.lang.String" value="D:\solrworkspace\solr" override="true" />
</Context>
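If you prefer to generate solr.xml rather than type it, a minimal Python sketch using `xml.etree.ElementTree` might look like this; `build_solr_context` is a hypothetical helper and the paths are the ones used in this article.

```python
import xml.etree.ElementTree as ET


def build_solr_context(war_path, solr_home):
    """Return the text of a Tomcat context file that deploys solr.war and
    sets the JNDI entry solr/home (hypothetical helper)."""
    context = ET.Element("Context", {
        "docBase": war_path,
        "debug": "0",
        "crossContext": "true",
    })
    # solr/home tells the Solr webapp where $SOLR_HOME is
    ET.SubElement(context, "Environment", {
        "name": "solr/home",
        "type": "java.lang.String",
        "value": solr_home,
        "override": "true",
    })
    body = ET.tostring(context, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body + "\n"
```

The returned text would then be written, with UTF-8 encoding, to $TOMCAT_HOME/conf/Catalina/localhost/solr.xml, e.g. from `build_solr_context(r"D:\solrworkspace\solr.war", r"D:\solrworkspace\solr")`.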

4. Deploy the Solr configuration files

Copy everything under $APACHE_SOLR_HOME/example/solr into $SOLR_HOME
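Before starting Tomcat it is worth checking that the copy landed in the right place: with this single-core layout, Solr loads conf/schema.xml and conf/solrconfig.xml directly under $SOLR_HOME. A small Python check, with the hypothetical name `missing_solr_conf`:

```python
from pathlib import Path

# Config files a single Solr 1.4.x core reads from $SOLR_HOME/conf
REQUIRED_CONF_FILES = ("schema.xml", "solrconfig.xml")


def missing_solr_conf(solr_home):
    """Return the required config files missing from <solr_home>/conf."""
    conf = Path(solr_home) / "conf"
    return [name for name in REQUIRED_CONF_FILES if not (conf / name).is_file()]
```

An empty list from `missing_solr_conf("D:/solrworkspace/solr")` means both files are where Solr expects them.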

5. Start Tomcat

($TOMCAT_HOME)/bin/startup.bat

6. Open the Solr admin page at http://localhost:8080/solr
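Besides the admin page, the same instance can be queried over HTTP. The sketch below assumes the default /select handler from the example solrconfig.xml and Solr's JSON response writer (`wt=json`); `build_select_url` and `num_found` are hypothetical helper names.

```python
import json
from urllib.parse import urlencode


def build_select_url(base_url, query, rows=10):
    """Build a /select URL asking for a JSON response; the query is
    percent-encoded as UTF-8, matching URIEncoding="UTF-8" in server.xml."""
    params = urlencode({"q": query, "rows": rows, "wt": "json"}, encoding="utf-8")
    return base_url.rstrip("/") + "/select?" + params


def num_found(response_text):
    """Extract the total hit count from a Solr JSON search response."""
    return json.loads(response_text)["response"]["numFound"]
```

For example, `urllib.request.urlopen(build_select_url("http://localhost:8080/solr", "*:*")).read().decode("utf-8")` fetches the raw response text, which can then be passed to `num_found`.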

Adding the mmseg4j Chinese word segmenter to Solr

mmseg4j 1.8.3 supports Solr 1.4.1; the latest release, mmseg4j 1.8.5, is too new and does not support Solr 1.4.1.

Create two folders, lib and dic, under $SOLR_HOME

Download mmseg4j-1.8.3.zip to D:/solrworkspace/mmseg4j-1.8.3.zip

Unzip D:/solrworkspace/mmseg4j-1.8.3.zip to D:/solrworkspace/mmseg4j-1.8.3 ($MMSEG_HOME)

Copy the *.dic files from $MMSEG_HOME/data into $SOLR_HOME/dic

Copy $MMSEG_HOME/mmseg4j-all-1.8.3.jar into $SOLR_HOME/lib
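The mmseg4j file layout can be scripted the same way. This is a minimal sketch, assuming the unpacked archive has the `data/*.dic` files and `mmseg4j-all-1.8.3.jar` in the places described above; `install_mmseg4j` is a hypothetical helper.

```python
import shutil
from pathlib import Path


def install_mmseg4j(mmseg_home, solr_home, version="1.8.3"):
    """Copy mmseg4j dictionaries to <solr_home>/dic and the jar to
    <solr_home>/lib (hypothetical helper). Returns the dictionary names."""
    mmseg_home, solr_home = Path(mmseg_home), Path(solr_home)
    dic_dir, lib_dir = solr_home / "dic", solr_home / "lib"
    dic_dir.mkdir(parents=True, exist_ok=True)
    lib_dir.mkdir(parents=True, exist_ok=True)
    # only *.dic files are copied; dicPath="dic" in schema.xml points here
    dics = sorted(p.name for p in (mmseg_home / "data").glob("*.dic"))
    for name in dics:
        shutil.copy2(mmseg_home / "data" / name, dic_dir / name)
    jar = "mmseg4j-all-%s.jar" % version
    shutil.copy2(mmseg_home / jar, lib_dir / jar)
    return dics
```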

Edit $SOLR_HOME/conf/schema.xml

Copy the existing text field type:

<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
<analyzer type="index">

<tokenizer class="solr.WhitespaceTokenizerFactory"/>

<filter class="solr.StopFilterFactory"
ignoreCase="true"
words="stopwords.txt"
enablePositionIncrements="true" />

......

</analyzer>

</fieldType>

and turn it into the following three field types:

<fieldType name="text_mmseg_complex" class="solr.TextField" positionIncrementGap="100">
<analyzer type="index">

<tokenizer class="com.chenlb.mmseg4j.solr.MMSegTokenizerFactory" mode="complex" dicPath="dic"/>

<filter class="solr.StopFilterFactory"
ignoreCase="true"
words="stopwords.txt"
enablePositionIncrements="true" />

......

</analyzer>

</fieldType>

<fieldType name="text_mmseg_max_word" class="solr.TextField" positionIncrementGap="100">
<analyzer type="index">

<tokenizer class="com.chenlb.mmseg4j.solr.MMSegTokenizerFactory" mode="max-word" dicPath="dic"/>

<filter class="solr.StopFilterFactory"
ignoreCase="true"
words="stopwords.txt"
enablePositionIncrements="true" />

......

</analyzer>

</fieldType>

<fieldType name="text_mmseg_simple" class="solr.TextField" positionIncrementGap="100">
<analyzer type="index">

<tokenizer class="com.chenlb.mmseg4j.solr.MMSegTokenizerFactory" mode="simple" dicPath="dic"/>

<filter class="solr.StopFilterFactory"
ignoreCase="true"
words="stopwords.txt"
enablePositionIncrements="true" />

......

</analyzer>

</fieldType>
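Since the three field types differ only in their name and the tokenizer's `mode`, they can also be generated. The Python sketch below emits only the tokenizer and the StopFilterFactory shown above; any filters elided with `......` are not reproduced, and `mmseg_field_type` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

TOKENIZER = "com.chenlb.mmseg4j.solr.MMSegTokenizerFactory"


def mmseg_field_type(mode, dic_path="dic"):
    """Build one <fieldType> whose index analyzer uses the mmseg4j tokenizer
    in the given mode, followed by StopFilterFactory (hypothetical helper)."""
    name = "text_mmseg_" + mode.replace("-", "_")  # max-word -> max_word
    ft = ET.Element("fieldType", {"name": name, "class": "solr.TextField",
                                  "positionIncrementGap": "100"})
    analyzer = ET.SubElement(ft, "analyzer", {"type": "index"})
    ET.SubElement(analyzer, "tokenizer", {"class": TOKENIZER, "mode": mode,
                                          "dicPath": dic_path})
    ET.SubElement(analyzer, "filter", {"class": "solr.StopFilterFactory",
                                       "ignoreCase": "true",
                                       "words": "stopwords.txt",
                                       "enablePositionIncrements": "true"})
    return ft


# Print the three variants, ready to paste into schema.xml
for m in ("complex", "max-word", "simple"):
    print(ET.tostring(mmseg_field_type(m), encoding="unicode"))
```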

Then add three fields:

<fields>

<field name="textMmsegComplex" type="text_mmseg_complex" indexed="true" stored="false"/>

<field name="textMmsegMaxWord" type="text_mmseg_max_word" indexed="true" stored="false"/>

<field name="textMmsegSimple" type="text_mmseg_simple" indexed="true" stored="false"/>

</fields>
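To get some Chinese text into these fields, Solr 1.4 accepts XML update messages posted to /solr/update. The sketch below only builds such a message; it assumes the `id` uniqueKey field from the example schema, and `build_add_xml` is a hypothetical helper. Because the three fields are `stored="false"`, their values are searchable but will not appear in query results.

```python
import xml.etree.ElementTree as ET

MMSEG_FIELDS = ("textMmsegComplex", "textMmsegMaxWord", "textMmsegSimple")


def build_add_xml(doc_id, text):
    """Build an <add><doc> update message that puts the same text into all
    three mmseg4j fields (hypothetical helper; assumes an "id" uniqueKey)."""
    add = ET.Element("add")
    doc = ET.SubElement(add, "doc")
    ET.SubElement(doc, "field", {"name": "id"}).text = doc_id
    for name in MMSEG_FIELDS:
        # each field is tokenized with a different mmseg4j mode
        ET.SubElement(doc, "field", {"name": name}).text = text
    return ET.tostring(add, encoding="unicode")
```

The result would be POSTed to http://localhost:8080/solr/update with Content-Type `text/xml; charset=utf-8`, followed by a `<commit/>` message so the document becomes visible to searches.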

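Open http://localhost:8080/solr/admin/analysis.jsp

In the Field selector choose name, and in the input box next to it enter one of textMmsegComplex, textMmsegMaxWord or textMmsegSimple; these correspond to mmseg4j's three segmentation modes (complex, max-word and simple).

Field value (Index) is the text analyzed as it would be at index time; Field value (Query) is the text analyzed as it would be at query time.

Type the sentence or phrase you want segmented into the corresponding input box.

Click Analyze to see how the text is tokenized for indexing and for querying.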
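The same analysis can be requested programmatically. This sketch assumes the `/analysis/field` handler (FieldAnalysisRequestHandler) is registered in your solrconfig.xml and that it accepts the `analysis.fieldname`, `analysis.fieldvalue` and `analysis.query` parameters; check both against your Solr version. `build_field_analysis_url` is a hypothetical helper.

```python
from urllib.parse import urlencode


def build_field_analysis_url(base_url, field_name, index_text, query_text):
    """Build a field-analysis request, a scripted counterpart of analysis.jsp.

    Assumes /analysis/field is registered in solrconfig.xml (hypothetical helper).
    """
    params = urlencode({
        "analysis.fieldname": field_name,   # e.g. textMmsegComplex
        "analysis.fieldvalue": index_text,  # analyzed as at index time
        "analysis.query": query_text,       # analyzed as at query time
        "wt": "json",
    }, encoding="utf-8")
    return base_url.rstrip("/") + "/analysis/field?" + params
```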
