3. Basic Usage of Solr

chunku3625

Original link: https://my.oschina.net/maiomiao/blog/879999

1. Schema.xml

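schema.xml lives in the conf directory of a SolrCore. It declares the fields of the index and the types those fields use.

A field must be defined in Solr before it can be used.

field: defines a Solr field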

<field name="id" type="string" indexed="true" stored="true" required="true" multiValued="false" />

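name: the field name
type: the field type (which must itself be defined as a fieldType)
indexed: whether the field is indexed, i.e. searchable
stored: whether the original value is stored so that it can be returned in results
required: whether the field is mandatory; usually only id is required
multiValued: whether the field may hold multiple values. If true, the values are stored as a list, for example the pictures of a product (large, small, high-resolution, and so on)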
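
To make multiValued concrete, here is a minimal sketch. The field name product_images and the sample values are assumptions made up for illustration; they are not part of the schema in this article. The first fragment would go in schema.xml, the second is a document in Solr's XML update format, where a multi-valued field is supplied by repeating the field element.

```xml
<!-- schema.xml: hypothetical multi-valued field for product pictures -->
<field name="product_images" type="string" indexed="false" stored="true" multiValued="true" />

<!-- Update message: repeat the field element once per value -->
<add>
  <doc>
    <field name="id">1001</field>
    <field name="product_images">large.jpg</field>
    <field name="product_images">small.jpg</field>
    <field name="product_images">hd.jpg</field>
  </doc>
</add>
```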

dynamicField: defines a dynamic field, whose name is a pattern rather than a fixed name

<dynamicField name="*_is" type="int"    indexed="true"  stored="true"  multiValued="true"/>

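name: the name pattern of the field. It is a wildcard expression; the pattern in this example matches any field name ending in _is, such as xxx_is
type: the field type applied to every matching field
indexed: whether the field is indexed
stored: whether the value is stored
multiValued: whether the field may hold multiple values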
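
As a sketch of how the dynamic field above is used: a document can send a field that is not declared anywhere in schema.xml, as long as its name matches the pattern. The field name sizes_is and its values are hypothetical.

```xml
<add>
  <doc>
    <field name="id">1002</field>
    <!-- sizes_is is not declared explicitly; it matches the *_is pattern,
         so it is indexed as a multi-valued int field -->
    <field name="sizes_is">38</field>
    <field name="sizes_is">40</field>
  </doc>
</add>
```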

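uniqueKey: designates the primary key field. Because id is declared as the unique key here, every document we submit must contain an id field. Each document should have a unique key.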

<uniqueKey>id</uniqueKey>

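copyField: a copy field. As the name suggests, it copies the content of a source field into a destination field when a document is indexed.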


<field name="cat" type="string" indexed="true" stored="true" multiValued="true"/>

.....

<field name="text" type="text_general" indexed="true" stored="false" multiValued="true"/>

.....

<copyField source="cat" dest="text"/>

source: the source field
dest: the destination field
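
Two variations on the copyField above, shown only as a sketch. The source attribute may contain a wildcard, and the optional maxChars attribute limits how many characters are copied from each value. The *_t pattern and the limit of 256 are arbitrary choices for illustration.

```xml
<!-- Copy every field whose name ends in _t into the catch-all text field -->
<copyField source="*_t" dest="text"/>

<!-- Copy at most the first 256 characters of each cat value -->
<copyField source="cat" dest="text" maxChars="256"/>
```

Because several sources can feed the same destination, the destination field (text here) is declared multiValued. It is indexed but not stored, so it can be searched but is not returned in results.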

fieldType: defines a field type


<fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
    <!-- in this example, we will only use synonyms at query time
    <filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
    -->
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

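name: the name of the field type
class: the Solr class that implements the type
positionIncrementGap: the position gap inserted between the values of a multiValued field, which keeps phrase queries from matching across two separate values
analyzer: specifies the analyzer
|---type="index": the analyzer used at index time
|---type="query": the analyzer used at query time
tokenizer: the tokenizer, which splits text into tokens
filter: a token filter applied to the tokenizer's output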
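
For contrast with text_general, a field type does not need separate index and query analyzers. The sketch below follows the text_ws type from the Solr 4 example schema: a single analyzer element without a type attribute is used at both index and query time, and it only splits on whitespace.

```xml
<!-- One analyzer for both indexing and querying; no filters -->
<fieldType name="text_ws" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  </analyzer>
</fieldType>
```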

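2. Configuring a Chinese Analyzer

Several Chinese analyzers are available; here we use IK Analyzer.

Download IK Analyzer (https://git.oschina.net/wltea/IK-Analyzer-2012FF/repository/archive/master) and unpack it.
Copy the analyzer jar (IKAnalyzer2012FF_u1.jar; note that this build targets Solr 4) into the web application's lib directory (${tomcathome}/webapps/solr/WEB-INF/lib). Solr does not ship this jar, so you have to download it yourself (http://mvnrepository.com/).
Copy the IK Analyzer configuration file (IKAnalyzer.cfg.xml), the stopword dictionary (stopword.dic), and any custom dictionaries to ${tomcathome}/webapps/solr/WEB-INF/classes (create the directory if it does not exist). IK Analyzer loads these files from the classpath root, and loose files placed in WEB-INF/lib are not on the classpath.
Edit the schema.xml of the SolrCore that needs Chinese analysis and add the field type and fields for it: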

	<!-- Field type that uses the Chinese analyzer -->
	<fieldType name="text_ik" class="solr.TextField">
		<analyzer class="org.wltea.analyzer.lucene.IKAnalyzer"/>
	</fieldType>

	<!-- Title and content fields analyzed with the Chinese analyzer -->
	<field name="title_ik" type="text_ik" indexed="true" stored="true" multiValued="false" />
	<field name="content_ik" type="text_ik" indexed="true" stored="true" multiValued="false" />

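Restart Tomcat. On the Analysis page of the Solr admin UI, the field name / field type drop-down now lists the two fields and the one field type configured above.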
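
For reference, the IKAnalyzer.cfg.xml copied in the steps above is a Java properties file in XML form. The sketch below shows its usual layout; check it against the file shipped in your download. The custom dictionary name custom.dic is a hypothetical example, not a file mentioned in this article.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE properties SYSTEM "http://java.sun.com/dtd/properties.dtd">
<properties>
  <comment>IK Analyzer extension configuration</comment>
  <!-- Custom dictionaries; separate several files with semicolons.
       custom.dic is a hypothetical file name -->
  <entry key="ext_dict">custom.dic;</entry>
  <!-- Stopword dictionaries -->
  <entry key="ext_stopwords">stopword.dic;</entry>
</properties>
```

The dictionary files are plain text with one word per line and should be saved as UTF-8.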

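3. Configuring the Business Fields

To use Solr for product search on an e-commerce site, the product data stored in a MySQL database has to be indexed in Solr. That requires defining the product fields in Solr's schema.xml.

Structure of the products table (the original table image is missing; the import query in section 4 selects the columns pid, name, catalog, catalog_name, price, description, and pricture)

Define a field for each product column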

1. Product id (pid): schema.xml already defines id as the unique key, so nothing more needs to be configured

<field name="id" type="string" indexed="true" stored="true" required="true" multiValued="false" />

2. Product title (name): needs tokenization

<field name="product_name" type="text_ik" indexed="true" stored="true" />

3. Product category (catalog):

<field name="product_catalog" type="string" indexed="true" stored="true" />

4. Product category name (catalog_name):

<field name="product_catalog_name" type="text_ik" indexed="true" stored="true" />

5. Product price (price):

<field name="product_price" type="float" indexed="true" stored="true" />

6. Product description (description):

<field name="product_description" type="text_ik" indexed="true" stored="false" />

7. Product picture (pricture):

<field name="product_pricture" type="string" indexed="true" stored="true" />

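Copy fields: searches usually cover the product name and the description together, so we need one field that holds both

<!-- First define a keyword field that holds the product name and description -->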

<field name="product_keywords" type="text_ik" indexed="true" stored="false"  multiValued="true" />


<copyField source="product_name" dest="product_keywords"/>
<copyField source="product_description" dest="product_keywords"/>
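
One way to put product_keywords to work, shown only as a sketch: make it the default search field of the /select handler in solrconfig.xml, so that a query without an explicit field name searches the name and the description together. The handler layout follows the Solr 4 example solrconfig.xml; using product_keywords as df is an assumption, not a step from the original article.

```xml
<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="echoParams">explicit</str>
    <int name="rows">10</int>
    <!-- Default field for queries that do not name a field -->
    <str name="df">product_keywords</str>
  </lst>
</requestHandler>
```

Since product_keywords is not stored, it only serves matching; the values shown in results still come from stored fields such as product_name.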

4. The DataImportHandler Plugin

DataImportHandler queries data from a database and imports it into the index.

Copy solr-dataimporthandler-4.10.4.jar (located in ${solrinstall}/dist) and the MySQL JDBC driver jar into Tomcat's webapps/solr/WEB-INF/lib directory.
Configure DataImportHandler: register the request handler in solrconfig.xml

<requestHandler name="/dataimport" class="org.apache.solr.handler.dataimport.DataImportHandler">
	<lst name="defaults">
		<str name="config">data-config.xml</str>
	</lst>
</requestHandler>

Add data-config.xml: the handler configuration above sets config to data-config.xml, so create that file in the conf directory of the SolrCore with the following content:

<?xml version="1.0" encoding="UTF-8" ?>
<dataConfig>
	<dataSource type="JdbcDataSource"
		driver="com.mysql.jdbc.Driver"
		url="jdbc:mysql://127.0.0.1:3306/taotao"
		user="root"
		password="123456"/>
	<document name="products_DOC">
		<entity name="products" query="SELECT pid,name,catalog,catalog_name,price,description,pricture FROM products">
			<field name="id" column="pid" />
			<field name="product_name" column="name" />
			<field name="product_catalog" column="catalog" />
			<field name="product_catalog_name" column="catalog_name" />
			<field name="product_price" column="price" />
			<field name="product_description" column="description" />
			<field name="product_pricture" column="pricture" />
		</entity>
	</document>
</dataConfig>

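dataSource holds the connection settings of the data source. document describes what is imported from the database into the SolrCore index: the name of each field is one of the fields defined earlier in schema.xml, and column is the database column it is filled from.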
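
As an alternative sketch, the explicit field elements can be left out if the query itself returns columns named like the schema fields, because DataImportHandler maps a result column to the schema field of the same name. This assumes the JDBC driver reports the SQL aliases as column labels, so verify the result with a small import first.

```xml
<document name="products_DOC">
  <!-- No field elements: each aliased column maps to the schema field
       with the same name -->
  <entity name="products"
          query="SELECT pid AS id,
                        name AS product_name,
                        catalog AS product_catalog,
                        catalog_name AS product_catalog_name,
                        price AS product_price,
                        description AS product_description,
                        pricture AS product_pricture
                 FROM products" />
</document>
```

The explicit mapping in the original configuration is more verbose but keeps the SQL short and makes the column-to-field relationship easy to read.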

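Restart Tomcat, then open the Dataimport page of the core in the Solr admin UI and run the import with the default command (full-import).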
