Solr and Lucene: Configuration and Usage

Licht1988

Tags: java, database

Original post: http://www.cnblogs.com/AngeLeyes/p/7337931.html

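Solr field types

Class	Description
BinaryField	Binary data
BoolField	Boolean; 't', 'T' and '1' are all read as `true`
CollationField	Unicode-aware (locale-specific) sorting
CurrencyField	Currency values and exchange rates
DateRangeField	Indexing date ranges
ExternalFileField	Pulls values from a file on disk
EnumField	A defined set of enumerated values
ICUCollationField	Unicode-aware sorting (ICU-based)
LatLonType	Latitude/longitude coordinates
PointType	Arbitrary n-dimensional points, e.g. for searching CAD drawing data
PreAnalyzedField	Serialized, pre-analyzed data
RandomSortField	Random ordering of results
SpatialRecursivePrefixTreeFieldType	Lat/lon strings or WKT shapes
StrField	String, not tokenized; must be smaller than 32 KB
TextField	Text that is both tokenized (analyzed) and indexed
TrieDateField	Date
TrieDoubleField	Double-precision floating point
TrieField	Requires a "type" attribute (integer, long, float, double, date)
TrieFloatField	Single-precision floating point
TrieIntField	Integer
TrieLongField	Long integer
UUIDField	Universally Unique Identifier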

Date fields

<fieldType name="tdate" class="solr.TrieDateField" omitNorms="true"
precisionStep="6" positionIncrementGap="0"/>

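Use this type when you need date range queries. Dates must be submitted in a specific format: an ISO 8601 UTC timestamp containing the characters T and Z. For example, if a timestamp field is of type tdate, submit it like this: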

<add>
<doc>
...
<field name="timestamp">2012-05-22T09:30:22Z</field>
...
</doc>
</add>
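For illustration, the following Java sketch (standard library only; the class name `SolrDates` is hypothetical) converts a local wall-clock time into the UTC form shown above. It assumes the source time is Beijing time (`Asia/Shanghai`, UTC+8); change the zone to match your data.

```java
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.format.DateTimeFormatter;

class SolrDates {
    // Converts a local wall-clock time in the given zone into the UTC
    // "yyyy-MM-ddTHH:mm:ssZ" form that Solr date fields expect.
    static String toSolrDate(LocalDateTime local, ZoneId zone) {
        Instant instant = local.atZone(zone).toInstant();
        // ISO_INSTANT always renders in UTC with a trailing 'Z';
        // fractional seconds are printed only when they are non-zero.
        return DateTimeFormatter.ISO_INSTANT.format(instant);
    }

    public static void main(String[] args) {
        // Hypothetical input: 17:30:22 Beijing time is 09:30:22 UTC.
        LocalDateTime beijing = LocalDateTime.of(2012, 5, 22, 17, 30, 22);
        String value = toSolrDate(beijing, ZoneId.of("Asia/Shanghai"));
        System.out.println("<field name=\"timestamp\">" + value + "</field>");
    }
}
```

Converting to UTC before indexing keeps all stored dates comparable, so range queries behave the same regardless of where the data came from.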

If you do not need precision down to minutes and seconds, you can round the value, here down to the hour:

<field name="timestamp">2016-05-22T09:30:22Z/HOUR</field>
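Appending `/HOUR` is Solr date math: the value is rounded down to the start of the hour. Assuming Solr's default UTC time zone for rounding, the effect on the example value can be reproduced with `java.time` as below; `DateMathRounding` is a hypothetical helper that only mimics the result and does not call Solr.

```java
import java.time.Instant;
import java.time.temporal.ChronoUnit;

class DateMathRounding {
    // Mimics the effect of appending "/HOUR" to a UTC date value:
    // minutes, seconds and fractions are dropped (rounded down).
    static String roundToHour(String isoUtc) {
        Instant parsed = Instant.parse(isoUtc);
        return parsed.truncatedTo(ChronoUnit.HOURS).toString();
    }

    public static void main(String[] args) {
        System.out.println(roundToHour("2016-05-22T09:30:22Z")); // 2016-05-22T09:00:00Z
    }
}
```

Coarser stored values mean fewer distinct terms in the index, which is the usual reason to round when second-level precision is not needed.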

Defining field types (fieldType)

A field type that ships with Solr

Open collection1\conf\schema.xml in the SolrCore you configured:

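<!--  Attributes of a fieldType include name, class, positionIncrementGap and other parameters:

name: the name of this field type

class: the implementing class, here solr.TextField. solr.TextField lets you customize indexing and querying through analyzers; an analyzer consists of one tokenizer and any number of filters.

positionIncrementGap: optional. The position gap inserted between the values of a multivalued field in the same document, so that a phrase query does not accidentally match across two values (unless its slop is large enough to bridge the gap). 100 is the usual value.

The most important part of a fieldType definition is the analyzer this type uses at index time and at query time, i.e. how it tokenizes and filters.

Index analyzer: solr.StandardTokenizerFactory (standard tokenizer), solr.StopFilterFactory (stopword filter), solr.LowerCaseFilterFactory (lowercase filter).
Query analyzer: the same standard tokenizer, stopword filter and lowercase filter, plus solr.SynonymFilterFactory (synonym filter).


-->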


<fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
  
  <!-- index analyzer -->
      <analyzer type="index">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
        <!-- in this example, we will only use synonyms at query time
        <filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
        -->
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
  
   <!-- query analyzer -->
      <analyzer type="query">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
        <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
  
    </fieldType>
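To make the analyzer chain more concrete, here is a plain-Java sketch of the index-time pipeline of `text_general` (tokenizer, then stop filter, then lowercase filter). It is only an approximation: the tokenizer here simply splits on non-alphanumeric characters, whereas `StandardTokenizerFactory` applies Unicode word-break rules, and the stopword set is a hypothetical stand-in for stopwords.txt.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

class AnalyzerChainSketch {
    // Simplified stand-in for StandardTokenizerFactory: splits on any run of
    // characters that are neither letters nor digits.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Like StopFilterFactory with ignoreCase="true": drops tokens whose
    // lower-cased form is in the (lower-case) stopword set.
    static List<String> stopFilter(List<String> tokens, Set<String> stopwords) {
        List<String> out = new ArrayList<>();
        for (String t : tokens) {
            if (!stopwords.contains(t.toLowerCase(Locale.ROOT))) {
                out.add(t);
            }
        }
        return out;
    }

    // Like LowerCaseFilterFactory.
    static List<String> lowerCase(List<String> tokens) {
        List<String> out = new ArrayList<>();
        for (String t : tokens) {
            out.add(t.toLowerCase(Locale.ROOT));
        }
        return out;
    }

    // Index-time chain of text_general: tokenizer -> stop filter -> lowercase.
    static List<String> analyzeForIndex(String text, Set<String> stopwords) {
        return lowerCase(stopFilter(tokenize(text), stopwords));
    }

    public static void main(String[] args) {
        Set<String> stopwords = Set.of("a", "an", "the");
        System.out.println(analyzeForIndex("The Quick Brown Fox", stopwords)); // [quick, brown, fox]
    }
}
```

At query time the same kind of chain runs, with the synonym filter inserted before lowercasing; a document matches when the query-side terms equal terms produced at index time.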

Custom fields

Open collection1\conf\schema.xml in the SolrCore you configured.

Add the following, and the field is ready to use:

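<!-- Custom field
A field definition includes these attributes:
name: the field name
type: one of the fieldTypes defined earlier
indexed: whether the field is indexed
stored: whether the field value is stored
multiValued: set to true if the field holds multiple values. Solr allows one field to store several values, e.g. a user's friend IDs, or a product's pictures (a large and a small image)
required: the field must be present; usually used for the primary key id
-->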
<field name="myname" type="text_general" indexed="true" stored="true" multiValued="true" required="true"  />

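Copy fields

Use case: a product has the name "Nike T-shirt" and the description "a sports T-shirt". If a single search should match the content of either field, copy those fields into one copy field and search that field instead. Because the destination receives values from several source fields, it must be declared multiValued="true".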

   <copyField source="cat" dest="text"/>
   <copyField source="name" dest="text"/>
   <copyField source="manu" dest="text"/>
   <copyField source="title" dest="text"/>
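The copy behaviour can be sketched in plain Java as follows. `CopyFieldSketch` is a hypothetical illustration, not Solr code: each rule appends the raw values of the source field (Solr copies the input before analysis) to the destination field, and the example document values are made up.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class CopyFieldSketch {
    // Each rule is {source, dest}, mirroring <copyField source=".." dest=".."/>.
    static Map<String, List<String>> applyCopyFields(Map<String, List<String>> doc, String[][] rules) {
        Map<String, List<String>> result = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> e : doc.entrySet()) {
            result.put(e.getKey(), new ArrayList<>(e.getValue()));
        }
        for (String[] rule : rules) {
            List<String> values = doc.get(rule[0]);
            if (values == null) {
                continue; // source field absent in this document
            }
            // Values from several sources accumulate in one destination,
            // which is why the destination field must be multiValued.
            result.computeIfAbsent(rule[1], k -> new ArrayList<>()).addAll(values);
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, List<String>> doc = new LinkedHashMap<>();
        doc.put("name", List.of("Nike T-shirt"));
        doc.put("cat", List.of("sports"));
        String[][] rules = {{"cat", "text"}, {"name", "text"}, {"manu", "text"}, {"title", "text"}};
        System.out.println(applyCopyFields(doc, rules).get("text")); // [sports, Nike T-shirt]
    }
}
```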

Queries

Filter queries

product_price:[1 TO 10]

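Filters for products priced from 1 to 10 (both bounds inclusive).
An asterisk "*" stands for an open end, for example:
20 and above: product_price:[20 TO *]
20 and below: product_price:[* TO 20]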
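A small helper can assemble such range expressions; this is a hypothetical sketch in which a `null` bound becomes `*`. Square brackets make both bounds inclusive; Solr also accepts curly braces for exclusive bounds, which this helper does not cover.

```java
class RangeQuery {
    // Builds an inclusive Solr range query; a null bound becomes "*" (open-ended).
    static String range(String field, Number lower, Number upper) {
        String lo = lower == null ? "*" : lower.toString();
        String hi = upper == null ? "*" : upper.toString();
        return field + ":[" + lo + " TO " + hi + "]";
    }

    public static void main(String[] args) {
        System.out.println(range("product_price", 1, 10));    // product_price:[1 TO 10]
        System.out.println(range("product_price", 20, null)); // product_price:[20 TO *]
        System.out.println(range("product_price", null, 20)); // product_price:[* TO 20]
    }
}
```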

Sorting, descending or ascending (value of the sort parameter)

product_price desc
product_price asc
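The filter and the sort order are passed as the `fq` and `sort` request parameters of the `/select` handler. The sketch below assembles such a URL with the JDK's `URLEncoder` (Java 10+ overload); the host, port and core name in `CORE_URL` are assumptions for a Tomcat deployment, so adjust them to your installation.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

class SelectUrl {
    // Hypothetical core URL; adjust host, port and core name to your setup.
    static final String CORE_URL = "http://localhost:8080/solr/collection1";

    // Joins URL-encoded parameters onto the /select handler of the core.
    static String build(Map<String, String> params) {
        StringJoiner query = new StringJoiner("&");
        for (Map.Entry<String, String> p : params.entrySet()) {
            query.add(p.getKey() + "=" + URLEncoder.encode(p.getValue(), StandardCharsets.UTF_8));
        }
        return CORE_URL + "/select?" + query;
    }

    public static void main(String[] args) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("q", "*:*");
        params.put("fq", "product_price:[1 TO 10]");
        params.put("sort", "product_price desc");
        params.put("wt", "json");
        System.out.println(build(params));
    }
}
```

Putting the price range into `fq` rather than `q` keeps it out of relevance scoring and lets Solr cache the filter separately.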

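Update / modify

Adding a document whose id already exists replaces the stored document, so an update is simply a re-add: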

{"id":"change.me","title":"change.me"}
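Outside the admin page, the same update can be sent over HTTP. This Java 11+ sketch (using `java.net.http`) POSTs a JSON array of documents to the `/update` handler with `commit=true`; `CORE_URL` is a hypothetical address, and actually sending the request requires a running Solr instance.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

class JsonUpdate {
    // Hypothetical core URL; adjust to your installation.
    static final String CORE_URL = "http://localhost:8080/solr/collection1";

    // Builds a POST to /update; commit=true makes the change visible immediately.
    // The body is a JSON array of documents.
    static HttpRequest updateRequest(String jsonDocsArray) {
        return HttpRequest.newBuilder(URI.create(CORE_URL + "/update?commit=true"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(jsonDocsArray))
                .build();
    }

    public static void main(String[] args) throws Exception {
        HttpRequest request = updateRequest("[{\"id\":\"change.me\",\"title\":\"change.me\"}]");
        // Sending requires a running Solr instance at CORE_URL.
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode() + " " + response.body());
    }
}
```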

Delete

Delete by id

<delete>
<id>change.me</id>
</delete>
<commit/>

Delete by query

<delete>
<query>id:change.me</query>
</delete>
<commit/>
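Both delete commands are plain XML messages posted to `/update`. The hypothetical helper below builds them and escapes XML special characters in the id or query; instead of a separate `<commit/>` message, it assumes you add `commit=true` to the update URL when sending.

```java
class DeleteXml {
    // Escapes the characters that are special in XML text content ('&' first).
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    static String byId(String id) {
        return "<delete><id>" + escape(id) + "</id></delete>";
    }

    static String byQuery(String query) {
        return "<delete><query>" + escape(query) + "</query></delete>";
    }

    public static void main(String[] args) {
        // POST either string to /update with Content-Type text/xml.
        System.out.println(byId("change.me"));
        System.out.println(byQuery("id:change.me"));
    }
}
```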

Multivalued fields

A field accepts multiple values only if it is defined with multiValued="true". In JSON, pass the values as an array:

{"id":"change.me","title":["6666","777","888"]}
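Note that a comma-separated string such as "6666,777,888" is stored as one single value; several values need a JSON array. A minimal sketch that builds such a document without a JSON library (hypothetical helper; only quotes and backslashes are escaped, so use a real JSON library in practice):

```java
import java.util.List;
import java.util.StringJoiner;

class MultiValuedJson {
    // Minimal JSON string quoting (backslash and double quote only).
    static String quote(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    // A multivalued field is sent as a JSON array, one element per value.
    static String doc(String id, String field, List<String> values) {
        StringJoiner array = new StringJoiner(",", "[", "]");
        for (String v : values) {
            array.add(quote(v));
        }
        return "{" + quote("id") + ":" + quote(id) + "," + quote(field) + ":" + array + "}";
    }

    public static void main(String[] args) {
        System.out.println(doc("change.me", "title", List.of("6666", "777", "888")));
        // {"id":"change.me","title":["6666","777","888"]}
    }
}
```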

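Configuring a Chinese tokenizer

1 Copy IKAnalyzer2012FF_u1.jar into webapps\solr\WEB-INF\lib

2 Copy the three IK Analyzer configuration files into webapps\solr\WEB-INF\classes

3 In schema.xml, add a custom field type that uses this tokenizer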

<!-- IKAnalyzer Chinese tokenizer -->
   <fieldType name="text_ik" class="solr.TextField">
      <analyzer class="org.wltea.analyzer.lucene.IKAnalyzer"/>
   </fieldType>

<!-- Use the IK Chinese tokenizer in fields -->
   <field name="title_ik" type="text_ik" indexed="true" stored="true" />
   <field name="content_ik" type="text_ik" indexed="true" stored="false" multiValued="true"/>
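Why a separate Chinese tokenizer is needed: `StandardTokenizer` emits each Chinese character as its own token, so multi-character words are lost. Dictionary-based tokenizers such as IK Analyzer segment the text into words instead. IK's own algorithm is more elaborate; the sketch below swaps in the simpler forward-maximum-matching technique with a tiny hypothetical dictionary, purely to show the idea.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

class ForwardMaxMatch {
    // Forward maximum matching: at each position take the longest dictionary
    // word (up to maxLen chars); fall back to a single character.
    // Assumes BMP characters, so one Chinese character is one Java char.
    static List<String> segment(String text, Set<String> dict, int maxLen) {
        List<String> words = new ArrayList<>();
        int i = 0;
        while (i < text.length()) {
            int end = Math.min(text.length(), i + maxLen);
            while (end > i + 1 && !dict.contains(text.substring(i, end))) {
                end--;
            }
            words.add(text.substring(i, end));
            i = end;
        }
        return words;
    }

    public static void main(String[] args) {
        Set<String> dict = Set.of("中文", "分词", "分词器"); // hypothetical dictionary
        System.out.println(segment("中文分词器", dict, 3)); // [中文, 分词器]
    }
}
```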

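Importing data

Bulk-importing data from a SQL database

1. Put the jars that the DataImportHandler plugin depends on, plus the JDBC driver jar for your database, into solrcore\collection1\lib of your custom SolrCore.

2. Based on the database columns and their types, define in schema.xml how each field should exist in Solr, e.g. whether it is indexed and whether it is tokenized: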

<!-- Field declarations for product -->
<!-- Decide per field whether it must be indexed and/or stored, and pick a type to control tokenization -->
   <field name="product_name" type="text_ik" indexed="true" stored="true"/>
   <field name="product_price"  type="float" indexed="true" stored="true"/>
   <field name="product_description" type="text_ik" indexed="true" stored="false" />
   <field name="product_picture" type="string" indexed="false" stored="true" />
   <field name="product_catalog_name" type="string" indexed="true" stored="true" />

   <field name="product_keywords" type="text_ik" indexed="true" stored="false" multiValued="true"/>

   <copyField source="product_name" dest="product_keywords"/>
   <copyField source="product_description" dest="product_keywords"/>

3. In solrconfig.xml under collection1\conf\, add a requestHandler:

 <requestHandler name="/dataimport" 
class="org.apache.solr.handler.dataimport.DataImportHandler">
    <lst name="defaults">
      <str name="config">data-config.xml</str>
     </lst>
  </requestHandler>

4. Create data-config.xml and save it in the collection1\conf\ directory:

<?xml version="1.0" encoding="UTF-8" ?>  
<dataConfig>   
<dataSource type="JdbcDataSource"   
          driver="com.mysql.jdbc.Driver"   
          url="jdbc:mysql://localhost:3306/solr"   
          user="root"   
          password="root"/>   
<document>   
    <entity name="product" query="SELECT pid,name,catalog_name,price,description,picture FROM products ">
         <field column="pid" name="id"/> 
         <field column="name" name="product_name"/> 
         <field column="catalog_name" name="product_catalog_name"/> 
         <field column="price" name="product_price"/> 
         <field column="description" name="product_description"/> 
         <field column="picture" name="product_picture"/> 
    </entity>   
</document>   
</dataConfig>
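What the `product` entity does for each row can be pictured with the following plain-Java sketch (hypothetical, no database needed): every `<field column="..." name="..."/>` entry renames a result-set column to a Solr field. For simplicity, unmapped columns are dropped here; the real DataImportHandler also picks up columns whose names already match a schema field.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class EntityMappingSketch {
    // Mirrors the <field column=".." name=".."/> entries of the "product" entity.
    static final Map<String, String> COLUMN_TO_FIELD = new LinkedHashMap<>();
    static {
        COLUMN_TO_FIELD.put("pid", "id");
        COLUMN_TO_FIELD.put("name", "product_name");
        COLUMN_TO_FIELD.put("catalog_name", "product_catalog_name");
        COLUMN_TO_FIELD.put("price", "product_price");
        COLUMN_TO_FIELD.put("description", "product_description");
        COLUMN_TO_FIELD.put("picture", "product_picture");
    }

    // Renames the columns of one SQL result row to Solr field names.
    static Map<String, Object> toSolrDoc(Map<String, Object> row) {
        Map<String, Object> doc = new LinkedHashMap<>();
        for (Map.Entry<String, String> m : COLUMN_TO_FIELD.entrySet()) {
            if (row.containsKey(m.getKey())) {
                doc.put(m.getValue(), row.get(m.getKey()));
            }
        }
        return doc;
    }

    public static void main(String[] args) {
        Map<String, Object> row = new LinkedHashMap<>(); // made-up row
        row.put("pid", 1);
        row.put("name", "Nike T-shirt");
        row.put("price", 99.0f);
        System.out.println(toSolrDoc(row)); // {id=1, product_name=Nike T-shirt, product_price=99.0}
    }
}
```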

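5. Restart Tomcat and open the Solr admin page. Select the SolrCore (collection1), open DataImport and click Execute to run the import. Afterwards, check the imported documents on the Query page.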
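The import can also be triggered without the admin page by calling the handler registered in step 3. The sketch below assumes a hypothetical `CORE_URL`; `command=full-import`, `clean` and `commit` are DataImportHandler request parameters, and because the import runs in the background, `command=status` reports its progress.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

class DataImportTrigger {
    // Hypothetical core URL; adjust to your installation.
    static final String CORE_URL = "http://localhost:8080/solr/collection1";

    // clean=true deletes the existing index first; commit=true commits at the end.
    static URI fullImportUri() {
        return URI.create(CORE_URL + "/dataimport?command=full-import&clean=true&commit=true");
    }

    static URI statusUri() {
        return URI.create(CORE_URL + "/dataimport?command=status");
    }

    public static void main(String[] args) throws Exception {
        // Requires a running Solr instance at CORE_URL.
        HttpClient client = HttpClient.newHttpClient();
        for (URI uri : new URI[] {fullImportUri(), statusUri()}) {
            HttpResponse<String> r = client.send(HttpRequest.newBuilder(uri).GET().build(),
                    HttpResponse.BodyHandlers.ofString());
            System.out.println(r.body());
        }
    }
}
```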

[Screenshots: a Solr instance (SolrCore) and the Solr server]

A single Solr home can contain multiple SolrCores, each one like collection1.

SolrJ

SolrJ is the Java client for Solr; it lets you run searches and other requests from Java code.

For reference, see a demo I wrote:
