Solr Search Applications

hyp1204

Article link: https://blog.csdn.net/huoyunp/article/details/79081878

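This article walks through installing and configuring Apache Solr. It covers fixing missing-dependency errors at startup, configuring solr home, creating and managing cores, and understanding the core configuration files. It then explains how to configure the IKAnalyzer Chinese tokenizer for Chinese search and gives sample code that uses the SolrJ Java client to work with the index.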

What is Solr

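Solr is a top-level open-source project of the Apache Software Foundation. It is a full-text search server written in Java and built on Lucene. Compared with Lucene, Solr offers a richer query language. It is configurable and extensible, and it is tuned for indexing and search performance, which makes it a high-performance, highly available full-text search engine.
Solr can run on its own or inside servlet containers such as Jetty and Tomcat. Indexing is simple: you POST an XML document that describes the fields and their contents to the Solr server, and Solr adds, deletes, or updates index entries based on that document. Searching only takes an HTTP GET request. The client parses the results, which Solr returns as XML, JSON, or another format, and builds its own page layout. Solr has no tools for building a UI, but it does ship an admin interface where you can inspect Solr's configuration and runtime status.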
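The XML update messages described above have a simple shape. The sketch below is a hypothetical helper; the class and method names are ours and are not part of Solr or SolrJ. It builds an `<add>` message containing one `<field>` element per field, plus a `<delete>` message that targets a unique id. The update URL in the comment is an assumption based on the core1 core and Tomcat port used later in this article.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: builds Solr XML update messages as plain strings.
final class SolrXmlUpdate {
    private SolrXmlUpdate() {}

    // Escapes XML special characters so field values cannot break the message.
    static String escapeXml(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (char c : s.toCharArray()) {
            switch (c) {
                case '&': sb.append("&amp;"); break;
                case '<': sb.append("&lt;"); break;
                case '>': sb.append("&gt;"); break;
                case '"': sb.append("&quot;"); break;
                case '\'': sb.append("&apos;"); break;
                default: sb.append(c);
            }
        }
        return sb.toString();
    }

    // <add><doc><field name="...">...</field>...</doc></add>, one <field> per map entry.
    static String buildAddXml(Map<String, String> fields) {
        StringBuilder sb = new StringBuilder("<add><doc>");
        for (Map.Entry<String, String> e : fields.entrySet()) {
            sb.append("<field name=\"").append(escapeXml(e.getKey())).append("\">")
              .append(escapeXml(e.getValue())).append("</field>");
        }
        return sb.append("</doc></add>").toString();
    }

    // <delete><id>...</id></delete> removes the document with that unique key.
    static String buildDeleteByIdXml(String id) {
        return "<delete><id>" + escapeXml(id) + "</id></delete>";
    }

    public static void main(String[] args) {
        Map<String, String> doc = new LinkedHashMap<>();
        doc.put("id", "1");
        doc.put("name_s", "Tom & Jerry");
        // Assumed target: POST with Content-Type text/xml to
        // http://localhost:8080/solr/core1/update?commit=true
        System.out.println(buildAddXml(doc));
        System.out.println(buildDeleteByIdXml("1"));
    }
}
```

Running `main` prints `<add><doc><field name="id">1</field><field name="name_s">Tom &amp; Jerry</field></doc></add>` followed by `<delete><id>1</id></delete>`. To update a document, post another `<add>` with the same id.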

1> Installing and configuring Solr

1. Download the Solr package. Every Solr release is available at http://archive.apache.org/dist/lucene/solr/.
This article uses solr-5.5.4.

2. Unpack the archive. Copy the webapp directory from solr-5.5.4\server\solr-webapp into tomcat\webapps, rename it to solr, and start Tomcat.

Opening the page directly returns a 404.

The log file tomcat/logs/localhost.2017-08-17.log contains the following exception:

java.lang.NoClassDefFoundError: Failed to initialize Apache Solr: Could not find necessary SLF4j logging jars.   
If using Jetty, the SLF4j logging jars need to go in the jetty lib/ext directory. For other containers,   
the corresponding directory should be used. For more information, see: http://wiki.apache.org/solr/SolrLogging  
    at org.apache.solr.servlet.CheckLoggingConfiguration.check(CheckLoggingConfiguration.java:27)  
    at org.apache.solr.servlet.BaseSolrFilter.<clinit>(BaseSolrFilter.java:30)

The SLF4J logging jars are missing. They are in /server/lib/ext of the unpacked Solr package. Copy them into tomcat-8.0.44\webapps\solr\WEB-INF\lib and restart Tomcat.
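The copy step can also be scripted. The following is a minimal sketch that uses java.nio.file. It assumes the directory names from this article, with solr-5.5.4 and tomcat-8.0.44 both located relative to the working directory. It copies every .jar in server/lib/ext, which in 5.5.x should be the SLF4J and log4j logging jars. The class name is hypothetical.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper: copies the logging jars from Solr's server/lib/ext into the webapp.
final class CopyLoggingJars {
    private CopyLoggingJars() {}

    // Copies every *.jar in sourceDir into targetDir (overwriting) and returns the sorted file names.
    static List<String> copyJars(Path sourceDir, Path targetDir) throws IOException {
        Files.createDirectories(targetDir);
        List<String> copied = new ArrayList<>();
        try (DirectoryStream<Path> jars = Files.newDirectoryStream(sourceDir, "*.jar")) {
            for (Path jar : jars) {
                String name = jar.getFileName().toString();
                Files.copy(jar, targetDir.resolve(name), StandardCopyOption.REPLACE_EXISTING);
                copied.add(name);
            }
        }
        Collections.sort(copied);
        return copied;
    }

    public static void main(String[] args) throws IOException {
        // Paths follow this article's layout; adjust them to your installation.
        Path ext = Paths.get("solr-5.5.4", "server", "lib", "ext");
        Path webInfLib = Paths.get("tomcat-8.0.44", "webapps", "solr", "WEB-INF", "lib");
        System.out.println("Copied: " + copyJars(ext, webInfLib));
        // Restart Tomcat afterwards so the webapp picks up the new jars.
    }
}
```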
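Opening the page again produces another error. This happens because solr home (the Solr home directory) and its configuration have not been set up yet.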

Note: Tomcat 6 does not support Solr 5.5.4, and Tomcat 7 did not work in this setup either, so use Tomcat 8.

Configuring solr home

Edit tomcat\webapps\solr\WEB-INF\web.xml and find the block below, which configures solr home. Uncomment it and set env-entry-value to any local directory:

<env-entry>
   <env-entry-name>solr/home</env-entry-name>
   <env-entry-value>E:/solrhome</env-entry-value>
   <env-entry-type>java.lang.String</env-entry-type>
</env-entry>
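The env-entry is one of several ways Solr locates its home directory. As we understand Solr 5.x, it checks the JNDI entry java:comp/env/solr/home first, which is the env-entry shown here. Next it checks the JVM system property solr.solr.home (for example, -Dsolr.solr.home=E:/solrhome in Tomcat's JAVA_OPTS). If neither is set, it falls back to a solr/ directory relative to the working directory. The hypothetical sketch below models that order with plain string parameters. It is not Solr's code and does no real JNDI lookup.

```java
// Hypothetical model of where Solr 5.x looks for solr home; not Solr's actual code.
final class SolrHomeLocator {
    private SolrHomeLocator() {}

    private static boolean isSet(String value) {
        return value != null && !value.trim().isEmpty();
    }

    // jndiValue: the web.xml env-entry solr/home; systemProperty: -Dsolr.solr.home.
    static String locate(String jndiValue, String systemProperty) {
        if (isSet(jndiValue)) {
            return jndiValue;        // 1. JNDI java:comp/env/solr/home
        }
        if (isSet(systemProperty)) {
            return systemProperty;   // 2. system property solr.solr.home
        }
        return "solr/";              // 3. fallback relative to the working directory
    }

    public static void main(String[] args) {
        String fromProperty = System.getProperty("solr.solr.home"); // null if not set
        System.out.println(locate("E:/solrhome", fromProperty));
    }
}
```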

Copy all files from /server/solr in the unpacked Solr package into the directory configured in web.xml above (E:/solrhome). Restart Tomcat, then open

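http://localhost:8080/solr/index.html or http://localhost:8080/solr/admin.html

Configuring a core (a core is similar to a database: it can hold many documents, which correspond to table rows, and each document has multiple fields, which correspond to columns)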

Create a new directory under solrhome, for example core1.

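Copy the contents of \server\solr\configsets\basic_configs from the unpacked Solr package into the new core1 directory, so that its conf directory ends up as core1/conf.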

Open the Solr admin page, click Core Admin, and add the core.


After Add Core succeeds, the core1 directory contains two new items: core.properties and data.


Back in the Solr admin site, core1 now appears in the core list.
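You can also skip the Add Core button and send the same operation to the CoreAdmin API as an HTTP request with action=CREATE, name, and instanceDir. The sketch below only builds the URL. The base URL http://localhost:8080/solr and the class name are assumptions that match this article's setup. instanceDir is resolved relative to solr home, and conf must already exist there, as in the steps above.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper: builds a CoreAdmin API CREATE request URL.
final class CoreAdminUrl {
    private CoreAdminUrl() {}

    static String encode(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always available", e);
        }
    }

    // instanceDir is resolved relative to solr home and must already contain conf/.
    static String createCoreUrl(String solrBaseUrl, String coreName, String instanceDir) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("action", "CREATE");
        params.put("name", coreName);
        params.put("instanceDir", instanceDir);
        StringJoiner query = new StringJoiner("&");
        for (Map.Entry<String, String> e : params.entrySet()) {
            query.add(encode(e.getKey()) + "=" + encode(e.getValue()));
        }
        return solrBaseUrl + "/admin/cores?" + query;
    }

    public static void main(String[] args) {
        // Open the printed URL in a browser while Tomcat is running.
        System.out.println(createCoreUrl("http://localhost:8080/solr", "core1", "core1"));
    }
}
```

For this setup the printed URL is `http://localhost:8080/solr/admin/cores?action=CREATE&name=core1&instanceDir=core1`.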

Understanding the configuration files

Two configuration files in core1/conf are especially important: managed-schema and solrconfig.xml.

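Defines the field _version_ with type long. indexed="true" makes the field searchable. Whether the value is also split into tokens depends on the field type, and a long is not tokenized. stored="true" stores the original value so it can be returned in search results.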
 <field name="_version_" type="long" indexed="true" stored="true"/>

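Dynamic fields: a wildcard pattern declares field names in advance.
For example, the pattern *_i matches a field named a_i.

Any field whose name ends in _i can be written to the current core: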
 <dynamicField name="*_i"  type="int"    indexed="true"  stored="true"/>
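A dynamic field pattern may have a single * only at its start or end. According to the comments in Solr's example schema, when more than one pattern matches a name, the longer pattern wins, and if the matching patterns are the same length, the one that appears first in the schema is used. Explicitly declared fields take priority over dynamic ones, though this sketch does not model that. The class below is a hypothetical illustration of the matching rule, not Solr's implementation. The pattern attr_* is an invented example.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

// Hypothetical model of dynamicField name matching; not Solr's implementation.
final class DynamicFieldMatcher {
    private DynamicFieldMatcher() {}

    // A pattern has a single '*' at the start ("*_i") or the end ("attr_*").
    static boolean matches(String pattern, String fieldName) {
        if (pattern.startsWith("*")) {
            return fieldName.endsWith(pattern.substring(1));
        }
        if (pattern.endsWith("*")) {
            return fieldName.startsWith(pattern.substring(0, pattern.length() - 1));
        }
        return pattern.equals(fieldName);
    }

    // Longest matching pattern wins; on a tie the pattern listed first is kept.
    static Optional<String> resolve(List<String> patterns, String fieldName) {
        String best = null;
        for (String p : patterns) {
            if (matches(p, fieldName) && (best == null || p.length() > best.length())) {
                best = p;
            }
        }
        return Optional.ofNullable(best);
    }

    public static void main(String[] args) {
        List<String> patterns = Arrays.asList("*_i", "*_s", "attr_*");
        System.out.println(resolve(patterns, "a_i"));          // Optional[*_i]
        System.out.println(resolve(patterns, "attr_color_s")); // Optional[attr_*]
        System.out.println(resolve(patterns, "title"));        // Optional.empty
    }
}
```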

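Defines the field that uniquely identifies each document:
<uniqueKey>id</uniqueKey>
Defines a field type named string, implemented by solr.StrField, which indexes each value as a single untokenized term. sortMissingLast="true" sorts documents without a value for the field after all the others:
<fieldType name="string" class="solr.StrField" sortMissingLast="true" />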
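The uniqueKey matters at update time. If you add a document whose id already exists, it replaces the old document instead of creating a second one. Solr also rejects documents that lack the key field. The toy model below shows the replace-by-key behaviour with a map keyed by id. The class is hypothetical and does not reflect how Solr stores data. Note that a plain add replaces the whole document and does not merge fields.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Toy in-memory model of uniqueKey semantics; not how Solr stores documents.
final class UniqueKeyIndexModel {
    private final String uniqueKeyField;
    private final Map<String, Map<String, String>> docsByKey = new LinkedHashMap<>();

    UniqueKeyIndexModel(String uniqueKeyField) {
        this.uniqueKeyField = uniqueKeyField;
    }

    // Adding a document whose key already exists replaces the whole old document.
    void add(Map<String, String> doc) {
        String key = doc.get(uniqueKeyField);
        if (key == null) {
            throw new IllegalArgumentException("Document is missing uniqueKey field: " + uniqueKeyField);
        }
        docsByKey.put(key, new LinkedHashMap<>(doc));
    }

    int size() {
        return docsByKey.size();
    }

    Map<String, String> get(String key) {
        return docsByKey.get(key);
    }

    public static void main(String[] args) {
        UniqueKeyIndexModel index = new UniqueKeyIndexModel("id");
        Map<String, String> first = new LinkedHashMap<>();
        first.put("id", "1");
        first.put("name_s", "tom");
        index.add(first);
        Map<String, String> second = new LinkedHashMap<>();
        second.put("id", "1");
        second.put("sex_i", "1");
        index.add(second);
        System.out.println(index.size() + " " + index.get("1")); // 1 {id=1, sex_i=1}
    }
}
```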

solrconfig.xml holds Solr's main configuration for the core.

The Lucene version this configuration targets:
<luceneMatchVersion>5.5.4</luceneMatchVersion>
The data directory, which defaults to the core's data directory:
<dataDir>${solr.data.dir:}</dataDir>
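Automatic (hard) commit settings:
<autoCommit>
       <!-- Commit pending updates automatically at most 15000 ms after they arrive -->
       <maxTime>${solr.autoCommit.maxTime:15000}</maxTime>
       <!-- false: the commit does not open a new searcher, so the new documents are not yet visible to queries -->
       <openSearcher>false</openSearcher>
</autoCommit>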
Registers the request handler for the /select path. Values under defaults apply unless the request supplies its own:
<requestHandler name="/select" class="solr.SearchHandler">  
    <!-- default values for query parameters can be specified, these  
         will be overridden by parameters in the request  
      -->  
     <lst name="defaults">  
       <str name="echoParams">explicit</str>  
       <int name="rows">10</int>  
     </lst>  
</requestHandler>
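The comment inside the handler says that request parameters override the defaults. The hypothetical model below expresses that rule as a map merge. It is simplified: real Solr parameters can be multi-valued, and a handler can also declare separate appends and invariants sections, which this sketch ignores. With rows=10 as a default, a request that sends rows=5 gets 5 rows, and a request that omits rows gets 10.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical model of how a handler's <lst name="defaults"> combines with request parameters.
final class HandlerDefaults {
    private HandlerDefaults() {}

    // Start from the defaults, then let every parameter sent with the request override them.
    static Map<String, String> effectiveParams(Map<String, String> defaults, Map<String, String> request) {
        Map<String, String> merged = new LinkedHashMap<>(defaults);
        merged.putAll(request);
        return merged;
    }

    public static void main(String[] args) {
        Map<String, String> defaults = new LinkedHashMap<>();
        defaults.put("echoParams", "explicit");
        defaults.put("rows", "10");
        Map<String, String> request = new LinkedHashMap<>();
        request.put("q", "*:*");
        request.put("rows", "5");
        System.out.println(effectiveParams(defaults, request)); // {echoParams=explicit, rows=5, q=*:*}
    }
}
```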

Try adding and querying data in the admin UI.
Inserting a document:

Query results:

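The query parameters:
q: the main query, in field:value format
fq: filter query. It narrows the results of q, so a document must match both, and it supports the usual logical operators (see https://cwiki.apache.org/confluence/display/solr/The+Standard+Query+Parser)
sort: the sort order, as field asc|desc
start: the offset of the first row returned; rows: the number of rows to return
fl: the fields to return, comma-separated, for example name_s,sex_i to return only those two
df: the default search field, usually left unset
Raw Query Parameters: extra parameters passed in URL form, such as start=0&rows=10
wt (writer type): the response format, such as json or xml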
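The same parameters can be sent directly in a /select URL. The builder below is a hypothetical sketch. The parameter names are Solr's, while the class, the core URL http://localhost:8080/solr/core1, and the age_i field are assumptions for illustration. Values are URL-encoded, so the colon in q becomes %3A and the space in sort becomes +.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.ArrayList;
import java.util.List;
import java.util.StringJoiner;

// Hypothetical builder for a /select URL; parameter names are Solr's, the class is ours.
final class SelectQuery {
    private final List<String[]> params = new ArrayList<>();

    // Parameters keep their order; a name may repeat (e.g. several fq filters).
    SelectQuery param(String name, String value) {
        params.add(new String[] {name, value});
        return this;
    }

    private static String encode(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always available", e);
        }
    }

    String toUrl(String coreUrl) {
        StringJoiner query = new StringJoiner("&");
        for (String[] p : params) {
            query.add(encode(p[0]) + "=" + encode(p[1]));
        }
        return coreUrl + "/select?" + query;
    }

    public static void main(String[] args) {
        String url = new SelectQuery()
                .param("q", "name_s:tom")       // main query, field:value
                .param("fq", "sex_i:1")         // filter: results must also match this
                .param("sort", "age_i desc")    // field asc|desc (age_i is hypothetical)
                .param("start", "0")            // offset of the first row
                .param("rows", "10")            // number of rows to return
                .param("fl", "name_s,sex_i")    // fields to return
                .param("wt", "json")            // response format
                .toUrl("http://localhost:8080/solr/core1");
        System.out.println(url);
    }
}
```

The printed URL is `http://localhost:8080/solr/core1/select?q=name_s%3Atom&fq=sex_i%3A1&sort=age_i+desc&start=0&rows=10&fl=name_s%2Csex_i&wt=json`.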

Results of a tokenized query:

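Configuring a Chinese analyzer
By default Solr does no Chinese word segmentation. In a string field, for example, a whole sentence is indexed as a single term, so a search only matches if you type the entire value or use the * wildcard. A Chinese analyzer needs to be configured.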
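The following toy segmenter shows what word segmentation does. It uses forward maximum matching, a classic dictionary-based technique for Chinese. It is not IK's algorithm, which has its own dictionary handling and fine-grained and smart modes. The dictionary here is made up for illustration. Given the words 中文, 分词, 分词器, and 搜索, the sentence 中文分词器搜索 becomes three searchable terms instead of one.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy forward-maximum-matching segmenter; a classic dictionary technique, NOT IK's actual algorithm.
final class ForwardMaxMatch {
    private ForwardMaxMatch() {}

    // At each position take the longest dictionary word (up to maxWordLen chars),
    // falling back to a single character. Assumes BMP characters (true for common Chinese).
    static List<String> segment(String text, Set<String> dictionary, int maxWordLen) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < text.length()) {
            String word = text.substring(i, i + 1);
            int end = Math.min(text.length(), i + maxWordLen);
            for (int j = end; j > i + 1; j--) {
                String candidate = text.substring(i, j);
                if (dictionary.contains(candidate)) {
                    word = candidate;
                    break;
                }
            }
            tokens.add(word);
            i += word.length();
        }
        return tokens;
    }

    public static void main(String[] args) {
        Set<String> dict = new HashSet<>(Arrays.asList("中文", "分词", "分词器", "搜索"));
        System.out.println(segment("中文分词器搜索", dict, 3)); // [中文, 分词器, 搜索]
    }
}
```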

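A popular Chinese analyzer is IK Analyzer, but it only supports up to Lucene 4.7. Solr 5.5 is built on Lucene 5, so part of the IK source code must be modified before it works with Solr 5.5:
In the IKAnalyzer class, replace protected TokenStreamComponents createComponents(String fieldName, final Reader in) with createComponents(String fieldName), because Lucene 5 no longer passes a Reader to this method.

In the IKTokenizer class, change the constructor public IKTokenizer(Reader in, boolean useSmart) to public IKTokenizer(boolean useSmart).

In your project, create classes with exactly the same package and class names as in IK and copy the IK source code into them.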

pom.xml configuration:

<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
  <modelVersion>4.0.0</modelVersion>
  <groupId>com.et</groupId>
  <artifactId>IK</artifactId>
  <version>0.0.1-SNAPSHOT</version>
  <dependencies>
    <!-- IK Analyzer dependency (built for Lucene 4.7) -->
    <dependency>
        <groupId>com.janeluo</groupId>
        <artifactId>ikanalyzer</artifactId>
        <version>2012_u6</version>
        <!-- To work with Lucene 5.5.4, exclude the Lucene jars that IK pulls in -->
        <exclusions>
            <exclusion>
                <groupId>org.apache.lucene</groupId>
                <artifactId>lucene-core</artifactId>
            </exclusion>
            <exclusion>
                <groupId>org.apache.lucene</groupId>
                <artifactId>lucene-queries</artifactId>
            </exclusion>
            <exclusion>
                <groupId>org.apache.lucene</groupId>
                <artifactId>lucene-sandbox</artifactId>
            </exclusion>
            <exclusion>
                <groupId>org.apache.lucene</groupId>
                <artifactId>lucene-queryparser</artifactId>
            </exclusion>
            <exclusion>
                <groupId>org.apache.lucene</groupId>
                <artifactId>lucene-analyzers-common</artifactId>
            </exclusion>
        </exclusions>
    </dependency>
    <!-- After the exclusions, add the newer Lucene jars explicitly -->
    <dependency>
        <groupId>org.apache.lucene</groupId>
        <artifactId>lucene-core</artifactId>
        <version>5.5.4</
