Using Solr

xubin1623875795

Article link: https://blog.csdn.net/xubin1623875795/article/details/79085870

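I. Installing and configuring Solr

1. Download the Solr package. All Solr releases are available at http://archive.apache.org/dist/lucene/solr/. This walkthrough uses solr-5.5.4.tgz.

2. Extract solr-5.5.4.tgz. Copy the webapp directory under solr-5.5.4\server\solr-webapp into tomcat\webapps, rename the copied directory to solr, and start Tomcat. Tomcat 8 or later is required; Tomcat 6 and Tomcat 7 were tested and do not work.

Opening the application at this point returns a 404. The log file tomcat\logs\localhost.2017-08-17.log shows the following exception: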

java.lang.NoClassDefFoundError: Failed to initialize Apache Solr: Could not find necessary SLF4j logging jars.   
If using Jetty, the SLF4j logging jars need to go in the jetty lib/ext directory. For other containers,   
the corresponding directory should be used. For more information, see: http://wiki.apache.org/solr/SolrLogging  
    at org.apache.solr.servlet.CheckLoggingConfiguration.check(CheckLoggingConfiguration.java:27)  
    at org.apache.solr.servlet.BaseSolrFilter.<clinit>(BaseSolrFilter.java:30)

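The message shows that the SLF4J logging jars are missing. Copy the jars from solr-5.5.4\server\lib\ext into the web application's library directory (tomcat\webapps\solr\WEB-INF\lib) and restart Tomcat.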
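The copy step can also be scripted. The following is a minimal sketch using only the Java standard library; the class name CopyLoggingJars is hypothetical, and both directories are passed on the command line because the exact paths depend on your installation.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper, not part of Solr or Tomcat.
final class CopyLoggingJars {

    /** Copies every *.jar directly under extDir into libDir and returns the sorted file names. */
    static List<String> copyJars(Path extDir, Path libDir) throws IOException {
        Files.createDirectories(libDir);
        List<String> copied = new ArrayList<>();
        try (DirectoryStream<Path> jars = Files.newDirectoryStream(extDir, "*.jar")) {
            for (Path jar : jars) {
                // Overwrite an older copy of the same jar if one is already there
                Files.copy(jar, libDir.resolve(jar.getFileName()), StandardCopyOption.REPLACE_EXISTING);
                copied.add(jar.getFileName().toString());
            }
        }
        Collections.sort(copied);
        return copied;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.out.println("usage: CopyLoggingJars <solr ext dir> <webapp lib dir>");
            return;
        }
        System.out.println("copied: " + copyJars(Paths.get(args[0]), Paths.get(args[1])));
    }
}
```

Run it with the ext directory of the extracted Solr package as the first argument and the web application's lib directory as the second, then restart Tomcat.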

Opening the application again now produces a different error:

org.apache.solr.common.SolrException: Error processing the request. CoreContainer is either not initialized or shutting down.  
    org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:217)  
    org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:208)

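This happens because solr home and the configuration inside it have not been set up yet.

3. Configure solr home as follows:

Open tomcat\webapps\solr\WEB-INF\web.xml, find the following section (the solr home setting), remove the comment markers around it, and set env-entry-value to any local directory: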

<env-entry>
    <env-entry-name>solr/home</env-entry-name>
    <env-entry-value>D:/solrhome</env-entry-value>
    <env-entry-type>java.lang.String</env-entry-type>
</env-entry>

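Copy everything under solr-5.5.4\server\solr in the extracted Solr package into D:/solrhome, the path set in web.xml above. Restart Tomcat and open:

http://localhost:8080/solr/index.html or http://localhost:8080/solr/admin.html

This walkthrough uses http://localhost:8080/solr/admin.html, which now shows the Solr admin page.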

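4. Configure a core. A core is comparable to a database: it holds many documents (like table rows), and each document has multiple fields (like table columns). Create a core directory under solr home, for example mycore.

Copy everything under solr-5.5.4\server\solr\configsets\basic_configs in the extracted Solr package into the new mycore directory.

Open http://localhost:8080/solr/admin.html, go to Core Admin in the Solr admin site, and add the core.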

After Add Core succeeds, the mycore directory contains two new entries: core.properties and data.
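The Core Admin page is a front end for Solr's CoreAdmin HTTP API, so the same step can be triggered by requesting a URL. The sketch below only builds that URL; the class name CoreAdminUrl is hypothetical, and the base URL and core name are the values used in this walkthrough, so adjust them if your setup differs.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper that builds a CoreAdmin CREATE request URL.
final class CoreAdminUrl {

    /** Returns the CREATE URL for a core whose instance directory has the same name as the core. */
    static String createCoreUrl(String solrBaseUrl, String coreName) {
        String name = encode(coreName);
        return solrBaseUrl + "/admin/cores?action=CREATE&name=" + name + "&instanceDir=" + name;
    }

    private static String encode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available on a compliant JVM
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Assumed values from this walkthrough: Tomcat on port 8080, core directory "mycore"
        System.out.println(createCoreUrl("http://localhost:8080/solr", "mycore"));
    }
}
```

Opening the printed URL in a browser while Tomcat is running should have the same effect as the Add Core button, provided the mycore directory already contains its conf files.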

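5. Understanding the configuration files:

Two files in the core's conf directory are especially important. managed-schema defines the fields that can be submitted to the core, the field types, the unique key, and so on. solrconfig.xml holds Solr's main settings, such as the Lucene version, caches, the data directory, and the mapping of request paths to handlers.

The most commonly used settings are explained below.

managed-schema: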

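<!-- Defines the field _version_ of type long. indexed="true" makes the field searchable (whether the value is tokenized depends on the field type); stored="true" stores the original value so it can be returned in results. -->
<field name="_version_" type="long" indexed="true" stored="true"/>
<!-- Defines the field id. required="true" means every document must supply id; multiValued="false" means the field holds a single value. -->
<field name="id" type="string" indexed="true" stored="true" required="true" multiValued="false" />
<!-- Defines a dynamic field: any field whose name ends in _i can be written to this core. -->
<dynamicField name="*_i"  type="int"    indexed="true"  stored="true"/>
<!-- Defines which field is the unique key. -->
<uniqueKey>id</uniqueKey>
<!-- Defines a field type named string, backed by the class solr.StrField. -->
<fieldType name="string" class="solr.StrField" sortMissingLast="true" />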
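To make the dynamic field rule concrete, here is a simplified illustration of how a name such as age_i is matched against the pattern *_i. It is a sketch of the idea only, not Solr's own implementation; the class name DynamicFieldMatch is hypothetical, and it assumes the pattern has a single * at its start or end.

```java
// Simplified illustration of dynamic field name matching; not Solr's actual code.
final class DynamicFieldMatch {

    /** Returns true if fieldName matches a pattern with one leading or trailing '*'. */
    static boolean matches(String pattern, String fieldName) {
        if (pattern.startsWith("*")) {
            // "*_i" matches any name ending in "_i"
            return fieldName.endsWith(pattern.substring(1));
        }
        if (pattern.endsWith("*")) {
            // "attr_*" matches any name starting with "attr_"
            return fieldName.startsWith(pattern.substring(0, pattern.length() - 1));
        }
        // No wildcard: the names must be identical
        return pattern.equals(fieldName);
    }

    public static void main(String[] args) {
        System.out.println(matches("*_i", "age_i"));
        System.out.println(matches("*_i", "name_s"));
    }
}
```

With the schema above, a document field called age_i would be accepted and indexed as an int, while a field called name_s needs its own field or dynamic field definition.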

solrconfig.xml:

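<!-- The Lucene version whose behavior this configuration expects -->
<luceneMatchVersion>5.5.4</luceneMatchVersion>
<!-- The data directory; defaults to the data directory under the core -->
<dataDir>${solr.data.dir:}</dataDir>
<!-- Automatic commit settings -->
<autoCommit>
       <!-- Pending documents are committed automatically once they have waited 15000 ms -->
       <maxTime>${solr.autoCommit.maxTime:15000}</maxTime>
       <!-- Whether the automatic commit opens a new searcher, i.e. whether the committed changes become visible to queries right away; false means they do not -->
       <openSearcher>false</openSearcher>
</autoCommit>
<!-- Requests to the path /select are handled by SearchHandler, i.e. they run searches -->
<requestHandler name="/select" class="solr.SearchHandler">
    <!-- default values for query parameters can be specified, these
         will be overridden by parameters in the request
      -->
     <lst name="defaults">
       <str name="echoParams">explicit</str>
       <int name="rows">10</int>
     </lst>
</requestHandler>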

Open the Solr admin site at http://localhost:8080/solr/admin.html again; mycore now appears in the core list.

6. Add and query data in the admin UI.

Adding data:

The document is inserted successfully.
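The admin UI form sends the document to the core's update handler as JSON. As a rough sketch of what such a request body looks like, the code below builds a one-document JSON array by hand. The class name SolrUpdateBody and the sample values are hypothetical; the field names id, name_s and sex_i follow the naming used in this article and assume matching fields or dynamic fields exist in managed-schema.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper that renders one document as a Solr JSON update body.
final class SolrUpdateBody {

    /** Renders the document as a JSON array containing a single object. */
    static String toJson(Map<String, Object> doc) {
        StringBuilder sb = new StringBuilder("[{");
        boolean first = true;
        for (Map.Entry<String, Object> field : doc.entrySet()) {
            if (!first) {
                sb.append(',');
            }
            first = false;
            sb.append(quote(field.getKey())).append(':');
            Object value = field.getValue();
            if (value instanceof Number || value instanceof Boolean) {
                sb.append(value);                       // numbers and booleans are written bare
            } else {
                sb.append(quote(String.valueOf(value))); // everything else becomes a JSON string
            }
        }
        return sb.append("}]").toString();
    }

    /** Wraps a string in double quotes, escaping quotes, backslashes and control characters. */
    static String quote(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (char c : s.toCharArray()) {
            if (c == '"' || c == '\\') {
                sb.append('\\').append(c);
            } else if (c < 0x20) {
                sb.append(String.format("\\u%04x", (int) c));
            } else {
                sb.append(c);
            }
        }
        return sb.append('"').toString();
    }

    public static void main(String[] args) {
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("id", "1");
        doc.put("name_s", "Tom");
        doc.put("sex_i", 1);
        System.out.println(toJson(doc));
    }
}
```

A body like this would typically be sent with an HTTP POST to http://localhost:8080/solr/mycore/update?commit=true with the header Content-Type: application/json; the host, port and core name are the ones assumed in this walkthrough.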

Querying data:

7. Understanding the query parameters:

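q is the main query condition, written as fieldName:value.

fq is the filter query. It is combined with q as an AND condition and supports the usual logical operators; see https://cwiki.apache.org/confluence/display/solr/The+Standard+Query+Parser

sort is the sort clause, written as fieldName asc|desc.

start is the offset of the first row to return; rows is the maximum number of rows to return.

fl lists the fields to return, separated by commas. For example, name_s,sex_i returns only those two fields.

df is the default search field; it is usually left unset.

Raw Query Parameters accepts extra parameters in URL form, such as start=0&rows=10.

wt (writer type) selects the response format, for example json or xml.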
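The query form in the admin UI turns these parameters into a request to the core's /select path. The sketch below assembles such a URL with the standard library; the class name SolrSelectUrl and the sample values are hypothetical, and the core URL assumes the Tomcat setup and the mycore core from this walkthrough.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper that builds a /select URL from query parameters.
final class SolrSelectUrl {

    /** Appends the URL-encoded parameters to coreUrl + "/select" in insertion order. */
    static String build(String coreUrl, Map<String, String> params) {
        StringBuilder sb = new StringBuilder(coreUrl).append("/select");
        char separator = '?';
        for (Map.Entry<String, String> p : params.entrySet()) {
            sb.append(separator).append(encode(p.getKey())).append('=').append(encode(p.getValue()));
            separator = '&';
        }
        return sb.toString();
    }

    private static String encode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available on a compliant JVM
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("q", "name_s:Tom");     // main query: field name_s equals Tom
        params.put("fq", "sex_i:1");       // filter query, ANDed with q
        params.put("sort", "sex_i asc");   // sort field and direction
        params.put("start", "0");          // offset of the first row
        params.put("rows", "10");          // maximum number of rows
        params.put("fl", "name_s,sex_i");  // fields to return
        params.put("wt", "json");          // response format
        System.out.println(build("http://localhost:8080/solr/mycore", params));
    }
}
```

Encoding each value matters because characters such as the colon, the comma and the space in these parameters are not safe to place in a URL unescaped.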

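8. Configure a Chinese analyzer

Out of the box, this Solr setup does not use a Chinese tokenizer, so a whole sentence is indexed as a single term: a search matches only if you type the entire text or use the * wildcard. A Chinese analyzer therefore has to be configured. A widely used one is IK, but it stopped being updated in 2012 and only supports Lucene up to 4.7. Solr 5.5 requires Lucene 5, so parts of the IK source code have to be modified to work with Solr 5.5.

Create a Maven project and add Lucene 5 and IK to pom.xml: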

<dependency>
			<groupId>com.janeluo</groupId>
			<artifactId>ikanalyzer</artifactId>
			<version>2012_u6</version>

			<!-- Exclude the unwanted jars -->
			<exclusions>
				<exclusion>
					<groupId>org.apache.lucene</groupId>
					<artifactId>lucene-core</artifactId>
				</exclusion>

				<exclusion>
					<groupId>org.apache.lucene</groupId>
					<artifactId>lucene-queryparser</artifactId>
				</exclusion>

				<exclusion>
					<groupId>org.apache.lucene</groupId>
					<artifactId>lucene-queries</artifactId>
				</exclusion>

				<exclusion>
					<groupId>org.apache.lucene</groupId>
					<artifactId>lucene-analyzers-common</artifactId>
				</exclusion>
			</exclusions>
		</dependency>

		<!-- Add the excluded jars back at the required version, which resolves the jar conflicts -->
		<dependency>
			<groupId>org.apache.lucene</groupId>
			<artifactId>lucene-core</artifactId>
			<version>5.5.4</version>
		</dependency>

		<dependency>
			<groupId>org.apache.lucene</groupId>
			<artifactId>lucene-queryparser</artifactId>
			<version>5.5.4</version>
		</dependency>

		<dependency>
			<groupId>org.apache.lucene</groupId>
			<artifactId>lucene-queries</artifactId>
			<version>5.5.4</version>
		</dependency>

		<dependency>
			<groupId>org.apache.lucene</groupId>
			<artifactId>lucene-analyzers-common</artifactId>
			<version>5.5.4</version>
		</dependency>

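In the project, create classes with exactly the same package and class names as in IK, and copy the original source code into them.

In the IKAnalyzer class, override the method protected TokenStreamComponents createComponents(String fieldName):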

package org.wltea.analyzer.lucene;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.Tokenizer;

public final class IKAnalyzer extends Analyzer {

  private boolean useSmart;

  public boolean useSmart() {
    return useSmart;
  }

  public void setUseSmart(boolean useSmart) {
    this.useSmart = useSmart;
  }

  
  public IKAnalyzer() {
    this(false);
  }

  public IKAnalyzer(boolean useSmart) {
    super();
    this.useSmart = useSmart;
  }

  /**
   * The Reader parameter has been removed from this method.
   */
  @Override
  protected TokenStreamComponents createComponents(String fieldName) {
    Tokenizer _IKTokenizer = new IKTokenizer(this.useSmart());
    return new TokenStreamComponents(_IKTokenizer);
  }

}

In the IKTokenizer class, change the constructor public IKTokenizer(Reader in, boolean useSmart) to public IKTokenizer(boolean useSmart):

package org.wltea.analyzer.lucene;

import java.io.IOException;

import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.analysis.tokenattributes.OffsetAttribute;
import org.apache.lucene.analysis.tokenattributes.TypeAttribute;
import org.wltea.analyzer.core.IKSegmenter;
import org.wltea.analyzer.core.Lexeme;

public final class IKTokenizer extends Tokenizer {

  private IKSegmenter _IKImplement;

  private final CharTermAttribute termAtt;
  
  private final OffsetAttribute offsetAtt;
  
  private final TypeAttribute typeAtt;
  
  private int endPosition;
  
  // The Reader parameter, formerly the first constructor argument, has been removed
  public IKTokenizer(boolean useSmart) {
    super();    // super() no longer receives a Reader
    offsetAtt = addAttribute(OffsetAttribute.class);
    termAtt = addAttribute(CharTermAttribute.class);
    typeAtt = addAttribute(TypeAttribute.class);
    _IKImplement = new IKSegmenter(input, useSmart);
  }

  @Override
  public boolean incrementToken() throws IOException {
    clearAttributes();
    Lexeme nextLexeme = _IKImplement.next();
    if (nextLexeme != null) {
      termAtt.append(nextLexeme.getLexemeText());
      
      termAtt.setLength(nextLexeme.getLength());
      
      offsetAtt.setOffset(nextLexeme.getBeginPosition(), nextLexeme.getEndPosition());
      
      endPosition = nextLexeme.getEndPosition();
     
      typeAtt.setType(nextLexeme.getLexemeTypeString());
      
      return true;
    }
   
    return false;
  }

  
  @Override
  public void reset() throws IOException {
    super.reset();
    _IKImplement.reset(input);
  }

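  @Override
  public final void end() throws IOException {
    // Lucene's TokenStream contract: call super.end() before setting the final offset
    super.end();
    // set final offset
    int finalOffset = correctOffset(this.endPosition);
    offsetAtt.setOffset(finalOffset, finalOffset);
  }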
}

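Locate the original IK jar and replace the class files of the same name inside it with the newly compiled ones.

In managed-schema under solr home, add a field type that uses the IK analyzer: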

<fieldType name="text_ik" class="solr.TextField" >  
      <analyzer type="index" isMaxWordLength="false" class="org.wltea.analyzer.lucene.IKAnalyzer"/>     
      <analyzer type="query" isMaxWordLength="true" class="org.wltea.analyzer.lucene.IKAnalyzer"/>   
    </fieldType>

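Do not try to modify the StrField type for this: StrField does not support custom analyzers.

Then assign the text_ik field type to every field that needs Chinese tokenization, for example: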

<dynamicField name="*_ik"  type="text_ik"  indexed="true"  stored="true" />

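Restart Solr (or, in a SolrCloud environment, recreate the collection) and insert data. Chinese text is now tokenized, so documents can be found by searching for individual Chinese keywords.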
