Solr Quick Start

Thor_Selen_Liu

Original article: https://blog.csdn.net/Thor_Selen_Liu/article/details/81035362

Solr Quick Start: installation and use on Windows only

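Preface:

Most applications need some form of search, but search is often a heavy resource consumer, and the database load it generates can drag down the performance of the whole application. The usual remedy is to move that load onto an external search server, which is the role Solr fills.

Apache Solr is a popular open-source search server. It exposes a REST-like HTTP API, so you can use it from almost any programming language.

Solr is an open-source search platform for building search applications, built on top of Lucene (a full-text search engine library). It is enterprise-grade, fast, and highly scalable. Applications built on Solr can be sophisticated while still delivering high performance.

Solr can be used together with Hadoop: Hadoop processes large volumes of data, and Solr helps us quickly find the information we need in it. Solr is not limited to search; like other NoSQL databases, it can also serve as a non-relational data storage and processing technology.

In short, Solr is a highly reliable, scalable, fault-tolerant, deployable search and storage engine. It provides distributed indexing, replication, load-balanced querying, automatic failover and recovery, and centralized configuration, and it is optimized for searching large amounts of text-centric data.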

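1. Solr Introduction:

1) Solr is a standalone enterprise search server that exposes a web-service-like API. Clients send documents in a defined format to the server over HTTP to build the index, and send HTTP queries against the index to get results back in a defined format.

2) It is based on Lucene and can work with indexes created by a matching Lucene version.

3) Two core functions: building the index and querying the index.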

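2. Solr Installation:

General steps for installing software: download the package, unpack it, edit the configuration, start the service.

2.1 Quick Solr installation

2.1.1 Download the packages

Download the Solr .zip package. Section 2.2 deploys Solr on Tomcat, so download the Tomcat package as well; the quick installation in this section runs on the Jetty server bundled in Solr's example directory and does not need Tomcat.

2.1.2 Unpack the package

Create a working directory (preferably with no spaces or Chinese characters in the path):

E:\itcast\env

Unpack solr-4.10.2.zip into E:\itcast\env

Go into the unpacked directory.

2.1.3 Edit the configuration

This post is a quick-start exercise, so no configuration changes are needed; the defaults are fine.

2.1.4 Start the service

1) The unpacked Solr folder contains an example folder, which in turn contains start.jar.

Open cmd and change into that directory.

2) Start the service with the java command: java -jar start.jar

3) Visit http://localhost:8983/solr/#/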

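2.2 Deploying Solr on Tomcat

2.2.1 Unpack Tomcat into the working directory

After renaming, the directory is: E:\itcast\env\tomcat4solr

2.2.2 Copy the solr.war package

Find solr.war under example/webapps in the Solr installation directory and copy it into Tomcat's webapps directory.

2.2.3 Start Tomcat

The only purpose of this start is to have Tomcat unpack the solr.war you copied into webapps; once started, Tomcat automatically creates a solr folder there. When that folder exists, the goal is achieved and you can shut Tomcat down.

2.2.4 Copy the jar files

Copy all jar files from example/lib/ext in the Solr installation directory into Tomcat's webapps/solr/WEB-INF/lib.

Why there? The unpacked solr.war does not include these dependency jars (the logging libraries), yet Solr cannot start without them, whether it runs standalone or under Tomcat. In this setup we unpack the Solr distribution not to run it on its own, but mainly to copy files from it into the solr webapp that Tomcat unpacked, so that Tomcat can start Solr properly.

2.2.5 Copy log4j.properties

Create a classes directory under Tomcat's webapps/solr/WEB-INF.

The resulting directory is: E:\itcast\env\tomcat4solr\webapps\solr\WEB-INF\classes

Copy log4j.properties from example/resources in the Solr installation directory into Tomcat's webapps/solr/WEB-INF/classes.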

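2.2.6 Create the solr_home directory and prepare its configuration files (solr_home is the home directory of the index cores)

Our solr_home folder sits at the same level as the unpacked Tomcat and Solr directories.

The resulting directory is: E:\itcast\env\solr_home

1) Copy Solr's three configuration items into solr_home

Copy collection1, solr.xml, and zoo.cfg from example/solr in the Solr installation directory into solr_home.

collection1 is a SolrCore, that is, one index; solr_home can hold multiple SolrCores.

2) Create a lib directory under solr_home, then copy the contrib and dist directories from the Solr installation directory into solr_home/lib.

The contrib and dist directories hold the extension jars that a SolrCore can load.

After copying, open conf/solrconfig.xml inside the SolrCore under solr_home (collection1) and edit it.

Original:

Changed to:

If these two folders are not copied into solr_home/lib, nothing fails outright, but the Logging tab of the Solr admin UI will show a large number of warnings.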

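3) Edit the web application's web.xml

Edit web.xml under Tomcat's webapps/solr/WEB-INF/ (typically this means uncommenting the solr/home env-entry and pointing it at E:\itcast\env\solr_home).

Tip: Notepad++ column-editing mode

Place the cursor where you want to start editing
Hold down the Alt key
Drag downward with the left mouse button

2.2.7 Start Tomcat and visit http://localhost:8983/solr (Tomcat listens on port 8080 by default; this URL assumes the connector port in conf/server.xml was changed to 8983, otherwise use 8080)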

2.2.8 Jetty vs. Tomcat

See: https://blog.csdn.net/xyw591238/article/details/51802616

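3. Solr UI

URL: http://localhost:8983/solr/#/

3.1 Main page

Shows when the server was started and the runtime environment (JVM and operating system).

3.2 Managing cores (indexes)

3.2.1 What is a core

A core is an index: one core is one index. Solr ships with a default core, collection1.

In Solr each core is a folder that contains its data directory, configuration files, and related information.

3.2.2 Why multiple cores

In real business scenarios there are search needs in many different domains, such as product search, review search, image search, and news search. If everything goes into a single index, the results become a jumble and every query costs more than it should.

To solve this, we group the data of each domain together, which is why multiple cores exist.

3.2.3 How to add a core

1) Copy the default collection1 folder.

2) Rename the copy after your own project, for example product.

3) Go into the new core's directory and delete three files; they will be regenerated automatically.

E:\itcast\env\solr-4.10.2\example\solr\product

4) Go back to the admin UI, open Core Admin, and add the core.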

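3.3 Creating an index

1) Select Documents.

2) Enter the content; many formats are supported.

3) Submit the document; under the hood this simply sends an HTTP request.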
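To make the "it is just an HTTP request" point concrete, here is a minimal sketch of what such a request could carry when the JSON format is chosen. The class name UpdatePayloadSketch, the helper names, and the sample field values are made up for illustration; the /update?commit=true path and the JSON array body are assumptions based on Solr 4.x's JSON update support, so check them against your version.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch only: builds the URL and JSON body of an update request.
// It does not send anything; posting the body is left to any HTTP client.
final class UpdatePayloadSketch {

    // Assumed update endpoint of a core; commit=true makes the change visible at once.
    static String updateUrl(String coreUrl) {
        return coreUrl + "/update?commit=true";
    }

    // Escapes backslashes and double quotes only (control characters are not handled).
    static String escape(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    // Renders one flat document with string values as a JSON array holding one object.
    static String toJson(Map<String, String> doc) {
        StringBuilder sb = new StringBuilder("[{");
        boolean first = true;
        for (Map.Entry<String, String> e : doc.entrySet()) {
            if (!first) {
                sb.append(',');
            }
            sb.append('"').append(escape(e.getKey())).append("\":\"")
              .append(escape(e.getValue())).append('"');
            first = false;
        }
        return sb.append("}]").toString();
    }

    public static void main(String[] args) {
        Map<String, String> doc = new LinkedHashMap<String, String>();
        doc.put("id", "1");
        doc.put("title", "hello");
        System.out.println("POST " + updateUrl("http://localhost:8983/solr/collection1"));
        System.out.println("Content-Type: application/json");
        System.out.println(toJson(doc));
    }
}
```

The Documents page of the admin UI does the equivalent for you; the sketch only shows the shape of the request.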

3.4 Querying the index

This, too, simply sends an HTTP request:

http://localhost:8983/solr/collection1/select?q=*%3A*&wt=json&indent=true
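The same request can be issued from plain Java without any Solr library. The sketch below rebuilds the URL above and reads the response body; SelectUrlSketch and its method names are hypothetical, and fetch only works when a Solr instance is actually listening at the given address.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UnsupportedEncodingException;
import java.net.HttpURLConnection;
import java.net.URL;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Sketch only: builds a select URL and fetches it with the JDK's HTTP client.
final class SelectUrlSketch {

    // URL-encodes the query so that ':' becomes %3A, as in the URL shown above.
    static String selectUrl(String coreUrl, String query) {
        return coreUrl + "/select?q=" + encode(query) + "&wt=json&indent=true";
    }

    private static String encode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always available", e);
        }
    }

    // Sends the GET request and returns the raw response body (JSON text).
    static String fetch(String url) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
        conn.setRequestMethod("GET");
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(conn.getInputStream(), StandardCharsets.UTF_8))) {
            StringBuilder body = new StringBuilder();
            String line;
            while ((line = reader.readLine()) != null) {
                body.append(line).append('\n');
            }
            return body.toString();
        } finally {
            conn.disconnect();
        }
    }

    public static void main(String[] args) {
        // Prints the URL only; call fetch(...) yourself once Solr is running.
        System.out.println(selectUrl("http://localhost:8983/solr/collection1", "*:*"));
    }
}
```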

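3.5 Other query parameters

Parameter	Description	Example
q	The query used to search in Solr. See the Solr query syntax documentation for a full description. Sort information can be included by appending a semicolon and the name of an indexed, untokenized field (legacy syntax; the sort parameter below is the usual way). The default sort is score desc, i.e. descending by score.	q=myField:Java AND otherField:developerWorks; date asc searches the two given fields and sorts the results by a date field.
start	Offset into the result set at which to start. Used for paging through results. Defaults to 0.	start=15 returns results starting from the 15th result.
rows	Maximum number of documents to return. Defaults to 10.	rows=25 returns up to 25 results.
fq	An optional filter query. The query results are restricted to the documents the filter query matches. Filter queries are cached by Solr and are very useful for speeding up complex queries.	Any valid query that could be passed in q, excluding sort information.
hl	When hl=true, highlighted snippets are included in the query response. Defaults to false.	hl=true
fl	Comma-separated list of the fields to return for each document. Defaults to "*", meaning all fields. "score" means the score should be returned as well.	*,score
sort	Sorts the query results.	sort=date asc,price desc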
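As a rough illustration of how these parameters combine, the sketch below turns a 1-based page number into start/rows and assembles a query string. QueryParamsSketch, its method names, and the field names used in main (title, author, id) are hypothetical; only the parameter names come from the table.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch only: assembles q, fq, start, rows, fl and sort into a query string.
final class QueryParamsSketch {

    // Converts a 1-based page number into Solr's zero-based start offset.
    static int startFor(int page, int rows) {
        if (page < 1 || rows < 1) {
            throw new IllegalArgumentException("page and rows must be >= 1");
        }
        return (page - 1) * rows;
    }

    // Keeps insertion order so the resulting query string is predictable.
    static Map<String, String> params(String q, String fq, int page, int rows,
                                      String fl, String sort) {
        Map<String, String> p = new LinkedHashMap<String, String>();
        p.put("q", q);
        p.put("fq", fq);
        p.put("start", String.valueOf(startFor(page, rows)));
        p.put("rows", String.valueOf(rows));
        p.put("fl", fl);
        p.put("sort", sort);
        return p;
    }

    // Joins the parameters with '&' and URL-encodes each value.
    static String toQueryString(Map<String, String> params) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : params.entrySet()) {
            if (sb.length() > 0) {
                sb.append('&');
            }
            sb.append(e.getKey()).append('=').append(encode(e.getValue()));
        }
        return sb.toString();
    }

    private static String encode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always available", e);
        }
    }

    public static void main(String[] args) {
        // Second page of ten results, filtered by a hypothetical author field.
        System.out.println(toQueryString(
                params("title:solr", "author:liu", 2, 10, "id,title", "id asc")));
    }
}
```

Appending the printed string to .../select? gives a complete request URL.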

4. Solr Java API

4.1 Maven dependencies

<dependency>
    <groupId>org.apache.solr</groupId>
    <artifactId>solr-solrj</artifactId>
    <version>4.10.2</version>
</dependency>
<dependency>
    <groupId>commons-logging</groupId>
    <artifactId>commons-logging-api</artifactId>
    <version>1.1</version>
</dependency>

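4.2 Creating an index

Create the index entry with a SolrInputDocument:

import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

public void testCreateIndex() throws Exception {
    // 1. URL of the core to talk to
    String baseURL = "http://localhost:8983/solr/collection1";
    // 2. Create the HTTP Solr client
    HttpSolrServer httpSolrServer = new HttpSolrServer(baseURL);
    // 3. Build the document
    SolrInputDocument solrInputDocument = new SolrInputDocument();
    solrInputDocument.addField("id", 2);
    solrInputDocument.addField("title", "学习solrj客户端的使用");
    solrInputDocument.addField("content", "solrj客户端的增删改查");
    // 4. Send the document to the server
    httpSolrServer.add(solrInputDocument);
    // 5. Commit so the document becomes searchable
    httpSolrServer.commit();
    // 6. Release the client's resources (this does not stop the Solr server)
    httpSolrServer.shutdown();
}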

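In Lucene, creating a document requires specifying each field's type, whether it is tokenized, and whether it is stored. With the Solr API none of this is specified in code, because it has already been declared in the configuration file (schema.xml).

4.3 Querying the index

public void testQueryIndex() throws Exception {
    // 1. URL of the core to talk to
    String baseURL = "http://localhost:8983/solr/collection1";
    // 2. Create the HTTP Solr client
    HttpSolrServer httpSolrServer = new HttpSolrServer(baseURL);
    // 3. Build the query object
    // The argument is a Solr query expression of the form field:value, where
    // field is the field to match and value is the term to look for.
    SolrQuery solrQuery = new SolrQuery("title:我");
    // 4. Send the request and get the response
    QueryResponse response = httpSolrServer.query(solrQuery);
    // 5. Print the results
    SolrDocumentList results = response.getResults();
    for (SolrDocument solrDocument : results) {
        System.out.println(solrDocument.get("id"));
        System.out.println(solrDocument.get("title"));
        System.out.println(solrDocument.get("content"));
    }
    // 6. Release the client's resources
    httpSolrServer.shutdown();
}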

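4.4 Working with the index through Java objects

To speed up development, we can wrap the document fields in a Java object.

import java.util.Arrays;

import org.apache.solr.client.solrj.beans.Field;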

public class News {

	@Field
	private String id;
	@Field
	private String[] title;
	@Field
	private String[] content;

	public News() {
		super();
	}

	public String getId() {
		return id;
	}

	public void setId(String id) {
		this.id = id;
	}

	public String[] getTitle() {
		return title;
	}

	public void setTitle(String[] title) {
		this.title = title;
	}

	public String[] getContent() {
		return content;
	}

	public void setContent(String[] content) {
		this.content = content;
	}

	@Override
	public String toString() {
		return "News [id=" + id + ", title=" + Arrays.toString(title) + ", content=" + Arrays.toString(content) + "]";
	}
}

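Problem 1: If the class is a plain Java object, creating the index fails with an error.

Cause: the Solr client does not know which fields of a plain JavaBean should be indexed.

Fix: add Solr's @Field annotation.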

import org.apache.solr.client.solrj.beans.Field;
	.....
	@Field
	private String id;
	@Field
	private String title;
	@Field
	private String content;
	@Field
	private String author;

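Problem 2: After a query, assigning the values to the bean fails.

Cause: the returned value is a collection of values, but the corresponding JavaBean field is a String; assigning a collection to a String fails.

The underlying reason is that the field is declared with multiValued="true" in the configuration file (schema.xml):

<field name="title" type="text_general" indexed="true" stored="true" multiValued="true"/>

Fix: change the type of those JavaBean fields to arrays.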

@Field
private String[] title;
@Field
private String[] content;
@Field
private String[] author;
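The mismatch can be reproduced without Solr. The sketch below (MultiValuedSketch and toStringArray are hypothetical names, not part of SolrJ) normalizes a field value that may be either a single object or a collection into a String[], which is the shape the array-typed bean fields above expect.

```java
import java.util.Arrays;
import java.util.Collection;

// Sketch only: shows why a multi-valued field maps naturally onto an array.
final class MultiValuedSketch {

    // A field value may be absent, a single object, or a collection of objects.
    static String[] toStringArray(Object fieldValue) {
        if (fieldValue == null) {
            return new String[0];
        }
        if (fieldValue instanceof Collection) {
            Collection<?> values = (Collection<?>) fieldValue;
            String[] result = new String[values.size()];
            int i = 0;
            for (Object value : values) {
                result[i++] = String.valueOf(value);
            }
            return result;
        }
        // Single-valued field: wrap it in a one-element array.
        return new String[] { String.valueOf(fieldValue) };
    }

    public static void main(String[] args) {
        Object multiValued = Arrays.asList("first title", "second title");
        System.out.println(Arrays.toString(toStringArray(multiValued)));
        System.out.println(Arrays.toString(toStringArray("only title")));
    }
}
```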

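4.4.1 Creating an index

public void testCreateIndexWithBean() throws Exception {
    // 1. URL of the core to talk to
    String baseURL = "http://localhost:8983/solr/collection1";
    // 2. Create the HTTP Solr client
    HttpSolrServer httpSolrServer = new HttpSolrServer(baseURL);
    // 3. Build the bean (News only has a no-arg constructor, and title/content are arrays)
    News news = new News();
    news.setId("3");
    news.setTitle(new String[] { "我要使用javabean提交索引" });
    news.setContent(new String[] { "是的，方便我们快速开发" });
    // 4. Send the Java object to the server
    httpSolrServer.addBean(news);
    // 5. Commit
    httpSolrServer.commit();
    // 6. Release the client's resources
    httpSolrServer.shutdown();
}

Key code:

// 4. Send the Java object to the server
httpSolrServer.addBean(news);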
4.4.2 Querying the index

public void testQueryIndexWithBean() throws Exception {
    // 1. URL of the core to talk to
    String baseURL = "http://localhost:8983/solr/collection1";
    // 2. Create the HTTP Solr client
    HttpSolrServer httpSolrServer = new HttpSolrServer(baseURL);
    // 3. Build the query object
    // The argument is a Solr query expression of the form field:value;
    // *:* matches every document.
    SolrQuery solrQuery = new SolrQuery("*:*");
    // 4. Send the request and get the response
    QueryResponse response = httpSolrServer.query(solrQuery);
    // 5. Map the results onto News beans and print them
    List<News> beans = response.getBeans(News.class);
    for (News news : beans) {
        System.out.println(news);
    }
    // 6. Release the client's resources
    httpSolrServer.shutdown();
}

Key code:

// 5. Map the results onto News beans and print them
List<News> beans = response.getBeans(News.class);
for (News news : beans) {
    System.out.println(news);
}

4.4.3 Deleting from the index by id

public void testDeleteIndex() throws Exception {
    // 1. URL of the core to talk to
    String baseURL = "http://localhost:8983/solr/collection1";
    // 2. Create the HTTP Solr client
    HttpSolrServer httpSolrServer = new HttpSolrServer(baseURL);
    // 3. Delete the document by its id
    UpdateResponse res = httpSolrServer.deleteById("2");
    // 4. Commit the deletion
    httpSolrServer.commit();
    // 5. Release the client's resources
    httpSolrServer.shutdown();
}

Key code:

// 3. Delete the document by its id
UpdateResponse res = httpSolrServer.deleteById("2");

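4.4.4 Deleting by query

@Test
public void testDeleteByQuery() throws Exception {
    // 1. URL of the core to talk to
    String baseURL = "http://localhost:8983/solr/collection1";
    // 2. Create the HTTP Solr client
    HttpSolrServer httpSolrServer = new HttpSolrServer(baseURL);
    // 3. Delete every document matching the query (*:* removes all documents)
    httpSolrServer.deleteByQuery("*:*");
    // 4. Commit
    httpSolrServer.commit();
    // 5. Release the client's resources
    httpSolrServer.shutdown();
}

Key code:

// 3. Delete every document matching the query (*:* removes all documents)
httpSolrServer.deleteByQuery("*:*");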

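4.4.5 Updating the index

Updating a document essentially means indexing it again with the same id; the new document replaces the old one that has the same unique key: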

 <uniqueKey>id</uniqueKey>
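A toy model of that behaviour, under the assumption that the index can be pictured as a map keyed by the unique key: adding a document whose id already exists replaces the old entry instead of creating a second one. UniqueKeyOverwriteSketch is a hypothetical class and has nothing to do with how Solr stores data internally.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch only: models "update = add again with the same id" using a map.
final class UniqueKeyOverwriteSketch {

    // id (the uniqueKey) -> title; a real document would carry more fields.
    private final Map<String, String> titlesById = new LinkedHashMap<String, String>();

    // Adding with an existing id overwrites the previous document.
    void add(String id, String title) {
        titlesById.put(id, title);
    }

    int size() {
        return titlesById.size();
    }

    String titleOf(String id) {
        return titlesById.get(id);
    }

    public static void main(String[] args) {
        UniqueKeyOverwriteSketch index = new UniqueKeyOverwriteSketch();
        index.add("2", "old title");
        index.add("2", "new title");
        System.out.println(index.size() + " document(s), title = " + index.titleOf("2"));
    }
}
```

With SolrJ the equivalent is simply calling add (or addBean) again with the same id, followed by commit.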

4.4.6 Highlighting

Highlighting essentially re-analyzes the documents already matched by the query and wraps the matching keywords in HTML tags.

public void highlighter() throws Exception {
    // 1. URL of the core to talk to
    String baseURL = "http://localhost:8983/solr/collection1";
    // 2. Create the HTTP Solr client
    HttpSolrServer httpSolrServer = new HttpSolrServer(baseURL);
    // 3. Set the query
    SolrQuery solrQuery = new SolrQuery("我");

    // Turn highlighting on
    solrQuery.setHighlight(true);
    // Set the snippet length
    solrQuery.setHighlightFragsize(100);
    // Set the tags placed before and after each highlighted term
    solrQuery.setHighlightSimplePre("<font color='red'>");
    solrQuery.setHighlightSimplePost("</font>");
    // Fields to highlight
    solrQuery.addHighlightField("title");
    solrQuery.addHighlightField("content");

    // 4. Run the query
    QueryResponse response = httpSolrServer.query(solrQuery);
    // 5. Get the highlighted snippets: document id -> field name -> snippets
    Map<String, Map<String, List<String>>> map = response.getHighlighting();
    System.out.println(map);
    // 6. Release the client's resources
    httpSolrServer.shutdown();
}
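The highlighting result is a nested map, which is awkward to read directly. Here is a small sketch of how one might pull out the first snippet for a given document and field, falling back to the plain stored value when nothing was highlighted. HighlightReaderSketch, firstSnippet, and the sample data are hypothetical; only the map shape (document id, then field name, then list of snippets) is taken from the code above.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch only: reads a snippet out of a map shaped like the highlighting result.
final class HighlightReaderSketch {

    // Returns the first snippet for (docId, field), or fallback if there is none.
    static String firstSnippet(Map<String, Map<String, List<String>>> highlighting,
                               String docId, String field, String fallback) {
        Map<String, List<String>> perField = highlighting.get(docId);
        if (perField == null) {
            return fallback;
        }
        List<String> snippets = perField.get(field);
        if (snippets == null || snippets.isEmpty()) {
            return fallback;
        }
        return snippets.get(0);
    }

    // Hand-made sample: document "3" has one highlighted snippet in "title".
    static Map<String, Map<String, List<String>>> sample() {
        Map<String, List<String>> fields = new HashMap<String, List<String>>();
        fields.put("title", Arrays.asList("<font color='red'>solr</font> quick start"));
        Map<String, Map<String, List<String>>> highlighting =
                new HashMap<String, Map<String, List<String>>>();
        highlighting.put("3", fields);
        return highlighting;
    }

    public static void main(String[] args) {
        System.out.println(firstSnippet(sample(), "3", "title", "solr quick start"));
        System.out.println(firstSnippet(sample(), "3", "content", "plain content"));
    }
}
```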

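5. Configuration files

5.1 solr.xml

Holds the configuration for all cores and the overall configuration of the Solr service.

5.2 solrconfig.xml

Each core has its own solrconfig.xml.

In most cases solrconfig.xml does not need to be modified.

This file mainly configures the Solr server (the web application) for that core. It covers:

The Lucene version to depend on. This determines the structure of the Lucene index that is created; index formats are not fully compatible across Lucene versions, so pay attention to this setting.
Index creation settings, such as the index directory, directoryFactory, and the IndexWriterConfig settings (which determine indexing performance).
Load paths for the external jars that solrconfig.xml depends on.
JMX settings.
Cache settings, including the filter cache, query result cache, document cache, and custom caches.
updateHandler settings, i.e. settings for index update operations.
RequestHandler settings, i.e. the handler classes that receive client HTTP requests.
Search component settings such as Highlight and SpellChecker.
ResponseWriter settings, i.e. the response converters that decide the format in which response data is returned to the client.
Custom ValueSourceParser settings, used to influence document weighting, scoring, and sorting.
Detailed directoryFactory settings.

5.3 schema.xml

Each core has its own schema.xml.

For a quick start the default schema.xml can be used unchanged; it is the file to edit when you define your own fields.

This file mainly configures the document structure and the field types.

It defines the indexed fields: their names, types, whether they are indexed and stored, dynamic fields, and the unique key.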

<field name="keywords" type="text_general" indexed="true" stored="true"/>
<field name="category" type="text_general" indexed="true" stored="true"/>
<field name="resourcename" type="text_general" indexed="true" stored="true"/>
<field name="url" type="text_general" indexed="true" stored="true"/>

Field type configuration: the field type (here text_general) determines whether and how a field's text is tokenized.

<fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
    <analyzer type="index">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
        <!-- in this example, we will only use synonyms at query time
            <filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
        -->
        <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
    <analyzer type="query">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
        <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
        <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
</fieldType>
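To get a feel for what the index-time chain above does to a piece of text, here is a deliberately simplified stand-in written with the JDK only. It is not Lucene's StandardTokenizer: splitting on non-letter, non-digit characters is an assumption made for illustration, and AnalysisChainSketch is a hypothetical class. The three stages mirror the tokenizer, the stop filter with ignoreCase="true", and the lower-case filter.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Sketch only: tokenizer -> stop filter -> lower-case filter, in miniature.
final class AnalysisChainSketch {

    // stopWords must be given in lower case, since matching ignores case.
    static List<String> analyze(String text, Set<String> stopWords) {
        List<String> tokens = new ArrayList<String>();
        // Stand-in tokenizer: split on anything that is not a letter or a digit.
        for (String raw : text.split("[^\\p{L}\\p{N}]+")) {
            if (raw.isEmpty()) {
                continue; // split can yield an empty first element
            }
            String lower = raw.toLowerCase(Locale.ROOT);
            // Stop filter (case-insensitive): drop the token if it is a stop word.
            if (stopWords.contains(lower)) {
                continue;
            }
            // Lower-case filter: the token is stored in lower case.
            tokens.add(lower);
        }
        return tokens;
    }

    public static void main(String[] args) {
        Set<String> stopWords = new HashSet<String>(Arrays.asList("the", "a"));
        System.out.println(analyze("The Quick, brown Fox", stopWords));
    }
}
```

The query-time chain in the schema adds a synonym filter on top of these steps, which the sketch leaves out.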

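5.4 Custom schema.xml

This is an exercise to reinforce the understanding of schema.xml.

Requirement: write a schema dedicated to news documents with the fields id, title, content, and author.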

<?xml version="1.0" encoding="UTF-8" ?>
<schema name="example" version="1.5">
	<!-- id,title,content,author-->
	<!-- Fields whose names are wrapped in underscores are required by Solr itself and must be copied over -->
	<field name="_version_" type="long" indexed="true" stored="true"/>
    <field name="_root_" type="string" indexed="true" stored="false"/>
	
	<!--field-->
	<field name="id" type="string" indexed="true" stored="true" required="true" multiValued="false" /> 
	<field name="title" type="text_general" indexed="true" stored="true" multiValued="true"/>
	<field name="content" type="text_general" indexed="true" stored="true" multiValued="true"/>
	<field name="author" type="string" indexed="true" stored="true"/>
	
	<!-- Every document needs a unique key, which is used when performing updates. -->
	<uniqueKey>id</uniqueKey>
	
	<!--fieldType-->
	<fieldType name="long" class="solr.TrieLongField" precisionStep="0" positionIncrementGap="0"/>
	 <fieldType name="string" class="solr.StrField" sortMissingLast="true" />
	  <fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
      <analyzer type="index">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
        <!-- in this example, we will only use synonyms at query time
        <filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
        -->
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
        <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
    </fieldType>
</schema>

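6. Integrating the IK analyzer

Steps for using the IK analyzer in Lucene:

1) The IK analyzer jar is not available in the Maven repository.
2) Install the IK analyzer jar manually.
3) Add the dependency to the pom.
4) new IKAnalyzer()

For Solr, copy the IK analyzer jar into the web application's lib directory.

Destination: E:\itcast\env\tomcat4solr\webapps\solr\WEB-INF\lib

In Solr the analyzer is configured in schema.xml; the IK analyzer tokenizes the values of the fields that use it.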

<schema name="example" version="1.5">
 
   <field name="_version_" type="long" indexed="true" stored="true"/>

   <field name="_root_" type="string" indexed="true" stored="false"/>
 
   <field name="id" type="string" indexed="true" stored="true" required="true" multiValued="false" /> 
   <field name="title" type="text_ik" indexed="true" stored="true" multiValued="true"/>
   <field name="author" type="string" indexed="true" stored="true"/>
   <field name="date" type="date" indexed="true" stored="true"/>
   <field name="content" type="text_ik" indexed="false" stored="true" multiValued="true"/>
   
   <field name="text" type="text_ik" indexed="true" stored="false" multiValued="true"/>

	<uniqueKey>id</uniqueKey>

   <copyField source="title" dest="text"/>
   <copyField source="author" dest="text"/>
   <copyField source="content" dest="text"/>
   
   <fieldType name="string" class="solr.StrField" sortMissingLast="true" />
   <fieldType name="long" class="solr.TrieLongField" precisionStep="0" positionIncrementGap="0"/>
   <fieldType name="date" class="solr.TrieDateField" precisionStep="0" positionIncrementGap="0"/>

   <fieldType name="text_ik" class="solr.TextField">
     <analyzer class="org.wltea.analyzer.lucene.IKAnalyzer"/>
   </fieldType>

</schema>
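The copyField lines in this schema feed title, author, and content into the catch-all text field. A rough model of the effect, using plain collections: the values of each source field are appended to the destination. CopyFieldSketch, catchAll, and the sample document are hypothetical, and the sketch ignores analysis entirely.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch only: gathers the values of several source fields into one list,
// the way copyField gathers them into the "text" field.
final class CopyFieldSketch {

    static List<String> catchAll(Map<String, List<String>> doc, List<String> sources) {
        List<String> text = new ArrayList<String>();
        for (String source : sources) {
            List<String> values = doc.get(source);
            if (values != null) {
                text.addAll(values); // fields missing from the document are skipped
            }
        }
        return text;
    }

    // Hand-made sample document without a content field.
    static Map<String, List<String>> sampleDoc() {
        Map<String, List<String>> doc = new HashMap<String, List<String>>();
        doc.put("title", Arrays.asList("Solr quick start"));
        doc.put("author", Arrays.asList("liu"));
        return doc;
    }

    public static void main(String[] args) {
        System.out.println(catchAll(sampleDoc(), Arrays.asList("title", "author", "content")));
    }
}
```

This is why a query against text can match a document whose content field is declared with indexed="false".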

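7. Integrating a MySQL database

1) Copy the database driver jar into the web application's lib directory.

2) Copy the dataimport-related jars into the web application's lib directory.

Source: E:\itcast\env\solr_home\lib\dist

Destination: E:\itcast\env\tomcat4solr\webapps\solr\WEB-INF\lib

3) Configure the handler in solrconfig.xml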

<requestHandler name="/dataimport" class="org.apache.solr.handler.dataimport.DataImportHandler">
      <lst name="defaults">
         <str name="config">data-config.xml</str>
      </lst>
  </requestHandler>

4) Put data-config.xml in the conf directory, alongside solrconfig.xml

<dataConfig>
    <dataSource type="JdbcDataSource"
        driver="com.mysql.jdbc.Driver"
        url="jdbc:mysql://192.168.140.130:3306/article?characterEncoding=UTF-8"
        user="root"
        password="root"/>
    <document>
        <entity name="article" query="select id,title,content,author,date from article">
            <field column = "id" name="id"/>        
            <field column = "title" name = "title" />
            <field column = "content" name="content" />
	    <field column = "author" name="author" />
            <field column = "date" name="date" />
        </entity>
    </document>
</dataConfig>

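Import URL (the path matches the /dataimport handler registered above): http://localhost:8080/solr/article/dataimport?command=full-import

Other commands: delta-import, status, reload-config, abort.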
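For scripting imports, the command URLs can be generated rather than typed. The sketch below assumes the handler is registered as /dataimport (as in the solrconfig.xml snippet above) and that the core is named article; DataImportUrlSketch and its methods are hypothetical helpers, and clean and commit are the usual optional full-import parameters.

```java
// Sketch only: builds DataImportHandler command URLs for a core.
final class DataImportUrlSketch {

    // command is one of the handler's commands, e.g. full-import or status.
    static String commandUrl(String coreUrl, String command) {
        return coreUrl + "/dataimport?command=" + command;
    }

    // clean=true empties the index before importing; commit=true commits afterwards.
    static String fullImportUrl(String coreUrl, boolean clean, boolean commit) {
        return commandUrl(coreUrl, "full-import") + "&clean=" + clean + "&commit=" + commit;
    }

    public static void main(String[] args) {
        String core = "http://localhost:8080/solr/article";
        System.out.println(fullImportUrl(core, true, true));
        System.out.println(commandUrl(core, "status"));
    }
}
```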