Setting Up a Solr Full-Text Search Server and Basic Usage

__好好学习

Categories: Hibernate, solr    Tags: solr, full-text search server

Article link: https://blog.csdn.net/u012476983/article/details/53992158


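Setting up the Solr server

Let's go straight to setting up the server. For an introduction to Solr itself, see the official Solr website; I won't cover it here.
- Download the Solr server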

wget http://apache.fayea.com/lucene/solr/6.3.0/solr-6.3.0.tgz

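Alternatively, download it from the Solr download page (older releases such as 6.3.0 are kept in the Apache archive at https://archive.apache.org/dist/lucene/solr/6.3.0/).

After the download completes, extract it into your working directory: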

tar -zxvf solr-6.3.0.tgz
# move it into my workspace directory
mv solr-6.3.0 /Users/yangyang/Workspaces/
cd /Users/yangyang/Workspaces/solr-6.3.0/bin
# start Solr (default port 8983)
./solr start

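A successful start looks like this:
[Screenshot: successful startup of ./solr start]
Now open http://localhost:8983/solr in a browser. If the admin page appears, the Solr server is running.
[Screenshot: Solr admin UI]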

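Schema configuration

With the Solr server installed and running, let's configure it, starting with the schema. This is the most important part: a wrong schema configuration will prevent the Solr server from working properly.
- First, create a core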

# go to the Solr installation directory
cd /Users/yangyang/Workspaces/solr-6.3.0/
cd server/solr/
# create the core directory
mkdir -pv blog/conf
# copy the basic configuration files
cp -r configsets/basic_configs/conf/*  blog/conf/

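Next, open the Solr admin UI, go to Core Admin, enter the details of the new core (the core directory created above is blog), and click Add Core.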

The core then appears in the Core Admin list, which means it was created successfully.
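As an alternative to clicking Add Core, the same core can be created through Solr's CoreAdmin HTTP API (/admin/cores?action=CREATE with name and instanceDir parameters). The sketch below only builds that request URL using the JDK; the CoreAdminUrl class is hypothetical, and it assumes the default base URL http://localhost:8983/solr and the blog directory (with its conf folder) created above.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper: builds a CoreAdmin CREATE request URL (JDK only, no SolrJ)
final class CoreAdminUrl {

    private CoreAdminUrl() {
    }

    // baseUrl is the Solr root without a trailing slash, e.g. "http://localhost:8983/solr"
    static String createCore(String baseUrl, String name, String instanceDir) {
        return baseUrl + "/admin/cores?action=CREATE"
                + "&name=" + encode(name)
                + "&instanceDir=" + encode(instanceDir);
    }

    // Form-encodes a parameter value (spaces become '+')
    private static String encode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available on the JVM
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Open the printed URL in a browser (or with curl) while Solr is running
        System.out.println(createCore("http://localhost:8983/solr", "blog", "blog"));
    }
}
```

The instanceDir must already contain a conf directory with the schema and solrconfig files, which is why the configuration files are copied first.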

Before going further, you should be familiar with the managed-schema configuration file. It is not covered here; see the articles "Configuration file managed-schema (schema.xml)" parts (1) to (4).
- Add the smartcn Chinese analyzer (lucene-analyzers-smartcn-6.3.0.jar). (Other Chinese analyzer jars also work; I'm using smartcn for now.) Copy the downloaded jar into ${user.solr.path}/server/solr-webapp/webapp/WEB-INF/lib.
- Update managed-schema to use the Chinese tokenizer we just added.
In managed-schema (in the conf directory of the core you created), find <fieldType name="text_general" and make the following replacement:

<!--original tokenizer-->
<!--<tokenizer class="solr.StandardTokenizerFactory"/>-->
<!--use smartcn-->
<tokenizer class="org.apache.lucene.analysis.cn.smart.HMMChineseTokenizerFactory"/>

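Note: replace the tokenizer in both the index and the query analyzers. Otherwise text is tokenized differently at query time than at index time, and searches will not match the indexed terms as expected.
- Define the fields and copyField rules. I'll use the full-text search configuration of my own blog system as the example.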


    <field name="id" type="string" indexed="true" stored="true" required="true" multiValued="false"/>
<!--this field is required-->
    <field name="_version_" type="long" indexed="true" stored="false"/>
    <field name="title" type="text_general" indexed="true" stored="true"/>
    <field name="content" type="text_general" indexed="true" stored="true"/>
    <field name="sketch" type="text_general" indexed="true" stored="true"/>
    <field name="columnNamesCache" type="text_general" indexed="true" stored="true"/>
    <field name="labelNamesCache" type="text_general" indexed="true" stored="true"/>

    <field name="columnIdsCache" type="string" stored="true"/>
    <field name="labelIdsCache" type="string" stored="true"/>
    <field name="main_photo" type="string" stored="true"/>
    <field name="author_head_img" type="string" stored="true"/>
    <field name="author_nickname" type="string" stored="true"/>
    <field name="author_id" type="string" stored="true"/>
    <field name="statistics" type="long" stored="true"/>
    <field name="status" type="boolean" stored="true"/>
    <field name="creatTime" type="long" stored="true"/>

<!--the field queried at search time-->
    <field name="text" type="text_general" multiValued="true" indexed="true" stored="true"/>
<!--unique key-->
    <uniqueKey>id</uniqueKey>
<!--copy into text for searching-->
    <copyField source="title" dest="text"/>
    <copyField source="content" dest="text"/>
    <copyField source="sketch" dest="text"/>
    <copyField source="columnNamesCache" dest="text"/>
    <copyField source="labelNamesCache" dest="text"/>
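The copyField rules above mean that, at index time, the raw values of title, content, sketch, columnNamesCache and labelNamesCache are also sent to the multi-valued text field, which is the field searched with text:... at query time. The toy model below (a hypothetical CopyFieldModel class, not Solr code) shows which values text would receive for one document; the order of the values is only illustrative.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model of the copyField rules above; this is NOT how Solr implements copyField
final class CopyFieldModel {

    // Source fields that the schema copies into the multi-valued "text" field
    static final List<String> SOURCES = Arrays.asList(
            "title", "content", "sketch", "columnNamesCache", "labelNamesCache");

    private CopyFieldModel() {
    }

    // Collects the raw values "text" would receive; absent fields contribute nothing
    static List<String> textValues(Map<String, String> doc) {
        List<String> text = new ArrayList<>();
        for (String field : SOURCES) {
            String value = doc.get(field);
            if (value != null) {
                text.add(value);
            }
        }
        return text;
    }

    public static void main(String[] args) {
        Map<String, String> doc = new HashMap<>();
        doc.put("id", "42");
        doc.put("title", "Getting started with Solr");
        doc.put("sketch", "full-text search");
        doc.put("author_nickname", "yangyang"); // not a copyField source, so ignored
        System.out.println(textValues(doc));
    }
}
```

Because copyField copies the input before analysis, text is analyzed with its own field type (text_general here), which is why it uses the same Chinese tokenizer as the source fields.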

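The above is a simple configuration. This is one of the most important parts of a Solr setup, and the right fields depend on what your application needs, so test it yourself.
Next, restart Solr (assuming you are already in {Solr.path}/bin):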

./solr stop -all
./solr start

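Once Solr is back up, open the admin UI to check that the core is configured correctly. If it is, selecting the core and opening its Schema page shows the newly added fields.
[Screenshot: Schema page showing the new fields]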

Fields can also be added directly in the admin UI; I prefer to edit the managed-schema file. That completes the schema configuration. For more basic examples, see the Apache Solr website.

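Integrating SolrJ with Spring

SolrJ lets server-side Java code call the Solr API conveniently; see the SolrJ wiki for details.

Add the following bean definition to the Spring configuration file: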

<!--SolrJ client for the blog core-->
    <bean id="httpSolrServerBlog" class="org.apache.solr.client.solrj.impl.HttpSolrClient">
        <constructor-arg index="0" value="${solr.Url}"/>
        <!-- response parser -->
        <property name="parser">
            <bean class="org.apache.solr.client.solrj.impl.XMLResponseParser"/>
        </property>
        <!-- maximum time to establish a connection, in milliseconds -->
        <property name="connectionTimeout" value="${solr.connectionTimeout}"/>
    </bean>

Then add these entries to the properties file:

solr.Url=http://localhost:8983/solr/blog
solr.connectionTimeout=500
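In the Spring XML above, ${solr.Url} and ${solr.connectionTimeout} are placeholders filled from this properties file. Note that the core name is part of the URL (/solr/blog), so each client talks to exactly one core. The JDK-only sketch below parses the same two entries and extracts the core name; the SolrSettings class and its checks are hypothetical and not part of SolrJ or Spring.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.net.URI;
import java.util.Properties;

// Hypothetical holder for the two settings used by the Spring bean (not a SolrJ/Spring class)
final class SolrSettings {
    final String url;
    final int connectionTimeoutMs;

    private SolrSettings(String url, int connectionTimeoutMs) {
        this.url = url;
        this.connectionTimeoutMs = connectionTimeoutMs;
    }

    // Parses solr.Url and solr.connectionTimeout from properties-file text
    static SolrSettings parse(String propertiesText) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(propertiesText));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        String url = require(props, "solr.Url");
        int timeout = Integer.parseInt(require(props, "solr.connectionTimeout").trim());
        return new SolrSettings(url, timeout);
    }

    private static String require(Properties props, String key) {
        String value = props.getProperty(key);
        if (value == null) {
            throw new IllegalArgumentException(key + " is missing");
        }
        return value;
    }

    // The core is the last path segment, e.g. "blog" in http://localhost:8983/solr/blog
    String coreName() {
        String[] parts = URI.create(url).getPath().split("/");
        if (parts.length < 3) {
            throw new IllegalStateException("solr.Url should end with a core name: " + url);
        }
        return parts[parts.length - 1];
    }
}
```

Pointing solr.Url at http://localhost:8983/solr without a core would make every request miss the blog core, which is the mistake the coreName check is meant to catch.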

Wherever it is needed, just inject the client:

    private final SolrClient solrClient;

    @Autowired
    public SolrUtilImpl(SolrClient solrClient) {
        this.solrClient = solrClient;
    }

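If you use several Solr cores, define one bean per core with a distinct id and inject whichever one you need (for example with @Qualifier, as SolrArticleUtilImpl does below).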

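Basic SolrJ usage

Only basic SolrJ usage is covered here; see the SolrJ wiki for details.
Solr is already running, so we now use SolrJ to connect to the Solr server and perform CRUD operations on documents.
Note: documents can also be synchronized from a database.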

Base Java classes and shared methods

Document object

package org.blog.entity;
//imports omitted
public class BlogArticle{
    private Long id;
    private String name; //title
    private String mainPhoto; //cover image
    private String sketch; //summary
    private String content; //body
    private String contentMd; //body (markdown)
    private Boolean ifTop; //pinned to top
    private VUser user; //author of the article
    private String sources; //source
    private String staticCode; //static code
    private BigDecimal sorter;
    private Boolean status; //status
    private String creater;
    private Timestamp lastUpdateTime;
    private Timestamp creatTime;
    private String columnNamesCache;
    private String columnIdsCache;
    private String labelIdsCache;
    private String labelNamesCache;
    //getters and setters omitted; VUser is the user object holding the avatar, nickname and other details
}

Shared method that builds a SolrInputDocument
SolrInputDocument is SolrJ's object model for a document sent to the Solr server for indexing.

    /**
     * Builds the Solr document for an article
     *
     * @param tempVo the article
     * @return the document to index
     */
    public SolrInputDocument getSolrInputDocument(BlogArticle tempVo) {
        SolrInputDocument document = new SolrInputDocument();
        document.addField("id", tempVo.getId());
        document.addField("title", tempVo.getName());
        document.addField("content", HtmlUtil.Html2Text(tempVo.getContent())); //strip HTML tags so only text is indexed
        document.addField("sketch", tempVo.getSketch());
        document.addField("columnNamesCache", tempVo.getColumnNamesCache());
        document.addField("columnIdsCache", tempVo.getColumnIdsCache());
        document.addField("labelIdsCache", tempVo.getLabelIdsCache()); //label ids
        document.addField("labelNamesCache", tempVo.getLabelNamesCache());
        document.addField("creatTime", tempVo.getCreatTime().getTime()); //epoch milliseconds (long field)
        document.addField("main_photo", tempVo.getMainPhoto());
        document.addField("author_head_img", tempVo.getUser().getHeadImg()); //avatar
        document.addField("author_nickname", tempVo.getUser().getNickName()); //nickname
        document.addField("author_id", tempVo.getUser().getId()); //author id
        document.addField("status", tempVo.getStatus());
        return document;
    }
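getSolrInputDocument throws a NullPointerException if an article has no user or no creation time. SolrInputDocument is essentially a map from field names to values, so the sketch below uses a plain LinkedHashMap as a stand-in to show one null-safe way to fill it; the DocFields class is hypothetical and is not SolrJ's API. With SolrJ, the same null check would go in front of each document.addField call.

```java
import java.util.Date;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for SolrInputDocument (field name -> value), JDK only
final class DocFields {
    private final Map<String, Object> fields = new LinkedHashMap<>();

    // Adds the field only when a value is present, so missing data is simply skipped
    DocFields put(String name, Object value) {
        if (value != null) {
            fields.put(name, value);
        }
        return this;
    }

    // Epoch milliseconds for the long-typed creatTime field; also accepts java.sql.Timestamp
    static Long millis(Date date) {
        return date == null ? null : date.getTime();
    }

    Map<String, Object> asMap() {
        return fields;
    }

    public static void main(String[] args) {
        Map<String, Object> doc = new DocFields()
                .put("id", 1L)
                .put("title", "Getting started with Solr")
                .put("creatTime", millis(new Date(1480000000000L)))
                .put("author_nickname", null) // e.g. article without a user: field skipped
                .asMap();
        System.out.println(doc);
    }
}
```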

HtmlUtil utility class
Removes the HTML tags inserted by the rich-text editor; the search index does not need any markup.

package org.blog.util;

import com.fangshuo.common.log.Logger;

import java.util.regex.Pattern;
/**
 * @author Created by yangyang on 2016/11/18.
 *         e-mail ：yangyang_666@icloud.com ； tel ：18580128658 ；QQ ：296604153
 */
public class HtmlUtil {

    private static final Logger logger = Logger.getLogger(HtmlUtil.class); //project's own logger

    public static String Html2Text(String inputString) {
        String htmlStr = inputString; //string containing HTML tags
        String textStr = "";
        java.util.regex.Pattern p_script;
        java.util.regex.Matcher m_script;
        java.util.regex.Pattern p_style;
        java.util.regex.Matcher m_style;
        java.util.regex.Pattern p_html;
        java.util.regex.Matcher m_html;
        java.util.regex.Pattern p_other;
        java.util.regex.Matcher m_other;
        try {
            String regEx_script = "<[\\s]*?script[^>]*?>[\\s\\S]*?<[\\s]*?\\/[\\s]*?script[\\s]*?>"; //regex for <script> blocks {or <script[^>]*?>[\\s\\S]*?<\\/script> }
            String regEx_style = "<[\\s]*?style[^>]*?>[\\s\\S]*?<[\\s]*?\\/[\\s]*?style[\\s]*?>"; //regex for <style> blocks {or <style[^>]*?>[\\s\\S]*?<\\/style> }
            String regEx_html = "<[^>]+>"; //regex for HTML tags
            String regEx_other = "\\s*\n"; //line breaks and the whitespace before them

            p_script = Pattern.compile(regEx_script, Pattern.CASE_INSENSITIVE);
            m_script = p_script.matcher(htmlStr);
            htmlStr = m_script.replaceAll(""); //strip script blocks

            p_style = Pattern.compile(regEx_style, Pattern.CASE_INSENSITIVE);
            m_style = p_style.matcher(htmlStr);
            htmlStr = m_style.replaceAll(""); //strip style blocks

            p_html = Pattern.compile(regEx_html, Pattern.CASE_INSENSITIVE);
            m_html = p_html.matcher(htmlStr);
            htmlStr = m_html.replaceAll(""); //strip HTML tags

            p_other = Pattern.compile(regEx_other);
            m_other = p_other.matcher(htmlStr);
            htmlStr = m_other.replaceAll(""); //strip remaining line breaks

            textStr = htmlStr;
        } catch (Exception e) {
            logger.error("Html2Text: " + e.getMessage());
        }
        return textStr; //plain text
    }
}
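Html2Text compiles its four patterns on every call. A minimal variant, assuming Java 8+, compiles them once as static constants and treats null input as empty text. The regexes match the same input as the originals (only the lazy quantifiers in front of literal characters were simplified); HtmlText is a hypothetical class name.

```java
import java.util.regex.Pattern;

// Hypothetical variant of HtmlUtil.Html2Text: same four regexes, compiled once
final class HtmlText {

    // <script>...</script> and <style>...</style> blocks, including their contents
    private static final Pattern SCRIPT = Pattern.compile(
            "<\\s*script[^>]*>[\\s\\S]*?<\\s*/\\s*script\\s*>", Pattern.CASE_INSENSITIVE);
    private static final Pattern STYLE = Pattern.compile(
            "<\\s*style[^>]*>[\\s\\S]*?<\\s*/\\s*style\\s*>", Pattern.CASE_INSENSITIVE);
    // Any remaining tag
    private static final Pattern TAG = Pattern.compile("<[^>]+>");
    // Line breaks together with the whitespace before them
    private static final Pattern LINE_BREAK = Pattern.compile("\\s*\n");

    private HtmlText() {
    }

    static String toText(String html) {
        if (html == null) {
            return "";
        }
        String s = SCRIPT.matcher(html).replaceAll("");
        s = STYLE.matcher(s).replaceAll("");
        s = TAG.matcher(s).replaceAll("");
        return LINE_BREAK.matcher(s).replaceAll("");
    }
}
```

Like the original, this does not decode HTML entities such as &nbsp;, so those still end up in the index.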

SolrUtil interface

package org.blog.util;

import com.fangshuo.common.util.ChecksException; //custom exception
import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.common.SolrInputDocument;

import java.io.IOException;
import java.util.Collection;
import java.util.List;

/**
 * @author Created by yangyang on 16/9/20.
 *         e-mail ：yangyang_666@icloud.com ； tel ：18580128658 ；QQ ：296604153
 */
public interface SolrUtil {

    //add documents to the full-text index
    void add(SolrInputDocument doc) throws IOException, SolrServerException, ChecksException;

    void add(Collection<SolrInputDocument> docs) throws IOException, SolrServerException, ChecksException;

    //remove documents from the full-text index
    void delete(List<String> ids) throws IOException, SolrServerException, ChecksException;

    void delete(String id) throws IOException, SolrServerException, ChecksException;

    SolrClient getSolrClient();

}

SolrUtilImpl implementation class

package org.blog.util.impl;

import com.fangshuo.common.util.ChecksException;
import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.response.UpdateResponse;
import org.apache.solr.common.SolrInputDocument;
import org.blog.util.SolrUtil;

import java.io.IOException;
import java.util.Collection;
import java.util.List;

/**
 * Base class: to work with multiple Solr cores, extend this class once per core
 * @author Created by yangyang on 2016/12/8.
 *         e-mail ：yangyang_666@icloud.com ； tel ：18580128658 ；QQ ：296604153
 */

public class SolrUtilImpl implements SolrUtil {

    private final SolrClient solrClient;

    public SolrUtilImpl(SolrClient solrClient) { 
        this.solrClient = solrClient;
    }


    @Override
    public void add(SolrInputDocument doc) throws IOException, SolrServerException, ChecksException {
        UpdateResponse response = solrClient.add(doc);
        if (response.getStatus() == 0) {
            solrClient.commit();
        } else {
            throw new ChecksException("solr add error");
        }
    }

    @Override
    public void add(Collection<SolrInputDocument> docs) throws IOException, SolrServerException, ChecksException {
        if (docs != null && docs.size() > 0) {
            UpdateResponse response = solrClient.add(docs);
            if (response.getStatus() == 0) {
                solrClient.commit();
            } else {
                throw new ChecksException("solr add error");
            }
        }
    }

    @Override
    public void delete(List<String> ids) throws IOException, SolrServerException, ChecksException {
        if (ids != null && ids.size() > 0) {
            UpdateResponse response = solrClient.deleteById(ids);
            if (response.getStatus() == 0) {
                solrClient.commit();
            } else {
                throw new ChecksException("solr del error");
            }
        }
    }

    @Override
    public void delete(String id) throws IOException, SolrServerException, ChecksException {
        UpdateResponse response = solrClient.deleteById(id);
        if (response.getStatus() == 0) {
            solrClient.commit();
        } else {
            throw new ChecksException("solr del error");
        }
    }

    @Override
    public SolrClient getSolrClient() {
        return this.solrClient;
    }
}

Document operations for the blog core: SolrArticleUtilImpl

package org.blog.util.impl;

import org.apache.solr.client.solrj.SolrClient;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.beans.factory.annotation.Qualifier;
import org.springframework.stereotype.Component;

/**
 * @author Created by yangyang on 2016/12/23.
 *         e-mail ：yangyang_666@icloud.com ； tel ：18580128658 ；QQ ：296604153
 */
@Component("solrArticleUtilImpl")
public class SolrArticleUtilImpl extends SolrUtilImpl {
    @Autowired
    public SolrArticleUtilImpl(@Qualifier("httpSolrServerBlog") SolrClient solrClient) {
        super(solrClient);
    }
}

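Document CRUD operations

The following shows how to manage documents on the Solr server with SolrJ. All shared and utility classes used here are listed above. If you use the code directly, remember to remove the custom ChecksException class (this applies to the code below as well).

Add, delete, update

Below is condensed usage code. The service interface is omitted, and the actual logic depends on your business requirements.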

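    /**
     * Shared helper that performs the matching operation based on the argument type.
     *
     * @param ifDelete whether to delete
     * @param obj      the target: a document object (BlogArticle), a document ID (String) or a list of IDs (List<String>)
     */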
    @SuppressWarnings("unchecked")
    private void solrOption(boolean ifDelete, Object obj) throws ChecksException {
        if (obj != null) {
            try {
                if (ifDelete) {
                    if (obj instanceof List) {
                        solrArticleUtil.delete((List<String>) obj);
                    } else {
                        solrArticleUtil.delete(obj.toString());
                    }
                } else {
                    solrArticleUtil.add(getSolrInputDocument((BlogArticle) obj)); //add and update use the same call: Solr matches on id, adding the document if it does not exist and replacing it otherwise
                }
            } catch (IOException | SolrServerException e) {
                e.printStackTrace();
                throw new ChecksException("solr article option error");
            }
        }
    }

    /**
     * Add
     *
     * @param tempVo the document to add
     */
    @Override
    @Transactional(rollbackFor = Exception.class)
    public JSONResult fixAdd(BlogArticle tempVo) throws ChecksException {
        //add logic omitted
        solrOption(false, tempVo);
        //return omitted
    }

    /**
     * Update
     *
     * @param tempVo the document to update
     */
    @Override
    @Transactional(rollbackFor = Exception.class)
    public JSONResult fixEditSave(BlogArticle tempVo) throws ChecksException {
        //update logic omitted
        solrOption(false, tempVo);
        //return omitted
    }

    /**
     * Delete a single item
     *
     * @param id data id
     */
    @Transactional(rollbackFor = Exception.class)
    public JSONResult fixDel(Object id) throws ChecksException {
        //delete logic omitted
        solrOption(true, id.toString());
        //return omitted
    }

    /**
     * Delete multiple items
     *
     * @param ids data ids
     */
    @Transactional(rollbackFor = Exception.class)
    public JSONResult fixDelMore(List<Object> ids) throws ChecksException {
        //delete logic omitted
        List<String> list = new ArrayList<>(ids.size());
        ids.forEach(x -> list.add(x.toString())); //convert each id (not the whole list) to a String
        solrOption(true, list);
        //return omitted
    }

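That covers the basics of maintaining the Solr document index with SolrJ, which is enough for basic use.

Query

Querying is a bit more involved than the other three operations, so let's walk through it.
First, create a SolrSearchBeen class to hold the parameters sent from the front end.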
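As the comment in solrOption says, add and update are the same operation: Solr identifies documents by the uniqueKey (id), and adding a document with an existing id replaces the stored one. The toy in-memory model below illustrates that behavior and why SolrUtilImpl calls commit() after each change. It is a hypothetical illustration only (ToyIndex is neither SolrJ nor Solr code) and ignores soft commits, autoCommit and real-time get.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Toy model of Solr update semantics: documents keyed by "id", visible only after commit()
final class ToyIndex {
    private final Map<String, Map<String, Object>> committed = new HashMap<>();
    private final Map<String, Map<String, Object>> pendingAdds = new HashMap<>();
    private final Set<String> pendingDeletes = new HashSet<>();

    // Adding a document whose id already exists replaces the WHOLE document (upsert)
    void add(Map<String, Object> doc) {
        String id = String.valueOf(doc.get("id"));
        pendingDeletes.remove(id);
        pendingAdds.put(id, new HashMap<>(doc));
    }

    void deleteById(String id) {
        pendingAdds.remove(id);
        pendingDeletes.add(id);
    }

    // Applies pending deletes, then pending adds, and makes them searchable
    void commit() {
        for (String id : pendingDeletes) {
            committed.remove(id);
        }
        committed.putAll(pendingAdds);
        pendingDeletes.clear();
        pendingAdds.clear();
    }

    // Searches only see committed documents
    Map<String, Object> get(String id) {
        return committed.get(id);
    }

    int size() {
        return committed.size();
    }
}
```

Because an add replaces the whole document rather than merging fields, an update must send every field again; fixEditSave does this by reusing getSolrInputDocument.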

package org.blog.been;

//custom validation helper
import com.fangshuo.common.util.Checks; 
import com.fangshuo.common.util.ChecksException;
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.util.ClientUtils;
import org.blog.util.BlogContacts;

import java.io.Serializable;
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

/**
 * Search parameters for Solr full-text search
 *
 * @author Created by yangyang on 2016/12/23.
 *         e-mail ：yangyang_666@icloud.com ； tel ：18580128658 ；QQ ：296604153
 */

public class SolrSearchBeen implements Serializable {

    private String q; //search keywords
    private Integer nowPage = 1; //current page (1-based)
    private Integer pageSize = BlogContacts.HOME_LOADING_PAGE_SIZE; //page size
    private Boolean ifHl = true; //whether to highlight matches

    /**
     * Builds the SolrQuery
     */
    public SolrQuery getQuery() throws ChecksException {
        SolrQuery query = new SolrQuery();
        if (!Checks.empty(q)) { //keywords present
            //the front end URL-encodes q, so decode it here
            try {
                q = URLDecoder.decode(q, "UTF-8");
            } catch (UnsupportedEncodingException e) {
                throw new ChecksException(" q decoder error");
            }
            query.set("q", "text:" + ClientUtils.escapeQueryChars(q));
        } else { //no keywords from the front end: match everything, newest first
            //sort order
            query.set("sort", "creatTime desc");
            query.set("q", "*:*");
        }
        //more filter queries can be added here; see the Solr query syntax documentation
        //query.set("fq", ...);
        query.set("fq", "status:true");

        if (nowPage < 1) nowPage = 1;
        query.setStart((nowPage - 1) * pageSize);
        query.setRows(pageSize);

        //fields to return
        query.set("fl", "id,title,sketch,labelNamesCache,columnIdsCache," +
                "labelIdsCache,creatTime,main_photo," +
                "author_head_img,author_nickname,author_id,content");
        if (ifHl) {
            //highlighting
            query.setHighlight(true);
            query.addHighlightField("title");
            query.addHighlightField("sketch");
            query.addHighlightField("content");
            query.setHighlightSimplePre("<font color='red'>");
            query.setHighlightSimplePost("</font>");
        }
        return query;
    }
    //getters and setters omitted
}
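getQuery() ultimately sends plain HTTP parameters to Solr. To make them visible, the JDK-only sketch below builds the same core parameters (q, sort, fq, start, rows, hl, hl.fl, hl.simple.pre, hl.simple.post) as a map. SearchParams is hypothetical; its escape method reimplements what I understand to be the character set escaped by SolrJ's ClientUtils.escapeQueryChars, and the URL decoding and the fl list are left out for brevity.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical JDK-only mirror of SolrSearchBeen.getQuery(): the raw request parameters
final class SearchParams {

    private SearchParams() {
    }

    // Backslash-escapes query-syntax characters and whitespace, as escapeQueryChars does
    static String escape(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            if ("\\+-!():^[]\"{}~*?|&;/".indexOf(c) >= 0 || Character.isWhitespace(c)) {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.toString();
    }

    static Map<String, String> build(String q, int nowPage, int pageSize, boolean highlight) {
        Map<String, String> p = new LinkedHashMap<>();
        if (q == null || q.isEmpty()) {
            p.put("q", "*:*");               // match all documents
            p.put("sort", "creatTime desc"); // newest first
        } else {
            p.put("q", "text:" + escape(q)); // search the copyField target
        }
        p.put("fq", "status:true");
        int page = Math.max(nowPage, 1);
        p.put("start", String.valueOf((page - 1) * pageSize)); // zero-based row offset
        p.put("rows", String.valueOf(pageSize));
        if (highlight) {
            p.put("hl", "true");
            p.put("hl.fl", "title,sketch,content");
            p.put("hl.simple.pre", "<font color='red'>");
            p.put("hl.simple.post", "</font>");
        }
        return p;
    }
}
```

For example, page 3 with a page size of 10 gives start=20, because start is a row offset rather than a page number.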

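Below is the query code. Only a single interface implementation is shown, and some helper classes used in it (PageResult, TimeMaker, JsonUtil) are not listed; what they do should be clear from the code.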

    /**
     * Solr full-text search
     *
     * @param solrSearchBeen search parameters
     * @return search results
     * @throws ChecksException error
     */
    @Override
    public JSONResult search(SolrSearchBeen solrSearchBeen) throws ChecksException {
        try {
            QueryResponse rsp = solrArticleUtil.getSolrClient().query(solrSearchBeen.getQuery());
            SolrDocumentList docs = rsp.getResults();
            //highlighting for all returned documents: id -> field -> snippets
            Map<String, Map<String, List<String>>> highlightMap = rsp.getHighlighting();
            Iterator<SolrDocument> iter = docs.iterator();
            //wrapper object for the paged result
            PageResult pageResult = new PageResult();
            pageResult.setNowPage(solrSearchBeen.getNowPage());
            pageResult.setPageSize(solrSearchBeen.getPageSize());
            pageResult.setTotalRecord(Math.toIntExact(docs.getNumFound()));
            pageResult.setTotalPageOpt();
            List<Object> voList = new ArrayList<>();
            Date date = new Date();
            List<Long> ids = new ArrayList<>();
            while (iter.hasNext()) {
                SolrDocument doc = iter.next();
                Map<String, Object> map = new HashMap<>();
                map.put("id", doc.getFieldValue("id"));
                ids.add(Long.valueOf(doc.getFieldValue("id").toString()));
                map.put("name", doc.getFieldValue("title"));
                map.put("sketch", doc.getFieldValue("sketch"));
                map.put("labels", doc.getFieldValue("labelNamesCache"));
                map.put("labelIds", doc.getFieldValue("labelIdsCache"));
                date.setTime(Long.valueOf(doc.getFieldValue("creatTime").toString()));
                map.put("creatTime", TimeMaker.toDateTimeStr(date));
                map.put("column", doc.getFieldValue("columnIdsCache").toString().split(";")[0]);
                map.put("nickname", doc.getFieldValue("author_nickname"));
                map.put("uId", doc.getFieldValue("author_id"));
                map.put("headImg", doc.getFieldValue("author_head_img"));
                map.put("mainPhoto", doc.getFieldValue("main_photo"));
                //highlighting may be missing for a document, so guard against null
                Map<String, List<String>> hl = highlightMap == null ? null
                        : highlightMap.get(doc.getFieldValue("id").toString());
                if (solrSearchBeen.getIfHl() && hl != null) {
                    List<String> titleList = hl.get("title");
                    List<String> sketchList = hl.get("sketch");
                    List<String> contentList = hl.get("content");
                    //use the highlighted snippets where available
                    if (titleList != null && titleList.size() > 0) {
                        map.put("name", titleList.get(0));
                    }
                    if (sketchList != null && sketchList.size() > 0) {
                        map.put("sketch", sketchList.get(0));
                    }
                    if (contentList != null && contentList.size() > 0) {
                        //show the content snippet as the summary, for a consistent front-end display
                        map.put("sketch", contentList.get(0));
                    }
                }
                voList.add(map);
            }
            pageResult.setVoList(voList);
            return JsonUtil.getSuccess("success", pageResult);
        } catch (SolrServerException | IOException e) {
            e.printStackTrace();
            throw new ChecksException("solr article search error");
        }
    }
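When reading highlights, each lookup can come back null: a document may have no snippet for a given field. The JDK-only sketch below factors that lookup into one helper, together with a ceiling division for the page count. SearchResultHelpers is hypothetical; its map has the same shape as what QueryResponse.getHighlighting() returns (document id, then field name, then snippets). PageResult and its setTotalPageOpt are the author's own classes and are not shown here, so totalPages is only an illustration of the usual calculation.

```java
import java.util.List;
import java.util.Map;

// Hypothetical helpers for the result loop: null-safe snippet lookup and page count
final class SearchResultHelpers {

    private SearchResultHelpers() {
    }

    // First highlighted snippet for a field, or the fallback when none exists
    static String firstSnippet(Map<String, Map<String, List<String>>> highlighting,
                               String id, String field, String fallback) {
        if (highlighting == null) {
            return fallback;
        }
        Map<String, List<String>> byField = highlighting.get(id);
        if (byField == null) {
            return fallback;
        }
        List<String> snippets = byField.get(field);
        return (snippets == null || snippets.isEmpty()) ? fallback : snippets.get(0);
    }

    // Pages needed to show numFound hits, rounding up (ceiling division)
    static long totalPages(long numFound, int pageSize) {
        return pageSize <= 0 ? 0 : (numFound + pageSize - 1) / pageSize;
    }
}
```

In the loop, map.put("name", firstSnippet(highlightMap, id, "title", (String) doc.getFieldValue("title"))) would replace the separate null checks for each field.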