Searching with Solr

weixin_34174132

Original link: https://my.oschina.net/geek4j/blog/261269

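I had previously used Solr only for simple queries and had never worked with it systematically. This time I had the chance to go through the whole process, from setting up Solr (writing the configuration files) to building the search entry point. The steps are briefly described below: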

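I. Deploying Solr:

1. Unzip solr-4.7.1.zip

2. Copy solr-4.7.1\solr-4.7.1\dist\solr-4.7.1.war into the tomcat\webapps directory, renaming it to solr.war

3. Copy the jars in solr-4.7.1\example\lib\ext into tomcat's lib directory

4. Copy log4j.properties from solr-4.7.1\solr-4.7.1\example\resources into tomcat's lib directory

5. Create the Solr home directory (D:\SolrHome)

6. Copy everything under solr-4.7.1\solr-4.7.1\example\solr into the Solr home (D:\SolrHome)

7. Edit tomcat\webapps\solr\WEB-INF\web.xml (Tomcat creates this directory when it unpacks solr.war on startup)

Add the following to web.xml: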

<env-entry>
  <env-entry-name>solr/home</env-entry-name>
  <env-entry-value>D:\SolrHome</env-entry-value>
  <env-entry-type>java.lang.String</env-entry-type>
</env-entry>

8. Restart tomcat and visit http://127.0.0.1:8080/solr
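
As a quick check that the deployment works, the core admin status endpoint can also be requested from Java instead of a browser. The following is only a minimal sketch using the JDK alone: the class name `SolrStatusCheck` and its methods are hypothetical, and it assumes Solr is reachable at the URL from step 8 with the default `admin/cores` admin path.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;

public class SolrStatusCheck {

    // Builds the CoreAdmin STATUS URL for a base URL such as http://127.0.0.1:8080/solr
    public static String statusUrl(String solrBaseUrl) {
        String base = solrBaseUrl.endsWith("/")
                ? solrBaseUrl.substring(0, solrBaseUrl.length() - 1)
                : solrBaseUrl;
        return base + "/admin/cores?action=STATUS&wt=json";
    }

    // Returns true when the endpoint answers with HTTP 200, false on any other status or I/O error
    public static boolean isUp(String solrBaseUrl) {
        HttpURLConnection conn = null;
        try {
            conn = (HttpURLConnection) new URL(statusUrl(solrBaseUrl)).openConnection();
            conn.setConnectTimeout(1000);
            conn.setReadTimeout(5000);
            return conn.getResponseCode() == 200;
        } catch (IOException e) {
            return false;
        } finally {
            if (conn != null) {
                conn.disconnect();
            }
        }
    }

    public static void main(String[] args) {
        String base = args.length > 0 ? args[0] : "http://127.0.0.1:8080/solr";
        System.out.println(statusUrl(base) + " -> " + (isUp(base) ? "up" : "not reachable"));
    }
}
```

If the check reports "not reachable", the Tomcat logs and the `solr/home` entry in web.xml are the first places to look.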

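II. Adding a Solr core (indexing data with DIH):

1. Overwrite SolrHome\collection1\conf with the solr-4.7.1\example\example-DIH\solr\db\conf folder

2. In Solr_Home\B_NEWS_EDIT\conf\solrconfig.xml, use the <lib dir="../../lib/" /> node to load the required jars (each database needs its own driver jar)

3. Write the index configuration file db-data-config.xml: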

<dataConfig>
    <dataSource driver="org.hsqldb.jdbcDriver" url="jdbc:hsqldb:./example-DIH/hsqldb/ex" user="sa" password="sa" />
    <document>
        <entity name="item" query="select * from item"
                deltaQuery="select id from item where last_modified > '${dataimporter.last_index_time}'">
            <field column="NAME" name="name" />

            <entity name="feature"  
                    query="select DESCRIPTION from FEATURE where ITEM_ID='${item.ID}'"
                    deltaQuery="select ITEM_ID from FEATURE where last_modified > '${dataimporter.last_index_time}'"
                    parentDeltaQuery="select ID from item where ID=${feature.ITEM_ID}">
                <field name="features" column="DESCRIPTION" />
            </entity>
            
            <entity name="item_category"
                    query="select CATEGORY_ID from item_category where ITEM_ID='${item.ID}'"
                    deltaQuery="select ITEM_ID, CATEGORY_ID from item_category where last_modified > '${dataimporter.last_index_time}'"
                    parentDeltaQuery="select ID from item where ID=${item_category.ITEM_ID}">
                <entity name="category"
                        query="select DESCRIPTION from category where ID = '${item_category.CATEGORY_ID}'"
                        deltaQuery="select ID from category where last_modified > '${dataimporter.last_index_time}'"
                        parentDeltaQuery="select ITEM_ID, CATEGORY_ID from item_category where CATEGORY_ID=${category.ID}">
                    <field column="DESCRIPTION" name="cat" />
                </entity>
            </entity>
        </entity>
    </document>
</dataConfig>

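Here the <dataSource /> node specifies the database driver, connection URL, username and password, and each <entity /> node specifies the SQL used to fetch the data to index. A full import only needs query; a delta (incremental) import also needs deltaQuery.

Automatic full import and automatic delta import

This can be implemented with your own program, or with the apache-solr-dataimportscheduler-1.0.jar package. It is configured as follows:

Modify WEB-INF/web.xml in solr.war, adding the following before the servlet node: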

<listener>
    <listener-class>
        org.apache.solr.handler.dataimport.scheduler.ApplicationListener
    </listener-class>
</listener>

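Extract dataimport.properties from apache-solr-dataimportscheduler-.jar, adjust it to your environment, and place it in the solr.home/conf directory (not solr.home/core/conf).

For configuration details, see: http://code.google.com/p/solr-dataimport-scheduler/

4. In schema.xml, define <field name="id" type="string" indexed="true" stored="true" required="true" multiValued="false" />. Here name matches the field name read by the SQL in db-data-config.xml, type is the field's type, indexed says whether the field is indexed, stored whether its value is stored, required whether it is mandatory (if true, an exception is thrown when the SQL returns no corresponding field), and multiValued whether it can hold multiple values.

Handling CLOB fields (the column of a CLOB field must be uppercase!)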
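
For the "write your own program" option mentioned above, one possible shape is a small scheduler built on the JDK's `ScheduledExecutorService`. This is only a sketch under assumptions: the class name `DeltaImportScheduler` is hypothetical, and the import task is passed in as a plain `Runnable` (in practice it would issue the DIH delta-import request), so nothing here reflects how the scheduler jar itself works.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class DeltaImportScheduler {

    // Runs importTask immediately, then again `period` after each run has finished.
    public static ScheduledExecutorService start(Runnable importTask, long period, TimeUnit unit) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        Runnable guarded = () -> {
            try {
                importTask.run();
            } catch (RuntimeException e) {
                // An uncaught exception would cancel all later runs, so report it and carry on
                System.err.println("import failed: " + e);
            }
        };
        scheduler.scheduleWithFixedDelay(guarded, 0, period, unit);
        return scheduler;
    }

    public static void main(String[] args) throws InterruptedException {
        // Placeholder task: a real one would request .../dataimport?command=delta-import
        ScheduledExecutorService scheduler =
                start(() -> System.out.println("delta-import tick"), 1, TimeUnit.SECONDS);
        Thread.sleep(3500);
        scheduler.shutdownNow();
    }
}
```

A fixed delay (rather than a fixed rate) is used here so that a slow import never overlaps with the next one.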

<entity name="meta" query="select id,filename,content,bytes from documents" transformer="ClobTransformer">            
    <field column="ID" name="id" />
    <field column="CONTENT" name="CONTENT" clob="true" />
</entity>

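At this point you can visit http://127.0.0.1:8080/solr, open the corresponding core, and index the data.

III. Using SolrJ:

1. Add the required jars to the project; with Maven, simply add the corresponding dependency

2. In the project, create an entity class that corresponds to the Solr core, using the @Field("Field_Name") annotation (Field_Name is the name of the matching field in the core's schema.xml).

3. Create the utility classes that are needed:

SolrConnection.java: obtains the connection to Solr and sets a few parameters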
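
Besides the admin UI, a DIH import can also be started with a plain HTTP request to the core's `/dataimport` handler, using `command=full-import` or `command=delta-import`. The sketch below builds and sends such a request with the JDK only. The class name `DataImportTrigger` is hypothetical, and the core name `collection1` and the handler path `/dataimport` are assumptions taken from the example configuration copied in step 1.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;

public class DataImportTrigger {

    // Builds e.g. http://127.0.0.1:8080/solr/collection1/dataimport?command=full-import&clean=true&commit=true
    public static String importUrl(String solrBaseUrl, String core, boolean delta) {
        String base = solrBaseUrl.endsWith("/")
                ? solrBaseUrl.substring(0, solrBaseUrl.length() - 1)
                : solrBaseUrl;
        String command = delta ? "delta-import" : "full-import";
        // clean=true empties the index first, which only makes sense for a full import
        boolean clean = !delta;
        return base + "/" + core + "/dataimport?command=" + command
                + "&clean=" + clean + "&commit=true";
    }

    // Sends the request and returns the HTTP status code of the response
    public static int trigger(String solrBaseUrl, String core, boolean delta) throws IOException {
        HttpURLConnection conn =
                (HttpURLConnection) new URL(importUrl(solrBaseUrl, core, delta)).openConnection();
        conn.setConnectTimeout(1000);
        conn.setReadTimeout(60000);
        try {
            return conn.getResponseCode();
        } finally {
            conn.disconnect();
        }
    }

    public static void main(String[] args) throws IOException {
        int status = trigger("http://127.0.0.1:8080/solr", "collection1", false);
        System.out.println("full-import request returned HTTP " + status);
    }
}
```

The import itself runs in the background on the server, so a successful response only means the request was accepted; progress can be checked with `command=status` on the same handler.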

import java.util.HashMap;
import java.util.Map;

import org.apache.solr.client.solrj.impl.HttpSolrServer;

public class SolrConnection {

    private static SolrConnection solrConnection;

    // One client per core: sharing a single client and switching its base URL
    // per call is not safe when several threads use different cores.
    private final Map<String, HttpSolrServer> servers = new HashMap<String, HttpSolrServer>();

    private SolrConnection() {
    }

    public static synchronized SolrConnection getInstance() {
        if (solrConnection == null) {
            solrConnection = new SolrConnection();
        }
        return solrConnection;
    }

    public synchronized HttpSolrServer getServer(String core) {
        HttpSolrServer server = servers.get(core);
        if (server == null) {
            server = new HttpSolrServer(Constants.SYSTEM_SEARCH_SOLR_URL + core + "/");
            server.setSoTimeout(60000);           // socket read timeout in ms
            server.setConnectionTimeout(1000);    // connect timeout in ms
            server.setDefaultMaxConnectionsPerHost(100);
            server.setMaxTotalConnections(100);
            server.setFollowRedirects(false);
            server.setAllowCompression(true);
            server.setMaxRetries(1);
            servers.put(core, server);
        }
        return server;
    }

}

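SolrEngine.java: wraps the SolrJ add, delete and update operations (an update is done simply by adding the document again), plus a query helper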

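import java.io.IOException;
import java.util.List;

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.beans.DocumentObjectBinder;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.SolrDocumentList;

// Note: SolrDocument below is the project's own entity type (it exposes getCore()
// and getId()), not org.apache.solr.common.SolrDocument.
public class SolrEngine {

    // Add or update: adding a document whose unique key already exists overwrites it
    public static void save(SolrMapper solrMapper) throws SolrServerException, IOException {
        SolrDocument doc = solrMapper.build();
        HttpSolrServer server = SolrConnection.getInstance().getServer(doc.getCore());
        server.add(new DocumentObjectBinder().toSolrInputDocument(doc));
        server.commit();
    }

    // Delete the document by its unique key
    public static void delete(SolrMapper solrMapper) throws SolrServerException, IOException {
        SolrDocument doc = solrMapper.build();
        HttpSolrServer server = SolrConnection.getInstance().getServer(doc.getCore());
        server.deleteById(String.valueOf(doc.getId()));
        server.commit();
    }

    // Run the query against the given core and bind the results to @Field-annotated beans
    public static <E> List<E> query(SolrQuery sQuery, String core, Class<E> clazz) throws SolrServerException {
        HttpSolrServer server = SolrConnection.getInstance().getServer(core);
        QueryResponse response = server.query(sQuery);
        SolrDocumentList list = response.getResults();
        return new DocumentObjectBinder().getBeans(clazz, list);
    }

}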

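SolrMapper is an abstract class of our own, and its build method is abstract. SolrMapper exists for the case where the data to insert into Solr is read through another persistence framework and the persistence entity is not the same class as the Solr entity: the build method copies the persistence entity's properties into the Solr entity, which is then inserted.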
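
The mapping idea can be illustrated with plain Java. This is a hedged sketch, not the original project's code: `NewsRecord`, `NewsDocument` and `NewsMapper` are hypothetical names, the core name is borrowed from the B_NEWS_EDIT directory mentioned earlier, and the SolrJ `@Field` annotations that a real Solr entity would carry are left out so the example runs without any dependency.

```java
public class SolrMapperSketch {

    // Entity as loaded by the persistence layer (hypothetical)
    public static class NewsRecord {
        final long id;
        final String title;
        final String body;

        public NewsRecord(long id, String title, String body) {
            this.id = id;
            this.title = title;
            this.body = body;
        }
    }

    // Entity shaped like the Solr core; a real one would annotate its fields with @Field
    public static class NewsDocument {
        public String id;
        public String title;
        public String content;

        public String getCore() {
            return "B_NEWS_EDIT";
        }
    }

    // The abstraction described above: build() produces the Solr-side entity
    public abstract static class SolrMapper {
        public abstract NewsDocument build();
    }

    // Copies the persistence entity's properties into the Solr entity
    public static class NewsMapper extends SolrMapper {
        private final NewsRecord record;

        public NewsMapper(NewsRecord record) {
            this.record = record;
        }

        @Override
        public NewsDocument build() {
            NewsDocument doc = new NewsDocument();
            doc.id = String.valueOf(record.id);
            doc.title = record.title;
            doc.content = record.body;
            return doc;
        }
    }

    public static void main(String[] args) {
        NewsDocument doc = new NewsMapper(new NewsRecord(42L, "Hello Solr", "body text")).build();
        System.out.println(doc.getCore() + " <- id=" + doc.id + ", title=" + doc.title);
    }
}
```

Keeping the copy logic in the mapper means neither the persistence entity nor the Solr entity has to know about the other.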
