Solr 6.6.4 Deployment and Dynamic Chinese Word Segmentation

Related Resource Downloads

Solr 6.6.4 package: Baidu Netdisk download (extraction code: acsx)

IKAnalyzer2012FF jar: Baidu Netdisk download (extraction code: dshv)

IK-Analyzer-2012FF source code: Baidu Netdisk download (extraction code: u9n5)

Starting Solr

  1. Extract the Solr package, go to the solr-6.6.4/server/solr folder and create a test folder named demo_core, then copy the configsets/sample_techproducts_configs/conf folder into demo_core.
  2. Go to solr-6.6.4/bin and start Solr from the command line: solr start (default port 8983; to use a specific port, run solr start -p 8981).
  3. Open http://localhost:8983/solr in a browser, click Core Admin -> Add Core, set the properties, and save. (Note: the core name in this step must be the same as the folder name you created in step 1; the remaining values can be left at their defaults.)
  4. Go back to the demo_core folder; you will find a new data folder and a core.properties file. The data directory stores the index files, and core.properties holds demo_core's configuration.
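
    For reference, a core.properties generated this way is usually minimal; the name entry is what binds the folder to the core (the exact contents may vary by Solr version):

    name=demo_core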

Configuring the IK Chinese Analyzer

  1. Solr does not ship with the IK Chinese analyzer; you need to import the IK-Analyzer jar and configure it yourself.
  2. Copy the downloaded IKAnalyzer2012FF.jar into the solr-6.6.4/server/solr-webapp/webapp/WEB-INF/lib folder.
  3. Go to solr-6.6.4/server/solr/demo_core/conf and make a backup copy of managed-schema.
  4. Edit the managed-schema file and add the IK analyzer fieldType at the end of the file.
    <!-- IK Chinese analyzer -->
    <fieldType name="text_ik" class="solr.TextField">
    	<!-- analyzer used at index time -->
    	<analyzer type="index" isMaxWordLength="false" class="org.wltea.analyzer.lucene.IKAnalyzer"/>
    	<!-- analyzer used at query time -->
    	<analyzer type="query" isMaxWordLength="true" class="org.wltea.analyzer.lucene.IKAnalyzer"/>
    </fieldType>
  5. Go to solr-6.6.4/bin and restart Solr from the command line: solr restart -p 8983
  6. Open http://localhost:8983/solr in a browser, click Core Selector, choose demo_core -> Analysis, enter “帆布鞋” (canvas shoes) in Field Value (Index), choose text_ik in the Select an Option dropdown, and click Analyse Values on the right.
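
    If you want to sanity-check IK segmentation outside the admin UI, the following minimal sketch runs the analyzer directly through the standard Lucene TokenStream API (the IKAnalyzer(boolean) constructor is assumed from the 2012FF source; adjust to your migrated code):

    import org.apache.lucene.analysis.Analyzer;
    import org.apache.lucene.analysis.TokenStream;
    import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
    import org.wltea.analyzer.lucene.IKAnalyzer;

    public class IKSmokeTest {
        public static void main(String[] args) throws Exception {
            // false = max-word (fine-grained) segmentation, true = smart mode
            Analyzer analyzer = new IKAnalyzer(false);
            try (TokenStream ts = analyzer.tokenStream("content", "帆布鞋")) {
                CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
                ts.reset();                       // required before incrementToken()
                while (ts.incrementToken()) {
                    System.out.println(term.toString());
                }
                ts.end();                         // finalize end-of-stream state
            }
        }
    }

    This should print roughly the same token list the Analysis page shows for text_ik.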

Configuring a Dynamic IK Analyzer (hot dictionary updates)

  1. Create a Java project named IKAnalyzer6.6.4 with Maven and copy the IKAnalyzer-2012FF source and configuration files into the new project.
  2. Configure the pom.xml file.
    <?xml version="1.0" encoding="UTF-8"?>
    
    <project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
             xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
        <modelVersion>4.0.0</modelVersion>
    
        <groupId>org.wltea.analyzer</groupId>
        <artifactId>ik-analyzer</artifactId>
        <version>6.6.4</version>
        <packaging>jar</packaging>
        <name>${project.artifactId}</name>
        <!-- FIXME change it to the project's website -->
        <url>http://www.example.com</url>
    
        <properties>
            <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
            <project.reporting.outputEncoding>UTF-8</project.reporting.outputEncoding>
            <maven.compiler.source>1.8</maven.compiler.source>
            <maven.compiler.target>1.8</maven.compiler.target>
        </properties>
    
        <dependencies>
            <dependency>
                <groupId>org.apache.lucene</groupId>
                <artifactId>lucene-analyzers-common</artifactId>
                <version>${project.version}</version>
            </dependency>
            <dependency>
                <groupId>org.apache.lucene</groupId>
                <artifactId>lucene-queryparser</artifactId>
                <version>${project.version}</version>
            </dependency>
            <dependency>
                <groupId>org.apache.lucene</groupId>
                <artifactId>lucene-memory</artifactId>
                <version>${project.version}</version>
            </dependency>
            <dependency>
                <groupId>org.apache.lucene</groupId>
                <artifactId>lucene-backward-codecs</artifactId>
                <version>${project.version}</version>
            </dependency>
        </dependencies>
    
        <build>
            <resources>
                <resource>
                    <directory>src/main/java/org/wltea/analyzer/dic/</directory>
                    <filtering>true</filtering>
                    <includes>
                        <include>*.dic</include>
                    </includes>
                    <targetPath>${project.build.directory}/classes/org/wltea/analyzer/dic</targetPath>
                </resource>
                <resource>
                    <directory>src/main/resources</directory>
                    <filtering>true</filtering>
                    <targetPath>${project.build.directory}/classes/</targetPath>
                </resource>
            </resources>
        </build>
    </project>
    
  3. Because of Lucene version differences, a few compilation errors need to be fixed.
  4. The affected classes are IKAnalyzer, IKTokenizer, IKQueryExpressionParser, SWMCQueryBuilder, and LuceneIndexAndSearchDemo; the most common fix is sketched below.
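    The usual break is the Analyzer API: since Lucene 5, createComponents no longer receives a Reader. A sketch of the adjusted override in IKAnalyzer, assuming the 2012FF source layout and a hypothetical IKTokenizer(boolean) convenience constructor in your migrated source (see step 6 below):

    // In IKAnalyzer.java: the old 2012FF signature was
    //   protected TokenStreamComponents createComponents(String fieldName, Reader in)
    @Override
    protected TokenStreamComponents createComponents(String fieldName) {
        // Lucene now feeds the Reader via Tokenizer.setReader(), not here
        Tokenizer tokenizer = new IKTokenizer(this.useSmart());
        return new TokenStreamComponents(tokenizer);
    }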
  5. Create IKTokenizerFactory.java and UpdateKeeper.java, which the dynamic analyzer requires (both files are listed below, in that order).
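    // ---------------------------------------------------------------
    // IKTokenizerFactory.java (first file)
    // ---------------------------------------------------------------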
    package org.wltea.analyzer.lucene;
    
    import org.apache.lucene.analysis.Tokenizer;
    import org.apache.lucene.analysis.util.ResourceLoader;
    import org.apache.lucene.analysis.util.ResourceLoaderAware;
    import org.apache.lucene.analysis.util.TokenizerFactory;
    import org.apache.lucene.util.AttributeFactory;
    import org.wltea.analyzer.dic.Dictionary;
    
    import java.io.IOException;
    import java.io.InputStream;
    import java.util.*;
    import java.util.logging.Logger;
    
    /**
     * Factory that adds dynamic (hot) updating of the IK extension dictionary.
     *
     * @Author: sunshuo
     * @Date: 2019/3/12 15:54
     * @Version: 1.0
     */
    public class IKTokenizerFactory extends TokenizerFactory
    		implements ResourceLoaderAware, UpdateKeeper.UpdateJob {
    
    	private final static Logger LOGGER = Logger.getLogger(IKTokenizerFactory.class.getName());
    
    	private boolean useSmart;
    
    	private ResourceLoader loader;
    
    	private long lastUpdateTime = -1L;
    
    	private String conf;
    
    	/**
    	 * Initialize this factory via a set of key-value pairs.
    	 *
    	 * @param args
    	 */
    	public IKTokenizerFactory(Map<String, String> args) {
    		super(args);
    		this.useSmart = getBoolean(args, "useSmart", false);
    		this.conf = get(args, "conf");
    		System.out.println(String.format(":::ik:construction:::::::::::::::::::::::::: %s", this.conf));
    	}
    
    	@Override
    	public void inform(ResourceLoader loader) throws IOException {
    		System.out.println(String.format(":::ik:::inform:::::::::::::::::::::::: %s", this.conf));
    		this.loader = loader;
    		update();
    		if ((this.conf != null) && (!this.conf.trim().isEmpty())) {
    			UpdateKeeper.getInstance().register(this);
    		}
    	}
    
    	@Override
    	public Tokenizer create(AttributeFactory factory) {
    		return new IKTokenizer(factory, useSmart());
    	}
    
    	/**
    	 * Perform the dictionary update.
    	 *
    	 * @throws IOException
    	 */
    	@Override
    	public void update() throws IOException {
    		Properties p = canUpdate();
    		if (p != null) {
    			List<String> dicPaths = splitFileNames(p.getProperty("files"));
    			List<InputStream> inputStreamList = new ArrayList<>();
    			for (String path : dicPaths) {
    				if ((path != null) && (!path.isEmpty())) {
    					InputStream is = this.loader.openResource(path);
    					if (is != null) {
    						inputStreamList.add(is);
    					}
    				}
    			}
    			if (!inputStreamList.isEmpty()) {
    				Dictionary.reloadDic(inputStreamList);
    			}
    		}
    	}
    
    	/**
    	 * Check whether the dictionary needs updating.
    	 *
    	 * @return the parsed conf properties if an update is due, otherwise null
    	 */
    	private Properties canUpdate() {
    		if (this.conf == null) {
    			return null;
    		}
    		Properties p;
    		InputStream confStream = null;
    		try {
    			p = new Properties();
    			confStream = this.loader.openResource(this.conf);
    			p.load(confStream);
    		} catch (IOException e) {
    			System.err.println("IK parsing conf NullPointerException~~~~~" + Arrays.toString(e.getStackTrace()));
    			return null;
    		} finally {
    			if (confStream != null) {
    				try {
    					confStream.close();
    				} catch (IOException ignored) {
    				}
    			}
    		}
    		String lastUpdate = p.getProperty("lastUpdate", "0");
    		long t = Long.parseLong(lastUpdate);
    		if (t > this.lastUpdateTime) {
    			this.lastUpdateTime = t;
    			String paths = p.getProperty("files");
    			if ((paths != null) && (!paths.trim().isEmpty())) {
    				System.out.println("loading conf files success.");
    				return p;
    			}
    		}
    		this.lastUpdateTime = t;
    		return null;
    	}
    
    	private boolean useSmart() {
    		return useSmart;
    	}
    }
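
    // ---------------------------------------------------------------
    // UpdateKeeper.java (second file)
    // ---------------------------------------------------------------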
    
    
    
    
    package org.wltea.analyzer.lucene;
    
    import java.io.IOException;
    import java.util.Vector;
    
    /**
     * Background keeper that polls all registered jobs for dictionary updates once a minute.
     *
     * @Author: sunshuo
     * @Date: 2019/3/12 15:55
     * @Version: 1.0
     */
    public class UpdateKeeper implements Runnable {
    
    	static final long INTERVAL = 60000L;
    
    	private static volatile UpdateKeeper singleton;
    
    	Vector<UpdateJob> filterFactorys;
    
    	Thread worker;
    
    	private UpdateKeeper() {
    		this.filterFactorys = new Vector<UpdateJob>();
    		this.worker = new Thread(this);
    		this.worker.setDaemon(true);
    		this.worker.start();
    	}
    
    	public static UpdateKeeper getInstance() {
    		if (singleton == null) {
    			synchronized (UpdateKeeper.class) {
    				if (singleton == null) {
    					singleton = new UpdateKeeper();
    					return singleton;
    				}
    			}
    		}
    		return singleton;
    	}
    
    	public void register(UpdateJob filterFactory) {
    		this.filterFactorys.add(filterFactory);
    	}
    
    	@Override
    	public void run() {
    		while (true) {
    			try {
    				Thread.sleep(INTERVAL);
    			} catch (InterruptedException e) {
    				e.printStackTrace();
    			}
    
    			if (!this.filterFactorys.isEmpty()) {
    				for (UpdateJob factory : this.filterFactorys) {
    					try {
    						factory.update();
    					} catch (IOException e) {
    						e.printStackTrace();
    					}
    				}
    			}
    		}
    	}
    
    	public interface UpdateJob {
    
    		void update() throws IOException;
    	}
    }
    
  6. Add a constructor to IKTokenizer that accepts an AttributeFactory (it is what IKTokenizerFactory.create above calls); a sketch follows.
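    A minimal sketch of the added constructor, assuming the 2012FF IKTokenizer field names (termAtt, offsetAtt, typeAtt, _IKImplement) and its IKSegmenter(Reader, boolean) constructor; adjust to your local source:

    /**
     * Constructor taking an AttributeFactory, as required by
     * IKTokenizerFactory.create(AttributeFactory) under Lucene 6.x.
     */
    public IKTokenizer(AttributeFactory factory, boolean useSmart) {
        super(factory);
        offsetAtt = addAttribute(OffsetAttribute.class);
        termAtt = addAttribute(CharTermAttribute.class);
        typeAtt = addAttribute(TypeAttribute.class);
        // `input` is the Reader managed by the Tokenizer base class;
        // reset() re-binds it before each use via _IKImplement.reset(input)
        _IKImplement = new IKSegmenter(input, useSmart);
    }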
  7. Add a dictionary-reload method to Dictionary (the Dictionary.reloadDic called from IKTokenizerFactory.update above); a sketch follows.
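    A minimal sketch of reloadDic, assuming the 2012FF Dictionary internals (the static singleton instance, the _MainDict trie, and DictSegment.fillSegment); adjust to your local source:

    /**
     * Load additional words from the given dictionary streams into the
     * main dictionary. Called by IKTokenizerFactory.update() whenever
     * ik.conf's lastUpdate value increases.
     */
    public static void reloadDic(List<InputStream> inputStreams) {
        if (singleton == null) {
            return; // dictionary not initialized yet
        }
        for (InputStream is : inputStreams) {
            try (BufferedReader reader = new BufferedReader(
                    new InputStreamReader(is, "UTF-8"), 512)) {
                String word;
                while ((word = reader.readLine()) != null) {
                    if (!word.trim().isEmpty()) {
                        // insert the word into the main dictionary trie
                        singleton._MainDict.fillSegment(
                                word.trim().toLowerCase().toCharArray());
                    }
                }
            } catch (IOException e) {
                System.err.println("IK reloadDic error: " + e.getMessage());
            }
        }
    }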
  8. Run the main methods of IKAnalyzerDemo and LuceneIndexAndSearchDemo; if both produce normal results, the setup is correct.
  9. In the project directory, run mvn clean install -Dmaven.test.skip=true to build the jar.
  10. Copy the newly built jar into the solr-6.6.4/server/solr-webapp/webapp/WEB-INF/lib folder.
  11. In the solr-6.6.4/server/solr/demo_core/conf folder, create ik.conf and my.dic. (Note: ik.conf and my.dic must be UTF-8 encoded without a BOM, or IKAnalyzer will read them as garbled text.) The contents of ik.conf:
    lastUpdate=1
    files=my.dic
    (lastUpdate is the last-modified counter: increment it by 1 every time you change the dictionary files. files lists the custom dictionary files; multiple files are supported, separated by commas.)
  12. Edit the managed-schema file and add the dynamic IK analyzer fieldType at the end of the file.
    <!-- dynamic IK Chinese analyzer -->
    <fieldType name="text_ik_dm" class="solr.TextField">
    	<!-- analyzer used at index time -->
    	<analyzer type="index">
    		<tokenizer class="org.wltea.analyzer.lucene.IKTokenizerFactory" useSmart="false" conf="ik.conf"/>
    		<filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    	</analyzer>
    	<!-- analyzer used at query time -->
    	<analyzer type="query">
    		<tokenizer class="org.wltea.analyzer.lucene.IKTokenizerFactory" useSmart="false" conf="ik.conf"/>
    		<filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    	</analyzer>
    </fieldType>
    
  13. Stop the Solr service (solr stop -p 8983), delete the original IKAnalyzer2012FF.jar, and start the Solr service again.
  14. Open http://localhost:8983/solr in a browser, click Core Selector, choose demo_core -> Analysis, enter “帆布鞋” in Field Value (Index), choose text_ik_dm in the Select an Option dropdown, and click Analyse Values on the right.
  15. In the solr-6.6.4/server/solr/demo_core/conf folder, edit my.dic and add the word “鞋” (shoe). Edit ik.conf and increment the lastUpdate value by 1. After one minute, repeat the analysis from step 14 to confirm the new word is picked up.

 
