Building a Lucene 4.10.3 Index with the IK Analyzer

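I prefer the IK analyzer for Chinese word segmentation, though that may be mostly habit. mmseg4j is also quite good; it comes down to personal preference. Here is a simple indexing example: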

package buildindex;


import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.document.*;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;
import org.wltea.analyzer.lucene.IKAnalyzer;
import utils.ConstantUtil;


import java.io.*;


/**
 * Builds a Lucene index over the files in a directory, using the IK analyzer.
 *
 * Created with IntelliJ IDEA.
 * User: wxshi
 * Date: 15-2-5
 * Time: 8:02 PM
 */
public class MYIndex {
    private IndexWriter indexWriter = null;
    private Analyzer analyzer = null;


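    // The constructor's main job is to obtain an IndexWriter
    public MYIndex(){
        try {
            // Directory where the index is stored
            Directory indexDir = FSDirectory.open(new File(ConstantUtil.INDEX_STORE_PATH));
            // Use the IK analyzer for tokenization
            analyzer = new IKAnalyzer();
            IndexWriterConfig iwc = new IndexWriterConfig(Version.LUCENE_4_10_3,analyzer);
            // Create the index if it is missing, otherwise open the existing one
            iwc.setOpenMode(IndexWriterConfig.OpenMode.CREATE_OR_APPEND);
            indexWriter = new IndexWriter(indexDir,iwc);
        } catch (IOException e) {
            // Fail fast: without an IndexWriter every later call would hit a NullPointerException
            throw new IllegalStateException("Cannot open index at " + ConstantUtil.INDEX_STORE_PATH, e);
        }
    }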


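    // Write every file under INDEX_FILE_PATH into the index
    public void buildIndex(){
        try {
            // Remove all existing documents so the index is rebuilt from scratch
            indexWriter.deleteAll();
            File index_files = new File(ConstantUtil.INDEX_FILE_PATH);
            if(index_files.isDirectory()){
                String[] files = index_files.list();
                if(files == null){
                    files = new String[0]; // list() returns null on an I/O error
                }
                for(String file : files){
                    File indexFile = new File(index_files, file);
                    Document doc = getDocument(indexFile); // build the path, title, content and size fields
                    if(doc == null){
                        continue; // the file could not be opened (e.g. it is a subdirectory); skip it
                    }
                    System.out.println("Indexing: " + file);
                    indexWriter.addDocument(doc); // add the document to the index
                }
                indexWriter.commit();
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }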


    // Build the Lucene document for one file; returns null if the file cannot be opened
    private Document getDocument(File file){
        try {
            Document doc = new Document();
            FileInputStream fis = new FileInputStream(file);
            // The source files are assumed to be GBK-encoded
            Reader reader = new BufferedReader(new InputStreamReader(fis,"GBK"));
           // TokenStream tokenStream = new IKTokenizer(reader,false);     // IK token stream, smart segmentation disabled
            doc.add(new StringField("path" , file.getAbsolutePath(),Field.Store.YES)); // file path, indexed as a single token and stored
            doc.add(new StringField("title" , file.getName(),Field.Store.YES));        // file name, indexed as a single token and stored
            doc.add(new TextField("content",reader));                  // text content, tokenized when the document is added; not stored by default
            doc.add(new LongField("size",file.length(), Field.Store.YES));             // file size in bytes
            return doc;
        } catch (Exception e) {
            e.printStackTrace();
        }
        return null;
    }


    // Close the IndexWriter
    public void close(){
        try {
            indexWriter.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
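
The two file-handling steps in the listing, enumerating the files in a directory and opening each one with an explicit charset, can be tried out without Lucene on the classpath. Below is a rough sketch using only the JDK. `FileScan`, `listRegularFiles`, and `readAll` are names invented for this illustration and are not part of the original code. Unlike the listing, it takes the charset as a parameter, it skips subdirectories, and it treats a directory that cannot be listed as empty.

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper for illustration only; not part of the original post. */
final class FileScan {
    private FileScan() {
        // static helpers only
    }

    /** Returns the regular files directly inside dir, sorted by pathname; empty if dir cannot be listed. */
    static List<File> listRegularFiles(File dir) {
        List<File> result = new ArrayList<File>();
        File[] children = dir.listFiles(); // null if dir is not a directory or an I/O error occurs
        if (children == null) {
            return result;
        }
        Arrays.sort(children); // File is Comparable, so this gives a stable order
        for (File child : children) {
            if (child.isFile()) { // skip subdirectories
                result.add(child);
            }
        }
        return result;
    }

    /** Reads the whole file as text in the given charset and closes the stream afterwards. */
    static String readAll(File file, Charset charset) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (Reader reader = new BufferedReader(
                new InputStreamReader(new FileInputStream(file), charset))) {
            char[] buf = new char[4096];
            int n;
            while ((n = reader.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        }
        return sb.toString();
    }
}
```

Passing `Charset.forName("GBK")` to `readAll` would match the encoding the listing assumes. Reading the whole file into a string is only for experimenting; the listing instead hands Lucene a `Reader`, so the content is streamed at indexing time.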


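The listing imports `utils.ConstantUtil`, which the post does not show. A minimal sketch of what such a class might look like follows. The two constant names come from the listing, but the directory values are placeholders made up here. To keep the snippet compilable on its own it omits the `package utils;` line and the `public` modifiers that the real class would need in order to be visible from the `buildindex` package.

```java
import java.io.File;

/** Hypothetical stand-in for the utils.ConstantUtil class referenced by MYIndex. */
final class ConstantUtil {
    /** Directory where Lucene writes the index (placeholder value, an assumption). */
    static final String INDEX_STORE_PATH = "data" + File.separator + "index";

    /** Directory holding the files to be indexed (placeholder value, an assumption). */
    static final String INDEX_FILE_PATH = "data" + File.separator + "docs";

    private ConstantUtil() {
        // constants only, no instances
    }
}
```

With constants like these in place, the indexer would presumably be driven by creating a `MYIndex`, calling `buildIndex()`, and then calling `close()`.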