Implementing document indexing with Lucene

Rainly2000

Category: lucenen. Tags: lucene, search engine, full-text search

Article link: https://blog.csdn.net/lyzchengxuyuan/article/details/122657915

Lucene is an open-source library for high-performance full-text indexing and search, and an important building block in the search-engine field.
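
To make "full-text indexing" concrete, here is a minimal sketch of a toy inverted index in plain Java. The class name `ToyInvertedIndex` and its methods are hypothetical and exist only for illustration; this is not Lucene's API, nor how Lucene stores its index internally.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy inverted index: maps each lowercased term to the ids of the documents containing it.
// Illustration only; this is not Lucene's API or its on-disk format.
public class ToyInvertedIndex {

    private final Map<String, Set<Integer>> postings = new HashMap<>();

    // Split the text on anything that is not a letter or digit and record docId under each term.
    public void add(int docId, String text) {
        for (String term : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!term.isEmpty()) {
                postings.computeIfAbsent(term, k -> new TreeSet<>()).add(docId);
            }
        }
    }

    // Return the ids of the documents containing the term, in ascending order.
    public Set<Integer> search(String term) {
        return postings.getOrDefault(term.toLowerCase(Locale.ROOT), Collections.emptySet());
    }

    public static void main(String[] args) {
        ToyInvertedIndex index = new ToyInvertedIndex();
        index.add(1, "Lucene is a search library");
        index.add(2, "Full-text search with an inverted index");
        System.out.println(index.search("search")); // [1, 2]
        System.out.println(index.search("lucene")); // [1]
    }
}
```

A lookup is then a single map access per term instead of a scan over every document, which is the basic idea behind a full-text index.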

The following demo shows how to build a simple document index with Lucene. It is based on Lucene 4.x; the code is as follows:

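import java.io.File;
import java.io.FileFilter;
import java.io.FileReader;
import java.io.IOException;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;

public class Indexer {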

    private IndexWriter indexWriter ;

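    public static void main(String[] args) throws IOException {
        String indexDir = "/home/drainli/file/index" ; // directory where the index is written
        String dataDir = "/home/drainli/file" ;        // directory containing the files to index
        Indexer indexer = new Indexer(indexDir) ;
        try {
            int numIndexed = indexer.index(dataDir,new TextFilter());
            System.out.println("documents in index : " + numIndexed);
        }catch (Exception e){
            System.out.println("exception:" + e.getMessage());
            e.printStackTrace();
        }finally {
            // always close the writer so pending changes are committed and the lock is released
            indexer.close();
        }
    }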

    public Indexer(String indexDir) throws IOException {
        Directory dir = FSDirectory.open(new File(indexDir));
        Analyzer analyzer = new StandardAnalyzer() ;
        IndexWriterConfig config = new IndexWriterConfig(Version.LUCENE_4_10_0,analyzer) ;
        indexWriter = new IndexWriter(dir,config);
    }

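    // Builds the Lucene document for one file. Field names: "文档" = contents,
    // "文件名" = file name, "路径名" = canonical path.
    private Document getDocument(File file) throws IOException {
        Document document = new Document();
        // Contents are tokenized from the reader and not stored. FileReader decodes the raw
        // bytes with the platform default charset, so binary formats such as .doc/.docx
        // would need a text-extraction step before their contents become searchable.
        document.add(new TextField("文档",new FileReader(file)));
        // File name and path are tokenized and stored, so they can be shown in search results.
        document.add(new TextField("文件名",file.getName(),Field.Store.YES));
        document.add(new TextField("路径名",file.getCanonicalPath(),Field.Store.YES));

        return document ;
    }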

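    // Indexes every readable, non-hidden regular file directly under dataDir that passes the filter.
    // Returns the total number of documents in the index, not only those added by this call.
    private int index(String dataDir, FileFilter fileFilter) throws IOException {
        File dataFile = new File(dataDir) ;
        File[] listFiles = dataFile.listFiles() ;
        if (listFiles == null) {
            // listFiles() returns null when dataDir is not a directory or cannot be read
            throw new IOException("cannot list files in : " + dataDir);
        }
        for (File file : listFiles){
            if (!file.isDirectory()
            && file.canRead()
            && !file.isHidden()
            && file.exists()
            && (fileFilter == null || fileFilter.accept(file))){
                indexFile(file) ;
            }
        }
        return indexWriter.numDocs();
    }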

    private void indexFile(File file) throws IOException {
        System.out.println("indexing file : " + file.getCanonicalPath());
        Document document = getDocument(file);
        indexWriter.addDocument(document);
    }

    private void close() throws IOException {
        indexWriter.close();
    }

}

class TextFilter implements FileFilter {

    @Override
    public boolean accept(File pathname) {
        String fileName = pathname.getName();
        // match on the full extension, including the dot, so names like "mydocx" are rejected
        return fileName.endsWith(".doc") || fileName.endsWith(".docx") ;
    }

}
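
The TextFilter above compares extensions case-sensitively, so a file named `Report.DOCX` would be skipped. A minimal sketch of a case-insensitive variant follows; the class name `CaseInsensitiveDocFilter` is hypothetical and not part of the original demo, and it uses only the JDK.

```java
import java.io.File;
import java.io.FileFilter;
import java.util.Locale;

// Hypothetical variant of TextFilter: accepts .doc and .docx regardless of letter case.
public class CaseInsensitiveDocFilter implements FileFilter {

    @Override
    public boolean accept(File pathname) {
        // Lowercase with a fixed locale so the comparison does not depend on the system locale.
        String name = pathname.getName().toLowerCase(Locale.ROOT);
        return name.endsWith(".doc") || name.endsWith(".docx");
    }

    public static void main(String[] args) {
        FileFilter filter = new CaseInsensitiveDocFilter();
        System.out.println(filter.accept(new File("Report.DOCX"))); // true
        System.out.println(filter.accept(new File("notes.txt")));   // false
        System.out.println(filter.accept(new File("mydocx")));      // false
    }
}
```

It could be passed to `indexer.index(dataDir, new CaseInsensitiveDocFilter())` in place of `new TextFilter()`.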

Screenshot of the program running:

[image]
