Building Your Own Search Engine with Lucene (3): Core Classes in the Indexer Program


w踏雪w


Article link: https://blog.csdn.net/wen294299195/article/details/8579176


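(1) Directory:

The Directory class describes where a Lucene index is stored. It is an abstract class, and its subclasses decide how and where the index is actually kept. FSDirectory.open returns a Directory backed by a real directory in the file system, and that Directory is then passed to the IndexWriter constructor.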

Directory dir = FSDirectory.open(new File(indexDir));

(2) IndexWriter:

IndexWriter creates a new index or opens an existing one, and adds, deletes, or updates the indexed documents in it.

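(3) Analyzer:

Text must pass through an Analyzer before it is indexed. The Analyzer is specified in the IndexWriter constructor; it extracts tokens from the text being indexed and discards whatever is not useful.

The code is as follows: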

writer = new IndexWriter(dir, new StandardAnalyzer(Version.LUCENE_30), true,
				IndexWriter.MaxFieldLength.UNLIMITED);
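To make "extract tokens and discard the rest" concrete, here is a minimal sketch in plain Java. It is a toy stand-in and not Lucene's StandardAnalyzer: the class name ToyAnalyzer, the stop-word list, and the splitting rule are all assumptions made for illustration only.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Toy stand-in for an analyzer; NOT Lucene's StandardAnalyzer.
class ToyAnalyzer {
    // Hypothetical stop-word list chosen for this illustration.
    static final Set<String> STOP_WORDS =
            new HashSet<String>(Arrays.asList("a", "an", "and", "is", "of", "the", "to"));

    // Lowercase the text, split on anything that is not a letter or digit,
    // and drop empty pieces and stop words.
    static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<String>();
        for (String raw : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{Nd}]+")) {
            if (!raw.isEmpty() && !STOP_WORDS.contains(raw)) {
                tokens.add(raw);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // prints [quick, brown, fox, indexer]
        System.out.println(analyze("The quick brown fox is an Indexer!"));
    }
}
```

A real analyzer does considerably more (token positions, offsets, type-aware tokenization), so this only shows the general shape of the step.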

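(4) Document:

A Document object represents a collection of fields (Field). You can think of it as a web page, a text file, and so on.

Its structure is simple: it is a container holding multiple Field objects.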

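(5) Field:

Field is the class that holds the text content to be indexed. Every document in the index has one or more fields, each represented by a Field object.

Each field has a name and a corresponding value, plus a set of options that control precisely how Lucene indexes that value.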
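The container relationship between Document and Field can be sketched with a simplified model in plain Java. These are not Lucene's classes: ToyDocument, ToyField, and the two boolean flags are hypothetical names that only loosely mirror the Field.Store and Field.Index options used in this article's getDocument method.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified model of the Document/Field relationship; not Lucene's actual classes.
class ToyDocument {
    // One named value plus two illustrative options.
    static final class ToyField {
        final String name;
        final String value;
        final boolean stored;    // keep the original value so it can be shown with search hits
        final boolean analyzed;  // run the value through an analyzer before indexing

        ToyField(String name, String value, boolean stored, boolean analyzed) {
            this.name = name;
            this.value = value;
            this.stored = stored;
            this.analyzed = analyzed;
        }
    }

    private final List<ToyField> fields = new ArrayList<ToyField>();

    // A document is just an ordered collection of fields.
    void add(ToyField field) {
        fields.add(field);
    }

    // Returns the value of the first field with the given name, or null if none exists.
    String get(String name) {
        for (ToyField f : fields) {
            if (f.name.equals(name)) {
                return f.value;
            }
        }
        return null;
    }

    int size() {
        return fields.size();
    }

    public static void main(String[] args) {
        ToyDocument doc = new ToyDocument();
        doc.add(new ToyField("filename", "readme.txt", true, false));
        doc.add(new ToyField("fullpath", "/data/readme.txt", true, false));
        System.out.println(doc.get("filename")); // prints readme.txt
    }
}
```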

Code walkthrough:

public Indexer(String indexDir) throws IOException {
        Directory dir = FSDirectory.open(new File(indexDir));
        /*
         * Version.LUCENE_30 is a version parameter: Lucene uses the given
         * value to match the environment and behavior of that release.
         */
        writer = new IndexWriter(dir, new StandardAnalyzer(Version.LUCENE_30), true,
                IndexWriter.MaxFieldLength.UNLIMITED);
    }

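First, FSDirectory.open obtains a Directory for the path where the index will be stored. Next, an IndexWriter object is created to write to the index files, as in the later call:

writer.addDocument(doc)

which adds a document to the index. The Analyzer is also specified in the IndexWriter constructor.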

protected Document getDocument(File f) throws Exception{
		Document doc = new Document();
		/**
		 * "contents" is the field name, new FileReader(f) is the field value
		 * "filename" is the field name, f.getName() is the field value
		 * .......
		 */
		doc.add(new Field("contents", new FileReader(f)));//index the file contents (a Reader value is analyzed but not stored)
		doc.add(new Field("filename", f.getName(),//index the file name
				Field.Store.YES, Field.Index.NOT_ANALYZED));
		doc.add(new Field("fullpath", f.getCanonicalPath(),//index the full file path
				Field.Store.YES, Field.Index.NOT_ANALYZED));
		
		return doc;
	}

One Document object is created for each text file.

numIndexed = indexer.index(dataDir, new TextFilesFilter());

...........

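//Returns the number of documents indexed
	public int index(String dataDir, FileFilter filter)throws Exception{
		File[] files = new File(dataDir).listFiles();
		//listFiles() returns null when dataDir is not a directory or cannot be read
		if(files == null){
			throw new IOException(dataDir + " is not a readable directory");
		}
		
		for(File f:files){
			if(!f.isDirectory() &&
					!f.isHidden()&&
					f.exists()&&
					f.canRead()&&
					(filter == null || filter.accept(f))){
				indexFile(f);
			}
		}
		return writer.numDocs();
	}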
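The TextFilesFilter passed to index is not shown in this article. A minimal sketch, assuming it only needs to accept files whose names end in .txt (that rule is an assumption, not something the article states), could look like this:

```java
import java.io.File;
import java.io.FileFilter;
import java.util.Locale;

// Hypothetical implementation: the article uses TextFilesFilter but never shows its body.
class TextFilesFilter implements FileFilter {
    // Accept a file only if its name ends with ".txt", ignoring case.
    @Override
    public boolean accept(File path) {
        return path.getName().toLowerCase(Locale.ROOT).endsWith(".txt");
    }

    public static void main(String[] args) {
        FileFilter filter = new TextFilesFilter();
        System.out.println(filter.accept(new File("notes.TXT"))); // prints true
        System.out.println(filter.accept(new File("image.png"))); // prints false
    }
}
```

Because index also accepts a null filter, leaving the filter out would index every readable, non-hidden regular file in the directory.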

//Add a document to the Lucene index
	private void indexFile(File f) throws Exception{
		System.out.println("Indexing "+f.getCanonicalPath());
		Document doc = getDocument(f);
		writer.addDocument(doc);
	}

Each processed document is thus added to the index stored in the index folder.
