Lucene.Net Beginner Notes - Introduction

weixin_33972649

Tags: C#

Original article: http://www.cnblogs.com/hongyes/archive/2010/07/19/1780994.html

This is my first contact with Lucene.Net, so I will start by looking at the examples that ship with it.

Lucene.Net home page: http://lucene.apache.org/lucene.net/

Download location: https://svn.apache.org/repos/asf/lucene/lucene.net/site/download/

I downloaded Incubating-Apache-Lucene.Net-2.0-004-11Mar07.src.zip.

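After downloading, I read the ReadMe first. It does not say much: just a short introduction to Lucene.Net and a description of the directories: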

Apache Lucene.Net is a C# full-text search engine. Apache Lucene.Net is not a complete application, but rather a code library and API that can easily be used to add search capabilities to applications.

FILES

src/Lucene.Net
The Lucene source code.

src/Demo
Some example code. (My note: this is what I want to look at first.)

src/Test
Test code. (My note: not much use to me for now.)

contrib/*
Contributed code which extends and enhances Apache Lucene.Net, but is not part of the core library. (My note: extension code, to be read gradually later.)

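I opened the Demo directory, opened Demo.sln, and let VS2010 convert the solution. The build failed at first. The fix:

In the build configuration settings, tick the DemoLib project so that it gets built.

After that, the build succeeded.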

The solution contains 5 projects:

DeleteFiles

DemoLib

IndexFiles

IndexHtml

SearchFiles

Only DemoLib is a shared class library; all the others are console programs.

Following the natural order of things (index first, search later), I started with IndexFiles. The main code is:

    [STAThread]
    public static void Main(System.String[] args)
    {
        try
        {
            IndexWriter writer = new IndexWriter(INDEX_DIR, new StandardAnalyzer(), true);
            IndexDocs(writer, docDir);
            writer.Optimize();
            writer.Close();
        }
        catch (System.IO.IOException e)
        {
        }
    }

    internal static void IndexDocs(IndexWriter writer, System.IO.FileInfo file)
    {
        // do not try to index files that cannot be read
        // if (file.canRead())  // {{Aroush}} what is canRead() in C#?
        {
            if (System.IO.Directory.Exists(file.FullName))
            {
                System.String[] files = System.IO.Directory.GetFileSystemEntries(file.FullName);
                // an IO error could occur
                if (files != null)
                {
                    for (int i = 0; i < files.Length; i++)
                    {
                        IndexDocs(writer, new System.IO.FileInfo(files[i]));
                    }
                }
            }
            else
            {
                try
                {
                    writer.AddDocument(FileDocument.Document(file));
                }
                // at least on windows, some temporary files raise this exception with an "access denied" message
                // checking if the file can be read doesn't help
                catch (System.IO.FileNotFoundException fnfe)
                {
                }
            }
        }
    }

The code is simple: it walks the folder recursively and adds every file it finds to the IndexWriter.
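To see the recursion on its own, without any Lucene.Net types, here is a minimal sketch that only collects file paths instead of indexing them. The class name `FileWalker` and the method `CollectFiles` are names I made up for illustration; they are not part of the demo.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Illustrative helper (hypothetical, not part of the Lucene.Net demo).
public static class FileWalker
{
    // Recursively collects the paths of all files under 'path'.
    // If 'path' is itself a file, the result contains just that file.
    public static List<string> CollectFiles(string path)
    {
        var result = new List<string>();
        Collect(path, result);
        return result;
    }

    private static void Collect(string path, List<string> result)
    {
        if (Directory.Exists(path))
        {
            // GetFileSystemEntries returns both files and subdirectories,
            // so each entry is handed back to the same method.
            foreach (string entry in Directory.GetFileSystemEntries(path))
            {
                Collect(entry, result);
            }
        }
        else if (File.Exists(path))
        {
            result.Add(path);
        }
    }

    public static void Main(string[] args)
    {
        // Walk the directory given on the command line, or the current one.
        string root = args.Length > 0 ? args[0] : ".";
        foreach (string file in CollectFiles(root))
        {
            Console.WriteLine(file);
        }
    }
}
```

In the demo, the place where this sketch calls `result.Add(path)` is where `writer.AddDocument(...)` is called instead.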

A few APIs here deserve attention:

1. StandardAnalyzer

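The documentation describes it as: Filters {@link StandardTokenizer} with {@link StandardFilter}, {@link LowerCaseFilter} and {@link StopFilter}, using a list of English stop words.

I do not fully understand this yet, but it is clearly the component that splits text into tokens (words). The description states explicitly that it targets English; Chinese text should be handled by a dedicated analyzer.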
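To make the quoted description more concrete, here is a toy sketch of the same idea: tokenize, lowercase, then drop stop words. It does not use Lucene.Net at all and is not how StandardAnalyzer is actually implemented; `ToyAnalyzer`, `Analyze` and the tiny stop-word list are all my own assumptions for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

// Toy analysis chain (hypothetical, NOT Lucene.Net's StandardAnalyzer).
public static class ToyAnalyzer
{
    // A tiny illustrative stop-word list; not the list Lucene.Net ships with.
    private static readonly HashSet<string> StopWords =
        new HashSet<string> { "a", "an", "and", "is", "of", "the", "to" };

    public static List<string> Analyze(string text)
    {
        // Step 1: tokenize by cutting at every character that is
        // neither a letter nor a digit.
        var tokens = new List<string>();
        var current = new StringBuilder();
        foreach (char c in text)
        {
            if (char.IsLetterOrDigit(c))
            {
                current.Append(c);
            }
            else if (current.Length > 0)
            {
                tokens.Add(current.ToString());
                current.Clear();
            }
        }
        if (current.Length > 0)
        {
            tokens.Add(current.ToString());
        }

        // Step 2: lowercase every token. Step 3: remove stop words.
        return tokens
            .Select(t => t.ToLowerInvariant())
            .Where(t => !StopWords.Contains(t))
            .ToList();
    }

    public static void Main()
    {
        var terms = Analyze("The Quick Brown Fox is a friend of the Dog");
        Console.WriteLine(string.Join(" ", terms));
        // prints: quick brown fox friend dog
    }
}
```

In this toy version, a run of Chinese characters with no spaces or punctuation comes out as one single token, because there is nothing to cut on. That is one way to see why word segmentation for Chinese needs its own analyzer.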

2. IndexWriter.AddDocument

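Adds a document to the index. Note that the writer buffers documents in memory and writes them to disk in batches, so a single call does not necessarily touch the disk.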

3. IndexWriter.Optimize

If an index will not have more documents added for a while and optimal search performance is desired, then the optimize method should be called before the index is closed.

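In other words: if no new documents will be added for a while, call this before closing to get the best search performance. Exactly how it optimizes is something I will study later.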
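As far as I understand it, added documents are first buffered, then written out as separate index segments, and Optimize merges those segments into one so that a search has fewer places to look. The toy model below illustrates only that idea. It is an assumption-laden sketch, not Lucene.Net's real data structures: `ToySegmentedIndex`, `Flush`, `SegmentCount` and `Search` are hypothetical names, and everything lives in memory.

```csharp
using System;
using System.Collections.Generic;

// Toy model (hypothetical): a segment maps each term to the ids of the
// documents that contain it. Nothing here touches the disk.
public class ToySegmentedIndex
{
    private readonly int maxBufferedDocs;
    private readonly List<KeyValuePair<int, string[]>> buffer =
        new List<KeyValuePair<int, string[]>>();
    private readonly List<Dictionary<string, SortedSet<int>>> segments =
        new List<Dictionary<string, SortedSet<int>>>();
    private int nextDocId = 0;

    public ToySegmentedIndex(int maxBufferedDocs)
    {
        this.maxBufferedDocs = maxBufferedDocs;
    }

    public int SegmentCount { get { return segments.Count; } }

    // Buffers the document; a full buffer is turned into a new segment.
    public void AddDocument(string[] terms)
    {
        buffer.Add(new KeyValuePair<int, string[]>(nextDocId++, terms));
        if (buffer.Count >= maxBufferedDocs) Flush();
    }

    // Turns whatever is buffered into one new segment.
    public void Flush()
    {
        if (buffer.Count == 0) return;
        var segment = new Dictionary<string, SortedSet<int>>();
        foreach (var doc in buffer)
            foreach (string term in doc.Value)
                AddPosting(segment, term, doc.Key);
        segments.Add(segment);
        buffer.Clear();
    }

    // Merges all segments into a single one.
    public void Optimize()
    {
        Flush();
        var merged = new Dictionary<string, SortedSet<int>>();
        foreach (var segment in segments)
            foreach (var entry in segment)
                foreach (int docId in entry.Value)
                    AddPosting(merged, entry.Key, docId);
        segments.Clear();
        if (merged.Count > 0) segments.Add(merged);
    }

    // Looks the term up in every segment; buffered documents are not visible.
    public List<int> Search(string term)
    {
        var hits = new SortedSet<int>();
        foreach (var segment in segments)
        {
            SortedSet<int> ids;
            if (segment.TryGetValue(term, out ids)) hits.UnionWith(ids);
        }
        return new List<int>(hits);
    }

    private static void AddPosting(Dictionary<string, SortedSet<int>> segment, string term, int docId)
    {
        SortedSet<int> ids;
        if (!segment.TryGetValue(term, out ids))
        {
            ids = new SortedSet<int>();
            segment[term] = ids;
        }
        ids.Add(docId);
    }

    public static void Main()
    {
        var index = new ToySegmentedIndex(2);
        index.AddDocument(new string[] { "lucene", "net" });
        index.AddDocument(new string[] { "search", "engine" });
        index.AddDocument(new string[] { "lucene", "index" });
        index.AddDocument(new string[] { "full", "text" });
        index.AddDocument(new string[] { "lucene" });
        index.Flush();
        Console.WriteLine(index.SegmentCount);                        // prints: 3
        index.Optimize();
        Console.WriteLine(index.SegmentCount);                        // prints: 1
        Console.WriteLine(string.Join(",", index.Search("lucene")));  // prints: 0,2,4
    }
}
```

The search result is the same before and after the merge; only the number of segments that `Search` has to visit changes, which is the intuition behind calling Optimize once indexing is finished.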

4. IndexWriter.Close

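Closes the IndexWriter. Presumably this also releases the lock on the index directory, although the documentation does not say so explicitly.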
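One thing I noticed in the demo's Main: if indexing throws, Close is never reached. If Close really does release a lock, it seems safer to call it in a finally block. The sketch below shows the pattern with a stand-in class; `ToyWriter`, its `IsOpen` property and `CloseDemo.IndexAll` are hypothetical and only mimic the shape of a writer that must be closed.

```csharp
using System;

// Hypothetical stand-in for a writer that holds a resource until Close().
public class ToyWriter
{
    public bool IsOpen { get; private set; }

    public ToyWriter()
    {
        IsOpen = true;
    }

    public void AddDocument(string text)
    {
        // Simulate a failure in the middle of indexing.
        if (text == null) throw new ArgumentNullException("text");
    }

    public void Close()
    {
        IsOpen = false;
    }
}

public static class CloseDemo
{
    // Adds every document and closes the writer even if one of them fails.
    public static void IndexAll(ToyWriter writer, string[] docs)
    {
        try
        {
            foreach (string doc in docs)
            {
                writer.AddDocument(doc);
            }
        }
        finally
        {
            // Runs on success and on failure alike.
            writer.Close();
        }
    }

    public static void Main()
    {
        var writer = new ToyWriter();
        try
        {
            IndexAll(writer, new string[] { "first", null, "third" });
        }
        catch (ArgumentNullException)
        {
            Console.WriteLine("indexing failed");
        }
        Console.WriteLine(writer.IsOpen);  // prints: False
    }
}
```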

Since I have only just started, these notes stay near the surface; I will go deeper over time.

Reposted from: https://www.cnblogs.com/hongyes/archive/2010/07/19/1780994.html
