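This is my first look at Lucene.Net, so I'll start with the examples that ship with it.
Lucene.Net home page: http://lucene.apache.org/lucene.net/
Download: https://svn.apache.org/repos/asf/lucene/lucene.net/site/download/
I downloaded Incubating-Apache-Lucene.Net-2.0-004-11Mar07.src.zip.
After downloading, I read the ReadMe first. It has little information beyond an introduction to Lucene.Net and a description of the directories: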
Apache Lucene.Net is a C# full-text search engine. Apache Lucene.Net is not a complete application, but rather a code library and API that can easily be used to add search capabilities to applications.
FILES
src/Lucene.Net
The Lucene source code.
src/Demo
Some example code. (This is what I want to look at first.)
src/Test
Test code. (Not much use to me for now.)
contrib/*
Contributed code which extends and enhances Apache Lucene.Net, but is not part of the core library. (I'll go through these later.)
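I opened the Demo directory and Demo.sln, let VS2010 convert the solution, and found that the build failed. The fix:
In the solution's build configuration settings, tick the DemoLib project so that it gets built.
After that, the build succeeded.
There are 5 projects in total: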
DeleteFiles
DemoLib
IndexFiles
IndexHtml
SearchFiles
Only DemoLib is a shared class library; the others are all console programs.
Going in order, I'll start with IndexFiles. The main code is:
    [STAThread]
    public static void Main(System.String[] args)
    {
        // INDEX_DIR and docDir are defined elsewhere in the demo; they are not part of this excerpt.
        try
        {
            // The third argument (true) creates a new index, overwriting any existing one.
            IndexWriter writer = new IndexWriter(INDEX_DIR, new StandardAnalyzer(), true);
            IndexDocs(writer, docDir);
            writer.Optimize();
            writer.Close();
        }
        catch (System.IO.IOException e)
        {
            // the exception is silently ignored in this excerpt
        }
    }

    internal static void IndexDocs(IndexWriter writer, System.IO.FileInfo file)
    {
        // do not try to index files that cannot be read
        // if (file.canRead())  // {{Aroush}} what is canRead() in C#?
        {
            if (System.IO.Directory.Exists(file.FullName))
            {
                // 'file' is a directory: recurse into every entry (files and subdirectories)
                System.String[] files = System.IO.Directory.GetFileSystemEntries(file.FullName);
                // an IO error could occur
                if (files != null)
                {
                    for (int i = 0; i < files.Length; i++)
                    {
                        IndexDocs(writer, new System.IO.FileInfo(files[i]));
                    }
                }
            }
            else
            {
                // 'file' is a regular file: turn it into a Document and add it to the index
                try
                {
                    writer.AddDocument(FileDocument.Document(file));
                }
                // at least on windows, some temporary files raise this exception with an "access denied" message
                // checking if the file can be read doesn't help
                catch (System.IO.FileNotFoundException fnfe)
                {
                    // the file is skipped silently in this excerpt
                }
            }
        }
    }
The code is simple: it recurses through the folder and adds every file under it to the IndexWriter.
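To isolate the recursion from the Lucene.Net calls, here is a minimal sketch that uses only System.IO. The names FileWalker and CollectFiles are made up for illustration and are not part of the demo; where the demo calls writer.AddDocument, this sketch just appends the path to a list.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical helper, not part of the Lucene.Net demo.
public static class FileWalker
{
    // Returns the paths of all regular files found under 'path'.
    public static List<string> CollectFiles(string path)
    {
        var result = new List<string>();
        Collect(path, result);
        return result;
    }

    private static void Collect(string path, List<string> result)
    {
        if (Directory.Exists(path))
        {
            // A directory: visit every entry, files and subdirectories alike.
            foreach (string entry in Directory.GetFileSystemEntries(path))
            {
                Collect(entry, result);
            }
        }
        else if (File.Exists(path))
        {
            // A regular file: this is the point where the demo adds a document.
            result.Add(path);
        }
    }
}
```

Calling FileWalker.CollectFiles on a root folder should return one entry per file at any depth, which is the same set of files the demo would hand to the IndexWriter.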
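A few APIs are worth noting here:
1. StandardAnalyzer
The documentation says: Filters StandardTokenizer with StandardFilter, LowerCaseFilter and StopFilter, using a list of English stop words.
I'm not entirely sure what that means, but it is evidently used for tokenization. It explicitly targets English; Chinese word segmentation should use a dedicated analyzer.
2. IndexWriter.AddDocument
Adds a document to the index. As far as I can tell, the writer may buffer documents in memory before writing them out, so the index on disk is only guaranteed to be complete once the writer has been closed.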
3. IndexWriter.Optimize
If an index will not have more documents added for a while and optimal search performance is desired, then the optimize method should be called before the index is closed.
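In other words, if no new documents are going to be added for a while, Optimize can be called to improve search performance. How exactly it optimizes is something I'll look into later.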
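4. IndexWriter.Close
Closes the IndexWriter. The documentation I read doesn't explain much, but it presumably flushes any pending changes and releases the lock on the index directory.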
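To make the StandardAnalyzer description in item 1 more concrete, here is a rough sketch of the three ideas it mentions: splitting text into tokens, lower-casing them, and dropping stop words. This is plain C# and not Lucene.Net's implementation; the class name SimpleAnalyzerSketch, the splitting rule, and the tiny stop-word list are all assumptions made for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Illustrative only: a toy stand-in for a tokenizer + lower-case filter + stop filter.
public static class SimpleAnalyzerSketch
{
    // A tiny made-up stop-word list; NOT the list that Lucene.Net ships with.
    private static readonly HashSet<string> StopWords =
        new HashSet<string> { "a", "an", "and", "the", "of", "to", "is" };

    public static List<string> Analyze(string text)
    {
        var tokens = new List<string>();
        var current = new StringBuilder();
        foreach (char c in text)
        {
            if (char.IsLetterOrDigit(c))
            {
                // Tokenizer + lower-case step: build the token in lower case.
                current.Append(char.ToLowerInvariant(c));
            }
            else
            {
                // Any other character ends the current token.
                Flush(current, tokens);
            }
        }
        Flush(current, tokens);
        return tokens;
    }

    private static void Flush(StringBuilder current, List<string> tokens)
    {
        if (current.Length == 0) return;
        string token = current.ToString();
        current.Length = 0;
        // Stop-filter step: drop very common words.
        if (!StopWords.Contains(token)) tokens.Add(token);
    }
}
```

With this sketch, Analyze("The Quick Brown Fox is an Animal") yields quick, brown, fox, animal. A Chinese sentence with no spaces would come out as one long token, since this splitting rule only breaks on non-letter characters; that is one way to see why Chinese text needs a dedicated analyzer.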
Since I've only just started, this is all fairly shallow. I'll dig deeper over time.