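I originally said I would write up my Lucene and Solr study notes, but after two posts I got lazy and stopped. Recently I have been thinking about building a Chinese-language learning site for Lucene and Solr, translating some of the English Lucene and Solr material and providing a Chinese platform for discussion and learning. So I want to pick this series back up.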
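Back to the topic. As mentioned earlier, our goal is to study and modify the Lucene/Solr source code. If we have never used them, though, we cannot hope to understand the source. I recommend "Lucene in Action", second edition. A Chinese edition exists, and the English edition can be found online; I suggest reading the English one. The book's first author, Michael McCandless, is currently a member of the Lucene PMC, and his blog is worth following:
http://blog.mikemccandless.com. The other two authors wrote the first edition of the book. They do not seem to be especially active in Lucene and Solr development these days, and I have rarely seen them on the Lucene and Solr mailing lists.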
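Below we briefly learn (or review) how Lucene builds an index. We give the indexing code for Lucene 2.x/3.x and for 4.0, which is currently under development on trunk, paying particular attention to the differences between them.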
An example of building an index and running a simple search in Lucene 2.x/3.x
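// Imports needed by this Lucene 3.6 fragment; the statements below belong inside a method that declares throws IOException.
import java.io.File;
import org.apache.lucene.analysis.WhitespaceAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;

Directory dir=FSDirectory.open(new File("./testindex"));
// Delete any existing index files so that we start from an empty index.
for(String fn:dir.listAll()){
dir.deleteFile(fn);
}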
IndexWriter writer=new IndexWriter(dir,new WhitespaceAnalyzer(Version.LUCENE_36),IndexWriter.MaxFieldLength.UNLIMITED);
Document doc=new Document();
doc.add(new Field("id","0001",Field.Store.YES,Field.Index.NOT_ANALYZED));
doc.add(new Field("body","hello world, this is text body part. ",Field.Store.NO,Field.Index.ANALYZED));
doc.add(new Field("clickCount","10",Field.Store.YES,Field.Index.NO));
writer.addDocument(doc);
doc=new Document();
doc.add(new Field("id","0002",Field.Store.YES,Field.Index.NOT_ANALYZED));
doc.add(new Field("body","good bye. that is it. ",Field.Store.NO,Field.Index.ANALYZED));
doc.add(new Field("clickCount","3",Field.Store.YES,Field.Index.NO));
writer.addDocument(doc);
writer.close();
IndexReader reader=IndexReader.open(dir);
IndexSearcher searcher=new IndexSearcher(reader);
Query q=new TermQuery(new Term("body","is"));
TopDocs docs=searcher.search(q, 10);
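// totalHits counts every match, while scoreDocs holds at most the 10 hits requested, so loop over scoreDocs.
for(int i=0;i<docs.scoreDocs.length;i++){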
int docId=docs.scoreDocs[i].doc;
float score=docs.scoreDocs[i].score;
doc=searcher.doc(docId);
System.out.println("id="+doc.get("id")+", clickcount="+doc.get("clickCount"));
}
reader.close();
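To make it clearer what the example above is doing, here is a toy model in plain Java with no Lucene dependency. It is only a sketch: the class `ToyIndex` and all of its methods are hypothetical names made up for illustration, and its data structures are not how Lucene actually stores an index. It mimics three things from the example: whitespace tokenization of an analyzed field, exact-term lookup in the spirit of `TermQuery`, and the difference between storing a field and indexing it.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

// Toy model only (hypothetical class): NOT Lucene's real index format.
class ToyIndex {
    // field name -> term -> sorted set of document numbers
    private final Map<String, Map<String, TreeSet<Integer>>> postings = new HashMap<>();
    // one map per document: field name -> original stored value
    private final List<Map<String, String>> stored = new ArrayList<>();

    /** Starts a new document and returns its number (0, 1, 2, ...). */
    int newDocument() {
        stored.add(new HashMap<>());
        return stored.size() - 1;
    }

    /**
     * Adds one field to a document.
     * store   roughly Field.Store.YES: keep the original value for retrieval
     * index   roughly "not Field.Index.NO": make the field searchable
     * analyze roughly Field.Index.ANALYZED: split the value on whitespace
     */
    void addField(int doc, String field, String value,
                  boolean store, boolean index, boolean analyze) {
        if (store) {
            stored.get(doc).put(field, value);
        }
        if (!index) {
            return;
        }
        String[] terms = analyze ? value.trim().split("\\s+") : new String[] { value };
        for (String term : terms) {
            postings.computeIfAbsent(field, f -> new HashMap<>())
                    .computeIfAbsent(term, t -> new TreeSet<>())
                    .add(doc);
        }
    }

    /** Exact term lookup; returns matching document numbers in ascending order. */
    List<Integer> search(String field, String term) {
        Map<String, TreeSet<Integer>> terms = postings.get(field);
        if (terms == null || !terms.containsKey(term)) {
            return new ArrayList<>();
        }
        return new ArrayList<>(terms.get(term));
    }

    /** Returns the stored value, or null if the field was not stored. */
    String get(int doc, String field) {
        return stored.get(doc).get(field);
    }

    /** Builds the same two documents as the Lucene 2.x/3.x example. */
    static ToyIndex sample() {
        ToyIndex idx = new ToyIndex();
        int d = idx.newDocument();
        idx.addField(d, "id", "0001", true, true, false);
        idx.addField(d, "body", "hello world, this is text body part. ", false, true, true);
        idx.addField(d, "clickCount", "10", true, false, false);
        d = idx.newDocument();
        idx.addField(d, "id", "0002", true, true, false);
        idx.addField(d, "body", "good bye. that is it. ", false, true, true);
        idx.addField(d, "clickCount", "3", true, false, false);
        return idx;
    }

    public static void main(String[] args) {
        ToyIndex idx = sample();
        for (int doc : idx.search("body", "is")) {
            System.out.println("id=" + idx.get(doc, "id")
                    + ", clickcount=" + idx.get(doc, "clickCount"));
        }
    }
}
```

Both documents match the term `is`, so the toy prints one line per document, in document order (Lucene would order the hits by score instead). Searching `body` for `world` finds nothing here, because splitting on whitespace alone keeps the comma and the indexed term is `world,`. Likewise `get(0, "body")` returns null since the body is indexed but not stored, and `clickCount` can be retrieved but not searched.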
An example of building an index and running a simple search in Lucene 4
Directory dir=FSDirectory.open(new File("./testindex"));
IndexWriterConfig cfg=new IndexWriterConfig(Version.LUCENE_40,new WhitespaceAnalyzer(Version.LUCENE_40));
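// OpenMode.CREATE creates a new index or overwrites an existing one, so the manual deleteFile loop from 2.x/3.x is not needed.
cfg.setOpenMode(OpenMode.CREATE);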
IndexWriter writer=new IndexWriter(dir,cfg);
Document doc=new Document();
doc.add(new Field("id","0001",StringField.TYPE_STORED));
doc.add(new TextField("body","hello world, this is text body part. "));
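// Keep the click count as a per-document DocValues value (fixed-width 8-bit int) instead of a stored string field.
doc.add(new DocValuesField("clickcount",10,DocValues.Type.FIXED_INTS_8));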
writer.addDocument(doc);
doc=new Document();
doc.add(new Field("id","0002",StringField.TYPE_STORED));
doc.add(new TextField("body","good bye. that is it. "));
doc.add(new DocValuesField("clickcount",3,DocValues.Type.FIXED_INTS_8));
writer.addDocument(doc);
writer.close();
IndexReader reader=DirectoryReader.open(dir);
IndexSearcher searcher=new IndexSearcher(reader);
Query q=new TermQuery(new Term("body","i