Lucene Basic Usage

jaytse

Published 2008-07-25 16:13:00

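Copyright notice: This is an original article by the author, licensed under CC 4.0 BY-SA. When reposting, please include a link to the original source and this notice.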

Article link: https://blog.csdn.net/jaytse/article/details/2710610


1. Building the index

XMLFilesIndexer.java

import java.io.File;
import java.io.IOException;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriter;

public class XMLFilesIndexer {

    // Builds a fresh index in indexFileDir from every XML file under sourceDir.
    public void createIndex(String indexFileDir, String sourceDir) {
        try {
            // Third argument true: create a new index, replacing any existing one.
            IndexWriter writer = new IndexWriter(indexFileDir,
                    new StandardAnalyzer(), true);
            System.out.println("Indexing to directory '" + indexFileDir + "'...");
            indexDocs(writer, new File(sourceDir));
            System.out.println("Optimizing...");
            writer.optimize();
            writer.close();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    // Walks the file tree recursively and adds each readable XML file to the index.
    void indexDocs(IndexWriter writer, File file) throws IOException {
        if (file.canRead()) {
            if (file.isDirectory()) {
                String[] files = file.list();
                // list() returns null on an I/O error
                if (files != null) {
                    for (int i = 0; i < files.length; i++) {
                        indexDocs(writer, new File(file, files[i]));
                    }
                }
            } else {
                // Match the ".xml" extension rather than any name ending in "xml".
                if (file.getName().endsWith(".xml")) {
                    System.out.println("adding " + file);
                    try {
                        // XMLDocument.documentByFile is the helper defined at the end of this post.
                        writer.addDocument(XMLDocument.documentByFile(file));
                    } catch (Exception e) {
                        e.printStackTrace();
                    }
                }
            }
        }
    }
}
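The directory walk in indexDocs can be tried on its own, without Lucene on the classpath. The following is a minimal sketch under that assumption: the class name XmlFileWalker and its methods are made up for illustration and are not part of the original code. It collects the files that would be indexed, using a suffix test that includes the dot and ignores case.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper, not part of the original post.
public class XmlFileWalker {

    /** True for names such as "a.xml" or "B.XML", false for "noxml". */
    static boolean isXmlFile(String name) {
        return name.toLowerCase(Locale.ROOT).endsWith(".xml");
    }

    /** Recursively collects the readable XML files under root. */
    static List<File> collectXmlFiles(File root) {
        List<File> found = new ArrayList<>();
        collect(root, found);
        return found;
    }

    private static void collect(File file, List<File> found) {
        if (!file.canRead()) {
            return;
        }
        if (file.isDirectory()) {
            String[] names = file.list();
            if (names == null) {
                return; // I/O error while listing the directory
            }
            for (String name : names) {
                collect(new File(file, name), found);
            }
        } else if (isXmlFile(file.getName())) {
            found.add(file);
        }
    }

    public static void main(String[] args) throws IOException {
        // Build a small throwaway tree: two XML files and one decoy.
        File root = Files.createTempDirectory("xmlwalk").toFile();
        File sub = new File(root, "sub");
        sub.mkdir();
        new File(root, "a.xml").createNewFile();
        new File(sub, "b.XML").createNewFile();
        new File(sub, "noxml").createNewFile();
        System.out.println(collectXmlFiles(root).size() + " XML files found"); // 2 XML files found
    }
}
```

Collecting the files first also makes it easy to print or count them before any index is written.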

2. Searching

XMLFilesSearcher.java

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.*;

public class XMLFilesSearcher {

    // Runs keyword as a query against the "contents" field and prints the path of each hit.
    public void search(String keyword, String indexDir)
            throws Exception {
        String field = "contents";
        IndexReader reader = IndexReader.open(indexDir);
        try {
            Searcher searcher = new IndexSearcher(reader);
            // Use the same analyzer as at indexing time so query terms match indexed terms.
            Analyzer analyzer = new StandardAnalyzer();
            QueryParser parser = new QueryParser(field, analyzer);
            Query query = parser.parse(keyword);
            Hits hits = searcher.search(query);
            for (int i = 0; i < hits.length(); i++) {
                Document doc = hits.doc(i);
                String path = doc.get("path");
                if (path != null) {
                    System.out.println((i + 1) + ". " + path);
                } else {
                    System.out.println((i + 1) + ". " + "No path for this document");
                }
            }
        } finally {
            // Close the reader even if parsing or searching throws.
            reader.close();
        }
    }
}
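To make the idea behind a query such as "A OR B" concrete, here is a toy inverted index in plain Java. It is only a sketch and not how Lucene is implemented: the class TinyOrSearch is invented for illustration, there is no scoring, and the tokenizer is an assumption that treats every CJK ideograph as its own term and every run of other letters or digits as one lowercased word. In the sample data, \u94b1 and \u9648 are the escaped forms of the two characters 钱 and 陈, and the file names are made up.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

// Hypothetical toy index, not part of the original post and not Lucene's implementation.
public class TinyOrSearch {
    // term -> ids of the documents that contain it
    private final Map<String, TreeSet<Integer>> postings = new HashMap<>();
    private final List<String> paths = new ArrayList<>();

    /** Splits text into terms: one per ideograph, one per run of letters or digits. */
    static List<String> tokenize(String text) {
        List<String> terms = new ArrayList<>();
        StringBuilder word = new StringBuilder();
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            i += Character.charCount(cp);
            if (Character.isIdeographic(cp)) {
                flush(word, terms);
                terms.add(new String(Character.toChars(cp)));
            } else if (Character.isLetterOrDigit(cp)) {
                word.appendCodePoint(Character.toLowerCase(cp));
            } else {
                flush(word, terms);
            }
        }
        flush(word, terms);
        return terms;
    }

    private static void flush(StringBuilder word, List<String> terms) {
        if (word.length() > 0) {
            terms.add(word.toString());
            word.setLength(0);
        }
    }

    /** Adds a document and records which terms occur in it. */
    void add(String path, String contents) {
        int docId = paths.size();
        paths.add(path);
        for (String term : tokenize(contents)) {
            postings.computeIfAbsent(term, k -> new TreeSet<>()).add(docId);
        }
    }

    /** OR semantics: paths of documents containing at least one of the terms. */
    List<String> searchAny(String... terms) {
        TreeSet<Integer> hits = new TreeSet<>();
        for (String term : terms) {
            TreeSet<Integer> docs = postings.get(term);
            if (docs != null) {
                hits.addAll(docs);
            }
        }
        List<String> result = new ArrayList<>();
        for (int docId : hits) {
            result.add(paths.get(docId));
        }
        return result;
    }

    public static void main(String[] args) {
        TinyOrSearch index = new TinyOrSearch();
        index.add("a.xml", "\u94b1 one");
        index.add("b.xml", "two \u9648");
        index.add("c.xml", "three");
        System.out.println(index.searchAny("\u94b1", "\u9648")); // [a.xml, b.xml]
    }
}
```

An OR query is the union of the posting lists of its terms, which is why a document matching either character is returned.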

3. Application

public class Main {

    public static void main(String[] args) {
        // Uncomment these two lines and run once to build the index before searching.
        // XMLFilesIndexer xmlFilesIndexer = new XMLFilesIndexer();
        // xmlFilesIndexer.createIndex("Data//index", "Data//data");

        XMLFilesSearcher xmlFilesSearcher = new XMLFilesSearcher();
        try {
            // Find documents whose contents contain either of the two characters.
            xmlFilesSearcher.search("钱 OR 陈", "Data//index");
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}

XML helper class: XMLDocument

import java.io.File;
import java.io.FileReader;

import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;

public class XMLDocument {

    // Turns one file into a Lucene document with a stored "path" field
    // and a tokenized, unstored "contents" field.
    public static Document documentByFile(File file) {
        Document doc = new Document();
        try {
            // Store the path as a single untokenized term so it can be shown in results.
            doc.add(new Field("path", file.getPath(), Field.Store.YES, Field.Index.UN_TOKENIZED));
            // The whole file, XML markup included, is read and tokenized at indexing time.
            doc.add(new Field("contents", new FileReader(file)));
        } catch (Exception e) {
            e.printStackTrace();
        }
        return doc;
    }
}
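One detail worth checking when the indexed files contain Chinese text: FileReader decodes bytes with the platform default charset on older JVMs, so the same file can be indexed differently on different machines. The sketch below assumes the XML files are UTF-8, which the original post does not state. The class Utf8ReaderDemo and its methods are invented for illustration; \u94b1 is the escaped form of 钱.

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;

// Hypothetical helper, not part of the original post.
public class Utf8ReaderDemo {

    /** Opens file as UTF-8 text regardless of the platform default; the caller closes it. */
    static Reader openUtf8(File file) throws IOException {
        return new BufferedReader(
                new InputStreamReader(new FileInputStream(file), StandardCharsets.UTF_8));
    }

    /** Reads everything from reader into a String. */
    static String readAll(Reader reader) throws IOException {
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[4096];
        int n;
        while ((n = reader.read(buf)) != -1) {
            sb.append(buf, 0, n);
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        File tmp = File.createTempFile("sample", ".xml");
        tmp.deleteOnExit();
        String xml = "<name>\u94b1</name>";
        Files.write(tmp.toPath(), xml.getBytes(StandardCharsets.UTF_8));
        try (Reader r = openUtf8(tmp)) {
            // The text read back matches what was written.
            System.out.println(readAll(r).equals(xml)); // true
        }
    }
}
```

A reader opened this way could be passed to the "contents" field in place of the FileReader, provided the UTF-8 assumption holds for the files being indexed.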
