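The indexing stage is the same as in part 1: a depth-first traversal finds every file under the mirror folder and converts it into a Document object with two fields, the file's path (path) and the file's content (contents).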
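// Converts one file into a Lucene Document with "path" and "contents" fields.
public Document getDocument(File file) {
    StringBuilder temp = new StringBuilder();
    Document document = new Document();
    // Store the path verbatim; it is indexed as a single untokenized term.
    document.add(new Field("path", file.getPath(), Field.Store.YES,
            Field.Index.NOT_ANALYZED_NO_NORMS));
    BufferedReader br = null;
    try {
        br = new BufferedReader(new FileReader(file));
        String line;
        while ((line = br.readLine()) != null) {
            temp.append(line).append("\n");
        }
    } catch (IOException e) { // also covers FileNotFoundException
        e.printStackTrace();
    } finally {
        // Release the file handle even if reading failed.
        if (br != null) {
            try {
                br.close();
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
    }
    // temp is never null, so a failed read indexes whatever was read so far
    // instead of passing null to the Field constructor.
    document.add(new Field("contents", temp.toString(), Field.Store.YES, Field.Index.ANALYZED));
    return document;
}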
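The depth-first traversal of the mirror folder is described but not shown. A minimal sketch of one way to do it with plain java.io.File recursion follows; the class name MirrorWalker and the method collectFiles are hypothetical and do not come from the original code, and the default folder name "mirror" is taken from the description above.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not from the original code: collects every regular
// file under a root folder by recursive depth-first traversal.
class MirrorWalker {

    // Returns all regular files under root, fully descending into each
    // subfolder before moving on to the next sibling.
    static List<File> collectFiles(File root) {
        List<File> result = new ArrayList<File>();
        walk(root, result);
        return result;
    }

    private static void walk(File node, List<File> out) {
        if (node.isFile()) {
            out.add(node);
            return;
        }
        // listFiles() returns null when node is not a readable directory.
        File[] children = node.listFiles();
        if (children == null) {
            return;
        }
        for (File child : children) {
            walk(child, out);
        }
    }

    public static void main(String[] args) {
        File root = new File(args.length > 0 ? args[0] : "mirror");
        for (File f : collectFiles(root)) {
            // In the indexing stage each file would instead be passed to
            // getDocument(f) and the resulting Document added to the index.
            System.out.println(f.getPath());
        }
    }
}
```

Collecting the files into a list first keeps the traversal separate from the indexing. For a very large mirror, calling getDocument directly inside walk would avoid holding the whole list in memory.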
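In the query stage, the same analyzer as in the indexing stage is used to parse queries against the contents field. Searching for a string str retrieves the relevant documents (the code caps the result at the 10 best hits) and prints the contents field of each.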
// Searches the "contents" field for str and prints the stored contents of the top hits.
public void search(String str) {
    try {
        directory = FSDirectory.open(file);
        searcher = new IndexSearcher(IndexReader.open(directory));
        // Must be the same analyzer that was used when the index was built.
        analyzer = new StandardAnalyzer(Version.LUCENE_35);
        parser = new QueryParser(Version.LUCENE_35, "contents", analyzer);
        query = parser.parse(str);
        hits = searcher.search(query, 10); // keep only the 10 best-scoring documents
        System.out.println("hits' totalHits:" + hits.totalHits);
        System.out.println("str: " + str);
        ScoreDoc[] sd = hits.scoreDocs;
        System.out.println("sd.length: " + sd.length);
        for (int i = 0; i < sd.length; i++) {
            // Load the stored fields of the i-th hit by its internal document id.
            Document document = searcher.doc(sd[i].doc);
            System.out.println("contents:" + document.get("contents"));
        }
        endSearchTime = System.currentTimeMillis();
    } catch (IOException e) {
        e.printStackTrace();
    } catch (org.apache.lucene.queryParser.ParseException e) {
        e.printStackTrace();
    } finally {
        try {
            if (searcher != null) {
                // close() does not close a reader that was passed to the
                // constructor, so close the reader explicitly as well.
                searcher.getIndexReader().close();
                searcher.close();
                searcher = null;
            }
            if (directory != null) {
                directory.close();
                directory = null;
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
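Retrieval system performance: the running time of the system is used as the evaluation metric. The time needed for indexing and the time needed for querying are measured separately, along with their total. The web pages are indexed with each page's content added to the document as the contents field, and the index is then searched for pkusz.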
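The timing scheme above can be sketched as follows. This is only an illustration under assumptions: the class StageTimer and the method timeMillis are hypothetical names, and the two task bodies are empty placeholders standing in for the real indexing and search calls. It uses System.currentTimeMillis(), as the search code does for endSearchTime.

```java
// Hypothetical timing helper, not from the original code.
class StageTimer {

    // Runs the task once and returns its elapsed wall-clock time in milliseconds.
    static long timeMillis(Runnable task) {
        long start = System.currentTimeMillis();
        task.run();
        return System.currentTimeMillis() - start;
    }

    public static void main(String[] args) {
        // Placeholder: replace the body with the real index-building call.
        long indexTime = timeMillis(new Runnable() {
            public void run() {
                // e.g. traverse the mirror folder and add each Document to the index
            }
        });
        // Placeholder: replace the body with the real query call.
        long searchTime = timeMillis(new Runnable() {
            public void run() {
                // e.g. search("pkusz")
            }
        });
        System.out.println("index time (ms):  " + indexTime);
        System.out.println("search time (ms): " + searchTime);
        System.out.println("total time (ms):  " + (indexTime + searchTime));
    }
}
```

A single run measured this way includes JVM warm-up and disk-cache effects, so repeating each stage several times and reporting the average gives a steadier figure.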