Building an Index and Query System

Latest recommended article published on 2022-08-24 15:45:51

wuwei35531


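Copyright notice: This is an original article by the blogger, licensed under CC 4.0 BY-SA. Reposts must include a link to the original source and this notice.

Article link: https://blog.csdn.net/wuwei35531/article/details/7479270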


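The indexing stage is the same as in part 1: a depth-first traversal finds every file under the mirror folder, and each file is converted into a Document object with two fields, the file's path (path) and the file's content (contents).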

public Document getDocument(File file) {
    Document document = new Document();
    // Store the path as a single unanalyzed term so it can be returned with each hit.
    document.add(new Field("path", file.getPath(), Field.Store.YES,
            Field.Index.NOT_ANALYZED_NO_NORMS));

    StringBuilder temp = new StringBuilder();
    // try-with-resources closes the reader even if reading fails.
    try (BufferedReader br = new BufferedReader(new FileReader(file))) {
        String line;
        while ((line = br.readLine()) != null) {
            temp.append(line).append('\n');
        }
    } catch (IOException e) {
        // Also covers FileNotFoundException, which is a subclass of IOException.
        e.printStackTrace();
    }

    // temp is never null, so a failed read produces empty contents
    // instead of passing a null value to the Field constructor.
    document.add(new Field("contents", temp.toString(), Field.Store.YES, Field.Index.ANALYZED));
    return document;
}
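The depth-first traversal itself is not shown in the post. As a minimal sketch, assuming plain java.io.File and a hypothetical helper class MirrorWalker (neither the class nor the method names come from the original), it might look like this:

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the original post.
class MirrorWalker {

    // Collects every regular file under root, visiting subdirectories depth-first.
    static List<File> collectFiles(File root) {
        List<File> files = new ArrayList<File>();
        walk(root, files);
        return files;
    }

    private static void walk(File node, List<File> out) {
        if (node.isFile()) {
            out.add(node);
            return;
        }
        // listFiles() returns null when node is not a readable directory.
        File[] children = node.listFiles();
        if (children == null) {
            return;
        }
        for (File child : children) {
            // Descend fully into this child before moving on to its sibling.
            walk(child, out);
        }
    }

    public static void main(String[] args) {
        // "mirror" is the folder name used in the post; pass another path as an argument.
        File root = new File(args.length > 0 ? args[0] : "mirror");
        for (File f : collectFiles(root)) {
            System.out.println(f.getPath());
        }
    }
}
```

Each returned File could then be passed to getDocument and the resulting Document handed to an IndexWriter; the writer setup is not shown in the post, so that part is left out here.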

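In the query stage, the same analyzer as in the indexing stage is used to parse the query against the contents field. A search for a string str returns all matching documents, and the contents field of each is printed.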

public void search(String str) {
    // directory, searcher, analyzer, parser, query, hits, endSearchTime and file
    // (the index directory) are not declared here; they are assumed to be fields
    // of the enclosing class.
    try {
        directory = FSDirectory.open(file);
        searcher = new IndexSearcher(IndexReader.open(directory));
        // Same analyzer as at indexing time, so query terms match indexed terms.
        analyzer = new StandardAnalyzer(Version.LUCENE_35);
        parser = new QueryParser(Version.LUCENE_35, "contents", analyzer);
        query = parser.parse(str);
        // Retrieve at most the 10 highest-scoring documents.
        hits = searcher.search(query, 10);
        System.out.println("hits' totalHits:" + hits.totalHits);
        System.out.println("str: " + str);
        ScoreDoc[] sd = hits.scoreDocs;
        System.out.println("sd.length: " + sd.length);
        for (int i = 0; i < sd.length; i++) {
            Document document = searcher.doc(sd[i].doc);
            System.out.println("contents:" + document.get("contents"));
        }
        endSearchTime = System.currentTimeMillis();
    } catch (IOException e) {
        e.printStackTrace();
    } catch (org.apache.lucene.queryParser.ParseException e) {
        e.printStackTrace();
    } finally {
        try {
            // Note: closing a searcher built from an IndexReader does not close that reader.
            if (searcher != null) {
                searcher.close();
                searcher = null;
            }
            if (directory != null) {
                directory.close();
                directory = null;
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
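Why the query has to go through the same analysis as the indexed text can be illustrated without Lucene. The following toy inverted index is only a sketch: ToyIndex and its analyze method are hypothetical stand-ins, and the "analyzer" here merely lowercases and splits on non-alphanumeric characters (StandardAnalyzer also lowercases, among other steps).

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy illustration only; this is not how Lucene stores its index.
class ToyIndex {
    private final Map<String, Set<Integer>> postings = new HashMap<>();

    // Stand-in analyzer: lowercase, then split on runs of non-alphanumeric characters.
    static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Indexes a document under every term produced by the analyzer.
    void add(int docId, String contents) {
        for (String term : analyze(contents)) {
            Set<Integer> ids = postings.get(term);
            if (ids == null) {
                ids = new TreeSet<>();
                postings.put(term, ids);
            }
            ids.add(docId);
        }
    }

    // Exact term lookup: the term must already be in analyzed form to match.
    Set<Integer> lookup(String term) {
        Set<Integer> ids = postings.get(term);
        return ids == null ? Collections.<Integer>emptySet() : ids;
    }

    public static void main(String[] args) {
        ToyIndex index = new ToyIndex();
        index.add(0, "Welcome to PKUSZ!");
        String raw = "PKUSZ";
        System.out.println("unanalyzed query: " + index.lookup(raw));                 // []
        System.out.println("analyzed query:   " + index.lookup(analyze(raw).get(0))); // [0]
    }
}
```

The raw query term misses because the index only contains the lowercased form; running it through the same analysis first makes it match.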

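To evaluate the retrieval system's performance, running time is used as the metric: the time needed to build the index and the time needed to query are measured separately, along with their total. The web pages are indexed with each page's content added to its document as the contents field, and the index is then searched for pkusz.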
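The measurement code is not included in the post. One way to time the two stages separately and sum them is sketched below; StageTimer, buildIndex and search are hypothetical names, the two Runnables are empty placeholders for the real stages, and System.nanoTime is used here rather than the System.currentTimeMillis seen in the search method.

```java
// Hypothetical timing harness, not part of the original post.
class StageTimer {

    // Runs the task once and returns the elapsed wall-clock time in milliseconds.
    static long timeMillis(Runnable task) {
        long start = System.nanoTime();
        task.run();
        return (System.nanoTime() - start) / 1000000L;
    }

    public static void main(String[] args) {
        Runnable buildIndex = new Runnable() {
            public void run() {
                // Placeholder: traverse the mirror folder and add each Document to the index.
            }
        };
        Runnable search = new Runnable() {
            public void run() {
                // Placeholder: e.g. search("pkusz")
            }
        };

        long indexTime = timeMillis(buildIndex);
        long searchTime = timeMillis(search);

        System.out.println("index time (ms):  " + indexTime);
        System.out.println("search time (ms): " + searchTime);
        System.out.println("total time (ms):  " + (indexTime + searchTime));
    }
}
```

A single run like this is a rough figure; repeated runs would give a steadier number, since the first query also pays for opening the index.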
