A Small Example Based on the Lucene Full-Text Search Engine

a13272899370

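Copyright notice: This is an original article by the author, released under the CC 4.0 BY-SA license. When reposting, please include a link to the original source and this notice.

Article link: https://blog.csdn.net/a13272899370/article/details/6683031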

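Lucene is a Java-based full-text search engine library, and many e-commerce sites now use it for their search features. Working with Lucene comes down to two main tasks: building an index and searching that index.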

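Lucene consists of the following main parts:

- Document (document): a web page, a Word document, an article, a database record, and so on can each be treated as a Document object.
- Analyzer (analysis): the tokenizer, which splits the content of a document into terms according to some rule so that an index can be built on those terms. Well-known Chinese analyzers for Lucene include paoding, imdict, mmseg4j, and ik.
- Index (index): builds the index.
- Search (search): searches using the index.
- Query parser (queryParser): parses the query string, for example "Java – JavaME".
- Store (store): deals with where the index is kept; the generated index can live directly in memory or in a directory on disk.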
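To make the roles of these parts more concrete, here is a minimal sketch of an inverted index in plain Java. It is a toy illustration only and not how Lucene is implemented: the class name `TinyInvertedIndex` and all of its methods are hypothetical, and the "analyzer" simply lower-cases the text and splits on non-alphanumeric characters.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeSet;

// Toy illustration only: NOT Lucene's implementation or API.
class TinyInvertedIndex {
    // "store": the original documents, addressed by document id
    private final List<String> documents = new ArrayList<>();
    // "index": term -> sorted ids of the documents that contain it
    private final Map<String, TreeSet<Integer>> postings = new HashMap<>();

    // "analysis": lower-case the text and split on anything that is not a letter or digit
    static List<String> analyze(String text) {
        List<String> terms = new ArrayList<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!token.isEmpty()) {
                terms.add(token);
            }
        }
        return terms;
    }

    // "index": add one document and return its id
    int addDocument(String text) {
        int id = documents.size();
        documents.add(text);
        for (String term : analyze(text)) {
            postings.computeIfAbsent(term, k -> new TreeSet<>()).add(id);
        }
        return id;
    }

    // "search": ids of the documents that contain every term of the query
    List<Integer> search(String query) {
        TreeSet<Integer> result = null;
        for (String term : analyze(query)) {
            TreeSet<Integer> ids = postings.getOrDefault(term, new TreeSet<>());
            if (result == null) {
                result = new TreeSet<>(ids);
            } else {
                result.retainAll(ids);
            }
        }
        if (result == null) {
            return new ArrayList<>();
        }
        return new ArrayList<>(result);
    }

    String get(int id) {
        return documents.get(id);
    }

    public static void main(String[] args) {
        TinyInvertedIndex index = new TinyInvertedIndex();
        index.addDocument("Lucene is a full-text search library");
        index.addDocument("Java is a programming language");
        index.addDocument("Lucene is written in Java");
        for (int id : index.search("lucene java")) {
            System.out.println(id + ": " + index.get(id));
        }
    }
}
```

Running `main` prints `2: Lucene is written in Java`, because only the third document contains both terms. Note that the query goes through the same `analyze` step as the documents, which is why the lower-case query matches the capitalized text.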

Below is a small introductory example based on Lucene. This example does not support searching Chinese text.

First, build the index:

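import java.io.File;
import java.io.FileNotFoundException;
import java.io.FileReader;
import java.text.SimpleDateFormat;
import java.util.Collection;
import java.util.Date;

import org.apache.commons.io.FileUtils; // Apache Commons IO
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriter.MaxFieldLength;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.SimpleFSDirectory;
import org.apache.lucene.util.Version;

public void testBuildIndex() throws Exception {
    // Directory where the index files are stored
    File indexDir = new File("E:\\lucene\\buildIndex");
    // Directory containing the files to add to the index
    File sourceDir = new File("E:\\lucene\\LuceneDemoSrc\\src\\cn\\itcast\\lucene");
    // Open the index directory
    Directory dir = new SimpleFSDirectory(indexDir);
    // Choose the analyzer
    Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_30);
    /*
     * Create the IndexWriter.
     * Argument 1: where the index is stored
     * Argument 2: the analyzer
     * Argument 3: maximum number of terms indexed per field (LIMITED means 10,000)
     */
    IndexWriter indexWriter = new IndexWriter(dir, analyzer, MaxFieldLength.LIMITED);
    try {
        indexDocs(sourceDir, indexWriter);
        // Merge the index segments to speed up searching
        indexWriter.optimize();
    } finally {
        // Always close the IndexWriter so that changes are committed and the write lock is released
        indexWriter.close();
    }
}

private void indexDocs(File dir, IndexWriter indexWriter) throws Exception {
    /*
     * Find all .java files under dir.
     * Argument 1: directory to search
     * Argument 2: file extensions to match
     * Argument 3: true to search subdirectories as well
     */
    Collection<File> files = FileUtils.listFiles(dir, new String[] {"java"}, true);
    for (File file : files) {
        // Create one Document per file and add it to the index
        indexWriter.addDocument(createDocument(file));
    }
}

public Document createDocument(File f) throws FileNotFoundException {
    SimpleDateFormat simpleDateFormat = new SimpleDateFormat("yyyy-MM-dd");
    // Create the document object
    Document doc = new Document();
    /*
     * Create a field.
     * Argument 1: field name
     * Argument 2: field value
     * Argument 3: whether the original value is stored so it can be read back from search results
     * Argument 4: whether and how the field is indexed (ANALYZED runs the value through the analyzer)
     */
    Field pathField = new Field("path", f.getPath(), Field.Store.YES, Field.Index.ANALYZED);
    doc.add(pathField);
    /*
     * Index the file content. A Field built from a Reader is, by default:
     * 1. not stored in the index
     * 2. indexed (analyzed), so it can be searched
     */
    Field contentField = new Field("content", new FileReader(f));
    doc.add(contentField);
    // Index the last-modified time
    Date date = new Date(f.lastModified());
    Field lastUpdateField = new Field("lastUpdateTime", simpleDateFormat.format(date), Field.Store.YES, Field.Index.ANALYZED);
    doc.add(lastUpdateField);
    return doc;
}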
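The difference between storing a field and indexing it is easy to mix up, so here is a small plain-Java model of the idea. It is a sketch under stated assumptions, not Lucene code: `ToyFieldIndex`, its methods, and the sample path are all hypothetical. The two boolean flags stand in for the roles of `Field.Store` and `Field.Index` in the listing above.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Toy model only: the class and method names are hypothetical, not Lucene's API.
class ToyFieldIndex {
    // What "indexed" means here: "field:term" -> ids of the documents containing the term
    private final Map<String, List<Integer>> indexed = new HashMap<>();
    // What "stored" means here: per document, field name -> original value
    private final List<Map<String, String>> stored = new ArrayList<>();

    int newDocument() {
        stored.add(new HashMap<>());
        return stored.size() - 1;
    }

    void addField(int docId, String name, String value, boolean store, boolean index) {
        if (store) {
            stored.get(docId).put(name, value);
        }
        if (index) {
            // Crude analysis step: lower-case and split on non-word characters
            for (String term : value.toLowerCase(Locale.ROOT).split("\\W+")) {
                if (term.isEmpty()) {
                    continue;
                }
                List<Integer> ids = indexed.computeIfAbsent(name + ":" + term, k -> new ArrayList<>());
                if (!ids.contains(docId)) {
                    ids.add(docId);
                }
            }
        }
    }

    // Finds documents through the index, regardless of whether the field was stored
    List<Integer> search(String field, String term) {
        return indexed.getOrDefault(field + ":" + term.toLowerCase(Locale.ROOT), new ArrayList<>());
    }

    // Returns the stored value, or null when the field was not stored
    String get(int docId, String field) {
        return stored.get(docId).get(field);
    }

    public static void main(String[] args) {
        ToyFieldIndex index = new ToyFieldIndex();
        int doc = index.newDocument();
        // Stored and indexed, like the "path" field
        index.addField(doc, "path", "src/cn/itcast/lucene/Demo.java", true, true);
        // Indexed but not stored, like the "content" field built from a Reader
        index.addField(doc, "content", "public class Demo { }", false, true);
        System.out.println(index.search("content", "demo")); // the document is found
        System.out.println(index.get(doc, "path"));          // the stored value is returned
        System.out.println(index.get(doc, "content"));       // null: searchable, but not retrievable
    }
}
```

In this model the document can be found through its content, yet the content itself cannot be read back. The same applies to the `content` field above: in Lucene 3.0 a `Field` created from a `Reader` is indexed but not stored, so reading it from a search result yields `null`.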

Next, search the index and print the results:

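import java.io.File;
import java.io.IOException;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.queryParser.ParseException;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;

public void testSearchIndex() throws IOException, ParseException {
    File indexDir = new File("E:\\lucene\\buildIndex");
    // Choose the analyzer (the same one used when building the index)
    Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_30);
    // Open the directory that holds the index
    Directory dir = FSDirectory.open(indexDir);
    /*
     * IndexSearcher is Lucene's entry point for searching.
     * Argument 1: directory that holds the index
     * Argument 2: true opens the index read-only
     */
    IndexSearcher indexSearcher = new IndexSearcher(dir, true);
    try {
        /*
         * Build a parser for query strings.
         * Argument 1: Lucene version
         * Argument 2: default field to search
         * Argument 3: the analyzer
         */
        QueryParser parser = new QueryParser(Version.LUCENE_30, "path", analyzer);
        // The query string
        Query query = parser.parse("lucene");
        // Fetch at most the top 1000 hits
        ScoreDoc[] hits = indexSearcher.search(query, null, 1000).scoreDocs;
        System.out.println("Number of hits: [" + hits.length + "]");
        for (ScoreDoc scoreDoc : hits) {
            // Load the document by its id
            Document doc = indexSearcher.doc(scoreDoc.doc);
            System.out.println("Title: [" + doc.get("path") + "] last updated: [" + doc.get("lastUpdateTime") + "]");
            // "content" was indexed from a Reader and not stored, so this prints null
            System.out.println("Content: [" + doc.get("content") + "]");
        }
    } finally {
        // Release the resources held by the searcher
        indexSearcher.close();
    }
}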
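The query string passed to the query parser can be more than a single word. In Lucene's query syntax a term prefixed with `+` is required and a term prefixed with `-` is prohibited. The sketch below models just that behaviour in plain Java. It is a simplified illustration, not Lucene's parser: `ToyQuery` and its methods are hypothetical, and it ignores fields, phrases, scoring, and every other operator.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Set;

// Toy model only: hypothetical names, not Lucene's QueryParser.
class ToyQuery {
    final Set<String> optional = new LinkedHashSet<>();
    final Set<String> required = new LinkedHashSet<>();
    final Set<String> prohibited = new LinkedHashSet<>();

    // Splits the query on whitespace and sorts each clause by its prefix
    static ToyQuery parse(String queryString) {
        ToyQuery query = new ToyQuery();
        for (String clause : queryString.trim().toLowerCase(Locale.ROOT).split("\\s+")) {
            if (clause.isEmpty()) {
                continue;
            }
            if (clause.startsWith("+") && clause.length() > 1) {
                query.required.add(clause.substring(1));
            } else if (clause.startsWith("-") && clause.length() > 1) {
                query.prohibited.add(clause.substring(1));
            } else {
                query.optional.add(clause);
            }
        }
        return query;
    }

    // A document is represented by the set of terms it contains
    boolean matches(Set<String> docTerms) {
        for (String term : prohibited) {
            if (docTerms.contains(term)) {
                return false;
            }
        }
        for (String term : required) {
            if (!docTerms.contains(term)) {
                return false;
            }
        }
        if (!required.isEmpty()) {
            return true;
        }
        // Without required terms, at least one optional term must match
        for (String term : optional) {
            if (docTerms.contains(term)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        ToyQuery query = ToyQuery.parse("Java -JavaME");
        Set<String> doc1 = new HashSet<>(Arrays.asList("java", "lucene"));
        Set<String> doc2 = new HashSet<>(Arrays.asList("java", "javame"));
        System.out.println(query.matches(doc1)); // true
        System.out.println(query.matches(doc2)); // false
    }
}
```

A query made up only of prohibited terms matches nothing in this model, since there is no positive term to select documents with.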
