Lucene Study Notes 01: An Introduction to Using Lucene and Its Development Steps

豆包不在豆子在

Category: Lucene Study Notes. Tags: java, lucene

Article link: https://blog.csdn.net/Free_Dou/article/details/23126161

Using Lucene:
Every full-text indexing tool is made up of three parts:
1. Indexing
2. Analysis (tokenization)
3. Searching
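
To make the three parts concrete, here is a toy inverted index in plain Java with no Lucene dependency. It is only a sketch of the idea: the class name TinyIndex and its methods are invented for this illustration, and Lucene's real analyzers and index structures are far more elaborate.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeSet;

// Toy illustration only; this is not how Lucene stores its index.
class TinyIndex {
    // term -> sorted ids of the documents that contain the term
    private final Map<String, TreeSet<Integer>> postings = new HashMap<>();
    private int nextDocId = 0;

    // Analysis: lower-case the text and split on anything that is not a letter or digit.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Indexing: record every token of the text under a new document id.
    int add(String text) {
        int docId = nextDocId++;
        for (String token : tokenize(text)) {
            postings.computeIfAbsent(token, k -> new TreeSet<>()).add(docId);
        }
        return docId;
    }

    // Searching: look up one term and return the matching ids in ascending order.
    List<Integer> search(String term) {
        TreeSet<Integer> hits = postings.get(term.toLowerCase(Locale.ROOT));
        return hits == null ? new ArrayList<>() : new ArrayList<>(hits);
    }

    public static void main(String[] args) {
        TinyIndex index = new TinyIndex();
        index.add("I like football,I like you");
        index.add("I like basketball");
        index.add("Hello boy");
        System.out.println(index.search("like"));  // [0, 1]
        System.out.println(index.search("hello")); // [2]
    }
}
```

The same division of labour shows up in the Lucene code later in these notes: an analyzer produces terms, a writer records them, and a searcher looks them up.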

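The steps for developing with Lucene are listed below. It is fine if they do not all make sense at first; knowing the rough outline is enough. I include a small demo with the steps.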

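1. Create a Java project.
2. Add the required jars. The Lucene core jar is mandatory; add JUnit as well so that you can run unit tests.
3. Build the index:
a. Create a Directory
b. Create an IndexWriter
c. Create a Document object
d. Add Fields to the Document
e. Add the Document to the index through the IndexWriter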

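// Imports needed by the snippets in this post (Lucene 3.5 API)
import java.io.File;
import java.io.IOException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.HashMap;
import java.util.Map;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.Field.Index;
import org.apache.lucene.document.Field.Store;
import org.apache.lucene.document.NumericField;
import org.apache.lucene.index.CorruptIndexException;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.store.LockObtainFailedException;
import org.apache.lucene.util.Version;

// The members below belong to the class Index_Util.

// Sample data
	private String[] ids = { "1", "2", "3", "4", "5", "6" };
	private String[] emails = { "aa@itat.org", "bb@itat.org", "cc@cc.org",
			"dd@sina.org", "ee@zttc.edu.org", "ff@itat.org" };

	private String[] content = { "welecome to wisited the space I like dog",
			"Hello boy，do you like me", "my name is cc，I like music",
			"I like football,I like you", "I like basketball",
			"I like movie and swimming" };
	private int[] attachs = { 2, 3, 1, 4, 5, 5 };
	private String[] names = { "zhangsan", "lisi", "john", "jetty", "mike",
			"jake" };
	// The original snippet uses `dates` and `scores` without showing them; these two
	// declarations are placeholders. Fill `dates` with one Date per document and
	// `scores` with email-suffix boosts before calling index().
	private Date[] dates = new Date[ids.length];
	private Map<String, Float> scores = new HashMap<String, Float>();
	private Directory directory = null;

	public Index_Util() {
		try {
			// Open (or create) a file-system index at this path
			directory = FSDirectory.open(new File("F:/Lucene/index02"));
		} catch (IOException e) {
			e.printStackTrace();
		}
	}

	public void index() {
		Document doc = null;
		IndexWriterConfig iwc = new IndexWriterConfig(Version.LUCENE_35,
				new StandardAnalyzer(Version.LUCENE_35));
		IndexWriter writer = null;
		try {
			writer = new IndexWriter(directory, iwc);
			// Start from an empty index so repeated runs do not duplicate documents
			writer.deleteAll();
			// Add one document per sample record
			for (int i = 0; i < ids.length; i++) {
				doc = new Document();
				doc.add(new Field("ids", ids[i], Store.YES,
						Index.NOT_ANALYZED_NO_NORMS));
				doc.add(new Field("emails", emails[i], Store.YES,
						Index.NOT_ANALYZED));
				doc.add(new Field("content", content[i], Store.NO,
						Index.ANALYZED));
				doc.add(new Field("names", names[i], Store.YES,
						Index.NOT_ANALYZED_NO_NORMS));
				// Numbers and dates go into NumericField
				doc.add(new NumericField("attachs", Store.YES, false)
						.setIntValue(attachs[i]));
				doc.add(new NumericField("date", Store.YES, true)
						.setLongValue(dates[i].getTime()));

				// Take the part of the email after '@' and use it to boost the document
				String et = emails[i].substring(emails[i].lastIndexOf("@") + 1);
				// Use the configured boost if the map contains this suffix
				if (scores.containsKey(et)) {
					doc.setBoost(scores.get(et));
				} else {
					doc.setBoost(0.5f);
				}

				writer.addDocument(doc);
			}

		} catch (CorruptIndexException e) {
			e.printStackTrace();
		} catch (LockObtainFailedException e) {
			e.printStackTrace();
		} catch (IOException e) {
			e.printStackTrace();
		} finally {
			// Close the writer
			if (writer != null) {
				try {
					writer.close();
				} catch (CorruptIndexException e) {
					e.printStackTrace();
				} catch (IOException e) {
					e.printStackTrace();
				}
			}
		}
	}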
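
The boost step inside index() can be looked at in isolation. The sketch below is plain Java with no Lucene dependency. The class name BoostDemo, the method boostFor, and the suffix-to-boost values are all made up for illustration, because the post does not show what its scores map contains; only the 0.5f fallback comes from the code above.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: mirrors the suffix lookup in index(), nothing more.
class BoostDemo {
    // Fallback used in the post when a suffix has no entry in the map.
    static final float DEFAULT_BOOST = 0.5f;

    // Returns the boost for an email address based on the part after the last '@'.
    static float boostFor(String email, Map<String, Float> boosts) {
        String suffix = email.substring(email.lastIndexOf("@") + 1);
        Float boost = boosts.get(suffix);
        return boost != null ? boost : DEFAULT_BOOST;
    }

    public static void main(String[] args) {
        // Assumed values; the original post does not list them.
        Map<String, Float> boosts = new HashMap<>();
        boosts.put("itat.org", 2.0f);
        boosts.put("cc.org", 1.5f);

        System.out.println(boostFor("aa@itat.org", boosts)); // 2.0
        System.out.println(boostFor("dd@sina.org", boosts)); // 0.5
    }
}
```

In the Lucene code the returned value would be passed to doc.setBoost before the document is added.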

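Field parameters:

Field.Store.YES or NO (storage option):
YES stores the field's full content in the index files so that it can be restored later;
NO does not store the content. The field can still be indexed, but its original value cannot be recovered with doc.get;

Field.Index (indexing option):
Index.ANALYZED: tokenize and index the value; suitable for titles, body text, and the like;
Index.NOT_ANALYZED: index the value as a single term without tokenizing it;
Index.ANALYZED_NO_NORMS: tokenize but do not store norms. Norms hold index-time information such as boosts and field-length normalization;
Index.NOT_ANALYZED_NO_NORMS: neither tokenize nor store norms; suitable for exact-match values such as ID numbers or names, where boosting is meaningless;
Index.NO: do not index the field at all;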
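
The difference between the index options comes down to which terms end up in the index. The plain-Java sketch below models that with a made-up enum Mode (a stand-in for Field.Index reduced to three cases) and a crude tokenizer; it is not Lucene code, and StandardAnalyzer's real tokenization rules differ.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

// Toy model of the index options; not Lucene code.
class IndexOptionDemo {
    // Hypothetical stand-in for Field.Index, reduced to three cases.
    enum Mode { NO, ANALYZED, NOT_ANALYZED }

    // Returns the terms that would be searchable for one field value.
    static List<String> termsFor(String value, Mode mode) {
        switch (mode) {
            case NO:
                return Collections.emptyList();          // nothing is searchable
            case NOT_ANALYZED:
                return Collections.singletonList(value); // the whole value is one term
            default:
                // Crude analyzer: lower-case and split on non-alphanumeric characters.
                List<String> terms = new ArrayList<>();
                for (String t : value.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
                    if (!t.isEmpty()) {
                        terms.add(t);
                    }
                }
                return terms;
        }
    }

    public static void main(String[] args) {
        System.out.println(termsFor("I like basketball", Mode.ANALYZED));   // [i, like, basketball]
        System.out.println(termsFor("ee@zttc.edu.org", Mode.NOT_ANALYZED)); // [ee@zttc.edu.org]
        System.out.println(termsFor("ee@zttc.edu.org", Mode.NO));           // []
    }
}
```

This is why a term query against a NOT_ANALYZED field such as emails has to supply the complete value, while a query against the analyzed content field matches individual words.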

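Creating documents and adding them to the index can be understood like this:
a Document corresponds to one row in a table;
a Field corresponds to one column of that table;
create the Document first, then add Fields to it.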

Storing content:
You can use the commons-io jar to read a file through an input stream, convert it to a String, and then store or index it:
String content = FileUtils.readFileToString(file);
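
If you would rather not add commons-io, the JDK can do the same job. The sketch below is one possible equivalent, assuming the file is UTF-8 encoded; the class ReadFileDemo and its roundTrip helper are invented for this example. Naming the charset explicitly also avoids depending on the platform default, which the single-argument FileUtils call falls back to.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// JDK-only alternative to FileUtils.readFileToString; assumes UTF-8 content.
class ReadFileDemo {
    // Reads the whole file into a String, wrapping I/O failures in an unchecked exception.
    static String readFileToString(Path path) {
        try {
            return new String(Files.readAllBytes(path), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Demo helper: writes text to a temporary file, reads it back, then deletes the file.
    static String roundTrip(String text) {
        try {
            Path tmp = Files.createTempFile("lucene-demo", ".txt");
            try {
                Files.write(tmp, text.getBytes(StandardCharsets.UTF_8));
                return readFileToString(tmp);
            } finally {
                Files.deleteIfExists(tmp);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip("I like football")); // I like football
    }
}
```

The resulting String can then be passed to a Field constructor in the same way as the content array in the indexing code.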

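4. Implement the search:
a. Create a Directory
b. Create an IndexReader
c. Create an IndexSearcher from the IndexReader
d. Create the Query to search with
e. Run the search through the searcher, which returns TopDocs
f. Get the ScoreDoc objects from the TopDocs
g. Use the searcher and each ScoreDoc to fetch the actual Document
h. Read the values you need from the Document
i. Close the reader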

public void search() {
		IndexReader reader = null;
		try {
			reader = IndexReader.open(directory);

			IndexSearcher searcher = new IndexSearcher(reader);
			// Exact lookup of a single term with TermQuery
			TermQuery termQuery = new TermQuery(new Term("content", "like"));
			// Return at most the 10 best-scoring hits
			TopDocs hits = searcher.search(termQuery, 10);

			SimpleDateFormat dateFormat = new SimpleDateFormat("yyyy-MM-dd");
			for (ScoreDoc sd : hits.scoreDocs) {
				Document doc = searcher.doc(sd.doc);

				// The date was stored as epoch milliseconds; convert it back to a readable date
				String dt = doc.get("date");
				String dt2 = dateFormat.format(new Date(Long.parseLong(dt)));

				System.out.println("DocNo---->" + sd.doc + "    name---->"
						+ doc.get("names") + "    email----->"
						+ doc.get("emails") + "   attachs---->"
						+ doc.get("attachs") + "   date---->" + dt2);
			}

		} catch (CorruptIndexException e) {
			e.printStackTrace();
		} catch (IOException e) {
			e.printStackTrace();
		} finally {
			// Close the reader
			if (reader != null) {
				try {
					reader.close();
				} catch (CorruptIndexException e) {
					e.printStackTrace();
				} catch (IOException e) {
					e.printStackTrace();
				}
			}
		}

	}
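
The date handling in search() relies on the stored value coming back from doc.get as the decimal string of the epoch milliseconds. The plain-Java sketch below isolates that conversion. The class StoredDateDemo is invented for this example, and it pins the time zone to UTC and the locale to Locale.ROOT so the output does not depend on the machine, which the original code does not do.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

// Converts a stored epoch-millisecond string back to a yyyy-MM-dd date.
class StoredDateDemo {
    static String formatStoredDate(String storedMillis) {
        SimpleDateFormat format = new SimpleDateFormat("yyyy-MM-dd", Locale.ROOT);
        // Assumption for reproducible output; search() uses the default time zone.
        format.setTimeZone(TimeZone.getTimeZone("UTC"));
        return format.format(new Date(Long.parseLong(storedMillis)));
    }

    public static void main(String[] args) {
        System.out.println(formatStoredDate("0"));        // 1970-01-01
        System.out.println(formatStoredDate("86400000")); // 1970-01-02
    }
}
```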
