Lucene 3.6: Queries

Ricky_Fung

Tags: lucene, full-text search, query

Article link: https://blog.csdn.net/top_code/article/details/8543146

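1. BooleanQuery

In Lucene 3.6, BooleanQuery implements compound searches that combine clauses with AND/OR logic.
BooleanClause is the class that describes how each clause takes part in a boolean query. Its Occur values are BooleanClause.Occur.MUST (must match), BooleanClause.Occur.MUST_NOT (must not match) and BooleanClause.Occur.SHOULD (may match). Two clauses can be paired in the following 6 ways: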

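1. MUST with MUST: returns the intersection of the two clauses' results.
2. MUST with MUST_NOT: returns the results of the MUST clause, excluding any document that matches the MUST_NOT clause.
3. SHOULD with MUST_NOT: behaves the same as MUST with MUST_NOT.
4. SHOULD with MUST: returns the results of the MUST clause; the SHOULD clause does not filter, but it can affect ranking.
5. SHOULD with SHOULD: an OR relationship; the final result is the union of all the clauses' results.
6. MUST_NOT with MUST_NOT: meaningless; the search returns no results.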
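
The six combinations can be pictured as plain set operations over document ids. The sketch below is only a model of the rules listed here, not Lucene code: the class OccurSetModel and its helpers ids, clauses and combine are hypothetical names introduced for illustration, and scoring (the way SHOULD influences ranking) is left out entirely.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

/** Set-based model of BooleanClause.Occur semantics (illustration only, not Lucene code). */
class OccurSetModel {

    /** Builds a sorted set of document ids. */
    static Set<Integer> ids(Integer... values) {
        return new TreeSet<>(Arrays.asList(values));
    }

    /** Groups the match sets of all clauses that share one Occur value. */
    @SafeVarargs
    static List<Set<Integer>> clauses(Set<Integer>... sets) {
        return Arrays.asList(sets);
    }

    /** Combines per-clause match sets according to the six cases above. */
    static Set<Integer> combine(List<Set<Integer>> must,
                                List<Set<Integer>> should,
                                List<Set<Integer>> mustNot) {
        Set<Integer> result = new TreeSet<>();
        if (!must.isEmpty()) {
            // MUST clauses: intersect all of their match sets.
            result.addAll(must.get(0));
            for (Set<Integer> s : must) {
                result.retainAll(s);
            }
            // SHOULD clauses add nothing here; in Lucene they only influence ranking.
        } else {
            // No MUST clause: a document needs at least one SHOULD match (union).
            for (Set<Integer> s : should) {
                result.addAll(s);
            }
        }
        // MUST_NOT clauses only remove documents; on their own they select nothing.
        for (Set<Integer> s : mustNot) {
            result.removeAll(s);
        }
        return result;
    }

    public static void main(String[] args) {
        Set<Integer> a = ids(1, 2, 3);
        Set<Integer> b = ids(2, 3, 4);
        System.out.println(combine(clauses(a, b), clauses(), clauses())); // MUST + MUST: [2, 3]
        System.out.println(combine(clauses(a), clauses(), clauses(b)));   // MUST + MUST_NOT: [1]
        System.out.println(combine(clauses(), clauses(a), clauses(b)));   // SHOULD + MUST_NOT: [1]
        System.out.println(combine(clauses(a), clauses(b), clauses()));   // MUST + SHOULD: [1, 2, 3]
        System.out.println(combine(clauses(), clauses(a, b), clauses())); // SHOULD + SHOULD: [1, 2, 3, 4]
        System.out.println(combine(clauses(), clauses(), clauses(a, b))); // MUST_NOT only: []
    }
}
```

The last line is the reason case 6 is described as meaningless: with nothing to select from, excluding documents leaves an empty result.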

Sample code

public static void query(String path,String keyword,int size){
		
		try {
			File file = new File(path);
			Directory mdDirectory = FSDirectory.open(file);
			Analyzer analyzer = new IKAnalyzer();
//			Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_36);

			IndexReader reader = IndexReader.open(mdDirectory);

			IndexSearcher searcher = new IndexSearcher(reader);

			String[] fieldName = { "title", "category" }; 	// search across multiple fields
			QueryParser queryParser = new MultiFieldQueryParser(
					Version.LUCENE_36, fieldName, analyzer);
			Query q1 = queryParser.parse(keyword);
			
			QueryParser parser = new QueryParser(Version.LUCENE_36, "author", analyzer);
			Query q2 = parser.parse("周伟明");
			
			BooleanQuery boolQuery = new BooleanQuery();
			boolQuery.add(q1, BooleanClause.Occur.MUST);
			boolQuery.add(q2,BooleanClause.Occur.MUST);

			ScoreDoc[] docs = searcher.search(boolQuery, null, size).scoreDocs;

			for (int i = 0; docs != null && i < docs.length; i++) {
				
				Document doc = searcher.doc(docs[i].doc);
				
				int id = Integer.parseInt(doc.get("id"));
				String title = doc.get("title");
				String author = doc.get("author");
				String publishTime = doc.get("publishTime");
				String source = doc.get("source");
				String category = doc.get("category");
				float reputation = Float.parseFloat(doc.get("reputation"));

				Book book = new Book();

				book.setId(id);
				book.setTitle(title);
				book.setAuthor(author);
				book.setPublishTime(publishTime);
				book.setSource(source);
				book.setCategory(category);
				book.setReputation(reputation);
				
				System.out.println(book);
			}

			reader.close();
			searcher.close();
			
		} catch (CorruptIndexException e) {
			e.printStackTrace();
		} catch (IOException e) {
			e.printStackTrace();
		} catch (ParseException e) {
			e.printStackTrace();
		}
		
	}

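2. TermQuery

Term query: given a specific term, it retrieves every document in the index that contains that term. The term is not run through an analyzer, so it has to match the indexed token exactly.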

@Test
	public void testTermQuery(){
		try {
			String path = "D://LuceneEx/day02";
			String keyword = "android";
			File file = new File(path);
			Directory mdDirectory = FSDirectory.open(file);

			IndexReader reader = IndexReader.open(mdDirectory);

			IndexSearcher searcher = new IndexSearcher(reader);

			TermQuery query = new TermQuery(new Term("title", keyword));

			TopDocs tops = searcher.search(query, null, 50);

			int count = tops.totalHits;

			System.out.println("totalHits=" + count);

			ScoreDoc[] docs = tops.scoreDocs;

			for (int i = 0; i < docs.length; i++) {
				
				Document doc = searcher.doc(docs[i].doc);

				float score = docs[i].score;
				
				int id = Integer.parseInt(doc.get("id"));
				String title = doc.get("title");
				String author = doc.get("author");
				String publishTime = doc.get("publishTime");
				String source = doc.get("source");
				String category = doc.get("category");
				float reputation = Float.parseFloat(doc.get("reputation"));

				System.out.println(id + "\t" + title + "\t" + author + "\t"
						+ publishTime + "\t" + source + "\t" + category + "\t"
						+ reputation+"\t"+score);
			}

			reader.close();
			searcher.close();

		} catch (CorruptIndexException e) {
			e.printStackTrace();
		} catch (IOException e) {
			e.printStackTrace();
		}
	}

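3. TermRangeQuery

Range query. The range can cover dates, times, numbers, sizes and so on, but terms are compared as strings, so the values must be indexed in a form that sorts correctly as text. In a query expression, use "content:[a TO b]" (bounds included) or "content:{a TO b}" (bounds excluded).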
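
Because TermRangeQuery compares terms as strings when no Collator is supplied, it helps to see what that ordering accepts and rejects. The following is a minimal sketch using String.compareTo, not Lucene code; the class TermRangeModel and the method inRange are hypothetical names, and the two boolean flags mirror the includeLower and includeUpper constructor arguments.

```java
/** Models the string comparison behind a term range (illustration only, not Lucene code). */
class TermRangeModel {

    /** Returns true if term lies between lower and upper in string order. */
    static boolean inRange(String term, String lower, String upper,
                           boolean includeLower, boolean includeUpper) {
        int lo = term.compareTo(lower);
        int hi = term.compareTo(upper);
        boolean aboveLower = includeLower ? lo >= 0 : lo > 0;
        boolean belowUpper = includeUpper ? hi <= 0 : hi < 0;
        return aboveLower && belowUpper;
    }

    public static void main(String[] args) {
        // Year-month strings sort correctly as text.
        System.out.println(inRange("2011-05", "2011-04", "2011-07", false, true));    // true
        System.out.println(inRange("2011-04", "2011-04", "2011-07", false, true));    // false: lower bound excluded
        System.out.println(inRange("2011-07", "2011-04", "2011-07", false, true));    // true: upper bound included
        // A longer value sorts after its own prefix, so a full date falls outside.
        System.out.println(inRange("2011-07-15", "2011-04", "2011-07", false, true)); // false
        // Unpadded numbers do not sort numerically.
        System.out.println(inRange("9", "10", "20", true, true));                     // false
        System.out.println(inRange("100", "10", "20", true, true));                   // true
    }
}
```

The last two lines show why numeric values are usually zero-padded to a fixed width before indexing, or handled with NumericRangeQuery instead.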

Sample code

@Test
	public void testTermRangeQuery(){
		
		try {
			String path = "D://LuceneEx/day01";
			File file = new File(path);
			Directory mdDirectory = FSDirectory.open(file);
			
			IndexReader reader = IndexReader.open(mdDirectory);

			IndexSearcher searcher = new IndexSearcher(reader);

			String fieldName = "publishTime"; 	
			// Find books whose publish date is after "2011-04" (exclusive) and up to "2011-07" (inclusive)
			TermRangeQuery tq = new TermRangeQuery(fieldName, "2011-04", "2011-07", false, true);

			TopDocs tops = searcher.search(tq, null, 10);
			int count = tops.totalHits;
			
			System.out.println("totalHits="+count);
			
			ScoreDoc[] docs = tops.scoreDocs;
			
			for(int i=0;i<docs.length;i++){
				Document doc = searcher.doc(docs[i].doc);
				
				int id = Integer.parseInt(doc.get("id"));
				String title = doc.get("title");
				String author = doc.get("author");
				String publishTime = doc.get("publishTime");
				String source = doc.get("source");
				String category = doc.get("category");
				float reputation = Float.parseFloat(doc.get("reputation"));
				
				System.out.println(id+" "+title+" "+author+" "+publishTime+" "+source);
			}
			
			reader.close();
			searcher.close();
			
		} catch (CorruptIndexException e) {
			e.printStackTrace();
		} catch (IOException e) {
			e.printStackTrace();
		}
	}

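4. PrefixQuery

Finds documents that contain a term beginning with the specified string. When a term in a query expression ends with "*", QueryParser's parse method creates a PrefixQuery object for that term.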
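
Lucene keeps the terms of a field in sorted order, which is what makes a prefix lookup cheap: all terms sharing a prefix sit next to each other. The sketch below models that idea with a TreeSet; it is not Lucene code, and the class PrefixScanModel, the method termsWithPrefix and the sample terms are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;

/** Models a prefix scan over a sorted term list (illustration only, not Lucene code). */
class PrefixScanModel {

    /** Collects every term that starts with prefix, stopping at the first one that does not. */
    static List<String> termsWithPrefix(SortedSet<String> terms, String prefix) {
        List<String> hits = new ArrayList<>();
        // tailSet jumps straight to the first term >= prefix.
        for (String term : terms.tailSet(prefix)) {
            if (!term.startsWith(prefix)) {
                break; // sorted order: no later term can match either
            }
            hits.add(term);
        }
        return hits;
    }

    public static void main(String[] args) {
        SortedSet<String> terms = new TreeSet<>(Arrays.asList(
                "peking", "tsing", "tsinghua", "tsinghua-press", "tsukuba", "zhejiang"));
        System.out.println(termsWithPrefix(terms, "tsinghua")); // [tsinghua, tsinghua-press]
    }
}
```

Only the matching terms and one extra term are visited, no matter how large the term list is.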

Sample code

@Test
	public void testPrefixQuery(){
		
		try {
			String path = "D://LuceneEx/day01";
			File file = new File(path);
			Directory mdDirectory = FSDirectory.open(file);
			
			IndexReader reader = IndexReader.open(mdDirectory);

			IndexSearcher searcher = new IndexSearcher(reader);

			String fieldName = "source"; 	
			Term prefix = new Term(fieldName, "清华大学");
			PrefixQuery preq = new PrefixQuery(prefix );

			TopDocs tops = searcher.search(preq, null, 10);
			int count = tops.totalHits;
			
			System.out.println("totalHits="+count);
			
			ScoreDoc[] docs = tops.scoreDocs;
			
			for(int i=0;i<docs.length;i++){
				Document doc = searcher.doc(docs[i].doc);
				
				int id = Integer.parseInt(doc.get("id"));
				String title = doc.get("title");
				String author = doc.get("author");
				String publishTime = doc.get("publishTime");
				String source = doc.get("source");
				String category = doc.get("category");
				float reputation = Float.parseFloat(doc.get("reputation"));
				
				System.out.println(id+" "+title+" "+author+" "+publishTime+" "+source);
			}
			
			reader.close();
			searcher.close();
			
		} catch (CorruptIndexException e) {
			e.printStackTrace();
		} catch (IOException e) {
			e.printStackTrace();
		}
	}

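5. PhraseQuery

Phrase query. By default it requires an exact match, but a slop value (default 0) can be set to loosen the match. For example, with slop = 1 and the phrase "电台" (radio station), text with one extra character between "电" and "台" is also found, such as "电视台" (TV station), provided the analyzer indexes each character as a separate term. The corresponding query expression puts the phrase in quotes followed by ~1: "电台"~1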
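
Slop can be read as the number of position moves needed to line the terms up as the phrase. The sketch below is a simplified model for two-term phrases only, not Lucene's matcher; the class SlopModel and the method minSlop are hypothetical names, and English tokens stand in for the single-character terms of the example above ("quick brown fox" plays the role of "电视台").

```java
import java.util.Arrays;
import java.util.List;

/** Simplified two-term model of phrase slop (illustration only, not Lucene code). */
class SlopModel {

    /**
     * Returns the smallest slop at which the phrase "first second" matches the tokens,
     * or -1 if the two terms do not both occur.
     */
    static int minSlop(List<String> tokens, String first, String second) {
        int best = -1;
        for (int p = 0; p < tokens.size(); p++) {
            if (!tokens.get(p).equals(first)) {
                continue;
            }
            for (int q = 0; q < tokens.size(); q++) {
                if (q == p || !tokens.get(q).equals(second)) {
                    continue;
                }
                // In an exact match, second sits at position p + 1.
                // The distance from that ideal layout is the slop required.
                int moves = Math.abs(p - (q - 1));
                if (best < 0 || moves < best) {
                    best = moves;
                }
            }
        }
        return best;
    }

    public static void main(String[] args) {
        System.out.println(minSlop(Arrays.asList("quick", "fox"), "quick", "fox"));          // 0: exact phrase
        System.out.println(minSlop(Arrays.asList("quick", "brown", "fox"), "quick", "fox")); // 1: one term in between
        System.out.println(minSlop(Arrays.asList("fox", "quick"), "quick", "fox"));          // 2: reversed order
    }
}
```

A phrase query with slop n would match a document under this model whenever minSlop returns a value between 0 and n.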

Sample code

@Test
	public void testPhraseQuery(){
		
		try {
			String path = "D://LuceneEx/day01";
			File file = new File(path);
			Directory mdDirectory = FSDirectory.open(file);
			
			IndexReader reader = IndexReader.open(mdDirectory);

			IndexSearcher searcher = new IndexSearcher(reader);

			String fieldName = "title"; 	
			PhraseQuery query = new PhraseQuery();
		    query.add(new Term(fieldName,"Lucene"));
		    query.add(new Term(fieldName,"入门"));
//		    query.setSlop(1);

			TopDocs tops = searcher.search(query, null, 50);
			int count = tops.totalHits;
			
			System.out.println("totalHits="+count);
			
			ScoreDoc[] docs = tops.scoreDocs;
			
			for(int i=0;i<docs.length;i++){
				Document doc = searcher.doc(docs[i].doc);
				
				int id = Integer.parseInt(doc.get("id"));
				String title = doc.get("title");
				String author = doc.get("author");
				String publishTime = doc.get("publishTime");
				String source = doc.get("source");
				String category = doc.get("category");
				float reputation = Float.parseFloat(doc.get("reputation"));
				
				System.out.println(id+" "+title+" "+author+" "+publishTime+" "+source);
			}
			
			reader.close();
			searcher.close();
			
		} catch (CorruptIndexException e) {
			e.printStackTrace();
		} catch (IOException e) {
			e.printStackTrace();
		}
	}

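6. FuzzyQuery

Fuzzy queries match terms using the Levenshtein (edit distance) algorithm. When comparing two strings, the algorithm counts three kinds of operations: adding a character (Insert), removing a character (Delete) and changing a character (Substitute). The edit distance affects the score of a result: the smaller the edit distance, the higher the score. The query expression is "fuzzy~", where the trailing ~ marks a fuzzy query.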
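
To make the three operations concrete, here is a minimal sketch of the classic dynamic-programming computation of Levenshtein distance. It is a textbook version written for illustration, not the code Lucene uses internally; the class EditDistance and the method levenshtein are hypothetical names.

```java
/** Textbook Levenshtein distance (illustration only, not Lucene's implementation). */
class EditDistance {

    /** Returns the minimum number of inserts, deletes and substitutions turning a into b. */
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1]; // distances for the previous row
        int[] curr = new int[b.length() + 1]; // distances for the current row
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // building b's prefix from an empty string takes j inserts
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // reducing a's prefix to an empty string takes i deletes
            for (int j = 1; j <= b.length(); j++) {
                int substitute = prev[j - 1] + (a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1);
                int insert = curr[j - 1] + 1;
                int delete = prev[j] + 1;
                curr[j] = Math.min(substitute, Math.min(insert, delete));
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    public static void main(String[] args) {
        System.out.println(levenshtein("fuzzy", "wuzzy"));    // 1: one substitution
        System.out.println(levenshtein("lucene", "lucen"));   // 1: one delete
        System.out.println(levenshtein("kitten", "sitting")); // 3
    }
}
```

Terms at distance 1 from the query term would rank above terms at distance 3, in line with the scoring rule described above.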

Sample code

@Test
	public void testFuzzyQuery(){
		
		try {
			String path = "D://LuceneEx/day01";
			File file = new File(path);
			Directory mdDirectory = FSDirectory.open(file);
			
			IndexReader reader = IndexReader.open(mdDirectory);

			IndexSearcher searcher = new IndexSearcher(reader);

			String fieldName = "category";
			Term term = new Term(fieldName, "云计算");
			FuzzyQuery query = new FuzzyQuery(term, 0.1f);
//			FuzzyQuery query = new FuzzyQuery(term, 0.1f,1);

			TopDocs tops = searcher.search(query, null, 50);
			int count = tops.totalHits;
			
			System.out.println("totalHits="+count);
			
			ScoreDoc[] docs = tops.scoreDocs;
			
			for(int i=0;i<docs.length;i++){
				Document doc = searcher.doc(docs[i].doc);
				
				int id = Integer.parseInt(doc.get("id"));
				String title = doc.get("title");
				String author = doc.get("author");
				String publishTime = doc.get("publishTime");
				String source = doc.get("source");
				String category = doc.get("category");
				float reputation = Float.parseFloat(doc.get("reputation"));
				
				System.out.println(id+" "+title+" "+author+" "+publishTime+" "+source+" "+category);
			}
			
			reader.close();
			searcher.close();
			
		} catch (CorruptIndexException e) {
			e.printStackTrace();
		} catch (IOException e) {
			e.printStackTrace();
		}
	}

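7. WildcardQuery

Wildcard query. "*" stands for zero or more characters and "?" stands for exactly one character. Avoid starting the pattern with a wildcard, because the query then has to scan every term in the index.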
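
The matching rule for the two wildcards can be sketched in a few lines. This is a generic matcher written for illustration, not Lucene's implementation; the class WildcardModel and the method matches are hypothetical names.

```java
/** Generic "*" and "?" matcher (illustration only, not Lucene's implementation). */
class WildcardModel {

    /** Returns true if the whole text matches the pattern. */
    static boolean matches(String pattern, String text) {
        int p = 0;      // current position in the pattern
        int t = 0;      // current position in the text
        int starP = -1; // position of the most recent '*' in the pattern
        int starT = -1; // text position where that '*' started matching
        while (t < text.length()) {
            if (p < pattern.length() && pattern.charAt(p) == '*') {
                // Remember the star and first try matching it against nothing.
                starP = p++;
                starT = t;
            } else if (p < pattern.length()
                    && (pattern.charAt(p) == '?' || pattern.charAt(p) == text.charAt(t))) {
                p++;
                t++;
            } else if (starP >= 0) {
                // Mismatch: let the last star absorb one more character and retry.
                p = starP + 1;
                t = ++starT;
            } else {
                return false;
            }
        }
        // Any pattern characters left over must all be stars.
        while (p < pattern.length() && pattern.charAt(p) == '*') {
            p++;
        }
        return p == pattern.length();
    }

    public static void main(String[] args) {
        System.out.println(matches("lucene*", "lucene"));   // true: "*" matches zero characters
        System.out.println(matches("lucene*", "lucene36")); // true
        System.out.println(matches("luc?ne", "lucene"));    // true
        System.out.println(matches("luc?ne", "lucne"));     // false: "?" needs exactly one character
        System.out.println(matches("*ene", "lucene"));      // true, but no fixed prefix to seek to
    }
}
```

The last pattern shows the cost of a leading wildcard: with no fixed prefix, every term has to be tested one by one.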

@Test
	public void testWildcardQuery(){
		
		try {
			String path = "D://LuceneEx/day01";
			File file = new File(path);
			Directory mdDirectory = FSDirectory.open(file);
			
			IndexReader reader = IndexReader.open(mdDirectory);

			IndexSearcher searcher = new IndexSearcher(reader);

			String fieldName = "title";
			
			Term term = new Term(fieldName, "lucene*");
			
			WildcardQuery query = new WildcardQuery(term);

			TopDocs tops = searcher.search(query, null, 100);
			int count = tops.totalHits;
			
			System.out.println("totalHits="+count);
			
			ScoreDoc[] docs = tops.scoreDocs;
			
			for(int i=0;i<docs.length;i++){
				Document doc = searcher.doc(docs[i].doc);
				
				int id = Integer.parseInt(doc.get("id"));
				String title = doc.get("title");
				String author = doc.get("author");
				String publishTime = doc.get("publishTime");
				String source = doc.get("source");
				String category = doc.get("category");
				float reputation = Float.parseFloat(doc.get("reputation"));
				
				System.out.println(id+" "+title+" "+author+" "+publishTime+" "+source+" "+category);
			}
			
			reader.close();
			searcher.close();
			
		} catch (CorruptIndexException e) {
			e.printStackTrace();
		} catch (IOException e) {
			e.printStackTrace();
		}
	}

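8. SpanQuery

SpanQuery: span query. This is an abstract class.
SpanTermQuery: returns exactly the same results as TermQuery, but also records position information for the other SpanQuery APIs to use. It is the building block for the other SpanQuery subclasses.
SpanFirstQuery: looks for the specified term within a fixed width, measured from the start of the field's content.
SpanNearQuery: similar to PhraseQuery, except that what it matches need not be a phrase. Its operands can be the results of other SpanQuery instances, each treated as a unit, which allows nested queries.
SpanOrQuery: merges the results of all the given SpanQuery instances into one result.
SpanNotQuery: takes the results of the first SpanQuery and removes those that overlap a result of the second SpanQuery.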

Sample code

@Test
	public void testSpanQuery(){
		
		try {
			String path = "D://LuceneEx/day01";
			File file = new File(path);
			Directory mdDirectory = FSDirectory.open(file);
			
			IndexReader reader = IndexReader.open(mdDirectory);

			IndexSearcher searcher = new IndexSearcher(reader);

			String fieldName = "title";
			
			Term t1=new Term(fieldName,"权威");
            Term t2=new Term(fieldName,"lucene");
            Term t3=new Term(fieldName,"搜索");
            Term t4=new Term(fieldName,"出版社");
            
            SpanTermQuery q1=new SpanTermQuery(t1);
            SpanTermQuery q2=new SpanTermQuery(t2);
            SpanTermQuery q3=new SpanTermQuery(t3);
            SpanTermQuery q4=new SpanTermQuery(t4);
            
            SpanNearQuery query1=new SpanNearQuery(new SpanQuery[]{q1,q2},1,false);
            SpanNearQuery query2=new SpanNearQuery(new SpanQuery[]{q3,q4},3,false);
            SpanNotQuery query = new SpanNotQuery(query1, query2);

//            Term t =new Term("content","mary");
//            SpanTermQuery people = new SpanTermQuery(t);
//            SpanFirstQuery query = new SpanFirstQuery(people,3); // 3 is the width, i.e. the match must end within the first 3 positions
            
			TopDocs tops = searcher.search(query, null, 100);
			int count = tops.totalHits;
			
			System.out.println("totalHits="+count);
			
			ScoreDoc[] docs = tops.scoreDocs;
			
			for(int i=0;i<docs.length;i++){
				Document doc = searcher.doc(docs[i].doc);
				
				int id = Integer.parseInt(doc.get("id"));
				String title = doc.get("title");
				String author = doc.get("author");
				String publishTime = doc.get("publishTime");
				String source = doc.get("source");
				String category = doc.get("category");
				float reputation = Float.parseFloat(doc.get("reputation"));
				
				System.out.println(id+" "+title+" "+author+" "+publishTime+" "+source+" "+category);
			}
			
			reader.close();
			searcher.close();
			
		} catch (CorruptIndexException e) {
			e.printStackTrace();
		} catch (IOException e) {
			e.printStackTrace();
		}
	}
