Lucene Operations Q&A

1. Indexing with the IK analyzer

Lucene works with the IK tokenizer out of the box: create an IKAnalyzer and pass it to the IndexWriterConfig.

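2. Inspecting the index with Luke

Download the Luke release that matches your Lucene version, place it in the index data directory (or point Luke at that directory when it asks for the index path), and run the jar to browse the index.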

 

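3. Deleting and updating the index

Starting with Lucene 4, the delete methods live on IndexWriter (for example deleteDocuments); see the API documentation for details. An update is normally done as a delete followed by a re-add, which is what IndexWriter.updateDocument does internally.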

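4. Optimizing the index (limiting the number of index files generated)

Use indexWriter.forceMerge(int maxNumSegments), which merges the index down to at most the given number of segments.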

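5. Highlighting

5.1 Highlighting requires an additional jar (the lucene-highlighter module), and its API documentation is published in a separate package from the core.

5.2 Look at the constructor:

Highlighter(Formatter formatter, Scorer fragmentScorer)

This shows that a Highlighter needs a Formatter and a Scorer. For the Formatter you can use SimpleHTMLFormatter, and for the Scorer you can use QueryScorer(Query query) or the simpler QueryTermScorer(Query query).

Example: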

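import java.io.File;

import org.apache.lucene.document.Document;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.search.highlight.Formatter;
import org.apache.lucene.search.highlight.Highlighter;
import org.apache.lucene.search.highlight.QueryTermScorer;
import org.apache.lucene.search.highlight.SimpleHTMLFormatter;
import org.apache.lucene.store.FSDirectory;
import org.wltea.analyzer.lucene.IKAnalyzer; // third-party IK Analyzer jar

public class HighlightSearchDemo {

    // Written against the Lucene 4.x API (FSDirectory.open(File), int-valued totalHits).
    public static void highlightSearch(String indexPath) throws Exception {
        File indexFile = new File(indexPath);
        IndexReader ir = DirectoryReader.open(FSDirectory.open(indexFile));
        try {
            IndexSearcher is = new IndexSearcher(ir);
            IKAnalyzer ikAnalyzer = new IKAnalyzer(true);
            Query query = new TermQuery(new Term("content", "圣诞节"));

            // Highlighter setup: wrap every matched term in <B>...</B>
            Formatter formatter = new SimpleHTMLFormatter("<B>", "</B>");
            QueryTermScorer scorer = new QueryTermScorer(query);
            Highlighter highlighter = new Highlighter(formatter, scorer);

            // search() returns only basic info (doc ID and score) for the top n hits;
            // use is.doc() afterwards to load the stored document by its ID.
            TopDocs topdocs = is.search(query, 10);
            ScoreDoc[] docs = topdocs.scoreDocs;
            int hits = topdocs.totalHits;
            System.out.println("total:" + hits);
            for (ScoreDoc doc : docs) {
                int docID = doc.doc;
                // Load the stored fields of this hit
                Document document = is.doc(docID);

                // Post-process the stored values: getBestFragment returns the best
                // highlighted fragment, or null when the text contains no query term.
                String title = highlighter.getBestFragment(ikAnalyzer, "title", document.get("title"));
                String content = highlighter.getBestFragment(ikAnalyzer, "content", document.get("content"));
                // Fall back to the raw stored value when nothing was highlighted
                if (title == null) title = document.get("title");
                if (content == null) content = document.get("content");
                System.out.println(document.get("id"));
                System.out.println(title);
                System.out.println(content);
            }
        } finally {
            ir.close(); // release the index files
        }
    }
}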


6. Query types explained

1 TermQuery

A Query that matches documents containing a term. This may be combined with other terms with a BooleanQuery.

2 BooleanQuery

A Query that matches documents matching boolean combinations of other queries, e.g. TermQuerys, PhraseQuerys or other BooleanQuerys.

 

3 WildcardQuery

public class WildcardQuery extends AutomatonQuery

Implements the wildcard search query. Supported wildcards are *, which matches any character sequence (including the empty one), and ?, which matches any single character. '\' is the escape character.

Note this query can be slow, as it needs to iterate over many terms. In order to prevent extremely slow WildcardQueries, a Wildcard term should not start with the wildcard *
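
The wildcard rules quoted above can be sketched in plain Java by translating a pattern into a java.util.regex pattern. This is only an illustration of the matching semantics for a single term; it is not how Lucene implements WildcardQuery (which builds an automaton over the term dictionary), and the class and method names are made up for this example.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of Lucene: illustrates the wildcard semantics only.
final class WildcardSketch {

    /** Translates a wildcard pattern (*, ?, \ as escape) into a regex pattern. */
    static Pattern toRegex(String wildcard) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < wildcard.length(); i++) {
            char c = wildcard.charAt(i);
            if (c == '*') {
                sb.append(".*");                 // any character sequence, including empty
            } else if (c == '?') {
                sb.append('.');                  // exactly one character
            } else if (c == '\\' && i + 1 < wildcard.length()) {
                // escaped character: take the next character literally
                sb.append(Pattern.quote(String.valueOf(wildcard.charAt(++i))));
            } else {
                sb.append(Pattern.quote(String.valueOf(c)));
            }
        }
        return Pattern.compile(sb.toString(), Pattern.DOTALL);
    }

    /** Returns true if the whole term matches the wildcard pattern. */
    static boolean matches(String wildcard, String term) {
        return toRegex(wildcard).matcher(term).matches();
    }

    public static void main(String[] args) {
        System.out.println(matches("luc*", "lucene"));  // true
        System.out.println(matches("luc*", "luc"));     // true: * may match nothing
        System.out.println(matches("te?t", "tet"));     // false: ? needs one character
        System.out.println(matches("a\\*b", "a*b"));    // true: escaped * is literal
    }
}
```

A pattern such as "*ene" gives no fixed prefix to narrow the candidates, so every term has to be examined, which is why the note above advises against a leading *.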

 

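4 PhraseQuery

A Query that matches documents containing a particular sequence of terms. A PhraseQuery is built by QueryParser for input like "new york".

PhraseQuery differs from TermQuery in that terms are added with PhraseQuery.add(Term term, int position), which is documented as:

public void add(Term term, int position)

Adds a term to the end of the query phrase. The relative position of the term within the phrase is specified explicitly. This allows e.g. phrases with more than one term at the same position or phrases with gaps (e.g. in connection with stopwords).

In other words, the position says where each term sits inside the phrase, so gaps can be left between terms.

setSlop(int slop) sets how many position moves are tolerated between the query terms and the terms of a matching document; the default of 0 requires an exact phrase match.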
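
The effect of the explicit positions in add(Term term, int position) can be pictured with a small plain-Java sketch. It checks whether every phrase term sits at its relative position starting from some offset in a token list, which corresponds to an exact match with slop 0. The class and method names are hypothetical, and Lucene itself works on indexed term positions rather than on a token list like this.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Lucene: exact phrase matching with explicit positions.
final class PhrasePositionSketch {

    /**
     * Returns true if, for some start offset, each terms[i] is found at
     * tokens[start + positions[i]]. Positions are assumed to be non-negative.
     */
    static boolean matchesExactly(List<String> tokens, String[] terms, int[] positions) {
        int maxPos = 0;
        for (int p : positions) {
            maxPos = Math.max(maxPos, p);
        }
        for (int start = 0; start + maxPos < tokens.size(); start++) {
            boolean ok = true;
            for (int i = 0; i < terms.length && ok; i++) {
                ok = tokens.get(start + positions[i]).equals(terms[i]);
            }
            if (ok) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList("king", "of", "spain");
        String[] terms = {"king", "spain"};
        // Positions 0 and 2 leave a gap of one token between the two terms.
        System.out.println(matchesExactly(tokens, terms, new int[] {0, 2})); // true
        // Adjacent positions do not match, because "of" sits in between.
        System.out.println(matchesExactly(tokens, terms, new int[] {0, 1})); // false
    }
}
```

With adjacent positions the phrase fails on this text; in Lucene, raising the slop is the other way to tolerate the extra token in between.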

 

5 MultiPhraseQuery

MultiPhraseQuery is a generalized version of PhraseQuery, with an added method add(Term[]). To use this class, to search for the phrase "Microsoft app*" first use add(Term) on the term "Microsoft", then find all terms that have "app" as prefix using IndexReader.terms(Term), and use MultiPhraseQuery.add(Term[] terms) to add them to the query.

 

6 FuzzyQuery

7 RegexpQuery

Matches by regular expression. The pattern is applied only to individual indexed terms, not to the original text.
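
To make the "indexed terms only" point concrete, here is a minimal plain-Java sketch that applies a pattern to each term separately and requires the whole term to match. It uses java.util.regex purely for illustration; Lucene's RegexpQuery has its own regular expression syntax and runs against the term dictionary, and the class, method, and sample terms below are assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Lucene: per-term regular expression matching.
final class TermRegexSketch {

    /** Returns the terms that match the regex as a whole (no partial matches). */
    static List<String> matchingTerms(List<String> indexedTerms, String regex) {
        Pattern pattern = Pattern.compile(regex);
        List<String> hits = new ArrayList<>();
        for (String term : indexedTerms) {
            if (pattern.matcher(term).matches()) {
                hits.add(term);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // Terms as an analyzer might have produced them from "merry christmas eve"
        List<String> terms = Arrays.asList("merry", "christmas", "eve");
        System.out.println(matchingTerms(terms, "chris.*"));         // [christmas]
        // No single term contains a space, so a pattern spanning two words finds nothing.
        System.out.println(matchingTerms(terms, "merry christmas")); // []
    }
}
```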

 

 

 

 

