Getting Started with Lucene (2)


IK Chinese analyzer

How an analyzer executes:

Starting from a Reader character stream, a Tokenizer is built on top of the Reader; the resulting token stream then passes through three TokenFilters to produce the final tokens.
Other analyzers: StandardAnalyzer / CJKAnalyzer
SmartChineseAnalyzer
IK Analyzer download: https://code.google.com/archive/p/ik-analyzer/downloads
IK Analyzer segments Chinese text well, and it can be extended with stop-word and extension dictionaries (see the config sketch after the code below).

    // Needs lucene-core (Analyzer, TokenStream, CharTermAttribute, OffsetAttribute)
    // and the IK Analyzer jar (org.wltea.analyzer.lucene.IKAnalyzer)
    Analyzer analyzer = new IKAnalyzer();

    TokenStream tokenStream = analyzer.tokenStream("test", "需要测试的数据");
    // Attributes expose the term text and its character offsets for each token
    CharTermAttribute charTermAttribute = tokenStream.addAttribute(CharTermAttribute.class);
    OffsetAttribute offsetAttribute = tokenStream.addAttribute(OffsetAttribute.class);
    tokenStream.reset();                        // must be called before iterating
    while (tokenStream.incrementToken()) {      // advance to the next token
        System.out.println("start->" + offsetAttribute.startOffset());
        System.out.println(charTermAttribute);  // the token text
        System.out.println("end->" + offsetAttribute.endOffset());
    }
    tokenStream.close();
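
IK Analyzer reads an IKAnalyzer.cfg.xml from the classpath to pick up the extension and stop-word dictionaries mentioned above. A minimal sketch of that file, assuming dictionary files named ext.dic and stopword.dic that you create yourself:

    <?xml version="1.0" encoding="UTF-8"?>
    <!DOCTYPE properties SYSTEM "http://java.sun.com/dtd/properties.dtd">
    <properties>
        <comment>IK Analyzer extension configuration</comment>
        <!-- extension dictionary (assumed file name) -->
        <entry key="ext_dict">ext.dic</entry>
        <!-- stop-word dictionary (assumed file name) -->
        <entry key="ext_stopwords">stopword.dic</entry>
    </properties>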

When analyzers are used

At search time, the keywords entered by the user have to be matched against the terms contained in the document fields, so an analyzer is needed to analyze the query content into tokens.
At index time, the Document objects are added to the index; the analyzer tokenizes the document fields and the resulting terms are also written into the index.
The analyzer used for searching must be the same as the analyzer used for indexing (see the sketch below).
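
A minimal sketch of that rule, reusing the calls shown elsewhere in this post (the index path and field name are placeholders):

    Analyzer analyzer = new IKAnalyzer();

    // Index time: the analyzer is passed to the IndexWriterConfig
    IndexWriterConfig config = new IndexWriterConfig(Version.LATEST, analyzer);
    IndexWriter indexWriter = new IndexWriter(FSDirectory.open(new File("index directory path")), config);
    // ... add documents, then indexWriter.close() ...

    // Search time: the same analyzer is passed to the QueryParser
    QueryParser queryParser = new QueryParser("fileName", analyzer);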


Delete all

    Directory directory = FSDirectory.open(new File("index directory path"));
    Analyzer analyzer = new StandardAnalyzer();              // officially recommended analyzer
    IndexWriterConfig config = new IndexWriterConfig(Version.LATEST, analyzer);
    IndexWriter indexWriter = new IndexWriter(directory, config);
    indexWriter.deleteAll();                                  // removes every document from the index
    indexWriter.close();
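
deleteAll removes every document; the change becomes visible once the writer commits or is closed. If the writer needs to stay open, a commit can be issued instead (a minimal sketch):

    indexWriter.commit();   // make the deletion visible without closing the writer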

Delete by condition

    Directory directory = FSDirectory.open(new File("index directory path"));
    Analyzer analyzer = new StandardAnalyzer();              // officially recommended analyzer
    IndexWriterConfig config = new IndexWriterConfig(Version.LATEST, analyzer);
    IndexWriter indexWriter = new IndexWriter(directory, config);
    Query query = new TermQuery(new Term("fileName", "apache"));  // match documents whose fileName field contains "apache"
    indexWriter.deleteDocuments(query);                      // delete only the matching documents
    indexWriter.close();
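
When the condition is a single term, IndexWriter also accepts the Term directly, which skips building a TermQuery (a minimal sketch, same field and value as above):

    indexWriter.deleteDocuments(new Term("fileName", "apache"));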

Update:

    // updateDocument is a delete-then-add: documents matching the term are removed
    // and replaced by the new document, so any fields left out are not preserved
    Document doc = new Document();
    doc.add(new TextField("fileName", "test file name", Store.YES));
    // doc.add(new TextField("fileContent", "test file content", Store.YES));
    indexWriter.updateDocument(new Term("fileName", "lucene"), doc, new IKAnalyzer());
    indexWriter.close();

Query all documents:

    Query query = new MatchAllDocsQuery();
    TopDocs topDocs = indexSearcher.search(query, 10);    // top 10 hits
    ScoreDoc[] scoreDocs = topDocs.scoreDocs;
    for (ScoreDoc scoreDoc : scoreDocs) {
        // scoreDoc.doc is the internal document id; see the searcher sketch below
    }
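
The query snippets in this post assume an indexSearcher already exists. A minimal sketch of opening one and reading stored fields back from the hits (the index path and the fileName field are placeholders taken from the earlier examples):

    Directory directory = FSDirectory.open(new File("index directory path"));
    IndexReader indexReader = DirectoryReader.open(directory);
    IndexSearcher indexSearcher = new IndexSearcher(indexReader);

    TopDocs topDocs = indexSearcher.search(new MatchAllDocsQuery(), 10);
    for (ScoreDoc scoreDoc : topDocs.scoreDocs) {
        Document doc = indexSearcher.doc(scoreDoc.doc);   // load the stored document
        System.out.println(doc.get("fileName"));          // read a stored field value
    }
    indexReader.close();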

Query by numeric range:

    Query query = NumericRangeQuery.newLongRange("fileSize", 47L, 200L, false, true);  // lower bound 47 exclusive, upper bound 200 inclusive
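
NumericRangeQuery only matches values that were indexed as numeric fields. A minimal sketch of indexing fileSize as a LongField so the range query above can find it (the value is arbitrary):

    Document doc = new Document();
    doc.add(new LongField("fileSize", 100L, Store.YES));   // numeric field, value also stored
    indexWriter.addDocument(doc);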

Combined query conditions:

    BooleanQuery booleanQuery = new BooleanQuery();
    Query query1 = new TermQuery(new Term("fileName", "apache"));
    Query query2 = new TermQuery(new Term("fileName", "lucene"));
    // query1 is required; query2 is optional and only boosts documents that also match it
    booleanQuery.add(query1, Occur.MUST);
    booleanQuery.add(query2, Occur.SHOULD);
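
Occur.MUST requires the clause to match, Occur.SHOULD makes it optional (it only improves the score), and Occur.MUST_NOT excludes matching documents. The combined query is executed like any other query; a minimal sketch, assuming the indexSearcher from the earlier sketch:

    TopDocs topDocs = indexSearcher.search(booleanQuery, 10);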

Querying with a parsed query expression (QueryParser):

    QueryParser queryParser = new QueryParser("fileName", new IKAnalyzer());  // "fileName" is the default field for unqualified terms
    Query query = queryParser.parse("fileName:lucene is apache OR fileContent:lucene is apache");

Multiple default fields:

    String[] fields = {"fileName", "fileContent"};
    MultiFieldQueryParser queryParser = new MultiFieldQueryParser(fields, new IKAnalyzer());  // unqualified terms search both fields
    Query query = queryParser.parse("lucene is apache");
