Getting Started with Lucene

chengye1942

Original link: https://my.oschina.net/xzjl/blog/1524607

1. What is Lucene

Lucene is an open-source full-text search framework written in Java.

2. What is full-text search

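Full-text search means building an index over every word (term) in the data to be searched, and then answering queries by searching that index instead of the raw data.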
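
The idea of "an index over every word" can be sketched in plain Java as a map from each term to the ids of the documents that contain it. This is only an illustration under simplifying assumptions (whitespace tokenization, lowercase normalization, English sample text); `InvertedIndexSketch`, `add`, and `search` are hypothetical names and not part of Lucene.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical illustration class; not part of Lucene.
class InvertedIndexSketch {
    // term -> sorted set of ids of the documents that contain the term
    private final Map<String, Set<Integer>> postings = new HashMap<>();

    // Split the text on whitespace, lowercase each word, and record the document id under it.
    void add(int docId, String text) {
        for (String term : text.toLowerCase(Locale.ROOT).split("\\s+")) {
            if (!term.isEmpty()) {
                postings.computeIfAbsent(term, t -> new TreeSet<>()).add(docId);
            }
        }
    }

    // Look up a single term; returns the matching document ids in ascending order.
    List<Integer> search(String term) {
        Set<Integer> ids = postings.getOrDefault(term.toLowerCase(Locale.ROOT), Collections.emptySet());
        return new ArrayList<>(ids);
    }

    public static void main(String[] args) {
        InvertedIndexSketch index = new InvertedIndexSketch();
        index.add(10, "cold granules relieve headache and cough");
        index.add(12, "Lucene is a full-text search framework");
        System.out.println(index.search("lucene")); // [12]
        System.out.println(index.search("cough"));  // [10]
        System.out.println(index.search("java"));   // []
    }
}
```

A query never touches the document text here: it is a single lookup in the term map, which is the property that full-text search relies on.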

3. Why use full-text search

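Traditional linear search scans the data in order, one item after another. This is inefficient and becomes very slow as the data grows. With an index, the required information can be located quickly without reading everything, much like using the table of contents of a book.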
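
To get a feel for the cost difference, the following sketch counts comparisons for a front-to-back scan versus a binary search over the same sorted list of terms. It only illustrates why an ordered lookup structure beats a linear scan; it is not a description of how Lucene stores its index internally, and `LookupCostSketch` and its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical illustration class; not part of Lucene.
class LookupCostSketch {
    // Front-to-back scan: counts entries compared until the target is found (or the list ends).
    static int linearComparisons(List<String> sortedTerms, String target) {
        int comparisons = 0;
        for (String term : sortedTerms) {
            comparisons++;
            if (term.equals(target)) {
                break;
            }
        }
        return comparisons;
    }

    // Binary search over the same sorted list: counts the midpoints that were compared.
    static int binaryComparisons(List<String> sortedTerms, String target) {
        int comparisons = 0;
        int lo = 0;
        int hi = sortedTerms.size() - 1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            comparisons++;
            int c = sortedTerms.get(mid).compareTo(target);
            if (c == 0) {
                break;
            } else if (c < 0) {
                lo = mid + 1;
            } else {
                hi = mid - 1;
            }
        }
        return comparisons;
    }

    public static void main(String[] args) {
        List<String> terms = new ArrayList<>();
        for (int i = 0; i < 100_000; i++) {
            // Zero padding keeps lexicographic order equal to numeric order.
            terms.add(String.format(Locale.ROOT, "term%06d", i));
        }
        String target = "term099999"; // last entry: worst case for the scan
        System.out.println("linear: " + linearComparisons(terms, target)); // linear: 100000
        System.out.println("binary: " + binaryComparisons(terms, target)); // at most 17 for 100000 entries
    }
}
```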

4. Lucene 5.2.1 in practice

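import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.analysis.miscellaneous.PerFieldAnalyzerWrapper;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.queryparser.classic.ParseException;
import org.apache.lucene.queryparser.classic.QueryParser;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.RAMDirectory;

// Article and AnalyzerHelper are helper classes whose source was not published with this demo.
public class LuceneDemo {

    static Directory directory = null;
    static PerFieldAnalyzerWrapper analyzerWrapper;
    static String keyword = "感冒"; // search keyword ("cold", the illness)

    public static void main(String[] args) throws Exception {
        init();
        createIndex();
        searchIndex();
    }

    public static void init() throws Exception {
        directory = new RAMDirectory(); // in-memory index
        analyzerWrapper = AnalyzerHelper.getWriteAnalyzerWrapper();

        // Normalize the input keyword
        keyword = keyword.toLowerCase();
    }

    // Build the index
    public static void createIndex() throws Exception {
        // Sample data to be indexed
        Article article1 = new Article();
        article1.setId(10);
        article1.setTitle("感冒颗粒的好处"); // title ("Benefits of cold granules")
        article1.setContent("感冒颗粒可以治疗头疼，感冒颗粒还可以治疗咳嗽，目前市场上的感冒颗粒很多"); // content

        Article article2 = new Article();
        article2.setId(12);
        article2.setTitle("lucene入门"); // title ("Getting started with Lucene")
        article2.setContent("lucene是门全文检索框架"); // content

        List<Article> list = new ArrayList<>();
        list.add(article1);
        list.add(article2);

        // Write the index
        IndexWriterConfig iwConfig = new IndexWriterConfig(analyzerWrapper);
        iwConfig.setOpenMode(IndexWriterConfig.OpenMode.CREATE_OR_APPEND);
        try (IndexWriter iwriter = new IndexWriter(directory, iwConfig)) {
            // Lucene indexes Document objects, so convert each article into one
            for (Article article : list) {
                Document document = new Document();
                document.add(new TextField("id", article.getId() + "", Field.Store.YES));
                document.add(new TextField("title", article.getTitle(), Field.Store.YES));
                document.add(new TextField("content", article.getContent(), Field.Store.YES));
                // Print the tokens the analyzer produces for the content field
                AnalyzerUtils.displayTokens(analyzerWrapper, "content", article.getContent());
                // Add the document to the index
                iwriter.addDocument(document);
            }
            iwriter.commit();
        }
    }

    public static void searchIndex() throws ParseException, IOException {
        // Build a Query that searches the content field
        QueryParser queryContent = new QueryParser("content", analyzerWrapper);
        Query q1 = queryContent.parse(QueryParser.escape(keyword));
        // Query.setBoost exists in 5.2.1; newer Lucene versions use BoostQuery instead
        q1.setBoost(100f);

        try (DirectoryReader reader = DirectoryReader.open(directory)) {
            IndexSearcher indexSearcher = new IndexSearcher(reader);

            // Run the query and fetch the top 10 matching documents
            TopDocs topDocs = indexSearcher.search(q1, 10);
            // The label means "number of matching records"
            System.out.println("满足结果记录条数：" + topDocs.totalHits);
            // Print each hit
            for (ScoreDoc scoreDoc : topDocs.scoreDocs) {
                // Internal document number of the hit
                int docId = scoreDoc.doc;
                Document document = indexSearcher.doc(docId);
                System.out.println("id:" + document.get("id"));
                System.out.println("title:" + document.get("title"));
                System.out.println("content:" + document.get("content"));
            }
        }
    }
}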
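
The `Article` class used by `LuceneDemo` is not included in the listing. Judging only from the calls the demo makes (`setId`, `getId`, `setTitle`, `getTitle`, `setContent`, `getContent`), a minimal stand-in could look like the sketch below. The field types are assumptions: `id` is taken to be an `int` because the demo passes `10` and `12`.

```java
// Hypothetical reconstruction of the Article bean used by LuceneDemo.
// Only the accessors the demo calls are provided; field types are assumed.
class Article {
    private int id;          // assumed int, since the demo calls setId(10)
    private String title;
    private String content;

    public int getId() {
        return id;
    }

    public void setId(int id) {
        this.id = id;
    }

    public String getTitle() {
        return title;
    }

    public void setTitle(String title) {
        this.title = title;
    }

    public String getContent() {
        return content;
    }

    public void setContent(String content) {
        this.content = content;
    }
}
```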

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.analysis.tokenattributes.OffsetAttribute;
import org.apache.lucene.analysis.tokenattributes.PositionIncrementAttribute;
import org.apache.lucene.analysis.tokenattributes.TypeAttribute;

public class AnalyzerUtils {

    // Analyze the text for the given field and print every token the analyzer yields
    public static void displayTokens(Analyzer analyzer, String field, String text) throws Exception {
        TokenStream tokenStream = analyzer.tokenStream(field, text);
        displayTokens(tokenStream);
    }

    // Print each token as position:[term]:(startOffset-->endOffset):type
    public static void displayTokens(TokenStream tokenStream) throws Exception {
        OffsetAttribute offsetAttribute = tokenStream.addAttribute(OffsetAttribute.class);
        PositionIncrementAttribute positionIncrementAttribute = tokenStream.addAttribute(PositionIncrementAttribute.class);
        CharTermAttribute charTermAttribute = tokenStream.addAttribute(CharTermAttribute.class);
        TypeAttribute typeAttribute = tokenStream.addAttribute(TypeAttribute.class);

        int position = 0;
        try {
            // TokenStream workflow: reset(), incrementToken() until false, end(), close()
            tokenStream.reset();
            while (tokenStream.incrementToken()) {
                int increment = positionIncrementAttribute.getPositionIncrement();
                if (increment > 0) {
                    position = position + increment;
                    System.out.print(position + ":");
                }
                int startOffset = offsetAttribute.startOffset();
                int endOffset = offsetAttribute.endOffset();
                String term = charTermAttribute.toString();
                System.out.println("[" + term + "]" + ":(" + startOffset + "-->" + endOffset + "):" + typeAttribute.type());
            }
            tokenStream.end();
        } finally {
            tokenStream.close();
        }
    }
}

Output:

1:[感冒]:(0-->2):word
2:[颗粒]:(2-->4):word
3:[可以]:(4-->6):word
4:[治疗]:(6-->8):word
5:[头疼]:(8-->10):word
6:[感冒]:(11-->13):word
7:[颗粒]:(13-->15):word
8:[还可]:(15-->17):word
9:[可以]:(16-->18):word
10:[治疗]:(18-->20):word
11:[咳嗽]:(20-->22):word
12:[目前]:(23-->25):word
13:[市场]:(25-->27):word
14:[上]:(27-->28):word
15:[的]:(28-->29):word
16:[感冒]:(29-->31):word
17:[颗粒]:(31-->33):word
18:[很多]:(33-->35):word
1:[lucene]:(0-->6):letter
2:[是]:(6-->7):word
3:[门]:(7-->8):word
4:[全文]:(8-->10):word
5:[检索]:(10-->12):word
6:[框架]:(12-->14):word
满足结果记录条数：1
id:10
title:感冒颗粒的好处
content:感冒颗粒可以治疗头疼，感冒颗粒还可以治疗咳嗽，目前市场上的感冒颗粒很多
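
Each token line in the output above has the form position:[term]:(startOffset-->endOffset):type. The offsets are character indices into the original text with an exclusive end, which is why the comma at index 10 is skipped between 头疼 (8-->10) and the next 感冒 (11-->13). The sketch below reproduces the position and offset part of that format with a plain whitespace split. It does not use a Lucene analyzer, it omits the token type, and `TokenFormatSketch` and `describe` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration class; mimics the output format only, without Lucene.
class TokenFormatSketch {
    // Returns one line per whitespace-separated token: position:[term]:(start-->end)
    static List<String> describe(String text) {
        List<String> lines = new ArrayList<>();
        int position = 0;
        int i = 0;
        while (i < text.length()) {
            // Skip whitespace between tokens
            if (Character.isWhitespace(text.charAt(i))) {
                i++;
                continue;
            }
            int start = i;
            while (i < text.length() && !Character.isWhitespace(text.charAt(i))) {
                i++;
            }
            position++; // every token advances the position by one in this sketch
            lines.add(position + ":[" + text.substring(start, i) + "]:(" + start + "-->" + i + ")");
        }
        return lines;
    }

    public static void main(String[] args) {
        for (String line : describe("lucene is fun")) {
            System.out.println(line);
        }
        // 1:[lucene]:(0-->6)
        // 2:[is]:(7-->9)
        // 3:[fun]:(10-->13)
    }
}
```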