[Lucene]#1_Getting Started with Lucene

PeppaKing

Category: j2ee    Tags: Lucene

Article link: https://blog.csdn.net/qq_30782921/article/details/92801182


1.Create the index

Create a Java project and add the jars lucene-analyzers-common-7.4.0.jar, lucene-core-7.4.0.jar and commons-io.jar (the @Test methods below also need JUnit on the classpath).
LuceneFirst.java

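//Imports at the top of LuceneFirst.java used by createIndex()
import java.io.File;
import org.apache.commons.io.FileUtils;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StoredField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.junit.Test;

//Create the index
@Test
public void createIndex() throws Exception {

   //Directory where the index is stored
   //(an absolute path such as D:\temp\index works too)
   Directory directory = FSDirectory.open(new File("./index").toPath());
   //The index can also be kept in memory instead
   //Directory directory = new RAMDirectory();
   //Create the IndexWriterConfig (the no-arg constructor uses StandardAnalyzer)
   IndexWriterConfig config = new IndexWriterConfig();
   //Create the IndexWriter
   IndexWriter indexWriter = new IndexWriter(directory, config);
   //Directory holding the source documents
   File dir = new File("./searchsource");
   for (File f : dir.listFiles()) {
       //File name
       String fileName = f.getName();
       //File content, read with an explicit charset
       String fileContent = FileUtils.readFileToString(f, "utf-8");
       //File path
       String filePath = f.getPath();
       //File size
       long fileSize  = FileUtils.sizeOf(f);
       //File name field
       //1st argument: field name
       //2nd argument: field value
       //3rd argument: whether to store the value
       Field fileNameField = new TextField("filename", fileName, Field.Store.YES);
       //File content field
       Field fileContentField = new TextField("content", fileContent, Field.Store.YES);
       //File path field (not analyzed, not indexed, stored only)
       Field filePathField = new StoredField("path", filePath);
       //File size field
       Field fileSizeField = new TextField("size", fileSize + "", Field.Store.YES);

       //Create the document
       Document document = new Document();
       document.add(fileNameField);
       document.add(fileContentField);
       document.add(filePathField);
       document.add(fileSizeField);
       //Index the document and write it to the index
       indexWriter.addDocument(document);
   }
   //Close the IndexWriter (this also commits the changes)
   indexWriter.close();
}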
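
To make the result of createIndex() more concrete, here is a toy model in plain Java. It uses no Lucene classes; the class ToyIndexWriter, its crude tokenizer and the sample file names are all hypothetical. Under these simplifying assumptions it sketches the two things a document contributes: analyzed fields feed an inverted index that maps each term to the documents containing it, while stored values are simply kept per document so that search results can display them. That is also the difference between an indexed field and a stored-only field such as a StoredField: the latter can be read back from a hit but cannot be searched.

```java
import java.util.*;

// Toy model (NOT Lucene's real data structures) of what createIndex() builds:
// analyzed fields go into an inverted index (term -> ids of the documents that
// contain it); stored values are kept per document and returned with hits.
class ToyIndexWriter {
    // "field:term" -> sorted doc ids (a postings list)
    final Map<String, SortedSet<Integer>> postings = new TreeMap<>();
    // stored field values, one map per document; the list index is the doc id
    final List<Map<String, String>> stored = new ArrayList<>();

    // Crude stand-in for an analyzer: lowercase, then split on anything that
    // is not a letter or digit. StandardAnalyzer's rules are more refined.
    static List<String> analyze(String text) {
        List<String> terms = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) terms.add(t);
        }
        return terms;
    }

    // indexed: like TextField with Field.Store.YES (analyzed, indexed and stored)
    // storedOnly: like StoredField (kept for display, never searchable)
    int addDocument(Map<String, String> indexed, Map<String, String> storedOnly) {
        int docId = stored.size();
        Map<String, String> values = new HashMap<>(storedOnly);
        for (Map.Entry<String, String> field : indexed.entrySet()) {
            for (String term : analyze(field.getValue())) {
                postings.computeIfAbsent(field.getKey() + ":" + term, k -> new TreeSet<>()).add(docId);
            }
            values.put(field.getKey(), field.getValue());
        }
        stored.add(values);
        return docId;
    }

    public static void main(String[] args) {
        ToyIndexWriter writer = new ToyIndexWriter();
        // hypothetical file names, for illustration only
        writer.addDocument(Collections.singletonMap("filename", "Apache Lucene.txt"),
                Collections.singletonMap("path", "./searchsource/Apache Lucene.txt"));
        writer.addDocument(Collections.singletonMap("filename", "Spring.txt"),
                Collections.singletonMap("path", "./searchsource/Spring.txt"));
        System.out.println(writer.postings.get("filename:apache"));           // [0]
        System.out.println(writer.postings.get("filename:txt"));              // [0, 1]
        System.out.println(writer.postings.containsKey("path:searchsource")); // false
        System.out.println(writer.stored.get(1).get("path"));                 // ./searchsource/Spring.txt
    }
}
```

Lucene's real index uses compressed, segment-based files; the sketch only mirrors the logical term-to-document mapping.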

2.Use Luke to view the contents of the index
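
Among other views, Luke's overview lists the terms of a field together with how many documents contain each one. As a rough plain-Java sketch of that statistic (the class name ToyTopTerms and the sample values are hypothetical, and the tokenization is a crude stand-in for a real analyzer), document frequency can be computed like this:

```java
import java.util.*;

// Toy version of a per-field "top terms" list: for each term, count how many
// documents contain it (its document frequency), then sort by that count.
class ToyTopTerms {
    static List<Map.Entry<String, Integer>> topTerms(List<String> fieldValues, int n) {
        Map<String, Integer> docFreq = new HashMap<>();
        for (String value : fieldValues) {
            // a Set, so a term repeated within one document is counted once
            Set<String> termsInDoc = new HashSet<>();
            for (String t : value.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
                if (!t.isEmpty()) termsInDoc.add(t);
            }
            for (String t : termsInDoc) docFreq.merge(t, 1, Integer::sum);
        }
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(docFreq.entrySet());
        // highest document frequency first; ties broken alphabetically
        entries.sort((a, b) -> !a.getValue().equals(b.getValue())
                ? Integer.compare(b.getValue(), a.getValue())
                : a.getKey().compareTo(b.getKey()));
        return new ArrayList<>(entries.subList(0, Math.min(n, entries.size())));
    }

    public static void main(String[] args) {
        // hypothetical "filename" values
        List<String> names = Arrays.asList("Apache Lucene.txt", "Apache Solr.txt", "Spring.txt");
        for (Map.Entry<String, Integer> e : topTerms(names, 3)) {
            System.out.println(e.getKey() + " -> " + e.getValue());
        }
    }
}
```

With the sample values, main prints txt -> 3, apache -> 2 and lucene -> 1.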

3.Search the index

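//Imports used by searchIndex()
import java.io.File;
import org.apache.lucene.document.Document;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.junit.Test;

//Search the index
@Test
public void searchIndex() throws Exception {
   //Directory where the index is stored
   Directory directory = FSDirectory.open(new File("./index").toPath());
   //Create the IndexReader
   IndexReader indexReader = DirectoryReader.open(directory);
   //Create the IndexSearcher
   IndexSearcher indexSearcher = new IndexSearcher(indexReader);
   //Create the query: documents whose "filename" field contains the term "apache"
   //(TermQuery does not analyze its term, so it must match the indexed, lowercased form)
   Query query = new TermQuery(new Term("filename", "apache"));
   //Run the query
   //1st argument: the query; 2nd argument: the maximum number of hits to return
   TopDocs topDocs = indexSearcher.search(query, 10);
   //Total number of matching documents
   System.out.println("Total hits: "+ topDocs.totalHits);
   //Iterate over the returned hits
   //topDocs.scoreDocs holds the ids (and scores) of the matching documents
   for (ScoreDoc scoreDoc : topDocs.scoreDocs) {
       //scoreDoc.doc is the document id;
       //use it to load the stored fields of the document
       Document document = indexSearcher.doc(scoreDoc.doc);
       System.out.println(document.get("filename"));
       //System.out.println(document.get("content"));
       System.out.println(document.get("path"));
       System.out.println(document.get("size"));
       System.out.println("-------------------------");
   }
   //Close the IndexReader
   indexReader.close();
}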
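
The search above can be modeled as an exact term lookup. The plain-Java sketch below (hypothetical class ToyTermSearch, made-up postings, no relevance scoring) illustrates two points from searchIndex(): the term passed to TermQuery is not analyzed, so it has to match the indexed form exactly, and the second argument of search() caps how many hits come back while totalHits still counts every match.

```java
import java.util.*;

// Toy model of searchIndex(): a term query is an exact lookup of one term in one
// field. search(..., n) returns at most n hits, while totalHits counts every
// match. No scoring here: real Lucene orders hits by relevance score.
class ToyTermSearch {
    static final class TopHits {
        final long totalHits;          // all matching documents
        final List<Integer> docIds;    // at most n of them (like scoreDocs)
        TopHits(long totalHits, List<Integer> docIds) {
            this.totalHits = totalHits;
            this.docIds = docIds;
        }
    }

    static TopHits search(Map<String, List<Integer>> postings, String field, String term, int n) {
        // The term is used as-is (no analysis), so "Apache" will not match the
        // lowercased "apache" that the analyzer wrote into the index.
        List<Integer> matches = postings.getOrDefault(field + ":" + term, Collections.<Integer>emptyList());
        List<Integer> top = new ArrayList<>(matches.subList(0, Math.min(n, matches.size())));
        return new TopHits(matches.size(), top);
    }

    public static void main(String[] args) {
        // hypothetical postings, as an analyzer would have produced them
        Map<String, List<Integer>> postings = new HashMap<>();
        postings.put("filename:apache", Arrays.asList(0, 2, 5));
        TopHits hits = search(postings, "filename", "apache", 2);
        System.out.println(hits.totalHits + " " + hits.docIds);                   // 3 [0, 2]
        System.out.println(search(postings, "filename", "Apache", 10).totalHits); // 0
    }
}
```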

4.Inspect what an analyzer produces

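//Imports used by testTokenStream()
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.analysis.tokenattributes.OffsetAttribute;
import org.junit.Test;

//Show how the standard analyzer splits text into terms
@Test
public void testTokenStream() throws Exception {
   //Create a standard analyzer
   Analyzer analyzer = new StandardAnalyzer();
   //Get a TokenStream
   //1st argument: field name (any name will do here)
   //2nd argument: the text to analyze
   TokenStream tokenStream = analyzer.tokenStream("test", "The Spring Framework provides a comprehensive programming and configuration model.");
   //Attribute that holds the text of the current term
   CharTermAttribute charTermAttribute = tokenStream.addAttribute(CharTermAttribute.class);
   //Attribute that holds the start and end offsets of the current term
   OffsetAttribute offsetAttribute = tokenStream.addAttribute(OffsetAttribute.class);
   //Reset the stream; required before the first incrementToken() call
   tokenStream.reset();
   //Walk through the terms; incrementToken() returns false at the end of the stream
   while(tokenStream.incrementToken()) {
       //Start offset of the term
       System.out.println("start->" + offsetAttribute.startOffset());
       //The term itself
       System.out.println(charTermAttribute);
       //End offset of the term
       System.out.println("end->" + offsetAttribute.endOffset());
   }
   //Finish the stream, then release it and the analyzer
   tokenStream.end();
   tokenStream.close();
   analyzer.close();
}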
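
The output of testTokenStream() can be approximated with a hand-written tokenizer. The sketch below (hypothetical class ToyTokenizer, with a deliberately simple rule: split on anything that is not a letter or digit, then lowercase) shows how the offsets relate to the original text: the end offset is one past the last character of the term. In Lucene 7.x, StandardAnalyzer() also drops a default list of English stop words, so "The", "a" and "and" should be missing from the real output; the three-word stop list used here is only an assumption standing in for it.

```java
import java.util.*;

// Toy tokenizer (NOT StandardAnalyzer) that yields what testTokenStream()
// prints: start offset, term, end offset. The end offset is one past the last
// character, so text.substring(start, end) is the original token text.
class ToyTokenizer {
    static final class Token {
        final String term;
        final int start;
        final int end;
        Token(String term, int start, int end) {
            this.term = term;
            this.start = start;
            this.end = end;
        }
    }

    static List<Token> tokenize(String text, Set<String> stopWords) {
        List<Token> tokens = new ArrayList<>();
        int i = 0;
        while (i < text.length()) {
            // skip separators (spaces, punctuation)
            while (i < text.length() && !Character.isLetterOrDigit(text.charAt(i))) i++;
            int start = i;
            while (i < text.length() && Character.isLetterOrDigit(text.charAt(i))) i++;
            if (i > start) {
                String term = text.substring(start, i).toLowerCase(Locale.ROOT);
                // stop words are dropped, but the offsets of the remaining
                // tokens still point into the original text
                if (!stopWords.contains(term)) tokens.add(new Token(term, start, i));
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // hypothetical stop-word list for this sketch
        Set<String> stop = new HashSet<>(Arrays.asList("the", "a", "and"));
        String text = "The Spring Framework provides a comprehensive programming and configuration model.";
        for (Token t : tokenize(text, stop)) {
            System.out.println("start->" + t.start);
            System.out.println(t.term);
            System.out.println("end->" + t.end);
        }
    }
}
```

For the sample sentence, the first token printed is spring with start 4 and end 10.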

5.Chinese analyzer

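Add IK-Analyzer-1.0-SNAPSHOT.jar to lib.
Put hotword.dic, IKAnalyzer.cfg.xml and stopword.dic in src (IKAnalyzer.cfg.xml is IK's configuration file; hotword.dic holds extra dictionary words and stopword.dic holds stop words).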

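//Imports used by testIKTokenStream() (IKAnalyzer comes from the IK jar added to lib)
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.analysis.tokenattributes.OffsetAttribute;
import org.junit.Test;

   //Show how the IK analyzer splits Chinese text into terms
   @Test
   public void testIKTokenStream() throws Exception {
       //Create an IK analyzer
       Analyzer analyzer = new IKAnalyzer();
       //Get a TokenStream
       //1st argument: field name (any name will do here)
       //2nd argument: the text to analyze ("Hello everyone, I am Zha Zha Hui")
       TokenStream tokenStream = analyzer.tokenStream("test", "大家好，我是渣渣灰");
       //Attribute that holds the text of the current term
       CharTermAttribute charTermAttribute = tokenStream.addAttribute(CharTermAttribute.class);
       //Attribute that holds the start and end offsets of the current term
       OffsetAttribute offsetAttribute = tokenStream.addAttribute(OffsetAttribute.class);
       //Reset the stream; required before the first incrementToken() call
       tokenStream.reset();
       //Walk through the terms; incrementToken() returns false at the end of the stream
       while(tokenStream.incrementToken()) {
           //Start offset of the term
           System.out.println("start->" + offsetAttribute.startOffset());
           //The term itself
           System.out.println(charTermAttribute);
           //End offset of the term
           System.out.println("end->" + offsetAttribute.endOffset());
       }
       //Finish the stream, then release it and the analyzer
       tokenStream.end();
       tokenStream.close();
       analyzer.close();
   }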
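
IK Analyzer is a dictionary-based segmenter, which is why the setup above adds hotword.dic and stopword.dic. The sketch below uses forward maximum matching, a classic dictionary segmentation technique, to show the effect of both kinds of file. It is not IK's actual algorithm, and the class name ToyFmmSegmenter and the tiny dictionary are hypothetical.

```java
import java.util.*;

// Toy dictionary-based segmenter using forward maximum matching (FMM).
// This is NOT IK Analyzer's algorithm; it only shows why a word list such as
// hotword.dic and a stop-word list such as stopword.dic change the output.
class ToyFmmSegmenter {
    static List<String> segment(String text, Set<String> dict, Set<String> stopWords, int maxLen) {
        List<String> words = new ArrayList<>();
        int i = 0;
        while (i < text.length()) {
            // skip punctuation such as the full-width comma '，'
            if (!Character.isLetterOrDigit(text.charAt(i))) {
                i++;
                continue;
            }
            // try the longest candidate first, shrinking until a dictionary word matches
            String word = null;
            int len = Math.min(maxLen, text.length() - i);
            for (; len > 1; len--) {
                String candidate = text.substring(i, i + len);
                if (dict.contains(candidate)) {
                    word = candidate;
                    break;
                }
            }
            if (word == null) {      // unknown text: emit a single character
                word = text.substring(i, i + 1);
                len = 1;
            }
            if (!stopWords.contains(word)) words.add(word);
            i += len;
        }
        return words;
    }

    public static void main(String[] args) {
        String text = "大家好，我是渣渣灰";
        Set<String> none = Collections.emptySet();
        // hypothetical base dictionary
        Set<String> dict = new HashSet<>(Arrays.asList("大家"));
        System.out.println(segment(text, dict, none, 3)); // [大家, 好, 我, 是, 渣, 渣, 灰]
        // a "hot word" added, as an extension dictionary would do
        dict.add("渣渣灰");
        System.out.println(segment(text, dict, none, 3)); // [大家, 好, 我, 是, 渣渣灰]
        // "是" treated as a stop word, as a stop-word dictionary would do
        Set<String> stop = new HashSet<>(Arrays.asList("是"));
        System.out.println(segment(text, dict, stop, 3)); // [大家, 好, 我, 渣渣灰]
    }
}
```

Adding a word to the dictionary turns three single-character tokens into one, and listing a stop word removes it from the output; the real IK output for this sentence may differ.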

6.Use the analyzer when indexing

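//Imports used by this createIndex() (IKAnalyzer comes from the IK jar added in section 5)
import java.io.File;
import org.apache.commons.io.FileUtils;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StoredField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.junit.Test;

@Test
    public void createIndex() throws Exception{

        //1.Create a Directory object pointing at the index location
        Directory directory = FSDirectory.open(new File("./index").toPath());

        //2.Create an IndexWriter on that directory, analyzing text with IKAnalyzer
        IndexWriterConfig config = new IndexWriterConfig(new IKAnalyzer());
        //Replace any index left by the earlier example instead of appending to it
        config.setOpenMode(IndexWriterConfig.OpenMode.CREATE);
        IndexWriter indexWriter = new IndexWriter(directory,config);

        //3.Read the files on disk; create one document per file
        File dir = new File("./searchsource");
        File[] files = dir.listFiles();
        for (File f : files) {
            String fileName = f.getName();
            String filePath = f.getPath();
            String fileContent = FileUtils.readFileToString(f,"utf-8");
            long fileSize = FileUtils.sizeOf(f);

            //4.Create the fields and add them to a document
            //(field names match those used by searchIndex(); path is stored only)
            Field fieldName = new TextField("filename",fileName,Field.Store.YES);
            Field fieldPath = new StoredField("path",filePath);
            Field fieldContent = new TextField("content",fileContent,Field.Store.YES);
            Field fieldSize = new TextField("size",fileSize+"",Field.Store.YES);

            Document document = new Document();
            document.add(fieldName);
            document.add(fieldPath);
            document.add(fieldContent);
            document.add(fieldSize);

            //5.Write the document to the index
            indexWriter.addDocument(document);

        }

        //6.Close the IndexWriter
        indexWriter.close();
    }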
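
Passing new IKAnalyzer() to IndexWriterConfig means the index holds the terms IK produced, so query text has to be analyzed the same way. The plain-Java sketch below uses a hypothetical class ToyAnalyzerConsistency with two stand-in "analyzers": one term per character (roughly how StandardAnalyzer handles Chinese text) and a tiny dictionary lookup in place of IK. It shows a query that matches with the indexing analyzer and misses with a different one.

```java
import java.util.*;
import java.util.function.Function;

// Toy illustration: the analyzer used at index time decides which terms end up
// in the index, so query text must be analyzed the same way or the terms will
// not line up. Both "analyzers" below are hypothetical stand-ins.
class ToyAnalyzerConsistency {
    // one term per letter/digit character
    static final Function<String, List<String>> PER_CHAR = text -> {
        List<String> out = new ArrayList<>();
        for (char c : text.toCharArray()) {
            if (Character.isLetterOrDigit(c)) out.add(String.valueOf(c));
        }
        return out;
    };

    // hypothetical word list, standing in for a dictionary-based analyzer like IK
    static final List<String> DICT = Arrays.asList("大家", "渣渣灰", "天气");
    static final Function<String, List<String>> WORDS = text -> {
        List<String> out = new ArrayList<>();
        for (String w : DICT) {
            if (text.contains(w)) out.add(w);
        }
        return out;
    };

    // inverted index: term -> sorted ids of the documents containing it
    static Map<String, Set<Integer>> buildIndex(List<String> docs, Function<String, List<String>> analyzer) {
        Map<String, Set<Integer>> index = new HashMap<>();
        for (int id = 0; id < docs.size(); id++) {
            for (String term : analyzer.apply(docs.get(id))) {
                index.computeIfAbsent(term, k -> new TreeSet<>()).add(id);
            }
        }
        return index;
    }

    // a document matches if it contains any of the analyzed query terms
    static Set<Integer> search(Map<String, Set<Integer>> index, String queryText,
                               Function<String, List<String>> analyzer) {
        Set<Integer> hits = new TreeSet<>();
        for (String term : analyzer.apply(queryText)) {
            hits.addAll(index.getOrDefault(term, Collections.<Integer>emptySet()));
        }
        return hits;
    }

    public static void main(String[] args) {
        List<String> docs = Arrays.asList("大家好，我是渣渣灰", "今天天气很好");
        Map<String, Set<Integer>> index = buildIndex(docs, WORDS);
        System.out.println(search(index, "渣渣灰", WORDS));    // [0]
        System.out.println(search(index, "渣渣灰", PER_CHAR)); // []
    }
}
```

The same mismatch applies to TermQuery, which skips analysis entirely: its term must be written exactly as the indexing analyzer produced it.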
