A Simple Example of Full-Text Search with Lucene

Latest recommended article published on 2022-08-29 19:56:57

从哪里跌倒，就在哪里躺下

Category: Frameworks. Tags: Lucene

Article link: https://blog.csdn.net/weixin_42812598/article/details/94498267

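Lucene is a Java library (an API toolkit) for full-text search and one of the most fundamental ways to implement it; the now-popular Elasticsearch (ES) is built on top of Lucene.
So how do you implement full-text search with Lucene?
Creating an index in 7 steps:
1. Specify the index location, i.e. the directory where the index is stored
2. Create an analyzer suited to Chinese, because the default analyzer does not segment Chinese text into words
3. Create the IndexWriterConfig (it takes the analyzer as its argument)
4. Create the IndexWriter
5. Create a Document object
6. Add the document to the index, which writes it to the index directory
7. Close the IndexWriter once you are done with it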
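To see why step 2 matters, it helps to compare what a tokenizer without a word dictionary can do with Chinese text (roughly one token per character) against a dictionary-based segmenter. The sketch below is a toy illustration in plain Java with no Lucene dependency. It uses forward maximum matching, which is a deliberately simple stand-in and not the algorithm SmartChineseAnalyzer uses; the class name `ToySegmenter` and its three-word dictionary are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Toy dictionary-based segmenter (forward maximum matching).
// Illustration only: this is NOT how SmartChineseAnalyzer works internally.
class ToySegmenter {
    private final Set<String> dictionary; // hypothetical word list
    private final int maxWordLength;

    ToySegmenter(Set<String> dictionary) {
        this.dictionary = dictionary;
        int max = 1;
        for (String word : dictionary) {
            max = Math.max(max, word.length());
        }
        this.maxWordLength = max;
    }

    /** Greedily takes the longest dictionary word at each position, else a single character. */
    List<String> segment(String text) {
        List<String> tokens = new ArrayList<>();
        int pos = 0;
        while (pos < text.length()) {
            int end = Math.min(text.length(), pos + maxWordLength);
            // Shrink the candidate until it is a known word or only one character is left.
            while (end > pos + 1 && !dictionary.contains(text.substring(pos, end))) {
                end--;
            }
            tokens.add(text.substring(pos, end));
            pos = end;
        }
        return tokens;
    }

    /** Without any word knowledge, Chinese text can only be split per character. */
    static List<String> perCharacter(String text) {
        List<String> tokens = new ArrayList<>();
        for (int i = 0; i < text.length(); i++) {
            tokens.add(String.valueOf(text.charAt(i)));
        }
        return tokens;
    }

    public static void main(String[] args) {
        ToySegmenter segmenter = new ToySegmenter(Set.of("执行", "创建", "镜像"));
        System.out.println(perCharacter("执行创建镜像"));      // [执, 行, 创, 建, 镜, 像]
        System.out.println(segmenter.segment("执行创建镜像")); // [执行, 创建, 镜像]
    }
}
```

With per-character tokens, a lookup for the word 镜像 finds nothing, because only 镜 and 像 were stored as separate terms. That is the practical reason for choosing a Chinese-aware analyzer before building the index.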
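Querying the index by keyword in 7 steps:
1. Specify the index location
2. Create an IndexReader
3. Create an IndexSearcher
4. Create the query
5. Run the query
6. Iterate over the results
7. Close the IndexReader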
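Querying the index with a longer sentence (the sentence is first split into terms, which are then matched):
1. Specify the index location
2. Create an IndexReader
3. Create an IndexSearcher
4. Create the analyzer
5. Create the query parser (it turns the input into a Query object)
6. Call parse on the input (this is where it is tokenized) to get the Query object
7. Run the query
8. Iterate over the results
9. Close the IndexReader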
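The difference between the two query styles above can be shown on a toy inverted index. The following is a minimal sketch in plain Java, not Lucene code: `ToyInvertedIndex` and its methods are hypothetical names, the "analyzer" simply splits on whitespace (so the sample texts are pre-segmented), and scoring is left out entirely. The sentence query combines its terms with OR, which matches the default operator of Lucene's classic QueryParser.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy inverted index: term -> ids of the documents that contain it.
class ToyInvertedIndex {
    private final Map<String, Set<Integer>> postings = new HashMap<>();

    // Stand-in for an analyzer; assumes the text is already separated by whitespace.
    static List<String> analyze(String text) {
        List<String> terms = new ArrayList<>();
        for (String term : text.trim().split("\\s+")) {
            if (!term.isEmpty()) {
                terms.add(term);
            }
        }
        return terms;
    }

    void addDocument(int docId, String text) {
        for (String term : analyze(text)) {
            postings.computeIfAbsent(term, k -> new TreeSet<>()).add(docId);
        }
    }

    // Keyword query: the input is looked up exactly as given, without analysis.
    Set<Integer> termQuery(String term) {
        return postings.getOrDefault(term, Collections.emptySet());
    }

    // Sentence query: analyze the input first, then OR the hits of each term.
    Set<Integer> sentenceQuery(String sentence) {
        Set<Integer> hits = new TreeSet<>();
        for (String term : analyze(sentence)) {
            hits.addAll(termQuery(term));
        }
        return hits;
    }

    public static void main(String[] args) {
        ToyInvertedIndex index = new ToyInvertedIndex();
        index.addDocument(0, "执行 创建 镜像");
        index.addDocument(1, "删除 镜像");
        index.addDocument(2, "中国 制造");

        System.out.println(index.termQuery("镜像"));          // [0, 1]
        System.out.println(index.termQuery("创建 镜像"));     // [] (no single term looks like this)
        System.out.println(index.sentenceQuery("创建 镜像")); // [0, 1]
    }
}
```

The second lookup is the case to watch for: a keyword query is never analyzed, so it only matches if the input is exactly one of the stored terms.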

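import java.io.File;
import java.nio.file.Paths;

import org.apache.commons.io.FileUtils;
import org.apache.lucene.analysis.cn.smart.SmartChineseAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.Term;
import org.apache.lucene.queryparser.classic.QueryParser;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.springframework.stereotype.Component;

@Component
public class LuceneIndexTest {

   // Where the index is stored
   private static final String INDEX_DIR = "E:\\study\\Lucene_Index";
   // Directory holding the original files to be indexed
   private static final String DOC_DIR = "E:\\study\\Lucene_Docment";
   // Name of the field holding the file content; must be identical when indexing and searching
   private static final String CONTENT_FIELD = "content";

   /**
    * Create the index.
    * @throws Exception if the files cannot be read or the index cannot be written
    */
   public void createIndex() throws Exception {
      // 1. Specify the index location
      // 2. Create an analyzer suited to Chinese, because the default analyzer does not segment Chinese into words.
      //    There are many analyzers: SmartChineseAnalyzer handles Chinese comparatively well,
      //    CJKAnalyzer covers Chinese, Japanese and Korean, and StandardAnalyzer (the default)
      //    does not produce Chinese words.
      // 3-4. Create the IndexWriterConfig (it takes the analyzer) and the IndexWriter
      // 7. try-with-resources closes the IndexWriter, the analyzer and the directory when done
      try (Directory directory = FSDirectory.open(Paths.get(INDEX_DIR));
           SmartChineseAnalyzer smartChineseAnalyzer = new SmartChineseAnalyzer();
           IndexWriter indexWriter = new IndexWriter(directory, new IndexWriterConfig(smartChineseAnalyzer))) {

         // Regular files in the source directory (subdirectories are skipped)
         File[] files = new File(DOC_DIR).listFiles(File::isFile);
         if (files == null) {
            throw new IllegalStateException("Not a readable directory: " + DOC_DIR);
         }
         for (File f : files) {
            // File name
            String fileName = f.getName();
            // File content
            String fileContent = FileUtils.readFileToString(f, "UTF-8");
            System.out.println(fileContent);
            // File path
            String path = f.getPath();
            // File size
            long fileSize = FileUtils.sizeOf(f);
            // Create the fields. Arguments: field name, field value, whether to store the value.
            // There are several field types:
            // StringField: a string field that is not analyzed; the whole string is indexed as a single term
            // TextField: analyzed (tokenized) and indexed, suitable for full-text search
            Field fileNameField = new TextField("name", fileName, Field.Store.YES);
            Field fileContentField = new TextField(CONTENT_FIELD, fileContent, Field.Store.YES);
            Field filePathField = new TextField("path", path, Field.Store.YES);
            Field fileSizeField = new TextField("size", String.valueOf(fileSize), Field.Store.YES);
            // 5. Create a Document object
            Document document = new Document();
            document.add(fileNameField);
            document.add(fileContentField);
            document.add(filePathField);
            document.add(fileSizeField);
            // 6. Add the document to the index
            indexWriter.addDocument(document);
         }
      }
   }

   /**
    * Query the index by keyword.
    * @throws Exception if the index cannot be read
    */
   public void searchIndex() throws Exception {
      // 1. Specify the index location
      // 2. Create an IndexReader (closed automatically, together with the directory)
      try (Directory directory = FSDirectory.open(Paths.get(INDEX_DIR));
           IndexReader indexReader = DirectoryReader.open(directory)) {
         // 3. Create an IndexSearcher
         IndexSearcher indexSearcher = new IndexSearcher(indexReader);
         // 4. Create the query: a TermQuery is not analyzed, so the term must equal an indexed term exactly
         Query query = new TermQuery(new Term(CONTENT_FIELD, "镜像"));
         // 5-6. Run the query and iterate over the results
         printHits(indexSearcher, query);
      }
   }

   /**
    * Query the index with a longer sentence: the sentence is tokenized first and the terms are then matched.
    * For example: 执行创建镜像 = 执行 + 创建 + 镜像
    * @throws Exception if the index cannot be read or the input cannot be parsed
    */
   public void searchIndexByTxt() throws Exception {
      // 1. Specify the index location
      // 2. Create an IndexReader
      // 4. Create the analyzer (the same kind that was used for indexing)
      // 9. try-with-resources closes and releases everything when done
      try (Directory directory = FSDirectory.open(Paths.get(INDEX_DIR));
           IndexReader indexReader = DirectoryReader.open(directory);
           SmartChineseAnalyzer smartChineseAnalyzer = new SmartChineseAnalyzer()) {
         // 3. Create an IndexSearcher
         IndexSearcher indexSearcher = new IndexSearcher(indexReader);
         // 5. Create the query parser (it turns the input into a Query object)
         QueryParser parser = new QueryParser(CONTENT_FIELD, smartChineseAnalyzer);
         // 6. Parse the input (this is where it is tokenized) to get the Query object
         Query query = parser.parse("中国制造");
         // 7-8. Run the query and iterate over the results
         printHits(indexSearcher, query);
      }
   }

   /**
    * Run the query and print the name and path of each hit.
    */
   private void printHits(IndexSearcher indexSearcher, Query query) throws Exception {
      // First argument: the query; second argument: the maximum number of results to return
      TopDocs topDocs = indexSearcher.search(query, 10);
      System.out.println("Total hits: " + topDocs.totalHits);
      for (ScoreDoc scoreDoc : topDocs.scoreDocs) {
         Document doc = indexSearcher.doc(scoreDoc.doc);
         // get(...) returns the stored string value of the field
         System.out.println(doc.get("name"));
         System.out.println(doc.get("path"));
      }
   }
}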

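That completes a simple Lucene full-text search. Here I build the index by reading files; in practice the index is more often built from database records. The code above makes Lucene's basic principle clear: text is tokenized and the resulting terms are stored; at query time the query is matched against those terms, and documents containing a matching term are returned. Besides keyword and sentence queries, Lucene also supports range queries, for example on size between a minimum and a maximum value. In real projects, however, I would not recommend using Lucene directly, since it is only an API library; use Elasticsearch (ES) instead. ES works well in distributed systems, behaves much like a database in its own right, and supports clustering.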
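To make the idea of a range query on size concrete, here is a minimal plain-Java sketch that keeps file names in a map sorted by size and returns those within a minimum and maximum. It only illustrates the concept with a sorted map and is not how Lucene stores numeric values; `ToySizeIndex`, its methods and the sample files are hypothetical. In Lucene itself, a numeric range query would normally require indexing the size as a numeric point field (for example `LongPoint`) and querying it with `LongPoint.newRangeQuery`, rather than storing it as text as the listing above does.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

// Toy "range query": file names kept in a map sorted by file size.
class ToySizeIndex {
    private final NavigableMap<Long, List<String>> bySize = new TreeMap<>();

    void add(String fileName, long sizeInBytes) {
        bySize.computeIfAbsent(sizeInBytes, k -> new ArrayList<>()).add(fileName);
    }

    /** Returns the names of files whose size lies in [min, max], in ascending order of size. */
    List<String> range(long min, long max) {
        List<String> hits = new ArrayList<>();
        if (min > max) {
            return hits; // empty range; subMap would otherwise throw
        }
        for (List<String> names : bySize.subMap(min, true, max, true).values()) {
            hits.addAll(names);
        }
        return hits;
    }

    public static void main(String[] args) {
        ToySizeIndex index = new ToySizeIndex();
        index.add("a.txt", 100);
        index.add("b.txt", 2048);
        index.add("c.txt", 50000);

        System.out.println(index.range(1000, 10000)); // [b.txt]
        System.out.println(index.range(100, 2048));   // [a.txt, b.txt]
    }
}
```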

---------------------------Not well written; for reference only-----------------------------------------------------