Java from Zero: Elasticsearch

Latest recommended article published on 2024-08-17 13:00:00

动力节点IT教育

Category: Java Zero-Basics Teaching Documents. Tags: java, elasticsearch, programming languages

Article link: https://blog.csdn.net/m0_47946173/article/details/134870946

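This article is an introductory Elasticsearch tutorial. It covers the basics of Elasticsearch, including its relationship to Lucene, a comparison with Solr, the concept of the inverted index, how tokenization works, and how to install and configure Elasticsearch. It also covers the core Elasticsearch concepts (documents, indices, nodes, and clusters), basic usage of Elasticsearch from Spring Boot, and an overview of complex query operations.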

1.Introduction to Elasticsearch

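Elasticsearch is an open-source, distributed, RESTful search and data analytics engine built on Lucene. It is written in Java and was originally released as open source under the Apache License (since version 7.11 it is distributed under different license terms), and it is a popular enterprise search engine. It is widely used in cloud environments and offers near-real-time search, stability, reliability, speed, and easy installation. Official clients are available for Java, .NET (C#), PHP, Python, Apache Groovy, Ruby, and many other languages. According to the DB-Engines ranking, Elasticsearch is the most popular enterprise search engine, followed by Apache Solr, which is also based on Lucene.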

2.The Lucene core library

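Lucene is arguably the most advanced, high-performance, full-featured search engine library available today, open source or proprietary, but it is only a library. To take full advantage of it, you need to use Java and integrate Lucene directly into your application. Worse, Lucene is so complex that you may need a degree in information retrieval to understand how it works.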

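Elasticsearch was created to remove this complexity. It is written in Java and uses Lucene internally for indexing and searching, but its goal is to make full-text search simple. In short, it wraps Lucene and exposes a simple, consistent RESTful API for storing and retrieving data.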

3.Comparison with Solr

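Elasticsearch vs. Solr: summary

Elasticsearch works essentially out of the box and is very simple to set up; Solr is slightly more involved to install.
Solr relies on ZooKeeper for distributed coordination, while Elasticsearch has distributed coordination built in.
Solr accepts more data formats, such as JSON, XML, and CSV, while Elasticsearch only accepts JSON.
Solr ships more features officially, while Elasticsearch focuses on core functionality and leaves many advanced features to plugins and companion tools, for example Kibana or head for a graphical interface, and analyzer plugins for tokenization.
Solr is fast at querying but slow while the index is being updated (inserts and deletes), so it was traditionally used for query-heavy applications such as e-commerce.

Elasticsearch builds indexes quickly, so newly written data becomes searchable in near real time; it is used for real-time search at sites such as Twitter and Sina.

Solr is a strong solution for traditional search applications, while Elasticsearch is a better fit for newer real-time search applications.

Solr is more mature, with a larger and more established community of users, developers, and contributors; Elasticsearch historically had fewer developers and maintainers, changed very quickly between releases, and had a higher learning cost. (Elasticsearch is now very popular as well.)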

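4.Inverted index (key topic)

A forward index finds a row of data by its id (in MySQL, this is the B+tree clustered index).

Non-clustered (secondary) index: an index built on one column; a query on that column finds the id of the matching row.

Back-to-table lookup: the id is then used to search the clustered index and fetch the full row.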
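
The two-step lookup described above can be sketched in a few lines of Java. This is only an illustration under stated assumptions: the class name ForwardIndexSketch, the sample rows, and the use of TreeMap as a stand-in for a B+tree are all hypothetical and are not how MySQL is implemented.

```java
import java.util.Map;
import java.util.TreeMap;

/** Sketch: a clustered index plus one secondary index, both modelled with TreeMap. */
class ForwardIndexSketch {
    // Clustered index: id -> full row (the row is simplified to a String).
    static final Map<Integer, String> clustered = new TreeMap<>();
    // Secondary index on the name column: name -> id (names are assumed unique here).
    static final Map<String, Integer> byName = new TreeMap<>();

    static void insert(int id, String name, String row) {
        clustered.put(id, row);
        byName.put(name, id);
    }

    /** Looks up by name: secondary index first, then back to the clustered index. */
    static String findByName(String name) {
        Integer id = byName.get(name);                 // step 1: column value -> id
        return id == null ? null : clustered.get(id);  // step 2: id -> full row
    }

    public static void main(String[] args) {
        insert(1, "P40", "id=1, name=P40, price=4999");
        insert(2, "Mate30", "id=2, name=Mate30, price=3999");
        System.out.println(findByName("Mate30")); // prints: id=2, name=Mate30, price=3999
    }
}
```

Both steps go from a key to a record. Neither helps when the question is "which rows contain this word somewhere in a text column", which is the case the inverted index addresses.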

4.1 Forward index
[Figure: forward index]

4.2 Inverted index
[Figure: inverted index]

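An inverted index consists of a list of all the unique terms that appear in the documents and, for each term, a list of the documents that contain it.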
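
As a rough illustration of that structure, the following sketch builds a term-to-document-ids map in plain Java. The class name InvertedIndexSketch, the sample documents, and the whitespace-based splitting are assumptions made for the example; a real engine uses a proper analyzer and a far more compact on-disk format.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

class InvertedIndexSketch {
    /** Builds term -> sorted set of ids of the documents containing that term. */
    static Map<String, Set<Integer>> build(Map<Integer, String> docs) {
        Map<String, Set<Integer>> index = new HashMap<>();
        docs.forEach((id, text) ->
                Arrays.stream(text.toLowerCase().split("\\s+"))
                        .filter(term -> !term.isEmpty())
                        // Create the id set on first sight of a term, then record this document.
                        .forEach(term -> index.computeIfAbsent(term, k -> new TreeSet<>()).add(id)));
        return index;
    }

    public static void main(String[] args) {
        Map<Integer, String> docs = new HashMap<>();
        docs.put(1, "The quick brown fox");
        docs.put(2, "Quick brown dogs");
        docs.put(3, "The lazy dog");
        Map<String, Set<Integer>> index = build(docs);
        System.out.println(index.get("quick")); // prints: [1, 2]
        System.out.println(index.get("the"));   // prints: [1, 3]
    }
}
```

Searching for a term is then a single map lookup that returns the ids of every matching document, instead of a scan over all document text.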

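5.Tokenization

Tokenization splits a sentence into words according to a set of rules; for Chinese text, the split follows how native speakers naturally read the sentence.

海上生明月 ("the bright moon rises over the sea") -----> 海上 | 生 | 明月

我想要个女朋友 ("I want a girlfriend") -----> 我 | 想要 | 要个 | 女朋友 | 朋友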
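
One simple rule-based approach is dictionary-driven forward maximum matching: at each position, take the longest dictionary word that starts there, and fall back to a single character. The sketch below is an assumption-laden illustration (the class name MaxMatchSketch and the two-word dictionary are hypothetical) and is not the algorithm used by Jieba or by Elasticsearch analyzers.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class MaxMatchSketch {
    /** Forward maximum matching: longest dictionary word at each position, else one character. */
    static List<String> segment(String text, Set<String> dict, int maxLen) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < text.length()) {
            int end = Math.min(text.length(), i + maxLen);
            // Shrink the candidate until it is a dictionary word or a single character.
            while (end > i + 1 && !dict.contains(text.substring(i, end))) {
                end--;
            }
            tokens.add(text.substring(i, end));
            i = end;
        }
        return tokens;
    }

    public static void main(String[] args) {
        Set<String> dict = new HashSet<>(Arrays.asList("海上", "明月"));
        System.out.println(String.join(" | ", segment("海上生明月", dict, 4)));
        // prints: 海上 | 生 | 明月
    }
}
```

This reproduces the first example above. It never emits overlapping tokens, so it cannot produce a result like the second example, where 想要 and 要个 share a character; that kind of output comes from an index-oriented mode that emits every plausible word.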

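6.Simulating an inverted index

How it works, step by step:

Before storing the data in MySQL, tokenize it.
Put the tokens, together with the id returned by the insert, into the data structure Map<String, Set<Integer>> index.
At query time, tokenize the query first, then fetch the Set<Integer> ids from index.
Query MySQL by those ids to get the results; this takes advantage of MySQL's B+tree index and improves performance.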

6.1 Create a Spring Boot project and select the dependencies
[Figure: selected dependencies]

6.2 Add the tokenizer dependency

<dependency>
    <groupId>com.huaban</groupId>
    <artifactId>jieba-analysis</artifactId>
    <version>1.0.2</version>
</dependency>

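6.3 Modify the launcher class and register the Jieba segmenter

/**
 * Registers the Jieba segmenter as a bean in the IoC container.
 *
 * @return a new JiebaSegmenter instance
 */
@Bean
public JiebaSegmenter jiebaSegmenter() {
    return new JiebaSegmenter();
}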

6.4 Test tokenization

@Autowired
private JiebaSegmenter jiebaSegmenter;

@Test
void testJieBa() {
    String words = "华为 HUAWEI P40 Pro 麒麟990 5G SoC芯片 5000万超感知徕卡四摄 50倍数字变焦 8GB+256GB零度白全网通5G手机";
    // Tokenize the string with Jieba in INDEX mode
    List<SegToken> tokens = jiebaSegmenter.process(words, JiebaSegmenter.SegMode.INDEX);
    // Take the word field of each SegToken and print it
    tokens.stream()
            .map(token -> token.word)
            .forEach(System.out::println);
}

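6.5 A product search example that shows the inverted index structure

6.5.1 Create the Goods class

import lombok.AllArgsConstructor;
import lombok.Data;
import lombok.NoArgsConstructor;

@Data
@AllArgsConstructor
@NoArgsConstructor
public class Goods {
    /**
     * Product id
     */
    private Integer goodsId;

    /**
     * Product name (the main field for tokenization and search)
     */
    private String goodsName;

    /**
     * Product price
     */
    private Double goodsPrice;

}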

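6.5.2 Simulate the database: create the DBUtil class

import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

import org.springframework.util.CollectionUtils;

public class DBUtil {
    /**
     * Simulated database: key = id, value = product object.
     * A List could also be used here.
     */
    public static Map<Integer, Goods> db = new HashMap<>();

    /**
     * Inserts a product into the database.
     *
     * @param goods the product to store
     */
    public static void insert(Goods goods) {
        db.put(goods.getGoodsId(), goods);
    }

    /**
     * Gets a product by id.
     *
     * @param id the product id
     * @return the product, or null if the id is unknown
     */
    public static Goods getGoodsById(Integer id) {
        return db.get(id);
    }

    /**
     * Gets the products for a set of ids.
     *
     * @param ids the product ids
     * @return the products found; ids with no match are skipped
     */
    public static List<Goods> getGoodsByIds(Set<Integer> ids) {
        if (CollectionUtils.isEmpty(ids)) {
            return Collections.emptyList();
        }
        List<Goods> goods = new ArrayList<>(ids.size());
        // Look up each id and keep only the products that exist
        ids.forEach(id -> {
            Goods g = db.get(id);
            if (g != null) {
                goods.add(g);
            }
        });
        return goods;
    }
}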

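6.5.3 Create the inverted index data structure

import java.util.HashMap;
import java.util.Map;
import java.util.Set;

public class InvertedIndex {
    /**
     * Inverted index: key = token, value = ids of the products whose name contains it
     */
    public static Map<String, Set<Integer>> index = new HashMap<>();

}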

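6.5.4 Create the GoodsService interface

import java.util.List;

public interface GoodsService {

    /**
     * Adds a product.
     *
     * @param goods the product to add
     */
    void addGoods(Goods goods);

    /**
     * Finds products by product name.
     *
     * @param name the product name
     * @return the matching products
     */
    List<Goods> findGoodsByName(String name);

    /**
     * Finds products by keywords.
     *
     * @param keywords the search keywords
     * @return the matching products
     */
    List<Goods> findGoodsByKeywords(String keywords);

}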

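6.5.5 Create the GoodsServiceImpl implementation class

import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

import com.huaban.analysis.jieba.JiebaSegmenter;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Service;
import org.springframework.util.CollectionUtils;

@Service
public class GoodsServiceImpl implements GoodsService {

    @Autowired
    private JiebaSegmenter jiebaSegmenter;

    /**
     * Adds a product.
     * 1. Tokenize the product name to get List<String> tokens.
     * 2. Insert the product into the database to get its id.
     * 3. Put the tokens and the id into the inverted index.
     *
     * @param goods the product to add
     */
    @Override
    public void addGoods(Goods goods) {
        // Tokenize the product name (fenci is pinyin for "tokenize")
        List<String> keywords = fenci(goods.getGoodsName());
        // Insert into the database
        DBUtil.insert(goods);
        // Save to the inverted index
        saveToInvertedIndex(keywords, goods.getGoodsId());
    }

    /**
     * Saves tokens to the inverted index.
     *
     * @param keywords the tokens of the product name
     * @param goodsId  the product id
     */
    private void saveToInvertedIndex(List<String> keywords, Integer goodsId) {
        // Get the index
        Map<String, Set<Integer>> index = InvertedIndex.index;
        // Loop over the tokens
        keywords.forEach(keyword -> {
            Set<Integer> ids = index.get(keyword);
            if (CollectionUtils.isEmpty(ids)) {
                // This token has not been seen before, so add it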
                HashSet<Integer> newIds = new HashSet<>(2);
                newIds.add(goodsId)