Integrating Spring Boot with Elasticsearch to Implement Website Hot Search Terms and Hotness Scoring

月下独码

Last modified: 2024-09-15 09:28:42


First published: 2024-09-15 06:30:00

Article link: https://blog.csdn.net/lilinhai548/article/details/142268241



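A hot search term is a term that users search for frequently within a given time window. A website identifies its hot terms by collecting statistics on users' search behavior.

Hot search terms are an important part of improving user experience and search efficiency on modern websites. By analyzing search behavior in near real time, a site can show users the most popular keywords and attach a numeric hotness score to each one. This article walks through integrating Spring Boot with Elasticsearch (ES) to implement hot search terms and compute a hotness score for each term.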

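1. How hot search terms work

Hot search terms are usually computed from the following signals:

Search frequency: how many times a query appears within a given time window.
Click-through rate: the proportion of searches for a query in which the user clicks a result.
Time decay: recent searches carry more weight, and the weight of older searches gradually decays.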
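
The three signals above can be folded into a single number in many ways. The following is a minimal sketch of one option, not the formula used later in this article: every logged search contributes one unit plus a weighted click count, and the contribution is halved for each half-life that has elapsed. The class HotnessScore, the SearchEvent type, and the halfLife and clickWeight parameters are all hypothetical names introduced for illustration.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.List;

/** Hypothetical helper: combines frequency, clicks and time decay into one score. */
final class HotnessScore {

    /** One logged search for a term (assumed shape, not the article's entity). */
    static final class SearchEvent {
        final Instant timestamp;
        final int clickCount;

        SearchEvent(Instant timestamp, int clickCount) {
            this.timestamp = timestamp;
            this.clickCount = clickCount;
        }
    }

    /** Weight of an event of the given age: 1.0 when new, 0.5 after one half-life. */
    static double decayWeight(Duration age, Duration halfLife) {
        double halfLives = (double) age.toMillis() / halfLife.toMillis();
        // Events stamped in the future are treated as brand new.
        return Math.pow(0.5, Math.max(0.0, halfLives));
    }

    /** Sums (1 search + clickWeight * clicks) over all events, scaled by each event's decay weight. */
    static double score(List<SearchEvent> events, Instant now, Duration halfLife, double clickWeight) {
        double total = 0.0;
        for (SearchEvent e : events) {
            double weight = decayWeight(Duration.between(e.timestamp, now), halfLife);
            total += weight * (1.0 + clickWeight * e.clickCount);
        }
        return total;
    }
}
```

With clickWeight set to 1 and every event stamped at the current time, this reduces to the plain "search count plus click total" score. A shorter half-life makes the ranking react faster to new trends at the cost of more churn.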

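The role of Elasticsearch (ES) in hot search
ES is a distributed search and analytics engine that is well suited to searching large volumes of text. Its strengths include fast queries, a distributed architecture, and horizontal scalability. For hot search, we store each user search term in ES and use aggregations to count how often each term appears, then derive a hotness score from those counts.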
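
To make the counting step concrete, here is a small in-memory sketch of what a terms aggregation computes conceptually: group identical terms, count each group, and keep the largest groups. It is plain Java, not an Elasticsearch call, and the class name TermsCountSketch and its method are hypothetical. Breaking ties alphabetically is a choice made here only to keep the output deterministic.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

/** Hypothetical in-memory stand-in for "group by term, count, keep the top N". */
final class TermsCountSketch {

    /** Returns the topN most frequent terms with their counts, most frequent first. */
    static List<Map.Entry<String, Long>> topTerms(List<String> queries, int topN) {
        // Group identical terms and count how many times each one occurs.
        Map<String, Long> counts = queries.stream()
                .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));
        // Sort by count descending, then by term so that ties have a stable order.
        return counts.entrySet().stream()
                .sorted(Map.Entry.<String, Long>comparingByValue(Comparator.reverseOrder())
                        .thenComparing(Map.Entry.comparingByKey()))
                .limit(topN)
                .collect(Collectors.toList());
    }
}
```

For example, topTerms(Arrays.asList("java", "es", "java"), 1) yields a single entry for "java" with a count of 2. ES performs the same grouping per shard and merges the results, which is why it scales to log volumes that would not fit in one JVM.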

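2. Design

2.1 Data collection and preprocessing

Search logs: record each query that users submit.
Click logs: record which search results users click.
Preprocessing: tokenization, deduplication, and normalization.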
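
The normalization step can be sketched as follows. This is an illustrative helper under stated assumptions, not code from the article: the class name QueryNormalizer is hypothetical, and the chosen rules (Unicode NFKC folding, whitespace collapsing, lower-casing) are just one reasonable set. Tokenization is left out on purpose, since an ES analyzer would normally handle it.

```java
import java.text.Normalizer;
import java.util.Locale;

/** Hypothetical helper that maps equivalent spellings of a query to one canonical form. */
final class QueryNormalizer {

    /** Returns the canonical form of a raw query, or an empty string for null input. */
    static String normalize(String rawQuery) {
        if (rawQuery == null) {
            return "";
        }
        // NFKC folds compatibility characters, e.g. full-width letters (U+FF21..U+FF3A)
        // and the ideographic space (U+3000), into their plain equivalents.
        String s = Normalizer.normalize(rawQuery, Normalizer.Form.NFKC);
        // Collapse every run of whitespace into a single space and trim both ends.
        s = s.replaceAll("\\s+", " ").trim();
        // Lower-case with a fixed locale so the result does not depend on the server's settings.
        return s.toLowerCase(Locale.ROOT);
    }
}
```

Normalizing before the log entry is written means that "Spring  Boot" and "spring boot" are stored, and therefore counted, as the same term, which covers the deduplication goal at the term level.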

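2.2 Storage and indexing

Store the preprocessed search queries in Elasticsearch and create a matching index.

2.3 Computing hot search terms and hotness

Use an Elasticsearch terms aggregation to group the logs by search term and count how many times each term appears, then apply a time decay function to adjust each term's weight and obtain its hotness score.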

2.4 Caching and refreshing

To improve performance, cache the hot search terms and their scores, and refresh the cache periodically.
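
The idea behind this step is that readers always get a precomputed snapshot while a background job replaces it from time to time. The sketch below shows that pattern in plain Java as an in-process stand-in for an external cache; the class RefreshableCache and its methods are hypothetical, and keeping the old snapshot when a refresh fails is an assumption about the desired behavior.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Supplier;

/** Hypothetical snapshot cache: reads are cheap, refresh() swaps in a new list atomically. */
final class RefreshableCache<T> {

    private final Supplier<List<T>> loader;
    private final AtomicReference<List<T>> snapshot =
            new AtomicReference<>(Collections.<T>emptyList());

    RefreshableCache(Supplier<List<T>> loader) {
        this.loader = loader;
    }

    /** Reloads the data; returns false and keeps the previous snapshot if loading fails. */
    boolean refresh() {
        try {
            List<T> fresh = loader.get();
            // Copy defensively so later changes to the loader's list cannot leak in.
            snapshot.set(Collections.unmodifiableList(new ArrayList<>(fresh)));
            return true;
        } catch (RuntimeException e) {
            return false;
        }
    }

    /** Returns the most recent successfully loaded snapshot (empty before the first refresh). */
    List<T> get() {
        return snapshot.get();
    }
}
```

A scheduler would call refresh() at a fixed interval, while request handlers only call get(). Serving the last good snapshot keeps the hot search list available even when the backing store is briefly unreachable.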

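3. Implementation steps

3.1 Environment

Spring Boot: for quickly building the web application.
Elasticsearch: for storing and querying search logs.
Redis: for caching the hot search terms and their scores.

3.2 Create the Spring Boot project

Use Spring Initializr to create a new Spring Boot project and add the following dependencies: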

<dependencies>
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-web</artifactId>
    </dependency>
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-data-elasticsearch</artifactId>
    </dependency>
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-data-redis</artifactId>
    </dependency>
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-quartz</artifactId>
    </dependency>
</dependencies>

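3.3 Configure Elasticsearch and Redis

Configure Elasticsearch and Redis in application.properties. The property names below follow the Spring Boot 2.x line that the code in this article targets; Spring Boot 2.6 and later prefer spring.elasticsearch.uris, and Spring Boot 3.x moves the Redis settings under spring.data.redis.*.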

spring.elasticsearch.rest.uris=http://localhost:9200
spring.redis.host=localhost
spring.redis.port=6379

3.4 Create the index mapping

Create an index mapping for storing search logs:

PUT /search_logs
{
  "mappings": {
    "properties": {
      "query": {
        "type": "text",
        "fields": {
          "keyword": { "type": "keyword", "ignore_above": 256 }
        }
      },
      "timestamp": {
        "type": "date"
      },
      "user_id": {
        "type": "keyword"
      },
      "click_count": {
        "type": "integer"
      }
    }
  }
}

3.5 Data collection and preprocessing

Create a SearchLog entity class to represent a search log entry:

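import org.springframework.data.annotation.Id;
import org.springframework.data.elasticsearch.annotations.Document;
import org.springframework.data.elasticsearch.annotations.Field;
import org.springframework.data.elasticsearch.annotations.FieldType;
import org.springframework.data.elasticsearch.annotations.InnerField;
import org.springframework.data.elasticsearch.annotations.MultiField;

import java.util.Date;

@Document(indexName = "search_logs") // Index name
public class SearchLog {
    @Id // Document ID
    private String id;

    // Full-text field with a "keyword" sub-field, so that query.keyword can be aggregated
    @MultiField(
            mainField = @Field(type = FieldType.Text),
            otherFields = @InnerField(suffix = "keyword", type = FieldType.Keyword))
    private String query;

    @Field(type = FieldType.Date) // Date field
    private Date timestamp;

    // Stored as user_id to match the index mapping
    @Field(name = "user_id", type = FieldType.Keyword)
    private String userId;

    // Stored as click_count to match the index mapping and the sum aggregation
    @Field(name = "click_count", type = FieldType.Integer)
    private int clickCount;

    // Getters and Setters
}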

Create a SearchLogRepository interface for working with Elasticsearch:

import org.springframework.data.elasticsearch.repository.ElasticsearchRepository;

public interface SearchLogRepository extends ElasticsearchRepository<SearchLog, String> {
    // Extending ElasticsearchRepository provides the basic CRUD operations
}

3.6 Computing hot search terms and hotness

Create a HotSearchService class that computes the hot search terms and their scores:

import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.search.aggregations.AggregationBuilders;
import org.elasticsearch.search.aggregations.BucketOrder;
import org.elasticsearch.search.aggregations.bucket.terms.Terms;
import org.elasticsearch.search.aggregations.metrics.Sum;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.data.domain.PageRequest;
import org.springframework.data.elasticsearch.core.ElasticsearchRestTemplate;
import org.springframework.data.elasticsearch.core.SearchHits;
import org.springframework.data.elasticsearch.core.query.NativeSearchQueryBuilder;
import org.springframework.stereotype.Service;

import java.util.List;
import java.util.stream.Collectors;

@Service
public class HotSearchService {

    @Autowired
    private ElasticsearchRestTemplate elasticsearchTemplate;

    public List<HotSearchResult> getHotSearches() {
        NativeSearchQueryBuilder queryBuilder = new NativeSearchQueryBuilder()
                .withQuery(QueryBuilders.matchAllQuery()) // Match every log entry
                .addAggregation(AggregationBuilders.terms("hot_searches") // Group by search term
                        .field("query.keyword") // Aggregate on the keyword sub-field of query
                        .size(10) // Return the top 10 terms
                        .order(BucketOrder.count(false)) // Sort by search count, descending
                        .subAggregation(AggregationBuilders.sum("click_sum") // Sub-aggregation: total clicks
                                .field("click_count"))) // Summed field is click_count
                .withPageable(PageRequest.of(0, 1)); // Only the aggregation is used, so keep the hit page small

        SearchHits<SearchLog> searchHits = elasticsearchTemplate.search(queryBuilder.build(), SearchLog.class);
        // Spring Data Elasticsearch 4.0-4.2 returns Aggregations here;
        // 4.3 and later wrap the result in an AggregationsContainer.
        Terms hotSearches = searchHits.getAggregations().get("hot_searches");

        return hotSearches.getBuckets().stream()
                .map(bucket -> {
                    String query = bucket.getKeyAsString(); // The search term
                    long count = bucket.getDocCount(); // Number of searches
                    Sum clickSum = bucket.getAggregations().get("click_sum"); // Click sub-aggregation
                    double clickCount = clickSum.getValue(); // Total clicks for this term
                    double score = count + clickCount; // Hotness: searches plus clicks (no time decay applied here)
                    return new HotSearchResult(query, score);
                })
                .collect(Collectors.toList());
    }
}

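class HotSearchResult {
    private final String query;
    private final double score;

    public HotSearchResult(String query, double score) {
        this.query = query;
        this.score = score;
    }

    // Getters are required by the cache updater and by JSON serialization
    public String getQuery() {
        return query;
    }

    public double getScore() {
        return score;
    }
}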

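3.7 Caching and refreshing

Cache the hot search terms and scores in Redis and refresh them on a schedule. Note that @Scheduled only fires when scheduling is enabled, for example with @EnableScheduling on a configuration class: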

import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.data.redis.core.StringRedisTemplate;
import org.springframework.scheduling.annotation.Scheduled;
import org.springframework.stereotype.Component;

import java.util.List;
import java.util.stream.Collectors;

@Component
public class HotSearchCacheUpdater {

    @Autowired
    private HotSearchService hotSearchService;

    @Autowired
    private StringRedisTemplate redisTemplate;

    @Scheduled(fixedRate = 60000) // Refresh once per minute
    public void updateHotSearches() {
        List<HotSearchResult> hotSearches = hotSearchService.getHotSearches();
        // Encode as "term:score,term:score"; this is a plain delimited string, not JSON
        String encoded = hotSearches.stream()
                .map(result -> result.getQuery() + ":" + result.getScore())
                .collect(Collectors.joining(","));
        redisTemplate.opsForValue().set("hot_searches", encoded); // Store the result in Redis
    }
}

3.8 Exposing the hot search API

Create a HotSearchController that exposes the hot search terms and their scores:

import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.data.redis.core.StringRedisTemplate;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RestController;

import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.stream.Collectors;

@RestController
public class HotSearchController {

    @Autowired
    private StringRedisTemplate redisTemplate;

    @GetMapping("/hot-searches")
    public List<HotSearchResult> getHotSearches() {
        String cached = redisTemplate.opsForValue().get("hot_searches"); // Read the cached value from Redis
        if (cached == null || cached.isEmpty()) {
            return Collections.emptyList(); // Cache not populated yet, or no searches logged
        }
        return Arrays.stream(cached.split(",")) // One "term:score" entry per comma
                .map(entry -> {
                    // Split at the last colon so that a term containing ':' still parses.
                    // A term containing ',' would still break this simple format.
                    int sep = entry.lastIndexOf(':');
                    return new HotSearchResult(entry.substring(0, sep),
                            Double.parseDouble(entry.substring(sep + 1)));
                })
                .collect(Collectors.toList());
    }
}
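
Because the cached value is a "term:score" list joined by commas, a search term that itself contains a comma or a colon can corrupt it. One possible way to harden the format, sketched here as an assumption and not as the article's implementation, is to percent-encode each term before joining. The class HotSearchCodec is a hypothetical name, and the Charset overloads of URLEncoder.encode and URLDecoder.decode require Java 10 or later.

```java
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical codec for the "term:score,term:score" cache value with escaped terms. */
final class HotSearchCodec {

    /** Joins the entries in iteration order, percent-encoding each term. */
    static String encode(Map<String, Double> scoresByQuery) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, Double> e : scoresByQuery.entrySet()) {
            if (sb.length() > 0) {
                sb.append(',');
            }
            // URL encoding escapes ',' and ':' so they cannot collide with the delimiters.
            sb.append(URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8))
              .append(':')
              .append(e.getValue());
        }
        return sb.toString();
    }

    /** Parses a value produced by encode(); null or empty input yields an empty map. */
    static Map<String, Double> decode(String cached) {
        Map<String, Double> result = new LinkedHashMap<>();
        if (cached == null || cached.isEmpty()) {
            return result;
        }
        for (String entry : cached.split(",")) {
            int sep = entry.lastIndexOf(':');
            String query = URLDecoder.decode(entry.substring(0, sep), StandardCharsets.UTF_8);
            result.put(query, Double.parseDouble(entry.substring(sep + 1)));
        }
        return result;
    }
}
```

The writer would call encode before storing the value and the reader would call decode after fetching it. Serializing the list as real JSON, or storing terms in a Redis sorted set keyed by score, are alternative designs that avoid a hand-rolled format altogether.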