Build a Slacking-Off Mini Site for 100 Yuan! Part 5: Fetching Hot-Search Data on a Schedule with xxl-job

Latest recommended article published on 2024-10-24 02:53:53

堇翳

⭐️ Basic links ⭐️

Server → ☁️ Alibaba Cloud promotion page

Live demo → 🐟 the slacking-off mini site

Source code → 💻 source repository

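I. Preface

We now have a complete hot-search component, from back end to front end, which is the core feature of this little site. From here on we will keep polishing it so that it looks better and is more useful. Today's topic is fetching hot-search data on a schedule: if the data is not refreshed regularly, the site loses its core value. I previously used the @Scheduled annotation for the scheduled tasks, but that approach is not flexible enough, so I decided to replace it with the more flexible XXL-Job.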

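II. Deploying xxl-job

xxl-job is a lightweight distributed task-scheduling platform whose core design goals are rapid development, a gentle learning curve, a light footprint, and easy extensibility. Its GitHub repository currently has 27.3k stars, and it is free and open source, so it is well worth learning.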

1. Download the repository

GitHub repository address

After downloading, the source tree is organized as follows:

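xxl-job-admin: the scheduling center
xxl-job-core: shared dependencies
xxl-job-executor-samples: sample executors (pick the variant that suits you; a sample can be used directly, or serve as a reference for turning an existing project into an executor)
    : xxl-job-executor-sample-springboot: Spring Boot version, where Spring Boot manages the executor; this is the recommended approach;
    : xxl-job-executor-sample-frameless: framework-free version;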

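Scheduling center configuration:

### Scheduling center JDBC URL: must match the address of the scheduling database initialized in step 2 (table initialization) below
spring.datasource.url=jdbc:mysql://127.0.0.1:3306/xxl_job?useUnicode=true&characterEncoding=UTF-8&autoReconnect=true&serverTimezone=Asia/Shanghai
spring.datasource.username=xxx
spring.datasource.password=xxx
### MySQL Connector/J 8+ driver class (the older com.mysql.jdbc.Driver name is deprecated)
spring.datasource.driver-class-name=com.mysql.cj.jdbc.Driver

### Alert mailbox
spring.mail.host=smtp.qq.com
spring.mail.port=25
spring.mail.username=xxx@qq.com
spring.mail.password=xxx
spring.mail.properties.mail.smtp.auth=true
spring.mail.properties.mail.smtp.starttls.enable=true
spring.mail.properties.mail.smtp.starttls.required=true
spring.mail.properties.mail.smtp.socketFactory.class=javax.net.ssl.SSLSocketFactory

### Scheduling center access token [optional]: enabled when non-empty
xxl.job.accessToken=

### Scheduling center i18n setting [required]: defaults to "zh_CN" (Simplified Chinese); allowed values are "zh_CN" (Simplified Chinese), "zh_TC" (Traditional Chinese), and "en" (English)
xxl.job.i18n=zh_CN

## Maximum threads for the trigger thread pools [required]
xxl.job.triggerpool.fast.max=200
xxl.job.triggerpool.slow.max=100

### Days to keep scheduling-center log table data [required]: expired logs are cleaned up automatically; takes effect only when the value is 7 or greater, otherwise (for example -1) automatic cleanup is turned off
xxl.job.logretentiondays=30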
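
The retention comment above is easy to misread, so here is a minimal Java sketch of the rule exactly as that comment states it: values of 7 or more are used as-is, anything smaller disables cleanup. The class and method names are hypothetical and this is not xxl-job's own code.

```java
// Hypothetical helper illustrating the rule stated in the config comment.
final class LogRetentionRule {

    /** Smallest value for which automatic cleanup is active, per the config comment. */
    static final int MIN_EFFECTIVE_DAYS = 7;

    /** Returns the configured days when cleanup is active, or -1 when it is disabled. */
    static int effectiveDays(int configuredDays) {
        return configuredDays >= MIN_EFFECTIVE_DAYS ? configuredDays : -1;
    }

    public static void main(String[] args) {
        System.out.println(effectiveDays(30)); // cleanup active, logs kept for 30 days
        System.out.println(effectiveDays(3));  // below the threshold, cleanup disabled
    }
}
```

In other words, setting the value to something like 3 does not mean "keep three days of logs"; according to the comment it turns cleanup off.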

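2. Initialize the table structure

The doc/db directory contains a SQL file (tables_xxl_job.sql) with the statements that create the tables and seed the initial data. Run it before starting XXL-Job so that the tables and data are in place.

After the script finishes, the tables look like this:
[Screenshot: the tables created by the initialization script]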

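3. Start XXL-Job

Find XxlJobAdminApplication and run it, then open http://localhost:12000/xxl-job-admin/toLogin in a browser (port 12000 assumes server.port has been changed in the admin configuration; the stock setting is 8080). The XXL-Job login page appears:

[Screenshot: the XXL-Job login page]

Enter the user name admin and the password 123456, then click the login button to reach the main page:

[Screenshot: the XXL-Job main page]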

III. Custom crawler jobs

XXL-Job is also easy to use: a single annotation does the job. Here is how.

1. Add the XXL-Job dependency

Add the following to the pom.xml of summo-sbmy-job:

<!-- xxl-job -->
<dependency>
  <groupId>com.xuxueli</groupId>
  <artifactId>xxl-job-core</artifactId>
  <version>2.4.1</version>
</dependency>

2. XXL-Job configuration

Add the XXL-Job settings to application.properties:

# xxl-job
xxl.job.open=true
### xxl-job admin address list, such as "http://address" or "http://address01,http://address02"
xxl.job.admin.addresses=http://127.0.0.1:12000/xxl-job-admin
### xxl-job, access token (must match the xxl.job.accessToken value configured on the scheduling center)
xxl.job.accessToken=default_token
### xxl-job executor appname
xxl.job.executor.appname=summo-sbmy
### xxl-job executor log-path
xxl.job.executor.logpath=/root/logs/xxl-job/jobhandler
### xxl-job executor log-retention-days
xxl.job.executor.logretentiondays=30
### xxl-job executor registry-address: default use address to registry , otherwise use ip:port if address is null
xxl.job.executor.address=
### xxl-job executor server-info
xxl.job.executor.ip=
xxl.job.executor.port=9999
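
The comment on xxl.job.admin.addresses says that several scheduling-center addresses can be given as one comma-separated value. As a rough illustration of what that format means, here is a minimal Java sketch that splits such a value into individual addresses. AdminAddressParser is a hypothetical name, and this is not xxl-job's own parsing code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: splits a comma-separated address list into its entries.
final class AdminAddressParser {

    /** Splits on commas, trims whitespace, and skips empty entries. */
    static List<String> parse(String adminAddresses) {
        List<String> result = new ArrayList<>();
        if (adminAddresses == null) {
            return result;
        }
        for (String part : adminAddresses.split(",")) {
            String address = part.trim();
            if (!address.isEmpty()) {
                result.add(address);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Single address, as used in this article
        System.out.println(parse("http://127.0.0.1:12000/xxl-job-admin"));
        // Two addresses, in the format shown in the comment
        System.out.println(parse("http://address01,http://address02"));
    }
}
```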

With the properties in place, create XxlJobConfig.java in the com.summo.sbmy.job.config package:

package com.summo.sbmy.job.config;

import com.xxl.job.core.executor.impl.XxlJobSpringExecutor;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.boot.autoconfigure.condition.ConditionalOnProperty;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

/**
 * xxl-job config
 *
 * @author xuxueli 2017-04-28
 */
@Configuration
public class XxlJobConfig {
    private Logger logger = LoggerFactory.getLogger(XxlJobConfig.class);

    @Value("${xxl.job.admin.addresses}")
    private String adminAddresses;

    @Value("${xxl.job.accessToken}")
    private String accessToken;

    @Value("${xxl.job.executor.appname}")
    private String appname;

    @Value("${xxl.job.executor.address}")
    private String address;

    @Value("${xxl.job.executor.ip}")
    private String ip;

    @Value("${xxl.job.executor.port}")
    private int port;

    @Value("${xxl.job.executor.logpath}")
    private String logPath;

    @Value("${xxl.job.executor.logretentiondays}")
    private int logRetentionDays;

    @Bean
    @ConditionalOnProperty(name = "xxl.job.open", havingValue = "true")
    public XxlJobSpringExecutor xxlJobExecutor() {
        logger.info(">>>>>>>>>>> xxl-job config init.");
        XxlJobSpringExecutor xxlJobSpringExecutor = new XxlJobSpringExecutor();
        xxlJobSpringExecutor.setAdminAddresses(adminAddresses);
        xxlJobSpringExecutor.setAppname(appname);
        xxlJobSpringExecutor.setAddress(address);
        xxlJobSpringExecutor.setIp(ip);
        xxlJobSpringExecutor.setPort(port);
        xxlJobSpringExecutor.setAccessToken(accessToken);
        xxlJobSpringExecutor.setLogPath(logPath);
        xxlJobSpringExecutor.setLogRetentionDays(logRetentionDays);

        return xxlJobSpringExecutor;
    }
}

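Once the properties and the class are in place, restart the application. If all goes well, the executor shows up as registered on the executor management page of the XXL-Job console:
[Screenshot: the registered executor in the XXL-Job console]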

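3. Register the XXL-Job task

Taking the Douyin hot search as an example, we originally used the @Scheduled annotation:

/**
  * Triggers the crawler method on a schedule, once per hour
  */
@Scheduled(fixedRate = 1000 * 60 * 60)
public void hotSearch() throws IOException{
  ... ...
}

Replace the @Scheduled annotation with @XxlJob("douyinHotSearchJob"). The full code is: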

package com.summo.sbmy.job.douyin;

import com.alibaba.fastjson.JSONArray;
import com.alibaba.fastjson.JSONObject;
import com.google.common.collect.Lists;
import com.summo.sbmy.dao.entity.SbmyHotSearchDO;
import com.summo.sbmy.service.SbmyHotSearchService;
import com.summo.sbmy.service.convert.HotSearchConvert;
import com.xxl.job.core.biz.model.ReturnT;
import com.xxl.job.core.handler.annotation.XxlJob;
import lombok.extern.slf4j.Slf4j;
import okhttp3.OkHttpClient;
import okhttp3.Request;
import okhttp3.Response;
import org.apache.commons.collections4.CollectionUtils;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Component;

import java.io.IOException;
import java.util.List;
import java.util.Random;
import java.util.UUID;
import java.util.stream.Collectors;

import static com.summo.sbmy.common.cache.SbmyHotSearchCache.CACHE_MAP;
import static com.summo.sbmy.common.enums.HotSearchEnum.DOUYIN;

/**
 * @author summo
 * @version DouyinHotSearchJob.java, 1.0.0
 * @description Java crawler for the Douyin hot-search list
 * @date 2024-08-09
 */
@Component
@Slf4j
public class DouyinHotSearchJob {

    @Autowired
    private SbmyHotSearchService sbmyHotSearchService;

    @XxlJob("douyinHotSearchJob")
    public ReturnT<String> hotSearch(String param) throws IOException {
        log.info("Douyin hot-search crawler job started");
        try {
            // Query the Douyin hot-search data
            OkHttpClient client = new OkHttpClient().newBuilder().build();
            Request request = new Request.Builder().url("https://www.iesdouyin.com/web/api/v2/hotsearch/billboard/word/").method("GET", null).build();
            Response response = client.newCall(request).execute();
            JSONObject jsonObject = JSONObject.parseObject(response.body().string());
            JSONArray array = jsonObject.getJSONArray("word_list");
            List<SbmyHotSearchDO> sbmyHotSearchDOList = Lists.newArrayList();
            for (int i = 0, len = array.size(); i < len; i++) {
                // Get one Douyin hot-search entry
                JSONObject object = (JSONObject) array.get(i);
                // Build the hot-search record
                SbmyHotSearchDO sbmyHotSearchDO = SbmyHotSearchDO.builder().hotSearchResource(DOUYIN.getCode()).build();
                // Set the title
                sbmyHotSearchDO.setHotSearchTitle(object.getString("word"));
                // Set the third-party ID (derived from the source code plus the title)
                sbmyHotSearchDO.setHotSearchId(getHashId(DOUYIN.getCode() + sbmyHotSearchDO.getHotSearchTitle()));
                // Set the link
                sbmyHotSearchDO.setHotSearchUrl("https://www.douyin.com/search/" + sbmyHotSearchDO.getHotSearchTitle() + "?type=general");
                // Set the heat value
                sbmyHotSearchDO.setHotSearchHeat(object.getString("hot_value"));
                // Rank in list order
                sbmyHotSearchDO.setHotSearchOrder(i + 1);
                sbmyHotSearchDOList.add(sbmyHotSearchDO);
            }
            if (CollectionUtils.isEmpty(sbmyHotSearchDOList)) {
                return ReturnT.SUCCESS;
            }
            // Put the data into the cache
            CACHE_MAP.put(DOUYIN.getCode(), sbmyHotSearchDOList.stream().map(HotSearchConvert::toDTOWhenQuery).collect(Collectors.toList()));

            // Persist the data
            sbmyHotSearchService.saveCache2DB(sbmyHotSearchDOList);
            log.info("Douyin hot-search crawler job finished");
        } catch (IOException e) {
            log.error("Failed to fetch Douyin data", e);
        }
        return ReturnT.SUCCESS;
    }

    /**
     * Derives an ID from the title; the same title always yields the same ID
     *
     * @param title the title
     * @return the derived ID
     */
    private String getHashId(String title) {
        long seed = title.hashCode();
        Random rnd = new Random(seed);
        return new UUID(rnd.nextLong(), rnd.nextLong()).toString();
    }

}
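
One caveat about getHashId: it seeds Random with String.hashCode(), which is only a 32-bit value, so two different titles that happen to share a hash code would get the same ID. If that ever becomes a concern, a name-based UUID is one possible alternative. The sketch below is not the project's code; StableIds and fromTitle are hypothetical names, and it relies only on the JDK's UUID.nameUUIDFromBytes.

```java
import java.nio.charset.StandardCharsets;
import java.util.UUID;

// Hypothetical alternative to getHashId: a name-based (type 3) UUID over the full key.
final class StableIds {

    /** Same source and title always give the same ID; the whole string feeds the digest. */
    static String fromTitle(String source, String title) {
        String key = source + ":" + title;
        return UUID.nameUUIDFromBytes(key.getBytes(StandardCharsets.UTF_8)).toString();
    }

    public static void main(String[] args) {
        // Running this twice prints the same two IDs
        System.out.println(fromTitle("douyin", "example title"));
        System.out.println(fromTitle("douyin", "another title"));
    }
}
```

Note that IDs produced this way differ from the ones getHashId generates, so switching on an existing database would change the IDs of entries that are already stored.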

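On the task management page of the XXL-Job console, click the add button to create a task. The JobHandler field must be the name given in the annotation, douyinHotSearchJob:
[Screenshot: the new-task dialog]

After the task is created, we can trigger it once by hand:
[Screenshots: running the task once from the console]

That completes the Douyin hot-search task; the other crawler tasks are configured the same way.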

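IV. Hot-search update time

So far we have three hot-search components: Baidu, Douyin, and Zhihu. However, the page does not show when each list was last updated or whether it is current, so we need to display the update time, roughly like this:

[Screenshot: a hot-search card showing its update time]
The updated component code is: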

<template>
  <el-card class="custom-card" v-loading="loading">
    <template #header>
      <div class="card-title">
        <img :src="icon" class="card-title-icon" />
        <!-- 热榜 = "hot list" -->
        {{ title }}热榜
        <span class="update-time">{{ formattedUpdateTime }}</span>
      </div>
    </template>
    <div class="cell-group-scrollable">
      <div
        v-for="item in hotSearchData"
        :key="item.hotSearchOrder"
        :class="getRankingClass(item.hotSearchOrder)"
        class="cell-wrapper"
      >
        <span class="cell-order">{{ item.hotSearchOrder }}</span>
        <span
          class="cell-title hover-effect"
          @click="openLink(item.hotSearchUrl)"
        >
          {{ item.hotSearchTitle }}
        </span>
        <span class="cell-heat">{{ formatHeat(item.hotSearchHeat) }}</span>
      </div>
    </div>
  </el-card>
</template>

<script>
import apiService from "@/config/apiService.js";

export default {
  props: {
    title: String,
    icon: String,
    type: String,
  },
  data() {
    return {
      hotSearchData: [],
      updateTime: null,
      loading: false,
    };
  },
  created() {
    this.fetchData(this.type);
  },
  computed: {
    // Human-readable label for how long ago the list was refreshed
    formattedUpdateTime() {
      if (!this.updateTime) return '';

      const updateDate = new Date(this.updateTime);
      const now = new Date();

      // Elapsed time in whole minutes
      const timeDiff = now - updateDate;
      const minutesDiff = Math.floor(timeDiff / 1000 / 60);

      if (minutesDiff < 1) {
        return '刚刚更新'; // "updated just now"
      } else if (minutesDiff < 60) {
        return `${minutesDiff}分钟前更新`; // "updated N minutes ago"
      } else if (minutesDiff < 1440) {
        return `${Math.floor(minutesDiff / 60)}小时前更新`; // "updated N hours ago"
      } else {
        // Older than a day: show the full timestamp
        return updateDate.toLocaleString();
      }
    },
  },
  methods: {
    fetchData(type) {
      this.loading = true;
      apiService
        .get("/hotSearch/queryByType?type=" + type)
        .then((res) => {
          this.hotSearchData = res.data.data.hotSearchDTOList;
          this.updateTime = res.data.data.updateTime;
        })
        .catch((error) => {
          console.error(error);
        })
        .finally(() => {
          this.loading = false; 
        });
    },
    getRankingClass(order) {
      if (order === 1) return "top-ranking-1";
      if (order === 2) return "top-ranking-2";
      if (order === 3) return "top-ranking-3";
      return "";
    },
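    formatHeat(heat) {
      // Values that already end in "万" (ten thousand) are shown unchanged
      if (typeof heat === "string" && heat.endsWith("万")) {
        return heat;
      }
      let number = parseFloat(heat);
      // Non-numeric values are shown unchanged
      if (isNaN(number)) {
        return heat;
      }
      if (number < 1000) {
        return number.toString();
      }
      // 1,000 to 9,999 is abbreviated with "k"
      if (number >= 1000 && number < 10000) {
        return (number / 1000).toFixed(1) + "k";
      }
      // 10,000 and above is expressed in units of 万 (ten thousand)
      if (number >= 10000) {
        return (number / 10000).toFixed(1) + "万";
      }
    },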
    openLink(url) {
      if (url) {
        window.open(url, "_blank");
      }
    },
  },
};
</script>

<style scoped>
.custom-card {
  background-color: #ffffff;
  border-radius: 10px;
  box-shadow: 0 4px 8px rgba(0, 0, 0, 0.1);
  margin-bottom: 20px;
}
.custom-card:hover {
  box-shadow: 0 6px 8px rgba(0, 0, 0, 0.25);
}
.el-card__header {
  padding: 10px 18px;
  display: flex;
  justify-content: space-between; /* Added to space out title and update time */
  align-items: center;
}
.card-title {
  display: flex;
  align-items: center;
  font-weight: bold;
  font-size: 16px;
  flex-grow: 1;
}
.card-title-icon {
  fill: currentColor;
  width: 24px;
  height: 24px;
  margin-right: 8px;
}
.update-time {
  font-size: 12px;
  color: #b7b3b3;
  margin-left: auto; /* Ensures it is pushed to the far right */
}
.cell-group-scrollable {
  max-height: 350px;
  overflow-y: auto;
  padding-right: 16px; 
  flex: 1;
}
.cell-wrapper {
  display: flex;
  align-items: center;
  padding: 8px 8px; 
  border-bottom: 1px solid #e8e8e8; 
}
.cell-order {
  width: 20px;
  text-align: left;
  font-size: 16px;
  font-weight: 700;
  margin-right: 8px;
  color: #7a7a7a; 
}
.cell-heat {
  min-width: 50px;
  text-align: right;
  font-size: 12px;
  color: #7a7a7a;
}
.cell-title {
  font-size: 13px;
  color: #495060;
  line-height: 22px;
  flex-grow: 1;
  overflow: hidden;
  text-align: left; 
  text-overflow: ellipsis; 
}
.top-ranking-1 .cell-order {
  color: #fadb14; /* gold */
}
.top-ranking-2 .cell-order {
  color: #a9a9a9; /* silver */
}
.top-ranking-3 .cell-order {
  color: #d48806; /* bronze */
}
.cell-title.hover-effect {
  cursor: pointer; 
  transition: color 0.3s ease; 
}
.cell-title.hover-effect:hover {
  color: #409eff; 
}
</style>
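
The bucketing in formattedUpdateTime is easy to check in isolation. Here is a minimal Java sketch of the same rule, in case the label is ever produced on the back end instead. The class name UpdateTimeLabel and the English label strings are assumptions for illustration; the component itself uses the Chinese strings shown above and a locale-formatted date for the last branch.

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical back-end counterpart of the component's formattedUpdateTime rule.
final class UpdateTimeLabel {

    /** Buckets the elapsed time into "just now", minutes, hours, or a full timestamp. */
    static String format(Instant updatedAt, Instant now) {
        if (updatedAt == null) {
            return "";
        }
        long minutes = Duration.between(updatedAt, now).toMinutes();
        if (minutes < 1) {
            return "updated just now";
        } else if (minutes < 60) {
            return "updated " + minutes + " minutes ago";
        } else if (minutes < 1440) {
            return "updated " + (minutes / 60) + " hours ago";
        }
        // Older than a day: fall back to the ISO-8601 timestamp
        return updatedAt.toString();
    }

    public static void main(String[] args) {
        Instant now = Instant.now();
        System.out.println(format(now.minusSeconds(5 * 60), now));
        System.out.println(format(now.minusSeconds(3 * 3600), now));
    }
}
```

Passing the current time in as a parameter, rather than reading the clock inside the method, keeps the rule deterministic and simple to test.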

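With these changes in place, the final result looks like this:
[Screenshot: hot-search cards showing their update times]

That completes the XXL-Job rework of the hot-search component. See my code repository for the full code.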
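
The front end reads updateTime from the response, so the cache has to store a timestamp next to each list. Here is a minimal Java sketch of that idea using only the JDK. HotSearchCacheSketch, Entry, refresh, and get are hypothetical names for illustration; the project's real cache and DTO classes live in the repository and may look different.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Date;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical cache that keeps the refresh time together with the list it belongs to.
final class HotSearchCacheSketch {

    /** One cache entry: the hot-search titles plus the moment they were fetched. */
    static final class Entry {
        final List<String> titles;
        final Date updateTime;

        Entry(List<String> titles, Date updateTime) {
            // Defensive copy so later changes to the caller's list do not leak in
            this.titles = Collections.unmodifiableList(new ArrayList<>(titles));
            this.updateTime = updateTime;
        }
    }

    private static final Map<String, Entry> CACHE = new ConcurrentHashMap<>();

    /** Called by a crawler job after a successful fetch. */
    static void refresh(String source, List<String> titles, Date fetchedAt) {
        CACHE.put(source, new Entry(titles, fetchedAt));
    }

    /** Called by the query endpoint; returns null when the source has no data yet. */
    static Entry get(String source) {
        return CACHE.get(source);
    }
}
```

Replacing the list and the timestamp in a single put means a reader never sees a fresh list paired with a stale update time.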

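Extra: Bilibili hot-list crawler

1. Evaluating the approach

Bilibili has no hot-search list; it has a popular-videos ranking instead, but the logic is the same. The endpoint is: https://api.bilibili.com/x/web-interface/ranking/v2

[Screenshot: the JSON returned by the ranking endpoint]

The endpoint returns JSON, which makes things easy: we only need to look at its structure. The items live under data.list, and each one carries the fields used below, such as aid, title, pic, short_link_v2, owner (name, face), and stat (view).

2. Parsing code

The request code can be generated with Postman. I will skip the walkthrough and go straight to the code, BilibiliHotSearchJob: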

package com.summo.sbmy.job.bilibili;

import java.io.IOException;
import java.util.Calendar;
import java.util.List;
import java.util.stream.Collectors;

import com.alibaba.fastjson.JSONArray;
import com.alibaba.fastjson.JSONObject;

import com.google.common.collect.Lists;
import com.summo.sbmy.common.model.dto.HotSearchDetailDTO;
import com.summo.sbmy.dao.entity.SbmyHotSearchDO;
import com.summo.sbmy.service.SbmyHotSearchService;
import com.summo.sbmy.service.convert.HotSearchConvert;
import com.xxl.job.core.biz.model.ReturnT;
import com.xxl.job.core.handler.annotation.XxlJob;
import lombok.extern.slf4j.Slf4j;
import okhttp3.OkHttpClient;
import okhttp3.Request;
import okhttp3.Response;
import org.apache.commons.collections4.CollectionUtils;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Component;

import static com.summo.sbmy.common.cache.SbmyHotSearchCache.CACHE_MAP;
import static com.summo.sbmy.common.enums.HotSearchEnum.BILIBILI;

/**
 * @author summo
 * @version BilibiliHotSearchJob.java, 1.0.0
 * @description Java crawler for the Bilibili hot list
 * @date 2024-08-19
 */
@Component
@Slf4j
public class BilibiliHotSearchJob {

    @Autowired
    private SbmyHotSearchService sbmyHotSearchService;

    @XxlJob("bilibiliHotSearchJob")
    public ReturnT<String> hotSearch(String param) throws IOException {
        log.info("Bilibili hot-list crawler job started");
        try {
            // Query the Bilibili ranking data
            OkHttpClient client = new OkHttpClient().newBuilder().build();
            Request request = new Request.Builder().url("https://api.bilibili.com/x/web-interface/ranking/v2")
                .addHeader("User-Agent", "Mozilla/5.0 (compatible)").addHeader("Cookie", "b_nut=1712137652; "
                    + "buvid3=DBA9C433-8738-DD67-DCF5" + "-DDC780CA892052512infoc").method("GET", null).build();
            Response response = client.newCall(request).execute();
            JSONObject jsonObject = JSONObject.parseObject(response.body().string());
            JSONArray array = jsonObject.getJSONObject("data").getJSONArray("list");
            List<SbmyHotSearchDO> sbmyHotSearchDOList = Lists.newArrayList();
            for (int i = 0, len = array.size(); i < len; i++) {
                // Get one Bilibili ranking entry
                JSONObject object = (JSONObject)array.get(i);
                // Build the hot-search record
                SbmyHotSearchDO sbmyHotSearchDO = SbmyHotSearchDO.builder().hotSearchResource(BILIBILI.getCode())
                    .build();
                // Set the Bilibili third-party ID
                sbmyHotSearchDO.setHotSearchId(object.getString("aid"));
                // Set the video link
                sbmyHotSearchDO.setHotSearchUrl(object.getString("short_link_v2"));
                // Set the video title
                sbmyHotSearchDO.setHotSearchTitle(object.getString("title"));
                // Set the author name
                sbmyHotSearchDO.setHotSearchAuthor(object.getJSONObject("owner").getString("name"));
                // Set the author avatar
                sbmyHotSearchDO.setHotSearchAuthorAvatar(object.getJSONObject("owner").getString("face"));
                // Set the cover image
                sbmyHotSearchDO.setHotSearchCover(object.getString("pic"));
                // Set the heat value (view count)
                sbmyHotSearchDO.setHotSearchHeat(object.getJSONObject("stat").getString("view"));
                // Rank in list order
                sbmyHotSearchDO.setHotSearchOrder(i + 1);
                sbmyHotSearchDOList.add(sbmyHotSearchDO);
            }
            if (CollectionUtils.isEmpty(sbmyHotSearchDOList)) {
                return ReturnT.SUCCESS;
            }
            // Put the data into the cache
            CACHE_MAP.put(BILIBILI.getCode(), HotSearchDetailDTO.builder()
                // Hot-search data
                .hotSearchDTOList(
                    sbmyHotSearchDOList.stream().map(HotSearchConvert::toDTOWhenQuery).collect(Collectors.toList()))
                // Update time
                .updateTime(Calendar.getInstance().getTime()).build());
            // Persist the data
            sbmyHotSearchService.saveCache2DB(sbmyHotSearchDOList);
            log.info("Bilibili hot-list crawler job finished");
        } catch (IOException e) {
            log.error("Failed to fetch Bilibili data", e);
        }
        return ReturnT.SUCCESS;
    }

}
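
For readers who would rather not pull in OkHttp, the same GET request can be built with the JDK's own java.net.http client (Java 11 or later). This is a sketch under assumptions, not the project's code: the class name BilibiliRequestSketch and the 10-second timeout are choices made here for illustration, and whether the endpoint accepts a request without the Cookie header used above has not been verified.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

// Hypothetical JDK-only variant of the request built with OkHttp above.
final class BilibiliRequestSketch {

    static final String RANKING_URL = "https://api.bilibili.com/x/web-interface/ranking/v2";

    /** Builds the GET request; nothing is sent until fetch() is called. */
    static HttpRequest build() {
        return HttpRequest.newBuilder(URI.create(RANKING_URL))
                .header("User-Agent", "Mozilla/5.0 (compatible)")
                .timeout(Duration.ofSeconds(10)) // assumed value, tune as needed
                .GET()
                .build();
    }

    /** Sends the request and returns the raw JSON body as a string. */
    static String fetch() throws IOException, InterruptedException {
        HttpResponse<String> response =
                HttpClient.newHttpClient().send(build(), HttpResponse.BodyHandlers.ofString());
        return response.body();
    }
}
```

The returned string can then be parsed the same way as in BilibiliHotSearchJob, starting from data.list.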