A RESTful Crawler for CSDN's Weekly Picks, Built with Spring Boot, jsoup, and Redis

Latest recommended article published on 2025-05-01 23:34:40

anxpp



License: CC 4.0 BY-SA

Category: Programming Languages / Java. Tags: SpringBoot, Redis, Jsoup, crawler

Article link: https://blog.csdn.net/anxpp/article/details/61945975

A simple crawler that fetches the weekly recommended articles (the "weekly picks") on CSDN.

Technologies used: Spring Boot, Redis, Jsoup, jQuery, Bootstrap, and others.

Demo:

http://tinyspider.anxpp.com/


1. Preface

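I wanted to get familiar with using Spring Boot together with Redis, so I decided to crawl something, and picked Jsoup, often described as the jQuery of Java. The crawler fetches CSDN's weekly recommended articles and caches them in Redis. (They could also be persisted to a database; the configuration for that is already in place, but the persistence itself is not implemented.) Page parsing is abstracted behind an interface, so a custom implementation can be written for each kind of page, which in principle lets the crawler handle any page.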
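To make the design concrete, here is a minimal sketch of the "parser behind an interface, results cached" idea. Every name in it (`PageParser`, `CachingCrawler`, `crawl`) is hypothetical and not taken from the project, and an in-memory map stands in for Redis so that the sketch has no dependencies. In a real implementation the body of `parse` would typically call `Jsoup.parse(html)` and then `select(...)` with a CSS query suited to the page.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/** Hypothetical parser contract: one implementation per page layout. */
interface PageParser {
    /** Turns raw HTML into rows of field name -> value. Must not return null. */
    List<Map<String, Object>> parse(String html);
}

/** Hypothetical crawler: fetches a page, parses it, and caches the rows per URL. */
class CachingCrawler {
    private final Function<String, String> fetcher; // URL -> HTML
    private final PageParser parser;
    // In-memory stand-in for Redis (an assumption made to keep the sketch self-contained).
    private final Map<String, List<Map<String, Object>>> cache = new ConcurrentHashMap<>();

    CachingCrawler(Function<String, String> fetcher, PageParser parser) {
        this.fetcher = fetcher;
        this.parser = parser;
    }

    /** Returns cached rows if present; otherwise fetches and parses once, then caches. */
    List<Map<String, Object>> crawl(String url) {
        return cache.computeIfAbsent(url, u -> parser.parse(fetcher.apply(u)));
    }
}
```

Supporting a new site then only requires a new `PageParser` implementation; the fetching and caching code stays the same.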

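The page-parsing method returns a List<Map>. Once a matching entity class is defined, the maps can be converted into entities via reflection (already implemented); see the code walkthrough later in the article.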
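The map-to-entity step could look roughly like the following. This is a sketch under assumptions, not the project's actual code: the `Article` entity, its field names, and the `MapMapper` helper are all hypothetical, and the mapping rule (a map key is copied into the field of the same name) is one simple convention among several possible ones.

```java
import java.lang.reflect.Constructor;
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical entity; the field names are assumptions for illustration. */
class Article {
    String title;
    String url;
    int views;
}

final class MapMapper {
    private MapMapper() {}

    /** Creates one entity per row, copying each map entry into the field with the same name. */
    static <T> List<T> toEntities(List<Map<String, Object>> rows, Class<T> type) {
        List<T> result = new ArrayList<>();
        try {
            Constructor<T> ctor = type.getDeclaredConstructor();
            ctor.setAccessible(true);
            for (Map<String, Object> row : rows) {
                T entity = ctor.newInstance();
                for (Field field : type.getDeclaredFields()) {
                    // Skip static/synthetic fields and fields the row has no value for.
                    if (Modifier.isStatic(field.getModifiers()) || field.isSynthetic()
                            || !row.containsKey(field.getName())) {
                        continue;
                    }
                    field.setAccessible(true);
                    field.set(entity, row.get(field.getName())); // unboxes Integer -> int
                }
                result.add(entity);
            }
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot map rows to " + type.getName(), e);
        }
        return result;
    }
}
```

Map keys with no matching field are ignored, so a parser can return extra columns without breaking the mapping.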

The implementation steps follow.

2. Setting up Spring Boot and integrating Redis

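Setting up a Spring Boot project needs little explanation: whether you use Eclipse or IDEA, Spring provides tooling that generates a project in one step from the components you select.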

Next comes Redis. First, add the dependency:

 
 
        <dependency>
              <groupId>org.springframework.boot</groupId>
              <artifactId>spring-boot-starter-data-redis</artifactId>
          </dependency>

Then add the settings to the configuration file:


 
  
   #Redis
   spring.redis.database=0
   spring.redis.host=****
   spring.redis.password=a****
   spring.redis.pool.max-active=8
   spring.redis.pool.max-idle=8
   spring.redis.pool.max-wait=-1
   spring.redis.pool.min-idle=0
   spring.redis.port=****
   #spring.redis.sentinel.master= # Name of Redis server.
   #spring.redis.sentinel.nodes= # Comma-separated list of host:port pairs.
   spring.redis.timeout=0

Fill in the host and port to match your own environment.

Next, configure Redis. JavaConfig is used here:


 
  
   package com.anxpp.tinysoft.config;
   import com.fasterxml.jackson.annotation.JsonAutoDetect;
   import com.fasterxml.jackson.annotation.PropertyAccessor;
   import com.fasterxml.jackson.databind.ObjectMapper;
   import org.springframework.beans.factory.annotation.Value;
   import org.springframework.cache.CacheManager;
   import org.springframework.cache.annotation.EnableCaching;
   import org.springframework.cache.interceptor.KeyGenerator;
   import org.springframework.context.annotation.Bean;
   import org.springframework.context.annotation.Configuration;
   import org.springframework.data.redis.cache.RedisCacheManager;
   import org.springframework.data.redis.connection.RedisConnectionFactory;
   import org.springframework.data.redis.connection.jedis.JedisConnectionFactory;
   import org.springframework.data.redis.core.RedisTemplate;
   import org.springframework.data.redis.core.StringRedisTemplate;
   import org.springframework.data.redis.serializer.Jackson2JsonRedisSerializer;
   /**
    * Redis cache configuration
    * Created by anxpp.com on 2017/3/11.
    */
   @Configuration
   @EnableCaching
   public class RedisCacheConfig {
     
       @Value("${spring.redis.host}")
       private String host;
       @Value("${spring.redis.port}")
       private int port;
       @Value("${spring.redis.timeout}")
       private int timeout;
       @Value("${spring.redis.password}")
       private String password;
       @Bean
       public KeyGenerator csdnKeyGenerator() {
     
           return (target, method, params) -> {
     
               StringBuilder sb = new StringBuilder();
               sb.append(target.getClass().getName());
               sb.append(method.

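The listing breaks off inside the key generator. The following is not a reconstruction of the missing code, only a generic sketch of a common key-building pattern: concatenate the target class name, the method name, and the arguments. The `CacheKeys` class and the key format are assumptions. The method takes the same parameters as Spring's `KeyGenerator.generate(Object target, Method method, Object... params)`, so a `KeyGenerator` lambda could delegate to it.

```java
import java.lang.reflect.Method;

/** Hypothetical helper: builds cache keys of the form ClassName.methodName(arg1,arg2). */
final class CacheKeys {
    private CacheKeys() {}

    static String build(Object target, Method method, Object... params) {
        StringBuilder sb = new StringBuilder();
        sb.append(target.getClass().getName())
          .append('.')
          .append(method.getName())
          .append('(');
        for (int i = 0; i < params.length; i++) {
            if (i > 0) {
                sb.append(',');
            }
            sb.append(params[i]); // relies on toString(); null is rendered as "null"
        }
        return sb.append(')').toString();
    }
}
```

Including the class and method names keeps keys from different cached methods apart even when their arguments are identical. Arguments whose `toString()` is not stable across runs would need a different representation.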