Jsoup: Writing Crawled Data to Excel with Java (Persisting Jsoup Results to a Local Excel File)

XRT_knives

Category: Jsoup. Tags: java, crawler, intellij-idea, EasyExcel

Article link: https://blog.csdn.net/XRT_knives/article/details/122408184

1. Resources

EasyExcel usage tutorial
Jsoup crawler tutorial

2. Code

Maven dependencies (pom.xml)

        <dependency>
            <groupId>com.alibaba</groupId>
            <artifactId>easyexcel</artifactId>
            <version>3.0.5</version>
        </dependency>

        <dependency>
            <groupId>com.alibaba</groupId>
            <artifactId>fastjson</artifactId>
            <version>1.2.41</version>
        </dependency>

        <!-- Jsoup, for parsing web pages -->
        <dependency>
            <groupId>org.jsoup</groupId>
            <artifactId>jsoup</artifactId>
            <version>1.10.2</version>
        </dependency>
            
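        <!-- No version given: this assumes a parent POM (for example spring-boot-starter-parent) manages it -->
        <dependency>
            <groupId>org.projectlombok</groupId>
            <artifactId>lombok</artifactId>
            <optional>true</optional>
        </dependency>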

Entity class

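import com.alibaba.excel.annotation.ExcelProperty;
import lombok.AllArgsConstructor;
import lombok.Data;
import lombok.NoArgsConstructor;

// One row of the exported sheet; each @ExcelProperty value is used as the column header.
@Data
@NoArgsConstructor
@AllArgsConstructor
public class Content {

    @ExcelProperty("Product name")
    private String name;

    @ExcelProperty("Product price")
    private String price;

    @ExcelProperty("Product image path")
    private String img;
}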
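For readers who do not use Lombok, the three annotations on Content can be replaced by hand-written code. The sketch below is an approximation of what Lombok generates, not its exact output; the class name PlainContent is hypothetical and only serves the illustration.

```java
import java.util.Objects;

// Hand-written approximation of the Lombok-generated members of Content.
// PlainContent is a hypothetical name used only for this illustration.
class PlainContent {
    private String name;
    private String price;
    private String img;

    // Corresponds to @NoArgsConstructor
    public PlainContent() {
    }

    // Corresponds to @AllArgsConstructor
    public PlainContent(String name, String price, String img) {
        this.name = name;
        this.price = price;
        this.img = img;
    }

    // @Data: a getter and a setter for every field
    public String getName() { return name; }
    public void setName(String name) { this.name = name; }
    public String getPrice() { return price; }
    public void setPrice(String price) { this.price = price; }
    public String getImg() { return img; }
    public void setImg(String img) { this.img = img; }

    // @Data: equals and hashCode computed from all fields
    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof PlainContent)) return false;
        PlainContent other = (PlainContent) o;
        return Objects.equals(name, other.name)
                && Objects.equals(price, other.price)
                && Objects.equals(img, other.img);
    }

    @Override
    public int hashCode() {
        return Objects.hash(name, price, img);
    }

    // @Data: toString listing every field
    @Override
    public String toString() {
        return "PlainContent(name=" + name + ", price=" + price + ", img=" + img + ")";
    }
}
```

With Lombok on the classpath none of this needs to be written, which is why the entity above stays so short.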

Utility class that writes the sheet

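import com.alibaba.excel.EasyExcel;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;
import org.springframework.stereotype.Component;

import java.net.URL;
import java.net.URLEncoder;
import java.util.ArrayList;
import java.util.List;

@Component
public class HtmlParseUtil {
    public static void main(String[] args) throws Exception {
        // Path of the Excel file to create
        String fileName = "D:\\IDEA\\Jsoup\\parseJD.xlsx";
        // Write one sheet named "Jsoup", one row per Content object
        EasyExcel.write(fileName, Content.class)
                .sheet("Jsoup")
                .doWrite(new HtmlParseUtil().parseJD("java"));
    }

    public List<Content> parseJD(String keyword) throws Exception {

        // Request URL, e.g. https://search.jd.com/Search?keyword=java
        // The keyword is URL-encoded so that spaces and non-ASCII characters are safe
        String url = "https://search.jd.com/Search?keyword=" + URLEncoder.encode(keyword, "UTF-8");
        List<Content> contents = new ArrayList<>();

        // Fetch and parse the page; the timeout is in milliseconds (300000 ms = 5 minutes)
        Document document = Jsoup.parse(new URL(url), 300000);

        // Get the product list container
        Element element = document.getElementById("J_goodsList");
        if (element == null) {
            // The expected list is missing (for example the page layout changed or a login page was returned)
            return contents;
        }

        // Get the li elements inside the product list
        Elements li = element.getElementsByTag("li");

        // Extract the fields of each product
        for (Element el : li) {
            String name = el.getElementsByClass("p-name").eq(0).text();
            String price = el.getElementsByClass("p-price").eq(0).text();
            // The image URL is read from the lazy-loading attribute
            String img = el.getElementsByTag("img").eq(0).attr("data-lazy-img");

            contents.add(new Content(name, price, img));
        }
        return contents;
    }
}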
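One detail worth handling before the data goes into Excel is the img value read from data-lazy-img. A minimal sketch follows, under the assumption (not stated in the article) that the attribute may hold a protocol-relative URL beginning with "//", which does not open directly from a spreadsheet cell. The ImageUrls class, its toAbsolute method, and the example host are hypothetical.

```java
// Hypothetical helper, not part of the original article.
// Assumption: image attributes may contain protocol-relative URLs ("//host/path").
class ImageUrls {
    private ImageUrls() {
    }

    // Returns an absolute https URL for protocol-relative input,
    // an empty string for missing input, and other values unchanged.
    static String toAbsolute(String raw) {
        if (raw == null || raw.trim().isEmpty()) {
            return "";
        }
        String value = raw.trim();
        if (value.startsWith("//")) {
            return "https:" + value;
        }
        return value;
    }

    public static void main(String[] args) {
        // img.example.com is a placeholder host used only for illustration
        System.out.println(toAbsolute("//img.example.com/n7/cover.jpg"));
    }
}
```

In parseJD this would be applied to img just before the Content object is created, for example ImageUrls.toAbsolute(img).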

3. Screenshot of the successful run

[Image: screenshot of the successful run]
