Using JS to write a crawler that scrapes images for a friend

远方0905

Last modified 2022-10-22 13:10:53

Category: JavaScript. Tags: javascript, crawler

First published 2022-10-21 17:23:52

Article link: https://blog.csdn.net/qq_43198727/article/details/127449371

Crawler tutorial

The tutorial is easy to follow.

Target site

I learned to write crawlers by following 满哥.

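Install axios: yarn add axios
Install cheerio: yarn add cheerio
Install express: yarn add express (the script below does not import it)
To crawl a different gallery, change the URL passed to axios.get().
The code in this post can be copied and run as-is.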

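The overall logic is based on this page.

I first tested the endpoint with Apifox and got back the HTML of the whole page.
Then I used cheerio to inspect the page's pagination: select .pagination and find the a tags under it.
Iterate over the a tags, store their text, check whether one of them is the next-page link ("下一页"), and if so call the function recursively.
The uploader explains this in detail.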
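
The pagination steps above can be sketched with two small pure functions. This is only an illustration under assumptions: the helper names buildPageUrl and hasNextPage are hypothetical and do not appear in the original script, while the URL pattern and the "下一页" label are taken from the code in this post.

```javascript
// Hypothetical helpers; only the URL pattern and the "下一页" label come from the post.
const BASE_URL = "https://www.jpmn5.com";
const NEXT_TEXT = "下一页";

// Page 0 has no suffix; later pages append "_<index>" before ".html".
function buildPageUrl(index) {
    const suffix = index ? "_" + index : "";
    return `${BASE_URL}/Cosplay/Cosplay18126${suffix}.html`;
}

// linkTexts is the array of texts read from the ".pagination a" elements.
function hasNextPage(linkTexts) {
    return linkTexts.includes(NEXT_TEXT);
}

console.log(buildPageUrl(0));
console.log(buildPageUrl(2));
console.log(hasNextPage(["1", "2", "下一页"]));
```

Keeping these two decisions separate from the network call makes it easy to check them without hitting the site.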


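// HTTP client, HTML parser, and Node built-ins for writing files
const axios = require("axios");
const cheerio = require("cheerio");
const fs = require("fs");
const path = require("path");
// Image URLs collected across all pages
const urls = [];
const baseUrl = "https://www.jpmn5.com";
// Link text that marks a next page in the pagination bar
const nextText = "下一页";
// Current page number; page 0 has no suffix in its URL
let index = 0;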
const getCosplay = async () => {
    console.log(index);
    const body = await axios.get(`https://www.jpmn5.com/Cosplay/Cosplay18126${index ? "_" + index : ""}.html`).then(res => res.data);
    const $ = cheerio.load(body)

    // Collect the images on every page, including the last one
    $(".article-content p img").each(function () {
        urls.push(baseUrl + $(this).attr("src"))
    })

    // Read the text of every link in the first pagination bar
    const page = $(".pagination").eq(0).find("a");
    const pageArr = page.map(function () {
        return $(this).text()
    }).toArray()
    if (pageArr.includes(nextText)) {
        index++;
        await getCosplay()
        // The deepest call starts the download, so outer calls stop here
        return
    }
    // Last page reached: download everything exactly once
    writeFile(urls)
}

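const writeFile = function (urls) {
    urls.forEach(async (url, i) => {
        console.log(url);
        const buffer = await axios.get(url, { responseType: "arraybuffer" }).then(res => res.data);
        // Add the position to the name so two images saved in the same millisecond do not overwrite each other
        const ws = fs.createWriteStream(path.join(__dirname, '../cos' + new Date().getTime() + "_" + i + ".jpg"))
        // end() writes the buffer and then closes the stream
        ws.end(buffer)
    });
}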
// Log request failures instead of leaving an unhandled rejection
getCosplay().catch(err => console.error(err.message));
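
The script above starts all downloads at once and names files by timestamp. As a possible alternative, the sketch below downloads one image at a time and derives each file name from the image URL. It is not the original author's method: fileNameFor, downloadAll, and the injected fetchBuffer parameter are hypothetical names introduced here, and the fetcher is passed in so that the loop can be tried without network access.

```javascript
const fs = require("fs");
const os = require("os");
const path = require("path");

// Derive a file name from the image URL, prefixed with its position
// so that two images sharing a base name do not collide.
function fileNameFor(url, i) {
    const base = path.posix.basename(new URL(url).pathname) || "image.jpg";
    return String(i).padStart(3, "0") + "_" + base;
}

// Download one image at a time and return the saved file paths.
// fetchBuffer(url) is a hypothetical injected function that resolves to a Buffer.
async function downloadAll(urls, fetchBuffer, outDir) {
    await fs.promises.mkdir(outDir, { recursive: true });
    const saved = [];
    for (const [i, url] of urls.entries()) {
        const buffer = await fetchBuffer(url);
        const target = path.join(outDir, fileNameFor(url, i));
        await fs.promises.writeFile(target, buffer);
        saved.push(target);
    }
    return saved;
}
```

With axios, the fetcher could be url => axios.get(url, { responseType: "arraybuffer" }).then(res => res.data), and outDir could be path.join(__dirname, "cos"). Sequential requests are slower than firing them all at once, but they put less load on the target site.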