Crawler Practice: Multithreaded Pixiv

kayotin

Last modified 2023-06-28 10:40:26

Category: Python Project Practice. Tags: crawler, python

First published 2023-06-28 10:34:32

Article link: https://blog.csdn.net/kayotin/article/details/131431316

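The program implements the following features:

Download images from either the weekly or the daily ranking
Choose the save path; the default is the output folder under the current directory
Multi-page works are saved together in one folder
Enter the number of works to download; the default is the top 10
Download images with multiple threads
Show download progress
Download all works by a given artist (not yet implemented)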

The main window of the program looks like this:

[Screenshot: main window of the program]

Logging in to Pixiv with a Cookie

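Pixiv requires a login before its pages can be accessed. The following steps obtain the cookie:

First, log in to www.pixiv.net in Chrome.

After logging in, press F12 to open the developer tools.

Click Network, then filter on www.pixiv.net. Select a request to that host and copy the value of its Cookie request header.

[Screenshot: Chrome developer tools, Network panel filtered on www.pixiv.net]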

The following code can be used to test whether the login works:

import requests

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/84.0.4147.135 Safari/537.36 Edg/84.0.522.63',
    # Paste the cookie string copied from the browser here
    'Cookie': 'your cookie'
}

res = requests.get("https://www.pixiv.net", headers=headers)
print(res.text)
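
Hard-coding the cookie in the script makes it easy to leak by accident. As a minimal sketch, the headers could be built by a small helper that takes the cookie as an argument or reads it from an environment variable. The names `build_headers` and `PIXIV_COOKIE` are hypothetical and are not part of the original program.

```python
import os

DEFAULT_USER_AGENT = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/84.0.4147.135 Safari/537.36"
)


def build_headers(cookie=None, user_agent=DEFAULT_USER_AGENT):
    """Return request headers carrying the Pixiv login cookie.

    PIXIV_COOKIE is a hypothetical environment variable name used only
    in this sketch; the original program does not define it.
    """
    # Fall back to the environment when no cookie is passed in
    cookie = cookie or os.environ.get("PIXIV_COOKIE", "")
    cookie = cookie.strip()
    if not cookie:
        raise ValueError("no cookie supplied; copy it from the browser first")
    return {"User-Agent": user_agent, "Cookie": cookie}
```

The returned dict can be passed to `requests.get(..., headers=...)` in the same way as the `headers` variable above.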

Fetching the Weekly Ranking

Send a request to the following URL and the server responds with JSON data:

weekly_url = "https://www.pixiv.net/ranking.php?mode=weekly&p=1&format=json"

The following code fetches the ranking entries, then loops over them and appends the fields we need to a list:

res = requests.get(weekly_url, headers=headers)
datas = res.json()["contents"]
images_list = []
for data in datas:
    image = {
        "title": data["title"],
        "user_name": data["user_name"],
        "p_id": data["illust_id"],
        "referer": f"https://www.pixiv.net/artworks/{data['illust_id']}"
    }
    images_list.append(image)
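
The feature list mentions a choice of ranking and a download count, while the code above is fixed to the first page of the weekly ranking. The sketch below shows one way to parameterize both. The function names are hypothetical, and the assumption that `mode=daily` selects the daily ranking is inferred from the feature list rather than shown in the original code.

```python
from urllib.parse import urlencode

RANKING_ENDPOINT = "https://www.pixiv.net/ranking.php"


def ranking_url(mode="weekly", page=1):
    """Build the JSON ranking URL for a mode such as "weekly" or "daily"."""
    query = urlencode({"mode": mode, "p": page, "format": "json"})
    return f"{RANKING_ENDPOINT}?{query}"


def extract_images(contents, limit=10):
    """Keep only the fields the downloader needs, for the first `limit` entries."""
    images = []
    for data in contents[:limit]:
        images.append({
            "title": data["title"],
            "user_name": data["user_name"],
            "p_id": data["illust_id"],
            "referer": f"https://www.pixiv.net/artworks/{data['illust_id']}",
        })
    return images
```

With these helpers, `extract_images(requests.get(ranking_url("daily"), headers=headers).json()["contents"], limit=10)` would produce a list with the same shape as `images_list`.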

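Downloading a Single Image

With the ranking data in hand, mainly the p_id, we can download the images. The following uses one image as an example: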

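import os

# Take the first image in the list
image_1 = images_list[0]
# Request the page details of this work
image_url = f"https://www.pixiv.net/ajax/illust/{image_1['p_id']}/pages?lang=zh"
resp_image = requests.get(image_url, headers=headers)
# The data is stored in the "body" field
image_data = resp_image.json()["body"]
# A work may have several pages, so body is a list. Take the first entry's
# "urls" field; its "original" value is the full-size image
download_url = image_data[0]["urls"]["original"]
# Copy the headers so that adding the referer does not modify the shared dict.
# Without a referer, Pixiv does not serve the download URL
download_headers = {**headers, "referer": image_1["referer"]}
# This request returns the image bytes themselves
resp_final = requests.get(download_url, headers=download_headers)
file_ext = download_url.split(".")[-1]
# Make sure the output folder exists before writing
os.makedirs("output", exist_ok=True)
with open(f"output/{image_1['p_id']}.{file_ext}", "wb") as file:
    file.write(resp_final.content)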
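
The example above takes only the first page. For multi-page works, every entry in `body` carries its own original URL. As a rough sketch, the two helpers below collect all of them and pick a save path. Both function names are hypothetical, and naming files by `p_id` is a choice made for this sketch to avoid characters that are not allowed in file names.

```python
import pathlib
from urllib.parse import urlparse


def original_urls(body):
    """Return the original-size URL of every page in a /pages response body."""
    return [page["urls"]["original"] for page in body]


def target_path(out_dir, p_id, url, page_index, page_count):
    """Choose '<p_id><ext>' for single-page works, '<p_id>/p<N><ext>' otherwise."""
    # suffix keeps the leading dot, e.g. ".png"
    ext = pathlib.PurePosixPath(urlparse(url).path).suffix
    base = pathlib.Path(out_dir)
    if page_count > 1:
        return base / str(p_id) / f"p{page_index}{ext}"
    return base / f"{p_id}{ext}"
```

Calling `original_urls(resp_image.json()["body"])` would give the kind of list that a batch downloader can iterate over.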

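Batch Download

Now that a single image can be downloaded, the rest is a matter of looping. I use a thread pool here to download with multiple threads. The code expects each entry in images_list to carry a download_url list with the original URL of every page. The code is as follows: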

    # Excerpt from the PixivHana class; requires pathlib, requests and
    # concurrent.futures.ThreadPoolExecutor to be imported at module level
    @staticmethod
    def download_pic(url, headers, path, name, image, is_last):
        resp = requests.get(url, headers=headers)
        row_info = {
            "title": image["title"],
            "user_name": image["user_name"],
            "p_id": image["p_id"],
            "status": ""
        }
        if resp.status_code == 200:
            with open(f"{path}/{name}", "wb") as file:
                file.write(resp.content)
            row_info["status"] = "Downloaded"
            # Report a multi-page work only once, after its last page is saved
            if is_last:
                PixivHana.queue.put(row_info)
        else:
            row_info["status"] = "Download failed"
            PixivHana.queue.put(row_info)

    def download_thread(self):
        """Download with multiple threads; pages of one work share a folder."""
        PixivHana.is_downloading = True
        self.after(500, self.check_queue)
        with ThreadPoolExecutor(max_workers=16) as pool:
            for image in self.images_list:
                # Copy the headers per image: worker threads read them later,
                # so mutating one shared dict could send the wrong referer
                download_headers = {**self.headers, "referer": image["referer"]}
                urls = image["download_url"]
                for index, down_url in enumerate(urls, start=1):
                    pic_type = down_url.split(".")[-1]
                    save_path = self.save_path.get()
                    if len(urls) > 1:
                        dic_name = f"{image['title']}_{image['user_name']}"
                        dic_path = pathlib.Path(save_path) / dic_name
                        dic_path.mkdir(parents=True, exist_ok=True)
                        pic_name = f"p{index}.{pic_type}"
                        save_path = dic_path
                    else:
                        pic_name = f"{image['title']}_{image['user_name']}.{pic_type}"
                    # Only the task for the last page pushes a row into the queue
                    is_last = index == len(urls)
                    pool.submit(self.download_pic,
                                url=down_url,
                                headers=download_headers,
                                path=save_path,
                                name=pic_name,
                                image=image,
                                is_last=is_last,
                                )
        PixivHana.is_downloading = False
        self.progressbar.stop()
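
One caveat with the is_last flag: with several worker threads, the task for the last page can finish before the earlier pages do, so the row may be reported slightly early. A possible alternative, sketched below under assumed names, is to count completed tasks with `as_completed` and push one progress record per finished job. `download_all`, the job shape and the `fetch` callable are all hypothetical and do not come from the original program.

```python
import queue
from concurrent.futures import ThreadPoolExecutor, as_completed


def download_all(jobs, fetch, progress, max_workers=16):
    """Run fetch(job) for every job in a thread pool and report results.

    jobs:     list of dicts with at least a "p_id" key (hypothetical shape)
    fetch:    callable that downloads one job and raises on failure
    progress: queue.Queue receiving (p_id, ok, done, total) tuples
    """
    total = len(jobs)
    done = 0
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Map each future back to its job so results can be attributed
        futures = {pool.submit(fetch, job): job for job in jobs}
        for future in as_completed(futures):
            job = futures[future]
            ok = future.exception() is None
            done += 1
            progress.put((job["p_id"], ok, done, total))
    return done
```

Because `download_all` blocks until every job has finished, in a GUI program it would normally be started from a background thread, with the window polling `progress` on a timer, similar to the `check_queue` call above.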