Multithreaded Scraping of Honor of Kings Game Wallpapers

Categories: Web scraping, python

by 习惯于一个人面对所有

Article link: https://blog.csdn.net/qiaoenshi/article/details/108669316


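https://pvp.qq.com/web201605/wallpaper.shtml
This is the download page for the Honor of Kings game wallpapers.
Right-click the page and choose Inspect.

However, the wallpaper links are not in the page source.

They come from a separate data file that the page requests, and this file still has to be decoded before the wallpaper download links can be extracted.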
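
To illustrate the decoding step: the producer code in this article decodes the name field with parse.unquote, which suggests the values are percent-encoded. A minimal sketch, using a made-up encoded string and not a real value from the site:

```python
from urllib import parse

# Hypothetical percent-encoded value, made up for illustration;
# it is not taken from the real response.
encoded = "http%3A%2F%2Fexample.com%2Fwallpapers%2F1.jpg"

# unquote() turns %3A back into ":" and %2F back into "/".
decoded = parse.unquote(encoded)
print(decoded)
```

The same call works for any percent-encoded field, including names that contain non-ASCII characters.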

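When writing a multithreaded scraper:

Create one class for the producer and one for the consumer.
Each class must inherit from threading.Thread.
Each class must also implement the run() method.
If a class needs to take arguments, override the parent's __init__() method.
Define the queues that the threads will share.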
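
The rules above can be sketched without any network access. The example below is only an illustration: the Worker class and the square_all helper are hypothetical names, and squaring numbers stands in for real scraping work.

```python
import queue
import threading


class Worker(threading.Thread):
    """Hypothetical worker: squares numbers taken from a queue."""

    def __init__(self, in_queue, out_queue, *args, **kwargs):
        # Forward the remaining arguments (e.g. name=...) to threading.Thread.
        super().__init__(*args, **kwargs)
        self.in_queue = in_queue
        self.out_queue = out_queue

    def run(self):
        # run() is what executes in the new thread once start() is called.
        while True:
            try:
                number = self.in_queue.get_nowait()
            except queue.Empty:
                break  # no work left
            self.out_queue.put(number * number)


def square_all(numbers, thread_count=3):
    in_queue = queue.Queue()
    out_queue = queue.Queue()
    for n in numbers:
        in_queue.put(n)

    workers = [Worker(in_queue, out_queue, name="worker-%d" % i)
               for i in range(thread_count)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()  # wait until every worker has finished

    results = []
    while not out_queue.empty():
        results.append(out_queue.get())
    return sorted(results)


print(square_all([1, 2, 3, 4]))
```

Because the extra arguments are passed on through super().__init__(), the thread can still be given a name when it is created.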

1. Define the producer class

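import os
import queue
import threading
from urllib import parse, request

import requests

# Assumed placeholder: the article uses a global `headers` but does not show its value.
headers = {"User-Agent": "Mozilla/5.0"}


class Producer(threading.Thread):
    def __init__(self, page_queue, url_queue, *args, **kwargs):  # override the parent's __init__
        super(Producer, self).__init__(*args, **kwargs)
        self.page_queue = page_queue
        self.url_queue = url_queue

    def run(self) -> None:
        while True:
            try:
                # get_nowait() avoids blocking forever if another producer takes the last page
                url = self.page_queue.get_nowait()
            except queue.Empty:
                break
            resp = requests.get(url, headers=headers)
            datas = resp.json()
            results = datas['List']
            for data in results:
                # parse_url is a separately defined function that returns the wallpaper download URLs
                image_urls = parse_url(data)
                name = data['sProdName']
                name = parse.unquote(name)  # the name is URL-encoded
                dirpath = os.path.join("多线程爬取王者壁纸", name)
                # makedirs also creates the parent directory and tolerates concurrent creation
                os.makedirs(dirpath, exist_ok=True)
                for index, image_url in enumerate(image_urls):
                    self.url_queue.put({"image_url": image_url, "image_path": os.path.join(dirpath, "%d.jpg" % (index + 1))})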
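
The parse_url helper called in run() is not shown in the article. Here is a minimal sketch of what it might look like, assuming each record stores its image links in percent-encoded fields that share a prefix and end in a running number. The prefix "sProdImgNo_" and the count of 8 are assumptions, so check them against the actual JSON response before relying on them.

```python
from urllib import parse


def parse_url(data, key_prefix="sProdImgNo_", count=8):
    """Collect and decode the image links of one record.

    key_prefix and count are assumed values, not confirmed by the article.
    """
    image_urls = []
    for i in range(1, count + 1):
        raw = data.get("%s%d" % (key_prefix, i))
        if raw:  # skip missing or empty fields
            image_urls.append(parse.unquote(raw))
    return image_urls


# Made-up record for illustration only.
record = {
    "sProdImgNo_1": "http%3A%2F%2Fexample.com%2Fa.jpg",
    "sProdImgNo_2": "",
    "sProdImgNo_3": "http%3A%2F%2Fexample.com%2Fb.jpg",
}
print(parse_url(record))
```

Keeping the field prefix and count as parameters makes it easy to adjust the helper once the real field names have been confirmed.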

2. Define the consumer class

class Consumer(threading.Thread):
    def __init__(self, url_queue, *args, **kwargs):
        super(Consumer, self).__init__(*args, **kwargs)
        self.url_queue = url_queue

    def run(self) -> None:
        while True:
            try:
                url_obj = self.url_queue.get(timeout=5)
            except queue.Empty:
                # nothing arrived for 5 seconds: assume the producers are finished
                break
            image_url = url_obj.get("image_url")
            image_path = url_obj.get("image_path")
            try:
                # request is urllib.request (from urllib import request)
                request.urlretrieve(image_url, image_path)
                print("%s downloaded" % image_path)
            except Exception:
                print(image_path + " failed to download")

3. Create a main function, then create the threads

def main():
    page_queue = queue.Queue(18)   # one entry per page to fetch
    url_queue = queue.Queue(900)   # image download tasks
    for x in range(1, 19):
        url = 'https://apps.game.qq.com/cgi-bin/ams/module/ishow/V1.0/query/workList_inc.cgi?activityId=2735&sVerifyCode=ABCD&sDataType=JSON&iListNum=4&totalpage=0&page={page}&iOrder=0&iSortNumClose=1&iAMSActivityId=51991&_everyRead=true&iTypeId=1&iFlowId=267733&iActId=2735&iModuleId=2735&_=1600399000258'.format(page=x)
        page_queue.put(url)
    # 3 producers fetch the page data and queue the image URLs
    for x in range(3):
        th = Producer(page_queue, url_queue, name="producer %d" % x)
        th.start()

    # 5 consumers download the images
    for x in range(5):
        th = Consumer(url_queue, name="consumer %d" % x)
        th.start()


if __name__ == '__main__':
    main()
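
The consumers in this article stop once the queue has been empty for 5 seconds, which could end them early if the producers are slow. One alternative, swapped in here for illustration and not the author's method, is to wait for the producers with join() and then put one stop marker per consumer on the queue. A minimal sketch with in-memory work in place of downloads; all names are hypothetical:

```python
import queue
import threading

SENTINEL = None  # marker telling a consumer to stop


def producer(items, work_queue):
    for item in items:
        work_queue.put(item)


def consumer(work_queue, done, lock):
    while True:
        item = work_queue.get()  # blocks until something arrives
        if item is SENTINEL:
            break
        with lock:
            done.append(item.upper())  # stand-in for the real download


def run_pipeline(batches, consumer_count=2):
    work_queue = queue.Queue()
    done, lock = [], threading.Lock()
    producers = [threading.Thread(target=producer, args=(b, work_queue))
                 for b in batches]
    consumers = [threading.Thread(target=consumer, args=(work_queue, done, lock))
                 for _ in range(consumer_count)]
    for t in producers + consumers:
        t.start()
    for t in producers:
        t.join()  # all work has been queued
    for _ in consumers:
        work_queue.put(SENTINEL)  # one stop marker per consumer
    for t in consumers:
        t.join()
    return sorted(done)


print(run_pipeline([["a", "b"], ["c"]]))
```

Since the queue is first-in, first-out and the markers are added only after every producer has finished, each consumer processes all remaining items before it sees a marker.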

Finally, here is the download link for the source code:
https://download.csdn.net/download/qiaoenshi/12859717
