aiohttp multi-task asynchronous coroutine crawler for downloading images from a certain site

This is a script that uses Python's asynchronous request library aiohttp to download high-resolution images from a given site. You enter a page count and a file storage path; the script scrapes each image's original address, saves it in .jpg format, and strips illegal characters from the filenames. (For copyright reasons, no sample images are shown.)

It scrapes each image's original address (the large high-resolution version you see after clicking into an image), not the preview thumbnail. No sample images here, for copyright reasons. The code is below; the only thing you need to change is the file storage path.
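For orientation, the feed entries the script consumes look roughly like the sketch below. The field names ('data', 'title', 'coverImage', 'originalPath', 'worksList') are the ones the code actually reads; the values are made up for illustration:

# A single-work entry: 'data' itself carries the cover image.
single = {
    'data': {
        'title': 'some artwork',
        'coverImage': {'originalPath': 'works/2021/example.jpg'},
    }
}

# A collection entry: 'data' holds a 'worksList' of per-work dicts instead.
collection = {
    'data': {
        'worksList': [
            {'title': 'another artwork',
             'coverImage': {'originalPath': 'works/2021/example2.jpg'}},
        ]
    }
}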

import os
import asyncio

import aiohttp
import requests


ILLEGAL_CHARS = '/\\:*?"<>|'  # characters Windows does not allow in filenames


def sanitize(title):
    # Strip illegal characters so the title can be used as a filename.
    for char in ILLEGAL_CHARS:
        title = title.replace(char, '')
    return title


async def download_work(session, work):
    # Build the original-image URL and save the image under the work's title.
    url = 'https://img2.huashi6.com/' + work['coverImage']['originalPath']
    filename = sanitize(work['title'])
    async with session.get(url) as response:
        picdata = await response.read()
        with open(picPath + '/' + filename + '.jpg', 'wb') as f:
            f.write(picdata)
    print(filename, 'downloaded')


async def getpic(oneof_datalist):
    async with aiohttp.ClientSession() as session:
        data = oneof_datalist['data']
        if 'coverImage' in data:
            # A single work: the entry itself carries the cover image.
            await download_work(session, data)
        elif 'worksList' in data:
            # A collection: download every work it contains.
            for work in data['worksList']:
                await download_work(session, work)

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/92.0.4515.159 Safari/537.36'
}

page = int(input('enter pages: '))
picPath = input('please input a filepath: ')

if not os.path.exists(picPath):
    os.makedirs(picPath)

# Fetch the JSON feed for every page synchronously with requests.
json_list = []
for page_no in range(1, page + 1):
    url = 'https://rt.huashi6.com/front/index/load_pc_data?_ts_=1636363387274&cursor={}-1636361232855'.format(page_no)
    response_data = requests.post(url=url, headers=headers).json()
    json_list.append(response_data)

# Flatten the per-page responses into a single list of entries.
datalist = []
for i in json_list:
    datalist.extend(i['data']['datas'])
print(len(datalist))

# Wrap each entry in a download coroutine and run them all concurrently.
tasks = []
for one in datalist:
    tasks.append(asyncio.ensure_future(getpic(one)))

loop = asyncio.get_event_loop()
loop.run_until_complete(asyncio.wait(tasks))
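
On recent Python versions (3.10+), calling asyncio.get_event_loop() with no running loop is deprecated, and opening a fresh ClientSession inside every task is wasteful. Below is a minimal sketch of the same fan-out using asyncio.run and a single shared session; getpic_shared and main are names introduced here, while download_work and datalist come from the script above:

import asyncio
import aiohttp

async def getpic_shared(session, entry):
    # Same dispatch as getpic, but reuses the caller's session.
    data = entry['data']
    if 'coverImage' in data:
        await download_work(session, data)
    elif 'worksList' in data:
        for work in data['worksList']:
            await download_work(session, work)

async def main():
    # One ClientSession shared by every download instead of one per task.
    async with aiohttp.ClientSession() as session:
        await asyncio.gather(*(getpic_shared(session, one) for one in datalist))

asyncio.run(main())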


To run it, enter the number of pages and the file storage path, then wait for the downloads to finish.
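If the site starts throttling or resetting connections, you can cap how many downloads run at once with asyncio.Semaphore. A minimal sketch building on getpic_shared and datalist from above; the limit of 10 is an arbitrary choice:

import asyncio
import aiohttp

async def main_limited():
    # Allow at most 10 downloads in flight at any moment (arbitrary cap).
    sem = asyncio.Semaphore(10)
    async with aiohttp.ClientSession() as session:
        async def limited(entry):
            async with sem:
                await getpic_shared(session, entry)
        await asyncio.gather(*(limited(one) for one in datalist))

asyncio.run(main_limited())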

https://www.huashi6.com/
