I. Downloading images the ordinary (single-threaded) way
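1. Module: urllib is part of Python's standard library, so there is nothing to install (`pip install urllib` is not needed).
Its two most commonly used submodules are `request`, mainly for requesting web pages and downloading files, and `parse`, for parsing URLs, which provides many URL-handling functions.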
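2. Commonly used functions
request.urlretrieve(): download a file (for example an image) from a URL to a local path
parse.urlencode(): convert a dict into a query string that can be appended to a URL
parse.parse_qs(): convert a query string back into a dict
request.urlopen(): send an HTTP request to the target server and return the response
parse.unquote(): decode percent-encoded text, such as Chinese characters in a URL
Splitting a URL into its components:
parse.urlparse(url)
parse.urlsplit(url) (same as urlparse, except that it does not split `;params` off the path)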
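The `parse` helpers listed above can be tried offline. The sketch below uses a made-up query and a made-up `example.com` URL purely for illustration; `urlopen` and `urlretrieve` are left out because they need network access.

```python
from urllib import parse

# Made-up parameters, used only to exercise the parse helpers offline
params = {"page": 1, "name": "王者"}

query = parse.urlencode(params)   # dict -> percent-encoded query string
back = parse.parse_qs(query)      # query string -> dict of lists
print(query)   # page=1&name=%E7%8E%8B%E8%80%85
print(back)    # {'page': ['1'], 'name': ['王者']}

# Percent-decoding turns the escaped UTF-8 bytes back into Chinese text
print(parse.unquote("%E7%8E%8B%E8%80%85"))   # 王者

# urlparse separates ';p' into .params; urlsplit leaves it in .path
url = "https://example.com/a/b;p?x=1#top"
print(parse.urlparse(url))
print(parse.urlsplit(url))
```

Note that `parse_qs` always returns lists as values, because a key may appear more than once in a query string.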
3. Source code for downloading hero skin images:
```python
import os
import re
from urllib import request, parse

os.makedirs('images', exist_ok=True)  # make sure the output folder exists

# Each image size is stored under a key sProdImgNo_N, where the number N varies
image_size = {"2": "sProdImgNo_2", "3": "sProdImgNo_3", "4": "sProdImgNo_4", "5": "sProdImgNo_5", "6": "sProdImgNo_6", "7": "sProdImgNo_7", "8": "sProdImgNo_8"}

for page in range(10):
    # The page number in the request URL changes on each iteration
    url = f'https://apps.game.qq.com/cgi-bin/ams/module/ishow/V1.0/query/workList_inc.cgi?activityId=2735&sVerifyCode=ABCD&sDataType=JSON&iListNum=20&totalpage=0&page={page}&iOrder=0&iSortNumClose=1&jsoncallback=jQuery171046841982663298865_1654507567751&iAMSActivityId=51991&_everyRead=true&iTypeId=2&iFlowId=267733&iActId=2735&iModuleId=2735&_=1654508531011'
    res = request.urlopen(url)
    data = res.read().decode('utf-8')
    # Regex: list of (percent-encoded) image names on this page
    namelist = re.findall(r'"sProdName":"(.*?)"', data)
    print(namelist)
    # Regex: list of image URLs for each size
    for size, key in image_size.items():
        sizelist = re.findall(r'"' + key + r'":"(.*?)"', data)
        print(sizelist)
        # Pair each name with its URL, then download
        for raw_name, raw_url in zip(namelist, sizelist):
            # Decode the name; append the size so different sizes don't overwrite each other
            name = f'{parse.unquote(raw_name)}_{size}.jpg'
            # The default link is not high resolution: replace the trailing '/200' with '/0'
            image_url = re.sub(r'/200$', '/0', parse.unquote(raw_url))
            print('Downloading', name)
            request.urlretrieve(image_url, os.path.join('images', name))
            print(name, 'downloaded')
```
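Because the request URL carries a `jsoncallback=...` parameter, the server answers with JSONP, that is, JSON wrapped in a function call, which is why the script pulls fields out with regular expressions. The sketch below runs both the regex approach and an alternative (not the author's method) that strips the `callback(...)` wrapper and parses real JSON. The sample payload is hypothetical and heavily shortened: the field names `sProdName` and `sProdImgNo_2` come from the script, while the `"List"` wrapper key, the `jQuery123` callback name, and the `example.com` URL are invented.

```python
import json
import re
from urllib import parse

# Hypothetical, shortened JSONP response; the "List" key is an assumption
sample = ('jQuery123({"List":[{"sProdName":"%E5%B0%8F%E4%B9%94",'
          '"sProdImgNo_2":"http%3A%2F%2Fexample.com%2Fa.jpg%2F200"}]})')

# 1) The regex approach used by the script
names = re.findall(r'"sProdName":"(.*?)"', sample)
urls = re.findall(r'"sProdImgNo_2":"(.*?)"', sample)
results = [(parse.unquote(n), re.sub(r'/200$', '/0', parse.unquote(u)))
           for n, u in zip(names, urls)]
print(results)   # [('小乔', 'http://example.com/a.jpg/0')]

# 2) Alternative: cut off the "callback(" ... ")" wrapper and parse it as JSON
body = sample[sample.index('(') + 1 : sample.rindex(')')]
data = json.loads(body)
print(parse.unquote(data["List"][0]["sProdName"]))   # 小乔
```

The JSON route is less fragile than `(.*?)` when a value contains escaped quotes, but it depends on knowing the real structure of the response, which should be checked against an actual reply first.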
4. The result after downloading:
II. Downloading images with multiple threads
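1. Import the thread-pool class: from concurrent.futures import ThreadPoolExecutor.
2. Create a thread pool and set the number of threads with max_workers=10. This is like hiring 10 workers to do the job at the same time: while one thread is waiting on the network, the others keep downloading, which raises overall download throughput.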
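The two steps above can be tried without touching the network. In this minimal sketch the hypothetical `fake_download` function stands in for `request.urlretrieve` and simply sleeps to imitate waiting on a server; `submit` hands each call to the pool and `result()` waits for it to finish.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fake_download(n):
    # Hypothetical stand-in for a real download: sleep to simulate network wait
    time.sleep(0.2)
    return f"file_{n}.jpg"

start = time.perf_counter()
with ThreadPoolExecutor(max_workers=10) as executor:
    futures = [executor.submit(fake_download, n) for n in range(10)]
    results = [f.result() for f in futures]
elapsed = time.perf_counter() - start

print(results)
# With 10 workers the 10 sleeps overlap, so this should take roughly 0.2 s
# instead of about 2 s when run one after another
print(f"{elapsed:.2f}s")
```

This overlap of waiting time is where the speedup comes from: downloads are I/O-bound, so threads help even though CPython runs only one thread's Python code at a time.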
```python
import os
import requests  # third-party library: pip install requests
from urllib import request
# Import the thread-pool class
from concurrent.futures import ThreadPoolExecutor

def download_picture(url, heroname, skin_name):
    try:
        path = os.path.join('.', 'picture', heroname)
        # makedirs also creates ./picture, and exist_ok=True avoids an error
        # when another thread has just created the same folder
        os.makedirs(path, exist_ok=True)
        # Build the destination file path and download
        request.urlretrieve(url, os.path.join(path, f'{skin_name}.jpg'))
        print(skin_name, 'downloaded successfully!')
    except Exception as e:
        # Report failures instead of silently swallowing them
        print(skin_name, 'failed:', e)

# Create the thread pool
with ThreadPoolExecutor(max_workers=10) as executor:
    resp = requests.get(url='https://pvp.qq.com/web201605/js/herolist.json')
    resp_dict_lists = resp.json()
    for hero in resp_dict_lists:
        heroname = hero['cname']  # hero name
        if 'skin_name' in hero:
            # Skin names are joined with '|'
            skins = hero['skin_name'].split('|')
            for x, skin in enumerate(skins):
                # Build the download URL; skin images are numbered from 1
                url = f'http://game.gtimg.cn/images/yxzj/img201606/skin/hero-info/{hero["ename"]}/{hero["ename"]}-bigskin-{x + 1}.jpg'
                # download_picture(url, heroname, skin)  # plain sequential download
                # Submit the job to the thread pool
                executor.submit(download_picture, url, heroname, skin)
```
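The URL-building part of the loop can be pulled into a pure function and checked offline. In this sketch `build_skin_tasks` and `URL_TEMPLATE` are hypothetical names, and the hero record is invented; only the field names `ename`, `cname`, `skin_name` and the URL pattern come from the script.

```python
URL_TEMPLATE = ("http://game.gtimg.cn/images/yxzj/img201606/skin/hero-info/"
                "{ename}/{ename}-bigskin-{idx}.jpg")

def build_skin_tasks(hero):
    """Return (url, hero_name, skin_name) tuples for one herolist.json entry."""
    # Heroes without a skin_name field are skipped, as in the script
    if "skin_name" not in hero:
        return []
    return [
        (URL_TEMPLATE.format(ename=hero["ename"], idx=idx), hero["cname"], skin)
        for idx, skin in enumerate(hero["skin_name"].split("|"), start=1)
    ]

# Invented record; real herolist.json entries carry more fields
hero = {"ename": 999, "cname": "HeroA", "skin_name": "Skin1|Skin2"}
tasks = build_skin_tasks(hero)
for task in tasks:
    print(task)
```

Inside the `with ThreadPoolExecutor(...)` block each tuple could then be passed on as `executor.submit(download_picture, *task)`. Keeping URL construction separate from downloading makes it easy to test without starting any threads.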
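Summary: downloads spend most of their time waiting on the network, so the multithreaded version is far faster than the sequential one. Multithreading has drawbacks, though: when several threads read and modify the same shared data, their operations can interleave and corrupt it (a race condition), and the more threads there are, the more likely this becomes. Shared data then has to be protected with a lock; that is not covered in detail here.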
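As a brief illustration of the locking idea in the summary, here is a minimal sketch using `threading.Lock`. The hypothetical `record` function stands in for whatever bookkeeping a download worker might do after finishing a file, such as counting completed downloads in a shared variable.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

done = []     # shared list of finished file names
count = 0     # shared counter updated by every worker
lock = threading.Lock()

def record(name):
    # Hypothetical post-download bookkeeping
    global count
    # "count += 1" is a read-modify-write and is not atomic across threads;
    # holding the lock lets only one thread run this block at a time
    with lock:
        count += 1
        done.append(name)

with ThreadPoolExecutor(max_workers=10) as executor:
    for n in range(1000):
        executor.submit(record, f"skin_{n}.jpg")

print(count, len(done))   # 1000 1000
```

Leaving the `with ThreadPoolExecutor` block waits for every submitted task, so both values are final when printed. Code that shares nothing between threads, like `download_picture` writing to distinct files, does not need a lock.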