Web Scraping Practice: Batch Downloading Images from a Website
Difficulties
1. The pattern of the request URLs
# ---------------------------- Parsing in practice -----------------------------
from urllib.request import HTTPHandler, build_opener, Request, urlretrieve, urlopen
from lxml import etree
# Page 1: https://www.aigei.com/s?dim=cartoon_124_animatio&detailTab=file&type=pic
# Page 2: https://www.aigei.com/s?dim=cartoon_124_animatio&detailTab=file&type=pic&page=2
# Page 3: https://www.aigei.com/s?dim=cartoon_124_animatio&detailTab=file&type=pic&page=3
base_url = 'https://www.aigei.com/s?dim=cartoon_124_animatio&detailTab=file&type=pic'
Copying the request URLs of the first three pages, we can see what they have in common and where they differ; the differences follow a clear pattern. So we first record the common part.
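The pattern above can be captured in a small helper that builds the URL for any page: page 1 uses the bare base URL, and later pages append `&page=N` (the function name `page_url` is my own, for illustration):

```python
base_url = 'https://www.aigei.com/s?dim=cartoon_124_animatio&detailTab=file&type=pic'

def page_url(page):
    """Build the request URL for a given page number.

    Page 1 has no `page` parameter; pages 2 and up append `&page=N`.
    """
    if page == 1:
        return base_url
    return f'{base_url}&page={page}'

# Reproduce the three URLs observed above.
for p in range(1, 4):
    print(page_url(p))
```

With such a helper, the crawl loop only needs a page counter instead of hard-coding each URL.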
2. Setting the request headers
# header is a dict; separate each key-value pair with a comma!
# Do not include 'accept-encoding': 'gzip, deflate, br' -- with it, the server returns
# gzip-compressed bytes and decoding fails with:
# '''utf-8' codec can't decode byte 0x8b in position 1: invalid start byte'''
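The error occurs because a gzip response body is binary data, not UTF-8 text: the gzip format starts with the magic bytes 0x1f 0x8b, which is exactly the 0x8b the decoder trips over. A minimal sketch of the failure mode, and of decompressing before decoding if you do keep that header:

```python
import gzip

# Gzip output begins with the magic bytes 0x1f 0x8b -- the 0x8b in the error.
compressed = gzip.compress('hello'.encode('utf-8'))
assert compressed[:2] == b'\x1f\x8b'

# Decoding the raw compressed bytes as UTF-8 fails, just like the crawler did.
try:
    compressed.decode('utf-8')
except UnicodeDecodeError as e:
    print(e)

# Decompress first, then decode.
text = gzip.decompress(compressed).decode('utf-8')
print(text)  # hello
```

Dropping `accept-encoding` from the headers is the simpler fix: without it, urllib receives an uncompressed body that decodes directly.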
header = {
'accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9',
'accept-language': 'zh-CN,zh;q=0.9,en;q=0.8,en-GB;q=0.7,en-US;q=0.6'