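I recently learned a little about web crawlers, so I practiced by scraping skin images from two games.
Web crawlers:
A web crawler (also called a web spider or web robot) is a program that automatically collects information from the internet according to a set of rules. It works by simulating a browser: it sends HTTP requests and receives the responses.
The main modules used here are requests and re.
The former sends the HTTP requests and fetches the page resources; the latter handles regular-expression matching and string splitting (for example, deriving a file name from an image URL). The files themselves are written with Python's built-in open().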
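As a small illustration of the string handling that re contributes, the sketch below takes a skin URL in the format used later in this post and pulls out the hero ID, the skin index, and the file name. The helper name parse_skin_url and the sample ID 105 are made up for the example, and no network request is made.

```python
import re

# Pattern for URLs shaped like .../hero-info/<id>/<id>-bigskin-<index>.jpg
SKIN_URL_PATTERN = re.compile(r'/(\d+)-bigskin-(\d+)\.jpg$')


def parse_skin_url(url):
    """Return (hero_id, skin_index, file_name) for a skin URL, or None if it does not match."""
    match = SKIN_URL_PATTERN.search(url)
    if match is None:
        return None
    hero_id, skin_index = match.groups()
    # re.split cuts the URL at every '-'; the last piece is the file name
    file_name = re.split('-', url)[-1]
    return int(hero_id), int(skin_index), file_name


sample = 'http://game.gtimg.cn/images/yxzj/img201606/skin/hero-info/105/105-bigskin-1.jpg'
print(parse_skin_url(sample))  # (105, 1, '1.jpg')
```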
Scraping Honor of Kings (王者荣耀) images:
import os
import re
import shutil
import time

import requests

url = 'http://pvp.qq.com/web201605/js/herolist.json'
html = requests.get(url)
html_json = html.json()  # parse the response body as JSON
# print(html_json)
# Extract each hero's display name and numeric ID
hero_name = [hero['cname'] for hero in html_json]  # names
hero_num = [hero['ename'] for hero in html_json]  # numeric IDs
Filepath = 'D:\\王者图片\\'  # backslashes escaped so the path is read literally


def RemoveDir(filepath):
    # Start from an empty directory; anything already inside filepath is deleted
    if os.path.exists(filepath):
        shutil.rmtree(filepath)
    os.mkdir(filepath)


def rongyao():  # download and save the images
    RemoveDir(Filepath)
    for name, v in zip(hero_name, hero_num):
        hero_dir = os.path.join(Filepath, name)
        os.mkdir(hero_dir)  # one folder per hero
        for u in range(12):
            onehero_links = ('http://game.gtimg.cn/images/yxzj/img201606/skin/hero-info/'
                             + str(v) + '/' + str(v) + '-bigskin-' + str(u) + '.jpg')  # image URL
            link = requests.get(onehero_links)  # request the image
            if link.status_code == 200:
                img = re.split('-', onehero_links)  # the last piece is e.g. '1.jpg'
                with open(os.path.join(hero_dir, img[-1]), 'wb') as f:
                    f.write(link.content)


if __name__ == '__main__':
    start = time.time()
    rongyao()
    end = time.time()
    print("Elapsed: " + str(end - start) + " seconds")
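The download loop above can also be arranged so that it can be tried without touching the network. The following is a minimal sketch under stated assumptions, not the script's actual structure: build_skin_urls, save_skins, and the fetch parameter are names introduced here for illustration, and fetch is assumed to be any callable that takes a URL and returns a (status_code, content_bytes) pair.

```python
from pathlib import Path

BASE = 'http://game.gtimg.cn/images/yxzj/img201606/skin/hero-info/'


def build_skin_urls(hero_id, max_skins=12):
    """Candidate skin URLs for one hero, using the same indices (0..11) as the script."""
    return [f'{BASE}{hero_id}/{hero_id}-bigskin-{idx}.jpg' for idx in range(max_skins)]


def save_skins(hero_id, hero_dir, fetch, max_skins=12):
    """Save every candidate image that answers with status 200; return the saved paths."""
    hero_dir = Path(hero_dir)
    hero_dir.mkdir(parents=True, exist_ok=True)  # create the hero's folder if needed
    saved = []
    for url in build_skin_urls(hero_id, max_skins):
        status, content = fetch(url)
        if status != 200:
            continue  # no skin at this index
        target = hero_dir / url.rsplit('-', 1)[-1]  # e.g. '1.jpg'
        target.write_bytes(content)
        saved.append(target)
    return saved
```

With requests, fetch could be a small wrapper that calls requests.get(url, timeout=10) and returns (response.status_code, response.content); the timeout value here is an arbitrary choice.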
Scraping LOL images:
import json
import os
import re
import time

import requests


def Download_LOL_Skin():
    json_url = "https://lol.qq.com/biz/hero/champion.js"
    html_re = requests.get(json_url).content
    html_str = html_re.decode()
    pat_js = r'"keys":(.*?),"data"'
    enc = re.compile(pat_js)
    html_list = enc.findall(html_str)
    # Parse the captured {"id": "name", ...} object as JSON rather than calling eval() on remote text
    dict_js = json.loads(html_list[0])

    download_url = []
    file_path_list = []
    path = 'E:/LOL图片/LOL_SKIN'  # folder plus a file-name prefix
    os.makedirs(os.path.dirname(path), exist_ok=True)  # make sure the folder exists
    for key, name in dict_js.items():
        for i in range(15):
            numstr = key + str(i).zfill(3)  # hero ID + 3-digit skin index, e.g. key '1', i = 2 gives '1002'
            hero_download_url = 'https://ossweb-img.qq.com/images/lol/web201310/skin/big' + numstr + '.jpg'
            download_url.append(hero_download_url)
            file_path_list.append(path + name + str(i) + '.jpg')

    n = 0  # number of images saved so far
    for i in range(len(download_url)):
        response = requests.get(download_url[i])  # one request gives both the status and the content
        if response.status_code == 200:
            with open(file_path_list[i], "wb") as f:
                f.write(response.content)
            n = n + 1
            print(download_url[i] + " image " + str(n) + " downloaded")
    print(str(n) + " images downloaded in total")


if __name__ == '__main__':
    start = time.time()
    Download_LOL_Skin()
    end = time.time()
    print("Elapsed: " + str(end - start) + " seconds")
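Two steps in the LOL script are worth isolating: extracting the "keys" object from the JavaScript file, and building the numeric part of each image URL (the hero ID followed by a three-digit skin index). A minimal sketch follows. The sample text is a made-up, shortened stand-in for champion.js with placeholder IDs and names, and it assumes the "keys" object is valid JSON; parse_keys and skin_url are names introduced for this example. It parses with json.loads rather than eval, since evaluating text fetched from the network as Python code is risky.

```python
import json
import re

KEYS_PATTERN = re.compile(r'"keys":(.*?),"data"')
SKIN_BASE = 'https://ossweb-img.qq.com/images/lol/web201310/skin/big'


def parse_keys(js_text):
    """Return the {hero_id: hero_name} mapping found in the JavaScript text."""
    match = KEYS_PATTERN.search(js_text)
    if match is None:
        raise ValueError('no "keys" object found')
    return json.loads(match.group(1))


def skin_url(hero_id, index):
    """Hero ID followed by the skin index zero-padded to three digits."""
    return f'{SKIN_BASE}{hero_id}{index:03d}.jpg'


# Made-up sample in the assumed shape of the real file
sample_js = 'var demo={"keys":{"1":"HeroA","22":"HeroB"},"data":{}};'
keys = parse_keys(sample_js)
print(keys)
print(skin_url('22', 7))
```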
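Note: for Honor of Kings, the script creates a separate folder for each hero; for LOL, all images go directly into a single folder.
You can change the save paths in the scripts to store the downloaded images wherever you like.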
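One way to make the save location easy to change is to build every file path from a single base directory. The sketch below is an illustration, not part of the original scripts: image_path and its per_hero_folder flag are hypothetical names, and the flag switches between the two layouts described above (one folder per hero, or everything in one folder).

```python
from pathlib import Path


def image_path(base_dir, hero_name, index, per_hero_folder=True):
    """Build the destination path for one skin image under base_dir."""
    base = Path(base_dir)
    if per_hero_folder:
        # Layout 1: <base>/<hero>/<index>.jpg
        return base / hero_name / f'{index}.jpg'
    # Layout 2: <base>/<hero><index>.jpg, all images side by side
    return base / f'{hero_name}{index}.jpg'


print(image_path('skins', 'HeroA', 1))
print(image_path('skins', 'HeroA', 1, per_hero_folder=False))
```

Changing where the images go then means changing only the base_dir argument. The parent folder still has to exist before writing, for example by calling mkdir(parents=True, exist_ok=True) on the path's parent.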