Python Web Scraping in Practice (Image Scraping): LOL Heroes and Skins, I Want Them All

Article link: https://blog.csdn.net/suoyue_py/article/details/102851136

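This article uses the requests library to scrape every League of Legends hero and skin. Beginners who find anything unclear can refer to the introductory article: Python Web Scraping in Practice (Introduction).
Open the page on the official League of Legends site that lists all heroes, in order to obtain each hero's ID (heroId):
https://lol.qq.com/data/info-heros.shtml
Right-click and choose "Inspect Element" (or press F12), open the "Network" tab, and press F5 to refresh so that no file is missing from the list. Scroll down to a file named hero_list.js, which holds the information for all heroes. Click it, and the Headers panel on the right shows the request URL https://game.gtimg.cn/images/lol/act/img/js/heroList/hero_list.js. This is the URL we are looking for.
Opening https://game.gtimg.cn/images/lol/act/img/js/heroList/hero_list.js in the browser shows the data as one dense, unformatted block that is hard to read.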

Press Ctrl+A to select all of it, copy it, and paste it into the JSON parser at https://www.json.cn/ to format it for easier reading.
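The same formatting can also be done locally with Python's standard json module instead of an online tool. The following is a minimal sketch: the helper name `pretty` and the sample string are made up for illustration, and only the `hero` and `heroId` keys are taken from the real file.

```python
import json

def pretty(raw_text):
    """Parse a JSON string and return it re-indented for reading."""
    data = json.loads(raw_text)
    # ensure_ascii=False keeps Chinese hero names readable instead of \uXXXX escapes
    return json.dumps(data, indent=4, ensure_ascii=False)

# Made-up, minified sample shaped like hero_list.js (not real data)
sample = '{"hero":[{"heroId":"1","name":"示例英雄"}]}'
print(pretty(sample))
```

With the real file, the text returned by `requests.get(url).text` could be passed to `pretty` in the same way.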
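The formatted data shows that there are currently 145 heroes. Expand the hero node; the heroId of each entry is what we need. Look closely and you will see that the heroId values do not run consecutively from 1 to 145 (watch out for this pitfall), so a simple counting loop cannot be used to generate them.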
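Because the IDs have gaps, the safer approach is to read each heroId from the list itself rather than counting with range(). A minimal sketch of that idea follows; the function name `detail_urls` and the sample IDs are illustrative assumptions, while the URL pattern is the one used in the full code in this article.

```python
DETAIL_URL = "https://game.gtimg.cn/images/lol/act/img/js/hero/{}.js"

def detail_urls(hero_list):
    """Return one detail URL per hero, using whatever heroId each entry carries."""
    return [DETAIL_URL.format(hero["heroId"]) for hero in hero_list["hero"]]

# Made-up sample with a gap in the IDs (illustrative values only)
sample = {"hero": [{"heroId": "1"}, {"heroId": "2"}, {"heroId": "10"}]}
for url in detail_urls(sample):
    print(url)
```

A loop over range(1, 4) would have requested a hero with ID 3 that does not exist in this sample and missed ID 10 entirely.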
Open the data for a single hero to see that hero's skins and their names (the steps are the same as above).
Annie, for example, has 13 skins.
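To check a hero's skins in code rather than by eye, the skin entries can be filtered and listed. This is a small sketch under assumptions: the keys `skins`, `name`, and `mainImg` are the ones the full code in this article reads, but the function name `skins_with_images`, the skin names, and the example.com URLs are made up.

```python
def skins_with_images(hero_detail):
    """Return (skin name, image URL) pairs, skipping entries with an empty mainImg."""
    return [(s["name"], s["mainImg"]) for s in hero_detail["skins"] if s["mainImg"]]

# Made-up sample shaped like a hero detail file (not real data)
sample = {"skins": [
    {"name": "Skin A", "mainImg": "https://example.com/a.jpg"},
    {"name": "Skin A (variant)", "mainImg": ""},
    {"name": "Skin B", "mainImg": "https://example.com/b.jpg"},
]}

for name, image_url in skins_with_images(sample):
    print(name, image_url)
```

Entries with an empty mainImg have nothing to download, so they are left out of the result.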
What remains is handling the details and writing the code.

Complete code for scraping all League of Legends heroes and skins:

import requests
import os

# Browser-like User-Agent sent with the JSON requests
headers = {"User-Agent":"Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/77.0.3865.90 Safari/537.36"}

def get_hero():
    url = "https://game.gtimg.cn/images/lol/act/img/js/heroList/hero_list.js"
    res = requests.get(url,headers=headers).json()
    for hero in res['hero']:
        hero_id = hero['heroId']    # hero ID (a string, not a consecutive number)
        detail_line = 'https://game.gtimg.cn/images/lol/act/img/js/hero/'+hero_id+'.js' # string concatenation
        # Equivalent ways to build the same URL:
        #detail_line = 'https://game.gtimg.cn/images/lol/act/img/js/hero/%s.js'%hero_id # %-style formatting
        #detail_line = f'https://game.gtimg.cn/images/lol/act/img/js/hero/{hero_id}.js' # f-string (Python 3.6+)
        #detail_line = 'https://game.gtimg.cn/images/lol/act/img/js/hero/{}.js'.format(hero_id) # str.format()
        get_skin(detail_line)
    
def get_skin(url):
    res = requests.get(url,headers=headers).json()
    for skin in res["skins"]:
        if not skin["mainImg"]:     # skip entries that have no main image
            continue
        item = {}
        item["heroName"] = skin["heroName"]     # hero name
        item["skinName"] = skin["name"].replace("/","_")    # skin name, with any slash / replaced by an underscore _
        item["skinImage"] = skin["mainImg"] # URL of the skin image
        print(item)
        save(item)

def save(item):
    # Build one directory per hero under ./images
    hero_path = './images/'+item['heroName']+'/'
    if not os.path.exists(hero_path):   # create the directory if it does not exist
        os.makedirs(hero_path)
    res = requests.get(item["skinImage"])       # request the image
    with open(hero_path + item["skinName"]+".png","wb") as f:
        f.write(res.content)

if __name__ == "__main__":
    get_hero()
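
The code above only replaces the slash in skin names. On Windows, several other characters are also not allowed in file names, so a slightly broader cleanup may be worth considering. The helper below is a hypothetical sketch, not part of the original code; the name `safe_filename` and the sample inputs are made up for illustration.

```python
import re

def safe_filename(name, replacement="_"):
    """Replace characters that Windows forbids in file names: \\ / : * ? " < > |"""
    return re.sub(r'[\\/:*?"<>|]', replacement, name).strip()

# Illustrative inputs only
print(safe_filename("K/DA Skin"))
print(safe_filename("Who: Me?"))
```

It could stand in for the `.replace("/","_")` call in get_skin if file names with other special characters cause errors when saving.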