Preface:
The reason for this post: I came across the blog of the "30k-to-300k IT programmer" and found some of it genuinely interesting, but the code isn't very clean overall and a beginner would still be missing a few pieces (honestly, I just wanted to play with it myself).
Let's get straight to the code.
Honor of Kings (王者荣耀)
```python
import os

import requests
from fake_useragent import UserAgent

ua = UserAgent()

url = 'http://pvp.qq.com/web201605/js/herolist.json'
head = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/72.0.3626.109 Safari/537.36'}
response = requests.get(url, headers=head)
hero_list = response.json()

# Pull out each hero's display name, numeric id, and title
hero_name = [h['cname'] for h in hero_list]
hero_number = [h['ename'] for h in hero_list]
hero_name_title = [h['title'] for h in hero_list]

h_l = 'http://game.gtimg.cn/images/yxzj/img201606/skin/hero-info/'
os.makedirs('./img', exist_ok=True)  # the skins are written under ./img/

for n, i in enumerate(hero_number):
    headers = {
        'User-Agent': ua.random,
        'referer': 'https://pvp.qq.com/web201605/herodetail/%s.shtml' % i,
    }
    # Try the skins one by one, assuming at most 15 per hero.
    # Skin numbering in the URL starts at 1, not 0, so start the range there
    # (starting at 0 would 404 immediately and break out of the loop).
    for sk_num in range(1, 16):
        hsl = h_l + str(i) + '/' + str(i) + '-bigskin-' + str(sk_num) + '.jpg'
        hl = requests.get(hsl, headers=headers)
        if hl.status_code == 200:
            filepath = './img/' + hero_name[n] + '_' + hero_name_title[n] + str(sk_num) + '.jpg'
            with open(filepath, 'wb') as f:
                f.write(hl.content)
            print(hero_name[n] + ' ok ' + str(sk_num))
        else:
            break
```
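The download loop above is really a "probe sequential URLs until the first miss" pattern. A minimal, network-free sketch of that pattern (the `exists` callable is a stand-in for the real `requests.get(...).status_code == 200` check, and the stub below is just an example):

```python
def probe_until_miss(exists, start=1, limit=16):
    """Yield indices start, start+1, ... until exists(i) returns False.

    `exists` is any callable that reports whether resource i is present;
    in the scraper it would wrap the HTTP status-code check.
    """
    for i in range(start, limit):
        if not exists(i):
            break
        yield i

# Stubbed check standing in for an HTTP request: skins 1-3 "exist".
found = list(probe_until_miss(lambda i: i <= 3))
print(found)  # [1, 2, 3]
```

Factoring the probe out like this makes the stop condition easy to test without firing any real requests.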
Most of the code above is copied from the blogger mentioned at the top; no point dodging that, copied is copied. I only made a few small changes. For the save path, make sure an `img` directory exists in the same folder as the Python file (the script writes everything under `./img/`), otherwise the `open()` calls fail, and even if they didn't, a few hundred images would end up cluttering the script's own folder.
I also check the response status code and break off a hero's skin loop at the first miss, which keeps the crawl efficient by skipping requests that are bound to fail.
And there is a minimal anti-anti-crawler touch: a fresh random User-Agent per hero via fake_useragent. Install it with:
pip3 install fake_useragent
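One caveat: fake_useragent builds its browser list from an online source and can fail to initialize when that source is unreachable. A dependency-free fallback with the same pick-one-at-random behaviour is a small static pool (the UA strings below are illustrative examples, not a maintained list):

```python
import random

# Static fallback pool; these example strings are illustrative only.
UA_POOL = [
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/72.0.3626.109 Safari/537.36',
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_3) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/12.0.3 Safari/605.1.15',
    'Mozilla/5.0 (X11; Linux x86_64; rv:65.0) Gecko/20100101 Firefox/65.0',
]

def random_ua():
    """Pick a User-Agent at random, mimicking ua.random."""
    return random.choice(UA_POOL)
```

Swapping `ua.random` for `random_ua()` keeps the scraper running even when fake_useragent cannot fetch its data.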
Then, mulling it over, a light bulb went on: if Honor of Kings can be scraped like this, why not take a look at League of Legends too?
I hunted around for its JSON file, and it turns out the hero data is maintained inside a JS file on the page itself, with the heroes' Chinese names stored as Unicode escapes, of all things. Never mind, none of that matters; what we need is the images.
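Those `\uXXXX` escapes are standard JSON string escapes, so if you ever do want the Chinese names, `json.loads` decodes them directly (the codepoints below are just an example, spelling 安妮 / "Annie"):

```python
import json

# A JS/JSON string literal with the Chinese name escaped as \uXXXX.
escaped = '"\\u5b89\\u59ae"'   # the literal text: "\u5b89\u59ae"

name = json.loads(escaped)
print(name)  # 安妮
```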
League of Legends (英雄联盟)
```python
import os

import requests
from fake_useragent import UserAgent

ua = UserAgent()

# Hero id -> English name, lifted from the JS data file on the page.
info_list = {
    "266": "Aatrox",
    "103": "Ahri",
    "84": "Akali",
    "12": "Alistar",
    "32": "Amumu",
    "34": "Anivia",
    "1": "Annie",
    "22": "Ashe",
    "136": "AurelionSol",
    "268": "Azir",
    "432": "Bard",
    "53": "Blitzcrank",
    "63": "Brand",
    "201": "Braum",
    "51": "Caitlyn",
    "164": "Camille",
    "69": "Cassiopeia",
    "31": "Chogath",
    "42": "Corki",
    "122": "Darius",
    "131": "Diana",
    "119": "Draven",
    "36": "DrMundo",
    "245": "Ekko",
    "60": "Elise",
    "28": "Evelynn",
    "81": "Ezreal",
    "9": "Fiddlesticks",
    "114": "Fiora",
    "105": "Fizz",
    "3": "Galio",
    "41": "Gangplank",
    "86": "Garen",
    "150": "Gnar",
    "79": "Gragas",
    "104": "Graves",
    "120": "Hecarim",
    "74": "Heimerdinger",
    "420": "Illaoi",
    "39": "Irelia",
    "427": "Ivern",
    "40": "Janna",
    "59": "JarvanIV",
    "24": "Jax",
    "126": "Jayce",
    "202": "Jhin",
    "222": "Jinx",
    "145": "Kaisa",
    "429": "Kalista",
    "43": "Karma",
    "30": "Karthus",
    "38": "Kassadin",
    "55": "Katarina",
    "10": "Kayle",
    "141": "Kayn",
    "85": "Kennen",
    "121": "Khazix",
    "203": "Kindred",
    "240": "Kled",
    "96": "KogMaw",
    "7": "Leblanc",
    "64": "LeeSin",
    "89": "Leona",
    "127": "Lissandra",
    "236": "Lucian",
    "117": "Lulu",
    "99": "Lux",
    "54": "Malphite",
    "90": "Malzahar",
    "57": "Maokai",
    "11": "MasterYi",
    "21": "MissFortune",
    "62": "MonkeyKing",
    "82": "Mordekaiser",
    "25": "Morgana",
    "267": "Nami",
    "75": "Nasus",
    "111": "Nautilus",
    "518": "Neeko",
    "76": "Nidalee",
    "56": "Nocturne",
    "20": "Nunu",
    "2": "Olaf",
    "61": "Orianna",
    "516": "Ornn",
    "80": "Pantheon",
    "78": "Poppy",
    "555": "Pyke",
    "133": "Quinn",
    "497": "Rakan",
    "33": "Rammus",
    "421": "RekSai",
    "58": "Renekton",
    "107": "Rengar",
    "92": "Riven",
    "68": "Rumble",
    "13": "Ryze",
    "113": "Sejuani",
    "35": "Shaco",
    "98": "Shen",
    "102": "Shyvana",
    "27": "Singed",
    "14": "Sion",
    "15": "Sivir",
    "72": "Skarner",
    "37": "Sona",
    "16": "Soraka",
    "50": "Swain",
    "517": "Sylas",
    "134": "Syndra",
    "223": "TahmKench",
    "163": "Taliyah",
    "91": "Talon",
    "44": "Taric",
    "17": "Teemo",
    "412": "Thresh",
    "18": "Tristana",
    "48": "Trundle",
    "23": "Tryndamere",
    "4": "TwistedFate",
    "29": "Twitch",
    "77": "Udyr",
    "6": "Urgot",
    "110": "Varus",
    "67": "Vayne",
    "45": "Veigar",
    "161": "Velkoz",
    "254": "Vi",
    "112": "Viktor",
    "8": "Vladimir",
    "106": "Volibear",
    "19": "Warwick",
    "498": "Xayah",
    "101": "Xerath",
    "5": "XinZhao",
    "157": "Yasuo",
    "83": "Yorick",
    "350": "Yuumi",
    "154": "Zac",
    "238": "Zed",
    "115": "Ziggs",
    "26": "Zilean",
    "142": "Zoe",
    "143": "Zyra"
}

os.makedirs('./yxlm', exist_ok=True)  # the skins are written under ./yxlm/

for hero_id, base_name in info_list.items():
    base_url = "https://ossweb-img.qq.com/images/lol/web201310/skin/big%s" % hero_id
    headers = {
        "Referer": "https://lol.qq.com/data/info-defail.shtml?id=%s" % base_name,
        "User-Agent": ua.random,
    }
    for num in range(150):
        # Skin indices in the filename are zero-padded to three digits:
        # 000, 001, ..., 149.
        n = str(num).zfill(3)
        get_url = base_url + n + ".jpg"
        page = requests.get(get_url, headers=headers)
        if page.status_code == 200:
            filepath = "./yxlm/" + base_name + "_" + n + ".jpg"
            with open(filepath, 'wb') as f:
                f.write(page.content)
            print(base_name, "ok", n)
        else:
            break
```
Yes, that whole big block is nothing but id-to-name mapping data; no way around it. The alternative would be to request the JS file and parse it, but for a copy-paste job like this, the simpler the better.
The structure is much the same as the Honor of Kings scraper above, and the downloads come through quite fast.
Remember to create the yxlm directory first.
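One detail worth pinning down: the skin index in the LoL filenames is zero-padded to three digits (big266000.jpg, big266001.jpg, ...). Hand-rolled padding with `'0' + str(num)` branches is easy to get wrong around the boundaries (e.g. at num = 10), while `str.zfill` handles every case in one line. A small sketch (`skin_url` is a hypothetical helper, not part of the original script):

```python
def skin_url(base, hero_id, num):
    """Build a big-skin URL; the skin index is zero-padded to 3 digits."""
    return '%sbig%d%s.jpg' % (base, hero_id, str(num).zfill(3))

BASE = 'https://ossweb-img.qq.com/images/lol/web201310/skin/'
print(skin_url(BASE, 266, 10))
# https://ossweb-img.qq.com/images/lol/web201310/skin/big266010.jpg
```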