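Author: 牧小熊, Huazhong Agricultural University, Datawhale original author
With the Honor of Kings (王者荣耀) KPL Grand Finals approaching, members of the Datawhale data project group asked for an installment on game data mining.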


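0. Preface
Anyone who has played Honor of Kings knows that, to keep the game as balanced as possible, the map terrain and hero strength are adjusted every so often. The game itself is therefore a moving target within the game data.
During a match, differences in player level and gold, along with shifting tactics, make the contest highly adversarial. Players' individual form and the coaches' on-site tactical guidance also make esports results hard to predict.
In this project we use models that are as simple as possible to predict match results, and we hope readers can appreciate data mining while enjoying esports.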

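1. Data Acquisition
For the KPL Autumn Season, Honor of Kings itself provides an event data platform.
That platform seems unstable, however, and exposes little public data, so we chose the 玩加电竞 (wanplus) platform instead.
Link: https://link.zhihu.com/?target=http%3A//www.wanplus.com/kog
The 2020 KPL Autumn Season regular season has 15 rounds of matches, and 520 games have been played so far. We scrape the match information with a crawler.
The first part obtains each match's ID on the site.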
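Match IDs are read from the site's schedule pages, which are addressed by a numeric ID in the URL. As a rough sketch of how the pages to visit could be enumerated, the snippet below builds one (URL, headers) pair per schedule page and picks a fresh User-Agent for each request. The two ID ranges and the URL pattern are copied from the script that follows; `build_schedule_requests` and the shortened `USER_AGENTS` list are hypothetical names introduced here for illustration, and nothing is actually downloaded.

```python
import random

# Two sample User-Agent strings; the article's full list could be used instead.
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; WOW64; rv:38.0) Gecko/20100101 Firefox/38.0",
    "Opera/9.80 (Windows NT 6.1; U; en) Presto/2.8.131 Version/11.11",
]

# Schedule-page ID ranges copied from the script (end values are exclusive).
REGULAR_SEASON_IDS = range(66028, 66148)
PLAYOFF_IDS = range(66652, 66664)


def build_schedule_requests():
    """Return (url, headers) pairs, one per schedule page."""
    requests_to_make = []
    for page_id in [*REGULAR_SEASON_IDS, *PLAYOFF_IDS]:
        url = "https://www.wanplus.com/schedule/%s.html" % page_id
        # Choose a User-Agent per request rather than once per run.
        headers = {"User-Agent": random.choice(USER_AGENTS)}
        requests_to_make.append((url, headers))
    return requests_to_make


plan = build_schedule_requests()
print(len(plan), plan[0][0])
```

Building the plan up front keeps the ID ranges in one place and makes it easy to check how many pages a run will touch before sending any traffic.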
"""
Honor of Kings match prediction
2020 KPL Autumn Season
Match prediction model
"""
import time
import random
import requests
from bs4 import BeautifulSoup
from lxml import etree
import pandas as pd
from tqdm import tqdm
user_agent = [
"Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_6_8; en-us) AppleWebKit/534.50 (KHTML, like Gecko) Version/5.1 Safari/534.50",
"Mozilla/5.0 (Windows; U; Windows NT 6.1; en-us) AppleWebKit/534.50 (KHTML, like Gecko) Version/5.1 Safari/534.50",
"Mozilla/5.0 (Windows NT 10.0; WOW64; rv:38.0) Gecko/20100101 Firefox/38.0",
"Mozilla/5.0 (Windows NT 10.0; WOW64; Trident/7.0; .NET4.0C; .NET4.0E; .NET CLR 2.0.50727; .NET CLR 3.0.30729; .NET CLR 3.5.30729; InfoPath.3; rv:11.0) like Gecko",
"Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Trident/5.0)",
"Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.0; Trident/4.0)",
"Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.0)",
"Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)",
"Mozilla/5.0 (Macintosh; Intel Mac OS X 10.6; rv:2.0.1) Gecko/20100101 Firefox/4.0.1",
"Mozilla/5.0 (Windows NT 6.1; rv:2.0.1) Gecko/20100101 Firefox/4.0.1",
"Opera/9.80 (Macintosh; Intel Mac OS X 10.6.8; U; en) Presto/2.8.131 Version/11.11",
"Opera/9.80 (Windows NT 6.1; U; en) Presto/2.8.131 Version/11.11",
"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_7_0) AppleWebKit/535.11 (KHTML, like Gecko) Chrome/17.0.963.56 Safari/535.11",
"Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; Maxthon 2.0)",
"Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; TencentTraveler 4.0)",
"Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1)",
"Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; The World)",
"Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; Trident/4.0; SE 2.X MetaSr 1.0; SE 2.X MetaSr 1.0; .NET CLR 2.0.50727; SE 2.X MetaSr 1.0)",
"Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; 360SE)",
"Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; Avant Browser)"]
# Scrape match information
def spyder():
    headers = {'User-Agent': random.choice(user_agent)}
    # Collect match IDs from the schedule pages
    schedule = [i for i in range(66028, 66148)]   # regular season
    schedule1 = [i for i in range(66652, 66664)]  # playoffs
    schedule.extend(schedule1)
    match_list = []
    for i in tqdm(schedule):
        url = 'https://www.wanplus.com/schedule/%s.html' % i
        r = requests.get(url, headers=headers)
        r.encoding = r.apparent_encoding
        soup = BeautifulSoup(r.text, 'lxml')
        # Keep only the <li> elements marked status="done"
        all_match = soup.find_all('li', status='done')
        for match in all_match:
            matchnum = match['match']
            match_list.append(matchnum)
    df = pd.DataFrame(columns=[
        'match', 'teama', 'teamb', 'label', 'moneya', 'moneyb',
        'killa', 'killb', 'towera', 'towerb',
        'bana1', 'bana2', 'bana3', 'bana4', 'banb1', 'banb2', 'banb3', 'banb4',
        'heroa1', 'heroa2', 'heroa3', 'heroa4', 'heroa5',
        'herob1', 'herob2', 'herob3', 'herob4', 'herob5',
        'kdaa1', 'kdaa2', 'kdaa3', 'kdaa4', 'kdaa5',
        'kdab1', 'kdab2', 'kdab3', 'kdab4', 'kdab5',
        'moneya1', 'moneya2', 'moneya3', 'moneya4', 'moneya5',
        'moneyb1', 'moneyb2', 'moneyb3', 'moneyb4', 'moneyb5',
        'playera1', 'playera2', 'playera3', 'playera4', 'playera5',
        'playerb1', 'playerb2', 'playerb3', 'playerb4', 'playerb5'])
    for match in tqdm(match_list):
        url = 'https://www.wanplus.com/match/%s.html#data' % match
        try:
            result_info = get_match_info(match, headers, url)
            # DataFrame.append was removed in pandas 2.0, so pd.concat is used here.
            # This assumes get_match_info returns a dict (or Series) keyed by column name.
            df = pd.concat([df, pd.DataFrame([result_info])], ignore_index=True)
        except Exception:
            print('Scrape failed; visit https://www.wanplus.com/match/%s.html#data manually to fill in the data' % match)
    df.to_excel('./KPL1.xlsx', index=False)
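To see what the `find_all('li', status='done')` step selects without touching the network, here is a minimal offline sketch that uses only the standard library's `html.parser` in place of BeautifulSoup. The HTML snippet is invented for illustration and the real wanplus markup may differ; `MatchIdParser` and `extract_finished_matches` are hypothetical helpers, not part of the original script.

```python
from html.parser import HTMLParser


class MatchIdParser(HTMLParser):
    """Collect the `match` attribute of every <li status="done"> element."""

    def __init__(self):
        super().__init__()
        self.match_ids = []

    def handle_starttag(self, tag, attrs):
        if tag != "li":
            return
        attributes = dict(attrs)
        # Keep only items marked as finished that also carry a match ID.
        if attributes.get("status") == "done" and "match" in attributes:
            self.match_ids.append(attributes["match"])


def extract_finished_matches(html):
    """Return the match IDs of all finished games found in `html`."""
    parser = MatchIdParser()
    parser.feed(html)
    return parser.match_ids


# Made-up sample markup: two finished games and one that is still pending.
SAMPLE_HTML = """
<ul>
  <li status="done" match="1001">game 1</li>
  <li status="done" match="1002">game 2</li>
  <li status="pending" match="1003">game 3</li>
</ul>
"""

print(extract_finished_matches(SAMPLE_HTML))  # ['1001', '1002']
```

Checking the selector against a saved copy of a page like this is a cheap way to notice when the site's markup changes, before a full crawl silently returns an empty match list.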
The second part scrapes each match's result, towers destroyed, gold totals, hero picks, and other match-related information.
def get_match_info(match,headers,url):
    r = requests.get(url, headers=headers)
    r.encoding = r.apparent_encoding
    soup = BeautifulSoup(r.text, 'lxml')
    win=0
    # Get the names of team A and team B
    teama=soup.find('span',class_='tl bssj_tt1').get_text()
    teamb=soup.find('span',class_&