How to Use Data Analysis to Improve Your League of Legends Win Rate

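This article analyzes data from about 180,000 League of Legends (LoL) matches played on overseas servers to see what can help us improve our win rate.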

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import warnings

warnings.filterwarnings('ignore')
%matplotlib inline
plt.style.use('ggplot')

First, read in the data:

champs = pd.read_csv('./data/champs.csv')
matches = pd.read_csv('./data/matches.csv')
participants = pd.read_csv('./data/participants.csv')
stats1 = pd.read_csv('./data/stats1.csv')
stats2 = pd.read_csv('./data/stats2.csv')
teambans = pd.read_csv('./data/teambans.csv')
teamstats = pd.read_csv('./data/teamstats.csv')

print(f'champs: {champs.shape}')
print(f'matches: {matches.shape}')
print(f'participants: {participants.shape}')
print(f'stats1: {stats1.shape}')
print(f'stats2: {stats2.shape}')
print(f'teambans: {teambans.shape}')
print(f'teamstats: {teamstats.shape}')

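champs holds the champion data:

matches holds the match information:

participants holds the player information:

stats1 and stats2 hold what happened during the matches, such as KDA, gold spent, wards placed, and killing sprees: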

We concatenate stats1 and stats2:

# DataFrame.append was removed in pandas 2.0; pd.concat is the replacement
stats = pd.concat([stats1, stats2])
print(f'stats: {stats.shape}')

Join these tables into a single table:

df = pd.merge(participants, stats, how='left', on=['id'])
df = pd.merge(df, champs, how='left', left_on='championid', right_on='id', suffixes=('', '_1'))
df = pd.merge(df, matches, how='left', left_on='matchid', right_on='id', suffixes=('', '_2'))

pd.set_option('display.max_columns', None)
df.head()
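A left join silently multiplies rows if a key is duplicated on the right-hand side, so it can be worth checking the joins. Below is a minimal sketch on small made-up frames (`toy_participants` and `toy_champs` are illustrative stand-ins, not the real tables); `validate` is a standard `pd.merge` parameter.

```python
import pandas as pd

# Toy stand-ins for participants and champs (values are made up for illustration)
toy_participants = pd.DataFrame({
    'id': [1, 2, 3],
    'matchid': [10, 10, 11],
    'championid': [64, 51, 64],
})
toy_champs = pd.DataFrame({'name': ['Lee Sin', 'Caitlyn'], 'id': [64, 51]})

# validate='many_to_one' raises MergeError if a champion id repeats in toy_champs
toy_df = pd.merge(toy_participants, toy_champs, how='left',
                  left_on='championid', right_on='id',
                  suffixes=('', '_1'), validate='many_to_one')

# A left join against a unique key must keep the row count unchanged
assert len(toy_df) == len(toy_participants)
print(toy_df[['id', 'matchid', 'name']])
```

The same `validate` argument could be added to the joins above if the real key columns are expected to be unique.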

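Define a function that merges the “role” and “position” features into a single clean feature for the player's position. Its values are “TOP”, “MID”, “JUNGLE”, “DUO_SUPPORT”, and “DUO_CARRY”, corresponding to top lane, mid lane, jungle, support, and carry: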

def adj_position(row):
    if row['role'] in ('DUO_SUPPORT', 'DUO_CARRY'):
        return row['role']
    else:
        return row['position']
    
df['adjposition'] = df.apply(adj_position, axis = 1) 
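Row-wise `apply` can be slow on a table with well over a million rows. As a sketch of an equivalent vectorized form, assuming the same rule as `adj_position` (the toy rows below are made up):

```python
import numpy as np
import pandas as pd

# Made-up rows covering both branches of the rule
toy = pd.DataFrame({
    'role': ['DUO_SUPPORT', 'DUO_CARRY', 'SOLO', 'NONE', 'DUO'],
    'position': ['BOT', 'BOT', 'TOP', 'JUNGLE', 'BOT'],
})

# Use 'role' where it already names a bot-lane role, otherwise fall back to 'position'
is_duo_role = toy['role'].isin(['DUO_SUPPORT', 'DUO_CARRY'])
toy['adjposition'] = np.where(is_duo_role, toy['role'], toy['position'])
print(toy)
```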

Next we split the players into teams based on the player feature: 1~5 are team 1 and 6~10 are team 2:

# Assign teams
df['team'] = df['player'].apply(lambda x: '1' if x <= 5 else '2')
df['team_role'] = df['team'] + '-' + df['adjposition']

Taking '1-MID' as an example, some values of 'matchid' (that is, some individual matches) contain more than one '1-MID', which makes no sense:

df_remove = df[df['team_role'] == '1-MID'].groupby('matchid').agg({'team_role':'count'})
df_remove[df_remove['team_role'] != 1].index.values

We remove matches in which the same position appears more than once. First, collect their match ids:

remove_index = []
for i in ('1-MID', '1-TOP', '1-DUO_SUPPORT', '1-DUO_CARRY', '1-JUNGLE', '2-MID', '2-TOP', '2-DUO_SUPPORT', '2-DUO_CARRY', '2-JUNGLE'):
    df_remove = df[df['team_role'] == i].groupby('matchid').agg({'team_role':'count'})
    remove_index.extend(df_remove[df_remove['team_role'] != 1].index.values)
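The loop above filters the table once per team role. A possible single-pass variant groups by match and team role together; this is a sketch on made-up rows, and `bad_matches` is an illustrative name:

```python
import pandas as pd

# Made-up rows: match 1 has two '1-MID' players, match 2 is clean
toy = pd.DataFrame({
    'matchid':   [1, 1, 1, 2, 2, 2],
    'team_role': ['1-MID', '1-MID', '2-MID', '1-MID', '2-MID', '2-TOP'],
})

# Count rows per (match, team_role); any count above 1 is a duplicated position
counts = toy.groupby(['matchid', 'team_role']).size()
bad_matches = counts[counts > 1].index.get_level_values('matchid').unique().tolist()
print(bad_matches)
```

Like the loop, this only flags duplicated positions; a match where a position is missing entirely is not detected by either version.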

'BOT' has been split into 'DUO_SUPPORT' and 'DUO_CARRY', so we also remove the matches that still contain 'BOT' after the update:

remove_index.extend(df[df['adjposition'] == 'BOT']['matchid'].unique())
remove_index = list(set(remove_index))
print(f'Matches before cleaning: {df.matchid.nunique()}')
df = df[~df['matchid'].isin(remove_index)]
print(f'Matches after cleaning: {df.matchid.nunique()}')

For this analysis we keep only the Season 8 matches, which account for the vast majority of the data:

df = df[['id', 'matchid', 'player', 'name', 'adjposition', 'team_role', 'win', 'kills', 'deaths', 'assists', 'turretkills','totdmgtochamp', 'totheal', 'totminionskilled', 'goldspent', 'totdmgtaken', 'inhibkills', 'pinksbought', 'wardsplaced', 'duration', 'platformid', 'seasonid', 'version']]
df = df[df['seasonid'] == 8]
print(f'Season 8 matches: {df.matchid.nunique()}')
df.head(10)

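The 'wardsplaced' feature is the number of wards placed. We want to rein in samples with extreme counts, so first look at the cumulative percentages: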

pd.set_option('display.max_rows', None)
pd.set_option('display.float_format', lambda x: '%.4f' % x) 
wardsplaced = df['wardsplaced'].value_counts().sort_index() / len(df)
wardsplaced.cumsum()

The cumulative percentage reaches 99% at around 37 wards, so we cap the feature at that value:

# Cap values above the 99% sample value at 37
df['wardsplaced'] = df['wardsplaced'].apply(lambda x: x if x<=37 else 37)
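The cutoff can also be read off programmatically instead of by eye. The sketch below, on a made-up series, finds the smallest value whose cumulative count covers a given share of the samples and then caps with `Series.clip`; `cutoff_at` is a hypothetical helper name.

```python
import pandas as pd

def cutoff_at(series, share=0.99):
    """Smallest value whose cumulative count covers at least `share` of the samples."""
    cum_counts = series.value_counts().sort_index().cumsum()
    return int(cum_counts[cum_counts >= share * len(series)].index[0])

# Made-up ward counts: 199 ordinary values and one extreme outlier
toy_wards = pd.Series([5] * 100 + [10] * 60 + [20] * 39 + [120])

cutoff = cutoff_at(toy_wards)            # value at which 99% of samples are covered
capped = toy_wards.clip(upper=cutoff)    # same effect as the lambda-based cap
print(cutoff, capped.max())
```

One difference from the lambda-based cap: `clip` leaves missing values as NaN, whereas the lambda turns them into the cap value.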

Let's see how the number of wards placed differs by position:

plt.figure(figsize = (15, 10))
sns.violinplot(x="adjposition", y="wardsplaced", hue="win", data=df, palette='Set3', split=True, inner='quartile')
plt.title('Wardsplaced by adjposition: win vs loss')

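Ha, as expected: supports do most of the warding, while carries seem to place comparatively few wards.

Next we look at kills. Again, we first deal with outliers: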

kills = df['kills'].value_counts().sort_index() / len(df)
kills.cumsum()

The cumulative percentage reaches roughly 99% at 20 kills, so we cap at 20:

df['kills'] = df['kills'].apply(lambda x: x if x<=20 else 20)

Now compare kills across positions:

plt.figure(figsize = (15, 10))
sns.violinplot(x="adjposition", y="kills", hue="win", data=df, palette='Set3', split=True, inner='quartile')
plt.title('Kills by adjposition: win vs loss')

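Carries and mid laners get the most kills, while supports are clearly not the ones doing the killing; the winning side also has noticeably more kills than the losing side.

Next, compare damage dealt across positions: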

plt.figure(figsize = (15, 10))
sns.violinplot(x="adjposition", y="totdmgtochamp", hue="win", data=df, palette='Set3', split=True, inner='quartile')
plt.title('totdmgtochamp by adjposition: win vs loss')

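Damage dealt separates wins from losses mainly in the “DUO_CARRY”, “TOP”, and “MID” positions.

We now turn to the champions, starting with pick rate:

f, ax = plt.subplots(figsize=(15, 12))
# Number of picks per champion, most picked first
pick_counts = df['name'].value_counts().sort_values(ascending=False)
ax = pd.concat((pick_counts.head(10), pick_counts.tail(10))).plot(kind='bar')
# Use the matches that survived cleaning and filtering as the denominator
total_records = df['matchid'].nunique()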
for p in ax.patches:
    height = p.get_height()
    ax.text(p.get_x() + p.get_width()/2.,
             height + 3,
             '{:.2f}%'.format(height/total_records*100),
             ha="center",
             rotation=0)

plt.xticks(rotation=45)
plt.yticks([2000, 5000, 10000, 20000, 30000, 40000, 50000, 60000])
plt.title('Top 10 and Last 10 Hero Picks')
plt.show()

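The most picked champions are Lee Sin, Caitlyn, Lucian, Thresh, Ahri, and so on; the least picked are Urgot, Corki, Rek'Sai, Mordekaiser, Skarner, and so on.

Next, look at the correlation between each feature and winning, as well as the correlations among the features: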

df_corr = df._get_numeric_data()
df_corr = df_corr.drop(['id', 'matchid', 'player', 'seasonid'], axis=1)

# np.bool was removed in NumPy 1.24; the builtin bool is the replacement
mask = np.zeros_like(df_corr.corr(), dtype=bool)
mask[np.triu_indices_from(mask)] = True
cmap = sns.diverging_palette(10, 200, as_cmap=True)

plt.figure(figsize=(15, 10))
sns.heatmap(df_corr.corr(), cmap=cmap, annot=True, fmt='.2f', mask=mask, square=True, linewidths=.5, center=0)
plt.title('Correlations - win vs factors (all games)')

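The features most strongly related to winning are deaths, assists, and turret kills (inhibkills is the number of inhibitors destroyed).

Now look at the matches that ended within 20 minutes: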

df_corr_2 = df._get_numeric_data()
df_corr_2 = df_corr_2[df_corr_2['duration'] <= 1200]
df_corr_2 = df_corr_2.drop(['id', 'matchid', 'player', 'seasonid'], axis=1)

mask = np.zeros_like(df_corr_2.corr(), dtype=bool)
mask[np.triu_indices_from(mask)] = True
cmap = sns.diverging_palette(10, 200, as_cmap=True)

plt.figure(figsize = (15, 10))
sns.heatmap(df_corr_2.corr(), cmap=cmap, annot=True, fmt='.2f', mask=mask, square=True, linewidths=.5, center=0)
plt.title('Correlations - win vs factors (for games lasting less than 20 mins)')

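For matches that end quickly, the features most strongly related to winning are kills, deaths, assists, turret kills, and gold spent.

What about matches that last a long time?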

df_corr_3 = df._get_numeric_data()
df_corr_3 = df_corr_3[df_corr_3['duration'] > 2400]
df_corr_3 = df_corr_3.drop(['id', 'matchid', 'player', 'seasonid'], axis=1)

mask = np.zeros_like(df_corr_3.corr(), dtype=bool)
mask[np.triu_indices_from(mask)] = True
cmap = sns.diverging_palette(10, 200, as_cmap=True)

plt.figure(figsize = (15, 10))
sns.heatmap(df_corr_3.corr(), cmap=cmap, annot=True, fmt='.2f', mask=mask, square=True, linewidths=.5, center=0)
plt.title('Correlations - win vs factors (for games lasting more than 40 mins)')

Hmm, none of the relationships look very strong anymore...
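The three heatmap sections repeat the same filter-and-correlate steps with different duration windows. A possible way to compare them numerically is a small helper that returns only the correlations with 'win'. This is a sketch: `win_correlations` is a hypothetical name, the toy data is randomly generated, and it uses the public `select_dtypes` call rather than `_get_numeric_data`.

```python
import numpy as np
import pandas as pd

def win_correlations(data, min_duration=None, max_duration=None,
                     drop_cols=('id', 'matchid', 'player', 'seasonid')):
    """Correlation of each numeric feature with 'win' for games in a duration window (seconds)."""
    subset = data
    if min_duration is not None:
        subset = subset[subset['duration'] > min_duration]
    if max_duration is not None:
        subset = subset[subset['duration'] <= max_duration]
    numeric = subset.select_dtypes(include='number')
    numeric = numeric.drop(columns=[c for c in drop_cols if c in numeric.columns])
    return numeric.corr()['win'].drop('win').sort_values(ascending=False)

# Randomly generated toy data, only to show the call pattern
rng = np.random.default_rng(0)
n = 200
toy = pd.DataFrame({
    'matchid': np.arange(n),
    'win': rng.integers(0, 2, n),
    'duration': rng.integers(900, 3000, n),
})
toy['kills'] = toy['win'] * 3 + rng.integers(0, 5, n)          # kills loosely tied to winning
toy['deaths'] = (1 - toy['win']) * 3 + rng.integers(0, 5, n)   # deaths loosely tied to losing

print(win_correlations(toy))                      # all games
print(win_correlations(toy, max_duration=1200))   # games of 20 minutes or less
print(win_correlations(toy, min_duration=2400))   # games longer than 40 minutes
```

On the real table, the three calls would correspond to the three heatmaps above, with the windows expressed in seconds as in the 'duration' column.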

Next we analyze KDA, looking at the win rate and average KDA of each champion:

pd.set_option('display.precision', 2)

df_win_rate = df.groupby('name').agg({'win': 'sum', 'name': 'count', 'kills': 'mean', 'deaths': 'mean', 'assists': 'mean'})
df_win_rate.columns = ['win matches', 'total matches', 'K', 'D', 'A']
df_win_rate['win rate'] = df_win_rate['win matches'] /  df_win_rate['total matches'] * 100
df_win_rate['KDA'] = (df_win_rate['K'] + df_win_rate['A']) / df_win_rate['D']
df_win_rate = df_win_rate.sort_values('win rate', ascending=False)
df_win_rate = df_win_rate[['total matches', 'win rate', 'K', 'D', 'A', 'KDA']]

df_win_rate.head(10)

The champions with the highest win rates are Ivern, Anivia, Xerath, Ahri, Sona, and so on.

df_win_rate.tail(10)

The champions with the lowest win rates are Rek'Sai, Zed, 钢铁大师, Jhin, 芮肯, and so on.

A scatter plot of number of matches against win rate:

df_win_rate.reset_index(inplace=True)

def label_point(x, y, val, ax):
    a = pd.concat({'x': x, 'y': y, 'val': val}, axis=1)
    for i, point in a.iterrows():
        ax.text(point['x'], point['y'], str(point['val']))

df_win_rate['color map'] = df_win_rate['win rate'].apply(lambda x: 'red' if x>50 else 'green')

ax = df_win_rate.plot(kind='scatter', x='total matches', y='win rate', color=df_win_rate['color map'].tolist(), figsize=(15,10), title='win rate vs # matches by champions')

label_point(df_win_rate['total matches'], df_win_rate['win rate'], df_win_rate['name'], ax)

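Lee Sin is picked a lot but has a relatively low win rate, while Ivern is picked rarely but has a very high win rate.

Let's also look at the win rate and KDA of each champion and position combination: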
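A win rate computed from few matches is less certain than one computed from many, which matters when comparing a rarely picked champion with a popular one. One common way to quantify this is the Wilson score interval. The sketch below uses invented win counts, and `wilson_interval` is a hypothetical helper, not part of the original analysis.

```python
import math

def wilson_interval(wins, n, z=1.96):
    """Approximate 95% Wilson score interval for a win rate (z=1.96)."""
    if n == 0:
        return (0.0, 1.0)
    p = wins / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return (centre - half, centre + half)

# Invented numbers: the same 55% win rate from 100 and from 10,000 matches
few = wilson_interval(55, 100)
many = wilson_interval(5500, 10000)
print(few)
print(many)
```

With the small sample the interval still includes 50%, while with the large sample it does not, so the same observed win rate supports a much weaker conclusion for a rarely picked champion.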

pd.options.display.float_format = '{:,.2f}'.format

df_win_rate_role = df.groupby(['name', 'adjposition']).agg({'win': 'sum', 'name': 'count', 'kills': 'mean', 'deaths': 'mean', 'assists': 'mean'})
df_win_rate_role.columns = ['win matches', 'total matches', 'K', 'D', 'A']
df_win_rate_role['win rate'] = df_win_rate_role['win matches'] /  df_win_rate_role['total matches'] * 100
df_win_rate_role['KDA'] = (df_win_rate_role['K'] + df_win_rate_role['A']) / df_win_rate_role['D']
df_win_rate_role = df_win_rate_role.sort_values('win rate', ascending=False)
df_win_rate_role = df_win_rate_role[['total matches', 'win rate', 'K', 'D', 'A', 'KDA']]

# Keep only combinations that account for more than 0.01% of all records
df_win_rate_role = df_win_rate_role[df_win_rate_role['total matches']>df_win_rate_role['total matches'].sum()*0.0001]
df_win_rate_role.head(10)

Pantheon in the mid lane has the highest win rate, and Yasuo in the carry position also has a high win rate.

df_win_rate_role.tail(10)

Here is the overall win rate chart:

df_win_rate['win rate compared 50%'] = df_win_rate['win rate'] - 50.0

f, ax = plt.subplots(figsize=(12, 30))
sns.barplot(y='name', 
            x='win rate compared 50%', 
            data=df_win_rate.sort_values(by='win rate', ascending=False),
            palette='pastel')
plt.title('Win Rate Map')
plt.show()

And the overall KDA chart:

df_win_rate['KDA compared mean'] = df_win_rate['KDA'] - df_win_rate['KDA'].mean()

f, ax = plt.subplots(figsize=(12, 30))
sns.barplot(y='name', 
            x='KDA compared mean', 
            data=df_win_rate.sort_values(by='KDA', ascending=False),
            palette='pastel')
plt.title('KDA Map')
plt.show()

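Next we arrange the data by match and position to study champion match-ups:

# Including 'player' in the sort keeps team 1 ahead of team 2 within each position
df_2 = df.sort_values(['matchid', 'adjposition', 'player'], ascending=[True, True, True])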

df_2['shift 1'] = df_2['name'].shift()
df_2['shift -1'] = df_2['name'].shift(-1)

# Shifting by one row lines each champion up with the opposing champion in the same position
def get_matchup(x):
    if x['player'] <= 5:
        if x['name'] < x['shift -1']:
            name_return = x['name'] + ' vs ' + x['shift -1']
        else:
            name_return = x['shift -1'] + ' vs ' + x['name']
    else:
        if x['name'] < x['shift 1']:
            name_return = x['name'] + ' vs ' + x['shift 1']
        else:
            name_return = x['shift 1'] + ' vs ' + x['name']
    return name_return

match_up = df_2.apply(get_matchup, axis=1)
df_2.insert(7, 'match up', match_up)
df_2.head(10)
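The shift-based pairing depends on the two opposing rows being adjacent and in team order after sorting. An alternative that does not depend on row order is to join team 1 to team 2 on match and position. The following is a sketch on made-up rows; `pairs`, `make_label`, and `left_win` are illustrative names.

```python
import pandas as pd

# Made-up rows for one match: mid and top laners of both teams, deliberately unordered
toy = pd.DataFrame({
    'matchid':     [1, 1, 1, 1],
    'player':      [1, 6, 2, 7],
    'adjposition': ['MID', 'MID', 'TOP', 'TOP'],
    'name':        ['Ryze', 'Ahri', 'Teemo', 'Shen'],
    'win':         [0, 1, 1, 0],
})

team1 = toy[toy['player'] <= 5]
team2 = toy[toy['player'] > 5]

# One row per (match, position) with both opposing champions side by side
pairs = team1.merge(team2, on=['matchid', 'adjposition'], suffixes=('_t1', '_t2'))

def make_label(row):
    # Alphabetical order, matching the 'A vs B' convention of get_matchup
    first, second = sorted([row['name_t1'], row['name_t2']])
    return f'{first} vs {second}'

pairs['match up'] = pairs.apply(make_label, axis=1)

# 1 if the champion named on the left of the label won, else 0
left_is_t1 = pairs['name_t1'] <= pairs['name_t2']
pairs['left_win'] = pairs['win_t1'].where(left_is_t1, pairs['win_t2'])
print(pairs[['matchid', 'adjposition', 'match up', 'left_win']])
```

This layout has one row per match-up instead of two, so match counts would not need to be halved later.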

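We define win_adj from the point of view of the champion on the left of the match-up: it is 1 when the left champion wins and 0 otherwise. For example, for Draven vs Jinx it is 1 if Draven wins and 0 if Jinx wins: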

win_adj = df_2.apply(lambda x: x['win'] if x['name'] == x['match up'].split(' vs ')[0] else 0, axis = 1)
df_2.insert(8, 'win_adj', win_adj)

df_2.head(10)

Here are the results for all the mid lane match-ups:

df_mid = df_2[df_2['adjposition']=='MID']

counter_mid = df_mid.groupby('match up').agg({'win': 'count', 'win_adj': 'sum'})
counter_mid.reset_index(inplace=True)
counter_mid.columns = ['match up', 'total matches', 'total first win']
# Each match contributes two rows (one per side), so halve the count
counter_mid['total matches'] = counter_mid['total matches'] / 2
counter_mid['counter rate'] = counter_mid['total first win'] / counter_mid['total matches']
counter_mid['counter rate compared 50%'] = counter_mid['total first win'] / counter_mid['total matches'] - 0.5

counter_mid['abs'] = abs(counter_mid['counter rate compared 50%'])
counter_mid = counter_mid[(counter_mid['total matches']>100) & (counter_mid['total first win']>0)].sort_values(by='abs', ascending=False)
counter_mid.reset_index(inplace=True)

counter_mid.head()

When Ahri faces Ryze, Ahri's side wins most of the time; when Ekko faces Ryze, Ekko's side wins most of the time. Poor Ryze...

plt.figure(figsize=(20, 150))
sns.barplot(x="counter rate compared 50%", y="match up", data=counter_mid, palette='pastel')

Here are the results for all the top lane match-ups:

df_top = df_2[df_2['adjposition']=='TOP']

counter_top = df_top.groupby('match up').agg({'win': 'count', 'win_adj': 'sum'})
counter_top.reset_index(inplace=True)
counter_top.columns = ['match up', 'total matches', 'total first win']
# Each match contributes two rows (one per side), so halve the count
counter_top['total matches'] = counter_top['total matches'] / 2
counter_top['counter rate'] = counter_top['total first win'] / counter_top['total matches']
counter_top['counter rate compared 50%'] = counter_top['total first win'] / counter_top['total matches'] - 0.5

counter_top['abs'] = abs(counter_top['counter rate compared 50%'])
counter_top = counter_top[(counter_top['total matches']>100) & (counter_top['total first win']>0)].sort_values(by='abs', ascending=False)
counter_top.reset_index(inplace=True)

counter_top.head()

When Shen faces Teemo, Teemo's side wins most of the time; when Yasuo faces Yorick, Yasuo's side wins most of the time.

plt.figure(figsize=(20, 150))
sns.barplot(x="counter rate compared 50%", y="match up", data=counter_top, palette='pastel')
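The mid and top lane sections run the same steps with only the position changed, so they could be wrapped in one function. This is a sketch under the assumption that the input has the 'adjposition', 'match up', 'win', and 'win_adj' columns built above; `counter_table` and its snake_case output columns are hypothetical names, and the toy rows are made up.

```python
import pandas as pd

def counter_table(data, position, min_matches=100):
    """Per match-up counts and left-champion win share for one position."""
    rows = data[data['adjposition'] == position]
    table = rows.groupby('match up').agg(
        total_matches=('win', 'count'),
        first_wins=('win_adj', 'sum'),
    ).reset_index()
    table['total_matches'] = table['total_matches'] / 2   # two rows per match
    table['counter_rate'] = table['first_wins'] / table['total_matches']
    table['vs_50'] = table['counter_rate'] - 0.5
    table = table[(table['total_matches'] > min_matches) & (table['first_wins'] > 0)]
    order = table['vs_50'].abs().sort_values(ascending=False).index
    return table.loc[order].reset_index(drop=True)

# Made-up data: four Ahri vs Ryze mid lane games, Ahri wins three of them
records = []
for ahri_won in [1, 1, 1, 0]:
    records.append({'adjposition': 'MID', 'match up': 'Ahri vs Ryze',
                    'name': 'Ahri', 'win': ahri_won, 'win_adj': ahri_won})
    records.append({'adjposition': 'MID', 'match up': 'Ahri vs Ryze',
                    'name': 'Ryze', 'win': 1 - ahri_won, 'win_adj': 0})
toy = pd.DataFrame(records)

result = counter_table(toy, 'MID', min_matches=1)
print(result)
```

One caveat that applies to this sketch and to the code above alike: in a mirror match-up (the same champion on both sides), both rows count as the left champion, so the rate comes out as 100%. Such match-ups would need to be excluded if they occur in the data.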

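All right, that's it for this analysis!

To get the dataset and code, follow the WeChat official account “数据科学与人工智能技术” and send it the message “英雄联盟” (League of Legends).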

 
