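1 PUBG Player Placement Prediction
---- Can you predict a PUBG player's placement at the end of a match?
2 Project Background
2.1 Project Overview
PlayerUnknown's Battlegrounds (PUBG), nicknamed "chicken dinner" (吃鸡) in Chinese, is a tactical competitive sandbox shooter. It is a battle royale game: each match has up to 100 players, who are dropped onto an island (the battlegrounds) and start with nothing. Players must scavenge resources across the island and fight other players inside a steadily shrinking safe zone in order to survive to the end.
The game offers a great deal of freedom: players can parachute from a plane, drive off-road vehicles, fight in the jungle, and seize loot, all while watching for enemies lying in ambush and trying to be the last person alive.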
2.2 Topics Covered
Basic sklearn operations
Basic data processing
Using basic machine learning algorithms
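2.3 Dataset Overview
This project provides a large amount of anonymized PUBG match statistics. Each row holds one player's post-match statistics, and the columns are the features. The data come from all match types: solos, duos, and squads (four-player teams). A match is not guaranteed to have 100 players, and each group has at most 4 members.
File descriptions:
train_V2.csv - training set
test_V2.csv - test set
A sample of the dataset is shown in the figure below:
Field descriptions: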
Id [player ID]
Player’s Id
groupId [group (squad) ID]
ID to identify a group within a match. If the same group of players plays in different matches, they will have a different groupId each time.
matchId [match ID]
ID to identify match. There are no matches that are in both the training and testing set.
assists [number of assists]
Number of enemy players this player damaged that were killed by teammates.
boosts [number of boost items used]
Number of boost items used.
damageDealt [total damage]
Total damage dealt. Note: Self inflicted damage is subtracted.
DBNOs [number of enemies knocked down]
Number of enemy players knocked.
headshotKills [number of headshot kills]
Number of enemy players killed with headshots.
heals [number of healing items used]
Number of healing items used.
killPlace [kill ranking in this match]
Ranking in match of number of enemy players killed.
killPoints [kills-based Elo rating]
Kills-based external ranking of player. (Think of this as an Elo ranking where only kills matter.) If there is a value other than -1 in rankPoints, then any 0 in killPoints should be treated as a “None”.
kills [number of kills]
Number of enemy players killed.
killStreaks [kill streak]
Max number of enemy players killed in a short amount of time.
longestKill [longest kill distance]
Longest distance between player and player killed at time of death. This may be misleading, as downing a player and driving away may lead to a large longestKill stat.
matchDuration [match duration]
Duration of match in seconds.
matchType [match type (group size)]
String identifying the game mode that the data comes from. The standard modes are “solo”, “duo”, “squad”, “solo-fpp”, “duo-fpp”, and “squad-fpp”; other modes are from events or custom matches.
maxPlace [worst placement in the match]
Worst placement we have data for in the match. This may not match with numGroups, as sometimes the data skips over placements.
numGroups [number of groups]
Number of groups we have data for in the match.
rankPoints [Elo rating]
Elo-like ranking of player. This ranking is inconsistent and is being deprecated in the API’s next version, so use with caution. Value of -1 takes place of “None”.
revives [number of teammate revives]
Number of times this player revived teammates.
rideDistance [distance driven]
Total distance traveled in vehicles measured in meters.
roadKills [number of kills from a vehicle]
Number of kills while in a vehicle.
swimDistance [distance swum]
Total distance traveled by swimming measured in meters.
teamKills [number of teammates killed]
Number of times this player killed a teammate.
vehicleDestroys [number of vehicles destroyed]
Number of vehicles destroyed.
walkDistance [distance walked]
Total distance traveled on foot measured in meters.
weaponsAcquired [number of weapons picked up]
Number of weapons picked up.
winPoints [win-based Elo rating]
Win-based external ranking of player. (Think of this as an Elo ranking where only winning matters.) If there is a value other than -1 in rankPoints, then any 0 in winPoints should be treated as a “None”.
winPlacePerc [percentile placement]
The target of prediction. This is a percentile winning placement, where 1 corresponds to 1st place, and 0 corresponds to last place in the match. It is calculated off of maxPlace, not numGroups, so it is possible to have missing chunks in a match.
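The field descriptions for rankPoints, killPoints and winPoints describe sentinel values that stand in for "None". The sketch below shows one possible way to turn those sentinels into NaN with pandas. The helper name mask_missing_ratings and the toy values are hypothetical, and this step is not part of the original walkthrough.

```python
import numpy as np
import pandas as pd

RATING_COLS = ["rankPoints", "killPoints", "winPoints"]

def mask_missing_ratings(df):
    """Return a copy where the documented "None" sentinels become NaN.

    Hypothetical helper: the rules are taken from the field descriptions,
    the function itself is only an illustration.
    """
    # Cast to float so the columns can hold NaN.
    out = df.astype({col: "float64" for col in RATING_COLS})
    # rankPoints == -1 means "None"; remember which rows have a real value.
    has_rank = out["rankPoints"] != -1
    # A 0 in killPoints / winPoints counts as "None" only when rankPoints is set.
    for col in ["killPoints", "winPoints"]:
        out.loc[has_rank & (out[col] == 0), col] = np.nan
    out.loc[~has_rank, "rankPoints"] = np.nan
    return out

# Toy values made up for illustration (not rows from the dataset).
toy = pd.DataFrame({
    "rankPoints": [-1, 1500, -1],
    "killPoints": [1200, 0, 0],
    "winPoints": [1466, 0, 0],
})
cleaned = mask_missing_ratings(toy)
print(cleaned)
```

Whether to mask these values at all is a modelling choice. Tree-based models can often work with the raw sentinels, so treat this as an option rather than a required step.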
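3 Evaluation
3.1 Evaluation Method
You must build a model that predicts each player's final placement from their end-of-match statistics, on a scale from 1 (first place) to 0 (last place).
Results are scored with mean absolute error (MAE), that is, the mean absolute error between the predicted winPlacePerc and the true winPlacePerc.
3.2 Introduction to MAE (Mean Absolute Error)
MAE is the average of the absolute errors.
Because positive and negative errors cannot cancel each other out, it gives a faithful picture of how large the prediction errors actually are.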
$$\mathrm{MAE}(X, h) = \frac{1}{m} \sum_{i=1}^{m} \left| h\left(x^{(i)}\right) - y^{(i)} \right|$$
API:
sklearn.metrics.mean_absolute_error
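To make the formula concrete, here is a minimal hand-written sketch of MAE, where m is the number of samples, h(x) is the prediction and y is the true value. The function name mean_absolute_error_manual and the toy values are hypothetical. In practice, sklearn.metrics.mean_absolute_error(y_true, y_pred) computes the same quantity.

```python
def mean_absolute_error_manual(y_true, y_pred):
    """Average of |prediction - truth| over all samples (illustrative helper)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("at least one sample is required")
    total = sum(abs(pred - true) for true, pred in zip(y_true, y_pred))
    return total / len(y_true)

# Toy winPlacePerc values, made up for illustration (not from the dataset).
y_true = [1.0, 0.5, 0.0, 0.75]
y_pred = [0.9, 0.5, 0.2, 0.55]

# Absolute errors are 0.1, 0.0, 0.2 and 0.2, so the mean is 0.5 / 4.
mae = mean_absolute_error_manual(y_true, y_pred)
print(round(mae, 4))  # 0.125
```

Since winPlacePerc lies between 0 and 1, an MAE of 0.125 means the predictions in this toy example are off by 12.5 percentile points on average.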
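4 Project Implementation (Data Analysis + Random Forest)
In the analysis that follows, we explore the dataset and detect outliers.
We then train a random forest model on the data and tune that model.
Import the APIs needed for the basic data-processing stage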
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns
4.1 Loading the Data and Inspecting Basic Information
Load the data and look at its basic information
train = pd.read_csv("./data/train_V2.csv")
train.describe()
assists boosts damageDealt DBNOs headshotKills heals killPlace killPoints kills killStreaks ... revives rideDistance roadKills swimDistance teamKills vehicleDestroys walkDistance weaponsAcquired winPoints winPlacePerc
count 4.446966e+06 4.446966e+06 4.446966e+06 4.446966e+06 4.446966e+06 4.446966e+06 4.446966e+06 4.446966e+06 4.446966e+06 4.446966e+06 ... 4.446966e+06 4.446966e+06 4.446966e+06 4.446966e+06 4.446966e+06 4.446966e+06 4.446966e+06 4.446966e+06 4.446966e+06 4.446965e+06
mean 2.338149e-01 1.106908e+00 1.307171e+02 6.578755e-01 2.268196e-01 1.370147e+00 4.759935e+01 5.050060e+02 9.247833e-01 5.439551e-01 ... 1.646590e-01 6.061157e+02 3.496091e-03 4.509322e+00 2.386841e-02 7.918208e-03 1.154218e+03 3.660488e+00 6.064601e+02 4.728216e-01
std 5.885731e-01 1.715794e+00 1.707806e+02 1.145743e+00 6.021553e-01 2.679982e+00 2.746294e+01 6.275049e+02 1.558445e+00 7.109721e-01 ... 4.721671e-01 1.498344e+03 7.337297e-02 3.050220e+01 1.673935e-01 9.261157e-02 1.183497e+03 2.456544e+00 7.397004e+02 3.074050e-01
min 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 1.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 ... 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00
25% 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 2.400000e+01 0.000000e+00 0.000000e+00 0.000000e+00 ... 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 1.551000e+02 2.000000e+00 0.000000e+00 2.000000e-01
50% 0.000000e+00 0.000000e+00 8.424000e+01 0.000000e+00 0.000000e+00 0.000000e+00 4.700000e+01 0.000000e+00 0.000000e+00 0.000000e+00 ... 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 6.856000e+02 3.000000e+00 0.000000e+00 4.583000e-01
75% 0.000000e+00 2.000000e+00 1.860000e+02 1.000000e+00 0.000000e+00 2.000000e+00 7.100000e+01 1.172000e+03 1.000000e+00 1.000000e+00 ... 0.000000e+00 1.909750e-01 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 1.976000e+03 5.000000e+00 1.495000e+03 7.407000e-01
max 2.200000e+01 3.300000e+01 6.616000e+03 5.300000e+01 6.400000e+01 8.000000e+01 1.010000e+02 2.170000e+03 7.200000e+01 2.000000e+01 ... 3.900000e+01 4.071000e+04 1.800000e+01 3.823000e+03 1.200000e+01 5.000000e+00 2.578000e+04 2.360000e+02 2.013000e+03 1.000000e+00
8 rows × 25 columns
train.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4446966 entries, 0 to 4446965
Data columns (total 29 columns):
Id object
groupId object
matchId object
assists int64
boosts int64
damageDealt float64
DBNOs int64
headshotKills int64
heals int64
killPlace int64
killPoints int64
kills int64
killStreaks int64
longestKill float64
matchDuration int64
matchType object
maxPlace int64
numGroups int64
rankPoints int64
revives int64
rideDistance float64
roadKills int64
swimDistance float64
teamKills int64
vehicleDestroys int64
walkDistance float64
weaponsAcquired int64
winPoints int64
winPlacePerc float64
dtypes: float64(6), int64(19), object(4)
memory usage: 983.9+ MB
The data contains 4,446,966 rows in total:
train.shape
(4446966, 29)
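4.2 Basic Data Processing
4.2.1 Handling Missing Values
Looking at the target, we find one unusual sample whose winPlacePerc is NaN, meaning the target value itself is missing.
Since only one player is affected, we simply drop that row.
# Inspect the row with the missing target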
train[train['winPlacePerc'].isnull()]
Id groupId matchId assists boosts damageDealt DBNOs headshotKills heals killPlace ... revives rideDistance roadKills swimDistance teamKills vehicleDestroys walkDistance weaponsAcquired winPoints winPlacePerc
2744604 f70c74418bb064 12dfbede33f92b 224a123c53e008 0 0 0.0 0 0 0 1 ... 0 0.0 0 0.0 0 0 0.0 0 0 NaN
1 rows × 29 columns
# Drop the row with the missing target (by its index label)
train.drop(2744604, inplace=True)
train.shape
(4446965, 29)
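Dropping by a hard-coded index label works here because there is exactly one affected row. As a sketch of a more general alternative, assuming the same column name, dropna with subset removes every row whose target is missing without needing to know its index. The toy frame below is made up for illustration.

```python
import numpy as np
import pandas as pd

# Toy frame with one missing target (values made up for illustration).
toy = pd.DataFrame({
    "Id": ["a", "b", "c"],
    "winPlacePerc": [0.44, np.nan, 1.0],
})

# Count missing targets first, so the drop is a conscious decision.
n_missing = int(toy["winPlacePerc"].isnull().sum())

# Remove every row whose target is NaN, whatever its index label is.
cleaned = toy.dropna(subset=["winPlacePerc"])

print(n_missing, toy.shape, cleaned.shape)
```

On the real data, the equivalent call would be train.dropna(subset=['winPlacePerc']), which should leave the same 4446965 rows as the index-based drop.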
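4.2.2 Normalizing Feature Data
4.2.2.1 Number of Players per Match
With the missing value handled, let's look at how many players take part in each match. Does every match really wait for 100 players before it starts?
# Compute the number of players in each match
# transform acts like a one-to-many mapping: it broadcasts each group's count back onto every row of that group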
count = train.groupby('matchId')['matchId'].transform('count')
count
0 96
1 91
2 98
3 91
4 97
..
4446961 94
4446962 93
4446963 98
4446964 94
4446965 98
Name: matchId, Length: 4446965, dtype: int64
train['playersJoined'] = count
count.count()
4446965
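The difference between an ordinary group aggregation and transform is easiest to see on a tiny example. The sketch below uses a made-up frame with two matches: size() returns one value per match, while transform('count') returns one value per original row, which is why it can be assigned directly as a new column.

```python
import pandas as pd

# Toy data: match m1 has 3 players, match m2 has 2 (made up for illustration).
toy = pd.DataFrame({
    "matchId": ["m1", "m1", "m1", "m2", "m2"],
    "Id": ["a", "b", "c", "d", "e"],
})

# Aggregation: one entry per match, indexed by matchId.
per_match = toy.groupby("matchId").size()

# transform: the group's count repeated for every row, aligned with toy's index.
per_row = toy.groupby("matchId")["matchId"].transform("count")

# Because per_row shares toy's index, it can be stored as a column directly.
toy["playersJoined"] = per_row
print(per_match)
print(toy)
```

Here per_match has 2 entries and per_row has 5, one for each player row.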
train.head()
Id groupId matchId assists boosts damageDealt DBNOs headshotKills heals killPlace ... rideDistance roadKills swimDistance teamKills vehicleDestroys walkDistance weaponsAcquired winPoints winPlacePerc playersJoined
0 7f96b2f878858a 4d4b580de459be a10357fd1a4a91 0 0 0.00 0 0 0 60 ... 0.0000 0 0.00 0 0 244.80 1 1466 0.4444 96
1 eef90569b9d03c 684d5656442f9e aeb375fc57110c 0 0 91.47 0 0 0 57 ... 0.0045 0 11.04 0 0 1434.00 5 0 0.6400 91
2 1eaf90ac73de72 6a4a42c3245a74 110163d8bb94ae 1 0 68.00 0 0 0 47 ... 0.0000 0 0.00 0 0 161.80 2 0 0.7755 98
3 4616d365dd2853 a930a9c79cd721 f1f1f4ef412d7e 0 0 32.90 0 0 0 75 ... 0.0000 0 0.00 0 0 202.70 3 0 0.1667 91
4 315c96c26c9aac de04010b3458dd 6dc8ff871e21e6 0 0 100.00 0 0 0 45 ... 0.0000 0 0.00 0 0 49.75 2 0 0.1875 97
5 rows × 30 columns
# Sort by the number of players per match, in ascending order
train["playersJoined"].sort_values().head()
1206365 2
2109739 2
3956552 5
3620228 5
696000 5
Name: playersJoined, dtype: int64
The result shows that the smallest match, surprisingly, had only two players!
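Note that playersJoined is stored once per player, so sorting the column lists the same match several times: the two rows with value 2 above are the two players of a single match. A per-match view can be sketched as follows, on a made-up frame. The variable name match_sizes is hypothetical.

```python
import pandas as pd

# Toy data: three matches with 2, 5 and 3 players (made up for illustration).
toy = pd.DataFrame({"matchId": ["m1"] * 2 + ["m2"] * 5 + ["m3"] * 3})
toy["playersJoined"] = toy.groupby("matchId")["matchId"].transform("count")

# Keep one row per match, then sort so the smallest matches come first.
match_sizes = (
    toy.drop_duplicates("matchId")
       .set_index("matchId")["playersJoined"]
       .sort_values()
)
print(match_sizes)
```

Each match now appears exactly once, which makes it easier to count how many matches fall below a given size.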
# Plot the number of players at the start of each match
# seaborn's countplot draws a bar chart of how often each value occurs
plt.figure(figsize=(20,10))
sns.countplot(x=train['playersJoined'])
plt.title('playersJoined')
plt.grid()
plt.show()
From the plot, we can see that matches with fewer than 75 players