PUBG Player Placement Prediction (Competition)

Article link: https://blog.csdn.net/itheimaliu/article/details/104441129

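This article shows how to use machine learning to predict a player's final placement in PUBG (PlayerUnknown's Battlegrounds), covering data preprocessing, model training, and evaluation. The data is processed with the sklearn library, a random forest model is trained, and outliers are handled in order to reduce the mean absolute error (MAE).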


1 PUBG Player Placement Prediction

---- Can you predict a player's placement at the end of a PUBG match?


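2 Project background

2.1 Project overview

PlayerUnknown's Battlegrounds (PUBG), nicknamed "chicken dinner" (吃鸡) by Chinese players, is a tactical, competitive sandbox shooter. It is a battle royale game: each match has up to 100 players, who are dropped onto an island (the battlegrounds) and start with nothing. Players must collect resources on the island and fight other players inside a steadily shrinking safe zone, trying to survive to the end.

The game offers a great deal of freedom. Players can parachute from a plane, drive off-road vehicles, fight in the jungle, and loot gear, all while watching for ambushes and trying to be the last one alive.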

2.2 Topics covered

Basic sklearn operations

Basic data processing
Basic use of machine learning algorithms

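2.3 Dataset overview

This project provides a large set of anonymized PUBG match statistics. Each row contains one player's post-match statistics, and the columns are the features. The data comes from all match types: solo, duo, and squad. There is no guarantee that a match has 100 players, or that a group has at most 4 members.

Files:

train_V2.csv - training set

test_V2.csv - test set

[Figure: partial view of the dataset]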


Field descriptions:

Id [player ID]
Player’s Id
groupId [ID of the player's group]
ID to identify a group within a match. If the same group of players plays in different matches, they will have a different groupId each time.
matchId [ID of the match]
ID to identify match. There are no matches that are in both the training and testing set.
assists [number of assists]
Number of enemy players this player damaged that were killed by teammates.
boosts [number of boost items used]
Number of boost items used.
damageDealt [total damage dealt]
Total damage dealt. Note: Self inflicted damage is subtracted.
DBNOs [number of enemies knocked down]
Number of enemy players knocked.
headshotKills [number of headshot kills]
Number of enemy players killed with headshots.
heals [number of healing items used]
Number of healing items used.
killPlace [kill ranking in this match]
Ranking in match of number of enemy players killed.
killPoints [kills-based Elo rating]
Kills-based external ranking of player. (Think of this as an Elo ranking where only kills matter.) If there is a value other than -1 in rankPoints, then any 0 in killPoints should be treated as a “None”.
kills [number of kills]
Number of enemy players killed.
killStreaks [kill streak]
Max number of enemy players killed in a short amount of time.
longestKill [longest kill distance]
Longest distance between player and player killed at time of death. This may be misleading, as downing a player and driving away may lead to a large longestKill stat.
matchDuration [match duration]
Duration of match in seconds.
matchType [match type (group size)]
String identifying the game mode that the data comes from. The standard modes are “solo”, “duo”, “squad”, “solo-fpp”, “duo-fpp”, and “squad-fpp”; other modes are from events or custom matches.
maxPlace [worst placement in the match]
Worst placement we have data for in the match. This may not match with numGroups, as sometimes the data skips over placements.
numGroups [number of groups]
Number of groups we have data for in the match.
rankPoints [Elo rating]
Elo-like ranking of player. This ranking is inconsistent and is being deprecated in the API’s next version, so use with caution. Value of -1 takes place of “None”.
revives [number of teammates revived]
Number of times this player revived teammates.
rideDistance [distance driven]
Total distance traveled in vehicles measured in meters.
roadKills [number of kills from a vehicle]
Number of kills while in a vehicle.
swimDistance [distance swum]
Total distance traveled by swimming measured in meters.
teamKills [number of teammates killed]
Number of times this player killed a teammate.
vehicleDestroys [number of vehicles destroyed]
Number of vehicles destroyed.
walkDistance [distance walked]
Total distance traveled on foot measured in meters.
weaponsAcquired [number of weapons picked up]
Number of weapons picked up.
winPoints [win-based Elo rating]
Win-based external ranking of player. (Think of this as an Elo ranking where only winning matters.) If there is a value other than -1 in rankPoints, then any 0 in winPoints should be treated as a “None”.
winPlacePerc [placement percentile]
The target of prediction. This is a percentile winning placement, where 1 corresponds to 1st place, and 0 corresponds to last place in the match. It is calculated off of maxPlace, not numGroups, so it is possible to have missing chunks in a match.
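
The field notes say that -1 in rankPoints stands for "None", and that a 0 in killPoints or winPoints should be read as "None" whenever rankPoints holds a real value. The article itself does not apply this rule; the following is only a minimal sketch of one way to encode it with pandas, using a made-up toy frame rather than rows from train_V2.csv.

```python
import numpy as np
import pandas as pd

# Toy rows invented for illustration; they are not taken from train_V2.csv.
df = pd.DataFrame({
    "rankPoints": [-1, 1500, 1480, -1],
    "killPoints": [1200, 0, 0, 0],
    "winPoints": [1466, 0, 0, 0],
})

# Rows where rankPoints carries a real value (anything other than -1).
has_rank = df["rankPoints"] != -1

for col in ["killPoints", "winPoints"]:
    # A 0 is treated as "None" only when rankPoints is populated.
    is_placeholder = has_rank & (df[col] == 0)
    df[col] = df[col].astype("float64").mask(is_placeholder, np.nan)

# rankPoints itself uses -1 as its "None" marker.
df["rankPoints"] = df["rankPoints"].astype("float64").replace(-1, np.nan)

print(df)
```

Whether NaN is the right encoding depends on the model: some estimators do not accept missing values, so a separate indicator column may be a better fit.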

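3 Evaluation

3.1 Evaluation method

You must build a model that predicts each player's placement from their end-of-match statistics, on a scale from 1 (first place) to 0 (last place).

The final result is scored by the mean absolute error (MAE) between the predicted winPlacePerc and the true winPlacePerc.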

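3.2 Introduction to MAE (Mean Absolute Error)

MAE is the mean of the absolute errors.
Because positive and negative errors cannot cancel each other out, it gives a better picture of how large the prediction errors actually are.
$$MAE(X, h) = \frac{1}{m}\sum_{i=1}^{m}\left|h(x^{(i)}) - y^{(i)}\right|$$
Here m is the number of samples, h(x^{(i)}) is the prediction for sample i, and y^{(i)} is its true value.
API:

sklearn.metrics.mean_absolute_error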
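
To make the formula concrete, here is a small sketch that computes MAE by hand with NumPy and then with sklearn.metrics.mean_absolute_error. The arrays are made-up values on the winPlacePerc scale, not results from the project.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error

# Made-up targets and predictions in [0, 1], for illustration only.
y_true = np.array([1.0, 0.5, 0.25, 0.0])
y_pred = np.array([0.9, 0.6, 0.25, 0.2])

# Direct translation of the formula: mean of |h(x_i) - y_i|.
mae_manual = np.mean(np.abs(y_pred - y_true))

# Same quantity through the sklearn API (argument order: y_true, y_pred).
mae_sklearn = mean_absolute_error(y_true, y_pred)

print(mae_manual, mae_sklearn)  # both are approximately 0.1
```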

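4 Implementation (Data Analysis + Random Forest)

In the analysis that follows, we explore the dataset and detect outliers.

We then train a random forest model on the data and tune that model.

Import the APIs needed for the basic data processing stage: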

import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns

4.1 Loading the data and inspecting basic information

Load the data and look at its basic information.

train = pd.read_csv("./data/train_V2.csv")
train.describe()
assists	boosts	damageDealt	DBNOs	headshotKills	heals	killPlace	killPoints	kills	killStreaks	...	revives	rideDistance	roadKills	swimDistance	teamKills	vehicleDestroys	walkDistance	weaponsAcquired	winPoints	winPlacePerc
count	4.446966e+06	4.446966e+06	4.446966e+06	4.446966e+06	4.446966e+06	4.446966e+06	4.446966e+06	4.446966e+06	4.446966e+06	4.446966e+06	...	4.446966e+06	4.446966e+06	4.446966e+06	4.446966e+06	4.446966e+06	4.446966e+06	4.446966e+06	4.446966e+06	4.446966e+06	4.446965e+06
mean	2.338149e-01	1.106908e+00	1.307171e+02	6.578755e-01	2.268196e-01	1.370147e+00	4.759935e+01	5.050060e+02	9.247833e-01	5.439551e-01	...	1.646590e-01	6.061157e+02	3.496091e-03	4.509322e+00	2.386841e-02	7.918208e-03	1.154218e+03	3.660488e+00	6.064601e+02	4.728216e-01
std	5.885731e-01	1.715794e+00	1.707806e+02	1.145743e+00	6.021553e-01	2.679982e+00	2.746294e+01	6.275049e+02	1.558445e+00	7.109721e-01	...	4.721671e-01	1.498344e+03	7.337297e-02	3.050220e+01	1.673935e-01	9.261157e-02	1.183497e+03	2.456544e+00	7.397004e+02	3.074050e-01
min	0.000000e+00	0.000000e+00	0.000000e+00	0.000000e+00	0.000000e+00	0.000000e+00	1.000000e+00	0.000000e+00	0.000000e+00	0.000000e+00	...	0.000000e+00	0.000000e+00	0.000000e+00	0.000000e+00	0.000000e+00	0.000000e+00	0.000000e+00	0.000000e+00	0.000000e+00	0.000000e+00
25%	0.000000e+00	0.000000e+00	0.000000e+00	0.000000e+00	0.000000e+00	0.000000e+00	2.400000e+01	0.000000e+00	0.000000e+00	0.000000e+00	...	0.000000e+00	0.000000e+00	0.000000e+00	0.000000e+00	0.000000e+00	0.000000e+00	1.551000e+02	2.000000e+00	0.000000e+00	2.000000e-01
50%	0.000000e+00	0.000000e+00	8.424000e+01	0.000000e+00	0.000000e+00	0.000000e+00	4.700000e+01	0.000000e+00	0.000000e+00	0.000000e+00	...	0.000000e+00	0.000000e+00	0.000000e+00	0.000000e+00	0.000000e+00	0.000000e+00	6.856000e+02	3.000000e+00	0.000000e+00	4.583000e-01
75%	0.000000e+00	2.000000e+00	1.860000e+02	1.000000e+00	0.000000e+00	2.000000e+00	7.100000e+01	1.172000e+03	1.000000e+00	1.000000e+00	...	0.000000e+00	1.909750e-01	0.000000e+00	0.000000e+00	0.000000e+00	0.000000e+00	1.976000e+03	5.000000e+00	1.495000e+03	7.407000e-01
max	2.200000e+01	3.300000e+01	6.616000e+03	5.300000e+01	6.400000e+01	8.000000e+01	1.010000e+02	2.170000e+03	7.200000e+01	2.000000e+01	...	3.900000e+01	4.071000e+04	1.800000e+01	3.823000e+03	1.200000e+01	5.000000e+00	2.578000e+04	2.360000e+02	2.013000e+03	1.000000e+00
8 rows × 25 columns

train.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4446966 entries, 0 to 4446965
Data columns (total 29 columns):
Id                 object
groupId            object
matchId            object
assists            int64
boosts             int64
damageDealt        float64
DBNOs              int64
headshotKills      int64
heals              int64
killPlace          int64
killPoints         int64
kills              int64
killStreaks        int64
longestKill        float64
matchDuration      int64
matchType          object
maxPlace           int64
numGroups          int64
rankPoints         int64
revives            int64
rideDistance       float64
roadKills          int64
swimDistance       float64
teamKills          int64
vehicleDestroys    int64
walkDistance       float64
weaponsAcquired    int64
winPoints          int64
winPlacePerc       float64
dtypes: float64(6), int64(19), object(4)
memory usage: 983.9+ MB
As shown above, the dataset has 4,446,966 rows in total.

train.shape
(4446966, 29)

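4.2 Basic data processing

4.2.1 Handling missing values

Looking at the target, we find one unusual sample whose winPlacePerc is NaN, that is, its target value is missing.

Since only one player is affected, we simply drop that row.

# Find rows with a missing target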

train[train['winPlacePerc'].isnull()]
Id	groupId	matchId	assists	boosts	damageDealt	DBNOs	headshotKills	heals	killPlace	...	revives	rideDistance	roadKills	swimDistance	teamKills	vehicleDestroys	walkDistance	weaponsAcquired	winPoints	winPlacePerc
2744604	f70c74418bb064	12dfbede33f92b	224a123c53e008	0	0	0.0	0	0	0	1	...	0	0.0	0	0.0	0	0	0.0	0	0	NaN
1 rows × 29 columns

# Drop the row with the missing target
train.drop(2744604, inplace=True)
train.shape
(4446965, 29)
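
The drop above relies on the hard-coded row label 2744604. As an alternative sketch (not what the article does), dropna with subset removes every row whose target is missing without naming an index. The toy frame below is made up for illustration.

```python
import numpy as np
import pandas as pd

# Toy frame with one missing target; values are illustrative only.
toy = pd.DataFrame({
    "Id": ["a", "b", "c"],
    "winPlacePerc": [0.4444, np.nan, 0.64],
})

# Keep only rows where the target column is present.
cleaned = toy.dropna(subset=["winPlacePerc"])

print(cleaned.shape)
```

On the real data this would be `train = train.dropna(subset=['winPlacePerc'])`, which should give the same (4446965, 29) shape as long as that single row is the only one with a missing target.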
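4.2.2 Normalizing the feature data
4.2.2.1 Number of players in each match
With the missing value handled, let's look at how many players take part in each match. Does every match really wait for 100 players before it starts?

# Show the number of players in each match
# transform acts as a one-to-many mapping: it broadcasts each group's count back onto every row of that group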
count = train.groupby('matchId')['matchId'].transform('count')
count
0          96
1          91
2          98
3          91
4          97
           ..
4446961    94
4446962    93
4446963    98
4446964    94
4446965    98
Name: matchId, Length: 4446965, dtype: int64
train['playersJoined'] = count
count.count()
4446965
train.head()
Id	groupId	matchId	assists	boosts	damageDealt	DBNOs	headshotKills	heals	killPlace	...	rideDistance	roadKills	swimDistance	teamKills	vehicleDestroys	walkDistance	weaponsAcquired	winPoints	winPlacePerc	playersJoined
0	7f96b2f878858a	4d4b580de459be	a10357fd1a4a91	0	0	0.00	0	0	0	60	...	0.0000	0	0.00	0	0	244.80	1	1466	0.4444	96
1	eef90569b9d03c	684d5656442f9e	aeb375fc57110c	0	0	91.47	0	0	0	57	...	0.0045	0	11.04	0	0	1434.00	5	0	0.6400	91
2	1eaf90ac73de72	6a4a42c3245a74	110163d8bb94ae	1	0	68.00	0	0	0	47	...	0.0000	0	0.00	0	0	161.80	2	0	0.7755	98
3	4616d365dd2853	a930a9c79cd721	f1f1f4ef412d7e	0	0	32.90	0	0	0	75	...	0.0000	0	0.00	0	0	202.70	3	0	0.1667	91
4	315c96c26c9aac	de04010b3458dd	6dc8ff871e21e6	0	0	100.00	0	0	0	45	...	0.0000	0	0.00	0	0	49.75	2	0	0.1875	97
5 rows × 30 columns
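
The difference between an ordinary group count and transform('count') is easier to see on a tiny example. The sketch below uses a made-up five-row frame: the plain aggregation returns one value per match, while transform returns one value per original row, which is why it can be assigned straight back as a new column.

```python
import pandas as pd

# Five made-up player rows spread over two matches.
toy = pd.DataFrame({
    "matchId": ["m1", "m1", "m2", "m1", "m2"],
    "kills": [0, 2, 1, 0, 3],
})

# Aggregation: one row per match (index is matchId).
per_match = toy.groupby("matchId")["matchId"].count()

# transform: same length and index as toy, count repeated for each row.
per_row = toy.groupby("matchId")["matchId"].transform("count")

toy["playersJoined"] = per_row
print(per_match)
print(toy)
```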

# Sort by the number of players per match, in ascending order
train["playersJoined"].sort_values().head()
1206365    2
2109739    2
3956552    5
3620228    5
696000     5
Name: playersJoined, dtype: int64
The result shows that the smallest match had only two players!

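# Plot the number of players at the start of each match
# seaborn's countplot draws a bar chart of how often each value occurs
plt.figure(figsize=(20,10))
# Pass the column as x=; recent seaborn versions no longer accept it positionally
sns.countplot(x=train['playersJoined'])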
plt.title('playersJoined')
plt.grid()
plt.show()

[Figure: count plot of playersJoined]

From the plot, we find that matches with fewer than 75 players