Kaggle Journey 2

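Using a Kaggle dataset, this post analyzes the data with pandas: it counts the GMs, tallies the games played per player and the average games per year, and computes the player with the highest win rate in 2021 as well as the overall win rates with White and with Black.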

Preface

Today I continue learning pandas, this time practicing on the All GM Chess Games on Chess.com dataset.

I. Goals

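Read in the dataset and compute the following statistics:

  1. How many GMs there are in total, and who they are
  2. How many games each GM played (top 10)
  3. How many games each GM played per year on average (top 10)
  4. The dataset covers 2008-05-10 to 2022-06-24, so we take 2021 and see who had the highest win rate that year: first compute the average number of games per player in 2021, then find the players whose game count exceeds that average, and finally find which of those players has the highest win rate.
  5. Overall, whether the win rate is higher with White or with Black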
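
Since these goals only touch a handful of columns, one option is to load just those columns. The following is a minimal sketch, assuming the column names used in the code later in this post (`player_name`, `player`, `White`, `Black`, `Result`, `Date`). The two-row CSV is made-up stand-in data, its dotted date format is an assumption about the real file, and the `ECO` column only stands in for columns that get skipped.

```python
import io
import pandas as pd

# Made-up stand-in for the Kaggle CSV; with the real file, pass its path to read_csv.
sample_csv = io.StringIO(
    "player_name,player,White,Black,Result,Date,ECO\n"
    "Alice Example,alice,alice,bob,1-0,2021.03.01,B01\n"
    "Alice Example,alice,carol,alice,1/2-1/2,2020.07.15,C20\n"
)

needed_columns = ["player_name", "player", "White", "Black", "Result", "Date"]

# usecols skips columns the analysis never reads, which saves memory on a large file.
games = pd.read_csv(sample_csv, usecols=needed_columns)

# Parse the date strings (assumed here to look like 2021.03.01) and derive the year once.
games["Date"] = pd.to_datetime(games["Date"], format="%Y.%m.%d")
games["Year"] = games["Date"].dt.year

print(games.dtypes)
print(games[["player", "Year"]])
```

If the real dates use a different layout, dropping the `format` argument lets pandas infer it, at some cost in speed.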

II. Testing

The code is as follows:

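import pandas as pd

# Load the dataset. The file name is a placeholder; point it at the CSV downloaded from Kaggle.
df = pd.read_csv("chess_games.csv")

# 1. Count the GMs and list who they are
print(f"1. There are {df.player_name.nunique()} GMs in total:")
print(df.player_name.unique())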

# 2. Count how many games each GM played
print("\n2. Games played per GM (top 10):")
print(df.player_name.value_counts().head(10))

# 3. Average number of games each GM played per year
df['Date'] = pd.to_datetime(df['Date'])
df['Year'] = df['Date'].dt.year
# Count games per (GM, year), then average those yearly counts for each GM
average_games_per_year = df.groupby(['player_name', 'Year']).size().groupby('player_name').mean().round(0)

# Take the top 10
top_10_gms_average_games = average_games_per_year.sort_values(ascending=False).head(10)

print("\n3. Average games per year per GM (top 10):")
print(top_10_gms_average_games)

# 4. Find the highest win rate in 2021 among players whose game count exceeds that year's per-player average
# Keep the 2021 games and only the columns needed (df['Year'] was added in step 3)
games_in_2021 = df.loc[df['Year'] == 2021, ['player', 'White', 'Black', 'Result']]

# Count the games each player played in 2021;
# every game adds one to its White player and one to its Black player
total_games_by_player = pd.concat([games_in_2021['White'], games_in_2021['Black']]).value_counts()

# Average number of games per player for the year
average_games_per_player = total_games_by_player.mean()

# Keep the players who played more games than the average
qualified_players = total_games_by_player[total_games_by_player > average_games_per_player].index

# Keep the games recorded for those players
qualified_games = games_in_2021[games_in_2021['player'].isin(qualified_players)]

# Count wins and total games for each player
stats_rows = []
for player in qualified_players:
    is_white = qualified_games['White'] == player
    is_black = qualified_games['Black'] == player
    # A win means 1-0 while holding White, or 0-1 while holding Black
    wins_as_white = qualified_games[(qualified_games['Result'] == '1-0') & is_white]
    wins_as_black = qualified_games[(qualified_games['Result'] == '0-1') & is_black]
    total_games = qualified_games[is_white | is_black]

    stats_rows.append({
        'player': player,
        'total_wins': len(wins_as_white) + len(wins_as_black),
        'total_games': len(total_games)
    })
player_stats = pd.DataFrame(stats_rows)

# Compute the win rate
player_stats['win_percentage'] = player_stats['total_wins'] / player_stats['total_games']

# Pick the highest win rate among players whose 2021 game count exceeds the average
highest_win_percentage_player = player_stats.nlargest(1, 'win_percentage')

print("\n4. Player with the highest win rate in 2021 among those above the yearly average game count:")
print(highest_win_percentage_player[['player', 'win_percentage', 'total_games']])

# 5. Overall, is the win rate higher with White or with Black?
# Add a 'player_color' column: the color the player held in each game.
# Rows where 'player' matches neither side stay unlabeled and drop out of the groupby below.
df['player_color'] = df.apply(lambda row: 'White' if row['player'] == row['White']
                              else ('Black' if row['player'] == row['Black'] else None), axis=1)

# Add a 'player_result' column: 1 if the player won the game, 0 otherwise (draw or loss)
df['player_result'] = df.apply(lambda row: 1 if ((row['player'] == row['White'] and row['Result'] == '1-0') or
                                                 (row['player'] == row['Black'] and row['Result'] == '0-1')) else 0, axis=1)

# Count games and wins overall for White and for Black
overall_stats = df.groupby('player_color').agg(total_games=('Result', 'count'), total_wins=('player_result', 'sum'))

# Compute the overall win rate
overall_stats['win_percentage'] = overall_stats['total_wins'] / overall_stats['total_games']

print("\n5. Overall win rate with White and with Black:")
print(overall_stats)


Results

The dataset is large, so steps 4 and 5 run slowly; expect to wait around half an hour.
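
Row-wise `apply` calls a Python function once per row, which is likely a large part of that wait. Below is a minimal sketch of the per-color tally from step 5 using whole-column comparisons instead. The `toy` frame and its usernames are made up; only the column names come from the code above.

```python
import numpy as np
import pandas as pd

# Made-up games; 'player' is the account the row was recorded for.
toy = pd.DataFrame({
    "player": ["alice", "alice", "bob", "bob"],
    "White":  ["alice", "carol", "bob", "dave"],
    "Black":  ["dave", "alice", "carol", "bob"],
    "Result": ["1-0", "0-1", "1/2-1/2", "1-0"],
})

# Boolean columns computed for all rows at once, not one Python call per row.
is_white = toy["player"] == toy["White"]
is_black = toy["player"] == toy["Black"]

# Keep only rows where the player is actually one of the two sides.
matched = toy[is_white | is_black].copy()
matched["player_color"] = np.where(matched["player"] == matched["White"], "White", "Black")

# A win is 1-0 while holding White, or 0-1 while holding Black.
matched["player_result"] = (
    ((matched["player_color"] == "White") & (matched["Result"] == "1-0"))
    | ((matched["player_color"] == "Black") & (matched["Result"] == "0-1"))
).astype(int)

overall = matched.groupby("player_color").agg(
    total_games=("Result", "count"),
    total_wins=("player_result", "sum"),
)
overall["win_percentage"] = overall["total_wins"] / overall["total_games"]
print(overall)
```

The same idea applies to step 4: grouping by `player` and summing a 0/1 win column avoids scanning the whole frame once per player.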

1. There are 1055 GMs in total:
['Tingjie Lei' 'Дмитрий Хегай' 'Ivan Ivanisevic' ... 'Зубарев Александр'
 'ZURAB AZMAIPARASHVILI' 'Nikita Petrov']

2. Games played per GM (top 10):
player_name
Rogelio Jr Antonio       138587
Daniel Naroditsky         70765
Aman Hambleton            46833
Hikaru Nakamura           43342
Hoang Thong Tu            41392
ZURAB AZMAIPARASHVILI     38654
Khatanbaatar Bazar        38212
Nihal Sarin               33453
Yannick Gozzoli           32219
Andrew Tang               30594
Name: count, dtype: int64

3. Average games per year per GM (top 10):
player_name
Khatanbaatar Bazar       12737.0
Rogelio Jr Antonio       10661.0
ZURAB AZMAIPARASHVILI     9664.0
Brandon Jacobson          8586.0
Alireza Firouzja          7647.0
Lev Gutman                5656.0
Daniel Naroditsky         5443.0
Petar Drenchev            5006.0
Danielian Elina           4947.0
Hikaru Nakamura           4816.0
dtype: float64
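
The yearly average in step 3 is taken over the years in which a GM has at least one recorded game, so years without games do not pull it down. A small worked sketch on made-up data (the names and counts are hypothetical):

```python
import pandas as pd

# Player A: three games in 2019, none in 2020, one in 2021. Player B: one game in each of 2020 and 2021.
toy = pd.DataFrame({
    "player_name": ["A", "A", "A", "A", "B", "B"],
    "Year":        [2019, 2019, 2019, 2021, 2020, 2021],
})

# Games per (player, year); a year with no games simply has no row.
per_year = toy.groupby(["player_name", "Year"]).size()

# Mean of the yearly counts, over the years that appear for each player.
average_per_active_year = per_year.groupby("player_name").mean()

print(per_year)
print(average_per_active_year)
```

Player A averages (3 + 1) / 2 = 2 games per year here, not 4 / 3, because 2020 never appears in the grouped counts.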

4. Player with the highest win rate in 2021 among those above the yearly average game count:
         player  win_percentage  total_games
0  sahpufjunior             1.0          238
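
A rate of exactly 1.0 over 238 games is worth a second look. A win should be counted only when the player held the winning color. A filter that matches the player on either side whenever the result is `1-0` or `0-1` ends up counting every decisive game, losses included. A minimal sketch contrasting the two counts on made-up games (`p`, `x` and `y` are hypothetical usernames):

```python
import pandas as pd

games = pd.DataFrame({
    "White":  ["p", "x", "p", "y"],
    "Black":  ["x", "p", "y", "p"],
    "Result": ["1-0", "1-0", "0-1", "1/2-1/2"],
})
player = "p"

# Games the player took part in, on either side.
involved = (games["White"] == player) | (games["Black"] == player)
total = involved.sum()

# Either-side filter: every decisive game counts, whoever won it.
decisive = (games["Result"].isin(["1-0", "0-1"]) & involved).sum()

# Color-aware filter: 1-0 counts only as White, 0-1 only as Black.
wins = (
    ((games["Result"] == "1-0") & (games["White"] == player))
    | ((games["Result"] == "0-1") & (games["Black"] == player))
).sum()

print("either-side rate:", decisive / total)
print("color-aware rate:", wins / total)
```

With the either-side count, a player with no draws always comes out at 1.0, so it is the color-aware count that measures the win rate.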

5. Overall win rate with White and with Black:
              total_games  total_wins  win_percentage
player_color                                         
Black             4014186      408073        0.101658
White              796890      445434        0.558965
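
Each row belongs to one player who held either White or Black, so the two game counts would be expected to be of similar size. A split of roughly 4.0 million to 0.8 million suggests that many rows were labeled Black only because `player` did not equal `White` exactly. One possible cause, which is an assumption and not something confirmed here, is a difference in letter case between `player` and the `White`/`Black` columns. A minimal diagnostic sketch on made-up rows:

```python
import pandas as pd

# Made-up rows where 'player' is lowercase but the side columns keep their original casing.
toy = pd.DataFrame({
    "player": ["alice", "alice", "bob"],
    "White":  ["Alice", "carol", "dave"],
    "Black":  ["dave", "Alice", "Bob"],
})

# Exact comparison: how many rows match neither side?
exact_white = toy["player"] == toy["White"]
exact_black = toy["player"] == toy["Black"]
unmatched_exact = (~(exact_white | exact_black)).sum()

# Case-insensitive comparison of the same columns.
player_lower = toy["player"].str.lower()
loose_white = player_lower == toy["White"].str.lower()
loose_black = player_lower == toy["Black"].str.lower()
unmatched_loose = (~(loose_white | loose_black)).sum()

print("unmatched with exact comparison:", unmatched_exact)
print("unmatched with case-insensitive comparison:", unmatched_loose)
```

Running the same two counts on the real data would show whether casing explains the gap. If the exact comparison already leaves no rows unmatched, the cause lies elsewhere.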