Kaggle之旅2

发布时间:2024年01月20日

Kaggle之旅2


前言

今天继续学习pandas。并实践下All GM Chess Games on Chess.com这个dataset。

一、目标

读入dataset,做一些统计工作,如下,

  1. 一共有多少GM,都是谁
  2. 每个GM下了多少盘棋(取top10)
  3. 平均每年每个GM下几盘棋(取top10)
  4. 由于数据集时间范围是2008-05-10 ~ 2022-06-24,我们就取2021年来看看当年谁的胜率最高,首先应该要统计2021年所有棋手平局下了多少局棋,然后统计出下棋局数超过平均值的棋手都是谁,最后计算这些棋手谁的胜率最高。
  5. 总体看,执白胜率高还是执黑胜率高

二、测试

代码如下

# 1. 统计一共有多少GM,以及是谁
print(f"1. 一共有 " + str(df.player_name.nunique()) +" 位GM,分别是:")
print(df.player_name.unique())

# 2. 统计每个GM下了多少盘棋
print("\n2. 每个GM下了多少盘棋(取top10):")
print(df.player_name.value_counts().head(10))

# 3. 平均每年每个GM下几盘棋
df['Date'] = pd.to_datetime(df['Date'])
df['Year'] = df['Date'].dt.year
average_games_per_year = df.groupby(['player_name', 'Year']).size().groupby('player_name').mean().round(0).astype(int)

# 取前十名
top_10_gms_average_games = average_games_per_year.groupby('player_name').mean().sort_values(ascending=False).head(10)

print("\n3. 平均每年每个GM下几盘棋(取top10):")
print(top_10_gms_average_games)

# 4. 计算2021年谁的胜率最高,需要至少达到超过全年所有棋手平均下棋局数
# 提取关键字段
df_2021 = df[['player', 'White', 'Black', 'Result', 'Date']]
df_2021['Date'] = pd.to_datetime(df_2021['Date'])
df_2021['Year'] = df_2021['Date'].dt.year

# 计算2021年所有棋手下的总局数
games_in_2021 = df[df['Year'] == 2021]
total_games_by_player = pd.concat([games_in_2021['player'], games_in_2021['White'], games_in_2021['Black']]).value_counts()

# 计算全年平均下棋局数
average_games_per_player = total_games_by_player.mean()

# 过滤出下棋局数超过平均值的棋手
qualified_players = total_games_by_player[total_games_by_player > average_games_per_player].index

# 筛选出符合条件的对局
qualified_games = games_in_2021[games_in_2021['player'].isin(qualified_players)]

# 计算每个棋手的胜利次数和总局数
player_stats = pd.DataFrame()
for player in qualified_players:
    wins_as_white = qualified_games[(qualified_games['Result'] == '1-0') & ((qualified_games['White'] == player) | (qualified_games['Black'] == player))]
    wins_as_black = qualified_games[(qualified_games['Result'] == '0-1') & ((qualified_games['White'] == player) | (qualified_games['Black'] == player))]
    total_games = qualified_games[(qualified_games['White'] == player) | (qualified_games['Black'] == player)]

    player_stats = pd.concat([player_stats, pd.DataFrame({
        'player': [player],
        'total_wins': [len(wins_as_white) + len(wins_as_black)],
        'total_games': [len(total_games)]
    })])

# 计算胜率
player_stats['win_percentage'] = player_stats['total_wins'] / player_stats['total_games']

# 选择在2021年下棋局数超过平均值的棋手中胜率最高的
highest_win_percentage_player = player_stats.nlargest(1, 'win_percentage')

print("\n4. 2021年胜率最高且下棋局数超过全年平均的玩家:")
print(highest_win_percentage_player[['player', 'win_percentage', 'total_games']])

# 5. 总体看,执白胜率高还是执黑胜率高
# 创建一个新列 'player_color' 表示每个玩家在比赛中是执白还是执黑
df['player_color'] = df.apply(lambda row: 'White' if row['player'] == row['White'] else 'Black', axis=1)

# 创建一个新列 'player_result' 表示每个玩家在比赛中的结果(胜负关系)
df['player_result'] = df.apply(lambda row: 1 if (((row['player'] == row['White']) & (row['Result'] == '1-0')) | 
                                                 ((row['player'] == row['Black']) & (row['Result'] == '0-1'))) else 0, axis=1)

# 计算总体执白和执黑的比赛次数和胜率
overall_stats = df.groupby('player_color').agg(total_games=('Result', 'count'), total_wins=('player_result', 'sum'))

# 计算总体胜率
overall_stats['win_percentage'] = overall_stats['total_wins'] / overall_stats['total_games']

print("\n5. 总体执白和执黑的胜率统计:")
print(overall_stats)


结果

数据集很大,运行4、5比较慢,大约等个半小时吧。

1. 一共有 1055 位GM,分别是:
['Tingjie Lei' 'Дмитрий Хегай' 'Ivan Ivanisevic' ... 'Зубарев Александр'
 'ZURAB AZMAIPARASHVILI' 'Nikita Petrov']

2. 每个GM下了多少盘棋(取top10):
player_name
Rogelio Jr Antonio       138587
Daniel Naroditsky         70765
Aman Hambleton            46833
Hikaru Nakamura           43342
Hoang Thong Tu            41392
ZURAB AZMAIPARASHVILI     38654
Khatanbaatar Bazar        38212
Nihal Sarin               33453
Yannick Gozzoli           32219
Andrew Tang               30594
Name: count, dtype: int64

3. 平均每年每个GM下几盘棋(取top10):
player_name
Khatanbaatar Bazar       12737.0
Rogelio Jr Antonio       10661.0
ZURAB AZMAIPARASHVILI     9664.0
Brandon Jacobson          8586.0
Alireza Firouzja          7647.0
Lev Gutman                5656.0
Daniel Naroditsky         5443.0
Petar Drenchev            5006.0
Danielian Elina           4947.0
Hikaru Nakamura           4816.0
dtype: float64

4. 2021年胜率最高且下棋局数超过全年平均的玩家:
         player  win_percentage  total_games
0  sahpufjunior             1.0          238

5. 总体执白和执黑的胜率统计:
              total_games  total_wins  win_percentage
player_color                                         
Black             4014186      408073        0.101658
White              796890      445434        0.558965
文章来源:https://blog.csdn.net/sencloud/article/details/135660095
本文来自互联网用户投稿,该文观点仅代表作者本人,不代表本站立场。本站仅提供信息存储空间服务,不拥有所有权,不承担相关法律责任。