Data Analysis 26: Analyzing 120 Years of Olympic Games Data (Code and Data Included)

Published: January 18, 2024

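0. Code and Data Download

Follow the WeChat official account 『AI学习星球』 (AI Learning Planet)
and reply 奥运会数据分析 ("Olympic data analysis") to get the data download.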


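1. Project Background

This project is a simple analysis of a dataset covering 120 years of the Olympic Games (Summer Olympics).
It focuses on three aspects:

  1. Male and female athletes at the Olympics

  2. The top performers across Olympic history

  3. China's Olympic history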

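2. Project Analysis

1. Data Description

The dataset consists of two files:

  • athlete_events.csv: basic biographical data for each athlete and their medal results

  • noc_regions.csv: the three-letter National Olympic Committee (NOC) codes and the corresponding country information

athlete_events.csv contains 15 fields: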

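| Field | Meaning |
|---|---|
| ID | Unique ID for each athlete |
| Name | Athlete's name |
| Sex | Sex |
| Age | Age |
| Height | Height (cm) |
| Weight | Weight (kg) |
| Team | Team (country) represented |
| NOC | Three-letter National Olympic Committee code |
| Games | Year and season of the Games |
| Year | Year of the Games |
| Season | Season (Summer or Winter) |
| City | Host city |
| Sport | Sport |
| Event | Event |
| Medal | Medal won (Gold, Silver, Bronze, or missing) |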

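noc_regions.csv contains 3 fields:

| Field | Meaning |
|---|---|
| NOC | Three-letter National Olympic Committee code |
| Region | Country or region |
| Notes | Additional notes (e.g. a sub-region) |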
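The article describes noc_regions.csv but never joins it to the athlete table. A minimal sketch of that join, assuming both files are loaded as DataFrames sharing the NOC column; the helper name attach_regions and the toy rows are illustrative only, not part of the original project:

```python
import pandas as pd


def attach_regions(events: pd.DataFrame, regions: pd.DataFrame) -> pd.DataFrame:
    """Left-join region info onto athlete rows via the shared NOC code.

    A left join keeps every athlete row; NOC codes with no match in the
    regions table get NaN in the region columns.
    """
    return events.merge(regions, on='NOC', how='left')


# Toy rows for illustration only (hypothetical, not from the dataset)
events = pd.DataFrame({'ID': [1, 2, 3], 'NOC': ['CHN', 'USA', 'XXX']})
regions = pd.DataFrame({'NOC': ['CHN', 'USA'], 'Region': ['China', 'USA']})

merged = attach_regions(events, regions)
print(merged)
```

In the notebook this would be called as attach_regions(athlete_events, pd.read_csv('noc_regions.csv')); it is worth checking the exact column capitalization in your copy of the file before referring to the region column by name.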

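2. Data Processing

a. Import the required packages and load the data
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import plotly.graph_objs as go
from plotly.offline import init_notebook_mode, iplot

# Enable offline Plotly rendering inside a Jupyter notebook
init_notebook_mode(connected=True)

# athlete_events.csv is assumed to be in the working directory
f_p = 'athlete_events.csv'
athlete_events = pd.read_csv(f_p)
b. View the first five rows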
athlete_events.head()


c. Size of the dataset (rows, columns)
athlete_events.shape

(271116, 15)
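The project background says the analysis covers the Summer Olympics, but all 271,116 rows are loaded, and the Season field separates Summer from Winter entries. A minimal sketch of restricting the frame before the later steps, assuming Season holds the strings 'Summer' and 'Winter'; summer_only is a hypothetical helper and the rows are toy data:

```python
import pandas as pd


def summer_only(events: pd.DataFrame) -> pd.DataFrame:
    """Keep only Summer Games rows.

    .copy() returns an independent frame, so later column assignments
    do not trigger pandas' SettingWithCopyWarning.
    """
    return events.loc[events['Season'] == 'Summer'].copy()


# Toy rows for illustration only
events = pd.DataFrame({
    'Games': ['2008 Summer', '2010 Winter', '2012 Summer'],
    'Season': ['Summer', 'Winter', 'Summer'],
})

summer = summer_only(events)
print(summer.shape)  # (2, 2)
```

Applied as athlete_events = summer_only(athlete_events), every chart below would then describe the Summer Games only.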

d. Number of missing values in each field
athlete_events.isnull().sum()

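Raw missing-value counts are easier to compare as fractions of the row count. A minimal sketch, assuming (not stated in the original) that a missing Medal means the entry won no medal; the helper missing_share, the fill label 'None' and the toy rows are all illustrative choices:

```python
import numpy as np
import pandas as pd


def missing_share(df: pd.DataFrame) -> pd.Series:
    """Fraction of missing values per column, largest first."""
    return df.isnull().mean().sort_values(ascending=False)


# Toy rows for illustration only
events = pd.DataFrame({
    'Age': [24.0, np.nan, 31.0, 22.0],
    'Medal': ['Gold', np.nan, np.nan, 'Bronze'],
})

print(missing_share(events))

# Assumption: NaN in Medal means "no medal"; make that explicit
events['Medal'] = events['Medal'].fillna('None')
print(events['Medal'].value_counts())
```

Filling Medal this way keeps the non-medal rows visible to value_counts(), which otherwise drops NaN by default.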

e. Column types and non-null counts
athlete_events.info()


f. Summary statistics of the numeric columns
athlete_events.describe()

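Since the first question in the project background compares male and female athletes, the summary statistics can also be split by sex. A minimal sketch using groupby; body_stats_by_sex is a hypothetical helper and the rows are toy data (in the notebook you would pass athlete_events):

```python
import pandas as pd


def body_stats_by_sex(events: pd.DataFrame) -> pd.DataFrame:
    """Mean age, height and weight per sex; mean() skips NaN values."""
    return events.groupby('Sex')[['Age', 'Height', 'Weight']].mean()


# Toy rows for illustration only
events = pd.DataFrame({
    'Sex': ['M', 'F', 'M', 'F'],
    'Age': [20, 22, 30, 24],
    'Height': [180.0, 165.0, 190.0, None],
    'Weight': [75.0, 55.0, 85.0, 60.0],
})

print(body_stats_by_sex(events))
```

Swapping .mean() for .describe() would give the full per-sex breakdown that describe() prints for the whole table.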

3. Word cloud (a word cloud of the Olympic sports; the larger a sport's name, the more entries it has)

# How many distinct sports appear, and which ones
print('Total of', athlete_events['Sport'].nunique(), 'unique sports were played.\n\nFollowing is the list:\n\n', athlete_events['Sport'].unique())
from wordcloud import WordCloud

def show_wordcloud(data, title=None):
    # Use each sport's row count as its weight. Passing str(data) to
    # generate() would only see the truncated repr of the Series (a few
    # rows) and would split names such as "Art Competitions" into words.
    freqs = data.value_counts().to_dict()
    wordcloud = WordCloud(
        background_color='black',
        max_words=200,
        max_font_size=40,
        scale=3,
        random_state=1  # fixed seed so the layout is reproducible
    ).generate_from_frequencies(freqs)

    fig = plt.figure(1, figsize=(15, 15))
    plt.axis('off')
    if title:
        fig.suptitle(title, fontsize=20)
        fig.subplots_adjust(top=2.3)

    plt.imshow(wordcloud)
    plt.show()

show_wordcloud(athlete_events['Sport'])

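4. Pie chart: proportion of male and female entries at the Olympics (counted per row, so an athlete with several entries is counted more than once)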

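# Count rows per sex and map the 'M'/'F' codes to readable labels,
# so each label always matches its count regardless of ordering
sex_counts = athlete_events['Sex'].value_counts()
sex_names = {'M': 'Male', 'F': 'Female'}

fig = {
  "data": [
    {
      "values": sex_counts.values,
      "labels": [sex_names[s] for s in sex_counts.index],
      'marker': {'colors': ['rgb(175, 49, 35)',
                            'rgb(177, 180, 34)']},
      "name": "Sex Ratio of Participants",
      "hoverinfo": "label+percent+name",
      "hole": .4,
      "type": "pie"
    }],
  "layout": {
    "title": "Sex Ratio of Participants"
  }
}
iplot(fig, filename='donut')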

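The pie chart counts rows, so an athlete who entered several events or Games appears once per entry. A minimal sketch of the same ratio computed over distinct athletes, assuming ID identifies one person as the field table states; sex_ratio is a hypothetical helper and the rows are toy data:

```python
import pandas as pd


def sex_ratio(events: pd.DataFrame, unique_athletes: bool = True) -> pd.Series:
    """Share of each sex, either per row or per distinct athlete ID."""
    if unique_athletes:
        # Keep one row per athlete (a given ID is assumed to have one sex)
        events = events.drop_duplicates(subset='ID')
    return events['Sex'].value_counts(normalize=True)


# Toy rows: athlete 1 has three entries, athlete 2 has one
events = pd.DataFrame({'ID': [1, 1, 1, 2], 'Sex': ['M', 'M', 'M', 'F']})

print(sex_ratio(events, unique_athletes=False))
print(sex_ratio(events))
```

With these toy rows the per-row share of M is 0.75, while the per-athlete share is 0.5, which shows how repeated entries can skew the chart.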

5. Top 20 countries (by the Team field) with the most gold medals

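# Keep only gold-medal rows. Every member of a winning team has their
# own row, so a team gold is counted once per athlete here.
df_medals = athlete_events.loc[athlete_events['Medal'] == 'Gold']

# Gold-medal rows per Team value, top 20
cnt_srs = df_medals['Team'].value_counts().head(20)

trace = go.Bar(
    x=cnt_srs.index,
    y=cnt_srs.values,
    marker=dict(color='blue'),  # single colour, so no colorscale is needed
)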

layout = go.Layout(
    title='Top 20 countries with Maximum Gold Medals'
)

data = [trace]
fig = go.Figure(data=data, layout=layout)
iplot(fig, filename="medal")  

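Because every member of a winning team has a row, the gold-medal bar chart credits a relay or team gold once per athlete. A minimal sketch that counts one medal per team per event instead, assuming a (Games, Event, Team) triple identifies a single gold; golds_by_team is a hypothetical helper. As an approximation it collapses the rare case of two athletes from the same team sharing a gold in an individual event:

```python
import pandas as pd


def golds_by_team(events: pd.DataFrame, top_n: int = 20) -> pd.Series:
    """Count golds with one medal per (Games, Event, Team).

    Approximation: ties where two athletes of the same team share a
    gold in one event collapse into a single medal.
    """
    gold = events.loc[events['Medal'] == 'Gold']
    per_event = gold.drop_duplicates(subset=['Games', 'Event', 'Team'])
    return per_event['Team'].value_counts().head(top_n)


# Toy rows: a 3-person relay gold for team A, an individual gold for B
events = pd.DataFrame({
    'Games': ['2008 Summer'] * 4,
    'Event': ['Relay', 'Relay', 'Relay', '100m'],
    'Team': ['A', 'A', 'A', 'B'],
    'Medal': ['Gold'] * 4,
})

print(golds_by_team(events))
```

Counting rows would give team A three golds here; the per-event count gives one, matching the medal table convention.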

6. Most popular sports (by number of entries)

cnt_srs = athlete_events['Sport'].value_counts()

trace = go.Bar(
    x=cnt_srs.index,
    y=cnt_srs.values,
    marker=dict(
        color=cnt_srs.values,
        colorscale = 'Picnic',
        reversescale = True
    ),
)

layout = go.Layout(
    title='Most Popular Sport'
)

data = [trace]
fig = go.Figure(data=data, layout=layout)
iplot(fig, filename="sport")

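Ranking sports by row count mixes the number of athletes with the number of events and Games each sport has appeared in. A minimal sketch of an alternative popularity measure, distinct athletes per sport, assuming ID identifies one person; athletes_per_sport is a hypothetical helper and the rows are toy data:

```python
import pandas as pd


def athletes_per_sport(events: pd.DataFrame) -> pd.Series:
    """Number of distinct athlete IDs per sport, largest first."""
    return events.groupby('Sport')['ID'].nunique().sort_values(ascending=False)


# Toy rows: athlete 1 swims three events, athletes 2 and 3 do athletics
events = pd.DataFrame({
    'ID': [1, 1, 1, 2, 3],
    'Sport': ['Swimming', 'Swimming', 'Swimming', 'Athletics', 'Athletics'],
})

print(events['Sport'].value_counts())  # row counts: Swimming leads
print(athletes_per_sport(events))      # distinct athletes: Athletics leads
```

The resulting Series can be passed to the same go.Bar code as cnt_srs to plot this measure instead.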

7. Top 10 sports in which the USA wins the most gold medals

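# Rows whose Team is exactly 'United States' (other spellings of the
# team name are not matched)
df_usa = athlete_events.loc[athlete_events['Team'] == 'United States']
# .copy() gives an independent frame and avoids SettingWithCopyWarning below
df_usa_medal = df_usa.loc[df_usa['Medal'] == 'Gold'].copy()

# Encode each gold as 1 so that summing per sport yields the gold count
medal_map = {'Gold': 1}
df_usa_medal['Medal'] = df_usa_medal['Medal'].map(medal_map)

df_usa_sport = df_usa_medal.groupby(['Sport'], as_index=False)['Medal'].agg('sum')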

df_usa_sport=df_usa_sport.sort_values(['Medal'],ascending=False)

df_usa_sport=df_usa_sport.head(10)

colors = ['#91BBF4', '#91F4F4', '#F79981', '#F7E781', '#C0F781','rgb(32,155,160)', 'rgb(253,93,124)', 'rgb(28,119,139)', 'rgb(182,231,235)', 'rgb(35,154,160)']

n_phase = len(df_usa_sport['Sport'])
plot_width = 200

# height of a section and difference between sections 
section_h = 100
section_d = 10

# multiplication factor to calculate the width of other sections
unit_width = plot_width / max(df_usa_sport['Medal'])

# width of each funnel section relative to the plot width
phase_w = [int(value * unit_width) for value in df_usa_sport['Medal']]

height = section_h * n_phase + section_d * (n_phase - 1)

# list containing all the plot shapes
shapes = []

# list containing the Y-axis location for each section's name and value text
label_y = []

for i in range(n_phase):
    # The bottom edge uses the next section's width; the last section is a rectangle
    if i == n_phase - 1:
        points = [phase_w[i] / 2, height, phase_w[i] / 2, height - section_h]
    else:
        points = [phase_w[i] / 2, height, phase_w[i + 1] / 2, height - section_h]

    # Closed SVG path: top-right, bottom-right, bottom-left, top-left
    path = 'M {0} {1} L {2} {3} L -{2} {3} L -{0} {1} Z'.format(*points)

    shape = {
        'type': 'path',
        'path': path,
        'fillcolor': colors[i],
        'line': {
            'width': 1,
            'color': colors[i]
        }
    }
    shapes.append(shape)

    # Y-axis location for this section's details (text)
    label_y.append(height - (section_h / 2))

    height = height - (section_h + section_d)
        
label_trace = go.Scatter(
    x=[-200]*n_phase,
    y=label_y,
    mode='text',
    text=df_usa_sport['Sport'],
    textfont=dict(
        color='rgb(200,200,200)',
        size=15
    )
)
 
# For phase values
value_trace = go.Scatter(
    x=[-350]*n_phase,
    y=label_y,
    mode='text',
    text=df_usa_sport['Medal'],
    textfont=dict(
        color='rgb(200,200,200)',
        size=12
    )
)

data = [label_trace, value_trace]
 
layout = go.Layout(
    # title.text / title.font replace the deprecated titlefont property
    title=dict(
        text="<b>Top 10 Sports in which USA is best</b>",
        font=dict(size=12, color='rgb(203,203,203)'),
    ),
    shapes=shapes,
    height=600,
    width=800,
    showlegend=False,
    paper_bgcolor='rgba(44,58,71,1)',
    plot_bgcolor='rgba(44,58,71,1)',
    xaxis=dict(
        showticklabels=False,
        zeroline=False,
    ),
    yaxis=dict(
        showticklabels=False,
        zeroline=False
    )
)
 
fig = go.Figure(data=data, layout=layout)
iplot(fig)

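The funnel in step 7 is drawn as layout shapes whose SVG path strings trace trapezoids: each top edge spans the current section's width, each bottom edge spans the next section's width, and both are centered on x = 0. A pure-Python sketch of that geometry, factored into a hypothetical helper funnel_paths (not part of the original code) so it can be checked without plotting; the input values are made up:

```python
def funnel_paths(values, plot_width=200, section_h=100, section_d=10):
    """Build one closed SVG path per funnel section.

    Mirrors the loop in step 7: widths are scaled so the largest value
    spans plot_width, and each section's bottom edge uses the next
    section's width (the last section is a rectangle).
    """
    n = len(values)
    unit_width = plot_width / max(values)
    widths = [int(v * unit_width) for v in values]
    top = section_h * n + section_d * (n - 1)
    paths = []
    for i in range(n):
        bottom_w = widths[i] if i == n - 1 else widths[i + 1]
        paths.append('M {0} {1} L {2} {3} L -{2} {3} L -{0} {1} Z'.format(
            widths[i] / 2, top, bottom_w / 2, top - section_h))
        # Move down by one section plus the gap between sections
        top -= section_h + section_d
    return paths


# Hypothetical gold counts for three sports
for p in funnel_paths([100, 50, 20]):
    print(p)
```

Each string can be dropped into a {'type': 'path', 'path': ...} shape exactly as the loop in step 7 does.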



Source: https://blog.csdn.net/weixin_42363541/article/details/135629390