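Analysis of the Olympic Games Dataset (Partial)

Data Science Application Case Study: Practice Report

Group members: XXX

Main methods: pandas for data processing, Pyecharts for plotting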

Abstract: This report analyzes data from the 2020 Summer Olympic Games using the Python libraries pandas and pyecharts. It covers data cleaning, data mining, and data visualization. The daily gold medal counts and related Olympic data are organized, and predictions are made from the data. The predictions and the changes over time are presented as charts, which makes them easier to understand.

Keywords: Olympic Games, Python, pandas, pyecharts

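I. Background:

After the 2020 Olympic Games ended, we analyzed the data from the Games. Visualizing the data shows the gold medal table and how it changed over the course of the Games, which gives a fuller picture of the event.

II. Data analysis workflow

1. Import modules

If any library is missing, run pip install -r requirements.txt to install it.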
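
Before installing, it can help to see which packages are actually missing. The sketch below is an illustration and not part of the original notebook: the helper name missing_packages is hypothetical, and it relies only on the standard-library function importlib.util.find_spec.

```python
from importlib.util import find_spec

# Packages imported by this report (pandas and pyecharts).
REQUIRED = ["pandas", "pyecharts"]


def missing_packages(names):
    """Return the names that cannot be imported in the current environment."""
    return [name for name in names if find_spec(name) is None]


missing = missing_packages(REQUIRED)
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All required packages are available.")
```

Only the packages reported as missing then need to be installed with pip.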

!pip install --upgrade pyecharts

Defaulting to user installation because normal site-packages is not writeable
Requirement already satisfied: pyecharts in c:\users\merlin_wong\appdata\roaming\python\python39\site-packages (1.9.1)
Requirement already satisfied: jinja2 in c:\users\merlin_wong\appdata\roaming\python\python39\site-packages (from pyecharts) (3.0.3)
Requirement already satisfied: prettytable in c:\users\merlin_wong\appdata\roaming\python\python39\site-packages (from pyecharts) (2.4.0)
Requirement already satisfied: simplejson in c:\users\merlin_wong\appdata\roaming\python\python39\site-packages (from pyecharts) (3.17.6)
Requirement already satisfied: MarkupSafe>=2.0 in c:\users\merlin_wong\appdata\roaming\python\python39\site-packages (from jinja2->pyecharts) (2.0.1)
Requirement already satisfied: wcwidth in c:\users\merlin_wong\appdata\roaming\python\python39\site-packages (from prettytable->pyecharts) (0.2.5)
import pandas as pd
from pyecharts.charts import Timeline, Line, Tree
from pyecharts import options as opts
from pyecharts.commons.utils import JsCode

2. Data processing with pandas

2.1 Read the data

df = pd.read_csv('../others/2020东京奥运会奖牌数据.csv', index_col=0, encoding = 'gb18030')
df.head(20)
日期        国家      国家编码  金牌  银牌  铜牌  总计  国旗
2021-07-24  中国      CHN       3     0     1     4     https://www.sinaimg.cn/ty/2020/Olympic/flag/CH...
2021-07-24  意大利    ITA       1     1     0     2     https://www.sinaimg.cn/ty/2020/Olympic/flag/IT...
2021-07-24  日本      JPN       1     1     0     2     https://www.sinaimg.cn/ty/2020/Olympic/flag/JP...
2021-07-24  韩国      KOR       1     0     2     3     https://www.sinaimg.cn/ty/2020/Olympic/flag/KO...
2021-07-24  厄瓜多尔  ECU       1     0     0     1     https://www.sinaimg.cn/ty/2020/Olympic/flag/EC...
2021-07-24  匈牙利    HUN       1     0     0     1     https://www.sinaimg.cn/ty/2020/Olympic/flag/HU...
2021-07-24  伊朗      IRI       1     0     0     1     https://www.sinaimg.cn/ty/2020/Olympic/flag/IR...
2021-07-24  科索沃    KOS       1     0     0     1     https://www.sinaimg.cn/ty/2020/Olympic/flag/KO...
2021-07-24  泰国      THA       1     0     0     1     https://www.sinaimg.cn/ty/2020/Olympic/flag/TH...
2021-07-24  ROC       ROC       0     1     1     2     https://www.sinaimg.cn/ty/2020/Olympic/flag/RO...
2021-07-24  塞尔维亚  SRB       0     1     1     2     https://www.sinaimg.cn/ty/2020/Olympic/flag/SR...
2021-07-24  比利时    BEL       0     1     0     1     https://www.sinaimg.cn/ty/2020/Olympic/flag/BE...
2021-07-24  西班牙    ESP       0     1     0     1     https://www.sinaimg.cn/ty/2020/Olympic/flag/ES...
2021-07-24  印度      IND       0     1     0     1     https://www.sinaimg.cn/ty/2020/Olympic/flag/IN...
2021-07-24  荷兰      NED       0     1     0     1     https://www.sinaimg.cn/ty/2020/Olympic/flag/NE...
2021-07-24  罗马尼亚  ROU       0     1     0     1     https://www.sinaimg.cn/ty/2020/Olympic/flag/RO...
2021-07-24  中国台北  TPE       0     1     0     1     https://www.sinaimg.cn/ty/2020/Olympic/flag/TP...
2021-07-24  突尼斯    TUN       0     1     0     1     https://www.sinaimg.cn/ty/2020/Olympic/flag/TU...
2021-07-24  爱沙尼亚  EST       0     0     1     1     https://www.sinaimg.cn/ty/2020/Olympic/flag/ES...
2021-07-24  法国      FRA       0     0     1     1     https://www.sinaimg.cn/ty/2020/Olympic/flag/FR...

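This loads the data from the CSV file into the project. The index 日期 is the date, and the columns are 国家 (country), 国家编码 (country code), 金牌 (gold), 银牌 (silver), 铜牌 (bronze), 总计 (total), and 国旗 (flag image URL).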
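
Because read_csv is called without date parsing, the 日期 index holds the dates as plain strings. As a hedged sketch of an alternative, the dates can be parsed while loading by passing parse_dates=True. The sample is an assumption for illustration: an in-memory CSV with three rows copied from the table above, with the 国旗 column omitted because its URLs are truncated in the output.

```python
import io

import pandas as pd

# Three rows copied from the head(20) output; the flag column is left out.
csv_text = """日期,国家,国家编码,金牌,银牌,铜牌,总计
2021-07-24,中国,CHN,3,0,1,4
2021-07-24,意大利,ITA,1,1,0,2
2021-07-24,日本,JPN,1,1,0,2
"""

# index_col=0 uses the date column as the index; parse_dates=True parses it.
sample = pd.read_csv(io.StringIO(csv_text), index_col=0, parse_dates=True)

print(sample.index.dtype)      # a datetime dtype instead of object
print(sample["金牌"].sum())    # total gold medals in the sample rows
```

A parsed index makes date arithmetic and resampling possible, but the rest of this report uses the string labels directly, so the original call is kept as it is.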

2.2 Check for missing values

df.isnull().any()
国家      False
国家编码    False
金牌      False
银牌      False
铜牌      False
总计      False
国旗      False
dtype: bool

No column contains missing values.
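
isnull().any() only reports whether a column has at least one missing value. As a small illustration on made-up data (the frame below is hypothetical and deliberately contains one gap), isnull().sum() gives the count per column instead:

```python
import pandas as pd

# Hypothetical frame with one missing silver-medal count.
demo = pd.DataFrame({"金牌": [3, 1, 1], "银牌": [0, None, 1]})

print(demo.isnull().any())   # True only for the column that has a gap
print(demo.isnull().sum())   # number of missing values in each column
```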

2.3 View China's daily data

df1 = df[df['国家']=='中国']
df1
日期        国家  国家编码  金牌  银牌  铜牌  总计  国旗
2021-07-24  中国  CHN       3     0     1     4     https://www.sinaimg.cn/ty/2020/Olympic/flag/CH...
2021-07-25  中国  CHN       3     1     3     7     https://www.sinaimg.cn/ty/2020/Olympic/flag/CH...
2021-07-26  中国  CHN       0     4     3     7     https://www.sinaimg.cn/ty/2020/Olympic/flag/CH...
2021-07-27  中国  CHN       3     0     0     3     https://www.sinaimg.cn/ty/2020/Olympic/flag/CH...
2021-07-28  中国  CHN       3     1     2     6     https://www.sinaimg.cn/ty/2020/Olympic/flag/CH...
2021-07-29  中国  CHN       3     1     0     4     https://www.sinaimg.cn/ty/2020/Olympic/flag/CH...
2021-07-30  中国  CHN       4     3     2     9     https://www.sinaimg.cn/ty/2020/Olympic/flag/CH...
2021-07-31  中国  CHN       2     3     0     5     https://www.sinaimg.cn/ty/2020/Olympic/flag/CH...
2021-08-01  中国  CHN       3     1     1     5     https://www.sinaimg.cn/ty/2020/Olympic/flag/CH...
2021-08-02  中国  CHN       5     3     3     11    https://www.sinaimg.cn/ty/2020/Olympic/flag/CH...
2021-08-03  中国  CHN       3     4     0     7     https://www.sinaimg.cn/ty/2020/Olympic/flag/CH...
2021-08-04  中国  CHN       0     1     0     1     https://www.sinaimg.cn/ty/2020/Olympic/flag/CH...
2021-08-05  中国  CHN       2     2     0     4     https://www.sinaimg.cn/ty/2020/Olympic/flag/CH...
2021-08-06  中国  CHN       2     2     1     5     https://www.sinaimg.cn/ty/2020/Olympic/flag/CH...
2021-08-07  中国  CHN       1     1     0     2     https://www.sinaimg.cn/ty/2020/Olympic/flag/CH...
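
The table above holds per-day counts, while the charts in this report plot running totals. A minimal sketch of that conversion with Series.cumsum, using the 金牌 (gold) values copied from the table above; the variable names are illustrative:

```python
import pandas as pd

# The 15 competition days shown in the table, as string labels.
dates = pd.date_range("2021-07-24", "2021-08-07").strftime("%Y-%m-%d")

# China's gold medals per day, copied from the table above.
daily_gold = pd.Series(
    [3, 3, 0, 3, 3, 3, 4, 2, 3, 5, 3, 0, 2, 2, 1],
    index=dates,
    name="金牌",
)

# Running total: each entry is the sum of all days up to and including it.
running_total = daily_gold.cumsum()
print(running_total.tolist())
# [3, 6, 6, 9, 12, 15, 19, 21, 24, 29, 32, 32, 34, 36, 37]
```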

2.4 Compile data for four countries: China, the United States, Japan, and Australia

all_country_data = []   # one row per country: [name, cumulative gold per day, ...]
flg = {}                # country name -> flag image URL
cols = ['国家']
countrys = ['中国','美国','日本','澳大利亚']
for country in countrys:
    df1 = df[df['国家']==country]
    # Use the dates of the country with the most rows as column labels
    # (China here, with all 15 days).
    if len(df1.index) > len(cols) - 1:
        cols = ['国家'] + df1.index.tolist()
    flg[country] = df1['国旗'].iloc[0]
    # Running total of gold medals over the days on which this country has a row.
    # Days without a row are skipped, not zero-filled.
    one_country_data = [country] + df1['金牌'].cumsum().tolist()
    all_country_data.append(one_country_data)
all_country_data
[['中国', 3, 6, 6, 9, 12, 15, 19, 21, 24, 29, 32, 32, 34, 36, 37],
 ['美国', 4, 7, 9, 11, 14, 14, 16, 20, 22, 24, 25, 29, 31, 31],
 ['日本', 1, 5, 8, 10, 13, 15, 17, 18, 18, 18, 19, 21, 22, 24],
 ['澳大利亚', 1, 2, 3, 6, 8, 9, 10, 14, 14, 15, 17, 17]]

Update the DataFrame

d2 = pd.DataFrame(data=all_country_data,columns=cols)
# Shorter rows are padded with NaN on the right; carry the last total forward.
d2 = d2.ffill(axis=1)
d2
   国家      2021-07-24  2021-07-25  2021-07-26  2021-07-27  2021-07-28  2021-07-29  2021-07-30  2021-07-31  2021-08-01  2021-08-02  2021-08-03  2021-08-04  2021-08-05  2021-08-06  2021-08-07
0  中国      3           6           6           9           12          15          19          21          24          29          32          32          34.0        36.0        37.0
1  美国      4           7           9           11          14          14          16          20          22          24          25          29          31.0        31.0        31.0
2  日本      1           5           8           10          13          15          17          18          18          18          19          21          22.0        24.0        24.0
3  澳大利亚  1           2           3           6           8           9           10          14          14          15          17          17          17          17          17
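
Two pandas behaviors produce the table above: rows shorter than the column list are padded on the right with NaN, and a forward fill along axis=1 then copies each row's last known value into the padding. A toy sketch with made-up numbers (the names and values are hypothetical):

```python
import pandas as pd

rows = [
    ["A", 1, 2, 3],   # three days of totals
    ["B", 4, 5],      # only two days: the third cell becomes NaN
]
frame = pd.DataFrame(rows, columns=["name", "day1", "day2", "day3"])

# Fill each gap from the cell to its left, row by row.
filled = frame.ffill(axis=1)

print(frame)
print(filled)
```

The padding is also why some columns in the output above print as floats such as 34.0: a numeric column that contained NaN is stored as floating point.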

To analyze other countries, change the countrys list as needed.
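
One caveat when changing countrys: the loop above places each country's totals in the columns from left to right, so a country that has no row on some date is shifted relative to the date labels. As a hedged alternative (not the method used in this report), the data can be aligned by date with pivot before summing. The sketch uses made-up numbers for two hypothetical countries, A and B, where B has no row on the first day:

```python
import pandas as pd

# Toy data (hypothetical): country B has no row on 2021-07-24.
toy = pd.DataFrame(
    {
        "日期": ["2021-07-24", "2021-07-25", "2021-07-26", "2021-07-25", "2021-07-26"],
        "国家": ["A", "A", "A", "B", "B"],
        "金牌": [3, 3, 0, 4, 3],
    }
)

# One row per country, one column per date; missing days become NaN.
daily = toy.pivot(index="国家", columns="日期", values="金牌")

# Treat a missing day as zero medals, then accumulate along the dates.
cumulative = daily.fillna(0).astype(int).cumsum(axis=1)
print(cumulative)
```

Here B's first column is 0 rather than 4, so every value sits under the date it belongs to. With the real data, df.reset_index() would first turn the 日期 index into a regular column.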

3. Plotting with Pyecharts

3.1 Draw a basic line chart

CHN = []
x_data=cols[1:]
for d_time in cols[1:]:
    CHN.append(d2[d_time][d2['国家']=='中国'].values.tolist()[0])
l1 = (
    Line()
    .add_xaxis(x_data)
    # China series
    .add_yaxis(
        '中国',
        CHN,
        label_opts=opts.LabelOpts(is_show=True))
    .set_global_opts(
        title_opts=opts.TitleOpts(
            title='中国金牌',
            pos_left='center',
        ),
        yaxis_opts=opts.AxisOpts(
            name='金牌/枚',            
            is_scale=True,
            max_=40),
        legend_opts=opts.LegendOpts(is_show=False),
    ))
l1.render_notebook()  
[Chart output: basic line chart titled '中国金牌' (China gold medals), 900px wide, 500px high]
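
The loop that fills CHN looks up one cell per date. A shorter equivalent, sketched here on a cut-down copy of d2 (first three dates only, values copied from the table in section 2.4), is to index the frame by country and take the whole row at once:

```python
import pandas as pd

# Cut-down copy of d2: two countries, three dates.
wide = pd.DataFrame(
    [["中国", 3, 6, 6], ["美国", 4, 7, 9]],
    columns=["国家", "2021-07-24", "2021-07-25", "2021-07-26"],
)

by_country = wide.set_index("国家")
x_data = by_country.columns.tolist()      # date labels for the x-axis
chn = by_country.loc["中国"].tolist()     # China's totals for the y-axis

print(x_data)
print(chn)
```

The two lists can be passed to add_xaxis and add_yaxis in the same way as x_data and CHN above.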

3.2 Apply styles

# Background color
background_color_js = (
    "new echarts.graphic.LinearGradient(0, 0, 0, 1, "
    "[{offset: 0, color: '#d9d9d9'}, {offset: 1, color: '#ffd966'}], false)"
)

# Line style
linestyle_dic = { 'normal': {
                    'width': 4,  
                    'shadowColor': '#696969', 
                    'shadowBlur': 10,  
                    'shadowOffsetY': 10,  
                    'shadowOffsetX': 10,  
                    }
                }
    
timeline = Timeline(init_opts=opts.InitOpts(bg_color=JsCode(background_color_js),
                                            width='980px',height='600px'))
timeline.add_schema(is_auto_play=True, is_loop_play=True, 
                    is_timeline_show=True, play_interval=500)

CHN = []
x_data=cols[1:]
for d_time in cols[1:]:
    CHN.append(d2[d_time][d2['国家']=='中国'].values.tolist()[0])
line = (
    Line(init_opts=opts.InitOpts(bg_color=JsCode(background_color_js),
                                 width='980px',height='600px'))
    .add_xaxis(x_data)
    # China series
    .add_yaxis(
        '中国',
        CHN,
        symbol_size=10,
        is_smooth=True,
        label_opts=opts.LabelOpts(is_show=True),
        markpoint_opts=opts.MarkPointOpts(
                data=[  opts.MarkPointItem(
                        name="",
                        type_='max',
                        value_index=0,
                        symbol='image://'+ flg['中国'],
                        symbol_size=[40, 25],
                    )],
                label_opts=opts.LabelOpts(is_show=False),
            )
    )
    .set_series_opts(linestyle_opts=linestyle_dic,label_opts=opts.LabelOpts(font_size=12, color='red' ))
    .set_global_opts(
        title_opts=opts.TitleOpts(
            title='中国金牌',
            pos_left='center',
            pos_top='2%',
            title_textstyle_opts=opts.TextStyleOpts(
                    color='#DC143C', font_size=20)
        ),
        xaxis_opts=opts.AxisOpts(axislabel_opts=opts.LabelOpts(font_size=14, color='red'),
                                 axisline_opts=opts.AxisLineOpts(is_show=True,
                                    linestyle_opts=opts.LineStyleOpts(width=2, color='#DB7093'))),
        yaxis_opts=opts.AxisOpts(
            name='金牌/枚',            
            is_scale=True,
            max_=40,
            name_textstyle_opts=opts.TextStyleOpts(font_size=16,font_weight='bold',color='#FFD700'),
            axislabel_opts=opts.LabelOpts(font_size=13,color='red'),
            splitline_opts=opts.SplitLineOpts(is_show=True, 
                                              linestyle_opts=opts.LineStyleOpts(type_='dashed')),
            axisline_opts=opts.AxisLineOpts(is_show=True,
                                    linestyle_opts=opts.LineStyleOpts(width=2, color='#DB7093'))
        ),
        legend_opts=opts.LegendOpts(is_show=False, pos_right='1.5%', pos_top='2%',
                                    legend_icon='roundRect',orient = 'horizontal'),
    ))
line.render_notebook()
[Chart output: styled line chart titled '中国金牌' (China gold medals), 980px wide, 600px high]

3.3 Animate China's daily gold medal data

# Background color
background_color_js = (
    "new echarts.graphic.LinearGradient(0, 0, 0, 1, "
    "[{offset: 0, color: '#d9d9d9'}, {offset: 1, color: '#ffd966'}], false)"
)

# Line style
linestyle_dic = { 'normal': {
                    'width': 4,  
                    'shadowColor': '#696969', 
                    'shadowBlur': 10,  
                    'shadowOffsetY': 10,  
                    'shadowOffsetX': 10,  
                    }
                }
    
timeline = Timeline(init_opts=opts.InitOpts(bg_color=JsCode(background_color_js),
                                            width='980px',height='600px'))
timeline.add_schema(is_auto_play=True, is_loop_play=True, 
                    is_timeline_show=True, play_interval=500)

CHN = []
x_data=cols[1:]
for d_time in cols[1:]:
    CHN.append(d2[d_time][d2['国家']=='中国'].values.tolist()[0])
    line = (
        Line(init_opts=opts.InitOpts(bg_color=JsCode(background_color_js),
                                     width='980px',height='600px'))
        .add_xaxis(x_data)
        # China series
        .add_yaxis(
            '中国',
            CHN,
            symbol_size=10,
            is_smooth=True,
            label_opts=opts.LabelOpts(is_show=True),
            markpoint_opts=opts.MarkPointOpts(
                    data=[  opts.MarkPointItem(
                            name="",
                            type_='max',
                            value_index=0,
                            symbol='image://'+ flg['中国'],
                            symbol_size=[40, 25],
                        )],
                    label_opts=opts.LabelOpts(is_show=False),
                )
        )
        .set_series_opts(linestyle_opts=linestyle_dic,label_opts=opts.LabelOpts(font_size=12, color='red' ))
        .set_global_opts(
            title_opts=opts.TitleOpts(
                title='中国金牌',
                pos_left='center',
                pos_top='2%',
                title_textstyle_opts=opts.TextStyleOpts(color='#DC143C', font_size=20)),
            xaxis_opts=opts.AxisOpts(axislabel_opts=opts.LabelOpts(font_size=14, color='red'),
                         axisline_opts=opts.AxisLineOpts(is_show=True,
                            linestyle_opts=opts.LineStyleOpts(width=2, color='#DB7093'))),
            yaxis_opts=opts.AxisOpts(
                name='金牌/枚',            
                is_scale=True,
                max_=40,
                name_textstyle_opts=opts.TextStyleOpts(font_size=16,font_weight='bold',color='#FFD700'),
                axislabel_opts=opts.LabelOpts(font_size=13,color='red',rotate=15),
                splitline_opts=opts.SplitLineOpts(is_show=True, 
                                                  linestyle_opts=opts.LineStyleOpts(type_='dashed')),
                axisline_opts=opts.AxisLineOpts(is_show=True,
                                        linestyle_opts=opts.LineStyleOpts(width=2, color='#DB7093'))
            ),
            legend_opts=opts.LegendOpts(is_show=True, pos_right='1%', pos_top='2%',
                                        legend_icon='roundRect',orient = 'vertical'),
        ))
    timeline.add(line, '{}'.format(d_time))

timeline.render_notebook()
[Chart output: animated timeline of the line chart '中国金牌' (China gold medals), one frame per date, 980px wide, 600px high]
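
The animation works because CHN grows by one value on each pass of the loop while the x-axis always holds every date, so each Timeline frame draws a slightly longer line. The pure-Python sketch below shows only that data flow, without pyecharts; frame_data is a hypothetical helper, and the values are the first four China totals from section 2.4:

```python
def frame_data(dates, totals):
    """Yield (label, y_values) per frame: the totals up to and including that date."""
    for i, date in enumerate(dates):
        yield date, totals[: i + 1]


dates = ["2021-07-24", "2021-07-25", "2021-07-26", "2021-07-27"]
totals = [3, 6, 6, 9]

for label, y_values in frame_data(dates, totals):
    print(label, y_values)
```

Each pair corresponds to one timeline.add(line, label) call in the loop above: the label names the frame and y_values is the series drawn in it.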

3.4 Add daily gold medal data for other countries

# Background color
background_color_js = (
    "new echarts.graphic.LinearGradient(0, 0, 0, 1, "
    "[{offset: 0, color: '#d9d9d9'}, {offset: 1, color: '#ffd966'}], false)"
)

# Line style
linestyle_dic = { 'normal': {
                    'width': 4,  
                    'shadowColor': '#696969', 
                    'shadowBlur': 10,  
                    'shadowOffsetY': 10,  
                    'shadowOffsetX': 10,  
                    }
                }
    
timeline = Timeline(init_opts=opts.InitOpts(bg_color=JsCode(background_color_js),
                                            width='980px',height='600px'))
timeline.add_schema(is_auto_play=True, is_loop_play=True, 
                    is_timeline_show=True, play_interval=500)

CHN, USA, JPN, AUS = [], [], [], []
x_data=cols[1:]
for d_time in cols[1:]:
    CHN.append(d2[d_time][d2['国家']=='中国'].values.tolist()[0])
    USA.append(d2[d_time][d2['国家']=='美国'].values.tolist()[0])
    JPN.append(d2[d_time][d2['国家']=='日本'].values.tolist()[0])
    AUS.append(d2[d_time][d2['国家']=='澳大利亚'].values.tolist()[0])
    line = (
        Line(init_opts=opts.InitOpts(bg_color=JsCode(background_color_js),
                                     width='980px',height='600px'))
        .add_xaxis(x_data)
        # China series
        .add_yaxis(
            '中国',
            CHN,
            symbol_size=10,
            is_smooth=True,
            label_opts=opts.LabelOpts(is_show=True),
            markpoint_opts=opts.MarkPointOpts(
                    data=[  opts.MarkPointItem(
                            name="",
                            type_='max',
                            value_index=0,
                            symbol='image://'+ flg['中国'],
                            symbol_size=[40, 25],
                        )],
                    label_opts=opts.LabelOpts(is_show=False),
                )
        )
        # United States series
        .add_yaxis(
            '美国',
            USA,
            symbol_size=5,
            is_smooth=True,
            label_opts=opts.LabelOpts(is_show=True),
            markpoint_opts=opts.MarkPointOpts(
                    data=[
                        opts.MarkPointItem(
                            name="",
                            type_='max',
                            value_index=0,
                            symbol='image://'+ flg['美国'],
                            symbol_size=[40, 25],
                        )
                    ],
                    label_opts=opts.LabelOpts(is_show=False),
                )
        )
        # Japan series
        .add_yaxis(
            '日本',
            JPN,
            symbol_size=5,
            is_smooth=True,
            label_opts=opts.LabelOpts(is_show=True),
            markpoint_opts=opts.MarkPointOpts(
                    data=[  opts.MarkPointItem(
                            name="",
                            type_='max',
                            value_index=0,
                            symbol='image://'+ flg['日本'],
                            symbol_size=[40, 25],
                        )],
                    label_opts=opts.LabelOpts(is_show=False),
                )
        )
        # Australia series
        .add_yaxis(
            '澳大利亚',
            AUS,
            symbol_size=5,
            is_smooth=True,
            label_opts=opts.LabelOpts(is_show=True),
            markpoint_opts=opts.MarkPointOpts(
                    data=[  opts.MarkPointItem(
                            name="",
                            type_='max',
                            value_index=0,
                            symbol='image://'+ flg['澳大利亚'],
                            symbol_size=[40, 25],
                        )],
                    label_opts=opts.LabelOpts(is_show=False),
                )
        )
        .set_series_opts(linestyle_opts=linestyle_dic)
        .set_global_opts(
            title_opts=opts.TitleOpts(
                title='中国 VS 美国 VS 日本 VS 澳大利亚',
                pos_left='center',
                pos_top='2%',
                title_textstyle_opts=opts.TextStyleOpts(
                        color='#DC143C', font_size=20)
            ),
            xaxis_opts=opts.AxisOpts(axislabel_opts=opts.LabelOpts(font_size=14, color='red'),
                         axisline_opts=opts.AxisLineOpts(is_show=True,
                            linestyle_opts=opts.LineStyleOpts(width=2, color='#DB7093'))),
            yaxis_opts=opts.AxisOpts(
                name='金牌/枚',            
                is_scale=True,
                max_=40,
                name_textstyle_opts=opts.TextStyleOpts(font_size=16,font_weight='bold',color='#FFD700'),
                axislabel_opts=opts.LabelOpts(font_size=13,color='red',rotate=15),
                splitline_opts=opts.SplitLineOpts(is_show=True, 
                                                  linestyle_opts=opts.LineStyleOpts(type_='dashed')),
                axisline_opts=opts.AxisLineOpts(is_show=True,
                                        linestyle_opts=opts.LineStyleOpts(width=2, color='#DB7093'))
            ),
            legend_opts=opts.LegendOpts(is_show=True, pos_right='1%', pos_top='2%',
                                        legend_icon='roundRect',orient = 'vertical'),
        ))
    timeline.add(line, '{}'.format(d_time))
timeline.render_notebook()
[Chart output: animated timeline chart titled '中国 VS 美国 VS 日本 VS 澳大利亚' (China vs United States vs Japan vs Australia), 980px wide, 600px high]
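
The y-axis limit max_=40 is hard-coded to fit the four countries shown. If the countrys list changes, one option is to derive the limit from the data. A minimal sketch, assuming the limit should be the largest value rounded up to the next multiple of 10; axis_upper_bound is a hypothetical helper, and the sample values are the last three totals of each country from section 2.4:

```python
import math


def axis_upper_bound(series_list, step=10):
    """Round the largest value across all series up to a multiple of step."""
    largest = max(max(values) for values in series_list)
    return math.ceil(largest / step) * step


# Last three cumulative totals for China, the United States, Japan, Australia.
final_totals = [[34, 36, 37], [31, 31, 31], [22, 24, 24], [17, 17, 17]]
print(axis_upper_bound(final_totals))   # 40
```

The result could then be passed as max_ in opts.AxisOpts in place of the literal 40.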