Assignment 2 Pandas Introduction


alisonxPandas



Article link: https://blog.csdn.net/alisonxPandas/article/details/79824725


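This post writes up the week 2 assignment of the Coursera specialization course I just finished, Introduction to Data Science in Python (University of Michigan). As a complete beginner, I found this assignment genuinely hard. Of the final eight questions I only got six right, and I only scraped through by leaning on an expert's answers. Even my correct answers made a lot of DataFrame copies, which certainly hurts efficiency. So I'm putting my answers side by side with the expert's for comparison.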

Part 1

The following code loads the olympics dataset (olympics.csv), which was derived from the Wikipedia entry on All Time Olympic Games Medals, and does some basic data cleaning.

The columns are organized as # of Summer games, Summer medals, # of Winter games, Winter medals, total # of games, and total # of medals. Use this dataset to answer the questions below.
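The totals columns can be checked against the per-season columns directly. This is a minimal sketch that assumes `# Games` is the sum of `# Summer` and `# Winter`, and that `Combined total` is the sum of `Total` (Summer medals) and `Total.1` (Winter medals). It uses the column names the cleaned DataFrame ends up with. The two rows are copied from the sample `df.head()` output shown later in this post, with only the count and total columns kept.

```python
import pandas as pd

# Two rows copied from the sample df.head() output in this post
# (Afghanistan and Algeria); only the count/total columns are kept.
sample = pd.DataFrame(
    {
        "# Summer": [13, 12],
        "Total": [2, 15],
        "# Winter": [0, 3],
        "Total.1": [0, 0],
        "# Games": [13, 15],
        "Combined total": [2, 15],
    },
    index=["Afghanistan", "Algeria"],
)

# Assumption: total games = Summer games + Winter games
games_ok = bool((sample["# Summer"] + sample["# Winter"] == sample["# Games"]).all())
# Assumption: combined medals = Summer total + Winter total
medals_ok = bool((sample["Total"] + sample["Total.1"] == sample["Combined total"]).all())
print(games_ok, medals_ok)  # True True
```

On the full dataset, the same two comparisons would flag any country whose totals do not add up.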

First, load the dataset:

import pandas as pd
import numpy as np

df = pd.read_csv('olympics.csv', index_col=0, skiprows=1)

# Rename the coded medal columns: '01' -> Gold, '02' -> Silver, '03' -> Bronze,
# keeping any '.1' / '.2' suffix (Winter and combined blocks); '№' -> '#'
for col in df.columns:
    if col[:2]=='01':
        df.rename(columns={col:'Gold'+col[4:]}, inplace=True)
    if col[:2]=='02':
        df.rename(columns={col:'Silver'+col[4:]}, inplace=True)
    if col[:2]=='03':
        df.rename(columns={col:'Bronze'+col[4:]}, inplace=True)
    if col[:1]=='№':
        df.rename(columns={col:'#'+col[1:]}, inplace=True)

names_ids = df.index.str.split(r'\s\(') # split each index label at the whitespace before '('

df.index = names_ids.str[0] # the [0] element is the country name (new index)
df['ID'] = names_ids.str[1].str[:3] # the [1] element is the abbreviation or ID (take first 3 characters from that)

# drop the summary row so it is not treated as a country
df = df.drop('Totals')
df.head()
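To see what the cleaning step does without olympics.csv, here is a small sketch on a hand-built frame. The raw column labels (`№ Summer`, `01 !`, and so on) and the `Country (ID)` index format are assumptions about the raw file. They were chosen so that the same slicing rules (`col[4:]`, `col[1:]`) apply. `raw` and `clean_olympics` are hypothetical names. This variant also differs from the code above in one way: it collects every rename into a single dict and applies it once, instead of renaming in place inside the loop.

```python
import pandas as pd

# Hypothetical raw frame: column labels mimic the assumed "01 !" / "№ Summer"
# pattern in olympics.csv, and the index keeps the "Country (ID)" form.
raw = pd.DataFrame(
    [[13, 0, 0, 2, 2, 0], [12, 5, 2, 8, 15, 3]],
    index=["Afghanistan (AFG)", "Algeria (ALG)"],
    columns=["№ Summer", "01 !", "02 !", "03 !", "Total", "№ Winter"],
)

def clean_olympics(df):
    """Apply the same renaming and index split as the loading code."""
    renames = {}
    for col in df.columns:
        if col[:2] == "01":
            renames[col] = "Gold" + col[4:]
        elif col[:2] == "02":
            renames[col] = "Silver" + col[4:]
        elif col[:2] == "03":
            renames[col] = "Bronze" + col[4:]
        elif col[:1] == "№":
            renames[col] = "#" + col[1:]
    df = df.rename(columns=renames)  # one rename call returning a new frame

    # "Afghanistan (AFG)" -> ["Afghanistan", "AFG)"]
    names_ids = df.index.str.split(r"\s\(")
    df.index = names_ids.str[0]
    df["ID"] = names_ids.str[1].str[:3]
    return df

cleaned = clean_olympics(raw)
print(list(cleaned.columns))
# ['# Summer', 'Gold', 'Silver', 'Bronze', 'Total', '# Winter', 'ID']
print(cleaned.loc["Algeria", "ID"])  # ALG
```

Because the prefixes `01`, `02`, `03` and `№` are mutually exclusive, the `elif` chain behaves the same as the separate `if`s in the original loop.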

The dataset looks roughly like this:

	# Summer	Gold	Silver	Bronze	Total	# Winter	Gold.1	Silver.1	Bronze.1	Total.1	# Games	Gold.2	Silver.2	Bronze.2	Combined total	ID
Afghanistan	13	0	0	2	2	0	0	0	0	0	13	0	0	2	2	AFG
Algeria	12	5	2	8	15	3	0	0	0	0	15	5	2	8	15	ALG
Argentina	23