Assignment 2 - Pandas Introduction: University of Michigan Python data science assignment

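This post documents the second assignment of the University of Michigan's Python data science course on Coursera, which focuses on using the Pandas library. It covers working with the olympics.csv dataset, for example finding the country that won the most Summer Olympic gold medals and the country with the largest difference between its Summer and Winter gold medal counts. It also works through US census data, such as finding the state with the most counties and the county whose population grew the most.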


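This assignment nearly made me pull all my hair out, and the reference answers online looked really complicated, so I decided to write up my own solutions. I won't claim my code is the most concise, but it is definitely the most beginner-friendly, since I'm a beginner myself.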

Assignment 2 - Pandas Introduction

All questions are weighted the same in this assignment.

Part 1

The following code loads the olympics dataset (olympics.csv), which was derived from the Wikipedia entry on All Time Olympic Games Medals, and does some basic data cleaning.

The columns are organized as: # of Summer games, Summer medals, # of Winter games, Winter medals, total # of games, total # of medals. Use this dataset to answer the questions below.
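Because the Summer, Winter, and combined sections reuse the same medal headers, pandas has to disambiguate the duplicates when it reads the file. The sketch below uses a tiny in-memory CSV (hypothetical, not the real olympics.csv; only Afghanistan's zero gold counts are taken from the output further down) to show that `read_csv` appends `.1`, `.2`, ... to repeated column names, which is most likely where the suffixes in names like `Gold.1` and `Gold.2` come from.

```python
import io

import pandas as pd

# Hypothetical CSV whose header repeats the same medal name three times
# (Summer, Winter, combined), mimicking the layout described above.
csv_text = "Country,Gold,Gold,Gold\nAfghanistan,0,0,0\n"

toy = pd.read_csv(io.StringIO(csv_text), index_col=0)

# pandas keeps the first name and suffixes later duplicates with .1, .2, ...
print(list(toy.columns))  # ['Gold', 'Gold.1', 'Gold.2']
```

This is also why the renaming loop below can keep everything after the medal code: the suffix survives the rename.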

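import pandas as pd

# skiprows=1 skips the first line of the file, so the second line becomes the header;
# index_col=0 uses the first column (country names with their IDs) as the index
df = pd.read_csv('olympics.csv', index_col=0, skiprows=1)

# Rename the medal columns: names starting with '01' -> Gold, '02' -> Silver, '03' -> Bronze,
# keeping whatever follows the first 4 characters (e.g. the '.1'/'.2' suffix);
# names starting with '№' (numero sign) get '#' instead
for col in df.columns:
    if col[:2]=='01':
        df.rename(columns={col:'Gold'+col[4:]}, inplace=True)
    if col[:2]=='02':
        df.rename(columns={col:'Silver'+col[4:]}, inplace=True)
    if col[:2]=='03':
        df.rename(columns={col:'Bronze'+col[4:]}, inplace=True)
    if col[:1]=='№':
        df.rename(columns={col:'#'+col[1:]}, inplace=True)

names_ids = df.index.str.split(r'\s\(') # split the index at the whitespace before '('

df.index = names_ids.str[0] # the [0] element is the country name (new index)
df['ID'] = names_ids.str[1].str[:3] # the [1] element is the abbreviation or ID (take first 3 characters from that)

df = df.drop('Totals') # drop the 'Totals' row
df.head()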
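The index-splitting step is easiest to see on a couple of strings. The sketch below assumes the raw index entries look like `Name (ABC)`, possibly followed by extra footnote text; the second entry is a hypothetical example of that format, not copied from the file. Taking only the first three characters after `(` is what drops the closing parenthesis and anything after it.

```python
import pandas as pd

# Hypothetical raw index values in the "Country (ID)" format the cleaning code expects;
# the second one carries extra footnote text after the code.
raw = pd.Index(['Afghanistan (AFG)', 'Australia (AUS) [AUS] [Z]'])

# A pattern longer than one character is treated as a regular expression,
# so this splits at "whitespace followed by '('".
parts = raw.str.split(r'\s\(')

names = parts.str[0]        # text before " (" -> country name
ids = parts.str[1].str[:3]  # first 3 chars after "(" -> 3-letter ID

print(list(names))  # ['Afghanistan', 'Australia']
print(list(ids))    # ['AFG', 'AUS']
```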

 

             # Summer  Gold  Silver  Bronze  Total  # Winter  Gold.1  Silver.1  Bronze.1  Total.1  # Games  Gold.2  Silver.2  Bronze.2  Combined total   ID
Afghanistan        13     0       0       2      2         0       0         0         0        0       13       0         0         2               2  AFG
Algeria            12     5       2       8     15         3       0         0
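Once the columns are cleaned, questions like the ones summarized at the top (the country with the most Summer gold medals, and the country with the biggest difference between its Summer and Winter gold counts) reduce to a column expression plus `idxmax`. The sketch below is only an illustration: it rebuilds a two-row frame from the Summer (`Gold`) and Winter (`Gold.1`) values visible in the `df.head()` output above, so its answers say nothing about the full dataset. The helper names are hypothetical, and using the absolute difference is one possible reading of "biggest difference".

```python
import pandas as pd

# Two rows rebuilt from the df.head() output: Summer gold is 'Gold', Winter gold is 'Gold.1'.
df = pd.DataFrame(
    {'Gold': [0, 5], 'Gold.1': [0, 0]},
    index=['Afghanistan', 'Algeria'],
)


def most_summer_gold(frame):
    """Hypothetical helper: index label of the row with the largest 'Gold' value."""
    return frame['Gold'].idxmax()


def biggest_gold_gap(frame):
    """Hypothetical helper: row with the largest absolute Summer-minus-Winter gold gap."""
    return (frame['Gold'] - frame['Gold.1']).abs().idxmax()


print(most_summer_gold(df))  # Algeria
print(biggest_gold_gap(df))  # Algeria
```

On the real `df`, the same expressions run over every country, because `idxmax` returns the index label (the country name set during cleaning) rather than a row position.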