Assignment 2 - Pandas Introduction: University of Michigan Python Data Science Assignment


晴子DAYTOY


Article link: https://blog.csdn.net/haruko666/article/details/92117361


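This blog post documents the second assignment of the University of Michigan's Python data science course on Coursera, which focuses on using the Pandas library. It covers working with the olympics.csv dataset, including finding the country with the most Summer Olympics gold medals and the country with the largest difference between its Summer and Winter gold medal counts. It also explores US census data, such as identifying the state with the most counties and the county with the largest population growth.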


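This assignment nearly made me lose all my hair, and the reference answers I found online looked far too complicated, so I decided to write up my own. I won't claim my code is the most concise, but it should be very beginner-friendly. After all, I'm a beginner myself.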

Assignment 2 - Pandas Introduction

All questions are weighted the same in this assignment.

Part 1

The following code loads the olympics dataset (olympics.csv), which was derived from the Wikipedia entry on All Time Olympic Games Medals, and does some basic data cleaning.

The columns are organized as # of Summer games, Summer medals, # of Winter games, Winter medals, total # of games, total # of medals. Use this dataset to answer the questions below.

import pandas as pd

df = pd.read_csv('olympics.csv', index_col=0, skiprows=1)

for col in df.columns:
    if col[:2]=='01':
        df.rename(columns={col:'Gold'+col[4:]}, inplace=True)
    if col[:2]=='02':
        df.rename(columns={col:'Silver'+col[4:]}, inplace=True)
    if col[:2]=='03':
        df.rename(columns={col:'Bronze'+col[4:]}, inplace=True)
    if col[:1]=='№':
        df.rename(columns={col:'#'+col[1:]}, inplace=True)

names_ids = df.index.str.split(r'\s\(') # split the index on whitespace followed by '(' (raw string avoids an invalid-escape warning)

df.index = names_ids.str[0] # the [0] element is the country name (new index) 
df['ID'] = names_ids.str[1].str[:3] # the [1] element is the abbreviation or ID (take first 3 characters from that)

df = df.drop('Totals')
df.head()
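
To see what the cleaning steps above do without needing olympics.csv, here is a minimal sketch on a tiny in-memory DataFrame. The country names, codes, and medal counts are made up for illustration, and the helper `clean_column` is a hypothetical name that is not part of the assignment. It applies the same renaming rules through a single `rename` call instead of the loop, which is just one possible way to write it.

```python
import pandas as pd

# Hypothetical rows that imitate the raw column and index layout.
# All names and numbers below are invented for illustration.
toy = pd.DataFrame(
    {"№ Summer": [13, 12], "01 !": [1, 2], "02 !": [3, 4], "01 !.1": [5, 6]},
    index=["Examplia (EXA)", "Samplestan (SAM) [note]"],
)

def clean_column(col):
    """Apply the same renaming rules as the loop in the assignment code."""
    prefixes = {"01": "Gold", "02": "Silver", "03": "Bronze"}
    if col[:2] in prefixes:
        # Drop the first four characters (e.g. '01 !') and keep any suffix like '.1'.
        return prefixes[col[:2]] + col[4:]
    if col.startswith("№"):
        return "#" + col[1:]
    return col

toy = toy.rename(columns=clean_column)

# Split each index label on whitespace followed by '('.
parts = toy.index.str.split(r"\s\(")
toy.index = parts.str[0]          # country name becomes the new index
toy["ID"] = parts.str[1].str[:3]  # first three characters after '(' are the code

print(toy)
```

Slicing the second piece to three characters is what discards the closing parenthesis and any trailing footnote text, which is why `"Samplestan (SAM) [note]"` still yields the code `SAM`.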