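Python Open Course, Day 3

Finding the country with the most Nobel laureates

1.1: Load the data and take a first look
1.2: Count laureates per country
1.3: Explore when the top country rose to dominance

1.1: Load the data and take a first look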

import pandas as pd

import numpy as np

read_csv(path/filename)

nobel = pd.read_csv('nobel.csv')

Inspect the first few rows with dataset.head()

nobel.head()  # first five rows

year 	category 	prize 	motivation 	prize_share 	laureate_id 	laureate_type 	full_name 	birth_date 	birth_city 	birth_country 	sex 	organization_name 	organization_city 	organization_country 	death_date 	death_city 	death_country

0 1901 Chemistry The Nobel Prize in Chemistry 1901 "in recognition of the extraordinary services … 1/1 160 Individual Jacobus Henricus van 't Hoff 1852-08-30 Rotterdam Netherlands Male Berlin University Berlin Germany 1911-03-01 Berlin Germany
1 1901 Literature The Nobel Prize in Literature 1901 "in special recognition of his poetic composit… 1/1 569 Individual Sully Prudhomme 1839-03-16 Paris France Male NaN NaN NaN 1907-09-07 Châtenay France
2 1901 Medicine The Nobel Prize in Physiology or Medicine 1901 "for his work on serum therapy, especially its… 1/1 293 Individual Emil Adolf von Behring 1854-03-15 Hansdorf (Lawice) Prussia (Poland) Male Marburg University Marburg Germany 1917-03-31 Marburg Germany
3 1901 Peace The Nobel Peace Prize 1901 NaN 1/2 462 Individual Jean Henry Dunant 1828-05-08 Geneva Switzerland Male NaN NaN NaN 1910-10-30 Heiden Switzerland
4 1901 Peace The Nobel Peace Prize 1901 NaN 1/2 463 Individual Frédéric Passy 1822-05-20 Paris France Male NaN NaN NaN 1912-06-12 Paris France
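
For readers without nobel.csv at hand, the load-and-inspect step can be tried on a small in-memory CSV. This is only an illustrative sketch: the three rows are taken from the output above and cut down to three columns, and `io.StringIO` stands in for a file on disk.

```python
import io
import pandas as pd

# Toy CSV text standing in for nobel.csv (three rows, three columns only).
csv_text = """year,category,full_name
1901,Chemistry,Jacobus Henricus van 't Hoff
1901,Literature,Sully Prudhomme
1901,Medicine,Emil Adolf von Behring
"""

# read_csv accepts a path or any file-like object.
sample = pd.read_csv(io.StringIO(csv_text))

print(sample.head(2))   # first two rows
print(sample.shape)     # (number of rows, number of columns)
print(sample.dtypes)    # column types inferred by pandas
```

Checking `shape` and `dtypes` right after loading is a cheap way to confirm that the file was parsed the way you expected.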

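1.2: Count laureates per country

Look at the birth country column, 'birth_country', to see which country has produced the most laureates.

Count how often each value occurs: column.value_counts()

head(number of rows to show)

nobel['birth_country'].value_counts().head(10)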

United States of America 259
United Kingdom 85
Germany 61
France 51
Sweden 29
Japan 24
Netherlands 18
Canada 18
Russia 17
Italy 17
Name: birth_country, dtype: int64
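
A small sketch of how value_counts() behaves, using made-up country values rather than the real dataset. It shows that missing values are left out by default, and that `normalize=True` and `dropna=False` change what is counted.

```python
import numpy as np
import pandas as pd

# Made-up birth countries, including one missing value.
countries = pd.Series(
    ["Germany", "France", "Germany", np.nan, "Germany", "France"],
    name="birth_country",
)

counts = countries.value_counts()                     # NaN is excluded by default
shares = countries.value_counts(normalize=True)       # fractions of the non-missing rows
with_missing = countries.value_counts(dropna=False)   # NaN counted as its own entry

print(counts.head(2))
print(shares)
print(with_missing)
```

The result is sorted from most to least frequent, which is why head(10) in the step above returns the top ten countries.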

1.3: Explore when the top country (the United States) rose to dominance

Flag every laureate born in the United States

nobel["usa_winner"] = nobel['birth_country'] == "United States of America"

nobel["usa_winner"]

Extract the time period, using ten years as the unit (decade)

(np.floor(year / 10)) * 10

nobel["decade"] = np.floor(nobel['year'] / 10) * 10

nobel["decade"]
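
The np.floor approach returns floats, which is why the decade appears as 1900.0 in later output. As a hedged alternative, integer floor division gives the same decades while keeping an integer dtype. The years below are arbitrary examples.

```python
import numpy as np
import pandas as pd

years = pd.Series([1901, 1909, 1910, 1999, 2000], name="year")

decade_float = np.floor(years / 10) * 10   # approach from the text, float result
decade_int = (years // 10) * 10            # floor division, stays integer

print(decade_float.tolist())
print(decade_int.tolist())
```

Either version works as a groupby key; the integer form just reads more cleanly on axis labels.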

Compute the proportion: dataset.groupby("grouping column")["column to summarize"].aggregation

prop_usa_winner = nobel.groupby("decade", as_index=False)["usa_winner"].mean()

prop_usa_winner
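
Why the mean of a True/False column is a proportion: pandas treats True as 1 and False as 0, so the mean is the number of True values divided by the group size. A minimal sketch with made-up rows:

```python
import pandas as pd

# Made-up data: four winners in the 1900s (one flagged), two in the 1910s (one flagged).
toy = pd.DataFrame({
    "decade": [1900, 1900, 1900, 1900, 1910, 1910],
    "usa_winner": [False, False, False, True, True, False],
})

# as_index=False keeps "decade" as a regular column instead of the index.
prop = toy.groupby("decade", as_index=False)["usa_winner"].mean()
print(prop)
```

Keeping decade as a column makes the result easy to pass to plotting functions that take column names.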

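Visualization

import matplotlib.pyplot as plt

import seaborn as sb

# Plot the per-decade proportions computed above rather than the raw True/False column
plt.plot(prop_usa_winner["decade"], prop_usa_winner["usa_winner"])

plt.xlabel("decade")

plt.ylabel("proportion of US-born winners")

plt.show()

plt.rcParams['figure.figsize'] = [11, 7]

# Line plot: lineplot(x = x-axis data, y = y-axis data)
# Given the raw rows, lineplot draws the mean of y for each x value

sb.lineplot(x=nobel["decade"], y=nobel["usa_winner"])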

plt.show()
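
A hedged sketch of the same kind of line chart using matplotlib's object-oriented interface, which keeps the figure size and axis labels attached to one figure instead of global settings. The proportions are made-up numbers, and the "Agg" backend is an assumption chosen so that the script also runs on a machine without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, no window is opened
import matplotlib.pyplot as plt
import pandas as pd

# Made-up per-decade proportions, shaped like the groupby result in the text.
prop = pd.DataFrame({
    "decade": [1900, 1910, 1920],
    "usa_winner": [0.02, 0.08, 0.07],
})

fig, ax = plt.subplots(figsize=(11, 7))   # size set per figure, not via rcParams
ax.plot(prop["decade"], prop["usa_winner"], marker="o")
ax.set_xlabel("decade")
ax.set_ylabel("proportion of US-born winners")
```

In a notebook you would normally drop the `matplotlib.use` line and let the figure render inline.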


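Finding the proportion of women among Nobel laureates

2.1: Select all female Nobel laureates
2.2: Compute their share
2.3: In which year did the first woman win a Nobel Prize?

2.1: Select all female Nobel laureates

nobel["sex"].value_counts()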

Male 836
Female 49
Name: sex, dtype: int64

2.2: Compute their share

Flag all female laureates

nobel["female_winner"] = nobel["sex"] == "Female"
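
One detail worth checking with this kind of flag: a missing value compared with "Female" evaluates to False, so any row without a recorded sex (for example, if a prize went to an organization) is counted in the denominator as "not female". The sketch below uses made-up rows to show how the share changes depending on which rows are included.

```python
import numpy as np
import pandas as pd

# Made-up rows: two men, one woman, one row with no recorded sex.
toy = pd.DataFrame({"sex": ["Male", "Female", "Male", np.nan]})

toy["female_winner"] = toy["sex"] == "Female"   # NaN == "Female" evaluates to False

share_all_rows = toy["female_winner"].mean()                 # denominator: all 4 rows
share_known_sex = (toy["sex"].dropna() == "Female").mean()   # denominator: 3 rows

print(share_all_rows, share_known_sex)
```

Which denominator is appropriate depends on the question being asked; the point is only to be aware that the two differ.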

Use ten years as the unit

nobel["decade"]

Compute the share of female laureates in each decade: groupby("grouping column")["column to summarize"].aggregation

prop_female_winner = nobel.groupby("decade", as_index=False)["female_winner"].mean()

Visualization: line plot

sb.lineplot(x="decade", y="female_winner", data=prop_female_winner)

<matplotlib.axes._subplots.AxesSubplot at 0x1228fde48>

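2.3: In which year did the first woman win a Nobel Prize?

female_winner = nobel[nobel["sex"] == "Female"]

female_winner.nsmallest(1, 'year')

female_winner['year'].min()  # earliest year only; DataFrame.min() would take each column's minimum separately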

year 	category 	prize 	motivation 	prize_share 	laureate_id 	laureate_type 	full_name 	birth_date 	birth_city 	... 	sex 	organization_name 	organization_city 	organization_country 	death_date 	death_city 	death_country 	usa_winner 	decade 	female_winner

19 1903 Physics The Nobel Prize in Physics 1903 "in recognition of the extraordinary services … 1/4 6 Individual Marie Curie, née Sklodowska 1867-11-07 Warsaw … Female NaN NaN NaN 1934-07-04 Sallanches France False 1900.0 True

1 rows × 21 columns
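
To make the difference between the lookup methods concrete, here is a small sketch on made-up data. nsmallest and idxmin both return the whole earliest row, while calling min() on the full DataFrame takes the minimum of each column independently, so its values need not come from the same row.

```python
import pandas as pd

# Made-up rows; the names are placeholders, not real laureates.
toy = pd.DataFrame({
    "year": [1911, 1903, 1905],
    "full_name": ["Laureate A", "Laureate C", "Laureate B"],
})

first_row = toy.nsmallest(1, "year")              # one-row DataFrame for the earliest year
first_by_label = toy.loc[toy["year"].idxmin()]    # same row, returned as a Series
column_minima = toy.min()                         # per-column minima, mixed across rows

print(first_row)
print(first_by_label)
print(column_minima)
```

Here column_minima pairs the year 1903 with "Laureate A", a combination that does not exist in any single row.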
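
Finding the average age of Nobel laureates

3.1: Compute each laureate's age at the time of the award
3.2: Visualize the result
3.3: Age at award for each prize category

3.1: Compute each laureate's age at the time of the award: award year - birth year = age at award

The birth_date column needs to be converted to a datetime type first

to_datetime(data to convert)

.dt.year extracts the year part

nobel['birth_date'] = pd.to_datetime(nobel['birth_date'])

nobel["age"] = nobel["year"] - nobel['birth_date'].dt.year

nobel["age"]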
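
The conversion and subtraction can be checked on a few rows. In this sketch the first two birth dates are taken from the output shown earlier, and the third row, with no birth date, is a hypothetical addition to show how missing values propagate.

```python
import pandas as pd

toy = pd.DataFrame({
    "year": [1901, 1903, 1904],
    "birth_date": ["1852-08-30", "1867-11-07", None],  # third row is hypothetical
})

# Strings become datetime64 values; a missing entry becomes NaT.
toy["birth_date"] = pd.to_datetime(toy["birth_date"])

# Award year minus birth year; NaT yields NaN, so the column is float.
toy["age"] = toy["year"] - toy["birth_date"].dt.year

print(toy)
```

Because only the years are subtracted, the result can be one year higher than the laureate's actual age on the award date; for a trend over more than a century that is usually acceptable.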

year 	category 	prize 	motivation 	prize_share 	laureate_id 	laureate_type 	full_name 	birth_date 	birth_city 	... 	organization_name 	organization_city 	organization_country 	death_date 	death_city 	death_country 	usa_winner 	decade 	female_winner 	age

0 1901 Chemistry The Nobel Prize in Chemistry 1901 "in recognition of the extraordinary services … 1/1 160 Individual Jacobus Henricus van 't Hoff 1852-08-30 Rotterdam … Berlin University Berlin Germany 1911-03-01 Berlin Germany False 1900.0 False 49.0
1 1901 Literature The Nobel Prize in Literature 1901 "in special recognition of his poetic composit… 1/1 569 Individual Sully Prudhomme 1839-03-16 Paris … NaN NaN NaN 1907-09-07 Châtenay France False 1900.0 False 62.0
2 1901 Medicine The Nobel Prize in Physiology or Medicine 1901 "for his work on serum therapy, especially its… 1/1 293 Individual Emil Adolf von Behring 1854-03-15 Hansdorf (Lawice) … Marburg University Marburg Germany 1917-03-31 Marburg Germany False 1900.0 False 47.0
3 1901 Peace The Nobel Peace Prize 1901 NaN 1/2 462 Individual Jean Henry Dunant 1828-05-08 Geneva … NaN NaN NaN 1910-10-30 Heiden Switzerland False 1900.0 False 73.0
4 1901 Peace The Nobel Peace Prize 1901 NaN 1/2 463 Individual Frédéric Passy 1822-05-20 Paris … NaN NaN NaN 1912-06-12 Paris France False 1900.0 False 79.0

5 rows × 22 columns

3.2: Visualize the result

sb.lmplot(x="year", y="age", data=nobel, aspect=aspect ratio (width / height), line_kws={"color": color}, lowess=True)

sb.lmplot(x="year", y="age", data=nobel, aspect=2, line_kws={"color": "black"}, lowess=True)

<seaborn.axisgrid.FacetGrid at 0x1a27f51f60>

3.3: Age at award for each prize category

# One panel per prize category, each with its own LOWESS trend line
sb.lmplot(x="year",
          y="age",
          data=nobel,
          aspect=2,
          line_kws={"color": "black"},
          lowess=True,
          row="category")
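
As a numeric complement to the per-category panels, a groupby summary gives the count, mean, minimum, and maximum age in each category. This is a sketch on made-up ages, not results from the real dataset.

```python
import pandas as pd

# Made-up ages; the None entry mimics a row with no computable age.
toy = pd.DataFrame({
    "category": ["Physics", "Physics", "Peace", "Peace", "Peace"],
    "age": [40.0, 50.0, 60.0, 70.0, None],
})

# Missing ages are skipped by each aggregation.
age_by_category = toy.groupby("category")["age"].agg(["count", "mean", "min", "max"])
print(age_by_category)
```

On the real data, the same call would be applied to nobel instead of toy.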