[Deep Dive into Python] Data Science with Python (12): pandas Data Processing (Part 3)

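Preface

For setting up the data science environment, see my earlier post:

[Deep Dive into Python] Data Science with Python (1): Environment Setup

Earlier posts in this data science series:

[Deep Dive into Python] Data Science with Python (2): jupyter-lab and NumPy Arrays

[Deep Dive into Python] Data Science with Python (3): NumPy Constants, Functions, and Linear Spaces

[Deep Dive into Python] Data Science with Python (4): Exercises and Solutions (Book p. 337)

[Deep Dive into Python] Data Science with Python (5): Matplotlib Visualization (1)

[Deep Dive into Python] Data Science with Python (6): Matplotlib Visualization (2)

[Deep Dive into Python] Data Science with Python (7): Exercises from Book p. 352

[Deep Dive into Python] Data Science with Python (8): pandas Data Structures: Series and DataFrame

[Deep Dive into Python] Data Science with Python (9): Exercises from Book p. 361

[Deep Dive into Python] Data Science with Python (10): pandas Data Processing (Part 1)

[Deep Dive into Python] Data Science with Python (11): pandas Data Processing (Part 2)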

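Note on the code: the snippets were run in a live session, so some import statements may be omitted.

In this installment we continue analyzing the Nobel laureates data set (laureates.csv).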

Python Code Snippet 1

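Look up records by the date-of-birth (born) field.

import pandas as pd

nobel = pd.read_csv("laureates.csv")  # assumption: the file is in the working directory
print(nobel.loc[nobel["born"] == "1879-03-14"])
print(nobel.loc[nobel["born"] == "1879-03-14", "surname"])  # rows and column in one loc call
print(nobel.loc[nobel["born"].str.contains("06-28", na=False)])  # na=False skips missing dates
print(nobel.loc[nobel["born"].str.contains("06-28", na=False) & (nobel["category"] == "physics")])
print(nobel.iloc[79])  # row at integer position 79
# Einstein's prize record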
    id firstname   surname        born        died bornCountry  \
25  26    Albert  Einstein  1879-03-14  1955-04-18     Germany   

   bornCountryCode bornCity diedCountry diedCountryCode      diedCity gender  \
25              DE      Ulm         USA              US  Princeton NJ   male   

    year category overallMotivation  share  \
25  1921  physics               NaN      1   

                                           motivation  \
25  "for his services to Theoretical Physics and e...   

                                                 name    city  country  
25  Kaiser-Wilhelm-Institut (now Max-Planck-Instit...  Berlin  Germany

# Only the value of the surname field
25    Einstein
Name: surname, dtype: object

# Laureates born on June 28
      id    firstname         surname        born        died  \
79    79        Maria  Goeppert Mayer  1906-06-28  1972-02-20   
125  126        Klaus    von Klitzing  1943-06-28  0000-00-00   
281  283  F. Sherwood         Rowland  1927-06-28  2012-03-10   
304  306       Alexis          Carrel  1873-06-28  1944-11-05   
598  607        Luigi      Pirandello  1867-06-28  1936-12-10   
790  809     Muhammad           Yunus  1940-06-28  0000-00-00   
889  916   William C.        Campbell  1930-06-28  0000-00-00   

                             bornCountry bornCountryCode  \
79                  Germany (now Poland)              PL   
125  German-occupied Poland (now Poland)              PL   
281                                  USA              US   
304                               France              FR   
598                                Italy              IT   
790       British India (now Bangladesh)              BD   
889                              Ireland              IE   

                     bornCity diedCountry diedCountryCode           diedCity  \
79   Kattowitz (now Katowice)         USA              US       San Diego CA   
125                   Schroda         NaN             NaN                NaN   
281               Delaware OH         USA              US  Corona del Mar CA   
304       Sainte-Foy-lès-Lyon      France              FR              Paris   
598          Agrigento Sicily       Italy              IT               Rome   
790                Chittagong         NaN             NaN                NaN   
889                  Ramelton         NaN             NaN                NaN   

     gender  year    category overallMotivation  share  \
79   female  1963     physics               NaN      4   
125    male  1985     physics               NaN      1   
281    male  1995   chemistry               NaN      3   
304    male  1912    medicine               NaN      1   
598    male  1934  literature               NaN      1   
790    male  2006       peace               NaN      2   
889    male  2015    medicine               NaN      4   

                                            motivation  \
79   "for their discoveries concerning nuclear shel...   
125   "for the discovery of the quantized Hall effect"   
281  "for their work in atmospheric chemistry parti...   
304  "in recognition of his work on vascular suture...   
598  "for his bold and ingenious revival of dramati...   
790  "for their efforts to create economic and soci...   
889  "for their discoveries concerning a novel ther...   

                                            name          city  country  
79                      University of California  San Diego CA      USA  
125  Max-Planck-Institut für Festkörperforschung     Stuttgart  Germany  
281                     University of California     Irvine CA      USA  
304   Rockefeller Institute for Medical Research   New York NY      USA  
598                                          NaN           NaN      NaN  
790                                          NaN           NaN      NaN  
889                              Drew University    Madison NJ      USA

# Laureates born on June 28 who won the physics prize
      id firstname         surname        born        died  \
79    79     Maria  Goeppert Mayer  1906-06-28  1972-02-20   
125  126     Klaus    von Klitzing  1943-06-28  0000-00-00   

                             bornCountry bornCountryCode  \
79                  Germany (now Poland)              PL   
125  German-occupied Poland (now Poland)              PL   

                     bornCity diedCountry diedCountryCode      diedCity  \
79   Kattowitz (now Katowice)         USA              US  San Diego CA   
125                   Schroda         NaN             NaN           NaN   

     gender  year category overallMotivation  share  \
79   female  1963  physics               NaN      4   
125    male  1985  physics               NaN      1   

                                            motivation  \
79   "for their discoveries concerning nuclear shel...   
125   "for the discovery of the quantized Hall effect"   

                                            name          city  country  
79                      University of California  San Diego CA      USA  
125  Max-Planck-Institut für Festkörperforschung     Stuttgart  Germany

# Row 79, retrieved with iloc (integer-location indexing)
id                                                                  79
firstname                                                        Maria
surname                                                 Goeppert Mayer
born                                                        1906-06-28
died                                                        1972-02-20
bornCountry                                       Germany (now Poland)
bornCountryCode                                                     PL
bornCity                                      Kattowitz (now Katowice)
diedCountry                                                        USA
diedCountryCode                                                     US
diedCity                                                  San Diego CA
gender                                                          female
year                                                              1963
category                                                       physics
overallMotivation                                                  NaN
share                                                                4
motivation           "for their discoveries concerning nuclear shel...
name                                          University of California
city                                                      San Diego CA
country                                                            USA
Name: 79, dtype: object
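
The lookups in Snippet 1 combine a boolean mask with loc, then fetch one row by position with iloc. As a minimal sketch, the code below rebuilds three rows from the output above into a small DataFrame (the name `sample` and the reduced column set are made up for illustration) and shows the same techniques in isolation.

```python
import pandas as pd

# Three rows copied from the output above; the index labels match the original rows.
sample = pd.DataFrame(
    {
        "surname": ["Einstein", "Goeppert Mayer", "von Klitzing"],
        "born": ["1879-03-14", "1906-06-28", "1943-06-28"],
        "category": ["physics", "physics", "physics"],
    },
    index=[25, 79, 125],
)

# Row mask and column label in a single loc call.
einstein = sample.loc[sample["born"] == "1879-03-14", "surname"]
print(einstein)

# Substring match on the date string; na=False treats missing dates as non-matches.
june_28 = sample.loc[sample["born"].str.contains("06-28", na=False)]
print(june_28["surname"].tolist())

# loc looks rows up by index label, iloc by integer position.
print(sample.loc[79, "surname"])   # row labeled 79
print(sample.iloc[1]["surname"])   # second row of this small frame
```

In the full data set the default index makes label and position coincide, which is why nobel.iloc[79] returns the row labeled 79.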

Python Code Snippet 2

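Get a laureate's dates of birth and death, compute the lifespan in days, and convert it to years ("Y").

import numpy as np
import pandas as pd

bethe = nobel.loc[nobel["surname"] == "Bethe"]
print(bethe["born"])
print(bethe["died"])
diff = pd.to_datetime(bethe["died"]) - pd.to_datetime(bethe["born"])  # timedelta Series
print(diff)
print(diff.dt.days)  # whole days as integers
print(diff / np.timedelta64(1, "Y"))  # "Y" is an average year; newer pandas versions may reject this unit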
# Hans Bethe's date of birth
88    1906-07-02
Name: born, dtype: object

# Hans Bethe's date of death
88    2005-03-06
Name: died, dtype: object

# Hans Bethe's lifespan in days
88   36042 days
dtype: timedelta64[ns]

# The same value as an integer, via diff.dt.days
88    36042
dtype: int64

# Lifespan in years, using np.timedelta64()
88    98.679644
dtype: float64
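
The division by np.timedelta64(1, "Y") relies on an ambiguous "year" unit that some newer pandas releases may refuse. A hedged alternative is to divide by an explicit year length. The sketch below assumes a mean Gregorian year of 365.2425 days (the constant name `DAYS_PER_YEAR` is made up here) and uses the two dates from the output above.

```python
import pandas as pd

# Dates taken from the output above; index 88 is Hans Bethe's row label.
born = pd.to_datetime(pd.Series(["1906-07-02"], index=[88]))
died = pd.to_datetime(pd.Series(["2005-03-06"], index=[88]))
diff = died - born

# Assumption: one year is the mean Gregorian year of 365.2425 days.
DAYS_PER_YEAR = 365.2425

# Integer days divided by the assumed year length.
years = diff.dt.days / DAYS_PER_YEAR
print(years.round(6))

# Equivalent form that divides one timedelta by another.
years_td = diff / pd.Timedelta(days=DAYS_PER_YEAR)
print(years_td.round(6))
```

With this constant the result agrees with the 98.679644 shown above to six decimal places.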

Python Code Snippet 3

Prize records for organizations (rather than individuals)

print(nobel.loc[nobel["born"] == "1873-00-00"])
print(nobel.iloc[465].born)
print(nobel.iloc[465].category)
print(nobel.iloc[465].year)

# The Institute of International Law, founded in 1873; because it is not a person, the month and day are recorded as 00-00
      id                       firstname surname        born        died  \
465  467  Institute of International Law     NaN  1873-00-00  0000-00-00   

    bornCountry bornCountryCode bornCity diedCountry diedCountryCode diedCity  \
465         NaN             NaN      NaN         NaN             NaN      NaN   

    gender  year category overallMotivation  share  \
465    org  1904    peace               NaN      1   

                                            motivation name city country  
465  "for its striving in public law to develop pea...  NaN  NaN     NaN

# The organization was founded in 1873
1873-00-00

# It received the Peace Prize
peace

# Year of the award
1904
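
Looking up an organization by its exact placeholder date works for one record. To find such rows in general, two hedged options are the gender field, which is "org" in the row above, or the -00-00 suffix of the date. The sketch below assumes every organization is tagged the same way as row 465; the frame `sample` is a made-up two-row extract of values shown in the outputs above.

```python
import pandas as pd

# Two rows assembled from values shown above (a person and an organization).
sample = pd.DataFrame(
    {
        "firstname": ["Albert", "Institute of International Law"],
        "born": ["1879-03-14", "1873-00-00"],
        "gender": ["male", "org"],
        "category": ["physics", "peace"],
        "year": [1921, 1904],
    },
    index=[25, 465],
)

# Option 1: select by the gender tag (assumes all organizations use "org").
orgs = sample.loc[sample["gender"] == "org"]
print(orgs[["firstname", "category", "year"]])

# Option 2: flag placeholder dates whose month and day are 00-00.
placeholder = sample["born"].str.endswith("-00-00", na=False)
print(sample.loc[placeholder, "firstname"].tolist())
```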

Python Code Snippet 4

Compute each laureate's lifespan from the born and died fields and store it in a new lifespan column of the DataFrame.

nobel["born"] = pd.to_datetime(nobel["born"], errors="coerce")  # invalid dates become NaT
nobel["died"] = pd.to_datetime(nobel["died"], errors="coerce")
print(nobel.iloc[465].born)
nobel["lifespan"] = (nobel["died"] - nobel["born"]) / np.timedelta64(1, "Y")  # newer pandas versions may reject the "Y" unit
bethe = nobel.loc[nobel["surname"] == "Bethe"]
print(bethe["lifespan"])
# The invalid date of "birth" has been converted to NaT (Not a Time)
NaT

# Hans Bethe's lifespan in years
88    98.679644
Name: lifespan, dtype: float64
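
To see what errors="coerce" does in isolation, here is a minimal sketch using date strings that appear in the outputs above. The variable names are made up for illustration. Placeholder dates such as 1873-00-00 and 0000-00-00 are not valid calendar dates, so they become NaT, and any lifespan computed from them is missing.

```python
import pandas as pd

# One valid date and the two placeholder forms seen in the data.
raw = pd.Series(["1906-07-02", "1873-00-00", "0000-00-00"])
parsed = pd.to_datetime(raw, errors="coerce")
print(parsed)

# A deceased laureate and a living one (died recorded as 0000-00-00).
born = pd.to_datetime(pd.Series(["1906-07-02", "1943-06-28"]), errors="coerce")
died = pd.to_datetime(pd.Series(["2005-03-06", "0000-00-00"]), errors="coerce")

# Subtracting NaT yields a missing timedelta, so the second lifespan is NaN.
lifespan_days = (died - born).dt.days
print(lifespan_days)
```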

Python Code Snippet 5

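Plot a histogram of laureates' lifespans from the lifespan column created in the previous step. The tongue-in-cheek conclusion: if you want to live long, the best approach is to win a Nobel Prize.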

import matplotlib.pyplot as plt

nobel.hist(column = "lifespan")
plt.show()

[Figure: histogram of the lifespan column, in years]
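
The hist call drops missing values before binning, so rows without a lifespan (living laureates and organizations, whose dates became NaT) do not appear in the plot. As a rough sketch, the counts behind such a histogram can be tabulated with np.histogram. The lifespans and the 10-year bin edges below are hypothetical values chosen for illustration, not figures from laureates.csv.

```python
import numpy as np
import pandas as pd

# Hypothetical lifespans in years; NaN stands for a row with no computable lifespan.
lifespan = pd.Series([98.7, 76.1, 65.6, np.nan, 84.8, 71.4, np.nan, 69.4])

# Drop missing values explicitly, then count values per 10-year bin.
valid = lifespan.dropna()
counts, edges = np.histogram(valid, bins=[60, 70, 80, 90, 100])

for left, right, n in zip(edges[:-1], edges[1:], counts):
    print(f"{left:.0f}-{right:.0f}: {n}")

print("rows without a lifespan:", lifespan.isna().sum())
```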

Reference

Michael Hartl, Learn Enough Python to Be Dangerous: Software Development, Flask Web Apps, and Beginning Data Science with Python. Boston: Pearson, 2023.
