# [Python case] DataCamp project: Dr. Semmelweis and the discovery of handwashing

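I'm a complete data newbie (for real -_-) and only started using DataCamp a few days ago. I'm finding it quite fun. The exercises build up gradually from easy to hard, and you can finish most of them as long as you can follow the gist of the English instructions.

If I have time, I plan to write up more notes on what I learn.

If you're also learning on DataCamp, feel free to leave a comment so we can learn together.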

## Title

Dr. Semmelweis and the discovery of handwashing

## Summary

Reanalyse the data behind one of the most important discoveries of modern medicine: Handwashing.

## Skill

pandas Foundations

In 1847 the Hungarian physician Ignaz Semmelweis made a breakthrough discovery: he discovered handwashing. Contaminated hands were a major cause of childbed fever, and by enforcing handwashing at his hospital he saved hundreds of lives.

• 1. Meet Dr. Ignaz Semmelweis
• 2. The alarming number of deaths
• 3. Death at the clinics
• 4. The handwashing begins
• 5. The effect of handwashing
• 6. The effect of handwashing highlighted
• 7. More handwashing, fewer deaths?
• 8. A bootstrap analysis of Semmelweis' handwashing data
• 9. The fate of Dr. Semmelweis

### 1. Meet Dr. Ignaz Semmelweis

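```python
# Importing modules
import pandas as pd  # import pandas under its usual alias, pd

# Read the yearly data into yearly
# (file path assumed from the DataCamp project layout; adjust it to where your copy lives)
yearly = pd.read_csv("datasets/yearly_deaths_by_clinic.csv")

# Print out yearly to check the variable
print(yearly)
```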



output:

```
    year  births  deaths    clinic
0   1841    3036     237  clinic 1
1   1842    3287     518  clinic 1
2   1843    3060     274  clinic 1
3   1844    3157     260  clinic 1
4   1845    3492     241  clinic 1
5   1846    4010     459  clinic 1
6   1841    2442      86  clinic 2
7   1842    2659     202  clinic 2
8   1843    2739     164  clinic 2
9   1844    2956      68  clinic 2
10  1845    3241      66  clinic 2
11  1846    3754     105  clinic 2
```

### 2. The alarming number of deaths

```python
# Calculate proportion of deaths per no. births
yearly["proportion_deaths"] = yearly["deaths"] / yearly["births"]  # add a proportion_deaths (death rate) column

# Extract clinic 1 data into yearly1 and clinic 2 data into yearly2
yearly1 = yearly.loc[yearly["clinic"] == "clinic 1"]  # rows for clinic 1, selected with a boolean mask in .loc
yearly2 = yearly.loc[yearly["clinic"] == "clinic 2"]  # rows for clinic 2

# Print out yearly1
print(yearly1)
```

output:

```
   year  births  deaths    clinic  proportion_deaths
0  1841    3036     237  clinic 1           0.078063
1  1842    3287     518  clinic 1           0.157591
2  1843    3060     274  clinic 1           0.089542
3  1844    3157     260  clinic 1           0.082357
4  1845    3492     241  clinic 1           0.069015
5  1846    4010     459  clinic 1           0.114464
```

General pattern for selecting the rows where a column equals a given value: `df.loc[df['column_name'] == some_value]`
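Below is a minimal, self-contained sketch of that pattern. The frame is rebuilt from the yearly table printed in task 1, so the block runs without the DataCamp files. It adds the death-rate column, selects one clinic with a boolean mask inside `.loc`, and then shows an alternative to splitting into two frames: pooling births and deaths per clinic with `groupby`.

```python
import pandas as pd

# Yearly table copied from the task 1 output above
yearly = pd.DataFrame({
    "year": [1841, 1842, 1843, 1844, 1845, 1846] * 2,
    "births": [3036, 3287, 3060, 3157, 3492, 4010,
               2442, 2659, 2739, 2956, 3241, 3754],
    "deaths": [237, 518, 274, 260, 241, 459,
               86, 202, 164, 68, 66, 105],
    "clinic": ["clinic 1"] * 6 + ["clinic 2"] * 6,
})
yearly["proportion_deaths"] = yearly["deaths"] / yearly["births"]

# The comparison yields a boolean Series: True where clinic == "clinic 1"
mask = yearly["clinic"] == "clinic 1"
yearly1 = yearly.loc[mask]  # .loc keeps only the rows where mask is True

# Alternative without splitting: pooled deaths / births per clinic over all six years
totals = yearly.groupby("clinic")[["births", "deaths"]].sum()
pooled_rate = totals["deaths"] / totals["births"]

print(yearly1)
print(pooled_rate)
```

The pooled rates come out at roughly 0.099 for clinic 1 and 0.039 for clinic 2. Pooling weights each year by its number of births, so the result differs slightly from a plain average of the six yearly proportions.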

### 3. Death at the clinics

```python
# This makes plots appear in the notebook (an IPython "magic" command)
%matplotlib inline

# Plot yearly proportion of deaths at the two clinics.
# x axis: year, y axis: proportion of deaths; label sets the legend entry.
# The Axes returned by the first call is kept as ax so both clinics share one plot.
ax = yearly1.plot(x="year", y="proportion_deaths", label="clinic 1")
yearly2.plot(x="year", y="proportion_deaths", label="clinic 2", ax=ax)  # ax=ax draws onto the same axes
ax.set_ylabel("Proportion deaths")  # set the y-axis label with set_ylabel("name")
```

output: (line plot of the yearly proportion of deaths at clinic 1 and clinic 2; image not included)
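Another option is to reshape the table so that each clinic becomes its own column. That way a single `.plot()` call draws both clinics. This is a sketch of that alternative, not what the DataCamp solution does. The frame is rebuilt from the printed yearly table so the block runs on its own.

```python
import pandas as pd

# Yearly table copied from the task 1 output above
yearly = pd.DataFrame({
    "year": [1841, 1842, 1843, 1844, 1845, 1846] * 2,
    "births": [3036, 3287, 3060, 3157, 3492, 4010,
               2442, 2659, 2739, 2956, 3241, 3754],
    "deaths": [237, 518, 274, 260, 241, 459,
               86, 202, 164, 68, 66, 105],
    "clinic": ["clinic 1"] * 6 + ["clinic 2"] * 6,
})
yearly["proportion_deaths"] = yearly["deaths"] / yearly["births"]

# One row per year, one column per clinic
wide = yearly.pivot(index="year", columns="clinic", values="proportion_deaths")
print(wide)
```

With this shape, `ax = wide.plot()` followed by `ax.set_ylabel("Proportion deaths")` draws both clinics on one set of axes. The legend is taken from the column names, so the `ax=ax` hand-off is no longer needed. Plotting requires matplotlib.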

### 4. The handwashing begins

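```python
# Read datasets/monthly_deaths.csv into monthly,
# parsing the date column as datetimes (needed for the date comparisons in task 6)
monthly = pd.read_csv("datasets/monthly_deaths.csv", parse_dates=["date"])

# Calculate proportion of deaths per no. births
monthly["proportion_deaths"] = monthly["deaths"] / monthly["births"]

# Print out the first rows in monthly
print(monthly.head(3))  # show the first three rows of monthly
```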
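The `parse_dates` argument is what makes the date comparisons in task 6 possible. The sketch below reads a hypothetical three-row CSV from an in-memory string. The numbers are made up for illustration, but the column names are the ones the code above uses (date, births, deaths). The CSV is read once without and once with `parse_dates`, and the resulting dtype of `date` is checked each time.

```python
import io

import pandas as pd
from pandas.api.types import is_datetime64_any_dtype

# Hypothetical excerpt in the monthly layout; the numbers are made up
CSV_TEXT = """date,births,deaths
1847-05-01,300,30
1847-06-01,300,6
1847-07-01,300,3
"""

raw = pd.read_csv(io.StringIO(CSV_TEXT))                          # date stays plain text
parsed = pd.read_csv(io.StringIO(CSV_TEXT), parse_dates=["date"])  # date becomes datetime64

print(raw["date"].dtype, parsed["date"].dtype)

parsed["proportion_deaths"] = parsed["deaths"] / parsed["births"]
print(parsed.head(3))
```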

### 5. The effect of handwashing

```python
# Plot monthly proportion of deaths (the series covers the months before and after handwashing)
ax = monthly.plot(x="date", y="proportion_deaths", label="proportion of deaths")
ax.set_ylabel("Proportion deaths")
```

output: (line plot of the monthly proportion of deaths; image not included)

### 6. The effect of handwashing highlighted

```python
import pandas as pd

# Date when handwashing was made mandatory
handwashing_start = pd.to_datetime("1847-06-01")  # mark when handwashing began

# Split monthly into before and after handwashing_start
# (this comparison works because the date column was parsed with parse_dates)
before_washing = monthly.loc[monthly["date"] < handwashing_start]
after_washing = monthly.loc[monthly["date"] >= handwashing_start]

# Plot monthly proportion of deaths before and after handwashing
ax = before_washing.plot(x="date", y="proportion_deaths", label="before washing")
after_washing.plot(x="date", y="proportion_deaths", label="after washing", ax=ax)
ax.set_ylabel("Proportion deaths")  # set_ylabel is a method: call it, don't assign to it
```

output: (line plot of the monthly proportion of deaths before and after handwashing; image not included)
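Here is a small check of the split logic on a hypothetical four-month frame. The dates sit around the cut-off and the proportions are made up. The sketch builds one mask and uses its negation `~` rather than writing two separate comparisons. This guarantees that every row lands in exactly one group. The cut-off month itself falls into the "after" group, matching the `>=` above.

```python
import pandas as pd

# Hypothetical monthly frame straddling the cut-off (made-up proportions)
monthly = pd.DataFrame({
    "date": pd.to_datetime(["1847-04-01", "1847-05-01", "1847-06-01", "1847-07-01"]),
    "proportion_deaths": [0.12, 0.10, 0.02, 0.01],
})

handwashing_start = pd.to_datetime("1847-06-01")

# One mask and its negation: each row goes to exactly one of the two frames
is_before = monthly["date"] < handwashing_start
before_washing = monthly.loc[is_before]
after_washing = monthly.loc[~is_before]

print(len(before_washing), len(after_washing))
```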

### 7. More handwashing, fewer deaths?

```python
# Difference in mean monthly proportion of deaths due to handwashing
before_proportion = before_washing["proportion_deaths"]
after_proportion = after_washing["proportion_deaths"]
mean_diff = after_proportion.mean() - before_proportion.mean()
mean_diff
```

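output:

```
-0.08395660751183336
```

In other words, the average monthly proportion of deaths fell by about 8.4 percentage points after handwashing was introduced.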

### 8. A Bootstrap analysis of Semmelweis handwashing data

Background on the bootstrap: https://en.wikipedia.org/wiki/Bootstrapping_%28statistics%29 (too long, didn't read)

```python
# A bootstrap analysis of the reduction of deaths due to handwashing
boot_mean_diff = []  # empty list to collect the bootstrap differences
for _ in range(3000):  # repeat the resampling experiment 3000 times
    # frac=1, replace=True: draw as many values as the series has, with replacement
    boot_before = before_proportion.sample(frac=1, replace=True)
    boot_after = after_proportion.sample(frac=1, replace=True)
    # one difference in means per round, appended to boot_mean_diff
    boot_mean_diff.append(boot_after.mean() - boot_before.mean())

# Calculating a 95% confidence interval from boot_mean_diff
confidence_interval = pd.Series(boot_mean_diff).quantile([0.025, 0.975])
confidence_interval
```


```python
print(boot_mean_diff[0:20])  # first 20 of the 3000 bootstrap differences
```

output:

```
[-0.07787261202620424, -0.07424799825364967, -0.09358312005955502, -0.08894556209810614, -0.08087685009098905, -0.06429190709356139, -0.08068023440789948, -0.07240951438539092, -0.06750565112365006, -0.0676633601804324, -0.08713968457785505, -0.08382681590118775, -0.08280812612089627, -0.08059191110129257, -0.09227479693648963, -0.07786725171910112, -0.08150749269654012, -0.08903607701866195, -0.061659787819670464, -0.0809971940784796]
```

The values change on every run, because the resampling is random.
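The loop above can also be written without Python-level iteration. The sketch below uses a hypothetical helper, `bootstrap_ci_mean_diff`, which is not part of pandas or the DataCamp solution. It draws all resampling indices at once with NumPy and returns a percentile interval. The project's monthly before/after series are not reproduced in this post, so the demo runs on the yearly clinic 1 vs clinic 2 proportions from the task 2 table. That is a different comparison from the handwashing one.

```python
import numpy as np
import pandas as pd


def bootstrap_ci_mean_diff(before, after, n_boot=3000, level=0.95, seed=0):
    """Percentile bootstrap CI for mean(after) - mean(before).

    Hypothetical helper: each group is resampled with replacement at its
    own size, as in the loop above, but with NumPy arrays instead of a loop.
    """
    rng = np.random.default_rng(seed)
    before = np.asarray(before, dtype=float)
    after = np.asarray(after, dtype=float)
    # n_boot x group-size matrices of row indices drawn with replacement
    idx_before = rng.integers(0, len(before), size=(n_boot, len(before)))
    idx_after = rng.integers(0, len(after), size=(n_boot, len(after)))
    diffs = after[idx_after].mean(axis=1) - before[idx_before].mean(axis=1)
    alpha = (1 - level) / 2
    lower, upper = np.quantile(diffs, [alpha, 1 - alpha])
    return lower, upper


# Demo data: yearly death proportions computed from the task 2 table
clinic1 = pd.Series([237, 518, 274, 260, 241, 459]) / pd.Series([3036, 3287, 3060, 3157, 3492, 4010])
clinic2 = pd.Series([86, 202, 164, 68, 66, 105]) / pd.Series([2442, 2659, 2739, 2956, 3241, 3754])

lower, upper = bootstrap_ci_mean_diff(clinic1, clinic2)
observed = clinic2.mean() - clinic1.mean()
print(f"observed difference: {observed:.4f}")
print(f"95% CI: [{lower:.4f}, {upper:.4f}]")
```

Fixing `seed` makes the interval reproducible. The loop version gives slightly different numbers on every run.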

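A quick test of what `sample(frac=1, replace=True)` does:

```python
import pandas as pd

a = monthly.head(4)  # assumption: `a` was a small test frame, e.g. the first four rows of monthly
b = a["births"]
print(b)
print("---------")
print(b.sample(frac=1, replace=True))  # same length as b, drawn with replacement
```

output:

```
0    254
1    239
2    277
3    255
Name: births, dtype: int64
---------
0    254
1    239
1    239
2    277
Name: births, dtype: int64
```

Row 1 was drawn twice and row 3 not at all, which is exactly what resampling with replacement does.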
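In `sample`, `frac=1` means "draw as many rows as the Series has". The `replace` argument is what decides between reshuffling and resampling. The comparison below uses the four births values from the test output. The `random_state` argument is added only to make the draws repeatable.

```python
import pandas as pd

# Births values from the test output above
b = pd.Series([254, 239, 277, 255], name="births")

shuffled = b.sample(frac=1, replace=False, random_state=1)  # permutation: every row exactly once
resampled = b.sample(frac=1, replace=True, random_state=1)  # bootstrap draw: rows may repeat or be missing

print(sorted(shuffled.tolist()) == sorted(b.tolist()))         # True
print(len(resampled) == len(b), set(resampled) <= set(b))      # True True
```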

### 9. The fate of Dr. Semmelweis

So handwashing reduced the proportion of deaths by between 6.7 and 10 percentage points, according to a 95% confidence interval. All in all, it would seem that Semmelweis had solid evidence that handwashing was a simple but highly effective procedure that could save many lives.

The tragedy is that, despite the evidence, Semmelweis' theory — that childbed fever was caused by some "substance" (what we today know as bacteria) from autopsy room corpses — was ridiculed by contemporary scientists. The medical community largely rejected his discovery and in 1849 he was forced to leave the Vienna General Hospital for good.

One reason for this was that statistics and statistical arguments were uncommon in medical science in the 1800s. Semmelweis published his data only as long tables of raw numbers, without any graphs or confidence intervals. Had he had access to the analysis we've just put together, he might have been more successful in getting the Viennese doctors to wash their hands.

```python
# The data Semmelweis collected points to that:
doctors_should_wash_their_hands = True
```

That's all, thank you~~~
