Python fill null values: how do I fill null values in a dataset based on two other columns?

I have a Titanic dataset. These are the attributes I am working with:

1. Age

2. Embarked (the port the passenger embarked from; there are 3 ports: S, Q and C)

3. Survived (0 means did not survive, 1 means survived)

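I am filtering out the data I don't need. Next, I need to fill in the null values in Age. So I counted the passengers who survived and who did not survive for each port of embarkation, i.e. S, Q and C.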

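I then found the mean age of the passengers who survived and who did not survive for each of the ports S, Q and C. But now I don't know how to put these 6 values (3 for survivors from S, Q and C, and 3 for non-survivors from S, Q and C, so 6 in total) into the Age column of the original Titanic data. If I simply do titanic.Age.fillna('one of the six values'), it fills every null value in Age with that single value, which is not what I want.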
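The six means described above do not need six separately filtered frames: a single `groupby` on the two key columns yields one count and one mean per (Survived, Embarked) pair. A minimal sketch, assuming the Kaggle-style column names `Survived`, `Embarked` and `Age` used in the question; the frame below is invented for illustration, not real passenger data:

```python
import numpy as np
import pandas as pd

# Tiny invented stand-in for the Titanic data (not real passenger values).
titanic = pd.DataFrame({
    "Survived": [1, 1, 0, 0, 1, 0, 1, 0],
    "Embarked": ["S", "S", "S", "Q", "C", "C", "Q", "C"],
    "Age": [20.0, np.nan, 40.0, 30.0, 25.0, 36.0, 18.0, np.nan],
})

groups = titanic.groupby(["Survived", "Embarked"])
counts = groups.size()        # passengers per (Survived, Embarked) pair
means = groups["Age"].mean()  # mean age per pair; NaN ages are skipped
print(means)
```

`means` is a Series indexed by the (Survived, Embarked) pairs, so each of the six values can be looked up by key, e.g. `means.loc[(1, "S")]`, instead of being kept in six separate variables.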

After spending some time on it, I tried this:

titanic[titanic.Survived==1][titanic.Embarked=='S'].Age.fillna(SurvivedS.Age.mean(),inplace=True)
titanic[titanic.Survived==1][titanic.Embarked=='Q'].Age.fillna(SurvivedQ.Age.mean(),inplace=True)
titanic[titanic.Survived==1][titanic.Embarked=='C'].Age.fillna(SurvivedC.Age.mean(),inplace=True)
titanic[titanic.Survived==0][titanic.Embarked=='S'].Age.fillna(DidntSurvivedS.Age.mean(),inplace=True)
titanic[titanic.Survived==0][titanic.Embarked=='Q'].Age.fillna(DidntSurvivedQ.Age.mean(),inplace=True)
titanic[titanic.Survived==0][titanic.Embarked=='C'].Age.fillna(DidntSurvivedC.Age.mean(),inplace=True)

This doesn't raise any error, but it still doesn't work. Any idea what I should do?
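A likely explanation for the silent failure: `titanic[mask1][mask2]` is chained indexing, and boolean indexing builds a new object, so `fillna(..., inplace=True)` fills a temporary copy that is then discarded. The sketch below reproduces this on a small invented frame (same column names as the question) and then keeps the per-group idea but writes back through a single `.loc[row_mask, "Age"]` assignment on the original frame. Depending on the pandas version, the first part may also print a chained-assignment warning.

```python
import numpy as np
import pandas as pd

# Small invented frame using the question's column names.
titanic = pd.DataFrame({
    "Survived": [1, 1, 0, 0],
    "Embarked": ["S", "S", "S", "S"],
    "Age": [20.0, np.nan, 40.0, np.nan],
})

# Same chained pattern as the question: the boolean selection creates a new
# frame, and fillna(inplace=True) modifies only that temporary object.
titanic[(titanic.Survived == 1) & (titanic.Embarked == "S")].Age.fillna(20.0, inplace=True)
nan_after_attempt = int(titanic.Age.isna().sum())  # original frame is unchanged

# Working alternative: assign through .loc on the original frame, once per group.
group_means = titanic.groupby(["Survived", "Embarked"])["Age"].mean()
for (survived, port), mean_age in group_means.items():
    rows = (titanic.Survived == survived) & (titanic.Embarked == port) & titanic.Age.isna()
    titanic.loc[rows, "Age"] = mean_age

print(nan_after_attempt, titanic.Age.tolist())
```

The loop still performs one assignment per group; the groupby-based fill avoids the loop entirely.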

Solution:

Group by both columns and fill each group's missing ages with that group's mean age:

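# transform('mean') returns a Series on the original row index, so it can be
# passed straight to fillna; only the missing ages are replaced
titanic['age'] = titanic['age'].fillna(
    titanic.groupby(['survived', 'embarked'])['age'].transform('mean'))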
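To try the group-wise fill without downloading the seaborn dataset, here is a minimal sketch on a small invented frame that uses the seaborn-style lowercase column names. It relies on `groupby(...).transform("mean")`, which broadcasts each group's mean back to every row of that group. The `apply(lambda x: x.fillna(x.mean()))` form computes the same numbers, but in pandas 2.x `groupby(...).apply` adds the group keys to the result's index by default (unless `group_keys=False` is passed), which can break assigning the result back to a column. The last row has a missing `embarked`, so it belongs to no group; because the fill goes through `fillna`, its existing age is left untouched.

```python
import numpy as np
import pandas as pd

# Invented sample using the seaborn column names from the answer.
titanic = pd.DataFrame({
    "survived": [0, 0, 1, 1, 1, 0],
    "embarked": ["S", "S", "C", "C", "C", np.nan],
    "age": [30.0, np.nan, 20.0, 26.0, np.nan, 62.0],
})

# Mean age of each (survived, embarked) group, aligned to every row.
group_mean = titanic.groupby(["survived", "embarked"])["age"].transform("mean")

# Fill only the missing ages; existing ages are never overwritten.
titanic["age"] = titanic["age"].fillna(group_mean)
print(titanic)
```

If every age in a group is missing, that group's mean is NaN and its rows stay NaN, so a fallback (for example an overall mean) may still be needed.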

import seaborn as sns

titanic = sns.load_dataset('titanic')

# check NaN rows in age
print(titanic[titanic['age'].isnull()].head(10))

    survived  pclass     sex  age  sibsp  parch      fare  embarked   class  \
5          0       3    male  NaN      0      0    8.4583         Q   Third
17         1       2    male  NaN      0      0   13.0000         S  Second
19         1       3  female  NaN      0      0    7.2250         C   Third
26         0       3    male  NaN      0      0    7.2250         C   Third
28         1       3  female  NaN      0      0    7.8792         Q   Third
29         0       3    male  NaN      0      0    7.8958         S   Third
31         1       1  female  NaN      1      0  146.5208         C   First
32         1       3  female  NaN      0      0    7.7500         Q   Third
36         1       3    male  NaN      0      0    7.2292         C   Third
42         0       3    male  NaN      0      0    7.8958         C   Third

      who  adult_male  deck  embark_town  alive  alone
5     man        True   NaN   Queenstown     no   True
17    man        True   NaN  Southampton    yes   True
19  woman       False   NaN    Cherbourg    yes   True
26    man        True   NaN    Cherbourg     no   True
28  woman       False   NaN   Queenstown    yes   True
29    man        True   NaN  Southampton     no   True
31  woman       False     B    Cherbourg    yes  False
32  woman       False   NaN   Queenstown    yes   True
36    man        True   NaN    Cherbourg    yes   True
42    man        True   NaN    Cherbourg     no   True

idx = titanic[titanic['age'].isnull()].index

titanic['age'] = titanic['age'].fillna(
    titanic.groupby(['survived', 'embarked'])['age'].transform('mean'))

# check that the values were replaced
print(titanic.loc[idx].head(10))

    survived  pclass     sex        age  sibsp  parch      fare  embarked  \
5          0       3    male  30.325000      0      0    8.4583         Q
17         1       2    male  28.113184      0      0   13.0000         S
19         1       3  female  28.973671      0      0    7.2250         C
26         0       3    male  33.666667      0      0    7.2250         C
28         1       3  female  22.500000      0      0    7.8792         Q
29         0       3    male  30.203966      0      0    7.8958         S
31         1       1  female  28.973671      1      0  146.5208         C
32         1       3  female  22.500000      0      0    7.7500         Q
36         1       3    male  28.973671      0      0    7.2292         C
42         0       3    male  33.666667      0      0    7.8958         C

     class    who  adult_male  deck  embark_town  alive  alone
5    Third    man        True   NaN   Queenstown     no   True
17  Second    man        True   NaN  Southampton    yes   True
19   Third  woman       False   NaN    Cherbourg    yes   True
26   Third    man        True   NaN    Cherbourg     no   True
28   Third  woman       False   NaN   Queenstown    yes   True
29   Third    man        True   NaN  Southampton     no   True
31   First  woman       False     B    Cherbourg    yes  False
32   Third  woman       False   NaN   Queenstown    yes   True
36   Third    man        True   NaN    Cherbourg    yes   True
42   Third    man        True   NaN    Cherbourg     no   True

# check mean values
print(titanic.groupby(['survived', 'embarked'])['age'].mean())

survived  embarked
0         C           33.666667
          Q           30.325000
          S           30.203966
1         C           28.973671
          Q           22.500000
          S           28.113184
Name: age, dtype: float64
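The printed check compares the filled rows with the group means by eye. The same comparison can be made explicit with assertions: record the group means before filling, fill, then check that every formerly missing row now holds its own group's mean. Filling NaNs with the mean of the non-null values also leaves each group's mean unchanged, which is why the means printed after the fill match the filled values. A minimal sketch on an invented frame (not the seaborn data), using the answer's lowercase column names:

```python
import numpy as np
import pandas as pd

# Invented sample with the answer's column names.
titanic = pd.DataFrame({
    "survived": [0, 0, 0, 1, 1, 1],
    "embarked": ["Q", "Q", "S", "S", "S", "C"],
    "age": [28.0, np.nan, 35.0, np.nan, 24.0, 31.0],
})

idx = titanic[titanic["age"].isnull()].index  # rows that will be filled
means_before = titanic.groupby(["survived", "embarked"])["age"].mean()

titanic["age"] = titanic["age"].fillna(
    titanic.groupby(["survived", "embarked"])["age"].transform("mean"))

# Each filled row should equal the pre-fill mean of its own group.
expected = [means_before.loc[(s, e)] for s, e in
            titanic.loc[idx, ["survived", "embarked"]].itertuples(index=False)]
assert np.allclose(titanic.loc[idx, "age"], expected)

# Filling with the group mean does not change any group's mean.
means_after = titanic.groupby(["survived", "embarked"])["age"].mean()
assert np.allclose(means_before, means_after)
print(titanic.loc[idx])
```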

Tags: missing-data, python, pandas, scikit-learn, machine-learning

Source: https://codeday.me/bug/20191012/1902585.html
