Python: filling null values with the previous value_How can I fill null values in a dataset with Python by matching on two other columns?

I have a Titanic dataset. It has several attributes, and I was working mainly on:

1. Age

2. Embarked (the port from which each passenger embarked; there are 3 ports in total: S, Q and C)

3. Survived (0 for did not survive, 1 for survived)

I was filtering out the useless data. Then I needed to fill the null values present in Age. So I counted how many passengers survived and didn't survive for each Embarked value, i.e. S, Q and C.

I found the mean age of the passengers who survived and who did not survive after embarking from each of the S, Q and C ports. But now I have no idea how to fill these 6 values (3 for those who survived from each of S, Q and C, and 3 for those who did not, so 6 in total) into the original titanic Age column. If I simply do titanic.Age.fillna('one of the six values'), it will fill all the null values of Age with that one value, which I don't want.

After spending some time on it, I tried this:

titanic[titanic.Survived==1][titanic.Embarked=='S'].Age.fillna(SurvivedS.Age.mean(),inplace=True)

titanic[titanic.Survived==1][titanic.Embarked=='Q'].Age.fillna(SurvivedQ.Age.mean(),inplace=True)

titanic[titanic.Survived==1][titanic.Embarked=='C'].Age.fillna(SurvivedC.Age.mean(),inplace=True)

titanic[titanic.Survived==0][titanic.Embarked=='S'].Age.fillna(DidntSurvivedS.Age.mean(),inplace=True)

titanic[titanic.Survived==0][titanic.Embarked=='Q'].Age.fillna(DidntSurvivedQ.Age.mean(),inplace=True)

titanic[titanic.Survived==0][titanic.Embarked=='C'].Age.fillna(DidntSurvivedC.Age.mean(),inplace=True)

This raised no error, but it still doesn't work. Any idea what I should do?

Solution

I think you need groupby with apply, using fillna with the mean of each group:

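# group_keys=False keeps the original row index so the result aligns on assignment
titanic['age'] = (titanic.groupby(['survived','embarked'], group_keys=False)['age']
                         .apply(lambda x: x.fillna(x.mean())))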
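A variant worth knowing uses transform, which returns the group mean aligned to the original rows, so it can be passed straight to fillna. The sketch below is only an illustration: the small frame and its values are made up (not taken from the Titanic data) so that it runs without downloading anything. Because fillna only touches missing entries, existing ages stay untouched even in rows whose group key is itself missing.

```python
import numpy as np
import pandas as pd

# Toy data, invented for illustration only
df = pd.DataFrame({
    'survived': [0, 0, 0, 1, 1, 1, 1],
    'embarked': ['S', 'S', 'S', 'C', 'C', 'C', None],
    'age':      [20.0, 40.0, np.nan, 10.0, np.nan, 30.0, 50.0],
})

# Mean age of each (survived, embarked) group, broadcast back to the rows
group_mean = df.groupby(['survived', 'embarked'])['age'].transform('mean')

# Fill only the missing ages; ages that are already present are kept
df['age'] = df['age'].fillna(group_mean)
print(df)
```

Here the missing age in the (0, S) group becomes the mean of 20 and 40, and the one in the (1, C) group becomes the mean of 10 and 30.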

import seaborn as sns

titanic = sns.load_dataset('titanic')

#check NaN rows in age

print (titanic[titanic['age'].isnull()].head(10))

survived pclass sex age sibsp parch fare embarked class \

5 0 3 male NaN 0 0 8.4583 Q Third

17 1 2 male NaN 0 0 13.0000 S Second

19 1 3 female NaN 0 0 7.2250 C Third

26 0 3 male NaN 0 0 7.2250 C Third

28 1 3 female NaN 0 0 7.8792 Q Third

29 0 3 male NaN 0 0 7.8958 S Third

31 1 1 female NaN 1 0 146.5208 C First

32 1 3 female NaN 0 0 7.7500 Q Third

36 1 3 male NaN 0 0 7.2292 C Third

42 0 3 male NaN 0 0 7.8958 C Third

who adult_male deck embark_town alive alone

5 man True NaN Queenstown no True

17 man True NaN Southampton yes True

19 woman False NaN Cherbourg yes True

26 man True NaN Cherbourg no True

28 woman False NaN Queenstown yes True

29 man True NaN Southampton no True

31 woman False B Cherbourg yes False

32 woman False NaN Queenstown yes True

36 man True NaN Cherbourg yes True

42 man True NaN Cherbourg no True

idx = titanic[titanic['age'].isnull()].index

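# group_keys=False keeps the original row index so the result aligns on assignment
titanic['age'] = (titanic.groupby(['survived','embarked'], group_keys=False)['age']
                         .apply(lambda x: x.fillna(x.mean())))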

#check that the values were replaced

print (titanic.loc[idx].head(10))

survived pclass sex age sibsp parch fare embarked \

5 0 3 male 30.325000 0 0 8.4583 Q

17 1 2 male 28.113184 0 0 13.0000 S

19 1 3 female 28.973671 0 0 7.2250 C

26 0 3 male 33.666667 0 0 7.2250 C

28 1 3 female 22.500000 0 0 7.8792 Q

29 0 3 male 30.203966 0 0 7.8958 S

31 1 1 female 28.973671 1 0 146.5208 C

32 1 3 female 22.500000 0 0 7.7500 Q

36 1 3 male 28.973671 0 0 7.2292 C

42 0 3 male 33.666667 0 0 7.8958 C

class who adult_male deck embark_town alive alone

5 Third man True NaN Queenstown no True

17 Second man True NaN Southampton yes True

19 Third woman False NaN Cherbourg yes True

26 Third man True NaN Cherbourg no True

28 Third woman False NaN Queenstown yes True

29 Third man True NaN Southampton no True

31 First woman False B Cherbourg yes False

32 Third woman False NaN Queenstown yes True

36 Third man True NaN Cherbourg yes True

42 Third man True NaN Cherbourg no True

#check mean values

print (titanic.groupby(['survived','embarked'])['age'].mean())

survived embarked

0 C 33.666667

Q 30.325000

S 30.203966

1 C 28.973671

Q 22.500000

S 28.113184

Name: age, dtype: float64
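As for why the attempt in the question ran without error yet changed nothing: titanic[...][...] builds a new, temporary DataFrame, so fillna(..., inplace=True) fills that temporary object rather than titanic itself. A minimal sketch of the same one-group-at-a-time idea, written with a single boolean mask and .loc so that the assignment lands in the original frame, could look like this. The toy data and the helper name fill_group are made up for illustration.

```python
import numpy as np
import pandas as pd

# Toy data, invented for illustration only
df = pd.DataFrame({
    'Survived': [1, 1, 1, 0, 0],
    'Embarked': ['S', 'S', 'S', 'S', 'S'],
    'Age':      [20.0, 30.0, np.nan, 60.0, np.nan],
})

def fill_group(frame, survived, port):
    """Fill missing Age in one (Survived, Embarked) group with that group's mean."""
    in_group = (frame['Survived'] == survived) & (frame['Embarked'] == port)
    group_mean = frame.loc[in_group, 'Age'].mean()
    # .loc with one combined mask writes into `frame` itself, not a temporary copy
    frame.loc[in_group & frame['Age'].isna(), 'Age'] = group_mean

for survived in (0, 1):
    for port in ('S', 'Q', 'C'):
        fill_group(df, survived, port)
print(df)
```

This keeps the six explicit cases from the question, but the groupby version is shorter and does not need the group labels to be listed by hand.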
