How to do data matching in Python: how to use Python matching on two other columns in a dataset...

I think you need groupby here, filling the NaN values in each group with that group's mean:

# fill each missing age with the mean age of its (survived, embarked) group
titanic['age'] = titanic['age'].fillna(
    titanic.groupby(['survived','embarked'])['age'].transform('mean'))
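
To see what this does without downloading anything, here is a minimal sketch on a small hand-made frame. The data is made up for illustration, and only the column names mirror the titanic columns. It uses `transform('mean')` together with `fillna`, which is one common way to write a group-wise mean fill; a lambda passed to `groupby(...).apply` is another, although what index `apply` returns has varied between pandas versions.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for the titanic columns used in this post.
df = pd.DataFrame({
    "survived": [0, 0, 0, 1, 1, 1],
    "embarked": ["S", "S", "S", "C", "C", "C"],
    "age": [20.0, 40.0, np.nan, 10.0, np.nan, 30.0],
})

# Mean age of each (survived, embarked) group, repeated on every row of that group.
group_mean = df.groupby(["survived", "embarked"])["age"].transform("mean")

# Fill only the missing ages; ages that were already present stay untouched.
df["age"] = df["age"].fillna(group_mean)

print(df)
```

The missing age in the (0, S) group becomes 30.0, the mean of 20 and 40, and the one in the (1, C) group becomes 20.0, the mean of 10 and 30.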

import seaborn as sns

titanic = sns.load_dataset('titanic')

# check NaN rows in age
print(titanic[titanic['age'].isnull()].head(10))

    survived  pclass     sex  age  sibsp  parch      fare embarked   class  \
5          0       3    male  NaN      0      0    8.4583        Q   Third
17         1       2    male  NaN      0      0   13.0000        S  Second
19         1       3  female  NaN      0      0    7.2250        C   Third
26         0       3    male  NaN      0      0    7.2250        C   Third
28         1       3  female  NaN      0      0    7.8792        Q   Third
29         0       3    male  NaN      0      0    7.8958        S   Third
31         1       1  female  NaN      1      0  146.5208        C   First
32         1       3  female  NaN      0      0    7.7500        Q   Third
36         1       3    male  NaN      0      0    7.2292        C   Third
42         0       3    male  NaN      0      0    7.8958        C   Third

      who  adult_male deck  embark_town alive  alone
5     man        True  NaN   Queenstown    no   True
17    man        True  NaN  Southampton   yes   True
19  woman       False  NaN    Cherbourg   yes   True
26    man        True  NaN    Cherbourg    no   True
28  woman       False  NaN   Queenstown   yes   True
29    man        True  NaN  Southampton    no   True
31  woman       False    B    Cherbourg   yes  False
32  woman       False  NaN   Queenstown   yes   True
36    man        True  NaN    Cherbourg   yes   True
42    man        True  NaN    Cherbourg    no   True

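# remember which rows had a missing age before filling
idx = titanic[titanic['age'].isnull()].index

# fill each missing age with the mean age of its (survived, embarked) group
titanic['age'] = titanic['age'].fillna(
    titanic.groupby(['survived','embarked'])['age'].transform('mean'))

# check if values were replaced
print(titanic.loc[idx].head(10))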

    survived  pclass     sex        age  sibsp  parch      fare embarked  \
5          0       3    male  30.325000      0      0    8.4583        Q
17         1       2    male  28.113184      0      0   13.0000        S
19         1       3  female  28.973671      0      0    7.2250        C
26         0       3    male  33.666667      0      0    7.2250        C
28         1       3  female  22.500000      0      0    7.8792        Q
29         0       3    male  30.203966      0      0    7.8958        S
31         1       1  female  28.973671      1      0  146.5208        C
32         1       3  female  22.500000      0      0    7.7500        Q
36         1       3    male  28.973671      0      0    7.2292        C
42         0       3    male  33.666667      0      0    7.8958        C

     class    who  adult_male deck  embark_town alive  alone
5    Third    man        True  NaN   Queenstown    no   True
17  Second    man        True  NaN  Southampton   yes   True
19   Third  woman       False  NaN    Cherbourg   yes   True
26   Third    man        True  NaN    Cherbourg    no   True
28   Third  woman       False  NaN   Queenstown   yes   True
29   Third    man        True  NaN  Southampton    no   True
31   First  woman       False    B    Cherbourg   yes  False
32   Third  woman       False  NaN   Queenstown   yes   True
36   Third    man        True  NaN    Cherbourg   yes   True
42   Third    man        True  NaN    Cherbourg    no   True

# check mean values
print(titanic.groupby(['survived','embarked'])['age'].mean())

survived  embarked
0         C           33.666667
          Q           30.325000
          S           30.203966
1         C           28.973671
          Q           22.500000
          S           28.113184
Name: age, dtype: float64
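
The group means printed above are computed after the fill, yet they match the values that were filled in. That is expected: replacing a group's missing values with the group's own mean leaves that mean unchanged. A small sketch of this check, again on made-up toy data (the frame and the variable names `means_before`, `means_after`, `n_missing` and `means_unchanged` are illustrative, not part of the original answer):

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; only the column names mirror the titanic dataset.
df = pd.DataFrame({
    "survived": [0, 0, 0, 1, 1, 1],
    "embarked": ["S", "S", "S", "C", "C", "C"],
    "age": [20.0, 40.0, np.nan, 10.0, np.nan, 30.0],
})
keys = ["survived", "embarked"]

# Group means before filling (NaN values are skipped by mean()).
means_before = df.groupby(keys)["age"].mean()

# Fill missing ages with the mean of the matching group.
df["age"] = df["age"].fillna(df.groupby(keys)["age"].transform("mean"))

# Group means after filling, plus a count of any remaining NaN values.
means_after = df.groupby(keys)["age"].mean()
n_missing = int(df["age"].isna().sum())
means_unchanged = bool(np.allclose(means_before, means_after))

print(n_missing)        # number of ages still missing
print(means_unchanged)  # whether the group means survived the fill
```

If a group had no known ages at all, its mean would itself be NaN and the rows in that group would stay unfilled, so counting the remaining NaN values is a useful sanity check on real data.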
