Python/Pandas DataFrame: replace 0 with the median

I have a Python pandas DataFrame with several columns, and one column contains 0 values. I want to replace the 0 values with the median or mean of that column.

Here data is my DataFrame and artist_hotness is the column.

mean_artist_hotness = data['artist_hotness'].dropna().mean()

if len(data.artist_hotness[ data.artist_hotness.isnull() ]) > 0:

data.artist_hotness.loc[ (data.artist_hotness.isnull()), 'artist_hotness'] = mean_artist_hotness

I tried this, but it is not working.

Solution

I think you can use mask, and pass skipna=True to mean instead of calling dropna. You also need to change the condition: use data.artist_hotness == 0 to replace 0 values, or data.artist_hotness.isnull() to replace NaN values:

import pandas as pd

import numpy as np

data = pd.DataFrame({'artist_hotness': [0,1,5,np.nan]})

print (data)

artist_hotness

0 0.0

1 1.0

2 5.0

3 NaN

mean_artist_hotness = data['artist_hotness'].mean(skipna=True)

print (mean_artist_hotness)

2.0

data['artist_hotness']=data.artist_hotness.mask(data.artist_hotness == 0,mean_artist_hotness)

print (data)

artist_hotness

0 2.0

1 1.0

2 5.0

3 NaN
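
The question also asks about the median, which the example above does not cover. The following is a minimal sketch of the same mask approach with Series.median. The names median_with_zeros and median_without_zeros are illustrative, and excluding the zeros before computing the statistic is an assumption about the intent (zeros standing in for missing data), not something the question states.

```python
import numpy as np
import pandas as pd

data = pd.DataFrame({'artist_hotness': [0, 1, 5, np.nan]})
col = data['artist_hotness']

# median() skips NaN by default, but the 0 values still count toward it.
median_with_zeros = col.median()

# Assumption: 0 marks a missing value, so leave it out of the statistic.
# NaN != 0 is True, so NaN rows stay in the selection and median() skips them.
median_without_zeros = col[col != 0].median()

# Replace only the 0 entries; NaN entries are left untouched.
data['artist_hotness'] = col.mask(col == 0, median_without_zeros)
print(data)
```

With this sample data the two medians differ (1.0 with the zero, 3.0 without), so it is worth deciding which one is wanted before replacing.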

Alternatively use loc, but call it on the DataFrame rather than on the column (omit the column name before loc):

data.loc[data.artist_hotness == 0, 'artist_hotness'] = mean_artist_hotness

print (data)

artist_hotness

0 2.0

1 1.0

2 5.0

3 NaN

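Calling loc on the column and also passing the column name, as in the question, raises an error because a Series has no column axis:

data.artist_hotness.loc[data.artist_hotness == 0, 'artist_hotness'] = mean_artist_hotness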

print (data)

IndexingError: (0 True

1 False

2 False

3 False

Name: artist_hotness, dtype: bool, 'artist_hotness')
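
If you do want to index the column itself, a Series-level loc takes only the row selector. The sketch below works on an explicit copy and assigns it back, which is an assumption made here to avoid relying on chained assignment; the variable name hotness is illustrative.

```python
import numpy as np
import pandas as pd

data = pd.DataFrame({'artist_hotness': [0, 1, 5, np.nan]})
mean_artist_hotness = data['artist_hotness'].mean()  # NaN is skipped by default

# Work on an explicit copy so the result does not depend on whether
# the selected column is a view of the DataFrame.
hotness = data['artist_hotness'].copy()

# A Series has a single axis, so loc gets the boolean row mask only.
hotness.loc[hotness == 0] = mean_artist_hotness

# Write the modified column back into the DataFrame.
data['artist_hotness'] = hotness
print(data)
```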

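Another solution is DataFrame.replace with a nested dict that names the column, so only that column is changed (the output below is for a DataFrame that also has a column aa, which is left untouched):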

data=data.replace({'artist_hotness': {0: mean_artist_hotness}})

print (data)

aa artist_hotness

0 0.0 2.0

1 1.0 1.0

2 5.0 5.0

3 NaN NaN

Or, if you need to replace all 0 values in all columns:

import pandas as pd

import numpy as np

data = pd.DataFrame({'artist_hotness': [0,1,5,np.nan], 'aa': [0,1,5,np.nan]})

print (data)

aa artist_hotness

0 0.0 0.0

1 1.0 1.0

2 5.0 5.0

3 NaN NaN

mean_artist_hotness = data['artist_hotness'].mean(skipna=True)

print (mean_artist_hotness)

2.0

data=data.replace(0,mean_artist_hotness)

print (data)

aa artist_hotness

0 2.0 2.0

1 1.0 1.0

2 5.0 5.0

3 NaN NaN
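
The example above fills every column with the mean of artist_hotness. If each column should instead get its own mean, one possible sketch is to apply the mask column by column. The different values in aa are made up here so that the two means differ.

```python
import numpy as np
import pandas as pd

# Hypothetical data: 'aa' holds different values so the column means differ.
data = pd.DataFrame({'artist_hotness': [0, 1, 5, np.nan],
                     'aa': [0, 2, 10, np.nan]})

# For each column, replace its 0 entries with that column's own mean.
# mean() skips NaN by default but still counts the zeros.
data = data.apply(lambda col: col.mask(col == 0, col.mean()))
print(data)
```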

If you need to replace NaN in all columns, use DataFrame.fillna:

data=data.fillna(mean_artist_hotness)

print (data)

aa artist_hotness

0 0.0 0.0

1 1.0 1.0

2 5.0 5.0

3 2.0 2.0

But if you need to replace NaN only in some columns, use Series.fillna:

data['artist_hotness'] = data.artist_hotness.fillna(mean_artist_hotness)

print (data)

aa artist_hotness

0 0.0 0.0

1 1.0 1.0

2 5.0 5.0

3 NaN 2.0
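
DataFrame.fillna also accepts a Series or a dict keyed by column name, which gives per-column fill values in one call. The sketch below uses made-up values in aa and the illustrative names filled_all and filled_one.

```python
import numpy as np
import pandas as pd

# Hypothetical data: 'aa' holds different values so the column means differ.
data = pd.DataFrame({'artist_hotness': [0, 1, 5, np.nan],
                     'aa': [0, 2, 10, np.nan]})

# data.mean() is a Series indexed by column name, so each column's NaN
# is filled with that column's own mean.
filled_all = data.fillna(data.mean())

# A dict restricts the fill to the listed columns; 'aa' keeps its NaN.
filled_one = data.fillna({'artist_hotness': data['artist_hotness'].mean()})

print(filled_all)
print(filled_one)
```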
