How to remove outliers in Python?

I want to remove outliers from my dataset "train", using either the z-score or the IQR method.

I'm running Jupyter notebook on Microsoft Python Client for SQL Server.

I've tried for z-score:

from scipy import stats

train[(np.abs(stats.zscore(train)) < 3).all(axis=1)]

for IQR:

Q1 = train.quantile(0.02)

Q3 = train.quantile(0.98)

IQR = Q3 - Q1

train = train[~((train < (Q1 - 1.5 * IQR)) | (train > (Q3 + 1.5 * IQR))).any(axis=1)]

...which returns...

for z-score:

TypeError: unsupported operand type(s) for /: 'str' and 'int'

for IQR:

TypeError: unorderable types: str() < float()

My train dataset looks like:

# Number of each type of column

print('Training data shape: ', train.shape)

train.dtypes.value_counts()

Training data shape:  (300000, 111)

int32      66
float64    30
object     15
dtype: int64

Help would be appreciated.

Solution

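Both snippets fail because they run on the whole frame, including the 15 object (string) columns: zscore cannot do arithmetic on strings, and the IQR comparison cannot order a str against a float.

To avoid this, first split train into its numerical and non-numerical columns:

import numpy as np

import pandas as pd

from scipy import stats

num_train = train.select_dtypes(include=["number"])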

cat_train = train.select_dtypes(exclude=["number"])
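
As a rough illustration of what the split produces, here is a minimal sketch on a toy frame. The column names and values are invented for the example and do not come from the original dataset.

```python
import pandas as pd

# Hypothetical toy data standing in for `train`
train = pd.DataFrame({
    "age": [25, 32, 47, 51],
    "income": [40000.0, 52000.0, 61000.0, 58000.0],
    "city": ["Oslo", "Lima", "Pune", "Kyiv"],
})

# Numeric dtypes (int, float) go to one frame, everything else to the other
num_train = train.select_dtypes(include=["number"])
cat_train = train.select_dtypes(exclude=["number"])

print(list(num_train.columns))
print(list(cat_train.columns))
```

Both pieces keep the original row index, which is what makes it possible to filter them with the same mask later.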

then compute a boolean mask of the rows to keep (taking the absolute value, so that large negative z-scores are caught as well):

idx = (np.abs(stats.zscore(num_train)) < 3).all(axis=1)

and finally put the two pieces back together:

train_cleaned = pd.concat([num_train.loc[idx], cat_train.loc[idx]], axis=1)
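
The z-score steps can be tried end to end on a small frame. This is only a sketch under stated assumptions: the data is made up (19 ordinary values plus one extreme value in "x"), and it takes the absolute z-score and indexes the original frame directly rather than concatenating the two pieces, which is a variation on the approach above that keeps the original column order.

```python
import numpy as np
import pandas as pd
from scipy import stats

# Hypothetical toy data: 19 ordinary values plus one extreme value in "x"
train = pd.DataFrame({
    "x": list(np.linspace(-1, 1, 19)) + [50.0],
    "y": np.linspace(8, 12, 20),
    "label": ["a", "b"] * 10,
})

num_train = train.select_dtypes(include=["number"])

# Absolute z-score, so both very low and very high values are flagged
z = np.abs(stats.zscore(num_train))
idx = np.asarray(z < 3).all(axis=1)

# Boolean indexing on the original frame keeps the column order intact
train_cleaned = train.loc[idx]
print(train.shape, train_cleaned.shape)
```

Only the row holding the extreme value is dropped here; the string column never enters the z-score computation.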

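For the IQR part (the 0.02 and 0.98 quantiles are kept from the question; the conventional IQR uses the 0.25 and 0.75 quantiles):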

Q1 = num_train.quantile(0.02)

Q3 = num_train.quantile(0.98)

IQR = Q3 - Q1

idx = ~((num_train < (Q1 - 1.5 * IQR)) | (num_train > (Q3 + 1.5 * IQR))).any(axis=1)

train_cleaned = pd.concat([num_train.loc[idx], cat_train.loc[idx]], axis=1)
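
The choice of quantiles matters a lot for this filter. The sketch below wraps the same logic in a helper, `iqr_mask`, which is a hypothetical name introduced for this example, and runs it on made-up data with both the conventional quartiles and the wider quantiles from the question.

```python
import pandas as pd

def iqr_mask(num_df, lower_q=0.25, upper_q=0.75, k=1.5):
    """Return a boolean Series: True for rows with no value outside the fences."""
    q_low = num_df.quantile(lower_q)
    q_high = num_df.quantile(upper_q)
    spread = q_high - q_low
    outside = (num_df < (q_low - k * spread)) | (num_df > (q_high + k * spread))
    return ~outside.any(axis=1)

# Hypothetical toy data with one obvious outlier
train = pd.DataFrame({
    "x": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 100.0],
    "label": list("abcdefgh"),
})
num_train = train.select_dtypes(include=["number"])

# Conventional quartiles: the row with x == 100 is dropped
train_cleaned = train.loc[iqr_mask(num_train)]
print(len(train), len(train_cleaned))

# Quantiles from the question: the fences become so wide that nothing is dropped
print(iqr_mask(num_train, 0.02, 0.98).all())
```

On this toy frame the 0.02/0.98 variant keeps every row, so it may be worth checking on real data whether such wide quantiles still remove what you consider outliers.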

Please let us know if you have any further questions.

PS

You might also consider pandas.DataFrame.clip, which caps outlying values at chosen bounds on a per-value basis instead of dropping the whole row.
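
A minimal sketch of the clipping approach, assuming per-column bounds taken from the 5th and 95th percentiles. Those percentiles and the toy data are arbitrary choices for illustration, not something the answer prescribes.

```python
import pandas as pd

# Hypothetical toy data with one obvious outlier
train = pd.DataFrame({
    "x": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 100.0],
    "label": list("abcdefgh"),
})
num_cols = train.select_dtypes(include=["number"]).columns

# Per-column bounds; the 5th/95th percentiles are an arbitrary choice
lower = train[num_cols].quantile(0.05)
upper = train[num_cols].quantile(0.95)

# Cap values outside the bounds; axis=1 aligns the bounds with the columns
train_clipped = train.copy()
train_clipped[num_cols] = train[num_cols].clip(lower=lower, upper=upper, axis=1)
print(train_clipped.shape)
```

No rows are removed: the extreme value is pulled back to the upper bound, and the non-numeric column is left untouched.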
