Python outlier removal: removing outliers from data in Python

Latest recommended article published on 2024-05-04 11:24:03

桃花欲语春不归


Tags: Python outlier removal

I have a data frame like the following:

    ID  Value
    A      70
    A      80
    B      75
    C      10
    B      50
    A    1000
    C      60
    B    2000
    ..     ..

I would like to group this data by ID, remove the outliers from each group (the points a boxplot would show as outliers), and then calculate the mean.

So far I have:

    grouped = df.groupby('ID')
    statBefore = pd.DataFrame({
        'mean': grouped['Value'].mean(),
        'median': grouped['Value'].median(),
        'std': grouped['Value'].std(),
    })

How can I find the outliers, remove them, and then get the statistics?

Solution

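I believe the method you're referring to is to remove values that lie more than 1.5 times the interquartile range (IQR) away from the median. Note that a standard boxplot measures its whiskers from the quartiles (Q1 - 1.5 * IQR and Q3 + 1.5 * IQR) rather than from the median, so the rule used here is somewhat stricter than what the boxplot shows. First, calculate your initial statistics: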

    statBefore = pd.DataFrame({
        'q1': grouped['Value'].quantile(.25),
        'median': grouped['Value'].median(),
        'q3': grouped['Value'].quantile(.75),
    })

And then determine whether values in the original DF are outliers:

    def is_outlier(row):
        # Look up the per-group statistics for this row's ID.
        stats = statBefore.loc[row.ID]
        iq_range = stats['q3'] - stats['q1']
        median = stats['median']
        # Flag values more than 1.5 * IQR above or below the group median.
        return (row.Value > median + 1.5 * iq_range
                or row.Value < median - 1.5 * iq_range)

    # Apply the function to the original df:
    df['outlier'] = df.apply(is_outlier, axis=1)

    # Filter to only non-outliers:
    df_no_outliers = df[~df['outlier']]
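The question also asks for the statistics after the outliers are removed, which the answer stops short of. The following is a minimal sketch of one way to do both steps without a row-wise `apply`. It swaps in the usual boxplot fences (Q1 - 1.5 * IQR and Q3 + 1.5 * IQR) instead of the median-based rule above. The sample data, the helper name `remove_group_outliers`, and its parameters are assumptions made for illustration, not part of the original answer.

```python
import pandas as pd

# Hypothetical sample data, loosely based on the rows shown in the question.
df = pd.DataFrame({
    "ID": ["A", "A", "A", "A", "A", "B", "B", "B", "B", "B"],
    "Value": [70, 80, 75, 85, 1000, 75, 50, 60, 65, 2000],
})


def remove_group_outliers(frame, key="ID", col="Value", k=1.5):
    """Drop rows outside [Q1 - k*IQR, Q3 + k*IQR], computed per group."""
    grouped = frame.groupby(key)[col]
    # transform broadcasts each group's quartile back onto that group's rows.
    q1 = grouped.transform(lambda s: s.quantile(0.25))
    q3 = grouped.transform(lambda s: s.quantile(0.75))
    iqr = q3 - q1
    keep = frame[col].between(q1 - k * iqr, q3 + k * iqr)
    return frame[keep]


df_no_outliers = remove_group_outliers(df)

# Per-group statistics after the outliers are gone.
stat_after = df_no_outliers.groupby("ID")["Value"].agg(["mean", "median", "std"])
print(stat_after)
```

With this sample data, the values 1000 (group A) and 2000 (group B) fall outside their group's fences and are dropped before the statistics are computed. Very small groups give unstable quartiles, so the result is only as meaningful as the group sizes allow.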
