Outlier Filtering


Anderson29


Category: Machine Learning | Tags: machine learning, python

Article link: https://blog.csdn.net/sinat_30915819/article/details/76006841


Outlier filtering: a Python implementation

Implementing Tukey's method
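For reference, here is Tukey's rule as the code below applies it to each column. A value $x$ is flagged when it falls outside the "fences" built from the first and third quartiles. The factor 1.5 is the `1.5 * IQR` outlier step used in the code.

$$
\mathrm{IQR} = Q_3 - Q_1, \qquad x \text{ is an outlier if } x < Q_1 - 1.5\,\mathrm{IQR} \;\text{ or }\; x > Q_3 + 1.5\,\mathrm{IQR}
$$

`detect_outliers` then counts how many of the given features flag each row. It returns only the rows flagged in more than `n` features.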

Code

```python
# Outlier detection
from collections import Counter

import numpy as np


def detect_outliers(df, n, features):
    """
    Tukey's method (1.5 * IQR fences).

    Returns the index labels of the rows of `df` that are outliers
    in more than `n` of the columns listed in `features`.
    """
    outlier_indices = []

    # iterate over features (columns)
    for col in features:
        # 1st quartile (25%); nanpercentile ignores missing values
        Q1 = np.nanpercentile(df[col], 25)
        # 3rd quartile (75%)
        Q3 = np.nanpercentile(df[col], 75)
        # Interquartile range (IQR)
        IQR = Q3 - Q1

        # outlier step: the fences lie 1.5 * IQR beyond the quartiles
        outlier_step = 1.5 * IQR

        # Determine a list of indices of outliers for feature col
        outlier_list_col = df[(df[col] < Q1 - outlier_step) | (df[col] > Q3 + outlier_step)].index

        # append the found outlier indices for col to the list of outlier indices
        outlier_indices.extend(outlier_list_col)

    # select observations that are outliers in more than n features
    outlier_counts = Counter(outlier_indices)
    multiple_outliers = [k for k, v in outlier_counts.items() if v > n]

    return multiple_outliers
```
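`detect_outliers` only returns the flagged index labels. To actually filter the data you still have to drop those rows, e.g. `df.drop(index=detect_outliers(df, 2, features))`.

Below is an alternative vectorized pandas sketch that applies the same 1.5 × IQR fences per column and drops the over-flagged rows in one step. It makes some assumptions:

- pandas is installed and the `features` columns are numeric.
- The helper name `filter_outliers` and the sample data are hypothetical. They are not from the original post.

```python
import pandas as pd


def filter_outliers(df: pd.DataFrame, n: int, features: list) -> pd.DataFrame:
    """Hypothetical helper (not from the original post): drop rows that
    Tukey's 1.5*IQR rule flags in more than n of the `features` columns."""
    sub = df[features]
    # Per-column quartiles; pandas skips NaN and, like np.percentile,
    # uses linear interpolation by default.
    q1 = sub.quantile(0.25)
    q3 = sub.quantile(0.75)
    step = 1.5 * (q3 - q1)
    # Boolean frame: True where a value lies outside its column's fences.
    # Comparing a DataFrame with a Series aligns on column names.
    is_outlier = (sub < q1 - step) | (sub > q3 + step)
    # Number of features in which each row is an outlier.
    counts = is_outlier.sum(axis=1)
    # Keep rows that are outliers in at most n features.
    return df[counts <= n]


df = pd.DataFrame({
    "a": [1, 2, 3, 4, 100],
    "b": [10, 11, 12, 13, 14],
    "c": [5, 6, 7, 8, 90],
})
# Row 4 lies outside the fences in both "a" and "c", so n=1 drops it.
print(filter_outliers(df, n=1, features=["a", "b", "c"]))
```

Counting per row with a boolean frame replaces the per-column Python loop and the `Counter`. The selection rule matches `detect_outliers`: a row is removed exactly when it is an outlier in more than `n` features.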
