python剔除列表异常值_python – 从数据集中删除异常值

一旦我使用One-class SVM或Elliptic Envelope在我的数据集中识别出异常值,我如何使用这些模型从数据集中删除异常值?

Here是我正在看的例子.

解决方法:

This example略微不透明,因为它不会循环通过未命名的模型.我必须同意你的观点,因为人们关注的是训练方法,所以预测方法经常在SKL手册中被掩盖.

但是……预测方法将返回1或1的向量,对应于非异常值和异常值.

这是我上面引用的原始示例代码:

print(__doc__)

import numpy as np

import matplotlib.pyplot as plt

import matplotlib.font_manager

from scipy import stats

from sklearn import svm

from sklearn.covariance import EllipticEnvelope

# Example settings

n_samples = 200

outliers_fraction = 0.25

clusters_separation = [0, 1, 2]

# define two outlier detection tools to be compared

classifiers = {

"One-Class SVM": svm.OneClassSVM(nu=0.95 * outliers_fraction + 0.05,

kernel="rbf", gamma=0.1),

"robust covariance estimator": EllipticEnvelope(contamination=.1)}

# Compare given classifiers under given settings

xx, yy = np.meshgrid(np.linspace(-7, 7, 500), np.linspace(-7, 7, 500))

n_inliers = int((1. - outliers_fraction) * n_samples)

n_outliers = int(outliers_fraction * n_samples)

ground_truth = np.ones(n_samples, dtype=int)

ground_truth[-n_outliers:] = 0

# Fit the problem with varying cluster separation

for i, offset in enumerate(clusters_separation):

np.random.seed(42)

# Data generation

X1 = 0.3 * np.random.randn(0.5 * n_inliers, 2) - offset

X2 = 0.3 * np.random.randn(0.5 * n_inliers, 2) + offset

X = np.r_[X1, X2]

# Add outliers

X = np.r_[X, np.random.uniform(low=-6, high=6, size=(n_outliers, 2))]

# Fit the model with the One-Class SVM

plt.figure(figsize=(10, 5))

for i, (clf_name, clf) in enumerate(classifiers.items()):

# fit the data and tag outliers

clf.fit(X)

y_pred = clf.decision_function(X).ravel()

threshold = stats.scoreatpercentile(y_pred,

100 * outliers_fraction)

y_pred = y_pred > threshold

n_errors = (y_pred != ground_truth).sum()

# plot the levels lines and the points

Z = clf.decision_function(np.c_[xx.ravel(), yy.ravel()])

Z = Z.reshape(xx.shape)

subplot = plt.subplot(1, 2, i + 1)

subplot.set_title("Outlier detection")

subplot.contourf(xx, yy, Z, levels=np.linspace(Z.min(), threshold, 7),

cmap=plt.cm.Blues_r)

a = subplot.contour(xx, yy, Z, levels=[threshold],

linewidths=2, colors='red')

subplot.contourf(xx, yy, Z, levels=[threshold, Z.max()],

colors='orange')

b = subplot.scatter(X[:-n_outliers, 0], X[:-n_outliers, 1], c='white')

c = subplot.scatter(X[-n_outliers:, 0], X[-n_outliers:, 1], c='black')

subplot.axis('tight')

subplot.legend(

[a.collections[0], b, c],

['learned decision function', 'true inliers', 'true outliers'],

prop=matplotlib.font_manager.FontProperties(size=11))

subplot.set_xlabel("%d. %s (errors: %d)" % (i + 1, clf_name,n_errors))

subplot.set_xlim((-7, 7))

subplot.set_ylim((-7, 7))

plt.subplots_adjust(0.04, 0.1, 0.96, 0.94, 0.1, 0.26)

plt.show()

此命令将返回1和-1的向量:

In [7]: clf.predict(X)

Out[7]:

array([ 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,

1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,

1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,

1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,

1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,

1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,

1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,

1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,

1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, -1,

1, -1, 1, 1, -1, 1, -1, 1, -1, -1, 1, 1, 1, 1, 1, 1, -1,

1, 1, -1, -1, -1, -1, -1, -1, 1, 1, 1, 1, 1, 1, 1, 1, 1,

-1, 1, -1, -1, -1, -1, -1, 1, -1, 1, 1, 1, 1])

然后你可以只选择非常数:

In [14]: X[clf.predict(X)==1]

Out[14]: array([[-1.85098575, -2.04147929],

[-1.80569344, -1.54309104],

[-2.07024601, -2.07024109],

[-1.52623616, -1.76976958],

[-2.14084232, -1.83723199],

[-2.13902531, -2.13971893],

[-1.92741132, -2.57398407],

[-2.51747535, -2.16868626],

[-2.30384934, -1.9057258 ],

并将已清理的数据分配给新值:

X_no_outliers=X[clf.predict(X)==1]

希望这可以帮助!

标签:python,scikit-learn

来源: https://codeday.me/bug/20190611/1222059.html

评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值