[Machine Learning] Outlier Detection with sklearn

Latest recommended article published on 2024-07-08 03:23:31

煎饼证


Category: Machine Learning. Tags: outliers, sklearn, machine learning

Article link: https://blog.csdn.net/jianbinzheng/article/details/80066178


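This post discusses anomaly detection in machine learning, including the difference between novelty detection and outlier detection. It focuses on the methods provided by sklearn: one-class SVM (OneClassSVM), the elliptic envelope (EllipticEnvelope), Isolation Forest, and Local Outlier Factor, and compares them. It also mentions other anomaly detection approaches such as the normal distribution, Mahalanobis distance, and DBSCAN.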


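Outlier detection methods can be used to find or flag outliers, and also for binary classification when the two classes are extremely imbalanced.
sklearn provides several outlier detection methods.
Documentation: 2.7. Novelty and Outlier Detection
Example: Outlier detection with several methods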

Note the difference between novelty and outlier detection
OneClassSVM
EllipticEnvelope
Isolation Forest
Local Outlier Factor
Comparison of the methods above
Some other anomaly detection methods

Note the difference between novelty and outlier detection

novelty detection:
The training data is not polluted by outliers, and we are interested in detecting anomalies in new observations.
outlier detection:
The training data contains outliers, and we need to fit the central mode of the training data, ignoring the deviant observations.

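In other words, novelty detection requires that all training data be normal, with no anomalous points; the model is then used to decide whether newly arriving points are anomalous. OneClassSVM belongs to this category.
Outlier detection, by contrast, allows anomalous points in the training data; the model fits the bulk of the training data as closely as it can and ignores the anomalous points. EllipticEnvelope, IsolationForest, and LocalOutlierFactor belong to this category.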
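The two workflows can be sketched side by side. This is only a minimal illustration, not code from the original post: the synthetic data, the variable names, and the parameter values (`gamma=0.5`, `nu=0.05`, the `contamination` fraction) are assumptions chosen for the example.

```python
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.svm import OneClassSVM

rng = np.random.RandomState(0)

# Illustrative synthetic data (an assumption, not from the original post):
# normal points cluster around the origin, anomalies lie far away from it.
X_clean = 0.3 * rng.randn(200, 2)
X_far = rng.uniform(low=4.0, high=6.0, size=(10, 2))

# Novelty detection: fit on clean data only, then judge previously unseen points.
novelty_model = OneClassSVM(kernel="rbf", gamma=0.5, nu=0.05).fit(X_clean)
novelty_labels = novelty_model.predict(X_far)  # +1 = inlier, -1 = anomaly

# Outlier detection: the training set itself already contains the anomalies.
X_mixed = np.vstack([X_clean, X_far])
outlier_model = IsolationForest(contamination=10 / 210, random_state=0)
outlier_labels = outlier_model.fit_predict(X_mixed)  # labels for the training points
outlier_scores = outlier_model.decision_function(X_mixed)  # lower = more anomalous
```

In the first case the model never sees an anomaly during fitting and is asked about new points afterwards. In the second case, fitting and labelling happen on the same contaminated data set.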

OneClassSVM

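One-class SVM. It is closely related to SVDD (the two formulations are equivalent when an RBF kernel is used). In sklearn it is svm.OneClassSVM. References:

Unsupervised | anomaly and outlier detection, one-class classification: OneClassSVM
SVDD (Support Vector Domain Description), part 2
sklearn official documentation: OneClassSVM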

class sklearn.svm.OneClassSVM(kernel='rbf', degree=3, gamma='auto', coef0=0.0, tol=0.001, nu=0.5, shrinking=True, cache_size=200, verbose=False, max_iter=-1, random_state=None)
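As a rough sketch of how this class might be used, the snippet below fits the model with two values of `nu` and compares how much of the training set each one flags, then scores two new points. The data and the helper `flagged_fraction` are hypothetical, introduced only for illustration. `gamma` is passed explicitly because its default value differs between scikit-learn releases.

```python
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.RandomState(42)
X_train = rng.randn(300, 2)  # synthetic stand-in for "normal" training data

def flagged_fraction(nu):
    """Hypothetical helper: share of training points labelled -1 for a given nu."""
    model = OneClassSVM(kernel="rbf", gamma=0.1, nu=nu).fit(X_train)
    return float(np.mean(model.predict(X_train) == -1))

# nu is an upper bound on the fraction of training errors, so a larger nu
# lets the model treat more of the training set as lying outside the boundary.
frac_small_nu = flagged_fraction(0.05)
frac_large_nu = flagged_fraction(0.5)

# Score two new points: one at the centre of the data, one far away from it.
model = OneClassSVM(kernel="rbf", gamma=0.1, nu=0.05).fit(X_train)
X_new = np.array([[0.0, 0.0], [8.0, 8.0]])
labels = model.predict(X_new)            # +1 = inlier, -1 = anomaly
scores = model.decision_function(X_new)  # signed distance to the learned boundary
```

A negative `decision_function` value corresponds to a predicted label of -1, so the scores can be used to rank points by how anomalous they look rather than only thresholding them.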
-