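Anomaly detection methods can be used to find and identify outliers, and to handle binary classification when the classes are extremely imbalanced.
sklearn provides several anomaly detection methods.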
Documentation: 2.7. Novelty and Outlier Detection
Example: Outlier detection with several methods
Note the difference between novelty detection and outlier detection:
novelty detection:
The training data is not polluted by outliers, and we are interested in detecting anomalies in new observations.
outlier detection:
The training data contains outliers, and we need to fit the central mode of the training data, ignoring the deviant observations.
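In other words, novelty detection requires all training data to be normal, with no anomalous points, and the model is used to decide whether newly added points are anomalous; OneClassSVM belongs to this category.
Outlier detection, by contrast, allows anomalous points in the training data, and the model fits the training data as closely as it can while ignoring those points; EllipticEnvelope, IsolationForest, and LocalOutlierFactor belong to this category.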
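The two settings can be contrasted with a minimal sketch. The synthetic data, the variable names, and the parameter values (`gamma=0.5`, `nu=0.05`, `contamination=0.05`) are assumptions chosen for illustration only, not recommendations from the sklearn documentation.

```python
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.svm import OneClassSVM

rng = np.random.RandomState(0)

# Novelty detection: the training set is assumed to contain only normal points.
X_clean = 0.3 * rng.randn(200, 2)            # tight cluster around the origin
X_new = np.array([[0.0, 0.1], [4.0, 4.0]])   # one nearby point, one far-away point

novelty_model = OneClassSVM(kernel="rbf", gamma=0.5, nu=0.05)
novelty_model.fit(X_clean)
novelty_pred = novelty_model.predict(X_new)  # +1 marks an inlier, -1 an anomaly

# Outlier detection: the training set itself is contaminated.
X_noise = rng.uniform(low=-4, high=4, size=(10, 2))
X_polluted = np.vstack([X_clean, X_noise])

outlier_model = IsolationForest(n_estimators=100, contamination=0.05, random_state=0)
outlier_pred = outlier_model.fit_predict(X_polluted)  # labels for the training points

print("novelty predictions:", novelty_pred)
print("training points flagged as outliers:", int((outlier_pred == -1).sum()))
```

In the first half the model is fitted on clean data and only asked about unseen points; in the second half `fit_predict` labels the same contaminated data it was fitted on.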
OneClassSVM
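One-class SVM, closely related to SVDD (the two formulations are equivalent for kernels such as RBF, where k(x, x) is constant); in sklearn it is svm.OneClassSVM. References:
Unsupervised | Anomaly and outlier detection, one-class classification: OneClassSVM
SVDD (Support Vector Domain Description) (2)
sklearn official documentation: OneClassSVM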
class sklearn.svm.OneClassSVM(kernel='rbf', degree=3, gamma='auto', coef0=0.0, tol=0.001, nu=0.5, shrinking=True, cache_size=200, verbose=False, max_iter=-1, random_state=None)
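As a rough illustration of how the main parameters in this signature behave, the sketch below fits a OneClassSVM on synthetic data. The data and the values `gamma=0.1` and `nu=0.1` are assumptions made for the example. According to the sklearn documentation, `nu` is an upper bound on the fraction of training errors and a lower bound on the fraction of support vectors, so both fractions printed here should land near 0.1, though the exact numbers depend on the data and the solver tolerance.

```python
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.RandomState(42)
X_train = rng.randn(500, 2)  # synthetic "normal" data, for illustration only

# gamma is set explicitly so the example does not depend on the version's default.
model = OneClassSVM(kernel="rbf", gamma=0.1, nu=0.1)
model.fit(X_train)

train_pred = model.predict(X_train)          # +1 = inlier, -1 = anomaly
scores = model.decision_function(X_train)    # signed distance to the boundary

flagged_fraction = float(np.mean(train_pred == -1))
sv_fraction = len(model.support_) / len(X_train)

print("fraction of training points flagged:", flagged_fraction)
print("fraction of support vectors:", sv_fraction)
```

Lowering `nu` makes the boundary looser (fewer training points flagged); raising it tightens the boundary around the bulk of the data.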
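Basic idea: find a hypersphere that is as small as possible while still containing as many points as possible; points inside the sphere are treated as positive (normal), and points outside as anomalous. The objective function is then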