Anomaly Detection: Similarity-Based Methods

莫知我哀

Column: AnomalyDetection   Tags: python, data analysis, machine learning

Article link: https://blog.csdn.net/weixin_43822124/article/details/112850245

GitHub repository: link

Anomaly Detection: Similarity-Based Methods

1 Common methods
2 Similarity-based anomaly detectors in PyOD
3 LOF example

1 Common Methods

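Distance-based measures
- Cell-based (partition the data space into grid cells so whole cells can be accepted or pruned at once)
- Index-based (use a spatial index such as a k-d tree to find each point's neighbors)
Density-based measures (compare a point's local density with that of its neighbors)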
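As a minimal sketch of a distance-based measure, the code below scores each point by its distance to its k-th nearest neighbor, one common distance-based definition (individual methods may define the score differently). It computes all pairwise distances by brute force, i.e. the naive baseline that cell-based and index-based methods are designed to speed up. The function name `knn_distance_scores` is hypothetical.

```python
import numpy as np

def knn_distance_scores(X, k=5):
    """Distance from each point to its k-th nearest neighbor (excluding itself).

    Brute-force O(n^2) version; a larger score means a more isolated point.
    """
    X = np.asarray(X, dtype=float)
    # Pairwise Euclidean distances via broadcasting, shape (n, n)
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    # A point must not count as its own neighbor
    np.fill_diagonal(dist, np.inf)
    # After sorting each row, column k-1 holds the k-th nearest distance
    return np.sort(dist, axis=1)[:, k - 1]

rng = np.random.default_rng(0)
# 100 Gaussian inliers plus one planted outlier at (8, 8), which gets index 100
X = np.vstack([rng.normal(0, 1, size=(100, 2)), [[8.0, 8.0]]])
scores = knn_distance_scores(X, k=5)
print(int(np.argmax(scores)))  # prints 100
```

An index-based variant would replace the full distance matrix with a k-d tree query, which returns the same scores while avoiding the O(n^2) memory cost.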

2 Similarity-Based Anomaly Detectors in PyOD

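LOF
- Density-based. Quantifies the degree of abnormality of every data point (its local outlier factor). Suitable for data of moderately high dimensionality.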
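COF
- Similar to LOF, but estimates density differently. LOF relies on Euclidean distance, which implicitly assumes the data are distributed spherically; when the features are linearly correlated, LOF performs poorly. COF instead computes the local density of a neighborhood with a shortest-path method, also called the chaining distance (the sum of the shortest links connecting the current instance to all of its k nearest neighbors).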
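CBLOF
- Density-based (cluster-based local outlier factor).
- Takes as input the dataset and a cluster model produced by a clustering algorithm, then computes the anomaly score from the size of the cluster a point belongs to and the point's distance to the nearest large cluster.
- In PyOD the cluster-size weighting is disabled by default (use_weights=False), so outlier scores are computed solely from the distance to the center of the nearest large cluster.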
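LOCI
- Density-based (local correlation integral). Computationally expensive, so unsuitable for large datasets.
HBOS (histogram-based outlier score)
KNN (k-nearest-neighbor distance)
SOD (subspace outlier detection)
- Projects the data into low-dimensional subspaces and decides whether a point is anomalous by how sparse the projected data are in those subspaces.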
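To make the LOF entry concrete, here is a minimal NumPy sketch of the standard LOF definition: k-distance, reachability distance, local reachability density (lrd), and finally the ratio of the neighbors' average lrd to the point's own lrd. It is brute force, uses Euclidean distance, and ignores ties at the k-distance; the function name `lof_scores` is hypothetical. PyOD's `LOF` wraps scikit-learn's `LocalOutlierFactor` rather than code like this.

```python
import numpy as np

def lof_scores(X, k=20):
    """Local outlier factor for every row of X (brute force, Euclidean).

    Scores near 1 mean a density similar to the neighbors; scores well above 1
    mean the point is much sparser than its neighborhood (a likely outlier).
    """
    X = np.asarray(X, dtype=float)
    n = len(X)
    rows = np.arange(n)
    dist = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1))
    np.fill_diagonal(dist, np.inf)               # a point is not its own neighbor
    knn_idx = np.argsort(dist, axis=1)[:, :k]    # indices of the k nearest neighbors
    k_dist = dist[rows, knn_idx[:, -1]]          # k-distance of every point
    # reach-dist_k(p, o) = max(k-distance(o), d(p, o)) for each neighbor o of p
    reach = np.maximum(k_dist[knn_idx], dist[rows[:, None], knn_idx])
    lrd = 1.0 / reach.mean(axis=1)               # local reachability density
    # LOF(p) = mean lrd of p's neighbors / lrd(p)
    return lrd[knn_idx].mean(axis=1) / lrd

rng = np.random.default_rng(0)
# 200 Gaussian inliers plus two planted outliers at indices 200 and 201
X = np.vstack([rng.normal(0, 1, size=(200, 2)), [[6.0, 6.0], [-6.0, 5.0]]])
scores = lof_scores(X, k=20)
print(np.argsort(scores)[-2:])  # indices of the two largest LOF scores
```

The score is a ratio of densities rather than a raw distance, which is what lets LOF flag points near a sparse cluster and points near a dense cluster on the same scale.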

3 LOF Example

import numpy as np
import matplotlib.pyplot as plt
import matplotlib.font_manager
import seaborn as sns
from pyod.utils.data import generate_data, get_outliers_inliers

# Optional: a CJK-capable font for Chinese labels, and an ASCII minus sign on the axes
plt.rcParams['font.sans-serif'] = ['SimHei']
plt.rcParams['axes.unicode_minus'] = False

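# Generate 2-D random data: 200 training points (generate_data's default contamination=0.1 gives 20 outliers)
X_train, Y_train = generate_data(n_train=200, train_only=True, n_features=2)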

# Split the samples into outliers and inliers (the function returns the outliers first)
x_outliers, x_inliers = get_outliers_inliers(X_train, Y_train)

# Plot the generated data, colored by the true label (1 = outlier)
F1 = X_train[:, 0]
F2 = X_train[:, 1]
sns.scatterplot(x=F1, y=F2, hue=Y_train)  # recent seaborn requires x=/y= keywords
plt.xlabel('F1')
plt.ylabel('F2')
plt.show()

[Figure: scatter plot of the generated data (F1 vs. F2), colored by the true label]

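# Constructor signature of pyod.models.lof.LOF with its defaults
# (newer PyOD releases also accept novelty=True)
pyod.models.lof.LOF(n_neighbors=20,       # k, the number of neighbors
                    algorithm='auto',     # neighbor-search structure
                    leaf_size=30,
                    metric='minkowski',
                    p=2,                  # minkowski with p=2 is the Euclidean distance
                    metric_params=None,
                    contamination=0.1,    # expected outlier fraction; sets threshold_
                    n_jobs=1)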

from pyod.models.lof import LOF

clf = LOF()
clf.fit(X_train)

LOF(algorithm='auto', contamination=0.1, leaf_size=30, metric='minkowski',
  metric_params=None, n_jobs=1, n_neighbors=20, p=2)

y_pred = clf.predict(X_train)  # predicted labels for the training samples (0 = inlier, 1 = outlier)
from sklearn.metrics import classification_report
print(classification_report(y_true=Y_train,y_pred=y_pred))

              precision    recall  f1-score   support

         0.0       0.97      0.99      0.98       180
         1.0       0.88      0.70      0.78        20

    accuracy                           0.96       200
   macro avg       0.92      0.84      0.88       200
weighted avg       0.96      0.96      0.96       200
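The rows of the report can be read back into counts, which makes the numbers easier to interpret. This is a back-calculation from the rounded figures, not additional output: with 20 true outliers, a recall of 0.70 means 14 were caught, and a precision of 0.88 then corresponds to 2 false alarms, so 16 points in total were predicted as outliers.

```latex
\begin{aligned}
\text{recall}_{1}    &= \frac{TP}{TP+FN} = \frac{14}{14+6} = 0.70, &
\text{precision}_{1} &= \frac{TP}{TP+FP} = \frac{14}{14+2} = 0.875 \approx 0.88,\\
\text{recall}_{0}    &= \frac{180-2}{180} \approx 0.99, &
\text{accuracy}      &= \frac{178+14}{200} = 0.96.
\end{aligned}
```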

n_inliers = len(x_inliers)
n_outliers = len(x_outliers)
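# Grid of coordinates for the score heat map
xx, yy = np.meshgrid(np.linspace(-10, 10, 200), np.linspace(-10, 10, 200))
# threshold_ is the score percentile implied by contamination;
# negate it to match the sign-flipped scores Z below
threshold = -clf.threshold_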

# Outlier score of every grid point, negated so that a larger Z means "more normal"
Z = clf.decision_function(np.c_[xx.ravel(), yy.ravel()]) * -1
Z = Z.reshape(xx.shape)
plt.figure(figsize=(10, 10))
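# Shade the region classified as outlying (Z below the threshold) in blues
plt.contourf(xx, yy, Z, levels=np.linspace(Z.min(), threshold, 10), cmap=plt.cm.Blues_r)
# Draw the decision boundary
a = plt.contour(xx, yy, Z, levels=[threshold], linewidths=2, colors='red')
# Fill the region classified as normal (Z above the threshold) in orange
plt.contourf(xx, yy, Z, levels=[threshold, Z.max()], colors='orange')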
# True inliers (white)
b = plt.scatter(x_inliers[:, 0], x_inliers[:, 1], c='white', s=20, edgecolor='k')
# True outliers (black)
c = plt.scatter(x_outliers[:, 0], x_outliers[:, 1], c='black', s=20, edgecolor='k')
plt.axis('tight')

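# A proxy line stands in for the red contour in the legend
# (ContourSet.collections is deprecated in recent Matplotlib releases)
plt.legend(
    [matplotlib.lines.Line2D([], [], color='red', linewidth=2), b, c],
    ['learned decision function', 'true inliers', 'true outliers'],
    prop=matplotlib.font_manager.FontProperties(size=10),
    loc='lower right')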

plt.title('LOF')
plt.xlim((-10, 10))
plt.ylim((-10, 10))
plt.show()

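[Figure: LOF decision boundary (red) over the score map, with true inliers in white and true outliers in black]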
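The `threshold_` used in the plot comes from the `contamination` parameter: PyOD places it at the `100 * (1 - contamination)` percentile of the training outlier scores (`decision_scores_`) and sets `labels_` to 1 for scores above it. The NumPy sketch below reproduces that rule on toy scores; `scores_to_labels` is a hypothetical helper, not a PyOD function.

```python
import numpy as np

def scores_to_labels(scores, contamination=0.1):
    """Binarize outlier scores with the percentile rule implied by contamination."""
    scores = np.asarray(scores, dtype=float)
    # Higher score = more abnormal, so the cut sits at the upper percentile
    threshold = np.percentile(scores, 100 * (1 - contamination))
    labels = (scores > threshold).astype(int)  # 1 = outlier, 0 = inlier
    return threshold, labels

scores = np.arange(200, dtype=float)  # 200 distinct toy scores
threshold, labels = scores_to_labels(scores, contamination=0.1)
print(labels.sum())  # prints 20
```

With distinct scores this flags exactly the top 10%; applied to new samples (as `predict` does), the same fixed threshold can flag more or fewer points.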

# The same scatter plot, colored by the LOF predictions
sns.scatterplot(x=X_train[:, 0], y=X_train[:, 1], hue=y_pred)
plt.xlabel('F1')
plt.ylabel('F2')
plt.show()

[Figure: scatter plot of the training data colored by the LOF predictions]
