Task 1: Introduction to Anomaly Detection (2 days)
Notes
Exercises
1. Install Scikit-learn and pyod
Reference steps for installing Scikit-learn: https://blog.csdn.net/u013709270/article/details/70441043
I installed pyod with pip, following the steps at: https://github.com/yzhao062/pyod#installation
Introduction to pyod in Chinese: https://zhuanlan.zhihu.com/p/58313521
pyod API documentation (in English): https://pyod.readthedocs.io/en/latest/
pyod toolkit on GitHub: https://github.com/yzhao062/pyod (the toy_example used below comes from this repository)
The article "Learning Anomaly Detection in Python with the PyOD Library" looks very thorough, though I have not finished reading it yet: https://blog.csdn.net/weixin_41697507/article/details/89408236?utm_medium=distribute.pc_relevant.none-task-blog-baidujs_baidulandingword-2&spm=1001.2101.3001.4242
Where to download Python extension packages: http://www.lfd.uci.edu/~gohlke/pythonlibs/
Reference steps for setting up a personal blog: https://zhuanlan.zhihu.com/p/28321740
An article explaining common machine learning algorithms: https://www.analyticsvidhya.com/blog/2017/09/common-machine-learning-algorithms/?utm_source=outlierdetectionpyod#
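After installing, it can help to confirm that both packages are visible to the interpreter that will run the exercise. The snippet below is a minimal sketch using only the standard library; the helper name `is_installed` is my own and not part of either package. It assumes the usual import names, `sklearn` for Scikit-learn and `pyod` for pyod.

```python
import importlib.util


def is_installed(module_name):
    """Return True if a top-level module can be found in the current environment."""
    return importlib.util.find_spec(module_name) is not None


# Assumed import names: "sklearn" (installed as scikit-learn) and "pyod".
for name in ("sklearn", "pyod"):
    status = "installed" if is_installed(name) else "not found"
    print(f"{name}: {status}")
```

If either package is reported as not found, `pip install scikit-learn` or `pip install pyod` in the same environment should fix it.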
2. Exercise
# -*- coding: utf-8 -*-
"""Example of using kNN for outlier detection
"""
from __future__ import division
from __future__ import print_function

import os
import sys

# temporary solution for relative imports in case pyod is not installed
# if pyod is installed, no need to use the following line
sys.path.append(
    os.path.abspath(os.path.join(os.path.dirname("__file__"), '..')))

from pyod.models.knn import KNN
from pyod.utils.data import generate_data
from pyod.utils.data import evaluate_print
from pyod.utils.example import visualize

if __name__ == "__main__":
    contamination = 0.1  # percentage of outliers
    n_train = 200  # number of training points
    n_test = 100  # number of testing points

    # Generate sample data
    X_train, y_train, X_test, y_test = \
        generate_data(n_train=n_train,
                      n_test=n_test,
                      n_features=2,
                      contamination=contamination,
                      random_state=42)

    # train kNN detector
    clf_name = 'KNN'
    clf = KNN()
    clf.fit(X_train)

    # get the prediction labels and outlier scores of the training data
    y_train_pred = clf.labels_  # binary labels (0: inliers, 1: outliers)
    y_train_scores = clf.decision_scores_  # raw outlier scores

    # get the prediction on the test data
    y_test_pred = clf.predict(X_test)  # outlier labels (0 or 1)
    y_test_scores = clf.decision_function(X_test)  # outlier scores

    # evaluate and print the results
    print("\nOn Training Data:")
    evaluate_print(clf_name, y_train, y_train_scores)
    print("\nOn Test Data:")
    evaluate_print(clf_name, y_test, y_test_scores)

    # visualize the results
    visualize(clf_name, X_train, y_train, X_test, y_test, y_train_pred,
              y_test_pred, show_figure=True, save_figure=True)
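To get a feel for what the kNN detector above computes, here is a rough pure-Python sketch of the idea: each point's outlier score is its distance to its k-th nearest neighbor, and the highest-scoring fraction of points (the contamination) is labeled as outliers. This is my own simplified illustration, not pyod's implementation. The function names `knn_scores` and `labels_from_scores`, the toy data, and the rank-based threshold are all assumptions made for the example.

```python
import math
import random


def knn_scores(points, k=5):
    """Outlier score of each point: distance to its k-th nearest other point."""
    scores = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(dists[k - 1])
    return scores


def labels_from_scores(scores, contamination=0.1):
    """Label the highest-scoring `contamination` fraction as outliers (1)."""
    n_outliers = max(1, round(contamination * len(scores)))
    threshold = sorted(scores, reverse=True)[n_outliers - 1]
    return [1 if s >= threshold else 0 for s in scores]


# Toy data (hypothetical): 20 points around the origin plus 2 far-away points.
random.seed(42)
points = [(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(20)]
points += [(20.0, 20.0), (-20.0, 20.0)]

scores = knn_scores(points, k=5)
labels = labels_from_scores(scores, contamination=0.1)
print(labels)
```

The two far-away points have no close neighbors, so their k-th neighbor distance is much larger than that of the clustered points, and they are the ones that end up labeled 1. In the exercise, `clf.decision_scores_` and `clf.labels_` play the roles of `scores` and `labels` here.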