First example:
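import numpy as np  # NumPy
from sklearn.covariance import EllipticEnvelope  # class that implements MCD-based outlier detection [1]
from sklearn.svm import OneClassSVM  # class that implements the one-class SVM (OCSVM) [2]
import matplotlib.pyplot as plt  # MATLAB-like plotting interface [3]
import matplotlib.font_manager  # module for finding, managing, and using fonts across platforms [4]
from sklearn.datasets import load_wine  # loader for scikit-learn's built-in wine dataset [5]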
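classifiers = {
    "Empirical Covariance":
        EllipticEnvelope(support_fraction=1.0, contamination=0.25),  # Mahalanobis distance
    "Robust Covariance (Minimum Covariance Determinant)":
        EllipticEnvelope(contamination=0.25),  # robust Mahalanobis distance (MCD)
    "OCSVM": OneClassSVM(nu=0.25, gamma=0.35),  # one-class SVM
}  # dict mapping a display name to each detector
colors = ["m", "g", "b"]  # one contour color per detector
legend1 = {}  # start with an empty dict
legend2 = {}
X1 = load_wine()["data"][:, [1, 2]]  # n=178 samples; keep p=2 features (malic_acid, ash)
xx1, yy1 = np.meshgrid(np.linspace(0, 6, 500), np.linspace(1, 4.5, 500))
# np.meshgrid pairs every value of the first vector with every value of the second to form a grid of points; xx1 holds the x-coordinates and yy1 the y-coordinates, and matching positions in the two arrays describe the same point [6]
# np.linspace creates an evenly spaced sequence of numbers [7]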
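for i, (clf_name, clf) in enumerate(classifiers.items()):
    '''
    enumerate() yields (index, element) pairs: the index first, then the element.
    dict.items() returns an iterable view of the dictionary's (key, value) pairs.
    classifiers.items() => dict_items([('Empirical Covariance', EllipticEnvelope(contamination=0.25, support_fraction=1.0)),
    ('Robust Covariance (Minimum Covariance Determinant)', EllipticEnvelope(contamination=0.25)),
    ('OCSVM', OneClassSVM(gamma=0.35, nu=0.25))])
    '''
    plt.figure(1)  # select figure 1 so that all three contours are drawn on the same figure [8]
    clf.fit(X1)  # fit the detector (elliptic envelope or one-class SVM) to the data
    Z1 = clf.decision_function(np.c_[xx1.ravel(), yy1.ravel()])
    '''
    ravel() flattens an array to 1-D.
    np.c_[a, b, c, ...] stacks arrays as columns; the arrays must all have the same number of rows.
    decision_function() returns a signed score per sample: negative for outliers, non-negative for inliers.
    For OneClassSVM it is the signed distance to the separating hyperplane; for EllipticEnvelope it is derived from the Mahalanobis distance.
    '''
    Z1 = Z1.reshape(xx1.shape)
    legend1[clf_name] = plt.contour(xx1, yy1, Z1, levels=[0], linewidths=2, colors=colors[i])  # plt.contour() draws contour lines; levels=[0] traces the decision boundary [9]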
'''
print(legend1) => {'Empirical Covariance': <matplotlib.contour.QuadContourSet object at 0x000001929240DFD0>,
'Robust Covariance (Minimum Covariance Determinant)': <matplotlib.contour.QuadContourSet object at 0x0000019291DFCA90>,
'OCSVM': <matplotlib.contour.QuadContourSet object at 0x0000019291DFCD00>}
'''
legend1_values_list = list(legend1.values())
'''
print(legend1_values_list) => [<matplotlib.contour.QuadContourSet object at 0x000002B09073EFD0>,
<matplotlib.contour.QuadContourSet object at 0x000002B09074DA90>,
<matplotlib.contour.QuadContourSet object at 0x000002B09074DD00>]
'''
legend1_keys_list = list(legend1.keys())
'''
print(legend1_keys_list) => ['Empirical Covariance',
'Robust Covariance (Minimum Covariance Determinant)',
'OCSVM']
'''
plt.figure(1)  # reselect figure 1 [8]
plt.title("Outlier detection on a real data set (wine recognition)")
plt.scatter(X1[:, 0], X1[:, 1], color="black")
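bbox_args = dict(boxstyle="round", fc="0.8")  # dict() creates a dictionary
# boxstyle: shape of the text box ("round" gives rounded corners); fc: face (background) color
arrow_args = dict(arrowstyle="->")  # '->' defaults: head_length=0.4, head_width=0.2
plt.annotate(  # plt.annotate adds a text annotation, optionally with an arrow [10]
    "outlying points",  # the annotation text
    xy=(4, 2),  # the point being annotated
    xycoords="data",  # 'data': use the coordinate system of the annotated object (default)
    textcoords="data",  # 'data': same coordinate system as xycoords
    xytext=(3, 1.25),  # where the annotation text is placed
    bbox=bbox_args,
    arrowprops=arrow_args,
)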
plt.xlim((xx1.min(), xx1.max()))  # get or set the x-axis limits
plt.ylim((yy1.min(), yy1.max()))
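# Note: ContourSet.collections is deprecated since Matplotlib 3.8; on newer versions, ContourSet.legend_elements() can supply the legend handles instead
plt.legend(  # add a legend [11]
    (
        legend1_values_list[0].collections[0],
        legend1_values_list[1].collections[0],
        legend1_values_list[2].collections[0],
    ),
    (legend1_keys_list[0], legend1_keys_list[1], legend1_keys_list[2]),
    loc="upper center",  # place the legend at the top center
    prop=matplotlib.font_manager.FontProperties(size=11),  # prop: font properties; FontProperties stores and manipulates font attributes [12]
)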
'''
print(legend1_values_list[0]) => <matplotlib.contour.QuadContourSet object at 0x000001758F0CEFD0>
print(legend1_values_list[0].collections[0]) => <matplotlib.collections.LineCollection object at 0x000001758F0DC310>
'''
plt.ylabel("ash")
plt.xlabel("malic_acid")
plt.show()
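The legend call above takes its handles from `.collections[0]`, which may not be available on recent Matplotlib releases. The following is a minimal sketch, on a toy score function rather than the wine data, of getting a legend handle from `ContourSet.legend_elements()` instead. The toy grid, the label text, and the `Agg` backend are assumptions made only for this sketch.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

xx, yy = np.meshgrid(np.linspace(-2, 2, 50), np.linspace(-2, 2, 50))
zz = xx**2 + yy**2 - 1  # toy score: zero on the unit circle

fig, ax = plt.subplots()
contour = ax.contour(xx, yy, zz, levels=[0], linewidths=2, colors="m")

# legend_elements() returns (handles, labels), with one handle per contour level
handles, _ = contour.legend_elements()
ax.legend([handles[0]], ["toy boundary"], loc="upper center")
```

In the wine example, the same idea would be to use `legend1_values_list[k].legend_elements()[0][0]` in place of `legend1_values_list[k].collections[0]`.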
[1] sklearn.covariance.EllipticEnvelope
Documentation: sklearn.covariance.EllipticEnvelope — scikit-learn 1.1.1 documentation
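To make the role of `contamination` and `decision_function` more concrete, here is a small sketch that fits the robust detector on the same two wine features and compares the scores with the predicted labels. The variable names and `random_state=0` are choices made for this sketch, not part of the original example.

```python
import numpy as np
from sklearn.covariance import EllipticEnvelope
from sklearn.datasets import load_wine

X = load_wine()["data"][:, [1, 2]]  # malic_acid and ash, as in the example

# random_state is fixed only to make this sketch repeatable (an assumption of the sketch)
clf = EllipticEnvelope(contamination=0.25, random_state=0).fit(X)

scores = clf.decision_function(X)  # negative score => outlier
labels = clf.predict(X)            # -1 for outliers, +1 for inliers

outlier_fraction = np.mean(labels == -1)
print(f"flagged as outliers: {outlier_fraction:.2f}")  # close to the contamination value
```

The zero level of `decision_function` is the boundary that `plt.contour(..., levels=[0])` draws in the example.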
[2] sklearn.svm.OneClassSVM
Documentation: sklearn.svm.OneClassSVM — scikit-learn 1.1.1 documentation
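The `nu` parameter of OneClassSVM is an upper bound on the fraction of training errors and a lower bound on the fraction of support vectors. The sketch below fits the same model as in the example and prints both fractions. The variable names are illustrative.

```python
import numpy as np
from sklearn.datasets import load_wine
from sklearn.svm import OneClassSVM

X = load_wine()["data"][:, [1, 2]]  # malic_acid and ash
nu = 0.25
clf = OneClassSVM(nu=nu, gamma=0.35).fit(X)

sv_fraction = len(clf.support_) / len(X)  # share of samples that are support vectors
labels = clf.predict(X)                   # -1 for outliers, +1 for inliers
print(f"support vectors: {sv_fraction:.2f}, flagged: {np.mean(labels == -1):.2f}")
```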
[3] matplotlib.pyplot
Documentation: pyplot — Matplotlib 2.0.2 documentation
[4] matplotlib.font_manager
Documentation: font_manager — Matplotlib 2.0.2 documentation
[5] from sklearn.datasets import load_wine:
Classes | 3 |
Samples per class | [59, 71, 48] |
Total samples | 178 |
Dimensionality | 13 |
Features | real, positive |
Documentation: sklearn.datasets.load_wine — scikit-learn 1.1.1 documentation
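The figures in this table can be checked directly. This is a short sketch, and the variable names are illustrative.

```python
import numpy as np
from sklearn.datasets import load_wine

wine = load_wine()
X, y = wine["data"], wine["target"]

print(X.shape)                           # (n_samples, n_features)
print(np.bincount(y))                    # number of samples in each class
print(list(wine["feature_names"])[1:3])  # the two features selected by [:, [1, 2]]
```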
[6] np.meshgrid
Documentation: numpy.meshgrid — NumPy v1.23 Manual
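A tiny grid makes the meshgrid, ravel, `np.c_`, and reshape round trip easier to follow. In this sketch the sum of the coordinates is a stand-in for `clf.decision_function`, and all names are illustrative.

```python
import numpy as np

xs = np.array([0.0, 1.0, 2.0])   # 3 x-values
ys = np.array([10.0, 20.0])      # 2 y-values
xx, yy = np.meshgrid(xs, ys)     # both arrays have shape (len(ys), len(xs))

points = np.c_[xx.ravel(), yy.ravel()]  # one (x, y) row per grid point
scores = points[:, 0] + points[:, 1]    # stand-in for clf.decision_function(points)
grid_scores = scores.reshape(xx.shape)  # back to the grid shape expected by plt.contour
```

Each row of `points` is one grid location, so a model can score all locations in a single call before the result is reshaped for plotting.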
[7] np.linspace
Documentation: numpy.linspace — NumPy v1.23 Manual
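As a quick illustration, `np.linspace(start, stop, num)` includes both endpoints by default, so the spacing is `(stop - start) / (num - 1)`. The names below are illustrative.

```python
import numpy as np

coarse = np.linspace(0, 6, 7)    # 7 evenly spaced values from 0 to 6, endpoints included
fine = np.linspace(0, 6, 500)    # the x-axis grid used in the example
fine_step = fine[1] - fine[0]    # spacing is (6 - 0) / (500 - 1)
print(coarse, fine_step)
```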
[8] plt.figure
Documentation: pyplot — Matplotlib 2.0.2 documentation
[9] plt.contour
Documentation: pyplot — Matplotlib 2.0.2 documentation
[10] plt.annotate
Documentation: pyplot — Matplotlib 2.0.2 documentation
[11] plt.legend
Documentation: pyplot — Matplotlib 2.0.2 documentation
[12] matplotlib.font_manager.FontProperties