Plotting a P-R Curve with Python
I. Install the third-party libraries
pip install scikit-learn
pip install matplotlib
pip install numpy
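1. scikit-learn is a library commonly used in machine learning. Scikit-learn (formerly scikits.learn, also known as sklearn) is a free-software machine learning library for the Python programming language [1]. It provides a range of classification, regression and clustering algorithms, including support vector machines, random forests, gradient boosting, k-means and DBSCAN, and is designed to interoperate with the Python numerical and scientific libraries NumPy and SciPy.
2. matplotlib is the plotting library. Matplotlib is a 2D plotting library for Python that produces publication-quality figures in a variety of hardcopy formats and in interactive environments across platforms.
3. numpy is a library for matrix computation. NumPy (Numerical Python) is an open-source numerical computing extension for Python. It can store and process large matrices far more efficiently than Python's own nested list structure (which can also be used to represent a matrix), supports multi-dimensional array and matrix operations, and provides a large library of mathematical functions for array operations [1].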
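Before moving on, it can help to confirm that the three packages are visible to the interpreter you will run the script with. The following is a minimal sketch that uses only the standard library's importlib.metadata. The helper name installed_version is made up for this illustration, and "scikit-learn" is the distribution name that provides the sklearn module.

```python
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version of a pip distribution, or None if it is missing."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Distribution names as used with pip (hypothetical check, not part of the tutorial script).
    for name in ("scikit-learn", "matplotlib", "numpy"):
        version = installed_version(name)
        print(name, version if version is not None else "NOT INSTALLED")
```

If any line reports NOT INSTALLED, the pip command was probably run for a different Python interpreter than the one PyCharm is configured to use.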
II. Import the libraries in PyCharm
import matplotlib.pyplot as plt
import numpy as np
from sklearn import svm, datasets
from sklearn.metrics import precision_recall_curve
from sklearn.metrics import average_precision_score
from sklearn.preprocessing import label_binarize
from sklearn.multiclass import OneVsRestClassifier
from sklearn.model_selection import train_test_split
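# 1. Load the iris dataset
iris = datasets.load_iris()
# x is the feature matrix
x = iris.data
# y holds the corresponding class labels
y = iris.target
# One-hot encode the labels 0, 1, 2 as the rows [1 0 0], [0 1 0], [0 0 1]
y = label_binarize(y, classes=[0, 1, 2])
# Size of y's second dimension, i.e. the number of classes (n_classes: 3)
n_classes = y.shape[1]
print(y)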
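To see what label_binarize produces for three classes, the hand-written sketch below builds the same kind of indicator rows in plain Python. one_hot_rows is a hypothetical helper for illustration only, not part of scikit-learn, and it assumes every label appears in classes.

```python
def one_hot_rows(labels, classes):
    """Turn each label into a row with a 1 in its class column and 0 elsewhere."""
    column_of = {cls: col for col, cls in enumerate(classes)}
    rows = []
    for label in labels:
        row = [0] * len(classes)
        row[column_of[label]] = 1
        rows.append(row)
    return rows


if __name__ == "__main__":
    print(one_hot_rows([0, 1, 2, 1], classes=[0, 1, 2]))
    # [[1, 0, 0], [0, 1, 0], [0, 0, 1], [0, 1, 0]]
```

Each column can then be treated as its own binary problem ("is this sample class i or not?"), which is what makes a per-class P-R curve possible.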
random_state = np.random.RandomState(0)
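n_sample, n_feature = x.shape  # n_sample: 150, n_feature: 4
# np.c_ concatenates column-wise: append 200 * 4 = 800 columns (dimensions) of noise
x = np.c_[x, random_state.randn(n_sample, 200 * n_feature)]
# Split the data into training and test sets; the test set gets 50% of the samples
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.5, random_state=random_state)
# One-vs-rest linear SVM: one binary classifier is trained per class
classifier = OneVsRestClassifier(svm.SVC(kernel='linear', probability=True, random_state=random_state))
# Fit on the training set, then compute per-class decision scores for the test set
y_score = classifier.fit(x_train, y_train).decision_function(x_test)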
precision = dict()
recall = dict()
average_precision = dict()
for i in range(n_classes):
    # [:, i] takes every row of the first dimension and column i of the second
    precision[i], recall[i], _ = precision_recall_curve(y_test[:, i], y_score[:, i])
    average_precision[i] = average_precision_score(y_test[:, i], y_score[:, i])
# 'micro': micro-averaged precision and recall; ravel() flattens a multi-dimensional array to 1-D
precision['micro'], recall['micro'], _ = precision_recall_curve(y_test.ravel(), y_score.ravel())
average_precision['micro'] = average_precision_score(y_test, y_score, average="micro")
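Each point on a P-R curve comes from fixing a score threshold, predicting "positive" for every sample whose score is at or above it, and computing precision = TP / (TP + FP) and recall = TP / (TP + FN). The sketch below does this by hand on made-up labels and scores. precision_recall_at is a hypothetical helper, not scikit-learn's implementation, and returning precision 1.0 when nothing is predicted positive is an assumed convention. Micro-averaging corresponds to flattening all class columns into one long binary problem first, which is the role of the ravel() calls.

```python
def precision_recall_at(y_true, scores, threshold):
    """Precision and recall when every score >= threshold is predicted positive."""
    tp = sum(1 for t, s in zip(y_true, scores) if s >= threshold and t == 1)
    fp = sum(1 for t, s in zip(y_true, scores) if s >= threshold and t == 0)
    fn = sum(1 for t, s in zip(y_true, scores) if s < threshold and t == 1)
    precision = tp / (tp + fp) if (tp + fp) > 0 else 1.0  # assumed convention
    recall = tp / (tp + fn) if (tp + fn) > 0 else 0.0
    return precision, recall


if __name__ == "__main__":
    # Made-up one-vs-rest labels and scores for 4 samples and 2 classes.
    y_true = [[1, 0], [0, 1], [1, 0], [0, 1]]
    y_score = [[0.9, 0.2], [0.6, 0.7], [0.4, 0.1], [0.3, 0.8]]

    # Micro-averaging: flatten every (sample, class) pair into one binary problem.
    flat_true = [t for row in y_true for t in row]
    flat_score = [s for row in y_score for s in row]

    # Sweep the threshold over every distinct score to trace the curve.
    for threshold in sorted(set(flat_score)):
        p, r = precision_recall_at(flat_true, flat_score, threshold)
        print(f"threshold={threshold:.1f} precision={p:.2f} recall={r:.2f}")
```

Lowering the threshold can only keep or raise recall, while precision may move in either direction, which is why P-R curves often look jagged.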
plt.clf()  # clear the current figure
plt.plot(recall["micro"], precision["micro"],
         label='Micro-average Precision-recall curve (area = {0:0.2f})'.format(average_precision['micro']))
for i in range(n_classes):
    plt.plot(recall[i], precision[i],
             label='Precision-recall curve of class {0} (area = {1:0.2f})'.format(i, average_precision[i]))
# xlim/ylim set the axis ranges: for example, plt.xlim(0.0, 1.0) limits the x axis to [0, 1]
plt.xlim(0.0, 1.0)
plt.ylim(0.0, 1.05)
plt.xlabel('Recall', fontsize=16)  # x axis: Recall
plt.ylabel('Precision', fontsize=16)  # y axis: Precision
# Title: Extension of Precision-Recall curve to multi-class
plt.title('Extension of Precision-Recall curve to multi-class', fontsize=16)
plt.legend(loc='lower right')
plt.show()
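The "area" value in each legend entry comes from average_precision_score. scikit-learn documents this as the step-wise sum AP = Σ_n (R_n - R_{n-1}) * P_n over thresholds, rather than a trapezoidal integral of the curve. The sketch below is a hand-written illustration of that formula on made-up data. The function name average_precision and the tie handling are assumptions of this example, not the library's code.

```python
def average_precision(y_true, scores):
    """Step-wise sum over thresholds: AP = sum((R_n - R_{n-1}) * P_n)."""
    total_pos = sum(y_true)
    if total_pos == 0:
        raise ValueError("need at least one positive label")
    # Visit samples from the highest score to the lowest.
    order = sorted(range(len(scores)), key=lambda k: scores[k], reverse=True)
    ap, tp, prev_recall = 0.0, 0, 0.0
    i = 0
    while i < len(order):
        # Samples with tied scores share one threshold, so process them together.
        j = i
        while j < len(order) and scores[order[j]] == scores[order[i]]:
            tp += y_true[order[j]]
            j += 1
        precision = tp / j           # j samples are predicted positive so far
        recall = tp / total_pos
        ap += (recall - prev_recall) * precision
        prev_recall = recall
        i = j
    return ap


if __name__ == "__main__":
    # Made-up binary labels and scores.
    labels = [1, 0, 0, 1, 1, 0, 0, 1]
    scores = [0.9, 0.2, 0.6, 0.7, 0.4, 0.1, 0.3, 0.8]
    print(f"AP = {average_precision(labels, scores):.2f}")
```

Only thresholds at which recall increases contribute to the sum, so negatives ranked below the last positive do not change the result.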