一、 Installation
1、Installing Scikit-plot is simple; a single command:
pip install scikit-plot
completes the installation.
2、Repository:
https://github.com/reiinakano/scikit-plot
二、 Usage
All of the plotting functions in Scikit-plot's four modules, collected from the official documentation:
scikitplot.metrics
- plot_confusion_matrix: confusion matrix for classification
- plot_precision_recall: precision-recall curve for classification
- plot_roc: ROC curve for classification
- plot_ks_statistic: KS statistic plot for binary classification
- plot_silhouette: silhouette coefficient, a measure of clustering quality
- plot_calibration_curve: calibration curve for classifier probabilities
- plot_cumulative_gain: cumulative gains curve
- plot_lift_curve: lift curve
scikitplot.estimators
- plot_learning_curve: learning curve
- plot_feature_importances: feature importances
scikitplot.cluster
- plot_elbow_curve: elbow curve for choosing the number of clusters
scikitplot.decomposition
- plot_pca_component_variance: explained variance of PCA components
- plot_pca_2d_projection: projection of high-dimensional data onto 2D
1、ROC curve for classification
Complete code:
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33)
nb = GaussianNB()
nb.fit(X_train, y_train)
predicted_probas = nb.predict_proba(X_test)
# The magic happens here
import matplotlib.pyplot as plt
import scikitplot as skplt
skplt.metrics.plot_roc(y_test, predicted_probas)
plt.show()
Result:
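For reference, each curve that plot_roc draws is a one-vs-rest ROC, and the same numbers can be computed directly with sklearn's roc_curve. A minimal sketch for one class (digit 0 vs. the rest), reusing the setup above; the random_state is an addition here for reproducibility:

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import roc_curve, auc

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33,
                                                    random_state=0)
probas = GaussianNB().fit(X_train, y_train).predict_proba(X_test)

# One-vs-rest ROC for class 0: positive label is "digit == 0"
fpr, tpr, thresholds = roc_curve(y_test == 0, probas[:, 0])
auc_0 = auc(fpr, tpr)  # area under that single curve
```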
2、P-R curve
The precision vs. recall curve, with recall on the x-axis and precision on the y-axis.
Complete code:
import matplotlib.pyplot as plt
from sklearn.naive_bayes import GaussianNB
from sklearn.datasets import load_digits as load_data
import scikitplot as skplt
# Load dataset
X, y = load_data(return_X_y=True)
# Create classifier instance then fit
nb = GaussianNB()
nb.fit(X,y)
# Get predicted probabilities
y_probas = nb.predict_proba(X)
#skplt.metrics.plot_precision_recall_curve(y, y_probas, cmap='nipy_spectral')
skplt.metrics.plot_precision_recall(y, y_probas, cmap='nipy_spectral')
plt.show()
Version note:
FutureWarning: Function plot_precision_recall_curve is deprecated; This will be removed in v0.5.0.
Please use scikitplot.metrics.plot_precision_recall instead.
# skplt.metrics.plot_precision_recall_curve(y, y_probas, cmap='nipy_spectral')  # deprecated; as of v0.5.0 use the line below
skplt.metrics.plot_precision_recall(y, y_probas, cmap='nipy_spectral')
Result:
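Each curve in the figure corresponds to one class treated one-vs-rest, which is what sklearn's precision_recall_curve computes. A minimal sketch for a single class under the same Gaussian NB setup:

```python
from sklearn.datasets import load_digits
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import precision_recall_curve

X, y = load_digits(return_X_y=True)
y_probas = GaussianNB().fit(X, y).predict_proba(X)

# One-vs-rest precision/recall pairs for class 0 at every probability threshold
precision, recall, thresholds = precision_recall_curve(y == 0, y_probas[:, 0])
```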
3、Confusion matrix
An important evaluation tool for classification. The code below classifies the digits dataset with a random forest and plots a normalized confusion matrix of the cross-validated predictions.
Complete code:
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import load_digits as load_data
from sklearn.model_selection import cross_val_predict
import matplotlib.pyplot as plt
import scikitplot as skplt
X, y = load_data(return_X_y=True)
# Create an instance of the RandomForestClassifier
classifier = RandomForestClassifier()
# Perform predictions
predictions = cross_val_predict(classifier, X, y)
plot = skplt.metrics.plot_confusion_matrix(y, predictions, normalize=True)
plt.show()
Result:
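With normalize=True each row is divided by its row total, so the diagonal entries are per-class recall. The same matrix can be computed directly with sklearn's confusion_matrix (its normalize parameter requires sklearn >= 0.22); the random_state is an addition for reproducibility:

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import load_digits
from sklearn.model_selection import cross_val_predict
from sklearn.metrics import confusion_matrix

X, y = load_digits(return_X_y=True)
predictions = cross_val_predict(RandomForestClassifier(random_state=0), X, y)

# normalize='true' divides each row by the number of true samples in that class
cm = confusion_matrix(y, predictions, normalize='true')
```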
4、Calibration curve
Complete code:
from sklearn.ensemble import RandomForestClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.linear_model import LogisticRegression
from sklearn.svm import LinearSVC
from sklearn.datasets import make_classification
import matplotlib.pyplot as plt
import scikitplot as skplt
# Set max_iter to a larger value; the default is 100.
X, y = make_classification(n_samples=100000, n_features=20,
                           n_informative=2, n_redundant=2,
                           random_state=20)
X_train, y_train, X_test, y_test = X[:1000], y[:1000], X[1000:], y[1000:]
rf_probas = RandomForestClassifier().fit(X_train, y_train).predict_proba(X_test)
#lr_probas = LogisticRegression().fit(X_train, y_train).predict_proba(X_test)
lr_probas = LogisticRegression(max_iter=7600).fit(X_train, y_train).predict_proba(X_test)
nb_probas = GaussianNB().fit(X_train, y_train).predict_proba(X_test)
sv_scores = LinearSVC().fit(X_train, y_train).decision_function(X_test)
probas_list = [rf_probas, lr_probas, nb_probas, sv_scores]
clf_names = ['Random Forest',
             'Logistic Regression',
             'Gaussian Naive Bayes',
             'Support Vector Machine']
skplt.metrics.plot_calibration_curve(y_test,
                                     probas_list=probas_list,
                                     clf_names=clf_names,
                                     n_bins=10)
plt.show()
Result:
Problem encountered:
C:\Users\wu\AppData\Roaming\Python\Python37\site-packages\sklearn\svm\_base.py:947: ConvergenceWarning: Liblinear failed to converge, increase the number of iterations.
"the number of iterations.", ConvergenceWarning)
Solution:
Increase max_iter in LogisticRegression(). (Note that the warning above is emitted from sklearn's svm module, i.e. by LinearSVC, which can be given a larger max_iter in the same way.)
# lr_probas = LogisticRegression().fit(X_train, y_train).predict_proba(X_test)
lr_probas = LogisticRegression(max_iter=7600).fit(X_train, y_train).predict_proba(X_test)
Other approaches can be found at:
https://stackoverflow.com/questions/52670012/convergencewarning-liblinear-failed-to-converge-increase-the-number-of-iterati
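Another common fix from that thread is to standardize the features before fitting, which usually lets the solver converge within the default iteration budget without touching max_iter. A sketch using a Pipeline on the same synthetic data:

```python
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=100000, n_features=20,
                           n_informative=2, n_redundant=2,
                           random_state=20)
X_train, y_train, X_test, y_test = X[:1000], y[:1000], X[1000:], y[1000:]

# Standardizing the inputs typically removes the need for a large max_iter
pipe = make_pipeline(StandardScaler(), LogisticRegression())
pipe.fit(X_train, y_train)
lr_probas = pipe.predict_proba(X_test)
```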
5、plot_cumulative_gain
Complete code:
from __future__ import absolute_import
import matplotlib.pyplot as plt
from sklearn.linear_model import LogisticRegression
from sklearn.datasets import load_breast_cancer as load_data
import scikitplot as skplt
X, y = load_data(return_X_y=True)
#lr = LogisticRegression()
lr = LogisticRegression(max_iter=7600)
lr.fit(X, y)
probas = lr.predict_proba(X)
skplt.metrics.plot_cumulative_gain(y_true=y, y_probas=probas)
plt.show()
Result:
Problem encountered:
Solution:
Same reason as above; increase max_iter:
#lr = LogisticRegression()
lr = LogisticRegression(max_iter=7600)
6、plot_elbow_curve
Elbow curve for choosing the number of clusters.
Complete code:
from __future__ import absolute_import
import matplotlib.pyplot as plt
import scikitplot as skplt
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris as load_data
X, y = load_data(return_X_y=True)
kmeans = KMeans(random_state=1)
skplt.cluster.plot_elbow_curve(kmeans, X, cluster_ranges=range(1, 11))
plt.show()
Result:
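The quantity on the curve's y-axis is the KMeans inertia (within-cluster sum of squared distances); the "elbow" where its drop flattens suggests a good k. A sketch computing the same values directly:

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris

X, _ = load_iris(return_X_y=True)

# Inertia (sum of squared distances to the nearest centroid) for each k
inertias = [KMeans(n_clusters=k, random_state=1).fit(X).inertia_
            for k in range(1, 11)]
```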
7、 plot_feature_importances
The plot_feature_importances function in Scikit-plot sorts the feature importances and plots them.
The call below passes 2 arguments:
- rf: the fitted random forest classifier
- feature_names: the feature names, 4 in this example
Two more parameters are useful when there are many long feature names: x_tick_rotation (e.g. 90 degrees, without which the x-axis labels become unreadable) and figsize (the figure size).
Complete code:
完整代码:
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import load_iris as load_data
import matplotlib.pyplot as plt
import scikitplot as skplt
X, y = load_data(return_X_y=True)
rf = RandomForestClassifier()
rf.fit(X, y)
skplt.estimators.plot_feature_importances(rf, feature_names=['sepal length',
                                                             'sepal width',
                                                             'petal length',
                                                             'petal width'])
plt.show()
Result:
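The bar chart is built from the estimator's feature_importances_ attribute; the same ranking can be obtained directly with numpy. A sketch with a fixed seed added for reproducibility:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import load_iris

data = load_iris()
rf = RandomForestClassifier(random_state=0).fit(data.data, data.target)

# Importances sum to 1; argsort in reverse gives the plot's descending order
order = np.argsort(rf.feature_importances_)[::-1]
ranked = [(data.feature_names[i], float(rf.feature_importances_[i]))
          for i in order]
```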
8、plot_ks_statistic
Plots the KS (Kolmogorov-Smirnov) statistic for a binary classifier: the largest gap between the cumulative distributions of the predicted positive-class probability for the two true classes.
Complete code:
from __future__ import absolute_import
import matplotlib.pyplot as plt
from sklearn.linear_model import LogisticRegression
from sklearn.datasets import load_breast_cancer as load_data
import scikitplot as skplt
X, y = load_data(return_X_y=True)
lr = LogisticRegression(max_iter=7600)
lr.fit(X, y)
probas = lr.predict_proba(X)
skplt.metrics.plot_ks_statistic(y_true=y, y_probas=probas)
plt.show()
Result:
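The statistic marked on the plot can be computed by hand: sort each class's predicted scores, build the two empirical CDFs, and take their maximum gap. The helper below (ks_statistic, an illustrative function, not part of scikit-plot) sketches this:

```python
import numpy as np

def ks_statistic(y_true, scores):
    """Max gap between the two classes' empirical CDFs of `scores`."""
    y_true = np.asarray(y_true)
    scores = np.asarray(scores)
    thresholds = np.unique(scores)
    pos = np.sort(scores[y_true == 1])
    neg = np.sort(scores[y_true == 0])
    # Fraction of each class at or below every threshold
    cdf_pos = np.searchsorted(pos, thresholds, side="right") / len(pos)
    cdf_neg = np.searchsorted(neg, thresholds, side="right") / len(neg)
    return float(np.max(np.abs(cdf_pos - cdf_neg)))
```

Perfectly separated scores give KS = 1.0; identically distributed scores give 0.0.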
9、 plot_learning_curve
Complete code:
from __future__ import absolute_import
import matplotlib.pyplot as plt
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import load_breast_cancer as load_data
import scikitplot as skplt
X, y = load_data(return_X_y=True)
rf = RandomForestClassifier()
skplt.estimators.plot_learning_curve(rf, X, y)
plt.show()
Result:
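The underlying numbers come from sklearn's learning_curve, which refits the model on growing training fractions under cross-validation; plot_learning_curve is essentially a plot of its output. A sketch, with random_state added for reproducibility:

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import learning_curve

X, y = load_breast_cancer(return_X_y=True)

# Train/validation scores at each training-set size, one column per CV fold
train_sizes, train_scores, test_scores = learning_curve(
    RandomForestClassifier(random_state=0), X, y, cv=5)
```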
10、plot_lift_curve
Complete code:
from __future__ import absolute_import
import matplotlib.pyplot as plt
from sklearn.linear_model import LogisticRegression
from sklearn.datasets import load_breast_cancer as load_data
import scikitplot as skplt
X, y = load_data(return_X_y=True)
lr = LogisticRegression(max_iter=7600)
lr.fit(X, y)
probas = lr.predict_proba(X)
skplt.metrics.plot_lift_curve(y_true=y, y_probas=probas)
plt.show()
Result:
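At a given fraction of the population, ranked by predicted probability, lift is the positive rate inside that top fraction divided by the overall positive rate. The helper below (lift_at, an illustrative function, not part of scikit-plot) sketches that computation:

```python
import numpy as np

def lift_at(y_true, scores, fraction):
    """Lift in the top `fraction` of samples ranked by score."""
    y_true = np.asarray(y_true)
    order = np.argsort(scores)[::-1]            # highest score first
    n_top = max(1, int(round(fraction * len(y_true))))
    top_rate = y_true[order[:n_top]].mean()     # positive rate in top slice
    base_rate = y_true.mean()                   # overall positive rate
    return float(top_rate / base_rate)
```

With perfect ranking and a 50% base rate, the top half contains only positives, so the lift there is 2.0; at fraction 1.0 the lift is always 1.0.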
11、plot_pca_2d_projection
Complete code:
from sklearn.decomposition import PCA
from sklearn.datasets import load_digits as load_data
import scikitplot as skplt
import matplotlib.pyplot as plt
X, y = load_data(return_X_y=True)
pca = PCA(random_state=1)
pca.fit(X)
skplt.decomposition.plot_pca_2d_projection(pca, X, y)
plt.show()
Result:
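The scatter is essentially the first two principal-component scores of each sample, colored by class; the same coordinates can be obtained directly:

```python
from sklearn.decomposition import PCA
from sklearn.datasets import load_digits

X, y = load_digits(return_X_y=True)
pca = PCA(random_state=1).fit(X)

# Coordinates plotted by plot_pca_2d_projection: first two components per sample
projection = pca.transform(X)[:, :2]
```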
12、plot_pca_component_variance
Complete code:
from sklearn.decomposition import PCA
from sklearn.datasets import load_digits as load_data
import scikitplot as skplt
import matplotlib.pyplot as plt
X, y = load_data(return_X_y=True)
pca = PCA(random_state=1)
pca.fit(X)
skplt.decomposition.plot_pca_component_variance(pca)
plt.show()
Result:
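The curve comes from the fitted PCA's explained_variance_ratio_; its cumulative sum tells how many components are needed to reach a target variance level, e.g. 90%:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.datasets import load_digits

X, _ = load_digits(return_X_y=True)
pca = PCA(random_state=1).fit(X)

# Cumulative explained variance, and the component count reaching 90%
cumvar = np.cumsum(pca.explained_variance_ratio_)
n_components_90 = int(np.searchsorted(cumvar, 0.90)) + 1
```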
13、plot_silhouette
Complete code:
from __future__ import absolute_import
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris as load_data
import scikitplot as skplt
X, y = load_data(return_X_y=True)
kmeans = KMeans(n_clusters=4, random_state=1)
cluster_labels = kmeans.fit_predict(X)
skplt.metrics.plot_silhouette(X, cluster_labels)
plt.show()
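The summary number behind the plot, the average silhouette coefficient over all samples, is available directly from sklearn; values closer to 1 indicate tighter, better-separated clusters:

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.metrics import silhouette_score

X, _ = load_iris(return_X_y=True)
cluster_labels = KMeans(n_clusters=4, random_state=1).fit_predict(X)

# Mean silhouette coefficient over all samples; always in [-1, 1]
score = silhouette_score(X, cluster_labels)
```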