Worth bookmarking | scikit-plot, a handy plotting tool for machine learning

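I. Installation

1. Installing Scikit-plot is straightforward. Run:

pip install scikit-plot

and the installation is done.

2. Repository:

https://github.com/reiinakano/scikit-plot

II. Usage

The functions below are collected from the Scikit-Plot documentation and grouped by its four modules: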

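scikitplot.metrics

  • plot_confusion_matrix: confusion matrix for a classifier
  • plot_precision_recall: precision-recall curve for a classifier
  • plot_roc: ROC curve for a classifier
  • plot_ks_statistic: KS statistic plot for a binary classifier
  • plot_silhouette: silhouette coefficients, a measure of clustering quality
  • plot_calibration_curve: calibration curves for one or more classifiers
  • plot_cumulative_gain: cumulative gains curve
  • plot_lift_curve: lift curve

scikitplot.estimators

  • plot_learning_curve: learning curve
  • plot_feature_importances: feature importances

scikitplot.cluster

  • plot_elbow_curve: elbow curve for choosing the number of clusters

scikitplot.decomposition

  • plot_pca_component_variance: explained variance
  • plot_pca_2d_projection: projection of high-dimensional data onto two dimensions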

1. ROC curve, a classification evaluation metric

Full code:
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33)
nb = GaussianNB()
nb.fit(X_train, y_train)
predicted_probas = nb.predict_proba(X_test)
# The magic happens here
import matplotlib.pyplot as plt
import scikitplot as skplt
skplt.metrics.plot_roc(y_test, predicted_probas)
plt.show()
Result:

[Figure: ROC curves]
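
To make clear what each curve in the plot represents, here is a minimal NumPy sketch of how ROC points can be computed for one class by sweeping a threshold over the predicted scores. The helper name roc_points and the toy data are assumptions for illustration, not part of scikit-plot, and the sketch assumes there are no tied scores. For a multi-class dataset such as digits, the same sweep would be repeated with each class treated as the positive one.

```python
import numpy as np

def roc_points(y_true, scores):
    """Return (fpr, tpr) arrays from sweeping a threshold over the scores.

    y_true holds 0/1 labels; scores are predicted probabilities for class 1.
    Hypothetical helper; assumes no tied scores.
    """
    y_true = np.asarray(y_true)
    scores = np.asarray(scores)
    order = np.argsort(-scores, kind="stable")  # highest score first
    hits = y_true[order]
    tps = np.cumsum(hits)        # true positives when the top-k samples are flagged
    fps = np.cumsum(1 - hits)    # false positives when the top-k samples are flagged
    tpr = np.concatenate([[0.0], tps / tps[-1]])
    fpr = np.concatenate([[0.0], fps / fps[-1]])
    return fpr, tpr

y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
fpr, tpr = roc_points(y_true, scores)

# Area under the curve by the trapezoidal rule
auc = float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2))
print(fpr, tpr, auc)
```

A curve that hugs the top-left corner (AUC close to 1) indicates that positives are ranked ahead of negatives.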

2. P-R curve

The precision vs. recall curve, with recall on the horizontal axis and precision on the vertical axis.

Full code:
import matplotlib.pyplot as plt
from sklearn.naive_bayes import GaussianNB
from sklearn.datasets import load_digits as load_data
import scikitplot as skplt
# Load dataset
X, y = load_data(return_X_y=True)
# Create classifier instance then fit
nb = GaussianNB()
nb.fit(X,y)
# Get predicted probabilities
y_probas = nb.predict_proba(X)
# plot_precision_recall_curve is deprecated; plot_precision_recall replaces it
skplt.metrics.plot_precision_recall(y, y_probas, cmap='nipy_spectral')
plt.show()
Version note:

FutureWarning: Function plot_precision_recall_curve is deprecated; This will be removed in v0.5.0.
Please use scikitplot.metrics.plot_precision_recall instead.

# Deprecated call, to be removed in v0.5.0:
#skplt.metrics.plot_precision_recall_curve(y, y_probas, cmap='nipy_spectral')
# Use this instead:
skplt.metrics.plot_precision_recall(y, y_probas, cmap='nipy_spectral')
Result:

[Figure: precision-recall curves]
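
As a rough illustration of where the points on a P-R curve come from, the sketch below computes precision and recall at a few thresholds for a binary problem. The function precision_recall_at and the toy labels are hypothetical and only meant to show the arithmetic; they are not taken from scikit-plot.

```python
import numpy as np

def precision_recall_at(y_true, scores, threshold):
    """Precision and recall when every score >= threshold is predicted positive."""
    y_true = np.asarray(y_true).astype(bool)
    predicted = np.asarray(scores) >= threshold
    tp = np.sum(predicted & y_true)
    fp = np.sum(predicted & ~y_true)
    fn = np.sum(~predicted & y_true)
    # Convention assumed here: precision is 1.0 when nothing is predicted positive
    precision = tp / (tp + fp) if tp + fp else 1.0
    recall = tp / (tp + fn)
    return float(precision), float(recall)

y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]

# Lowering the threshold moves along the curve from low recall to high recall
curve = [precision_recall_at(y_true, scores, t) for t in (0.8, 0.4, 0.35, 0.1)]
for precision, recall in curve:
    print(f"recall={recall:.2f}  precision={precision:.2f}")
```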

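3. Confusion matrix

An important evaluation tool for classification. The code below classifies the digits dataset with a random forest and plots the normalized confusion matrix of the predictions.

Full code: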
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import load_digits as load_data
from sklearn.model_selection import cross_val_predict
import matplotlib.pyplot as plt
import scikitplot as skplt
X, y = load_data(return_X_y=True)
# Create an instance of the RandomForestClassifier
classifier = RandomForestClassifier()
# Perform predictions
predictions = cross_val_predict(classifier, X, y)
plot = skplt.metrics.plot_confusion_matrix(y, predictions, normalize=True)
plt.show()
Result:

[Figure: normalized confusion matrix]
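
The effect of normalize=True can be sketched by hand. The example below assumes row normalization, so that each true class sums to 1; the helper normalized_confusion_matrix and the small label lists are made up for illustration.

```python
import numpy as np

def normalized_confusion_matrix(y_true, y_pred, n_classes):
    """Return raw counts and the row-normalized confusion matrix."""
    counts = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        counts[t, p] += 1  # rows: true label, columns: predicted label
    row_totals = counts.sum(axis=1, keepdims=True)
    return counts, counts / row_totals

y_true = [0, 0, 0, 1, 1, 2, 2, 2, 2]
y_pred = [0, 0, 1, 1, 1, 2, 2, 0, 2]
counts, normalized = normalized_confusion_matrix(y_true, y_pred, 3)
print(counts)
print(normalized.round(2))
```

After normalization the diagonal entries are per-class recall, which makes classes of different sizes directly comparable.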

4. Calibration curve

Full code:
from sklearn.ensemble import RandomForestClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.linear_model import LogisticRegression
from sklearn.svm import LinearSVC
from sklearn.datasets import make_classification
import matplotlib.pyplot as plt
import scikitplot as skplt
# max_iter is raised for LogisticRegression below (its default is 100) so the solver can converge.
X, y = make_classification(n_samples=100000, n_features=20,
                           n_informative=2, n_redundant=2,
                           random_state=20)

X_train, y_train, X_test, y_test = X[:1000], y[:1000], X[1000:], y[1000:]

rf_probas = RandomForestClassifier().fit(X_train, y_train).predict_proba(X_test)
#lr_probas = LogisticRegression().fit(X_train, y_train).predict_proba(X_test)
lr_probas = LogisticRegression(max_iter=7600).fit(X_train, y_train).predict_proba(X_test)
nb_probas = GaussianNB().fit(X_train, y_train).predict_proba(X_test)
sv_scores = LinearSVC().fit(X_train, y_train).decision_function(X_test)

probas_list = [rf_probas, lr_probas, nb_probas, sv_scores]
clf_names=['Random Forest',
           'Logistic Regression',
           'Gaussian Naive Bayes',
           'Support Vector Machine']

skplt.metrics.plot_calibration_curve(y_test,
                                     probas_list=probas_list,
                                     clf_names=clf_names,
                                     n_bins=10)
plt.show()
Result:

[Figure: calibration curves for the four classifiers]

Problem encountered:
C:\Users\wu\AppData\Roaming\Python\Python37\site-packages\sklearn\svm\_base.py:947: ConvergenceWarning: Liblinear failed to converge, increase the number of iterations.
  "the number of iterations.", ConvergenceWarning)

Fix:

Increase max_iter for LogisticRegression():

#lr_probas = LogisticRegression().fit(X_train, y_train).predict_proba(X_test)
lr_probas = LogisticRegression(max_iter=7600).fit(X_train, y_train).predict_proba(X_test)

Note that this particular message is raised by liblinear, which LinearSVC always uses and LogisticRegression uses only with solver='liblinear'. If it still appears after the change above, try raising max_iter on LinearSVC() as well, or scaling the features.

Other approaches are discussed at:

https://stackoverflow.com/questions/52670012/convergencewarning-liblinear-failed-to-converge-increase-the-number-of-iterati
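
For readers wondering what the n_bins argument controls, here is a minimal sketch of the binning behind a calibration curve: predictions are grouped into equal-width probability bins, and each bin contributes one point (mean predicted probability, observed fraction of positives). The helper calibration_points and the toy data are assumptions for illustration, not scikit-plot's own code.

```python
import numpy as np

def calibration_points(y_true, probas, n_bins=10):
    """Mean predicted probability and fraction of positives per non-empty bin."""
    y_true = np.asarray(y_true, dtype=float)
    probas = np.asarray(probas, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    # Using only the interior edges yields bin indices 0 .. n_bins - 1
    bin_ids = np.digitize(probas, edges[1:-1])
    mean_predicted, fraction_positive = [], []
    for b in range(n_bins):
        mask = bin_ids == b
        if mask.any():  # skip empty bins
            mean_predicted.append(probas[mask].mean())
            fraction_positive.append(y_true[mask].mean())
    return np.array(mean_predicted), np.array(fraction_positive)

y_true = [0, 0, 1, 0, 1, 1]
probas = [0.1, 0.2, 0.3, 0.6, 0.7, 0.9]
mean_pred, frac_pos = calibration_points(y_true, probas, n_bins=2)
print(mean_pred, frac_pos)
```

A well-calibrated classifier produces points close to the diagonal, where the predicted probability matches the observed frequency.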

5. plot_cumulative_gain

Full code:
from __future__ import absolute_import
import matplotlib.pyplot as plt
from sklearn.linear_model import LogisticRegression
from sklearn.datasets import load_breast_cancer as load_data
import scikitplot as skplt

X, y = load_data(return_X_y=True)
#lr = LogisticRegression()
lr = LogisticRegression(max_iter=7600)
lr.fit(X, y)
probas = lr.predict_proba(X)
skplt.metrics.plot_cumulative_gain(y_true=y, y_probas=probas)
plt.show()
Result:

[Figure: cumulative gains curve]

Problem encountered:

[Figure: screenshot of the warning]

Fix:

Same reasoning as above: increase max_iter.

#lr = LogisticRegression()
lr = LogisticRegression(max_iter=7600)
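
The quantity on the vertical axis of a cumulative gains chart can be sketched as follows: samples are sorted by predicted score, and the curve tracks what share of all positives has been captured after looking at the top fraction of samples. The helper cumulative_gain and the toy data are hypothetical.

```python
import numpy as np

def cumulative_gain(y_true, scores):
    """Fraction of samples examined vs. fraction of positives captured."""
    y_true = np.asarray(y_true)
    order = np.argsort(-np.asarray(scores), kind="stable")  # best scores first
    gains = np.cumsum(y_true[order]) / y_true.sum()
    fraction = np.arange(1, len(y_true) + 1) / len(y_true)
    # Prepend the origin so the curve starts at (0, 0)
    return np.insert(fraction, 0, 0.0), np.insert(gains, 0, 0.0)

y_true = [1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.3, 0.1]
fraction, gains = cumulative_gain(y_true, scores)
print(fraction, gains)
```

In this toy case the top 60% of samples already contains all positives, so the curve reaches 1.0 at 0.6.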

6. plot_elbow_curve

The elbow curve, used to decide the number of clusters.

Full code:
from __future__ import absolute_import
import matplotlib.pyplot as plt
import scikitplot as skplt
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris as load_data

X, y = load_data(return_X_y=True)
kmeans = KMeans(random_state=1)
skplt.cluster.plot_elbow_curve(kmeans, X, cluster_ranges=range(1, 11))
plt.show()
Result:

[Figure: elbow curve]
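
To see the numbers behind an elbow curve, the sketch below fits KMeans for several cluster counts and records the within-cluster sum of squared distances (the inertia_ attribute in scikit-learn). It assumes scikit-learn is installed, as in the article's own examples; the range 1 to 5 and n_init=10 are arbitrary choices for illustration.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris

X, _ = load_iris(return_X_y=True)

# Within-cluster sum of squares for k = 1 .. 5
inertias = []
for k in range(1, 6):
    model = KMeans(n_clusters=k, n_init=10, random_state=1).fit(X)
    inertias.append(model.inertia_)

for k, value in zip(range(1, 6), inertias):
    print(k, round(value, 2))
```

The "elbow" is the value of k after which adding clusters reduces the sum of squares only slightly.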

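7. plot_feature_importances

The plot_feature_importances function in Scikit-Plot sorts the feature importances and plots them.

Four of its parameters are worth knowing:

  • clf: the fitted classifier, here a random forest (rf)
  • feature_names: the feature names; the iris example below has 4
  • x_tick_rotation: rotation of the x-axis tick labels in degrees; with many features and long names, 90 keeps the labels from overlapping
  • figsize: figure size

The example below passes only the classifier and feature_names.

Full code: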
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import load_iris as load_data
import matplotlib.pyplot as plt
import scikitplot as skplt

X, y = load_data(return_X_y=True)
rf = RandomForestClassifier()
rf.fit(X, y)
# Names follow the column order of load_iris().data
skplt.estimators.plot_feature_importances(rf, feature_names=['sepal length',
                                                              'sepal width',
                                                              'petal length',
                                                              'petal width'])
plt.show()
Result:

[Figure: feature importances]
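
The "sort, then plot" step can be sketched without any plotting at all. The helper rank_features below is hypothetical, and the importance values are made up for illustration; in practice they would come from the fitted forest's feature_importances_ attribute.

```python
import numpy as np

def rank_features(importances, feature_names, max_num_features=20):
    """Return (name, importance) pairs sorted from most to least important."""
    importances = np.asarray(importances, dtype=float)
    order = np.argsort(importances)[::-1][:max_num_features]
    return [(feature_names[i], float(importances[i])) for i in order]

names = ['sepal length', 'sepal width', 'petal length', 'petal width']
importances = [0.10, 0.03, 0.45, 0.42]  # made-up values for illustration

ranked = rank_features(importances, names)
for name, value in ranked:
    print(f"{name:<14}{value:.2f}")
```

Pairing names with values before sorting is what keeps the bar labels correct, which is why the order of feature_names must match the column order of X.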

8. plot_ks_statistic

Full code:
from __future__ import absolute_import
import matplotlib.pyplot as plt
from sklearn.linear_model import LogisticRegression
from sklearn.datasets import load_breast_cancer as load_data
import scikitplot as skplt


X, y = load_data(return_X_y=True)
lr = LogisticRegression(max_iter=7600)
lr.fit(X, y)
probas = lr.predict_proba(X)
skplt.metrics.plot_ks_statistic(y_true=y, y_probas=probas)
plt.show()
Result:

[Figure: KS statistic plot]
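
For context, the KS statistic is the largest gap between the cumulative score distributions of the two classes. The sketch below computes it directly with NumPy; the function ks_statistic and the toy data are assumptions for illustration and may differ in detail from what scikit-plot does internally.

```python
import numpy as np

def ks_statistic(y_true, scores):
    """Largest gap between the two classes' empirical CDFs, and where it occurs."""
    y_true = np.asarray(y_true).astype(bool)
    scores = np.asarray(scores, dtype=float)
    thresholds = np.unique(scores)
    pos, neg = scores[y_true], scores[~y_true]
    # Empirical CDF of each class evaluated at every observed score
    cdf_pos = np.array([(pos <= t).mean() for t in thresholds])
    cdf_neg = np.array([(neg <= t).mean() for t in thresholds])
    gaps = np.abs(cdf_neg - cdf_pos)
    best = int(np.argmax(gaps))  # first threshold with the largest gap
    return float(gaps[best]), float(thresholds[best])

y_true = [0, 0, 0, 1, 0, 1, 1, 1]
scores = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.8, 0.9]
ks, at_threshold = ks_statistic(y_true, scores)
print(ks, at_threshold)
```

A larger value means the scores separate the two classes more cleanly.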

9. plot_learning_curve

Full code:
from __future__ import absolute_import
import matplotlib.pyplot as plt
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import load_breast_cancer as load_data
import scikitplot as skplt

X, y = load_data(return_X_y=True)
rf = RandomForestClassifier()
skplt.estimators.plot_learning_curve(rf, X, y)
plt.show()
Result:

[Figure: learning curve]

10. plot_lift_curve

Full code:
from __future__ import absolute_import
import matplotlib.pyplot as plt
from sklearn.linear_model import LogisticRegression
from sklearn.datasets import load_breast_cancer as load_data
import scikitplot as skplt

X, y = load_data(return_X_y=True)
lr = LogisticRegression(max_iter=7600)
lr.fit(X, y)
probas = lr.predict_proba(X)
skplt.metrics.plot_lift_curve(y_true=y, y_probas=probas)
plt.show()
Result:

[Figure: lift curve]
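
Lift can be read as "how many times better than random selection". A minimal sketch, with a hypothetical helper lift_curve and toy data: after sorting by score, lift at a given depth is the share of positives captured divided by the share of samples examined.

```python
import numpy as np

def lift_curve(y_true, scores):
    """Fraction of samples examined and the lift achieved at each depth."""
    y_true = np.asarray(y_true)
    order = np.argsort(-np.asarray(scores), kind="stable")  # best scores first
    fraction = np.arange(1, len(y_true) + 1) / len(y_true)
    gains = np.cumsum(y_true[order]) / y_true.sum()
    return fraction, gains / fraction

fraction, lift = lift_curve([1, 0, 1, 0, 0], [0.9, 0.8, 0.7, 0.3, 0.1])
print(fraction, lift.round(2))
```

The curve always ends at 1.0, because examining every sample is no better than random.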

11. plot_pca_2d_projection

Full code:
from sklearn.decomposition import PCA
from sklearn.datasets import load_digits as load_data
import scikitplot as skplt
import matplotlib.pyplot as plt

X, y = load_data(return_X_y=True)
pca = PCA(random_state=1)
pca.fit(X)
skplt.decomposition.plot_pca_2d_projection(pca, X, y)
plt.show()
Result:

[Figure: PCA 2-D projection]

12. plot_pca_component_variance

Full code:
from sklearn.decomposition import PCA
from sklearn.datasets import load_digits as load_data
import scikitplot as skplt
import matplotlib.pyplot as plt

X, y = load_data(return_X_y=True)
pca = PCA(random_state=1)
pca.fit(X)
skplt.decomposition.plot_pca_component_variance(pca)
plt.show()
Result:

[Figure: explained variance of the PCA components]
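
The explained variance shown in this plot can be reproduced from a singular value decomposition of the centered data. The sketch below uses synthetic data with three columns of very different spread; the helper explained_variance_ratio and the data are assumptions for illustration.

```python
import numpy as np

def explained_variance_ratio(X):
    """Share of total variance carried by each principal component."""
    X = np.asarray(X, dtype=float)
    centered = X - X.mean(axis=0)
    # Singular values come back sorted from largest to smallest
    singular_values = np.linalg.svd(centered, compute_uv=False)
    variances = singular_values ** 2 / (len(X) - 1)
    return variances / variances.sum()

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3)) * np.array([5.0, 1.0, 0.2])  # synthetic data

ratio = explained_variance_ratio(X)
cumulative = np.cumsum(ratio)
print(ratio.round(3), cumulative.round(3))
```

The cumulative sum is what tells you how many components are needed to retain a given share of the variance.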

13. plot_silhouette

Full code:
from __future__ import absolute_import
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris as load_data
import scikitplot as skplt

X, y = load_data(return_X_y=True)
kmeans = KMeans(n_clusters=4, random_state=1)
cluster_labels = kmeans.fit_predict(X)
skplt.metrics.plot_silhouette(X, cluster_labels)
plt.show()
Result:

[Figure: silhouette plot]
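
Each bar in a silhouette plot is one sample's silhouette coefficient, (b - a) / max(a, b), where a is the mean distance to the other members of its own cluster and b is the mean distance to the nearest other cluster. A small NumPy sketch under those definitions, using a hypothetical helper and one-dimensional toy data:

```python
import numpy as np

def silhouette_values(X, labels):
    """Per-sample silhouette coefficients using Euclidean distance."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    values = np.zeros(len(X))
    for i in range(len(X)):
        same = labels == labels[i]
        if same.sum() == 1:
            continue  # singleton cluster: coefficient left at 0
        # Mean distance to the other members of the sample's own cluster
        a = dists[i, same].sum() / (same.sum() - 1)
        # Mean distance to the closest other cluster
        b = min(dists[i, labels == other].mean()
                for other in np.unique(labels) if other != labels[i])
        values[i] = (b - a) / max(a, b)
    return values

X = [[0.0], [1.0], [10.0], [11.0]]
labels = [0, 0, 1, 1]
values = silhouette_values(X, labels)
print(values.round(3))
```

Values near 1 indicate tight, well-separated clusters; values near 0 or below suggest a sample sits between clusters or in the wrong one.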
