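[Machine Learning] Yellowbrick: A Handy Toolkit for Machine Learning Visualization

  • This post introduces Yellowbrick, a powerful extension to the machine learning library Scikit-Learn.

  • With just a few lines of code it visualizes features, models, and model evaluation results, making model selection and hyperparameter tuning easier. It depends on Matplotlib and Scikit-Learn.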


Installing Yellowbrick

# Install via the Tsinghua PyPI mirror for faster downloads
pip install yellowbrick -i https://pypi.tuna.tsinghua.edu.cn/simple

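Yellowbrick's core "weapon": Visualizers

A Visualizer can be thought of as a scikit-learn estimator object with visualization capabilities attached. Using one follows the same pattern as using a scikit-learn model:

  • Import the specific visualizer;

  • Instantiate the visualizer;

  • Fit the visualizer;

  • Show the visualization.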
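
To make these four steps concrete, here is a minimal, dependency-free sketch of the wrapper pattern. It is not Yellowbrick's implementation: `MajorityClassifier` and `TextScoreVisualizer` are hypothetical names invented for this illustration, and the "plot" is only a text bar showing accuracy.

```python
from collections import Counter


class MajorityClassifier:
    """Hypothetical toy estimator: always predicts the most frequent training label."""

    def fit(self, X, y):
        self.label_ = Counter(y).most_common(1)[0][0]
        return self

    def predict(self, X):
        return [self.label_ for _ in X]


class TextScoreVisualizer:
    """Hypothetical wrapper: delegates to an estimator and adds a show() step."""

    def __init__(self, estimator):
        self.estimator = estimator
        self.accuracy_ = None

    def fit(self, X, y):
        # Fitting the visualizer fits the wrapped estimator.
        self.estimator.fit(X, y)
        return self

    def score(self, X, y):
        # Scoring stores what show() will later draw.
        predictions = self.estimator.predict(X)
        hits = sum(p == t for p, t in zip(predictions, y))
        self.accuracy_ = hits / len(y)
        return self.accuracy_

    def show(self):
        # Render the stored score as a 20-character text bar.
        bar = "#" * round(self.accuracy_ * 20)
        line = f"accuracy {self.accuracy_:.2f} |{bar:<20}|"
        print(line)
        return line


# The same four steps: import (the classes above), instantiate, fit, show.
visualizer = TextScoreVisualizer(MajorityClassifier())
visualizer.fit([[0], [1], [2], [3]], ["a", "a", "a", "b"])
accuracy = visualizer.score([[4], [5]], ["a", "b"])
summary = visualizer.show()
```

The point of the pattern is that the wrapper exposes the same `fit` and `score` calls as the model it wraps, so it can be dropped into existing scikit-learn style code with one extra `show()` call at the end.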


Quick-start examples

  • Plot ROC curves to evaluate how well a model performs

import matplotlib.pyplot as plt
from sklearn.linear_model import RidgeClassifier
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import OrdinalEncoder, LabelEncoder

from yellowbrick.classifier import ROCAUC
from yellowbrick.datasets import load_game

# Create the figure the visualizer will draw on
plt.figure(dpi=120)

# Load the data
X, y = load_game()

# Encode the categorical features and the target as integers
X = OrdinalEncoder().fit_transform(X)
y = LabelEncoder().fit_transform(y)

# Split into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

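# Instantiate the classifier and the visualizer.
# LabelEncoder assigns integer codes in sorted label order,
# so the class names are listed alphabetically to match.
model = RidgeClassifier()
visualizer = ROCAUC(model, classes=["draw", "loss", "win"])

visualizer.fit(X_train, y_train)  # Fit the visualizer on the training set
visualizer.score(X_test, y_test)  # Evaluate the model on the test set
visualizer.show()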
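
For readers unfamiliar with what the area under the curve measures, the sketch below computes the AUC of a binary scorer in plain Python, using the pairwise-ranking interpretation: the fraction of (positive, negative) pairs in which the positive example receives the higher score, with ties counted as one half. The function name `binary_auc` and the toy scores are made up for illustration. For a multiclass problem such as the one above, the same idea is applied one class at a time, treating that class as positive and all others as negative.

```python
def binary_auc(labels, scores):
    """AUC as the probability that a random positive outranks a random negative."""
    positives = [s for l, s in zip(labels, scores) if l == 1]
    negatives = [s for l, s in zip(labels, scores) if l == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half a win
    return wins / (len(positives) * len(negatives))


labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
auc = binary_auc(labels, scores)
print(auc)  # 3 of the 4 positive/negative pairs are ordered correctly -> 0.75
```

An AUC of 0.5 corresponds to random ranking and 1.0 to a perfect one, which is why curves that bow toward the top-left corner of the plot indicate a better model.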

  • In feature engineering, visualize the result of PCA dimensionality reduction

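import matplotlib.pyplot as plt

from yellowbrick.datasets import load_credit
from yellowbrick.features import PCA

# Create the figure the visualizer will draw on
plt.figure(dpi=120)

# Load the data
X, y = load_credit()
classes = ['account in default', 'current with bills']

# Standardize the features and project onto three principal components
visualizer = PCA(scale=True, projection=3, classes=classes)
visualizer.fit_transform(X, y)
visualizer.show()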

  • For regression models, plot the residuals between predicted and actual values, with a Q-Q plot to assess the model.

from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

from yellowbrick.datasets import load_concrete
from yellowbrick.regressor import ResidualsPlot

# Load the data
X, y = load_concrete()

# Split into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Instantiate the model and the visualizer (Q-Q plot instead of the histogram)
model = Ridge()
visualizer = ResidualsPlot(model, hist=False, qqplot=True)
visualizer.fit(X_train, y_train)  # Fit on the training set
visualizer.score(X_test, y_test)  # Evaluate on the test set
visualizer.show()

Figure: Residuals Plot on the Concrete dataset with a Q-Q plot
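
As a rough illustration of the quantities such a plot is built from, the sketch below fits a one-feature least-squares line in plain Python, computes the residuals, and pairs the sorted residuals with standard normal quantiles, which is the idea behind a Q-Q plot. The data are made up, the residual is taken as observed minus predicted (sign conventions vary between tools), and the plotting positions `(i + 0.5) / n` are one common choice assumed here, not necessarily the one Yellowbrick uses.

```python
from statistics import NormalDist, mean

# Toy data, made up for illustration.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.0]

# Ordinary least squares for a single feature: y is approximately slope * x + intercept.
x_bar, y_bar = mean(x), mean(y)
slope = (
    sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    / sum((a - x_bar) ** 2 for a in x)
)
intercept = y_bar - slope * x_bar

# Residual = observed value minus predicted value.
residuals = [b - (slope * a + intercept) for a, b in zip(x, y)]

# Q-Q pairs: standard normal quantiles at positions (i + 0.5) / n,
# matched with the residuals sorted in ascending order.
n = len(residuals)
theoretical = [NormalDist().inv_cdf((i + 0.5) / n) for i in range(n)]
qq_pairs = list(zip(theoretical, sorted(residuals)))

for t, r in qq_pairs:
    print(f"{t:+.3f}  {r:+.3f}")
```

If the residuals are roughly normally distributed, the printed pairs fall close to a straight line; strong curvature suggests skewed or heavy-tailed errors.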

  • Visualize the prediction error of a Lasso regression model

import matplotlib.pyplot as plt
from sklearn.linear_model import Lasso

from yellowbrick.datasets import load_bikeshare
from yellowbrick.regressor import prediction_error

# Create the figure the visualizer will draw on
plt.figure(dpi=120)

# Load the data
X, y = load_bikeshare()

# The quick method fits the model and draws the plot in a single line
visualizer = prediction_error(Lasso(), X, y)

More examples are listed in the next section.


Commonly used Yellowbrick Visualizers

Feature Visualization

  • Rank Features: pairwise ranking of features to detect relationships

  • Parallel Coordinates: horizontal visualization of instances

  • Radial Visualization: separation of instances around a circular plot

  • PCA Projection: projection of instances based on principal components

  • Manifold Visualization: high dimensional visualization with manifold learning

  • Joint Plots: direct data visualization with feature selection

Classification Visualization

  • Class Prediction Error: shows error and support in classification

  • Classification Report: visual representation of precision, recall, and F1

  • ROC/AUC Curves: receiver operating characteristic and area under the curve

  • Precision-Recall Curves: precision vs recall for different probability thresholds

  • Confusion Matrices: visual description of class decision making

  • Discrimination Threshold: find a threshold that best separates binary classes
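
Several of the visualizers in this list (Classification Report, Confusion Matrices, Class Prediction Error) rest on the same per-class counts. As a rough, dependency-free sketch of those numbers, the helper below derives precision, recall, and F1 from paired true and predicted labels. `per_class_report` is a hypothetical name and the labels are toy data; in practice scikit-learn computes these values and Yellowbrick draws them.

```python
from collections import Counter


def per_class_report(y_true, y_pred):
    """Return {label: (precision, recall, f1)} from paired true/predicted labels."""
    confusion = Counter(zip(y_true, y_pred))  # (true, predicted) -> count
    labels = sorted(set(y_true) | set(y_pred))
    report = {}
    for label in labels:
        tp = confusion[(label, label)]
        # Predicted as `label` but actually another class.
        fp = sum(confusion[(other, label)] for other in labels if other != label)
        # Actually `label` but predicted as another class.
        fn = sum(confusion[(label, other)] for other in labels if other != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (
            2 * precision * recall / (precision + recall)
            if precision + recall
            else 0.0
        )
        report[label] = (precision, recall, f1)
    return report


y_true = ["win", "win", "win", "loss", "loss", "draw"]
y_pred = ["win", "win", "loss", "loss", "win", "draw"]
report = per_class_report(y_true, y_pred)
for label, (p, r, f) in report.items():
    print(f"{label:<5} precision={p:.2f} recall={r:.2f} f1={f:.2f}")
```

The `confusion` counter is exactly the content of a confusion matrix, so the same counts feed both the matrix view and the report view.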

Regression Visualization

  • Prediction Error Plot: find model breakdowns along the domain of the target

  • Residuals Plot: show the difference in residuals of training and test data

  • Alpha Selection: show how the choice of alpha influences regularization

  • Cook’s Distance: show the influence of instances on linear regression

Clustering Visualization

  • K-Elbow Plot: select k using the elbow method and various metrics

  • Silhouette Plot: select k by visualizing silhouette coefficient values

  • Intercluster Distance Maps: show relative distance and size/importance of clusters
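
To show what a silhouette plot is summarizing, here is a small sketch that computes the silhouette coefficient of each point for one-dimensional data, using absolute difference as the distance. The function name `silhouette_values`, the toy points, and the two labelings are assumptions made for illustration, and the function expects at least two clusters.

```python
def silhouette_values(points, labels):
    """Silhouette coefficient per point for 1-D data (needs at least two clusters)."""
    clusters = {}
    for p, l in zip(points, labels):
        clusters.setdefault(l, []).append(p)
    values = []
    for p, l in zip(points, labels):
        own = clusters[l]
        if len(own) == 1:
            values.append(0.0)  # common convention for singleton clusters
            continue
        # a: mean distance to the other members of the point's own cluster
        # (the point's zero distance to itself does not affect the sum).
        a = sum(abs(p - q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster.
        b = min(
            sum(abs(p - q) for q in members) / len(members)
            for other, members in clusters.items()
            if other != l
        )
        denom = max(a, b)
        values.append((b - a) / denom if denom else 0.0)
    return values


points = [1.0, 2.0, 8.0, 9.0]
good = silhouette_values(points, [0, 0, 1, 1])  # natural grouping
bad = silhouette_values(points, [0, 1, 0, 1])   # deliberately mixed grouping
print(sum(good) / len(good), sum(bad) / len(bad))
```

Values close to 1 mean a point sits well inside its cluster, while negative values mean it is closer to another cluster; comparing the average across candidate values of k is how the plot helps choose k.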

Model Selection Visualization

  • Validation Curve: tune a model with respect to a single hyperparameter

  • Learning Curve: show if a model might benefit from more data or less complexity

  • Feature Importances: rank features by importance or linear coefficients for a specific model

  • Recursive Feature Elimination: find the best subset of features based on importance

Target Visualization

  • Balanced Binning Reference: generate a histogram with vertical lines showing the recommended value point to bin the data into evenly distributed bins

  • Class Balance: see how the distribution of classes affects the model

  • Feature Correlation: display the correlation between features and dependent variables

Text Visualization

  • Term Frequency: visualize the frequency distribution of terms in the corpus

  • t-SNE Corpus Visualization: use stochastic neighbor embedding to project documents

  • Dispersion Plot: visualize how key terms are dispersed throughout a corpus

  • UMAP Corpus Visualization: plot similar documents closer together to discover clusters

  • PosTag Visualization: plot the counts of different parts-of-speech throughout a tagged corpus


Customizing Yellowbrick figures

See the official documentation: https://www.scikit-yb.org/en/latest/index.html

