SHAP (SHapley Additive exPlanations) is a powerful and comparatively rigorous model-interpretation method.
It borrows the Shapley value from cooperative game theory: each feature's contribution is its marginal contribution, i.e. how the prediction changes when that feature is removed from a coalition, averaged over all possible coalitions.
Besides ranking feature importance, SHAP also shows whether each feature pushes the prediction up or down.
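The "marginal contribution" idea can be made concrete with a tiny brute-force computation. This is a toy sketch of the exact Shapley formula (not the shap library's optimized TreeExplainer algorithm), using made-up feature names and effects:

```python
from itertools import combinations
from math import factorial

def shapley_values(players, v):
    """Exact Shapley values by enumerating every coalition S not containing i."""
    n = len(players)
    phi = {}
    for i in players:
        others = [p for p in players if p != i]
        total = 0.0
        for r in range(n):
            for S in combinations(others, r):
                # Shapley weight |S|! * (n - |S| - 1)! / n!
                weight = factorial(len(S)) * factorial(n - len(S) - 1) / factorial(n)
                # marginal contribution of i to coalition S
                total += weight * (v(S + (i,)) - v(S))
        phi[i] = total
    return phi

# Toy additive "model": a coalition's value is the sum of fixed feature effects.
effects = {"age": 2.0, "chol": 1.0, "bp": 0.5}
v = lambda S: sum(effects[p] for p in S)
phi = shapley_values(list(effects), v)
# For a purely additive game, each feature's Shapley value equals its own effect,
# and the values sum to v(all features) - v(empty set).
```

For an additive model the result is trivial by design; the formula's value shows when features interact, which is exactly the case tree ensembles produce.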
The shap library's TreeExplainer supports many tree ensembles (GBDT, XGBoost, CatBoost, etc.), but not AdaBoost.
So when using SHAP to explain an AdaBoost-based coronary-heart-disease prediction model (dataset at the end of the post), it fails with:
shap.utils._exceptions.InvalidModelError: Model type not yet supported by TreeExplainer: <class 'sklearn.ensemble._weight_boosting.AdaBoostClassifier'>
Workaround: add AdaBoost support directly to the installed shap package, in the file
.conda\envs\<env-name>\Lib\site-packages\shap\explainers\_tree.py
I first added the code from a Tencent Cloud answer, but it still raised an error; following wangxiancao's blog, I then changed Tree to SingleTree on line 709, and it ran successfully.
The working code to add (note the module path must be `_weight_boosting`, matching the error message above, not `_weighted_boosting`):

```python
# added begin: AdaBoostClassifier support
elif safe_isinstance(model, ["sklearn.ensemble.AdaBoostClassifier",
                             "sklearn.ensemble._weight_boosting.AdaBoostClassifier"]):
    assert hasattr(model, "estimators_"), "Model has no `estimators_`! Have you called `model.fit`?"
    self.internal_dtype = model.estimators_[0].tree_.value.dtype.type
    self.input_dtype = np.float32
    scaling = 1.0 / len(model.estimators_)  # output is average of trees
    self.trees = [SingleTree(e.tree_, normalize=True, scaling=scaling)
                  for e in model.estimators_]
    # look up the base tree's split criterion (e.g. gini) to map it to an objective
    self.objective = objective_name_map.get(model.base_estimator_.criterion, None)
    self.tree_output = "probability"
# added end
```
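The patch treats the ensemble output as a plain average of each tree's leaf-probability vector: `normalize=True` turns leaf class counts into probabilities and `scaling = 1/len(estimators_)` averages them. (AdaBoost's own predict_proba uses weighted staged estimates, so this is an approximation.) A minimal sketch with invented leaf counts:

```python
# Hypothetical per-tree leaf values: class counts at the leaf a sample lands in.
leaf_counts = [
    [30.0, 10.0],   # tree 1: 30 class-0 samples, 10 class-1 samples at this leaf
    [5.0, 15.0],    # tree 2
    [20.0, 20.0],   # tree 3
]

n_trees = len(leaf_counts)
scaling = 1.0 / n_trees  # mirrors `scaling = 1.0 / len(model.estimators_)`

# normalize=True: each leaf's counts become class probabilities.
per_tree = [[c / sum(counts) for c in counts] for counts in leaf_counts]

# scaling spreads the ensemble output as an unweighted average over trees.
ensemble = [scaling * sum(tree[k] for tree in per_tree) for k in range(2)]
# ensemble is a valid probability vector (sums to 1), hence tree_output = "probability".
```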
Summary Plot: a global feature-importance plot.

```python
# Method 2: SHAP visual explanation
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(x_train)

# Summary Plot: global importance of each feature
# shap_values[0] gives the single-class (blue) bar plot;
# shap_values gives the two-class (blue + pink) bar plot
shap.summary_plot(shap_values, x_train, plot_type="bar", max_display=20)
shap.summary_plot(shap_values[0], x_train, plot_type="bar", max_display=20)
```

The two summary_plot calls produce the two output figures, one per call.
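The bar variant of the summary plot ranks features by their mean absolute SHAP value across samples. A stdlib sketch with invented SHAP values and feature names:

```python
# Made-up SHAP values: rows = samples, columns = features.
shap_vals = [
    [ 0.40, -0.10, 0.05],
    [-0.20,  0.30, 0.00],
    [ 0.10, -0.50, 0.05],
]
features = ["age", "chol", "bp"]

# Global importance of feature j = mean of |shap value| over all samples.
importance = {
    f: sum(abs(row[j]) for row in shap_vals) / len(shap_vals)
    for j, f in enumerate(features)
}
# Bar length in the plot corresponds to this value; features are sorted by it.
ranked = sorted(importance, key=importance.get, reverse=True)
```

Taking the absolute value matters: a feature that strongly pushes predictions down still counts as important, which the signed beeswarm variant then visualizes in detail.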
Dataset: Z-Alizadeh Sani, usable for binary coronary-heart-disease classification.
The extended dataset contains records for 303 subjects (216 patients + 87 healthy), each with 56 features.
Dataset page: Z-Alizadeh Sani - UCI Machine Learning Repository