Machine Learning Explainability（1）

qq_42839893

Published 2021-01-29 17:22:30


Category: kaggle_course

Article link: https://blog.csdn.net/qq_42839893/article/details/113395900


PI_kaggle


Why Are These Insights Valuable

These insights have many uses, including:

Debugging
Informing feature engineering
Directing future data collection
Informing human decision-making
Building Trust


Example:

import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier

# Load the FIFA 2018 match statistics (Kaggle dataset path)
data = pd.read_csv('../input/fifa-2018-match-statistics/FIFA 2018 Statistics.csv')
y = (data['Man of the Match'] == "Yes")  # Convert from string "Yes"/"No" to binary
# Use only the integer-valued columns as features
feature_names = [i for i in data.columns if data[i].dtype in [np.int64]]
X = data[feature_names]
# Hold out a validation set for evaluating the fitted model
train_X, val_X, train_y, val_y = train_test_split(X, y, random_state=1)
my_model = RandomForestClassifier(n_estimators=100,
                                  random_state=0).fit(train_X, train_y)

Permutation Importance code:

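import eli5
from eli5.sklearn import PermutationImportance

# Compute permutation importance of the fitted model on the validation set
perm = PermutationImportance(my_model, random_state=1).fit(val_X, val_y)

# Display the importance table, labelling each row with its feature name
eli5.show_weights(perm, feature_names=val_X.columns.tolist())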

In the resulting table, the values towards the top are the most important features, and those towards the bottom matter least.
The first number in each row shows how much model performance decreased with a random shuffling (in this case, using "accuracy" as the performance metric).
Like most things in data science, there is some randomness to the exact performance change from shuffling a column. We measure the amount of randomness in our permutation importance calculation by repeating the process with multiple shuffles. The number after the ± measures how performance varied from one reshuffling to the next.
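
The shuffle-and-rescore procedure described above can be sketched by hand. The snippet below is a minimal illustration in plain Python, not eli5's implementation: the helper names (`accuracy`, `permutation_importance`), the rule-based `toy_model` and the toy data are all assumptions made up for this example.

```python
import random
import statistics


def accuracy(predict, rows, labels):
    """Fraction of rows for which predict(row) equals the true label."""
    hits = sum(1 for row, label in zip(rows, labels) if predict(row) == label)
    return hits / len(rows)


def permutation_importance(predict, rows, labels, n_repeats=5, seed=0):
    """Return one (mean, std) accuracy drop per column (hypothetical helper)."""
    rng = random.Random(seed)
    baseline = accuracy(predict, rows, labels)
    results = []
    for col in range(len(rows[0])):
        drops = []
        for _ in range(n_repeats):
            # Shuffle only this column; every other column stays untouched.
            shuffled = [row[col] for row in rows]
            rng.shuffle(shuffled)
            permuted = [row[:col] + [value] + row[col + 1:]
                        for row, value in zip(rows, shuffled)]
            drops.append(baseline - accuracy(predict, permuted, labels))
        results.append((statistics.mean(drops), statistics.pstdev(drops)))
    return results


# Toy data (made up): the label depends only on column 0; column 1 is noise.
data_rng = random.Random(42)
rows = [[data_rng.randint(0, 9), data_rng.randint(0, 9)] for _ in range(200)]
labels = [row[0] >= 5 for row in rows]


def toy_model(row):
    """Stand-in for a fitted model: it only looks at column 0."""
    return row[0] >= 5


importances = permutation_importance(toy_model, rows, labels)
for name, (mean, std) in zip(["col_0", "col_1"], importances):
    print(f"{name}: {mean:.3f} ± {std:.3f}")
```

Here the mean plays the role of the first number in each row and the standard deviation plays the role of the number after the ±; the exact spread statistic that eli5 reports may differ. Shuffling the column the model ignores leaves accuracy unchanged, so its importance is zero.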

Permutation importance is not affected by feature scale

The scale of features does not affect permutation importance per se. Rescaling a feature can affect PI only indirectly, if the rescaling helps or hurts the ability of the particular learning method we're using to make use of that feature. That won't happen with tree-based models, like the Random Forest used here. If you are familiar with Ridge Regression, you might be able to think of how that would be affected. That said, the absolute change features in the taxi fare example have high importance because they capture total distance traveled, which is the primary determinant of taxi fares. It is not an artifact of the feature magnitude.
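
The claim about tree-based models can be checked on a small scale. The sketch below is only an illustration under stated assumptions: a one-split decision stump (`fit_stump`, a hypothetical helper) stands in for a tree-based model, it is not the Random Forest used above, and the data are made up. The stump is refit after one column is multiplied by 1000, and the accuracy drop from shuffling each column is compared.

```python
import random


def fit_stump(rows, labels):
    """Fit a one-split tree: pick the (column, threshold) with best accuracy."""
    best = (-1.0, 0, 0)  # (accuracy, column, threshold)
    for col in range(len(rows[0])):
        for threshold in sorted({row[col] for row in rows}):
            acc = sum((row[col] >= threshold) == label
                      for row, label in zip(rows, labels)) / len(rows)
            if acc > best[0]:
                best = (acc, col, threshold)
    _, col, threshold = best
    return lambda row: row[col] >= threshold


def accuracy_drop(predict, rows, labels, col, seed=0):
    """Accuracy lost after shuffling one column once (hypothetical helper)."""
    def acc(data):
        return sum(predict(r) == y for r, y in zip(data, labels)) / len(data)
    values = [row[col] for row in rows]
    random.Random(seed).shuffle(values)
    permuted = [row[:col] + [v] + row[col + 1:] for row, v in zip(rows, values)]
    return acc(rows) - acc(permuted)


# Toy data (made up): the label is determined by column 0 alone.
rng = random.Random(7)
rows = [[rng.randint(0, 9), rng.randint(0, 9)] for _ in range(200)]
labels = [row[0] >= 5 for row in rows]

# Same data with column 0 multiplied by 1000 (e.g. kilometres -> metres).
rescaled = [[row[0] * 1000, row[1]] for row in rows]

stump = fit_stump(rows, labels)
stump_rescaled = fit_stump(rescaled, labels)

original_drops = [accuracy_drop(stump, rows, labels, c) for c in (0, 1)]
rescaled_drops = [accuracy_drop(stump_rescaled, rescaled, labels, c) for c in (0, 1)]

print(original_drops)
print(rescaled_drops)
```

Because the split threshold moves with the rescaled column, the refit stump makes the same prediction for every row, and the two lists of accuracy drops match. A penalised linear model such as Ridge Regression is not covered by this sketch.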
