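Model interpretability covers both the global interpretability of the model as a whole and the interpretability of individual predictions. The emphasis here is on instance-level interpretability.
Interpretation methods fall into two groups: intrinsically interpretable models and model-agnostic methods. This section focuses on intrinsically interpretable models, which are compared mainly along the following properties:
Monotone: whether the relationship between a feature and the target is monotonic
Interaction: whether the model natively includes feature interactions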
Linear Regression
$y = \beta_0 + \beta_1 x_1 + \dots + \beta_p x_p + \epsilon$, where $\epsilon$ is the error term, assumed to follow a Gaussian distribution.
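As a minimal sketch of fitting this model, the snippet below (assuming NumPy is available) generates hypothetical synthetic data with known weights and Gaussian noise, then estimates $\beta_0, \dots, \beta_p$ by ordinary least squares via `np.linalg.lstsq`. The data and the true weights are made up for illustration.

```python
import numpy as np

# Hypothetical synthetic data: 200 instances, 2 features,
# true weights beta0=1.0, beta1=2.0, beta2=-3.0, plus Gaussian noise (epsilon).
rng = np.random.default_rng(0)
n = 200
X = rng.normal(size=(n, 2))
eps = rng.normal(scale=0.5, size=n)
y = 1.0 + 2.0 * X[:, 0] - 3.0 * X[:, 1] + eps

# Prepend a column of ones so that beta_hat[0] plays the role of the intercept beta0.
X_design = np.column_stack([np.ones(n), X])

# Ordinary least squares: minimize the sum of squared residuals ||y - X beta||^2.
beta_hat, *_ = np.linalg.lstsq(X_design, y, rcond=None)
print(np.round(beta_hat, 2))  # should be close to [1.0, 2.0, -3.0]
```

The column of ones lets the intercept be estimated like any other weight; without it the fitted line would be forced through the origin.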
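(1)Assumptions
Linearity: the prediction is a linear combination of the input features
Normality: given the features, the target outcome follows a normal distribution
Homoscedasticity: the variance of the error terms is constant over the entire feature space
Independence: the instances are independent of each other
Fixed features: the input features are treated as given constants rather than statistical variables, which means they are assumed to be free of measurement error
Absence of multicollinearity: the features are not strongly correlated with each other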
(2)R-squared
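$R^2 = 1 - SSE/SST$, where $SSE = \sum_{i=1}^{n} (y^{(i)} - \hat{y}^{(i)})^2$ is the sum of squared errors and $SST = \sum_{i=1}^{n} (y^{(i)} - \bar{y})^2$ is the total sum of squares.
R-squared tells you how much of the total variance of your target outcome is explained by the model.
0: the model does not explain the data at all
1: the model explains all of the variance in the data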
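A small sketch of the $R^2$ formula, assuming NumPy and a hypothetical one-feature dataset: fit OLS with an intercept, then compute SSE, SST and $R^2$ directly from their definitions.

```python
import numpy as np

# Hypothetical data: one feature with a linear signal plus unit-variance noise.
rng = np.random.default_rng(1)
n = 100
x = rng.uniform(0, 10, size=n)
y = 3.0 + 0.5 * x + rng.normal(scale=1.0, size=n)

# OLS fit with intercept.
X = np.column_stack([np.ones(n), x])
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
y_hat = X @ beta_hat

sse = np.sum((y - y_hat) ** 2)      # sum of squared errors (residuals)
sst = np.sum((y - y.mean()) ** 2)   # total sum of squares around the mean
r2 = 1 - sse / sst
print(f"R^2 = {r2:.3f}")
```

For an OLS fit with an intercept, evaluated on the training data, $R^2$ lies in $[0, 1]$; with a single feature it also equals the squared Pearson correlation between $x$ and $y$, which is a handy sanity check.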
(3)Feature Importance
The importance of a feature in a linear regression model can be measured by the absolute value of its t-statistic. The t-statistic is the estimated weight scaled by its standard error: $t_{\hat{\beta}_j} = \hat{\beta}_j / SE(\hat{\beta}_j)$.
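The sketch below computes the t-statistics by hand with NumPy, assuming the classical OLS standard errors $SE(\hat\beta) = \sqrt{\operatorname{diag}(\hat\sigma^2 (X^T X)^{-1})}$ with $\hat\sigma^2 = SSE/(n - p - 1)$. The dataset and its true weights (one strong, one weak, one irrelevant feature) are hypothetical.

```python
import numpy as np

# Hypothetical data: feature 0 strong, feature 1 weak, feature 2 irrelevant.
rng = np.random.default_rng(2)
n, p = 300, 3
X = rng.normal(size=(n, p))
y = 0.5 + 2.0 * X[:, 0] + 0.2 * X[:, 1] + 0.0 * X[:, 2] + rng.normal(size=n)

X_design = np.column_stack([np.ones(n), X])
beta_hat, *_ = np.linalg.lstsq(X_design, y, rcond=None)

# Unbiased estimate of the error variance: SSE / (n - number of parameters).
residuals = y - X_design @ beta_hat
dof = n - X_design.shape[1]
sigma2 = residuals @ residuals / dof

# Standard errors: square root of the diagonal of sigma^2 (X^T X)^{-1}.
cov_beta = sigma2 * np.linalg.inv(X_design.T @ X_design)
se = np.sqrt(np.diag(cov_beta))

t_stats = beta_hat / se
importance = np.abs(t_stats[1:])          # skip the intercept
ranking = np.argsort(importance)[::-1]    # most important feature first
print("|t| per feature:", np.round(importance, 2))
print("features ranked by importance:", ranking)
```

Because the weight is divided by its standard error, a large weight that is estimated very imprecisely does not automatically rank as important.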
(4)Visual Interpretation
4-1 Weight Plot
Plots the model's feature weights: each weight is displayed as a point and its 95% confidence interval as a line.
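To show what goes into a weight plot without depending on a plotting library, here is a sketch that computes the weights and approximate 95% confidence intervals with NumPy and renders them as a text-mode plot. The feature names, data, axis range and the use of the normal quantile 1.96 (instead of the exact t-quantile) are all assumptions made for illustration.

```python
import numpy as np

# Hypothetical data: three features with known true weights.
rng = np.random.default_rng(3)
n = 150
names = ["x1", "x2", "x3"]
X = rng.normal(size=(n, 3))
y = 4.0 + 1.5 * X[:, 0] - 0.8 * X[:, 1] + 0.1 * X[:, 2] + rng.normal(size=n)

# OLS fit plus standard errors of the weights.
X_design = np.column_stack([np.ones(n), X])
beta_hat, *_ = np.linalg.lstsq(X_design, y, rcond=None)
residuals = y - X_design @ beta_hat
sigma2 = residuals @ residuals / (n - X_design.shape[1])
se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X_design.T @ X_design)))

# Approximate 95% confidence intervals with the normal quantile 1.96,
# which is close to the exact t-quantile for large n.
lower = beta_hat - 1.96 * se
upper = beta_hat + 1.96 * se

# Text-mode weight plot on the axis [-3, 3]: '-' spans the CI, 'o' marks the
# weight, '|' marks zero. The intercept (index 0) is omitted.
lo_axis, hi_axis, width = -3.0, 3.0, 61

def to_col(v):
    col = int(round((v - lo_axis) / (hi_axis - lo_axis) * (width - 1)))
    return min(max(col, 0), width - 1)

for name, b, lo, hi in zip(names, beta_hat[1:], lower[1:], upper[1:]):
    row = [" "] * width
    for c in range(to_col(lo), to_col(hi) + 1):
        row[c] = "-"
    row[to_col(0.0)] = "|"
    row[to_col(b)] = "o"
    print(f"{name:>3} {''.join(row)} {b:+.2f} [{lo:+.2f}, {hi:+.2f}]")
```

With matplotlib installed, the same `beta_hat`, `lower` and `upper` arrays can be passed to `plt.errorbar` (using `xerr`) to draw the usual graphical version. An interval that does not cross the zero line indicates a weight that is distinguishable from zero at roughly the 5% level.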
4-2 Effect Plot
$\text{effect}_j^{(i)} = \beta_j x_j^{(i)}$. The weights of the linear regression model can be analyzed more meaningfully when they are multiplied by the actual feature values: the effect of feature $j$ for instance $i$ is its weight multiplied by its feature value.
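A sketch of the effect computation, assuming NumPy and a hypothetical dataset whose two features live on very different scales (so the raw weights alone are misleading). It summarizes the effect distribution per feature, which is what an effect plot visualizes, and then shows the instance-level view: a single prediction is the intercept plus the sum of that instance's feature effects.

```python
import numpy as np

# Hypothetical features on different scales (e.g. temperature vs. humidity).
rng = np.random.default_rng(4)
n = 100
names = ["temp", "humidity"]
X = np.column_stack([rng.uniform(0, 35, n), rng.uniform(20, 100, n)])
y = 10.0 + 0.9 * X[:, 0] - 0.05 * X[:, 1] + rng.normal(size=n)

X_design = np.column_stack([np.ones(n), X])
beta_hat, *_ = np.linalg.lstsq(X_design, y, rcond=None)
weights = beta_hat[1:]

# effect_j^(i) = beta_j * x_j^(i): one effect per instance and feature.
effects = X * weights  # broadcasting, shape (n, 2)

# Distribution of effects per feature (what the effect plot's boxplots show).
for j, name in enumerate(names):
    q25, q50, q75 = np.percentile(effects[:, j], [25, 50, 75])
    print(f"{name}: weight={weights[j]:+.3f}, "
          f"effect quartiles=({q25:.2f}, {q50:.2f}, {q75:.2f})")

# Instance-level view: the prediction for instance 0 decomposes into
# intercept + sum of its feature effects.
i = 0
pred = beta_hat[0] + effects[i].sum()
print(np.isclose(pred, X_design[i] @ beta_hat))  # True
```

If matplotlib is available, `plt.boxplot(effects, labels=names)` would draw one box per feature from the same `effects` array.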
(5)Sparse Linear Models
Lasso: Lasso stands for “least absolute shrinkage and selection operator” and, when applied in a linear regression model, performs feature selection and regularization of the selected feature weights.
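To make the feature-selection effect concrete, here is a minimal sketch that solves $\min_\beta \frac{1}{2n}\lVert y - X\beta\rVert_2^2 + \lambda\lVert\beta\rVert_1$ by coordinate descent with soft-thresholding, using NumPy only. The $\frac{1}{2n}$ scaling is the convention of scikit-learn's `Lasso`, which also fits by coordinate descent; the helper names `soft_threshold` and `lasso_cd`, the sparse ground truth and the $\lambda$ values are hypothetical choices for illustration, and the code assumes standardized features and a centered target so no intercept is needed.

```python
import numpy as np

def soft_threshold(z, gamma):
    # Shrinks z toward 0 by gamma and returns exactly 0 when |z| <= gamma.
    return np.sign(z) * max(abs(z) - gamma, 0.0)

def lasso_cd(X, y, lam, n_iter=200):
    """Hypothetical helper: coordinate descent for
    min_beta (1/(2n)) ||y - X beta||^2 + lam * ||beta||_1.
    Assumes standardized columns of X and centered y (no intercept)."""
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    residual = y.copy()  # equals y - X @ beta while beta is all zeros
    for _ in range(n_iter):
        for j in range(p):
            residual += X[:, j] * beta[j]       # partial residual without feature j
            rho = X[:, j] @ residual / n
            beta[j] = soft_threshold(rho, lam) / col_sq[j]
            residual -= X[:, j] * beta[j]       # restore the full residual
    return beta

# Hypothetical data with a sparse true weight vector.
rng = np.random.default_rng(5)
n, p = 200, 6
X = rng.normal(size=(n, p))
X = (X - X.mean(axis=0)) / X.std(axis=0)
true_beta = np.array([3.0, -2.0, 0.0, 0.0, 1.5, 0.0])
y = X @ true_beta + rng.normal(scale=0.5, size=n)
y = y - y.mean()

for lam in [0.01, 0.5, 2.5]:
    beta = lasso_cd(X, y, lam)
    print(f"lambda={lam}: beta={np.round(beta, 2)}, nonzero={np.count_nonzero(beta)}")
```

Larger $\lambda$ shrinks all weights and sets more of them exactly to zero; once $\lambda \ge \max_j |x_j^T y| / n$, every weight is zero.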