Shapley Values
Definition
A colleague mentioned that Python has a machine-learning library called SHAP that can be used to explore feature importance, so I spent some time looking into it. And since SHAP came up, it is worth first understanding the Shapley value behind it.
The following definition of Shapley values comes from Interpretable Machine Learning, section 5.10 (SHAP):
A prediction can be explained by assuming that each feature value of the instance is a “player” in a game where the prediction is the payout. Shapley values – a method from coalitional game theory – tells us how to fairly distribute the “payout” among the features.
In other words, treat a single prediction as one round of a game: each feature value is a player, and the prediction is the payout. The Shapley value is a method for fairly distributing that payout among the players; that is, it tells us how much each feature contributed to the prediction.
As an aside, I also looked up the coalitional game theory mentioned in the quote. Coalitional game theory, also known as cooperative game theory (the author's source loosely calls it "positive-sum games"), studies settings where, for instance because agreements can be externally enforced, players form coalitions and the competition takes place between those groups. This differs from traditional game theory, where players are assumed to make every decision purely in their own interest and cannot make binding commitments to allies (in plain terms: players cannot form coalitions, and any cooperative mechanism must be a self-enforcing contract).
Back to Shapley values. At first glance the concept sounds just like a linear model, where each feature's coefficient directly indicates its importance. But that only works for simple models; complex models such as LightGBM have no such coefficients, so the approach no longer applies.
Example for General Idea
The original text gives an interesting example. Suppose we train a model to predict house prices from the features park, cat, area, and floor. To measure the effect of cat (banned or allowed) on the house price, we can take samples that agree on every other feature and differ only in cat, and compute the average marginal contribution of the cat feature over those samples. The catch is that, to truly control for the other variables, we would have to enumerate all combinations of their possible values, which quickly becomes expensive.
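To make the "average marginal contribution" idea concrete, here is a minimal self-contained sketch. The value function `price` and its numbers are invented for illustration; only the feature names come from the example above. It computes exact Shapley values by averaging each feature's marginal contribution over all orderings of the players:

```python
from itertools import permutations

def shapley_values(players, v):
    """Exact Shapley values: average each player's marginal
    contribution over all orderings of the players."""
    phi = {p: 0.0 for p in players}
    orderings = list(permutations(players))
    for order in orderings:
        coalition = set()
        for p in order:
            before = v(frozenset(coalition))
            coalition.add(p)
            phi[p] += v(frozenset(coalition)) - before
    return {p: total / len(orderings) for p, total in phi.items()}

def price(coalition):
    """Hypothetical 'predicted price' for each subset of known
    feature values (all numbers are made up)."""
    total = 300_000
    if "park" in coalition:
        total += 10_000
    if "area" in coalition:
        total += 50_000
    if "floor" in coalition:
        total += 5_000
    if "cat" in coalition:   # say a cat ban lowers the price
        total -= 15_000
    return total

phi = shapley_values(["park", "cat", "area", "floor"], price)
print(phi["cat"])  # → -15000.0
```

A useful sanity check: the Shapley values always sum to the difference between the payout of the full coalition and the empty one, here 50,000.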
Axioms of Shapley Value
Symmetry
i and j are interchangeable relative to v if they always contribute the same amount to every coalition of the other agents.
Which is,
for all $S$ that contains neither $i$ nor $j$: $v(S \cup \{i\}) = v(S \cup \{j\})$.
In addition, they should receive the exact same shares/payments.
That is,
for any $v$, if $i$ and $j$ are interchangeable, then $\psi_i(N,v) = \psi_j(N,v)$.
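A quick numerical check of the symmetry axiom, using a made-up three-player game in which i and j are interchangeable (a brute-force computation, not the shap library):

```python
from itertools import permutations

def shapley_values(players, v):
    """Exact Shapley values via all player orderings."""
    phi = {p: 0.0 for p in players}
    orderings = list(permutations(players))
    for order in orderings:
        coalition = set()
        for p in order:
            before = v(frozenset(coalition))
            coalition.add(p)
            phi[p] += v(frozenset(coalition)) - before
    return {p: t / len(orderings) for p, t in phi.items()}

def v(S):
    # i and j are interchangeable: for any S containing neither,
    # v(S ∪ {i}) == v(S ∪ {j}) == 10.
    return 10 if ("i" in S or "j" in S) else 0

phi = shapley_values(["i", "j", "k"], v)
assert phi["i"] == phi["j"]  # symmetric players get equal shares
```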
Dummy Players (Free Riders)
i is a dummy player if the amount that i contributes to any coalition is 0.
Which is,
for all $S$: $v(S \cup \{i\}) = v(S)$.
Dummy players should receive nothing.
That is,
for any $v$, if $i$ is a dummy player, then $\psi_i(N,v) = 0$.
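The dummy-player axiom can be checked the same way, with an invented game where player d never changes the payout:

```python
from itertools import permutations

def shapley_values(players, v):
    """Exact Shapley values via all player orderings."""
    phi = {p: 0.0 for p in players}
    orderings = list(permutations(players))
    for order in orderings:
        coalition = set()
        for p in order:
            before = v(frozenset(coalition))
            coalition.add(p)
            phi[p] += v(frozenset(coalition)) - before
    return {p: t / len(orderings) for p, t in phi.items()}

def v(S):
    # "d" contributes nothing to any coalition: v(S ∪ {d}) == v(S).
    return 100 if ("a" in S and "b" in S) else 0

phi = shapley_values(["a", "b", "d"], v)
assert phi["d"] == 0.0  # the dummy player receives nothing
```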
Additivity
If we can separate a game into two parts, $v = v_1 + v_2$, then we should be able to decompose the payments as well.
For any two games $v_1$ and $v_2$: $\psi_i(N, v_1 + v_2) = \psi_i(N, v_1) + \psi_i(N, v_2)$.
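Additivity can also be verified numerically with two arbitrary made-up games; the Shapley values of their sum equal the sum of their Shapley values:

```python
from itertools import permutations

def shapley_values(players, v):
    """Exact Shapley values via all player orderings."""
    phi = {p: 0.0 for p in players}
    orderings = list(permutations(players))
    for order in orderings:
        coalition = set()
        for p in order:
            before = v(frozenset(coalition))
            coalition.add(p)
            phi[p] += v(frozenset(coalition)) - before
    return {p: t / len(orderings) for p, t in phi.items()}

players = ["a", "b", "c"]
v1 = lambda S: len(S) ** 2            # an arbitrary game
v2 = lambda S: 5 if "a" in S else 0   # another arbitrary game
v_sum = lambda S: v1(S) + v2(S)

lhs = shapley_values(players, v_sum)
r1 = shapley_values(players, v1)
r2 = shapley_values(players, v2)
for p in players:
    assert abs(lhs[p] - (r1[p] + r2[p])) < 1e-9  # ψ(v1+v2) = ψ(v1) + ψ(v2)
```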