Paper Notes: Understanding Black-box Predictions via Influence Functions
1. Introduction
《Understanding Black-box Predictions via Influence Functions》
This paper was an ICML 2017 best paper. Its motivation is stated in the abstract: explaining black-box predictions.
On "black-box" prediction: in deep learning, a deeper neural network usually achieves better predictive performance and generalization. On the application side, we improve a network through changes to the model architecture, hyperparameter tuning, modified activation functions, and assorted training tricks, yet an explanation of why the model works still awaits further theoretical development; in that sense the model remains a black box.
This paper uses influence functions, a classic technique from robust statistics, to trace a model's predictions through the learning algorithm back to the training data, thereby identifying the training points with the greatest influence on the predictions for the (test) points.
How can we explain the predictions of a black-box model? In this paper, we use influence functions — a classic technique from robust statistics — to trace a model's prediction through the learning algorithm and back to its training data, thereby identifying training points most responsible for a given prediction.
The abstract shows that this paper works at the level of individual data points, investigating the influence of training points on the test set.
Chapter 1 is the Introduction, which lays out the background of machine learning as a black-box system:
A key question often asked of machine learning systems is “Why did the system make this prediction?”
Chapter 2, Approach, introduces the method.
2. Approach
Some definitions
- Training points
Each $z_i$ is a training point, $x$ is the input, $y$ is the output, and there are $n$ training points:
$$z_1, \ldots, z_n, \quad \text{where } z_i = (x_i, y_i) \in X \times Y$$
- Loss of a training point under parameters $\theta$
For each point $z$ and parameters $\theta$, let $L(z, \theta)$ denote the loss.
Here $\theta$ are the parameters to be learned. In linear regression, for example, $\hat{y}_{i}=x_{i1}\theta_{1}+x_{i2}\theta_{2}+\dots+x_{im}\theta_{m}$: the prediction $\hat{y}_{i}$ for training point $x_{i}$ is the sum of the $m$ parameters $\theta$ multiplied by the $m$ features of $x_i$, and the loss against the true value $y_{i}$ is then computed with a loss function $\mathrm{loss\_func}(\hat{y}_{i}, y_{i})$. In a neural network the expression for $\hat{y}_{i}$ is typically far more complex, but it is still computed from the parameters $\theta$ and the point $z$, so we use the generic notation $L(z, \theta)$ for the loss.
- Parameters under empirical risk minimization
When the loss of the model (a linear regression, a neural network, etc.) converges to its minimum, the parameters $\hat{\theta}$ at that point are the optimal $\theta$ we are after, defined as:
$$\hat{\theta} = \arg\min_{\theta \in \Theta} \frac{1}{n} \sum_{i=1}^{n} L(z_{i},\theta)$$
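To make these definitions concrete, here is a minimal sketch (not from the paper) that computes $\hat{\theta}$ by empirical risk minimization for an L2-regularized logistic regression on toy data. The data, the `loss` function, and all names are illustrative assumptions; the later sketches in this section reuse these variables.

```python
# Minimal sketch (illustrative, not the paper's code): empirical risk
# minimization on toy data to obtain theta_hat.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
n, d = 200, 5
X = rng.normal(size=(n, d))                                                 # inputs x_i
y = (X @ rng.normal(size=d) + 0.1 * rng.normal(size=n) > 0).astype(float)   # labels y_i

def loss(theta, X, y, reg=1e-3):
    """(1/n) * sum_i L(z_i, theta): logistic loss plus L2 regularization."""
    margins = (2 * y - 1) * (X @ theta)
    return np.mean(np.logaddexp(0.0, -margins)) + 0.5 * reg * theta @ theta

# theta_hat = argmin_theta (1/n) sum_i L(z_i, theta)
theta_hat = minimize(loss, np.zeros(d), args=(X, y)).x
```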
2.1 Upweighting a training point
By the definitions in Section 2, when a training point $z$ is removed from the training set, the parameters $\hat{\theta}$ become $\hat{\theta}_{-z}$, so the parameter change is $\hat{\theta}_{-z}-\hat{\theta}$, where $\hat{\theta}_{-z}$ is defined as
$$\hat{\theta}_{-z} = \arg\min_{\theta \in \Theta} \frac{1}{n} \sum_{z_{i}\ne z} L(z_{i},\theta)$$
That is, after removing a training point we retrain, and take $\hat{\theta}_{-z}$ to be the parameters at which the loss converges.
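Continuing the toy sketch above, the brute-force way to obtain $\hat{\theta}_{-z}$ is leave-one-out retraining, which is exact but requires one full retraining per removed point (the helper name `retrain_without` is hypothetical):

```python
# Brute-force leave-one-out retraining (continues the sketch above).
def retrain_without(k, X, y, theta_init):
    """Re-minimize the empirical risk with training point z_k removed."""
    mask = np.arange(len(y)) != k
    return minimize(loss, theta_init, args=(X[mask], y[mask])).x

theta_hat_minus_z = retrain_without(0, X, y, theta_hat)
param_change_exact = theta_hat_minus_z - theta_hat   # theta_hat_{-z} - theta_hat
```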
Fortunately, influence functions give us an efficient approximation.
The idea is to measure how a change to $z$ affects $\theta$: if we upweight $z$ by a small amount $\epsilon$, the parameters change to $\hat{\theta}_{\epsilon, z}$, defined as
$$\hat{\theta}_{\epsilon, z} = \arg\min_{\theta \in \Theta} \frac{1}{n} \sum_{i=1}^{n} L(z_{i},\theta) + \epsilon L(z, \theta)$$
A classic result from Cook & Weisberg (1982) gives a closed-form expression for the influence of this upweighting of $z$ on the parameters $\hat{\theta}$:
A classic result (Cook & Weisberg, 1982) tells us that the influence of upweighting z on the parameters $\hat{\theta}$ is given by
$$I_{up,params}(z) = \frac{\mathrm{d} \hat{\theta}_{\epsilon, z}}{\mathrm{d} \epsilon}\bigg|_{\epsilon = 0} = -H^{-1}_{\hat{\theta}}\,\nabla_{\theta}L(z,\hat{\theta})$$
where
$$H_{\hat{\theta}} = \frac{1}{n} \sum_{i=1}^{n} \nabla^{2}_{\theta}L(z_{i}, \hat{\theta})$$
is the Hessian, which is assumed to be positive definite.
Since upweighting by $\epsilon = -\frac{1}{n}$ is equivalent to removing $z$, the parameter change after removing $z$ can be linearly approximated as $\hat{\theta}_{-z}-\hat{\theta} \approx -\frac{1}{n}I_{up,params}(z)$, without retraining the model.
Since removing a point $z$ is the same as upweighting it by $\epsilon = -\frac{1}{n}$, we can linearly approximate the parameter change due to removing $z$ by computing $\hat{\theta}_{-z}-\hat{\theta} \approx -\frac{1}{n}I_{up,params}(z)$, without retraining the model.
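Continuing the same toy sketch, the leave-one-out parameter change can instead be approximated with $I_{up,params}$ and no retraining. The analytic gradient and Hessian below are specific to the toy logistic loss and are assumptions of this sketch, not the paper's implementation:

```python
# Influence-function approximation of the leave-one-out parameter change
# (continues the toy sketch above).
def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def grad_single(theta, x, y_i, reg=1e-3):
    """grad_theta L(z, theta) for a single point z = (x, y_i) of the toy loss."""
    return (sigmoid(x @ theta) - y_i) * x + reg * theta

def hessian_mean(theta, X, y, reg=1e-3):
    """H_theta_hat = (1/n) sum_i grad^2_theta L(z_i, theta_hat)."""
    p = sigmoid(X @ theta)
    w = p * (1 - p)
    return (X * w[:, None]).T @ X / len(y) + reg * np.eye(len(theta))

H = hessian_mean(theta_hat, X, y)
# I_up,params(z) = -H^{-1} grad_theta L(z, theta_hat), here for z = z_1
I_up_params = -np.linalg.solve(H, grad_single(theta_hat, X[0], y[0]))
# theta_hat_{-z} - theta_hat ~ -(1/n) I_up,params(z); compare with param_change_exact
param_change_approx = -I_up_params / n
```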
Next, building on the above, the authors ask: after upweighting a training point $z$, how much does the loss at a test point change? This also has a closed-form expression:
$$\begin{aligned} I_{up,loss}(z, z_{test}) & = \frac{\mathrm{d} L(z_{test}, \hat{\theta}_{\epsilon, z})}{\mathrm{d} \epsilon}\bigg|_{\epsilon = 0} \\ & = \nabla_{\theta}L(z_{test},\hat{\theta})^{T} \frac{\mathrm{d} \hat{\theta}_{\epsilon, z}}{\mathrm{d} \epsilon}\bigg|_{\epsilon = 0} \\ & = -\nabla_{\theta}L(z_{test},\hat{\theta})^{T}H^{-1}_{\hat{\theta}}\nabla_{\theta}L(z, \hat{\theta}) \end{aligned}$$
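A sketch of scoring training points by $I_{up,loss}(z, z_{test})$ for a single test point, reusing `theta_hat`, `H`, and `grad_single` from the sketches above (function and variable names are illustrative):

```python
# Rank training points by their influence on the loss at one test point.
def influence_on_test_loss(x_tr, y_tr, x_te, y_te, theta, H):
    """I_up,loss(z, z_test) = -grad L(z_test)^T H^{-1} grad L(z)."""
    g_test = grad_single(theta, x_te, y_te)
    g_train = grad_single(theta, x_tr, y_tr)
    return -g_test @ np.linalg.solve(H, g_train)

x_test, y_test = X[0], y[0]   # illustrative only; in practice a held-out point
scores = np.array([influence_on_test_loss(X[i], y[i], x_test, y_test, theta_hat, H)
                   for i in range(n)])
most_influential = np.argsort(np.abs(scores))[::-1][:5]   # top-5 |I_up,loss|
```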
2.2 Perturbing a training input
The authors develop a finer-grained notion of influence by studying a different counterfactual: how would the model's predictions change if a training input were perturbed?
Let us develop a finer-grained notion of influence by studying a different counterfactual: how would the model's predictions change if a training input were modified?
For a training point $z=(x, y)$, define $z_{\delta}=(x+\delta, y)$, i.e., the sample point is perturbed from $z \to z_{\delta}$. Let $\hat{\theta}_{z_{\delta}, -z}$ be the empirical risk minimizer after the training point $z$ is replaced by $z_{\delta}$; in other words, after replacing $z$ with $z_{\delta}$ we retrain, and $\hat{\theta}_{z_{\delta}, -z}$ are the parameters at which the loss converges. The parameter change is then $\hat{\theta}_{z_{\delta}, -z}- \hat{\theta}$.
To approximate $\hat{\theta}_{z_{\delta}, -z}- \hat{\theta}$, define the $\epsilon$-upweighted objective for moving from $z \to z_{\delta}$:
$$\hat{\theta}_{\epsilon, z_{\delta},-z} = \arg\min_{\theta \in \Theta} \frac{1}{n} \sum_{i=1}^{n}L(z_{i},\theta) + \epsilon L(z_{\delta}, \theta)-\epsilon L(z, \theta)$$
This yields:
$$\begin{aligned} \frac{\mathrm{d} \hat{\theta}_{\epsilon, z_{\delta},-z}}{\mathrm{d} \epsilon}\bigg|_{\epsilon=0} &= I_{up,params}(z_{\delta}) - I_{up,params}(z) \\ &= -H^{-1}_{\hat{\theta}}\left(\nabla_{\theta}L(z_{\delta},\hat{\theta})-\nabla_{\theta}L(z,\hat{\theta})\right) \end{aligned}$$
Hence, analogously, $\hat{\theta}_{z_{\delta}, -z}- \hat{\theta} \approx -\frac{1}{n}\left(I_{up,params}(z_{\delta})- I_{up,params}(z)\right)$,
which gives an estimate of the influence of moving from $z \to z_{\delta}$.
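In the same toy setup, this parameter change from perturbing one training input can be approximated without retraining (the perturbation `delta` below is an illustrative choice):

```python
# Approximate parameter change when training input z is perturbed to z_delta.
delta = 0.05 * rng.normal(size=d)                      # small illustrative perturbation
g_pert = grad_single(theta_hat, X[0] + delta, y[0])    # grad L(z_delta, theta_hat)
g_orig = grad_single(theta_hat, X[0], y[0])            # grad L(z, theta_hat)
# theta_hat_{z_delta,-z} - theta_hat ~ -(1/n)(I_up,params(z_delta) - I_up,params(z))
#                                    =  (1/n) H^{-1} (g_pert - g_orig)
param_change_pert = np.linalg.solve(H, g_pert - g_orig) / n
```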
In the derivation above, $\delta$ is applied to the input $x$, i.e., $(x, y) \to (x+\delta, y)$.
The same conclusion holds for perturbations of $y$: $(x, y) \to (x, y+\delta)$.
Analogous equations also apply for changes in y.
While influence functions might appear to only work for infinitesimal (therefore continuous) perturbations, it is important to note that this approximation holds for arbitrary $\delta$: the $\epsilon$-upweighting scheme allows us to smoothly interpolate between $z$ and $z_{\delta}$. This is particularly useful for working with discrete data (e.g., in NLP) or with discrete label changes.
If $x$ is continuous and $\delta$ is small, the derivative above,
$$\begin{aligned} \frac{\mathrm{d} \hat{\theta}_{\epsilon, z_{\delta},-z}}{\mathrm{d} \epsilon}\bigg|_{\epsilon=0} &= I_{up,params}(z_{\delta}) - I_{up,params}(z) \\ &= -H^{-1}_{\hat{\theta}}\left(\nabla_{\theta}L(z_{\delta},\hat{\theta})-\nabla_{\theta}L(z,\hat{\theta})\right) \end{aligned}$$
admits a further approximation.
Assume the input domain $\mathcal{X} \subseteq \mathbb{R}^{d}$, the parameter space $\Theta \subseteq \mathbb{R}^{p}$, and that $L$ is differentiable in $\theta$ and $x$. Then, as $\left\| \delta \right\| \to 0$,
$$\nabla_{\theta}L(z_{\delta},\hat{\theta})-\nabla_{\theta}L(z,\hat{\theta}) \approx \left(\nabla_{x}\nabla_{\theta}L(z,\hat{\theta})\right)\delta$$
Substituting this into the expression above gives
$$\begin{aligned} \frac{\mathrm{d} \hat{\theta}_{\epsilon, z_{\delta},-z}}{\mathrm{d} \epsilon}\bigg|_{\epsilon=0} &= I_{up,params}(z_{\delta}) - I_{up,params}(z) \\ &= -H^{-1}_{\hat{\theta}}\left(\nabla_{\theta}L(z_{\delta},\hat{\theta})-\nabla_{\theta}L(z,\hat{\theta})\right) \\ &\approx -H^{-1}_{\hat{\theta}}\left(\nabla_{x}\nabla_{\theta}L(z,\hat{\theta})\right)\delta \end{aligned}$$
Hence $\hat{\theta}_{z_{\delta}, -z}- \hat{\theta} \approx -\frac{1}{n} H^{-1}_{\hat{\theta}}\left(\nabla_{x}\nabla_{\theta}L(z,\hat{\theta})\right)\delta$.
Differentiating the test loss with respect to $\delta$ via the chain rule gives:
$$\begin{aligned} I_{pert,loss}(z, z_{test})^{T} & = \nabla_{\delta}L(z_{test},\hat{\theta}_{z_{\delta},-z})^{T}\Big|_{\delta = 0} \\ & = -\nabla_{\theta}L(z_{test},\hat{\theta})^{T}H^{-1}_{\hat{\theta}}\nabla_{x}\nabla_{\theta}L(z,\hat{\theta}) \end{aligned}$$
$I_{pert,loss}(z, z_{test})^{T}\delta$ is then an efficient approximation of the change in the test loss as $z \to z+\delta$.
By choosing $\delta$ appropriately, we can construct the local perturbation of a training point that has the largest influence on the test loss.
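Finally, a sketch of $I_{pert,loss}(z, z_{test})$ for the same toy logistic model. The mixed derivative $\nabla_{x}\nabla_{\theta}L$ is written analytically for that specific loss, so this illustrates the formula rather than reproducing the paper's implementation:

```python
# I_pert,loss(z, z_test) for the toy logistic loss (continues the sketch above).
def mixed_grad(theta, x, y_i):
    """M[k, j] = d^2 L(z, theta) / (d theta_k d x_j) for the toy logistic loss."""
    s = sigmoid(x @ theta)
    return s * (1 - s) * np.outer(x, theta) + (s - y_i) * np.eye(len(x))

def pert_influence(x_tr, y_tr, x_te, y_te, theta, H):
    """I_pert,loss(z, z_test)^T = -grad L(z_test)^T H^{-1} grad_x grad_theta L(z)."""
    g_test = grad_single(theta, x_te, y_te)
    return -np.linalg.solve(H, g_test) @ mixed_grad(theta, x_tr, y_tr)

# Direction in input space along which perturbing training point z_1 most
# changes the test loss to first order; its scale is application dependent.
direction = pert_influence(X[0], y[0], x_test, y_test, theta_hat, H)
```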