Paper Notes: Understanding Black-box Predictions via Influence Functions

Paper: Understanding Black-box Predictions via Influence Functions

1. Introduction

This paper won the ICML 2017 Best Paper Award. Its aim, stated in the abstract, is to explain black-box predictions.

What makes prediction a "black box"? In deep learning, deeper neural networks usually deliver better predictive performance and generalization, and practitioners improve them by changing the model architecture, tuning hyperparameters, modifying activation functions, and applying tricks during training; yet an explanation of why these models work still awaits further theoretical development.

In this paper, influence functions (a classic technique from robust statistics) trace the model's predictions through the learning algorithm back to the training data, thereby identifying the training points with the greatest influence on a given (test-set) prediction.

How can we explain the predictions of a black-box model? In this paper, we use influence functions — a classic technique from robust statistics — to trace a model's prediction through the learning algorithm and back to its training data, thereby identifying training points most responsible for a given prediction.

As the abstract shows, the paper takes a data-point perspective: it studies how individual training points affect predictions on the test set.

Chapter 1, the Introduction, reviews the background of black-box machine-learning systems:

A key question often asked of machine learning systems is “Why did the system make this prediction?”

The method itself is presented starting from Chapter 2, Approach.

2. Approach

Definitions

  1. Training points
    $z_1, \dots, z_n$, where $z_i = (x_i, y_i) \in X \times Y$. Each $z_i$ is a training point with input $x_i$ and label $y_i$; there are $n$ points in total.
  2. Loss of a training point under parameters $\theta$
    For a point $z$ and parameters $\theta$, write the loss as $L(z, \theta)$.

$\theta$ denotes the parameters to be learned. In linear regression, for example, $\hat{y}_i = x_{i1}\theta_1 + x_{i2}\theta_2 + \dots + x_{im}\theta_m$: the prediction $\hat{y}_i$ for training point $x_i$ is the sum of the $m$ parameters $\theta$ multiplied by the $m$ features of $x_i$, and the loss against the true value $y_i$ is $loss\_func(\hat{y}_i, y_i)$. In a neural network the expression for $\hat{y}_i$ is far more complex, but it is still computed from $\theta$ and $z$, so the generic notation $L(z, \theta)$ covers both cases.

  3. The empirical risk minimizer $\hat{\theta}$
    Whether for linear regression or a neural network, when the loss converges to its minimum, the parameters $\hat{\theta}$ at that point are the optimum we seek, defined as:
    $$\hat{\theta} = \arg\min_{\theta \in \Theta} \frac{1}{n} \sum_{i=1}^{n} L(z_i, \theta)$$
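As a concrete instance of these definitions, here is a minimal sketch (my own illustration, not code from the paper) in which the per-point loss is an L2-regularized squared loss, so the empirical risk minimizer $\hat{\theta}$ has a closed form:

```python
import numpy as np

# Illustrative assumption (not from the paper): per-point loss
# L(z_i, theta) = 0.5*(x_i @ theta - y_i)**2 + 0.5*lam*||theta||^2,
# so the empirical risk (1/n) * sum_i L(z_i, theta) is quadratic in theta.
rng = np.random.default_rng(0)
n, d, lam = 100, 3, 0.1
X = rng.normal(size=(n, d))                                     # inputs x_i
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=n)   # labels y_i

def erm(X, y):
    """theta_hat = argmin_theta (1/n) * sum_i L(z_i, theta).

    Setting the gradient X.T @ (X @ theta - y) / n + lam * theta to zero
    gives the linear system (X.T @ X / n + lam*I) theta = X.T @ y / n.
    """
    n, d = X.shape
    return np.linalg.solve(X.T @ X / n + lam * np.eye(d), X.T @ y / n)

theta_hat = erm(X, y)
# Sanity check: the empirical-risk gradient vanishes at the minimizer.
grad = X.T @ (X @ theta_hat - y) / n + lam * theta_hat
print(np.linalg.norm(grad))  # effectively zero
```

The regularizer is folded into each per-point loss so that, in the next section, removing or reweighting a single point keeps the objective in exactly the averaged form above.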

2.1 Upweighting a training point

With the definitions above, removing a training point $z$ from the training set changes the parameters from $\hat{\theta}$ to $\hat{\theta}_{-z}$; the parameter change is $\hat{\theta}_{-z} - \hat{\theta}$, where $\hat{\theta}_{-z}$ is defined as

$$\hat{\theta}_{-z} = \arg\min_{\theta \in \Theta} \frac{1}{n} \sum_{z_i \neq z} L(z_i, \theta)$$

That is, delete the point, retrain, and take $\hat{\theta}_{-z}$ where the loss converges. Doing this for every training point would be prohibitively expensive.

Fortunately, influence functions give us an efficient approximation.

The idea is to measure how a small change in the weight of $z$ affects $\theta$: upweight $z$ by some small $\epsilon$, giving new parameters $\hat{\theta}_{\epsilon, z}$ defined as

$$\hat{\theta}_{\epsilon, z} = \arg\min_{\theta \in \Theta} \frac{1}{n} \sum_{i=1}^{n} L(z_i, \theta) + \epsilon L(z, \theta)$$

A classic result (Cook & Weisberg, 1982) gives the influence of upweighting $z$ on the parameters $\hat{\theta}$ in closed form:

$$I_{up,params}(z) = \frac{\mathrm{d}\hat{\theta}_{\epsilon, z}}{\mathrm{d}\epsilon}\Big|_{\epsilon=0} = -H_{\hat{\theta}}^{-1} \nabla_{\theta} L(z, \hat{\theta})$$

where $H_{\hat{\theta}} = \frac{1}{n} \sum_{i=1}^{n} \nabla_{\theta}^{2} L(z_i, \hat{\theta})$ is the Hessian of the empirical risk, assumed positive definite.

Since removing the point $z$ is the same as upweighting it by $\epsilon = -\frac{1}{n}$, the parameter change due to removing $z$ can be linearly approximated as $\hat{\theta}_{-z} - \hat{\theta} \approx -\frac{1}{n} I_{up,params}(z)$, without retraining the model.
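This approximation is easy to check numerically when retraining is cheap. The sketch below (my own illustration, assuming an L2-regularized squared loss so that both $\hat{\theta}$ and the Hessian have closed forms; not the paper's code) compares $-\frac{1}{n} I_{up,params}(z)$ with the exact leave-one-out parameter change:

```python
import numpy as np

# Illustration only (not the paper's code). Per-point loss:
# L(z_i, theta) = 0.5*(x_i @ theta - y_i)**2 + 0.5*lam*||theta||^2.
rng = np.random.default_rng(0)
n, d, lam = 200, 3, 0.1
X = rng.normal(size=(n, d))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=n)

def erm(X, y):
    # Closed-form empirical risk minimizer for the quadratic objective.
    n, d = X.shape
    return np.linalg.solve(X.T @ X / n + lam * np.eye(d), X.T @ y / n)

theta_hat = erm(X, y)
# H = (1/n) * sum_i Hessian of L(z_i, theta_hat)
H = X.T @ X / n + lam * np.eye(d)

i = 0  # the training point z to remove
grad_z = X[i] * (X[i] @ theta_hat - y[i]) + lam * theta_hat  # grad_theta L(z, theta_hat)

# I_up,params(z) = -H^{-1} grad_theta L(z, theta_hat)
I_up_params = -np.linalg.solve(H, grad_z)

# Influence approximation vs. exact retraining without z
approx = -I_up_params / n
exact = erm(np.delete(X, i, axis=0), np.delete(y, i, axis=0)) - theta_hat
print(approx)
print(exact)  # close to approx
```

The two vectors agree up to a second-order error that shrinks as $n$ grows, while the influence estimate needs only one linear solve instead of a full retrain per point.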

Building on this, the authors then ask how much the loss at a test point changes when the training point $z$ is upweighted. The chain rule gives a closed-form answer:

$$\begin{aligned} I_{up,loss}(z, z_{test}) &= \frac{\mathrm{d} L(z_{test}, \hat{\theta}_{\epsilon, z})}{\mathrm{d}\epsilon}\Big|_{\epsilon=0} \\ &= \nabla_{\theta} L(z_{test}, \hat{\theta})^{T} \frac{\mathrm{d}\hat{\theta}_{\epsilon, z}}{\mathrm{d}\epsilon}\Big|_{\epsilon=0} \\ &= -\nabla_{\theta} L(z_{test}, \hat{\theta})^{T} H_{\hat{\theta}}^{-1} \nabla_{\theta} L(z, \hat{\theta}) \end{aligned}$$

2.2 Perturbing a training input

The authors develop a finer-grained notion of influence by studying a different counterfactual: how would the model's predictions change if a training input were modified?

For a training point $z = (x, y)$, define the perturbed point

$$z_{\delta} = (x + \delta, y)$$

That is, apply a perturbation $z \to z_{\delta}$, and let $\hat{\theta}_{z_{\delta}, -z}$ be the empirical risk minimizer after the training point $z$ is replaced by $z_{\delta}$, i.e., the parameters obtained by retraining with $z$ replaced by $z_{\delta}$ and taking the value where the loss converges.

The resulting parameter change is $\hat{\theta}_{z_{\delta}, -z} - \hat{\theta}$.

To approximate $\hat{\theta}_{z_{\delta}, -z} - \hat{\theta}$, define, analogously to upweighting, an $\epsilon$-interpolation from $z$ to $z_{\delta}$:

$$\hat{\theta}_{\epsilon, z_{\delta}, -z} = \arg\min_{\theta \in \Theta} \frac{1}{n} \sum_{i=1}^{n} L(z_i, \theta) + \epsilon L(z_{\delta}, \theta) - \epsilon L(z, \theta)$$

This yields:

$$\begin{aligned} \frac{\mathrm{d}\hat{\theta}_{\epsilon, z_{\delta}, -z}}{\mathrm{d}\epsilon}\Big|_{\epsilon=0} &= I_{up,params}(z_{\delta}) - I_{up,params}(z) \\ &= -H_{\hat{\theta}}^{-1}\left(\nabla_{\theta} L(z_{\delta}, \hat{\theta}) - \nabla_{\theta} L(z, \hat{\theta})\right) \end{aligned}$$

Setting $\epsilon = \frac{1}{n}$, which corresponds to replacing $z$ with $z_{\delta}$ in the training set, likewise gives $\hat{\theta}_{z_{\delta}, -z} - \hat{\theta} \approx \frac{1}{n}\left(I_{up,params}(z_{\delta}) - I_{up,params}(z)\right)$, an estimate of the influence of $z \to z_{\delta}$.
In the example above, $\delta$ is applied to the input $x$, i.e. $(x, y) \to (x + \delta, y)$. The same conclusion holds for perturbations of $y$, i.e. $(x, y) \to (x, y + \delta)$:

Analogous equations also apply for changes in y.

While influence functions might appear to work only for infinitesimal (and therefore continuous) perturbations, it is important to note that the approximation holds for arbitrary $\delta$: the $\epsilon$-upweighting scheme allows smooth interpolation between $z$ and $z_{\delta}$. This is particularly useful when working with discrete data (e.g., in NLP) or with discrete label changes.

If $x$ is continuous and $\delta$ is small, the derivative above can be approximated further. Assume the input domain $X \subseteq \mathbb{R}^{d}$, the parameter space $\Theta \subseteq \mathbb{R}^{p}$, and $L$ differentiable in $\theta$ and $x$. As $\left\| \delta \right\| \to 0$,

$$\nabla_{\theta} L(z_{\delta}, \hat{\theta}) - \nabla_{\theta} L(z, \hat{\theta}) \approx \left[\nabla_{x} \nabla_{\theta} L(z, \hat{\theta})\right] \delta$$

Substituting into the previous expression:

$$\begin{aligned} \frac{\mathrm{d}\hat{\theta}_{\epsilon, z_{\delta}, -z}}{\mathrm{d}\epsilon}\Big|_{\epsilon=0} &= I_{up,params}(z_{\delta}) - I_{up,params}(z) \\ &= -H_{\hat{\theta}}^{-1}\left(\nabla_{\theta} L(z_{\delta}, \hat{\theta}) - \nabla_{\theta} L(z, \hat{\theta})\right) \\ &\approx -H_{\hat{\theta}}^{-1}\left[\nabla_{x} \nabla_{\theta} L(z, \hat{\theta})\right] \delta \end{aligned}$$

Therefore $\hat{\theta}_{z_{\delta}, -z} - \hat{\theta} \approx -\frac{1}{n} H_{\hat{\theta}}^{-1}\left[\nabla_{x} \nabla_{\theta} L(z, \hat{\theta})\right] \delta$.

Differentiating with respect to $\delta$ via the chain rule, as in the derivation of $I_{up,loss}$, gives:

$$\begin{aligned} I_{pert,loss}(z, z_{test})^{T} &= \nabla_{\delta} L(z_{test}, \hat{\theta}_{z_{\delta}, -z})^{T}\Big|_{\delta=0} \\ &= -\nabla_{\theta} L(z_{test}, \hat{\theta})^{T} H_{\hat{\theta}}^{-1} \nabla_{x} \nabla_{\theta} L(z, \hat{\theta}) \end{aligned}$$

$I_{pert,loss}(z, z_{test})^{T}\delta$ is then an efficient approximation of the effect of $z \to z + \delta$ on the loss at $z_{test}$.

By choosing $\delta$ appropriately, we can construct the local perturbation of a training point that most increases the loss at a test point.
