IMPROVING ADVERSARIAL ROBUSTNESS REQUIRES REVISITING MISCLASSIFIED EXAMPLES

Wang Y., Zou D., Yi J., Bailey J., Ma X., Gu Q. Improving Adversarial Robustness Requires Revisiting Misclassified Examples. In: International Conference on Learning Representations (ICLR), 2020.

@inproceedings{wang2020improving,
  title={Improving Adversarial Robustness Requires Revisiting Misclassified Examples},
  author={Wang, Yisen and Zou, Difan and Yi, Jinfeng and Bailey, James and Ma, Xingjun and Gu, Quanquan},
  booktitle={International Conference on Learning Representations},
  year={2020}
}

The authors argue that misclassified examples are important for improving network robustness, and they propose a new loss function motivated by this observation.

Main content

Notation

$h_{\theta}$: a neural network with parameters $\theta$;
$(x, y) \in \mathbb{R}^d \times \{1, \ldots, K\}$: a sample and its label;

$$h_{\theta}(x_i) = \underset{k=1, \ldots, K}{\arg\max}\; p_k(x_i, \theta), \quad p_k(x_i, \theta) = \exp(z_k(x_i, \theta)) \Big/ \sum_{k'=1}^{K} \exp(z_{k'}(x_i, \theta)). \tag{2}$$

Define the index sets of correctly classified and misclassified examples:
$$\mathcal{S}_{h_{\theta}}^{+} = \{i : i \in [n],\, h_{\theta}(x_i) = y_i\} \quad \text{and} \quad \mathcal{S}_{h_{\theta}}^{-} = \{i : i \in [n],\, h_{\theta}(x_i) \neq y_i\}.$$
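As a minimal PyTorch sketch of equation (2) and the $\mathcal{S}^{+}/\mathcal{S}^{-}$ split (the tensors and their shapes here are hypothetical, purely for illustration):

```python
import torch

# Hypothetical logits z(x, theta) of shape (n, K) and labels of shape (n,)
logits = torch.randn(8, 10)
labels = torch.randint(0, 10, (8,))

probs = torch.softmax(logits, dim=1)            # p_k(x_i, theta), Eq. (2)
preds = probs.argmax(dim=1)                     # h_theta(x_i)

correct = (preds == labels)
s_plus = torch.nonzero(correct).squeeze(1)      # indices in S^+
s_minus = torch.nonzero(~correct).squeeze(1)    # indices in S^-
```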

MART

The robust classification error over all samples:
$$\mathcal{R}(h_{\theta}) = \frac{1}{n} \sum_{i=1}^{n} \max_{x_i' \in \mathcal{B}_{\epsilon}(x_i)} \mathbb{1}(h_{\theta}(x_i') \neq y_i), \tag{3}$$
and define the robust classification error on misclassified examples as
$$\mathcal{R}^{-}(h_{\theta}, x_i) := \mathbb{1}(h_{\theta}(\hat{x}_i') \neq y_i) + \mathbb{1}(h_{\theta}(x_i) \neq h_{\theta}(\hat{x}_i')), \tag{4}$$

where
$$\hat{x}_i' = \arg\max_{x_i' \in \mathcal{B}_{\epsilon}(x_i)} \mathbb{1}(h_{\theta}(x_i') \neq y_i). \tag{5}$$
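The inner maximization in (5) over the $\ell_\infty$ ball $\mathcal{B}_{\epsilon}(x_i)$ cannot be solved exactly; in practice it is approximated with an iterative attack such as PGD on a differentiable surrogate. A minimal sketch, where the step size, iteration count, and cross-entropy surrogate are standard choices rather than values fixed by the paper:

```python
import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, eps=8/255, alpha=2/255, steps=10):
    """Approximates x_hat_i' in Eq. (5): maximize a differentiable
    surrogate (cross-entropy) of the 0-1 loss inside the l_inf ball."""
    x_adv = (x + torch.empty_like(x).uniform_(-eps, eps)).clamp(0, 1)
    for _ in range(steps):
        x_adv = x_adv.detach().requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        with torch.no_grad():
            x_adv = x_adv + alpha * grad.sign()                    # gradient ascent step
            x_adv = torch.min(torch.max(x_adv, x - eps), x + eps)  # project onto B_eps(x)
            x_adv = x_adv.clamp(0, 1)                              # keep a valid image
    return x_adv.detach()
```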

The robust classification error on correctly classified examples is:
$$\mathcal{R}^{+}(h_{\theta}, x_i) := \mathbb{1}(h_{\theta}(\hat{x}_i') \neq y_i). \tag{6}$$

Finally, the objective to minimize is the combination of the two errors:
$$\begin{aligned} \min_{\theta} \mathcal{R}_{\text{misc}}(h_{\theta}) &:= \frac{1}{n}\left(\sum_{i \in \mathcal{S}_{h_{\theta}}^{+}} \mathcal{R}^{+}(h_{\theta}, x_i) + \sum_{i \in \mathcal{S}_{h_{\theta}}^{-}} \mathcal{R}^{-}(h_{\theta}, x_i)\right) \\ &= \frac{1}{n} \sum_{i=1}^{n} \left\{ \mathbb{1}(h_{\theta}(\hat{x}_i') \neq y_i) + \mathbb{1}(h_{\theta}(x_i) \neq h_{\theta}(\hat{x}_i')) \cdot \mathbb{1}(h_{\theta}(x_i) \neq y_i) \right\}. \end{aligned} \tag{7}$$
The two sums collapse into a single one because the factor $\mathbb{1}(h_{\theta}(x_i) \neq y_i)$ vanishes exactly on $\mathcal{S}_{h_{\theta}}^{+}$, leaving only the $\mathcal{R}^{+}$ term there.

Since indicator functions block gradients, surrogate losses are needed to "soften" the objective above. The term $\mathbb{1}(h_{\theta}(\hat{x}_i') \neq y_i)$ is replaced by a boosted cross-entropy (BCE) loss:
$$\mathrm{BCE}(p(\hat{x}_i', \theta), y_i) = -\log(p_{y_i}(\hat{x}_i', \theta)) - \log\Big(1 - \max_{k \neq y_i} p_k(\hat{x}_i', \theta)\Big), \tag{8}$$
where the first term is the standard cross-entropy loss, and the second term improves the decision margin by pushing down the most competitive wrong class.
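A sketch of this term in PyTorch (the function and variable names are assumptions; `probs_adv` stands for the softmax output $p(\hat{x}_i', \theta)$):

```python
import torch

def bce_term(probs_adv, y, eps=1e-12):
    """Eq. (8): -log p_y(x', theta) - log(1 - max_{k != y} p_k(x', theta))."""
    n = probs_adv.size(0)
    p_true = probs_adv[torch.arange(n), y]       # p_{y_i}(x_hat', theta)
    masked = probs_adv.clone()
    masked[torch.arange(n), y] = 0.0             # exclude the true class
    p_runner_up = masked.max(dim=1).values       # max_{k != y_i} p_k
    return -torch.log(p_true + eps) - torch.log(1.0 - p_runner_up + eps)
```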

For the second indicator $\mathbb{1}(h_{\theta}(x_i) \neq h_{\theta}(\hat{x}_i'))$, the KL divergence is used as a surrogate:
$$\mathrm{KL}(p(x_i, \theta) \,\|\, p(\hat{x}_i', \theta)) = \sum_{k=1}^{K} p_k(x_i, \theta) \log \frac{p_k(x_i, \theta)}{p_k(\hat{x}_i', \theta)}. \tag{9}$$
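This per-example KL divergence can be sketched as follows (note that `F.kl_div` expects log-probabilities as its first argument; the helper name is an assumption):

```python
import torch.nn.functional as F

def kl_term(probs_nat, log_probs_adv):
    """Eq. (9): KL(p(x, theta) || p(x_hat', theta)), one value per example."""
    # F.kl_div(log_q, p, reduction='none') computes p_k * (log p_k - log q_k)
    return F.kl_div(log_probs_adv, probs_nat, reduction='none').sum(dim=1)
```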

The last indicator $\mathbb{1}(h_{\theta}(x_i) \neq y_i)$ can be replaced by $1 - p_{y_i}(x_i, \theta)$.

The final loss function is then
$$\mathcal{L}^{\mathrm{MART}}(\theta) = \frac{1}{n} \sum_{i=1}^{n} \ell(x_i, y_i, \theta), \tag{11}$$
where
$$\ell(x_i, y_i, \theta) := \mathrm{BCE}(p(\hat{x}_i', \theta), y_i) + \lambda \cdot \mathrm{KL}(p(x_i, \theta) \,\|\, p(\hat{x}_i', \theta)) \cdot (1 - p_{y_i}(x_i, \theta)).$$
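Putting the pieces together, a minimal sketch of the full training loss, assuming the hypothetical `pgd_attack`, `bce_term`, and `kl_term` helpers from above; $\lambda$ is a tunable hyperparameter, and the default value below is an assumption, not the paper's setting:

```python
import torch

def mart_loss(model, x, y, eps=8/255, alpha=2/255, steps=10, lam=5.0):
    """Sketch of Eq. (11): BCE on adversarial examples plus the KL term
    weighted by (1 - p_{y_i}(x_i, theta)). lam=5.0 is an assumed default."""
    x_adv = pgd_attack(model, x, y, eps, alpha, steps)   # x_hat_i', Eq. (5)

    logits_adv = model(x_adv)
    probs_adv = torch.softmax(logits_adv, dim=1)         # p(x_hat', theta)
    probs_nat = torch.softmax(model(x), dim=1)           # p(x, theta)

    bce = bce_term(probs_adv, y)                                   # Eq. (8)
    kl = kl_term(probs_nat, torch.log_softmax(logits_adv, dim=1))  # Eq. (9)
    weight = 1.0 - probs_nat[torch.arange(x.size(0)), y]           # 1 - p_{y_i}(x_i)

    return (bce + lam * kl * weight).mean()
```

The $(1 - p_{y_i}(x_i, \theta))$ factor is what makes the loss "misclassification aware": the KL regularizer is emphasized precisely on examples the network already gets wrong.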
